Following up on my last article about retrieving data from the IUCLID 6 database[1] with Python, I have developed another Python script for pulling out IUCLID 6 substance file data via the IUCLID 6 REST API[2]. The endpoint in scope is ‘Short term toxicity to fish’, which corresponds to IUCLID 6 chapter 6.1.1. For this endpoint, ECHA has already published the manual ‘Example use of IUCLID 6 Public REST API for REACH Study Results’[3]. Nevertheless, the instructions in this manual still have to be turned into complete, working code to put the data extraction into practice. Without further ado, let’s take a quick walk through the Python script.

As already mentioned in my last IUCLID 6 article, the precondition is that you have IUCLID 6 installed locally or on a local server and that you have login credentials for your IUCLID 6 system. Furthermore, you need to know the URL of your IUCLID 6 system and set it as the base string of the variable s_REST_API. Every request the script sends to the IUCLID 6 REST API with the requests module is authenticated with your IUCLID 6 login credentials (HTTP basic authentication). Again, my advice: it is more secure to store your IUCLID 6 login credentials in a separate text file rather than hard-coding them in the script. This is the format in which I stored my credentials in a blank text file (creds.txt in the code), with the username on the first line and the password on the second, without any headers or prefixes like ‘Username’ or ‘Password’:

MyUserName
MyPassWord
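Because the script reads the file with readlines() and takes lines 0 and 1, a stray blank line between the two entries would leave the password empty. A minimal sketch of a more tolerant reader, assuming the username is the first non-empty line and the password the second; the helper name read_credentials is hypothetical and not part of the script:

```python
import os
import tempfile


def read_credentials(path):
    """Return (user, password) from a text file, ignoring blank lines.

    Hypothetical helper: assumes the username is the first non-empty line
    and the password the second, as in the creds.txt format above.
    """
    with open(path, encoding='utf-8') as creds:
        lines = [line.strip() for line in creds if line.strip()]
    if len(lines) < 2:
        raise ValueError('Expected username and password on separate lines in ' + path)
    return lines[0], lines[1]


if __name__ == '__main__':
    # Demo with a temporary file that contains a stray blank line.
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, 'creds.txt')
        with open(demo_path, 'w', encoding='utf-8') as f:
            f.write('MyUserName\n\nMyPassWord\n')
        print(read_credentials(demo_path))  # ('MyUserName', 'MyPassWord')
```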

You also have to specify your working directory at the beginning so that the login credentials text file is found. The regex patterns pattern1 and pattern2 were already defined and explained in the IUCLID 6 estimated quantities script, and they are still valid and useful in this code. As mentioned in the ECHA manual[3], extracting the acute fish toxicity data requires both the UUID(s) of the substance file(s) the data is to be extracted from and the UUID(s) of their acute fish toxicity study record(s). That’s why both regex patterns are used: pattern1 matches the UUID of each substance file, and pattern2 matches the UUID(s) of the acute fish toxicity study records per substance file.
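To see what the two patterns actually return, here is a small offline check. The URI strings are hypothetical and made up only for illustration; the exact URI layout returned by your IUCLID 6 REST API may differ, so compare against a real response before relying on it:

```python
import re

# Patterns as in the script, written as raw strings.
pattern1 = r'([A-Z0-9\-]+)?[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}'
pattern2 = r'ShortTermToxicityToFish\/([A-Z0-9\-]+)?[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}'

# Hypothetical URIs, for illustration only.
substance_uri = 'iuclid6:/0/SUBSTANCE/ECB5-0a1b2c3d-4e5f-6a7b-8c9d-0e1f2a3b4c5d'
record_uri = (substance_uri +
              '/ENDPOINT_STUDY_RECORD.ShortTermToxicityToFish/f47ac10b-58cc-4372-a567-0e02b2c3d479')

# pattern1 returns the first UUID-like key, including an optional upper-case prefix.
substance_uuid = re.search(pattern1, substance_uri).group()
# pattern2 is anchored on the document name, so it skips the substance key
# and returns the study record key; replace() strips the document name.
record_uuid = re.search(pattern2, record_uri).group().replace('ShortTermToxicityToFish/', '')

print(substance_uuid)  # ECB5-0a1b2c3d-4e5f-6a7b-8c9d-0e1f2a3b4c5d
print(record_uuid)     # f47ac10b-58cc-4372-a567-0e02b2c3d479
```

Note that pattern1 applied to the record URI would also return the substance key, because re.search stops at the first match; that is why a second, endpoint-specific pattern is needed.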

In the list Substances, you enter the exact IUCLID 6 substance file names you would like to retrieve the acute fish toxicity data from. That can be just three substances, but also 300 or even more – Python doesn’t complain about repetitive tasks 😊 yay!

All IUCLID 6 API requests and the information retrieval are then done in the function get_acute_fishtox_by_substance(), which works basically the same way as in the estimated quantities script. The first part of the function loops over the given list of substance file names (Substances), identifies the UUID of each matching substance file and collects the found UUID(s) in the list s_UUID_s_multi.
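The first step boils down to: take the 'results' list of the bySubstance query, run pattern1 over each 'uri' and keep the matches. A minimal offline sketch, assuming the reply has the shape the script reads (a 'results' list of dicts with a 'uri' key); the helper name collect_substance_uuids and the sample reply are hypothetical:

```python
import re

pattern1 = r'([A-Z0-9\-]+)?[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}'


def collect_substance_uuids(reply):
    """Return the UUIDs found in the 'uri' of each query result (hypothetical helper)."""
    uuids = []
    for result in reply.get('results', []):
        match = re.search(pattern1, result.get('uri', ''))
        if match:  # skip entries whose URI contains no UUID instead of failing on None
            uuids.append(match.group())
    return uuids


# Hypothetical, shortened reply of the bySubstance query.
sample_reply = {'results': [
    {'uri': 'iuclid6:/0/SUBSTANCE/0a1b2c3d-4e5f-6a7b-8c9d-0e1f2a3b4c5d'},
    {'uri': 'iuclid6:/0/SUBSTANCE/no-uuid-here'},
]}

uuids = collect_substance_uuids(sample_reply)
if not uuids:
    print('No substance file and respective UUID found.')
else:
    print(uuids)  # ['0a1b2c3d-4e5f-6a7b-8c9d-0e1f2a3b4c5d']
```

Checking the collected list for emptiness is what actually detects a missing substance file: a loop over an empty 'results' list simply does nothing and raises no exception.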

Next, the UUID(s) of the acute fish toxicity study records of each substance file are obtained: for every UUID in s_UUID_s_multi, the request response2 returns all ShortTermToxicityToFish study records, and regex pattern2 extracts their UUIDs from the returned URIs. The found UUIDs are collected in another list named l_UUID_fish.
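Instead of matching the whole 'ShortTermToxicityToFish/<UUID>' string and stripping the document name with replace(), a named capturing group can return the record UUID directly. This is a variant, not what the script does; the helper name collect_record_uuids and the sample reply are hypothetical:

```python
import re

# Variant of pattern2 with a named group around the record key (optional prefix + UUID).
pattern2_group = (r'ShortTermToxicityToFish/(?P<uuid>(?:[A-Z0-9\-]+)?'
                  r'[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})')


def collect_record_uuids(reply):
    """Return the study record UUIDs from a ShortTermToxicityToFish reply (hypothetical helper)."""
    uuids = []
    for result in reply.get('results', []):
        match = re.search(pattern2_group, result.get('uri', ''))
        if match:
            uuids.append(match.group('uuid'))
    return uuids


# Hypothetical reply with two study records of one substance file.
sample_reply2 = {'results': [
    {'uri': 'iuclid6:/0/SUBSTANCE/0a1b2c3d-4e5f-6a7b-8c9d-0e1f2a3b4c5d'
            '/ENDPOINT_STUDY_RECORD.ShortTermToxicityToFish/f47ac10b-58cc-4372-a567-0e02b2c3d479'},
    {'uri': 'iuclid6:/0/SUBSTANCE/0a1b2c3d-4e5f-6a7b-8c9d-0e1f2a3b4c5d'
            '/ENDPOINT_STUDY_RECORD.ShortTermToxicityToFish/9b2e6d3a-1c4f-4e8a-b7d5-2f6a8c0e1d3b'},
]}

print(collect_record_uuids(sample_reply2))
```

The group also includes the optional upper-case prefix, so it returns exactly what group().replace('ShortTermToxicityToFish/', '') returns in the script.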

The UUIDs stored in l_UUID_fish are then used in the third request, response3, to retrieve the content of each acute fish toxicity study record. Finally, all study records of a substance file are dumped as a JSON string into a simple text file, so the information from the acute fish toxicity studies of each IUCLID 6 substance file ends up in a separate text file.
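The final step, turning the collected records of one substance file into a file name and file content, can be isolated into a small pure function, which makes it easy to check without a running IUCLID 6 server. A sketch under the assumption that the file layout mirrors the script (header line, then the JSON dump or the 'no data' message); build_output is a hypothetical name, and the re.sub call that replaces all characters Windows forbids in file names is an assumption about your target file system:

```python
import json
import re


def build_output(substance, substance_uuid, records):
    """Return (filename, content) for one substance file (hypothetical helper)."""
    # Replace characters that Windows does not allow in file names.
    safe_name = re.sub(r'[\\/:*?"<>|]', '-', substance).replace('α', 'a')
    filename = 'Acute_fish_toxicity_endpoint_' + safe_name + ' Substance_UUID ' + substance_uuid + '.txt'
    header = '\nAcute_fish_toxicity_of_' + substance + ' Substance_UUID ' + substance_uuid + '\n\n'
    if records:
        body = json.dumps(records, indent=4)
    else:
        body = 'The substance file has no data in this IUCLID section.'
    return filename, header + body


filename, content = build_output('α-Pinene|test', '0a1b2c3d-4e5f-6a7b-8c9d-0e1f2a3b4c5d',
                                 [{'example': 'hypothetical record'}])
print(filename)
```

Writing the result is then a plain `with open(filename, 'w', encoding='utf-8') as f: f.write(content)`; the explicit UTF-8 encoding avoids a UnicodeEncodeError for substance names with characters such as ‘α’ on Windows.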

#! python3 - getFishfromIUCLID6.py
# Connect to the IUCLID6 REST API, collect all acute fish toxicity information of specific substance files
# and write it to one txt file per substance file.

import os
import re
import json
import requests
from requests.auth import HTTPBasicAuth

# Specify your working directory.
os.chdir('C:/Path/to/your/working/directory') 

# Store the login credentials separately from the script for security reasons.
with open('creds.txt') as creds:
    credentials = creds.readlines()
    user = credentials[0].strip()
    password = credentials[1].strip()
    
# API address basis for all requests.
s_REST_API = 'https://your-IUCLID-URL/iuclid6-ext/api/ext/v1'

# Regex pattern1 to parse substance UUID from string, pattern2 to get ShortTermToxicityToFish UUIDs.
# Raw strings (r'...') avoid invalid escape sequence warnings for '\-'.
pattern1 = r'([A-Z0-9\-]+)?[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}'
pattern2 = r'ShortTermToxicityToFish\/([A-Z0-9\-]+)?[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}'

# List of substance file names for acute fish toxicity data collection from our IUCLID6 system.
Substances = ['Your Substancefile 1', 'Your Substancefile 2', 'Your Substancefile 3']

def get_acute_fishtox_by_substance(substances):
    '''Get all available acute fish toxicity data per substance file in our IUCLID6 system.'''
    auth = HTTPBasicAuth(user, password)
    headers = {'accept': 'application/vnd.iuclid6.ext+json; type=iuclid6.Document'}
    for substance in substances:
        # First, find the substance file(s) by their substance name in our IUCLID6 database & catch their UUID(s).
        # Note: verify=False disables SSL certificate verification; only use it for a trusted local server.
        response1 = requests.get(s_REST_API + '/query/iuclid6/bySubstance?doc.type=SUBSTANCE&sub.chemical='
                                 + substance, auth=auth, headers=headers, verify=False)
        s_reply1 = response1.json()
        print('### size found substances:', len(s_reply1['results']))  # response check: any substance found at all?
        s_UUID_s_multi = []
        for result in s_reply1['results']:
            s_substance_UUID = re.search(pattern1, result['uri'])
            if s_substance_UUID:
                s_UUID_s_multi.append(s_substance_UUID.group())
        if not s_UUID_s_multi:
            print('No substance file for ' + substance + ' and respective UUID found.')
            continue
        print(s_UUID_s_multi)

        # Second, get endpoint 'ShortTermToxicityToFish'; the request returns all available acute fish
        # toxicity study records per substance file, each one with its own UUID.
        for s_UUID_s in s_UUID_s_multi:
            response2 = requests.get(s_REST_API + '/raw/SUBSTANCE/' + s_UUID_s +
                                     '/document/ENDPOINT_STUDY_RECORD.ShortTermToxicityToFish',
                                     auth=auth, headers=headers, verify=False)
            # Collect the UUIDs of all found acute fish toxicity study records in a list.
            s_reply2 = response2.json()
            l_UUID_fish = []
            for result in s_reply2['results']:
                s_fish_UUID = re.search(pattern2, result['uri'])
                if s_fish_UUID:
                    l_UUID_fish.append(s_fish_UUID.group().replace('ShortTermToxicityToFish/', ''))
            print(l_UUID_fish)

            # Third, for each acute fish toxicity study record of this substance file, make the response3
            # request to get its content and store it in a list (reset per substance file).
            s_fishtox = []
            for UUID in l_UUID_fish:
                response3 = requests.get(s_REST_API + '/raw/SUBSTANCE/' + s_UUID_s +
                                         '/document/ENDPOINT_STUDY_RECORD.ShortTermToxicityToFish/' + UUID,
                                         auth=auth, headers=headers, verify=False)
                s_fishtox.append(response3.json())

            # Finally, dump all acute fishtox information of this substance file in a readable text file.
            if s_fishtox:
                json_obj = json.dumps(s_fishtox, indent=4)
            else:
                json_obj = 'The substance file has no data in this IUCLID section.'

            # Replace characters that are not allowed in Windows file names.
            safe_name = re.sub(r'[\\/:*?"<>|]', '-', substance).replace('α', 'a')
            filename = 'Acute_fish_toxicity_endpoint_' + safe_name + ' Substance_UUID ' + s_UUID_s + '.txt'
            try:
                with open(filename, 'w', encoding='utf-8') as f:
                    f.write('\nAcute_fish_toxicity_of_' + substance + ' Substance_UUID ' + s_UUID_s + '\n\n')
                    f.write(json_obj)
            except OSError:
                print('Could not write file for ' + substance + '. Please check substance file name.')


if __name__ == '__main__':
    get_acute_fishtox_by_substance(Substances)

If you compare the present script with my script for pulling out estimated quantities from IUCLID 6 substance files, you will find that they are very much alike.

The major difference is the document part of the URL used in the response2 and response3 requests: ‘/document/FLEXIBLE_RECORD.EstimatedQuantities’ in the quantities script versus ‘/document/ENDPOINT_STUDY_RECORD.ShortTermToxicityToFish’ (followed by ‘/’ and the record UUID in response3) in the present script. The rest of the code is almost identical.

This means the present script can also be used to navigate to any other endpoint in IUCLID 6 substance files, e.g. ‘Long-term toxicity to fish’, ‘Toxicity to microorganisms’ or ‘Acute toxicity: oral’. For this purpose, the document part of the URL has to be amended in both the response2 and response3 requests, and so does the document name in pattern2 and in the replace() call that strips it from the matched string. In addition, some minor changes are needed in the strings used when writing the extracted data to a file:

For example, the prefixes ‘Acute_fish_toxicity_endpoint_’ (file name) and ‘Acute_fish_toxicity_of_’ (file header) would change to ‘Acute_Daphnia_toxicity_endpoint_’ and ‘Acute_Daphnia_toxicity_of_’ if this kind of data is extracted. To choose the right URL for each specific endpoint, please refer to the IUCLID 6 Public REST API manual, which contains these details. You can try it yourself with an IUCLID 6 endpoint of your interest.
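Since only the document type changes between endpoints, one way to make the switch in a single place is to build the response2/response3 URLs and the pattern2 equivalent from one document type string. A sketch, assuming the URL layout used in the script and assuming that record URIs contain the part after the dot followed by ‘/’ and the record UUID, as pattern2 does for ShortTermToxicityToFish. build_document_url and record_pattern are hypothetical helpers, and any document type other than ENDPOINT_STUDY_RECORD.ShortTermToxicityToFish should be taken from the manual rather than guessed:

```python
import re

# UUID part of pattern2 from the script, without the document name in front.
UUID_PATTERN = (r'([A-Z0-9\-]+)?[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}'
                r'\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}')


def build_document_url(base, substance_uuid, doc_type, record_uuid=None):
    """Hypothetical helper: URL for response2 (all records) or response3 (one record)."""
    url = base + '/raw/SUBSTANCE/' + substance_uuid + '/document/' + doc_type
    if record_uuid is not None:
        url += '/' + record_uuid
    return url


def record_pattern(doc_type):
    """Hypothetical helper: pattern2 equivalent for a given document type.

    Assumes the record URIs contain the part after the dot, then '/' and the
    record UUID, as pattern2 assumes for ShortTermToxicityToFish.
    """
    marker = doc_type.split('.', 1)[1]
    return re.escape(marker) + '/' + UUID_PATTERN


BASE = 'https://your-IUCLID-URL/iuclid6-ext/api/ext/v1'  # placeholder, as in the script
DOC = 'ENDPOINT_STUDY_RECORD.ShortTermToxicityToFish'   # swap for the type from the manual
SUBSTANCE_UUID = '0a1b2c3d-4e5f-6a7b-8c9d-0e1f2a3b4c5d'  # hypothetical

print(build_document_url(BASE, SUBSTANCE_UUID, DOC))
print(build_document_url(BASE, SUBSTANCE_UUID, DOC, 'f47ac10b-58cc-4372-a567-0e02b2c3d479'))
print(record_pattern(DOC))
```

With this, the file name prefixes are the only endpoint-specific strings left to adjust by hand.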

If you have any questions or comments, just feel free to drop me a note below.

Literature & resources:

[1] IUCLID 6 official website and download: https://iuclid6.echa.europa.eu/de/

[2] IUCLID 6 Public REST API: https://iuclid6.echa.europa.eu/de/public-api

[3] IUCLID 6 Public REST API request example: https://iuclid6.echa.europa.eu/documents/21812392/23181267/IUCLID_public_rest_api_eg_rsr.pdf/96d8f1fa-459b-1a3d-00c3-35da3a8991c0

[4] ‘Fish’ picture source: kaori, www.pixabay.com

