What is your impression: do we know every hazard of all the chemical substances used in our daily lives? As you can probably imagine, we do not. For many of the chemicals we use, we neither fully know how they behave in the environment nor whether they are harmful or even toxic to humans, animals, or plants. For a lot of substances, sufficient test data are available to evaluate their toxicological and ecotoxicological profile; in some cases, however, we have insufficient test data for a chemical, or none at all. In such cases we would still like to estimate its physicochemical, (eco-)toxicological and environmental fate properties from the molecule’s structure, described by its SMILES code. The good news is that several Quantitative Structure–Activity Relationship (QSAR[1]) tools are available that calculate such properties for a molecule, giving us at least a first idea of whether a substance is, for example, readily biodegradable or not.

For example, the Estimation Programs Interface Suite, EPI Suite[2] for short, is a software package developed by the U.S. EPA for estimating various substance properties, especially physicochemical and environmental endpoints. Typical physicochemical endpoints of a substance are its melting point, boiling point, vapor pressure, partition coefficient, and water solubility. Regarding environmental fate, biodegradation and hydrolysis behavior are common properties of interest for a new and untested molecule.

Details on QSAR modules are beyond the scope of this article. I just want to demonstrate a way to use EPI Suite from Python for batch calculations on a set of substances, without manually entering the required information in many redundant steps. In this article, I therefore focus on the automated calculation of the biodegradation behavior of various chemical substances, using their SMILES codes as input for each calculation. The same principle can certainly be adapted to any other property that EPI Suite can calculate, for example the adsorption coefficient or the partition coefficient of a substance. The property itself is calculated in EPI Suite, for biodegradation in its module named BIOWIN. Picture 1 shows a screenshot of the BIOWIN module of EPI Suite:

Picture 1: BIOWIN module of EPI Suite used for biodegradation estimation

EPI Suite and its BIOWIN module are operated by a Python script. The key library used here is pyautogui[3], which makes it possible to mimic a user who opens EPI Suite, navigates to the desired calculation module BIOWIN, and enters the SMILES code of the substance in question. Starting the calculation, copying the result to a Word document, and saving it are also done automatically by pyautogui. All of this is done for a list of SMILES codes hard-coded in a Python script, the bot, which triggers the calculations in EPI Suite and copies and saves all results to a Word document.

The prerequisites for this automated calculation are that both Microsoft Word (Office 365 version) and EPI Suite v4.11 are installed locally on your computer and that the EPI Suite and Microsoft Word icons are pinned to the Windows taskbar at a defined, fixed position (see Picture 2 for an example). Of course, you also need Python 3 (including pyautogui and pyperclip[4]) installed to run this script, and you will need two monitors: screen 1 for the script and the EPI Suite windows, and screen 2 for the Word document.

Picture 2: Fixed positions of EPI Suite and Word on taskbar

Before going into the code, one important note: EPI Suite is a bit picky with SMILES codes. All SMILES codes to be calculated must be in a format that EPI Suite accepts, even if they are written correctly, e.g. as canonical SMILES, and accepted by other databases. Unfortunately, EPI Suite does not accept certain types of SMILES codes, such as disconnected SMILES separated by a period (“.”). Details on such technical issues with some SMILES codes are described in the help function of EPI Suite, as displayed in Picture 3:

Picture 3: Information in EPI Suite on disconnected and isomeric SMILES

Care must also be taken if a SMILES code contains square brackets [] or charges such as +2 or -3. In such cases, the SMILES code must first be rewritten in a different notation (without brackets and charges), otherwise the calculation for this SMILES code is not executed and an error message appears in EPI Suite.

Besides these technical limitations, SMILES codes that are not written correctly, e.g. with an incorrect ring closure or with invalid bonds such as ‘C-V-CCC’, will not be calculated. The user must therefore check the SMILES codes for such errors before using the bot, otherwise the calculation breaks. A helpful website for checking whether SMILES codes are correct is the SMILES generator/checker[5].
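
Part of this check can be done in Python before the bot starts. The following is only a minimal sketch and not part of the original bot: the function name find_epi_suite_issues is hypothetical, and its rules are assumptions derived from the limitations described above (periods, square brackets) plus a simple parenthesis count. It does not verify that a SMILES code is chemically valid, so it complements the online checker rather than replacing it.

```python
def find_epi_suite_issues(smiles):
    """Return a list of reasons why EPI Suite may reject this SMILES code.

    Heuristic only: the rules mirror the limitations named in the article
    and do not check the chemistry of the molecule.
    """
    issues = []
    if not smiles.strip():
        issues.append("empty SMILES code")
    if "." in smiles:
        issues.append("disconnected SMILES (contains '.')")
    if "[" in smiles or "]" in smiles:
        issues.append("square brackets or charges (contains '[' or ']')")
    if smiles.count("(") != smiles.count(")"):
        issues.append("unbalanced parentheses")
    return issues


candidates = [
    "C(C(NC)C)C1=CC=CC=C1",  # taken from the example list in Script 1
    "[Na+].[Cl-]",           # disconnected and charged
    "CC(C)(C",               # branch that is never closed
]
for code in candidates:
    problems = find_epi_suite_issues(code)
    print(code, "->", "ok" if not problems else "; ".join(problems))
```

Any SMILES code for which the list of issues is not empty would be rewritten or removed from the batch before the bot is started.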

When the above-mentioned prerequisites are fulfilled, i.e. the SMILES codes are correct and EPI Suite and Word are installed and pinned to the Windows taskbar, the calculation with the bot can start.

So what does the bot look like?

It is a simple Python script based on pyautogui (see Script 1). We start by importing the relevant libraries: os, time, pyautogui and pyperclip. The latter is needed to keep certain characters that occur in some SMILES codes, such as ‘#’, unchanged when copying them to EPI Suite. Otherwise, copy-paste errors can alter the SMILES code, which leads to an error message in EPI Suite and breaks the calculation.
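
The clipboard step can be isolated in a small helper. This is a sketch under assumptions, not code from the original bot: the name paste_smiles and the optional clipboard and gui parameters are hypothetical and exist only so the helper can be tried out with stand-in objects. By default it uses pyperclip.copy() and pyautogui.hotkey(), the same two calls Script 1 relies on.

```python
def paste_smiles(smiles, clipboard=None, gui=None):
    """Put a SMILES code on the clipboard and paste it with Ctrl+V.

    The text reaches the target window via the clipboard instead of being
    typed key by key, so characters such as '#' arrive unchanged.
    """
    if clipboard is None:
        import pyperclip as clipboard  # requires pyperclip to be installed
    if gui is None:
        import pyautogui as gui  # requires a desktop session
    clipboard.copy(smiles)
    gui.hotkey("ctrl", "v")
```

With the BIOWIN input field focused, a call such as paste_smiles('CC#N') would insert the SMILES code exactly as written.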

Script 1: Bot script to open BIOWIN, perform the calculations and write the results to Word

#! python3 - EPIBot_biodegradation.py

"""Simple bot opening EPI Suite BIOWIN program, copies given SMILES codes into
the calculation window, executes the biodegradation calculation and copies the
results to a Word document.

Caution: This script is tuned to my individual PC desktop and the location of
its icons, so it may have to be adjusted pixel-wise for another desktop.
Furthermore, EPI Suite does not accept SMILES codes with square brackets [].
Such SMILES codes must first be rewritten in a different notation (without
brackets and charges like +2 or -3), otherwise the calculation is not executed.
"""

import os
import time
import pyautogui
import pyperclip

os.chdir('C:/Users/a-kel/OneDrive/Desktop')

# Test data set of 3 SMILES codes, i.e. SMILES of psilocin, morphine, methamphetamine
SMILES_codes =  ['C(CN(C)C)C=1C=2C(NC1)=CC=CC2O', 
                'OC1C2C34C=5C(O2)=C(O)C=CC5CC(C3C=C1)N(C)CC4', 
                'C(C(NC)C)C1=CC=CC=C1']

def calculate_biodegradation(SMILES_codes):
    '''
    Move the mouse over the main monitor, open BIOWIN in EPI Suite and run the
    biodegradation calculation for every SMILES code in SMILES_codes.
    '''
    if os.path.isfile("C:/Users/a-kel/OneDrive/Dokumente/SMILES.docx"):
        print("Word file already exists")
        os.startfile("C:/Users/a-kel/OneDrive/Dokumente/SMILES.docx")
        time.sleep(5)
    else:
        # Executed only once: a dummy calculation creates the Word document
        # before the repeated calculations for the SMILES codes start.
        print('Start first calculation with dummy SMILES.')
        pyautogui.moveTo(2479, 2124) # Go to EPI icon in the taskbar and click it
        pyautogui.click() 
        time.sleep(3)          
        # EPISuite has opened
        pyautogui.moveTo(135, 329) # Move to the BIOWIN-BUTTON and click it
        pyautogui.click() 
        time.sleep(3) 
        pyautogui.write('CCCCCC') # Initiate calculation with dummy SMILES
        pyautogui.moveTo(733, 276) # Move to Calculate button and click it
        pyautogui.click()
        time.sleep(2)
        pyautogui.hotkey('ctrl', 'c') # copy biodegradation calculation
        time.sleep(2)
        pyautogui.moveTo(1981, 1142) # Move to dialog window 'OK'
        time.sleep(3)
        pyautogui.click() # actually click 'OK'; data is saved to clipboard
        # Move to Word icon in the task bar, open a Word Doc and paste info
        pyautogui.moveTo(2385, 2122)
        pyautogui.click() # click the Word icon and wait 5 seconds
        time.sleep(5)
        pyautogui.moveTo(-2476, 156)
        pyautogui.click()
        time.sleep(2)
        pyautogui.hotkey('ctrl', 'v') # paste the calculation result in Word Doc
        time.sleep(2) # wait 2 s, then add 5 'Enter' as separator
        for _ in range(5):
            pyautogui.press('enter')
        pyautogui.hotkey('ctrl', 's') # Save WordDoc and wait 3 s 
        time.sleep(3) 
        pyautogui.moveTo(-1363, 870) # save under a name, here the default
        # 'SMILES.docx' in the working directory
        pyautogui.click()
        time.sleep(3)
        print('Finished first calculation.')
        print('Start all calculations.')
    # Executed for all SMILES codes in the list once the Word document that
    # collects all calculation results has been initialized.
    for SMILES_code in SMILES_codes:
        # Now move to the next SMILES entry and repeat the calculation and    
        # writing to Word steps for all SMILES codes in the list SMILES_codes.
        pyautogui.moveTo(2479, 2124) # Go to EPI icon in the taskbar and click it
        pyautogui.click() 
        time.sleep(3)          
        # EPISuite has opened
        pyautogui.moveTo(135, 329)
        pyautogui.click() 
        time.sleep(3) 
        pyperclip.copy(SMILES_code) # pyperclip required for keeping format of 
        # certain chars like # in SMILES code
        pyautogui.hotkey("ctrl", "v")
        pyautogui.moveTo(733, 276)
        pyautogui.click()
        time.sleep(3)
        pyautogui.hotkey('ctrl', 'c')
        pyautogui.moveTo(1981, 1142) 
        pyautogui.click()        
        # Move to Word and open a Word Document for pasting information
        pyautogui.moveTo(2385, 2122) 
        pyautogui.click()
        time.sleep(3)
        pyautogui.hotkey('ctrl', 'v')
        time.sleep(3)
        for _ in range(5):
            pyautogui.press('enter')
        pyautogui.hotkey('ctrl', 's') # Save WordDoc
    # Information displayed when all calculations are finished
    print('I have finished all biodegradation calculations!')
    pyautogui.alert('I have finished all biodegradation calculations!')

if __name__ == '__main__':
    calculate_biodegradation(SMILES_codes)  # returns nothing; results go to Word

Then, we enter the SMILES codes to be calculated by EPI Suite in the list ‘SMILES_codes’. The example list contains only three SMILES codes of psychotropic organic substances, but you can extend it to 50 or even hundreds of SMILES codes. The bot imposes no limit on how many SMILES codes can be calculated in one batch, but from practical experience I would recommend limiting a batch to 40 to 50 SMILES codes. If the calculation breaks, perhaps because of one incorrect SMILES code in the list, you then do not have to restart the whole calculation on a huge list from scratch, only on a smaller list, and the cause of the error can be identified more quickly as well. Unfortunately, I did not find an appropriate way to interrupt the whole script when things go wrong. Setting pyautogui.FAILSAFE = True did not work, and a KeyboardInterrupt triggered by pressing Ctrl + C could not stop the script either. The workaround I chose is to start Script 1 in debug mode, which allows the script to be interrupted at any time.
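
Splitting a long list into batches of this size takes only a few lines. The sketch below is an illustration rather than part of the original bot; the helper name split_into_batches and the placeholder SMILES codes are assumptions made for the example.

```python
def split_into_batches(items, batch_size=40):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 95 placeholder SMILES codes (unbranched alkanes) standing in for a real list
all_codes = ["C" * n for n in range(1, 96)]
for number, batch in enumerate(split_into_batches(all_codes), start=1):
    print(f"Batch {number}: {len(batch)} SMILES codes")
```

Each batch can then be passed to the bot separately, so a broken run only affects the batch that contains the faulty SMILES code.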

With the list of SMILES codes set up correctly, we look at the main function calculate_biodegradation(). First, it checks whether a Word file for receiving the calculation results already exists. If not, a first calculation in BIOWIN is performed with n-hexane (‘CCCCCC’) as a dummy. The mouse is always moved with the pyautogui.moveTo() function, clicks are made with pyautogui.click(), and keyboard shortcuts are simulated with pyautogui.hotkey(). The crucial point is that the numbers in the parentheses, e.g. in pyautogui.moveTo(135, 329), are the exact pixel position (x and y coordinates) of the mouse cursor on an individual monitor. These positions must be determined first for each monitor setup so that every icon and button on the screen and in BIOWIN is hit exactly; otherwise the script fails and the mouse moves over the screen without hitting its targets. If this weird behavior appears, the script should be stopped immediately. Here is an example script for Python 3, taken from the pyautogui documentation, that constantly prints the position of the mouse cursor (see Script 2):

Script 2: Tracking the mouse cursor position (x,y coordinates)

#! python3
import pyautogui, sys
print('Press Ctrl-C to quit.')
try:
    while True:
        x, y = pyautogui.position()
        positionStr = 'X: ' + str(x).rjust(4) + ' Y: ' + str(y).rjust(4)
        print(positionStr, end='')
        print('\b' * len(positionStr), end='', flush=True)
except KeyboardInterrupt:
    print('\n')
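
Once the positions have been measured, they can be collected in one place instead of being repeated throughout the script. The following is a sketch under assumptions: the dictionary POSITIONS, its key names and the helper click_at are hypothetical, while the coordinates are the ones used in Script 1 and therefore only fit the desktop they were measured on.

```python
# Pixel positions copied from Script 1; they must be re-measured with
# Script 2 for any other monitor setup.
POSITIONS = {
    "epi_suite_icon": (2479, 2124),
    "biowin_button": (135, 329),
    "calculate_button": (733, 276),
    "ok_button": (1981, 1142),
    "word_icon": (2385, 2122),
}


def click_at(name, gui=None, positions=POSITIONS):
    """Move the mouse to a named position and click it."""
    if gui is None:
        import pyautogui as gui  # requires a desktop session
    x, y = positions[name]
    gui.moveTo(x, y)
    gui.click()
```

A call such as click_at('biowin_button') then replaces a moveTo/click pair, and adapting the bot to another desktop means editing only the dictionary.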

Back to our bot script (Script 1): you can follow the comments in the main function step by step to see the automated movements triggered by pyautogui. When the first calculation with the dummy SMILES code is completed, the calculation results from BIOWIN are copied to the clipboard, the mouse moves to the Word icon in the taskbar and opens a new Word document, and the results are pasted into it. The Word document is then saved with pyautogui.hotkey('ctrl', 's').

As you can see in these steps, the time module is used to manage the delay required at each step. For example, it takes some seconds until the Word document opens, so the script has to wait for a defined period (achieved with time.sleep()) before continuing with the next step. When the first calculation is completed and the Word document (named “SMILES.docx” by default) is saved, all other SMILES codes are submitted to the biodegradation calculation.
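
Where a step has an observable outcome, such as the saved Word file appearing on disk, a fixed delay could be replaced by polling with a timeout. This is a sketch of that alternative and not what Script 1 does; the helper name wait_until and its default values are assumptions.

```python
import time


def wait_until(condition, timeout=30.0, interval=0.5):
    """Poll condition() until it returns True or the timeout expires.

    Returns True as soon as the condition holds and False if it still does
    not hold after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, wait_until(lambda: os.path.isfile('SMILES.docx'), timeout=20) (with os imported) would continue as soon as the file exists and give up after 20 seconds. Steps without such a checkable outcome, like a window that is still opening, would still need time.sleep().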

The remaining calculations are done in the for loop for SMILES_code in SMILES_codes, which basically repeats the steps of the first calculation cycle with the dummy SMILES code. The loop iterates over the given list, copies each SMILES code to the calculation window of BIOWIN and clicks the Calculate button. The calculation result is then copied to the clipboard and pasted into the Word document (which is opened at the start if it already exists but is not open yet). After pasting, pyautogui presses the “Enter” key five times in the Word document to create sufficient space between two calculations and then saves the file. When the whole SMILES_codes list has been processed, the message “I have finished all biodegradation calculations!” is printed to the Python console and shown in a pop-up window on the main screen as well. The Word file with all calculations remains open for immediate review, and it has also been saved after the last calculation results were pasted. Here is the demo video of the bot in action:

Figure 1: Demo video of the bot

The script works fine if all SMILES codes are correct and the mouse positions are given exactly for the individual screen. The user can then start the script and let the bot do all the calculations (thanks to pyautogui in combination with EPI Suite), grab some coffee, and wait for the results to be ready and summarized in a Word document 😊.

LITERATURE AND REFERENCES:

[1] QSAR def.: https://en.wikipedia.org/wiki/Quantitative_structure%E2%80%93activity_relationship

[2] EPI Suite: EPI Suite™-Estimation Program Interface | US EPA

[3] pyautogui documentation: https://pyautogui.readthedocs.io/en/latest/

[4] pyperclip documentation: https://pyperclip.readthedocs.io/en/latest/

[5] SMILES Code Checker/Generator: http://www.cheminfo.org/flavor/malaria/Utilities/SMILES_generator___checker/index.html

Cover picture with the courtesy of www.pixabay.com (author: kiquebg)

