Abstract

Per- and polyfluoroalkyl substances, commonly known as PFAS, are a group of thousands of synthetic chemicals manufactured since the 1940s [1, 2]. Their carbon-fluorine bonds, among the strongest single bonds in organic chemistry, with a bond dissociation energy of approximately 485 kJ/mol [3], make them exceptionally resistant to heat, water, and oil, earning them the nickname “forever chemicals”: once released into the environment, they persist for decades or longer [4]. Regulators worldwide are tightening controls, and anyone working with fluorinated compounds needs reliable tools to determine whether a substance falls under the evolving PFAS definitions.

That makes a fitting backdrop for my latest coding project: an easy-to-use PFAS Screening Tool that checks a list of substances for PFAS and, where it finds one, reports which kind. This article describes the new desktop tool, built in Python, which classifies molecules as EU & OECD PFAS, OECD-only PFAS, or not PFAS. It can screen entire inventories in batch mode, resolves CAS numbers, and produces reports with embedded structures.

Why Two PFAS Definitions?

Two major regulatory frameworks define what counts as PFAS.

The OECD 2021 revised definition [5] classifies a substance as PFAS if it contains at least one fully fluorinated methyl (CF3) or methylene (CF2) carbon atom. “Fully fluorinated” means the carbon is sp3-hybridised and carries no hydrogen, chlorine, bromine, or iodine.

The ECHA restriction proposal [6] adopts the same structural criterion but adds exclusions for isolated, degradable moieties. A CF3 group bonded to oxygen or nitrogen (CF3-OR or CF3-NRR’), or a CF2 group flanked on both sides by {O, N, S} atoms, may qualify as OECD PFAS but not EU PFAS, provided the R groups are non-fluorinated. The distinction matters for regulatory compliance, product registration, and supply chain management.

The Classification Engine

At the heart of the tool is the classification engine, implemented as Section 1 of the single-file application pfas_screening_app.py. It uses a single external dependency: RDKit [7], the open-source cheminformatics toolkit. It accepts a SMILES string and returns a classification via atom-level molecular inspection, iterating through every atom and asking:

  • Is this carbon sp3-hybridised (saturated)?
  • Does it carry zero hydrogen atoms?
  • Does it have at least two fluorine neighbours?
  • Are none of its neighbours hydrogen, chlorine, bromine, or iodine?

A carbon passing all checks is a fully fluorinated carbon. If the molecule contains at least one, it is OECD PFAS.
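The four checks above can be sketched as follows. This is a minimal illustration, not the code from pfas_screening_app.py: the function names is_fully_fluorinated and is_oecd_pfas are hypothetical, and the pure-Python check is kept separate from the RDKit adapter so the rule itself can be read in isolation.

```python
# Symbols that disqualify a carbon from being "fully fluorinated".
BLOCKING = {"H", "Cl", "Br", "I"}


def is_fully_fluorinated(symbol, hybridisation, total_h, neighbour_symbols):
    """Apply the four checks to one atom described by plain values.

    hybridisation is a string such as "SP3" or "SP2"; neighbour_symbols
    lists the element symbols of the directly bonded heavy atoms.
    """
    if symbol != "C" or hybridisation != "SP3":
        return False                      # check 1: saturated carbon
    if total_h != 0:
        return False                      # check 2: no hydrogens
    if neighbour_symbols.count("F") < 2:
        return False                      # check 3: at least two fluorines
    # check 4: no H, Cl, Br, or I among the neighbours
    return not any(s in BLOCKING for s in neighbour_symbols)


def is_oecd_pfas(smiles):
    """Hypothetical RDKit adapter: True if any atom passes all checks."""
    from rdkit import Chem  # imported lazily; RDKit must be installed

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    for atom in mol.GetAtoms():
        neighbours = [n.GetSymbol() for n in atom.GetNeighbors()]
        if is_fully_fluorinated(
            atom.GetSymbol(),
            str(atom.GetHybridization()),   # e.g. "SP3"
            atom.GetTotalNumHs(),
            neighbours,
        ):
            return True
    return False
```

With RDKit installed, is_oecd_pfas("FC(F)(F)C(F)(F)F") would be expected to return True, since both carbons of hexafluoroethane pass every check.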

The EU layer is more nuanced. Every fully fluorinated carbon is evaluated individually: the EU exclusion applies only when each one independently qualifies. A CF3 group qualifies if its non-fluorine neighbour is O or N and the attached R group contains no sp3 C-F bonds and no sp2 carbons bearing two or more fluorines. Aromatic rings are unconditionally accepted as R, per the EU definition’s explicit inclusion of “aromatic groups” [6]. For CF2, both flanking non-fluorine neighbours must be heteroatoms (O, N, or S); direct carbon neighbours do not qualify.
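The neighbour part of these rules can be sketched in a few lines. The function name locally_eligible_for_eu_exclusion is hypothetical, and the sketch deliberately covers only the direct neighbours of one fully fluorinated carbon; the additional requirement that the attached R group is non-fluorinated is not checked here.

```python
CF3_LINKERS = {"O", "N"}        # allowed non-fluorine neighbour of CF3
CF2_LINKERS = {"O", "N", "S"}   # allowed flanking atoms of CF2


def locally_eligible_for_eu_exclusion(neighbour_symbols):
    """Check only the direct neighbours of one fully fluorinated carbon.

    neighbour_symbols holds the element symbols of its four neighbours.
    Returns True if the neighbour pattern could qualify for the EU
    exclusion; the R-group condition must be tested separately.
    """
    non_fluorine = [s for s in neighbour_symbols if s != "F"]
    n_fluorine = len(neighbour_symbols) - len(non_fluorine)

    if n_fluorine == 3 and len(non_fluorine) == 1:      # CF3 group
        return non_fluorine[0] in CF3_LINKERS
    if n_fluorine == 2 and len(non_fluorine) == 2:      # CF2 group
        return all(s in CF2_LINKERS for s in non_fluorine)
    return False                                        # e.g. CF4
```

For Ph-CF2-Ph the neighbour list is ["F", "F", "C", "C"], so the function returns False and the carbon stays EU PFAS, in line with the carbon-neighbour rule described above.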

For molecules with multiple fully fluorinated carbons, such as CF3-O-aryl-CF2-O-CH3, a breadth-first walk with a sibling-shielding test determines whether each carbon’s neighbours should be evaluated independently or together. This prevents both false exemptions and false positives on complex multi-center fluorinated scaffolds, including fluorinated agrochemicals and pharmaceutical intermediates.
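The sibling-shielding test is not described in enough detail here to reproduce, so the sketch below shows only the plain breadth-first walk over an R group, on a toy graph rather than an RDKit molecule. The data layout (atoms as index to (symbol, hybridisation), bonds as an adjacency dict) and the function name r_group_is_non_fluorinated are assumptions made for illustration. Without the shielding step, this naive walk would treat every other fluorinated centre in the molecule as part of R.

```python
from collections import deque


def r_group_is_non_fluorinated(atoms, bonds, start, origin):
    """Breadth-first walk from `start`, never stepping back onto `origin`.

    atoms: {index: (symbol, hybridisation)}, e.g. {0: ("C", "SP3")}
    bonds: {index: [neighbour indices]}
    Returns False as soon as a carbon with an sp3 C-F bond, or an sp2
    carbon bearing two or more fluorines, is reached.
    """
    seen = {origin, start}
    queue = deque([start])
    while queue:
        idx = queue.popleft()
        symbol, hybridisation = atoms[idx]
        if symbol == "C":
            n_f = sum(1 for n in bonds[idx] if atoms[n][0] == "F")
            if hybridisation == "SP3" and n_f >= 1:
                return False
            if hybridisation == "SP2" and n_f >= 2:
                return False
        for n in bonds[idx]:
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return True


# Toy graph for CF3-O-CH3: atom 0 is the CF3 carbon, atom 4 the oxygen.
atoms = {0: ("C", "SP3"), 1: ("F", "SP3"), 2: ("F", "SP3"),
         3: ("F", "SP3"), 4: ("O", "SP3"), 5: ("C", "SP3")}
bonds = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0, 5], 5: [4]}
methoxy_is_clean = r_group_is_non_fluorinated(atoms, bonds, start=4, origin=0)
```

Here the walk starts at the oxygen, visits the methyl carbon, finds no C-F bond, and so reports the R group as non-fluorinated.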

This atom-by-atom approach directly implements the regulatory text. The hybridisation check correctly excludes sp2 carbons: tetrafluoroethylene (the Teflon monomer) has four fluorines but both carbons are sp2-hybridised, so it is correctly classified as not PFAS under the OECD definition [5].

CAS Number Resolution

The tool also accepts CAS numbers, resolved to canonical SMILES via the CAS lookup engine (Section 3 of pfas_screening_app.py), which queries PubChem [8] as its primary source and falls back to the NCI CACTUS resolver if PubChem cannot find an entry. Each lookup uses a single REST request; transient network errors on PubChem trigger up to three retry attempts with exponential backoff. Genuine not-found responses skip retrying and proceed immediately to the CACTUS fallback. The module also handles corporate proxy environments by falling back to an unverified SSL context if needed.
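A lookup flow of this kind might look like the sketch below. It is not the tool's actual code: resolve_cas, http_get_text, and NotFound are hypothetical names, the PubChem property name in the URL and the delay values are assumptions, and falling through to CACTUS after the retries are exhausted is an interpretation rather than something stated above. The fetch and sleep functions are injectable so the logic can be exercised without network access.

```python
import time
import urllib.error
import urllib.parse
import urllib.request

# URL templates for the two public resolvers (property name is an assumption).
PUBCHEM_URL = ("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/"
               "{}/property/CanonicalSMILES/TXT")
CACTUS_URL = "https://cactus.nci.nih.gov/chemical/structure/{}/smiles"


class NotFound(Exception):
    """Raised when a resolver genuinely has no entry (HTTP 404)."""


def http_get_text(url, timeout=10):
    """Fetch a URL as text; map HTTP 404 to NotFound, re-raise the rest."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            raise NotFound(url) from exc
        raise


def _first_line(text):
    lines = text.strip().splitlines()
    return lines[0].strip() if lines else None


def resolve_cas(cas, fetch=http_get_text, sleep=time.sleep,
                retries=3, base_delay=1.0):
    """Return a SMILES string for a CAS number, or None if unresolved."""
    quoted = urllib.parse.quote(cas)
    for attempt in range(retries + 1):
        try:
            smiles = _first_line(fetch(PUBCHEM_URL.format(quoted)))
            if smiles:
                return smiles
            break                          # empty answer: treat as not found
        except NotFound:
            break                          # genuine miss: no retry
        except OSError:                    # network-level, transient errors
            if attempt == retries:
                break
            sleep(base_delay * 2 ** attempt)   # 1 s, 2 s, 4 s
    try:
        return _first_line(fetch(CACTUS_URL.format(quoted)))
    except (NotFound, OSError):
        return None
```

NotFound is deliberately not a subclass of OSError, so a 404 skips the retry branch and goes straight to the fallback.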

The Batch GUI

The batch GUI (PFAS_BatchScreening, Section 4 of pfas_screening_app.py) handles screening of entire inventories. The user selects an input file (plain text, CSV, SMI, or Excel) and clicks Run Screening. Results populate a scrollable list, colour-coded pink (EU & OECD PFAS), blue (OECD-only), neutral (non-PFAS), or red (error). Selecting any entry shows the full classification and a rendered 2D molecular structure.

Input type is selected via radio buttons: Auto-detect (the default), SMILES, or CAS numbers. In auto-detect mode, the tool examines the first ten entries against a CAS regex pattern to choose the appropriate pipeline. Excel files receive additional handling: columns named SMILES, CAS, and ID/NAME/IDENTIFIER are detected automatically. When both SMILES and CAS columns are present, the GUI enters dual mode — rows with SMILES are screened directly, while CAS-only rows are queued for PubChem resolution.
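Auto-detection of this kind can be sketched with a regular expression over the first ten entries. The exact pattern and the decision threshold used by the tool are not stated, so the majority rule and the name detect_input_type below are assumptions.

```python
import re

# CAS Registry Number layout: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"^\d{2,7}-\d{2}-\d$")


def detect_input_type(entries, sample_size=10):
    """Guess "CAS" or "SMILES" from the first few non-empty entries."""
    sample = [e.strip() for e in entries if e.strip()][:sample_size]
    if not sample:
        return "SMILES"                      # nothing to inspect: default
    cas_hits = sum(1 for e in sample if CAS_PATTERN.match(e))
    # Assumed rule: more than half of the sample must look like CAS numbers.
    return "CAS" if cas_hits > len(sample) / 2 else "SMILES"
```

A SMILES string such as FC(F)(F)C(F)(F)F never matches the pattern, so a pure SMILES list is routed to the SMILES pipeline.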

Processing is non-blocking throughout. SMILES classification uses tkinter’s after() method, one molecule per event-loop cycle. CAS-only mode paces lookups at 800 ms intervals to respect PubChem’s rate limit. In dual mode, CAS resolution runs in a background thread; results are pushed to a thread-safe queue and polled by the main thread every 300 ms, so the interface stays responsive as results arrive.
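The dual-mode pattern (worker thread, thread-safe queue, periodic polling) can be sketched without any GUI code. The names start_cas_worker and drain are hypothetical, and the None sentinel that marks the end of the work is an assumption of this sketch.

```python
import queue
import threading


def start_cas_worker(cas_numbers, resolve, results):
    """Resolve CAS numbers in a background thread.

    Each outcome is pushed to `results` as (cas, smiles, error);
    a final None tells the consumer that the worker has finished.
    """
    def work():
        for cas in cas_numbers:
            try:
                results.put((cas, resolve(cas), None))
            except Exception as exc:          # report, do not kill the thread
                results.put((cas, None, str(exc)))
        results.put(None)

    thread = threading.Thread(target=work, daemon=True)
    thread.start()
    return thread


def drain(results):
    """Collect everything currently queued without blocking."""
    items, finished = [], False
    while True:
        try:
            item = results.get_nowait()
        except queue.Empty:
            break
        if item is None:
            finished = True
        else:
            items.append(item)
    return items, finished


# In a tkinter application the main thread would poll roughly like this:
#
#     def poll():
#         items, finished = drain(results)
#         for cas, smiles, error in items:
#             ...  # update the result list
#         if not finished:
#             root.after(300, poll)
```

Only the main thread touches the widgets; the worker communicates solely through the queue, which is what keeps the interface responsive while lookups are pending.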

The Create Report button generates an Excel file with embedded 2D structure images. For Excel input, all original columns are preserved alongside SMILES, Classification, and the structure image. For text/CSV/SMI input, the report contains ID, optionally CAS, SMILES, Classification, and the structure image. Temporary PNG files are cleaned up after the report is written.

Design Decisions Worth Noting

Single-file deployment. All logic — the classification engine, tooltip helper, CAS lookup, and GUI — is consolidated into pfas_screening_app.py. There are no runtime imports from sibling modules. The file can be bundled into a standalone Windows executable using PyInstaller (pyinstaller pfas_screening_app.spec) without any packaging adjustments.

Per-atom EU exemption evaluation. Each fully fluorinated carbon is assessed independently against the EU exemption criteria [6]. This faithfully implements the regulatory text and correctly handles complex scaffolds that a simple count-based rule would misclassify.
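Once every fully fluorinated carbon has been judged individually, combining the verdicts is straightforward. A minimal sketch, with the hypothetical name classify_from_flags and one boolean per fully fluorinated carbon:

```python
def classify_from_flags(exempt_flags):
    """Combine per-carbon EU exemption verdicts into one label.

    exempt_flags holds one boolean per fully fluorinated carbon:
    True means that carbon qualifies for the EU exclusion.
    """
    if not exempt_flags:
        return "Not PFAS"            # no fully fluorinated carbon at all
    if all(exempt_flags):
        return "OECD-only PFAS"      # every carbon is individually exempt
    return "EU & OECD PFAS"          # at least one carbon is not exempt
```

A count-based shortcut such as "exactly one CF3-O group" would miss scaffolds with several exempt groups, which is why the flags are evaluated with all() rather than counted.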

Conservative CF2 flanking rule. Only heteroatom flanking groups (O, N, S) qualify for the CF2 exemption — carbon neighbours do not. Ph-CF2-Ph is therefore EU PFAS even though both flanking groups are sp2. This avoids underclassifying diaryl-CF2 compounds that lack the degradable character the EU exclusion is intended to capture.

Non-blocking architecture. Single-mode CAS resolution and SMILES classification both use the event loop. Dual-mode CAS resolution offloads network I/O to a background thread with queue-based polling, so slow PubChem responses for one entry do not block display of results already available.

Testing and Validation

The classification engine includes 23 built-in test cases covering:

  • CF3 bonded to sp3 carbon (EU & OECD PFAS)
  • CF3-O-CH3 (OECD-only)
  • CF3 with Br or Cl (not PFAS) [5]
  • CF3-O-CF2-CF3 (EU & OECD, R carries CF2)
  • CF3-O-CHF2 (EU & OECD)
  • CF3-O-phenyl and CF3-OH (both OECD-only)
  • Ph-CF2-Ph (EU & OECD, flanking C atoms)
  • CH3-O-CF2-O-CH3 (OECD-only)
  • CF3-O-CF2-O-CH3 (EU & OECD)
  • complex fluorinated agrochemical scaffolds with multiple CF3-O-aryl and CF2-O-aryl groups (OECD-only, every fully fluorinated carbon individually exempt [6])
  • multi-centre fluorinated chains (EU & OECD PFAS)

These can be run by executing the standalone classification engine section directly.
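Several of the cases named above translate directly into a table-driven check. The SMILES strings below are my own transcriptions of the structures (for instance, "CF3 with Br" is read as CF3Br), not copies of the built-in tests, and run_cases is a hypothetical helper that accepts any classifier function mapping a SMILES string to a label.

```python
# (SMILES, expected label); SMILES are illustrative transcriptions.
CASES = [
    ("COC(F)(F)F", "OECD-only PFAS"),                 # CF3-O-CH3
    ("FC(F)(F)Br", "Not PFAS"),                       # CF3 with Br
    ("FC(F)(F)OC(F)(F)C(F)(F)F", "EU & OECD PFAS"),   # CF3-O-CF2-CF3
    ("FC(F)(F)OC(F)F", "EU & OECD PFAS"),             # CF3-O-CHF2
    ("FC(F)(F)Oc1ccccc1", "OECD-only PFAS"),          # CF3-O-phenyl
    ("OC(F)(F)F", "OECD-only PFAS"),                  # CF3-OH
    ("FC(F)(c1ccccc1)c1ccccc1", "EU & OECD PFAS"),    # Ph-CF2-Ph
    ("COC(F)(F)OC", "OECD-only PFAS"),                # CH3-O-CF2-O-CH3
    ("FC(F)(F)OC(F)(F)OC", "EU & OECD PFAS"),         # CF3-O-CF2-O-CH3
]


def run_cases(classify, cases=CASES):
    """Return a list of (smiles, expected, actual) for every mismatch."""
    failures = []
    for smiles, expected in cases:
        actual = classify(smiles)
        if actual != expected:
            failures.append((smiles, expected, actual))
    return failures
```

An empty return value means the classifier under test agrees with every expected label in the table.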

Looking Ahead

For desktop and enterprise use, the tool correctly implements both the OECD 2021 [5] and EU ECHA [6] PFAS definitions via atom-level inspection, supports SMILES and CAS input through a responsive non-blocking GUI, handles Excel inventories with mixed SMILES and CAS columns, and produces structured Excel reports with embedded structures. The consolidated single-file architecture makes it straightforward to deploy as a standalone Windows executable. For teams managing chemical inventories under tightening regulations, it provides a fast, transparent, and auditable first screen.


Try it yourself in Python

Download the PFAS Screening Tool Python script from its Bitbucket repository [9].

Prerequisites

Python Packages

Package        Purpose
rdkit          Core cheminformatics: SMILES parsing and molecular inspection
Pillow (PIL)   Loading rendered molecule images into the GUI
pandas         Reading Excel input files in batch mode
openpyxl       Excel report generation with embedded structure images
XlsxWriter     Excel report generation in single-molecule mode

Standard Library (no install needed)

  • tkinter — GUI framework, ships with most Python distributions

Installation

There is no requirements.txt, so install manually:

pip install rdkit Pillow pandas openpyxl XlsxWriter

Note: rdkit is best installed via conda if you run into issues with pip:

conda install -c conda-forge rdkit
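Before launching the tool, it can help to confirm that all packages are importable. The following check is a small convenience sketch, not part of the tool; missing_packages is a hypothetical name, and the mapping pairs each pip package name with the module name it is imported under.

```python
import importlib.util

# pip package name -> module name used in import statements
REQUIRED = {
    "rdkit": "rdkit",
    "Pillow": "PIL",
    "pandas": "pandas",
    "openpyxl": "openpyxl",
    "XlsxWriter": "xlsxwriter",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages that cannot be imported."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required packages are installed.")
```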

Other Requirements

  • Python 3.x (RDKit requires Python 3)
  • Internet access — only needed in batch mode when resolving CAS numbers via the PubChem API
  • Corporate proxy / SSL: if behind a corporate firewall, the CAS lookup section of pfas_screening_app.py attempts to handle SSL certificate issues automatically via create_ssl_context()

Performing the PFAS Screening

When the script is executed, you will see the following GUI:

Picture 1: GUI of the PFAS Screening Script

Follow these steps to perform the screening:

Prepare an input file. Supported formats: .txt, .csv, .smi, or Excel (.xlsx/.xls).

Each row in the input file should contain one SMILES code or one CAS number.

Click Load File and select your file.

Input type is auto-detected (SMILES vs. CAS). You can override this with the radio buttons if needed.

Click Start Screening.

CAS mode: the tool first resolves each CAS number to a SMILES via the PubChem API (~800 ms per entry). Progress is shown while it runs.

SMILES mode: classification runs immediately.

Input Format Tips

Input type     Example
SMILES         FC(F)(F)C(F)(F)F
CAS number     335-67-1 (PFOA)

For batch files, one entry per line is sufficient. Column headers are optional.
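Reading such a file can be sketched as follows. How the tool itself parses text input is not described, so this is an assumption-laden illustration: read_entries is a hypothetical name, the entry is assumed to sit in the first column, and a first line is treated as a header only if it matches one of the column names mentioned for Excel input.

```python
import re

HEADER_NAMES = {"smiles", "cas", "id", "name", "identifier"}


def read_entries(text, header_names=HEADER_NAMES):
    """Extract one entry per non-empty line from plain text, CSV, or SMI.

    Neither SMILES strings nor CAS numbers contain commas or whitespace,
    so the first comma- or whitespace-separated token is taken as the entry.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                               # skip blank lines
        entries.append(re.split(r"[,\s]+", line)[0])
    if entries and entries[0].lower() in header_names:
        entries = entries[1:]                      # drop optional header
    return entries
```

For example, a file with the lines "SMILES", "FC(F)(F)C(F)(F)F", and "COC(F)(F)F" yields the two SMILES strings and discards the header.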

Interpreting Results

The classification is based on atom-level inspection of fully fluorinated carbon groups (CF₂, CF₃ with no H, Cl, Br, or I attached):

  • EU & OECD PFAS — contains at least one such group, and it is not an isolated degradable moiety under EU rules.
  • OECD-only PFAS — contains such a group, but the EU exclusion applies (e.g. a single CF₃ connected via O or N).
  • Not PFAS — no qualifying fully fluorinated carbon found.

Results appear in the list on the left, color-coded by category (see Picture 2).

Picture 2: Screening result display view 1

Click any entry to see the classification detail and render its structure on the right (see Picture 3).

Picture 3: Screening result display view 2

Click Export Report to save an Excel file with all results and embedded structure images. The resulting Excel report is shown in Picture 4.

Picture 4: Excel report file from screening result

To run another screening with a new list of CAS numbers or SMILES, click Clear to remove the current results. The Excel report lets you analyse and review the screening results separately from the tool. Keep the input format tips above in mind when preparing the next list.



Literature & Weblinks

[1] Gaines, L.G.T. (2023). Historical and current usage of per- and polyfluoroalkyl substances (PFAS): A literature review. American Journal of Industrial Medicine, 66(5), 353-378. https://doi.org/10.1002/ajim.23362

[2] Glüge, J., Scheringer, M., Cousins, I.T., et al. (2020). An overview of the uses of per- and polyfluoroalkyl substances (PFAS). Environmental Science: Processes & Impacts, 22(12), 2345-2373. https://doi.org/10.1039/D0EM00291G

[3] Garg, N.K. et al. (2024). C-F Bond Insertion: An Emerging Strategy for Constructing Fluorinated Molecules. Chemistry — A European Journal, 30(9). https://doi.org/10.1002/chem.202304229

[4] Cousins, I.T., DeWitt, J.C., Glüge, J., et al. (2023). Forever chemicals: the persistent effects of perfluoroalkyl and polyfluoroalkyl substances on human health. eBioMedicine, 95, 104806. https://doi.org/10.1016/j.ebiom.2023.104806

[5] OECD (2021). Reconciling Terminology of the Universe of Per- and Polyfluoroalkyl Substances: Recommendations and Practical Guidance. ENV/CBC/MONO(2021)25. https://one.oecd.org/document/ENV/CBC/MONO(2021)25/En/pdf

[6] ECHA (2023). ANNEX XV Restriction Report — Per- and polyfluoroalkyl substances (PFASs). Submitted by Denmark, Germany, the Netherlands, Norway, and Sweden under REACH. European Chemicals Agency, Helsinki. https://echa.europa.eu/documents/10162/f605d4b5-7c17-7414-8823-b49b9fd43aea

[7] Landrum, G. RDKit: Open-Source Cheminformatics Software. https://www.rdkit.org (RRID: SCR_014274)

[8] Kim, S., Chen, J., Cheng, T., et al. (2023). PubChem 2023 update. Nucleic Acids Research, 51(D1), D1373-D1380. https://doi.org/10.1093/nar/gkac956

[9] Source code available on Bitbucket at https://bitbucket.org/nomapps/pfas_screening_tool

[Image] Cover picture generated with GPT Image 1 mini