deepretro.utils.utils_molecule

Molecule utilities for SMILES validation, substructure matching, molecular properties, and ring detection.

Overview

The utils_molecule module provides chemistry-focused helpers used throughout the retrosynthesis pipeline:

  • SMILES validation — Check validity and compare molecules

  • Substructure matching — Query whether one molecule is a substructure of another

  • Molecular properties — Weight, formula, fingerprints

  • Validity checks — Filter LLM-proposed pathways for chemical validity and reject target-matching fragments

  • Ring detection — Detect 7- and 8-member rings in molecules

Function Overview

Function                   Purpose
is_valid_smiles            Check if a SMILES string parses to a valid molecule.
substructure_matching      Return 1 if query is a substructure of target, 0 otherwise.
are_molecules_same         Compare two SMILES (canonical form or fingerprint).
validity_check             Filter LLM pathways: keep valid precursors, drop same-as-target or substructures.
calc_mol_wt                Molecular weight from SMILES (returns 0.0 on invalid input).
calc_chemical_formula      Molecular formula from SMILES (returns "N/A" on invalid input).
compute_fingerprint        Morgan fingerprint as a bit vector list.
detect_seven_member_rings  True if molecule contains a 7-member ring.
detect_eight_member_rings  True if molecule contains an 8-member ring.

Usage

from deepretro.utils.utils_molecule import (
    is_valid_smiles,
    substructure_matching,
    validity_check,
    calc_mol_wt,
    calc_chemical_formula,
    detect_seven_member_rings,
)

# Validate SMILES
assert is_valid_smiles("CCO") is True
assert is_valid_smiles("invalid!!!") is False

# Substructure check (benzene in ethylbenzene)
assert substructure_matching("CCc1ccccc1", "c1ccccc1") == 1

# Filter LLM pathways
pathways, explanations, confidence = validity_check(
    molecule="c1ccccc1",
    res_molecules=[["CC(=O)O", "c1ccccc1O"]],
    res_explanations=["ester hydrolysis"],
    res_confidence=[0.8],
)

# Molecular properties
assert calc_mol_wt("CCO") > 0
assert calc_chemical_formula("C") == "CH4"

# Ring detection
assert detect_seven_member_rings("C1CCCCCC1") is True
assert detect_seven_member_rings("C1CCCCC1") is False

API

Molecule helpers for validation, filtering, and simple descriptors.

deepretro.utils.utils_molecule.is_valid_smiles(smiles)

Check whether a SMILES string can be parsed successfully.

Parameters:

smiles (str) – SMILES string to validate.

Returns:

True when the SMILES string parses to an RDKit molecule, otherwise False.

Return type:

bool

Examples

>>> is_valid_smiles("CCO")
True
>>> is_valid_smiles("not_a_smiles")
False

deepretro.utils.utils_molecule.substructure_matching(target_smiles, query_smiles)

Check whether a query molecule is a substructure of a target molecule.

Parameters:
  • target_smiles (str) – SMILES string of the target molecule.

  • query_smiles (str) – SMILES string of the query molecule.

Returns:

1 if the query is a substructure of the target, otherwise 0.

Return type:

int

Examples

>>> substructure_matching("CCc1ccccc1", "c1ccccc1")
1
>>> substructure_matching("CCO", "c1ccccc1")
0

deepretro.utils.utils_molecule.validity_check(molecule, res_molecules, res_explanations, res_confidence)

Filter proposed retrosynthesis pathways down to valid precursor sets.

Parameters:
  • molecule (str) – Target molecule for retrosynthesis.

  • res_molecules (Sequence[Sequence[str] | str]) – Candidate precursor pathways returned by the model.

  • res_explanations (Sequence[str]) – Explanation for each candidate pathway.

  • res_confidence (Sequence[float]) – Confidence score for each candidate pathway.

Returns:

Valid precursor pathways, explanations, and confidence scores. A pathway is kept only when every precursor is valid, is not identical to the target molecule, and is not a substructure of the target.

Return type:

tuple[list[list[str]], list[str], list[float]]

Examples

>>> original_logger = validity_check.__globals__["logger"]
>>> class _SilentLogger:
...     def info(self, *args, **kwargs):
...         pass
...
...     def warning(self, *args, **kwargs):
...         pass
>>> validity_check.__globals__["logger"] = _SilentLogger()
>>> validity_check(
...     molecule="c1ccccc1",
...     res_molecules=[["CCO", "CCCl"]],
...     res_explanations=["valid pathway"],
...     res_confidence=[0.9],
... )
([['CCO', 'CCCl']], ['valid pathway'], [0.9])
>>> validity_check.__globals__["logger"] = original_logger
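
The keep rule described under Returns can be sketched in plain Python. This is only an illustration of the rule, not the module's implementation: `filter_pathways` and its three predicate parameters are hypothetical names, the treatment of a bare string as a one-precursor pathway is an assumption, and the toy lookup tables stand in for `is_valid_smiles`, `are_molecules_same`, and `substructure_matching` so that the snippet runs without RDKit.

```python
from typing import Callable, Sequence, Union


def filter_pathways(
    target: str,
    pathways: Sequence[Union[Sequence[str], str]],
    explanations: Sequence[str],
    confidences: Sequence[float],
    is_valid: Callable[[str], bool],
    is_same: Callable[[str, str], bool],
    is_substructure: Callable[[str, str], bool],
) -> tuple[list[list[str]], list[str], list[float]]:
    """Keep a pathway only if every precursor passes all three checks."""
    kept_pathways: list[list[str]] = []
    kept_explanations: list[str] = []
    kept_confidences: list[float] = []
    for pathway, explanation, confidence in zip(pathways, explanations, confidences):
        # Assumption: a bare string is treated as a single-precursor pathway.
        precursors = [pathway] if isinstance(pathway, str) else list(pathway)
        keep = all(
            # Validity is tested first so the other checks never see bad input.
            is_valid(p) and not is_same(p, target) and not is_substructure(target, p)
            for p in precursors
        )
        if keep:
            kept_pathways.append(precursors)
            kept_explanations.append(explanation)
            kept_confidences.append(confidence)
    return kept_pathways, kept_explanations, kept_confidences


# Toy stand-ins for the real chemistry checks (hypothetical data).
VALID = {"CCO", "CCCl", "c1ccccc1", "CCc1ccccc1"}
SUBSTRUCTURES = {("CCc1ccccc1", "c1ccccc1")}  # (target, query) pairs

kept = filter_pathways(
    target="CCc1ccccc1",
    pathways=[["CCO", "CCCl"], ["c1ccccc1", "CCO"], "???"],
    explanations=["valid pathway", "contains a substructure", "unparseable"],
    confidences=[0.9, 0.5, 0.1],
    is_valid=lambda s: s in VALID,
    is_same=lambda a, b: a == b,
    is_substructure=lambda tgt, query: (tgt, query) in SUBSTRUCTURES,
)
print(kept)  # ([['CCO', 'CCCl']], ['valid pathway'], [0.9])
```

Passing the checks in as callables keeps the sketch testable; with the real module, the three predicates would presumably be the functions documented on this page.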

deepretro.utils.utils_molecule.calc_mol_wt(mol)

Calculate the exact (monoisotopic) molecular weight of the molecule described by a SMILES string.

Parameters:

mol (str) – SMILES string of the molecule.

Returns:

Exact molecular weight. Returns 0.0 for invalid SMILES.

Return type:

float

Examples

>>> round(calc_mol_wt("CCO"), 3)
46.042
>>> round(calc_mol_wt("C"), 3)
16.031
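
The example outputs are consistent with exact (monoisotopic) masses rather than average molecular weights, which is why ethanol comes out as 46.042 instead of the more familiar 46.07. The arithmetic can be checked by hand from the mass of the most abundant isotope of each element. The helper below is a hypothetical illustration that works on an explicit atom count; it is not the module's implementation, which takes SMILES input.

```python
# Masses (in daltons) of the most abundant isotopes: 12C, 1H, 16O.
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}


def exact_mass(atom_counts: dict[str, int]) -> float:
    """Sum the monoisotopic masses for a mapping of element symbol to count."""
    return sum(MONOISOTOPIC_MASS[element] * count for element, count in atom_counts.items())


ethanol = exact_mass({"C": 2, "H": 6, "O": 1})  # C2H6O, SMILES "CCO"
methane = exact_mass({"C": 1, "H": 4})          # CH4, SMILES "C"

print(round(ethanol, 3))  # 46.042
print(round(methane, 3))  # 16.031
```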

deepretro.utils.utils_molecule.calc_chemical_formula(mol)

Calculate the molecular formula for a SMILES string.

Parameters:

mol (str) – SMILES string of the molecule.

Returns:

Molecular formula. Returns "N/A" for invalid SMILES.

Return type:

str

Examples

>>> calc_chemical_formula("C")
'CH4'
>>> calc_chemical_formula("CCO")
'C2H6O'

deepretro.utils.utils_molecule.are_molecules_same(smiles1, smiles2)

Check whether two SMILES strings describe the same molecule.

Parameters:
  • smiles1 (str) – SMILES string of the first molecule.

  • smiles2 (str) – SMILES string of the second molecule.

Returns:

True when the molecules are equivalent, otherwise False.

Return type:

bool

Raises:

ValueError – If either SMILES string is invalid.

Examples

>>> are_molecules_same("CCO", "OCC")
True
>>> are_molecules_same("CCO", "c1ccccc1")
False

deepretro.utils.utils_molecule.compute_fingerprint(smiles, radius=2, nBits=2048)

Compute a Morgan fingerprint for a molecule.

Parameters:
  • smiles (str) – SMILES string of the molecule.

  • radius (int, optional) – Fingerprint radius, by default 2.

  • nBits (int, optional) – Number of bits in the fingerprint, by default 2048.

Returns:

Fingerprint bit vector as integers, or None when the SMILES is invalid.

Return type:

list[int] | None

Examples

>>> fingerprint = compute_fingerprint("CCO", radius=2, nBits=16)
>>> len(fingerprint)
16
>>> compute_fingerprint("not_a_smiles") is None
True
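
Because the fingerprint is returned as a plain list of integers, two fingerprints can be compared without further RDKit calls. One common choice is the Tanimoto coefficient, sketched below. The `tanimoto` helper is hypothetical and not part of this module, the bit vectors are hand-written toy data rather than real fingerprints, and returning 0.0 when both vectors are empty is a convention chosen here for illustration.

```python
def tanimoto(fp_a: list[int], fp_b: list[int]) -> float:
    """Tanimoto similarity: bits set in both divided by bits set in either."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    # Assumption: two all-zero fingerprints are scored as 0.0.
    return both / either if either else 0.0


fp_1 = [1, 1, 0, 1, 0, 0, 0, 0]
fp_2 = [1, 0, 0, 1, 1, 0, 0, 0]
similarity = tanimoto(fp_1, fp_2)
print(similarity)  # 0.5 (2 shared bits out of 4 set in either vector)
```

Remember to handle the None return for invalid SMILES before passing fingerprints to a helper like this.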

deepretro.utils.utils_molecule.detect_seven_member_rings(smiles)

Detect whether a molecule contains a seven-membered ring.

Parameters:

smiles (str) – SMILES string of the molecule.

Returns:

True when at least one seven-membered ring is present.

Return type:

bool

Raises:

ValueError – If the SMILES string is invalid.

Examples

>>> detect_seven_member_rings("C1CCCCCC1")
True
>>> detect_seven_member_rings("C1CCCCC1")
False

deepretro.utils.utils_molecule.detect_eight_member_rings(smiles)

Detect whether a molecule contains an eight-membered ring.

Parameters:

smiles (str) – SMILES string of the molecule.

Returns:

True when at least one eight-membered ring is present.

Return type:

bool

Raises:

ValueError – If the SMILES string is invalid.

Examples

>>> detect_eight_member_rings("C1CCCCCCC1")
True
>>> detect_eight_member_rings("C1CCCCCC1")
False
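
The functions on this page signal invalid SMILES in different ways: calc_mol_wt returns 0.0, calc_chemical_formula returns "N/A", compute_fingerprint returns None, while are_molecules_same and the two ring detectors raise ValueError. A caller who wants uniform handling might wrap the raising functions as sketched below. `or_default` is a hypothetical helper, and `fake_ring_detector` is a toy stand-in for detect_seven_member_rings so that the snippet runs without deepretro or RDKit installed.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def or_default(func: Callable[..., T], *args: object, default: T) -> T:
    """Call func(*args) and return default if it raises ValueError."""
    try:
        return func(*args)
    except ValueError:
        return default


def fake_ring_detector(smiles: str) -> bool:
    # Toy stand-in: rejects one known-bad string and recognises cycloheptane only.
    if smiles == "not_a_smiles":
        raise ValueError(f"Invalid SMILES: {smiles}")
    return smiles == "C1CCCCCC1"


has_ring = or_default(fake_ring_detector, "C1CCCCCC1", default=False)
bad_input = or_default(fake_ring_detector, "not_a_smiles", default=False)
print(has_ring, bad_input)  # True False
```

Catching only ValueError keeps unrelated failures visible instead of silently mapping them to the default.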