deepretro.utils.utils_molecule
Molecule utilities for SMILES validation, substructure matching, molecular properties, and ring detection.
Overview
The utils_molecule module provides chemistry-focused helpers used throughout the retrosynthesis pipeline:
- SMILES validation: check validity and compare molecules
- Substructure matching: query whether one molecule is a substructure of another
- Molecular properties: weight, formula, fingerprints
- Validity checks: filter LLM-proposed pathways for chemical validity and reject target-matching fragments
- Ring detection: detect 7- and 8-membered rings in molecules
Function Overview
| Function | Purpose |
|---|---|
| is_valid_smiles | Check if a SMILES string parses to a valid molecule. |
| substructure_matching | Return 1 if query is a substructure of target, 0 otherwise. |
| are_molecules_same | Compare two SMILES (canonical form or fingerprint). |
| validity_check | Filter LLM pathways: keep valid precursors, drop same-as-target or substructures. |
| calc_mol_wt | Molecular weight from SMILES (returns 0.0 on invalid input). |
| calc_chemical_formula | Molecular formula from SMILES (returns "N/A" on invalid input). |
| compute_fingerprint | Morgan fingerprint as a bit vector list. |
| detect_seven_member_rings | True if molecule contains a 7-member ring. |
| detect_eight_member_rings | True if molecule contains an 8-member ring. |
Usage
from deepretro.utils.utils_molecule import (
    is_valid_smiles,
    substructure_matching,
    validity_check,
    calc_mol_wt,
    calc_chemical_formula,
    detect_seven_member_rings,
)

# Validate SMILES
assert is_valid_smiles("CCO") is True
assert is_valid_smiles("invalid!!!") is False

# Substructure check (benzene in ethylbenzene)
assert substructure_matching("CCc1ccccc1", "c1ccccc1") == 1

# Filter LLM pathways
pathways, explanations, confidence = validity_check(
    molecule="c1ccccc1",
    res_molecules=[["CC(=O)O", "c1ccccc1O"]],
    res_explanations=["ester hydrolysis"],
    res_confidence=[0.8],
)

# Molecular properties
assert calc_mol_wt("CCO") > 0
assert calc_chemical_formula("C") == "CH4"

# Ring detection
assert detect_seven_member_rings("C1CCCCCC1") is True
assert detect_seven_member_rings("C1CCCCC1") is False
API
Molecule helpers for validation, filtering, and simple descriptors.
- deepretro.utils.utils_molecule.is_valid_smiles(smiles)[source]
Check whether a SMILES string can be parsed successfully.
- Parameters:
smiles (str) – SMILES string to validate.
- Returns:
True when the SMILES string parses to an RDKit molecule, otherwise False.
- Return type:
bool
Examples
>>> is_valid_smiles("CCO")
True
>>> is_valid_smiles("not_a_smiles")
False
- deepretro.utils.utils_molecule.substructure_matching(target_smiles, query_smiles)[source]
Check whether a query molecule is a substructure of a target molecule.
- Parameters:
target_smiles (str) – SMILES string of the target molecule.
query_smiles (str) – SMILES string of the query molecule.
- Returns:
1 if the query is a substructure of the target, otherwise 0.
- Return type:
int
Examples
>>> substructure_matching("CCc1ccccc1", "c1ccccc1")
1
>>> substructure_matching("CCO", "c1ccccc1")
0
- deepretro.utils.utils_molecule.validity_check(molecule, res_molecules, res_explanations, res_confidence)[source]
Filter proposed retrosynthesis pathways down to valid precursor sets.
- Parameters:
molecule (str) – Target molecule for retrosynthesis.
res_molecules (Sequence[Sequence[str] | str]) – Candidate precursor pathways returned by the model.
res_explanations (Sequence[str]) – Explanation for each candidate pathway.
res_confidence (Sequence[float]) – Confidence score for each candidate pathway.
- Returns:
Valid precursor pathways, explanations, and confidence scores. A pathway is kept only when every precursor is valid, is not identical to the target molecule, and is not a substructure of the target.
- Return type:
tuple[list[list[str]], list[str], list[float]]
Examples
>>> original_logger = validity_check.__globals__["logger"]
>>> class _SilentLogger:
...     def info(self, *args, **kwargs):
...         pass
...
...     def warning(self, *args, **kwargs):
...         pass
>>> validity_check.__globals__["logger"] = _SilentLogger()
>>> validity_check(
...     molecule="c1ccccc1",
...     res_molecules=[["CCO", "CCCl"]],
...     res_explanations=["valid pathway"],
...     res_confidence=[0.9],
... )
([['CCO', 'CCCl']], ['valid pathway'], [0.9])
>>> validity_check.__globals__["logger"] = original_logger
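The keep/drop rule described under Returns can be illustrated with a small pure-Python sketch. It is not the library's implementation: filter_pathways and its three predicate parameters are hypothetical names introduced here so the rule can be shown without RDKit. In practice the predicates would presumably be backed by is_valid_smiles, are_molecules_same, and substructure_matching.

```python
from __future__ import annotations

from typing import Callable, Sequence, Union

# A pathway may be a list of precursor SMILES or a single SMILES string,
# mirroring the documented type Sequence[Sequence[str] | str].
Pathway = Union[Sequence[str], str]


def filter_pathways(
    target: str,
    pathways: Sequence[Pathway],
    explanations: Sequence[str],
    confidences: Sequence[float],
    is_valid: Callable[[str], bool],
    is_same: Callable[[str, str], bool],
    is_substructure: Callable[[str, str], bool],
) -> tuple[list[list[str]], list[str], list[float]]:
    """Hypothetical sketch of the documented rule, with injected predicates."""
    kept_pathways: list[list[str]] = []
    kept_explanations: list[str] = []
    kept_confidences: list[float] = []
    for pathway, explanation, confidence in zip(pathways, explanations, confidences):
        # Normalise a bare string into a one-element precursor list.
        precursors = [pathway] if isinstance(pathway, str) else list(pathway)
        # Validity is tested first, so the comparison predicates never see
        # a SMILES string that failed to parse.
        keep = all(
            is_valid(p) and not is_same(target, p) and not is_substructure(target, p)
            for p in precursors
        )
        if keep:
            kept_pathways.append(precursors)
            kept_explanations.append(explanation)
            kept_confidences.append(confidence)
    return kept_pathways, kept_explanations, kept_confidences
```

Because a pathway is rejected as soon as one precursor fails, its explanation and confidence score are dropped with it, which keeps the three returned lists aligned by index.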
- deepretro.utils.utils_molecule.calc_mol_wt(mol)[source]
Calculate the exact molecular weight for a SMILES string.
- Parameters:
mol (str) – SMILES string of the molecule.
- Returns:
Exact molecular weight. Returns 0.0 for invalid SMILES.
- Return type:
float
Examples
>>> round(calc_mol_wt("CCO"), 3)
46.042
>>> round(calc_mol_wt("C"), 3)
16.031
- deepretro.utils.utils_molecule.calc_chemical_formula(mol)[source]
Calculate the molecular formula for a SMILES string.
- Parameters:
mol (str) – SMILES string of the molecule.
- Returns:
Molecular formula. Returns "N/A" for invalid SMILES.
- Return type:
str
Examples
>>> calc_chemical_formula("C")
'CH4'
>>> calc_chemical_formula("CCO")
'C2H6O'
- deepretro.utils.utils_molecule.are_molecules_same(smiles1, smiles2)[source]
Check whether two SMILES strings describe the same molecule.
- Parameters:
smiles1 (str) – SMILES string of the first molecule.
smiles2 (str) – SMILES string of the second molecule.
- Returns:
True when the molecules are equivalent, otherwise False.
- Return type:
bool
- Raises:
ValueError – If either SMILES string is invalid.
Examples
>>> are_molecules_same("CCO", "OCC")
True
>>> are_molecules_same("CCO", "c1ccccc1")
False
- deepretro.utils.utils_molecule.compute_fingerprint(smiles, radius=2, nBits=2048)[source]
Compute a Morgan fingerprint for a molecule.
- Parameters:
smiles (str) – SMILES string of the molecule.
radius (int, optional) – Fingerprint radius, by default 2.
nBits (int, optional) – Number of bits in the fingerprint, by default 2048.
- Returns:
Fingerprint bit vector as integers, or None when the SMILES is invalid.
- Return type:
list[int] | None
Examples
>>> fingerprint = compute_fingerprint("CCO", radius=2, nBits=16)
>>> len(fingerprint)
16
>>> compute_fingerprint("not_a_smiles") is None
True
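Since the return value is a plain list of integers (or None), two fingerprints can be compared without further RDKit calls. The sketch below computes a Tanimoto coefficient over two such bit lists. The tanimoto helper is hypothetical and not part of this module, and whether are_molecules_same uses this measure internally is not stated here. Returning 0.0 for two all-zero vectors is a convention chosen for this sketch only.

```python
from typing import Optional, Sequence


def tanimoto(
    bits_a: Optional[Sequence[int]],
    bits_b: Optional[Sequence[int]],
) -> Optional[float]:
    """Tanimoto coefficient of two equal-length 0/1 lists (hypothetical helper)."""
    # Propagate the "invalid SMILES" signal used by compute_fingerprint.
    if bits_a is None or bits_b is None:
        return None
    if len(bits_a) != len(bits_b):
        raise ValueError("fingerprints must have the same length")
    # Bits set in both vectors, and bits set in at least one vector.
    both = sum(1 for a, b in zip(bits_a, bits_b) if a and b)
    either = sum(1 for a, b in zip(bits_a, bits_b) if a or b)
    # Sketch convention: two empty fingerprints score 0.0.
    return both / either if either else 0.0
```

Both fingerprints should be generated with the same radius and nBits. Otherwise the bit positions are not comparable and the length check above may raise.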
- deepretro.utils.utils_molecule.detect_seven_member_rings(smiles)[source]
Detect whether a molecule contains a seven-membered ring.
- Parameters:
smiles (str) – SMILES string of the molecule.
- Returns:
True when at least one seven-membered ring is present.
- Return type:
bool
- Raises:
ValueError – If the SMILES string is invalid.
Examples
>>> detect_seven_member_rings("C1CCCCCC1")
True
>>> detect_seven_member_rings("C1CCCCC1")
False
- deepretro.utils.utils_molecule.detect_eight_member_rings(smiles)[source]
Detect whether a molecule contains an eight-membered ring.
- Parameters:
smiles (str) – SMILES string of the molecule.
- Returns:
True when at least one eight-membered ring is present.
- Return type:
bool
- Raises:
ValueError – If the SMILES string is invalid.
Examples
>>> detect_eight_member_rings("C1CCCCCCC1")
True
>>> detect_eight_member_rings("C1CCCCCCC1") is detect_eight_member_rings("C1CCCCCCC1")
True
>>> detect_eight_member_rings("C1CCCCCC1")
False