deepretro.utils.utils_molecule

Molecule utilities for SMILES validation, substructure matching, molecular properties, and ring detection.

Overview

The utils_molecule module provides chemistry-focused helpers used throughout the retrosynthesis pipeline:

  • SMILES validation — Check validity and compare molecules

  • Substructure matching — Query whether one molecule is a substructure of another

  • Molecular properties — Weight, formula, fingerprints

  • Validity checks — Filter LLM-proposed pathways for chemical validity and reject target-matching fragments

  • Ring detection — Detect 7- and 8-member rings in molecules
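The validity-check step amounts to filtering three parallel lists (pathways, explanations, confidence scores) in lockstep. Below is a minimal stdlib-only sketch of that pattern, not the module's implementation. `filter_pathways`, `is_valid` and `same_molecule` are hypothetical names. The chemistry checks are passed in as callables so the example runs without a cheminformatics toolkit. The substructure-based rejection is left out because this page does not say which direction of match triggers it.

```python
from typing import Callable, List, Sequence, Tuple


# Hypothetical helper illustrating the filtering pattern; not part of deepretro.
def filter_pathways(
    target: str,
    pathways: Sequence[Sequence[str]],
    explanations: Sequence[str],
    confidences: Sequence[float],
    is_valid: Callable[[str], bool],
    same_molecule: Callable[[str, str], bool],
) -> Tuple[List[List[str]], List[str], List[float]]:
    """Keep a pathway only if every precursor is valid and none is the target.

    The three inputs are parallel lists (index i describes one pathway),
    so they are filtered together to stay aligned.
    """
    if not (len(pathways) == len(explanations) == len(confidences)):
        raise ValueError("pathways, explanations and confidences must align")
    kept_paths: List[List[str]] = []
    kept_expl: List[str] = []
    kept_conf: List[float] = []
    for precursors, expl, conf in zip(pathways, explanations, confidences):
        if not precursors:
            continue  # empty proposal: nothing to keep
        if not all(is_valid(smi) for smi in precursors):
            continue  # at least one precursor failed validation
        if any(same_molecule(smi, target) for smi in precursors):
            continue  # target proposed as its own precursor
        kept_paths.append(list(precursors))
        kept_expl.append(expl)
        kept_conf.append(conf)
    return kept_paths, kept_expl, kept_conf


# Toy stand-ins (assumptions): real checks would parse SMILES and compare
# canonical forms instead of using these string tests.
paths, expl, conf = filter_pathways(
    target="CC(=O)Oc1ccccc1",
    pathways=[["CC(=O)O", "c1ccccc1O"], ["CC(=O)Oc1ccccc1"], ["bad!!!"]],
    explanations=["esterification", "no-op", "garbage"],
    confidences=[0.8, 0.5, 0.1],
    is_valid=lambda smi: bool(smi) and "!" not in smi,
    same_molecule=lambda a, b: a == b,
)
print(paths, expl, conf)  # [['CC(=O)O', 'c1ccccc1O']] ['esterification'] [0.8]
```

In practice `is_valid` would be something like `is_valid_smiles`, and `same_molecule` a canonical-SMILES comparison. Plain string equality, as in the demo, misses different SMILES spellings of the same molecule.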

Function Overview

Function                     Purpose
is_valid_smiles              Check whether a SMILES string parses to a valid molecule.
substructure_matching        Return 1 if the query is a substructure of the target, 0 otherwise.
are_molecules_same           Check whether two SMILES strings represent the same molecule (by canonical SMILES or fingerprint).
validity_check               Filter LLM-proposed pathways: keep valid precursors; drop precursors identical to the target or flagged as substructure matches.
calc_mol_wt                  Molecular weight from SMILES (returns 0.0 on invalid input).
calc_chemical_formula        Molecular formula from SMILES (returns "N/A" on invalid input).
compute_fingerprint          Morgan fingerprint as a list of bits.
detect_seven_member_rings    True if the molecule contains a 7-membered ring.
detect_eight_member_rings    True if the molecule contains an 8-membered ring.
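compute_fingerprint returns a Morgan fingerprint as a list of bits, and are_molecules_same can compare molecules by fingerprint. A standard way to score two bit lists is Tanimoto similarity: the number of bits set in both divided by the number set in either. The sketch below is plain Python over 0/1 lists. Whether are_molecules_same uses Tanimoto, and with what threshold, is not stated here, so treat `tanimoto` as a hypothetical illustration.

```python
from typing import Sequence


# Hypothetical helper; the module's own comparison logic may differ.
def tanimoto(fp_a: Sequence[int], fp_b: Sequence[int]) -> float:
    """Tanimoto similarity of two equal-length 0/1 bit lists."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)   # on in both
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)  # on in at least one
    # Convention for this sketch: two all-zero vectors score 0.0
    # rather than dividing by zero.
    return both / either if either else 0.0


fp1 = [1, 1, 0, 0, 1]
fp2 = [1, 0, 1, 0, 1]
print(tanimoto(fp1, fp2))  # 2 shared on-bits / 4 on in either -> 0.5
print(tanimoto(fp1, fp1))  # identical bit lists -> 1.0
```

A score of 1.0 does not prove identity: hashed fingerprints can collide, and Morgan fingerprints are commonly computed without chirality, so stereoisomers may share a fingerprint. Canonical SMILES comparison is the stricter test.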

Usage

from deepretro.utils.utils_molecule import (
    is_valid_smiles,
    substructure_matching,
    validity_check,
    calc_mol_wt,
    calc_chemical_formula,
    detect_seven_member_rings,
)

# Validate SMILES
assert is_valid_smiles("CCO") is True
assert is_valid_smiles("invalid!!!") is False

# Substructure check (benzene in ethylbenzene)
assert substructure_matching("CCc1ccccc1", "c1ccccc1") == 1

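# Filter LLM pathways (target: phenyl acetate; proposed precursors: acetic acid + phenol)
# Returns the filtered pathways with their explanations and confidence scores
pathways, explanations, confidence = validity_check(
    molecule="CC(=O)Oc1ccccc1",
    res_molecules=[["CC(=O)O", "c1ccccc1O"]],
    res_explanations=["esterification of phenol with acetic acid"],
    res_confidence=[0.8],
)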

# Molecular properties
assert calc_mol_wt("CCO") > 0
assert calc_chemical_formula("C") == "CH4"

# Ring detection
assert detect_seven_member_rings("C1CCCCCC1") is True
assert detect_seven_member_rings("C1CCCCC1") is False
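detect_seven_member_rings and detect_eight_member_rings answer a ring-size question about the molecular graph. The sketch below assumes the graph is already available as an adjacency list of atom indices (no SMILES parsing). For each bond, it finds the smallest ring through that bond with a breadth-first search. This smallest-ring-per-bond heuristic is a simplification, not a full ring-perception algorithm such as SSSR, and the function names are hypothetical.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set


# Hypothetical helpers working on an adjacency list {atom index: [neighbor indices]}.
def smallest_ring_through_bond(adj: Dict[int, List[int]], u: int, v: int) -> int:
    """Atom count of the smallest ring containing bond u-v, or 0 if acyclic.

    Breadth-first search from u to v that ignores the direct u-v bond; the
    shortest detour plus that bond closes the ring.
    """
    dist = {u: 0}
    queue = deque([u])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if node == u and nbr == v:
                continue  # skip the bond being tested
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                if nbr == v:
                    return dist[nbr] + 1  # detour edges + the u-v bond
                queue.append(nbr)
    return 0


def has_ring_of_size(adj: Dict[int, List[int]], size: int) -> bool:
    """True if some bond's smallest ring has exactly `size` atoms."""
    checked: Set[FrozenSet[int]] = set()
    for u, neighbors in adj.items():
        for v in neighbors:
            bond = frozenset((u, v))
            if bond in checked:
                continue
            checked.add(bond)
            if smallest_ring_through_bond(adj, u, v) == size:
                return True
    return False


def ring_graph(n: int) -> Dict[int, List[int]]:
    """Carbon skeleton of a single n-membered ring (e.g. n=7: cycloheptane)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


# Naphthalene skeleton: two 6-membered rings fused at bond 0-5.
NAPHTHALENE = {
    0: [1, 5, 9], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5],
    5: [4, 0, 6], 6: [5, 7], 7: [6, 8], 8: [7, 9], 9: [8, 0],
}

print(has_ring_of_size(ring_graph(7), 7))  # True
print(has_ring_of_size(ring_graph(6), 7))  # False
print(has_ring_of_size(NAPHTHALENE, 10))   # False: the perimeter is not a smallest ring
```

The fused bicyclic case shows why the smallest ring per bond matters: the 10-atom perimeter of naphthalene is a cycle, but it is not reported as a 10-membered ring.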

API