Stability Checker

Heuristic stability checker for molecules in retrosynthetic pathways.

When an LLM proposes reactant molecules, some of them may be chemically unstable — strained small rings, anti-aromatic systems, reactive intermediates like carbocations or carbenes, etc. This module catches those problems by inspecting the molecular graph with RDKit descriptors and SMARTS pattern matching. No trained model is required.

from deepretro.algorithms import check_molecule_stability

result = check_molecule_stability("c1ccccc1")
print(result["assessment"])       # "Likely stable"
print(result["stability_score"])  # 100

Checks performed

  1. Strained small rings — 3- or 4-membered rings that contain a heteroatom (N, O, S …) are flagged. Aziridines and azetidines are significantly more strained than plain cyclopropane / cyclobutane.

  2. Anti-aromatic motifs — rings whose π-electron count is a multiple of 4 (Hückel 4n rule) are thermodynamically destabilised. Known patterns (cyclobutadiene, cyclooctatetraene, pentalene) are matched via SMARTS, and π electrons are also counted directly for rings of size 4, 8, 12, or 16.

  3. Fused small rings — two rings of ≤ 4 atoms sharing atoms create extreme angle strain (e.g. bicyclo[1.1.0]butane). These systems can be explosive.

  4. Large heterocycles — rings with ≥ 7 atoms containing heteroatoms tend to be conformationally floppy and often unstable. Very large heterocycles (> 10 atoms, ≥ 3 heteroatoms) get an extra penalty.

  5. Carbocations — positively charged carbon centres are reactive intermediates, not isolable species. The checker distinguishes:

    • sp2 ([C+;X3]) — penalised unless stabilised by an adjacent aromatic ring, allylic double bond, or benzylic position.

    • sp ([C+;X2]) — always penalised heavily.

    • Primary vs secondary — primary is worse because fewer alkyl groups donate electron density.

    • Adjacent to EWG — a carbocation next to F, Cl, Br, I or charged N/S/O is the worst case.

  6. Carbenes — a neutral carbon with two bonds and no hydrogens ([C;X2;H0;+0]) is extremely reactive. Extra penalties if the carbene sits inside a 3- or 4-membered ring or is adjacent to an electron-withdrawing group.

  7. Fused cyclopentane + small hetero ring — a 5-membered all-carbon ring sharing atoms with a 3- or 4-membered hetero ring creates significant ring strain.

  8. Physicochemical outliers — extreme logP values with abs(logP) > 10 or too many rotatable bonds (> 15) each incur a small penalty.

  9. Aromatic bonus — aromatic rings stabilise a molecule, so each one adds a small bonus (capped at +15 total).
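The π-electron test in check 2 can be illustrated on its own. This is a minimal sketch, not the module's code: the helper name `looks_antiaromatic` is hypothetical, and it assumes the ring's π-electron count has already been obtained elsewhere.

```python
# Ring sizes for which the description says π electrons are counted directly.
COUNTED_RING_SIZES = {4, 8, 12, 16}


def looks_antiaromatic(ring_size: int, pi_electrons: int) -> bool:
    """Return True when a ring fits the Hückel 4n pattern.

    Hypothetical helper: the caller is assumed to have counted the
    ring's π electrons already.
    """
    if ring_size not in COUNTED_RING_SIZES:
        return False
    # 4n with n >= 1, i.e. a positive multiple of four.
    return pi_electrons > 0 and pi_electrons % 4 == 0


print(looks_antiaromatic(4, 4))  # cyclobutadiene: 4 π electrons
print(looks_antiaromatic(6, 6))  # benzene: ring size 6 is not examined
print(looks_antiaromatic(8, 8))  # planar cyclooctatetraene: 8 π electrons
```

Rings of other sizes fall through to `False` here, which mirrors the statement that only sizes 4, 8, 12, and 16 are counted directly.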

Scoring

After all checks, the score is clamped to 0–100:

  • ≥ 80: "Likely stable"

  • 50–79: "Moderately stable"

  • < 50: "Potentially unstable"
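The clamping and banding can be expressed in a few lines. The following is a minimal sketch rather than the module's own code; the function names `clamp_score` and `assessment_for` are hypothetical.

```python
def clamp_score(raw_score: int) -> int:
    """Clamp a raw score into the 0-100 range."""
    return max(0, min(100, raw_score))


def assessment_for(score: int) -> str:
    """Map a clamped score onto the three documented bands."""
    if score >= 80:
        return "Likely stable"
    if score >= 50:
        return "Moderately stable"
    return "Potentially unstable"


# A raw score above 100 (for example after an aromatic bonus) is clamped first.
print(assessment_for(clamp_score(115)))
print(assessment_for(clamp_score(79)))
print(assessment_for(clamp_score(-20)))
```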

Entry points

  • check_molecule_stability — analyse a single SMILES and return a 0–100 stability score with an issue list.

  • is_valid_smiles — quick check that a SMILES string parses.

deepretro.algorithms.stability_checker.is_valid_smiles(smiles)

Check whether a SMILES string can be parsed by RDKit.

Parameters:

smiles (str) – SMILES string to validate.

Returns:

True if RDKit can build a molecule from smiles.

Return type:

bool

Examples

>>> is_valid_smiles("CCO")
True
>>> is_valid_smiles("not_a_molecule")
False

deepretro.algorithms.stability_checker.check_molecule_stability(smiles)

Assess the stability of a molecule from its SMILES string.

Parses the molecule with RDKit and runs nine heuristic checks (see the module docstring for a full description of each one): strained small rings, anti-aromatic motifs, fused small rings, large heterocycles, carbocations, carbenes, fused cyclopentane systems, physicochemical outliers, and an aromatic-ring bonus. Each problem subtracts from a base score of 100; the final score is clamped to 0–100.

Parameters:

smiles (str) – SMILES string of the molecule to assess.

Returns:

A dictionary with the following keys:

  • valid_structure (bool) — whether RDKit could parse the SMILES at all.

  • stability_score (int, 0–100) — overall stability rating.

  • issues (list[str]) — plain-English descriptions of every problem found (e.g. "Three-membered heterocycle (potentially unstable)").

  • metrics (dict) — molecular weight, logP, H-bond donors / acceptors, rotatable bonds.

  • ring_data (dict) — ring counts broken down by type (aliphatic / aromatic, carbocycle / heterocycle, bridgehead atoms, etc.).

  • atom_data (dict) — total atoms, bonds, heavy atoms, aromatic vs aliphatic counts.

  • assessment (str) — one of "Likely stable", "Moderately stable", or "Potentially unstable".

Return type:

dict[str, Any]

Examples

>>> res = check_molecule_stability("c1ccccc1")  # benzene
>>> res["assessment"]
'Likely stable'
>>> res = check_molecule_stability("[CH2+]C")    # ethyl cation
>>> res["assessment"]
'Potentially unstable'
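
The returned dictionary is convenient to condense into a short report when screening many proposed reactants. The sketch below only relies on the keys listed above; the helper `summarise` is hypothetical, the example dictionary is written by hand with illustrative values rather than taken from a real call, and it is an assumption that an unparseable SMILES is signalled through `valid_structure` being false.

```python
from typing import Any


def summarise(smiles: str, result: dict[str, Any]) -> str:
    """Format a one-line report from a stability result dictionary."""
    if not result.get("valid_structure", False):
        return f"{smiles}: could not be parsed"
    issues = result.get("issues", [])
    issue_text = "; ".join(issues) if issues else "no issues"
    return (
        f"{smiles}: {result['stability_score']}/100, "
        f"{result['assessment']} ({issue_text})"
    )


# Hand-written stand-in for a real return value (values are illustrative only).
example = {
    "valid_structure": True,
    "stability_score": 60,
    "issues": ["Three-membered heterocycle (potentially unstable)"],
    "assessment": "Moderately stable",
}
print(summarise("C1CN1", example))
```

In real use, `example` would be replaced by the value returned from `check_molecule_stability`.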

deepretro.algorithms.stability_checker.check_carbocations(mol, results, score)

Detect carbocation intermediates and apply score penalties.

Carbocations are positively charged carbon centres, that is, reactive intermediates that are generally too short-lived to isolate. The function finds them with SMARTS pattern matching and then checks whether the charge is stabilised by resonance (an allylic or benzylic position) or by neighbouring aromatic atoms. Unstabilised and primary carbocations get the heaviest penalties; stabilised ones get a small bonus.

Parameters:
  • mol (Chem.Mol) – RDKit molecule object already parsed from SMILES.

  • results (dict[str, Any]) – Accumulator — detected issues are appended to results["issues"].

  • score (int) – Running stability score to adjust.

Returns:

Updated stability score after carbocation penalties / bonuses.

Return type:

int

Examples

>>> from rdkit import Chem
>>> from deepretro.algorithms import check_carbocations
>>> mol = Chem.MolFromSmiles("[CH2+]C")  # ethyl cation
>>> results = {"issues": []}
>>> new_score = check_carbocations(mol, results, 100)
>>> "Contains primary carbocation (highly unstable)" in results["issues"]
True
>>> new_score < 100
True
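
The helper checks share one calling convention: each takes the molecule, a results accumulator, and the running score, and returns the updated score. The sketch below shows how functions of that shape can be chained. It is an assumption about how the checks compose, not the module's actual driver; `run_checks` and `toy_check` are hypothetical, and a plain string stands in for an RDKit molecule so that the example runs without RDKit.

```python
from typing import Any, Callable

# Shape shared by the documented helpers: (mol, results, score) -> new score.
Check = Callable[[Any, dict, int], int]


def run_checks(mol: Any, checks: list[Check], base_score: int = 100) -> dict:
    """Thread a running score through a sequence of checks."""
    results: dict[str, Any] = {"issues": []}
    score = base_score
    for check in checks:
        score = check(mol, results, score)
    # Clamp once, after every check has run.
    results["stability_score"] = max(0, min(100, score))
    return results


def toy_check(mol: str, results: dict, score: int) -> int:
    """Hypothetical stand-in check: flags any input string containing '+'."""
    if "+" in mol:
        results["issues"].append("Contains a positive charge (toy check)")
        return score - 30  # illustrative penalty, not the module's value
    return score


outcome = run_checks("[CH2+]C", [toy_check])
print(outcome["stability_score"], outcome["issues"])
```

Because each check only reads and returns the score, new checks can be added to the list without changing the driver.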

deepretro.algorithms.stability_checker.check_carbenes(mol, results, score)

Detect carbene intermediates and apply score penalties.

A carbene is a neutral carbon with only two bonds and no hydrogens, an extremely reactive species that usually exists only as a fleeting intermediate. Additional penalties stack if the carbene is inside a strained 3- or 4-membered ring, or sits next to an electron-withdrawing group (halogens, charged heteroatoms).

Parameters:
  • mol (Chem.Mol) – RDKit molecule object already parsed from SMILES.

  • results (dict[str, Any]) – Accumulator — detected issues are appended to results["issues"].

  • score (int) – Running stability score to adjust.

Returns:

Updated stability score after carbene penalties.

Return type:

int

Examples

>>> from rdkit import Chem
>>> from deepretro.algorithms import check_carbenes
>>> mol = Chem.MolFromSmiles("[C]1CC1")  # carbene in 3-membered ring
>>> results = {"issues": []}
>>> new_score = check_carbenes(mol, results, 100)
>>> any("carbene" in i for i in results["issues"])
True
>>> new_score < 100
True

deepretro.algorithms.stability_checker.check_fused_cyclopentane(mol, atom_rings, results, score)

Detect 5-membered carbon rings fused with small hetero rings.

A cyclopentane ring sharing atoms with a 3- or 4-membered ring that contains a heteroatom (N, O, S …) creates significant angle strain, for example 1,2-epoxycyclopentane. Each such fusion incurs a heavy penalty (-40).

Parameters:
  • mol (Chem.Mol) – RDKit molecule object already parsed from SMILES.

  • atom_rings (tuple) – Ring atom-index tuples from mol.GetRingInfo().AtomRings().

  • results (dict[str, Any]) – Accumulator — detected issues are appended to results["issues"].

  • score (int) – Running stability score to adjust.

Returns:

Updated stability score after fused-ring penalties.

Return type:

int

Examples

>>> from rdkit import Chem
>>> from deepretro.algorithms import check_fused_cyclopentane
>>> mol = Chem.MolFromSmiles("C1CC2OC2C1")  # 1,2-epoxycyclopentane
>>> rings = mol.GetRingInfo().AtomRings()
>>> results = {"issues": []}
>>> new_score = check_fused_cyclopentane(mol, rings, results, 100)
>>> any("strained system" in i for i in results["issues"])
True
>>> new_score < 100
True
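
The fusion test itself needs only the ring atom-index tuples and the element of each atom, so it can be sketched without RDKit. This is an illustration of the idea under stated assumptions, not the module's implementation: `count_fused_cyclopentanes` is a hypothetical helper, "sharing atoms" is read as at least one atom in common, and the ring tuples below are written by hand for 1,2-epoxycyclopentane (`C1CC2OC2C1`) rather than obtained from `mol.GetRingInfo().AtomRings()`.

```python
FUSION_PENALTY = 40  # the per-fusion penalty stated in the description


def count_fused_cyclopentanes(symbols, atom_rings):
    """Count all-carbon 5-rings that share atoms with a 3- or 4-membered hetero ring.

    symbols    -- element symbol of each heavy atom, indexed by atom index
    atom_rings -- tuples of atom indices, one tuple per ring
    """
    small_hetero = [
        set(ring)
        for ring in atom_rings
        if len(ring) in (3, 4) and any(symbols[i] != "C" for i in ring)
    ]
    carbon_five = [
        set(ring)
        for ring in atom_rings
        if len(ring) == 5 and all(symbols[i] == "C" for i in ring)
    ]
    # A non-empty intersection means the two rings have atoms in common.
    return sum(1 for five in carbon_five for small in small_hetero if five & small)


# Heavy atoms of C1CC2OC2C1 in SMILES order, with hand-written ring tuples.
symbols = ["C", "C", "C", "O", "C", "C"]
atom_rings = ((0, 1, 2, 4, 5), (2, 3, 4))

fusions = count_fused_cyclopentanes(symbols, atom_rings)
score = 100 - FUSION_PENALTY * fusions
print(fusions, score)  # prints: 1 60
```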