deepretro.algorithms.hallucination_checker

Heuristic hallucination checker for retrosynthetic reaction steps. Compares reactant and product molecules to detect structural inconsistencies (atom-count mismatches, ring-size changes, substituent swaps, etc.) and produces a 0–100 hallucination score.

from deepretro.algorithms.hallucination_checker import (
    hallucination_compare_molecules,
    calculate_hallucination_score,
)

issues = hallucination_compare_molecules("c1ccccc1", "c1ccccc1OC")
result = calculate_hallucination_score("c1ccccc1", "c1ccccc1OC")
print(result["score"], result["severity"])

API

When a retrosynthetic step (product → reactant) is proposed, the predicted reactant may contain structural mistakes: atoms appearing or vanishing, rings changing size, substituents jumping to a different position on an aromatic ring, and so on.

This module catches those mistakes automatically by comparing the reactant and product. Two main entry points are provided:

  • hallucination_compare_molecules — runs every check and returns a detailed breakdown of what (if anything) looks wrong.

  • calculate_hallucination_score — distils the breakdown into a single 0–100 score (100 = looks fine, 0 = almost certainly hallucinated) with a severity label (low / medium / high / critical).

deepretro.algorithms.hallucination_checker.hallucination_compare_molecules(reactant_smiles, product_smiles)

Compare a reactant and product molecule to detect potential hallucinations.

Given two SMILES strings, this function parses both molecules and checks atom-count consistency, ring-size changes, substituent position swaps, aromaticity shifts, and unnecessary bond formations.
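
The atom-count consistency check can be pictured as a per-element tally. The following is a minimal sketch, not the module's own code: element_count_mismatches is a hypothetical helper, and it assumes the heavy-atom element symbols have already been extracted from each parsed molecule.

```python
from collections import Counter


def element_count_mismatches(reactant_elements, product_elements):
    """Return {element: (reactant_count, product_count)} for every element
    whose count differs between the two molecules (hypothetical helper)."""
    reactant_counts = Counter(reactant_elements)
    product_counts = Counter(product_elements)
    return {
        element: (reactant_counts[element], product_counts[element])
        for element in sorted(set(reactant_counts) | set(product_counts))
        if reactant_counts[element] != product_counts[element]
    }


# Heavy atoms of benzene (c1ccccc1) and anisole (c1ccccc1OC), written out by hand.
benzene = ["C"] * 6
anisole = ["C"] * 7 + ["O"]
print(element_count_mismatches(benzene, anisole))  # {'C': (6, 7), 'O': (0, 1)}
```

An empty dictionary corresponds to atom_count_consistent being True in this sketch.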

Parameters:
  • reactant_smiles (str) – SMILES string of the reactant molecule.

  • product_smiles (str) – SMILES string of the product molecule.

Returns:

results – Dictionary with the following keys:

  • valid_reactant (bool) — reactant SMILES parsed OK.

  • valid_product (bool) — product SMILES parsed OK.

  • atom_count_consistent (bool) — all elements match.

  • ring_size_changes (list[str]) — rings added/removed.

  • substituent_position_changes (list[dict]) — position swaps.

  • detected_issues (list[str]) — all issues found (empty if clean).

Return type:

dict[str, Any]

Examples

>>> from deepretro.algorithms import hallucination_compare_molecules
>>> res = hallucination_compare_molecules("c1ccccc1", "c1ccccc1OC")
>>> res["valid_reactant"] and res["valid_product"]
True

deepretro.algorithms.hallucination_checker.check_ring_substituent_positions(reactant_mol, product_mol, results)

Detect changes in the position of substituents on aromatic rings.

For each aromatic ring that appears in both the reactant and the product, this function figures out what groups are attached and where (ortho / meta / para). If the same group shows up at a different position in the product, it is flagged, because such a shift almost always means the LLM hallucinated the position.
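
The comparison itself can be sketched in plain Python. This is an illustration under stated assumptions rather than the module's implementation: find_position_changes and the keys of the dictionaries it returns are hypothetical, and each ring is assumed to have been summarised already as a mapping from substituent signature to position label.

```python
def find_position_changes(reactant_positions, product_positions):
    """List substituents present on both rings whose position label differs.

    Both arguments map a substituent signature (e.g. "O1") to a position
    label (e.g. "para"). The returned dict keys are hypothetical.
    """
    changes = []
    for signature, old_position in reactant_positions.items():
        new_position = product_positions.get(signature)
        if new_position is not None and new_position != old_position:
            changes.append(
                {"signature": signature, "from": old_position, "to": new_position}
            )
    return changes


changes = find_position_changes(
    {"O1": "para", "C1": "ortho"},
    {"O1": "meta", "C1": "ortho"},
)
print(changes)  # [{'signature': 'O1', 'from': 'para', 'to': 'meta'}]
```

A one-to-one mapping keeps the sketch short; a ring carrying the same group twice would need a list of positions per signature.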

Findings are written directly into results.

Parameters:
  • reactant_mol (rdkit.Chem.Mol) – RDKit molecule object of the reactant.

  • product_mol (rdkit.Chem.Mol) – RDKit molecule object of the product.

  • results (dict) – Results dictionary to update with findings.

Return type:

None

Examples

>>> from rdkit import Chem
>>> from deepretro.algorithms.hallucination_checker import check_ring_substituent_positions
>>> r_mol = Chem.MolFromSmiles("c1ccc(O)cc1")   # phenol
>>> p_mol = Chem.MolFromSmiles("c1ccc(O)cc1")   # same phenol
>>> res = {"detected_issues": [], "substituent_position_changes": []}
>>> check_ring_substituent_positions(r_mol, p_mol, res)
>>> res["substituent_position_changes"]
[]

deepretro.algorithms.hallucination_checker.identify_ring_systems(mol)

Identify all ring systems in a molecule and their properties.

Walks the SSSR (Smallest Set of Smallest Rings) that RDKit computes and, for each ring, notes how many atoms it has, which atom indices belong to it, and whether every atom in the ring is aromatic. The matched flag starts as False and is used later when pairing up rings between reactant and product.
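
The shape of the returned descriptors can be sketched without RDKit. In this illustration describe_rings is a hypothetical helper, the rings are supplied as tuples of atom indices, and the aromatic atoms as a set; how ids are assigned and whether atoms is stored as a list are assumptions.

```python
def describe_rings(atom_rings, aromatic_atoms):
    """Build one descriptor dict per ring (hypothetical helper).

    atom_rings:     iterable of tuples of atom indices, one tuple per ring.
    aromatic_atoms: set of atom indices that are flagged aromatic.
    """
    rings = []
    for ring_id, atoms in enumerate(atom_rings):
        rings.append(
            {
                "id": ring_id,
                "atoms": list(atoms),
                "size": len(atoms),
                # A ring counts as aromatic only if every member atom is aromatic.
                "is_aromatic": all(idx in aromatic_atoms for idx in atoms),
                # Starts as False; set later when rings are paired up.
                "matched": False,
            }
        )
    return rings


# Benzene: a single six-membered ring in which every atom is aromatic.
rings = describe_rings([(0, 1, 2, 3, 4, 5)], aromatic_atoms={0, 1, 2, 3, 4, 5})
print(rings[0]["size"], rings[0]["is_aromatic"], rings[0]["matched"])  # 6 True False
```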

Parameters:

mol (rdkit.Chem.Mol) – RDKit molecule object.

Returns:

rings – Each dict has keys id, atoms, size, is_aromatic, and matched.

Return type:

list of dict

Examples

>>> from rdkit import Chem
>>> from deepretro.algorithms.hallucination_checker import identify_ring_systems
>>> mol = Chem.MolFromSmiles("c1ccccc1")
>>> rings = identify_ring_systems(mol)
>>> len(rings)
1
>>> rings[0]["size"]
6
>>> rings[0]["is_aromatic"]
True

deepretro.algorithms.hallucination_checker.identify_substituents(mol, ring_info)

Identify all substituents attached to a ring and their positions.

Walks the atoms of the ring and, for every neighbour that is not part of the ring, traces out the full substituent group and labels its attachment point as ortho / meta / para (for 6-membered rings) or a numbered position (for other ring sizes).

Parameters:
  • mol (rdkit.Chem.Mol) – RDKit molecule object.

  • ring_info (dict) – Ring descriptor as returned by identify_ring_systems.

Returns:

substituents – Each dict has keys attachment_point, first_atom, atoms, and position.

Return type:

list of dict

Examples

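>>> from rdkit import Chem
>>> from deepretro.algorithms.hallucination_checker import (
...     identify_ring_systems,
...     identify_substituents,
... )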
>>> mol = Chem.MolFromSmiles("c1ccc(O)cc1")  # phenol
>>> rings = identify_ring_systems(mol)
>>> subs = identify_substituents(mol, rings[0])
>>> len(subs) >= 1
True
>>> subs[0]["position"] in ("1", "ortho", "meta", "para")
True

deepretro.algorithms.hallucination_checker.determine_ring_position(mol, atom_idx, ring_atoms, ring_size)

Determine the position of a substituent on a ring.

For 6-membered rings uses ortho/meta/para nomenclature. For other ring sizes returns numbered positions.
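
One way to picture the labelling is as a ring distance between two attachment atoms. The sketch below is an assumption-laden illustration, not the function's actual logic: relative_position is hypothetical, the ring atoms must be given in traversal order around the ring, and the reference atom is supplied explicitly because this page does not say which reference the real function uses.

```python
POSITION_NAMES = {1: "ortho", 2: "meta", 3: "para"}


def relative_position(ring_order, atom_idx, reference_idx):
    """Label atom_idx relative to reference_idx on a ring (hypothetical helper).

    ring_order lists the ring's atom indices in the order they are met when
    walking around the ring.
    """
    size = len(ring_order)
    steps = abs(ring_order.index(atom_idx) - ring_order.index(reference_idx))
    distance = min(steps, size - steps)  # shortest way around the ring
    if size == 6 and distance in POSITION_NAMES:
        return POSITION_NAMES[distance]
    # Fallback: 1-based position in traversal order, as a string.
    return str(ring_order.index(atom_idx) + 1)


phenol_ring = [0, 1, 2, 3, 5, 6]  # ring atoms of c1ccc(O)cc1 in traversal order
print(relative_position(phenol_ring, 0, 3))  # para
print(relative_position(phenol_ring, 6, 3))  # meta
print(relative_position(phenol_ring, 2, 3))  # ortho
```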

Parameters:
  • mol (rdkit.Chem.Mol) – RDKit molecule object.

  • atom_idx (int) – Index of the ring atom the substituent is bonded to.

  • ring_atoms (set[int]) – All atom indices that belong to the ring.

  • ring_size (int) – Size of the ring.

Returns:

position – "ortho", "meta", "para", or a numbered position.

Return type:

str

Examples

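>>> from rdkit import Chem
>>> from deepretro.algorithms.hallucination_checker import determine_ring_position
>>> mol = Chem.MolFromSmiles("c1ccc(O)cc1")  # phenol
>>> ring_atoms = {0, 1, 2, 3, 5, 6}  # atom 4 is the oxygen, not a ring atom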
>>> pos = determine_ring_position(mol, 3, ring_atoms, 6)
>>> pos in ("1", "ortho", "meta", "para")
True

deepretro.algorithms.hallucination_checker.get_connected_atoms(mol, start_idx, exclude_atoms)

Get all atoms connected to a starting atom, excluding a set of atoms.

Starting from start_idx (typically the first atom outside a ring), this does a breadth-first walk along bonds and collects every atom it reaches. It will not cross into any atom listed in exclude_atoms; this is how the walk stops at the ring boundary and collects only the substituent itself.
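
The walk can be sketched over a plain adjacency mapping instead of an RDKit molecule. This is a minimal illustration, not the module's code: connected_atoms is a hypothetical helper and the adjacency dictionary is written out by hand for methoxybenzene, c1ccc(OC)cc1.

```python
from collections import deque


def connected_atoms(adjacency, start_idx, exclude_atoms):
    """Breadth-first walk from start_idx that never enters exclude_atoms."""
    visited = {start_idx}
    queue = deque([start_idx])
    reached = []
    while queue:
        current = queue.popleft()
        reached.append(current)
        for neighbour in adjacency[current]:
            if neighbour not in visited and neighbour not in exclude_atoms:
                visited.add(neighbour)
                queue.append(neighbour)
    return reached


# Methoxybenzene: ring atoms 0, 1, 2, 3, 6, 7; atom 4 is O and atom 5 is the methyl C.
adjacency = {
    0: [1, 7], 1: [0, 2], 2: [1, 3], 3: [2, 4, 6],
    4: [3, 5], 5: [4], 6: [3, 7], 7: [6, 0],
}
ring_atoms = {0, 1, 2, 3, 6, 7}
print(connected_atoms(adjacency, 4, ring_atoms))  # [4, 5]
```

Marking atoms as visited when they are enqueued, rather than when they are dequeued, keeps each atom from entering the queue more than once.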

Parameters:
  • mol (rdkit.Chem.Mol) – RDKit molecule object.

  • start_idx (int) – Atom index to start the walk from.

  • exclude_atoms (set[int]) – Atom indices to treat as barriers (usually the ring atoms).

Returns:

atoms – List of atom indices that form the connected component.

Return type:

list of int

Examples

>>> from rdkit import Chem
>>> from deepretro.algorithms.hallucination_checker import get_connected_atoms
>>> mol = Chem.MolFromSmiles("c1ccc(OC)cc1")  # methoxybenzene
>>> ring_atoms = {0, 1, 2, 3, 6, 7}
>>> # atom 4 is the O attached to the ring; BFS from there excluding ring
>>> connected = get_connected_atoms(mol, 4, ring_atoms)
>>> len(connected) >= 1
True

deepretro.algorithms.hallucination_checker.get_substituent_signature(mol, substituent)

Generate a signature for a substituent to identify similar groups.

Counts element types in the substituent atoms and returns a sorted dot-separated string (e.g. "C2.O1").
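
The signature format can be reproduced with a short sketch. element_signature is a hypothetical helper that takes element symbols directly; sorting alphabetically by symbol is an assumption, although it agrees with the "C2.O1" and "N1.O2" examples on this page.

```python
from collections import Counter


def element_signature(symbols):
    """Join per-element counts into a sorted, dot-separated string."""
    counts = Counter(symbols)
    return ".".join(f"{element}{count}" for element, count in sorted(counts.items()))


print(element_signature(["C", "O", "C"]))  # C2.O1
print(element_signature(["O"]))            # O1
```

Because only element counts are kept, isomeric groups such as propyl and isopropyl share a signature in this sketch.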

Parameters:
  • mol (rdkit.Chem.Mol) – RDKit molecule object.

  • substituent (dict) – Substituent descriptor (must contain an atoms key with a list of atom indices).

Returns:

signature – Signature string for the substituent.

Return type:

str

Examples

>>> from rdkit import Chem
>>> from deepretro.algorithms.hallucination_checker import get_substituent_signature
>>> mol = Chem.MolFromSmiles("c1ccc(O)cc1")  # phenol
>>> subst = {"atoms": [4]}  # atom 4 is the oxygen
>>> get_substituent_signature(mol, subst)
'O1'

deepretro.algorithms.hallucination_checker.get_friendly_substituent_name(signature)

Convert a substituent signature to a friendly name when possible.
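
A lookup with a fallback is one plausible way to express this. In the sketch below, friendly_name and FRIENDLY_NAMES are hypothetical, and the table holds only the two mappings shown in the examples on this page; the real table is presumably larger.

```python
# Only the two mappings documented on this page; the real table is not shown here.
FRIENDLY_NAMES = {
    "C1": "Methyl",
    "Br1": "Bromo",
}


def friendly_name(signature):
    """Return a readable name, or "Group (<signature>)" when none is known."""
    return FRIENDLY_NAMES.get(signature, f"Group ({signature})")


print(friendly_name("C1"))   # Methyl
print(friendly_name("X99"))  # Group (X99)
```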

Parameters:

signature (str) – Element-count signature (e.g. "C1", "N1.O2").

Returns:

name – Friendly name (e.g. "Methyl"), or "Group (<signature>)" if no match is found.

Return type:

str

Examples

>>> from deepretro.algorithms.hallucination_checker import get_friendly_substituent_name
>>> get_friendly_substituent_name("C1")
'Methyl'
>>> get_friendly_substituent_name("Br1")
'Bromo'
>>> get_friendly_substituent_name("X99")
'Group (X99)'

deepretro.algorithms.hallucination_checker.calculate_hallucination_score(reactant_smiles, product_smiles)

Calculate a hallucination score for a chemical transformation.

This is the high-level entry point. It runs hallucination_compare_molecules under the hood and then converts each kind of issue into a point deduction from a perfect score of 100. Bigger problems cost more points (e.g. a substituent jumping position costs 60, while one extra bond costs only 5). The final score is clamped to 0–100 and labelled with a severity:

  • ≥ 80: "low" (looks plausible)

  • 40–79: "medium" (worth a second look)

  • 20–39: "high" (likely hallucinated)

  • < 20: "critical" (almost certainly wrong)

If either SMILES string cannot be parsed, the score is 0 / critical.
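
The deduction, clamping, and labelling steps can be sketched as follows. score_from_penalties is a hypothetical helper, not the module's function; the penalty values 60 and 5 and the severity bands are taken from the description above, while the way penalties are passed in is an assumption.

```python
def score_from_penalties(penalties):
    """Subtract penalties from 100, clamp to 0-100, and attach a severity label."""
    score = max(0, min(100, 100 - sum(penalties)))
    if score >= 80:
        severity = "low"
    elif score >= 40:
        severity = "medium"
    elif score >= 20:
        severity = "high"
    else:
        severity = "critical"
    return score, severity


print(score_from_penalties([]))        # (100, 'low')
print(score_from_penalties([60, 5]))   # (35, 'high')
print(score_from_penalties([60, 60]))  # (0, 'critical')
```

Clamping matters once penalties add up to more than 100: two position swaps would otherwise give a score of -20.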

Parameters:
  • reactant_smiles (str) – SMILES string of the reactant molecule.

  • product_smiles (str) – SMILES string of the product molecule.

Returns:

result – Dictionary with keys score (int, 0–100), severity ("low" / "medium" / "high" / "critical"), penalties (list of str), and message (str).

Return type:

dict

Examples

>>> from deepretro.algorithms import calculate_hallucination_score
>>> result = calculate_hallucination_score("c1ccccc1", "c1ccccc1OC")
>>> result["severity"]
'low'
>>> result["score"] >= 80
True

deepretro.algorithms.hallucination_checker.interpret_score(score)

Turn a numeric hallucination score into a sentence a non-expert can read.

This is called automatically by calculate_hallucination_score to fill the message field, but you can also use it standalone if you already have a score.

Parameters:

score (int) – Hallucination score (0 = worst, 100 = best).

Returns:

message – One-sentence plain-English interpretation.

Return type:

str

Examples

>>> from deepretro.algorithms import interpret_score
>>> interpret_score(95)
'Highly reliable transformation with minimal or no structural inconsistencies'