deepretro.utils

Utility layer for domain features and template-based retrosynthesis integration.

This module group supports three major workflows:

Feature engineering utilities for ML-ready reaction-step vectors.
AiZynthFinder orchestration helpers for template route generation.
Molecule utilities for SMILES validation, substructure matching, and pathway filtering (deepretro.utils.utils_molecule).

Utility Overview

Utility	Purpose
`run_az`	Execute AiZynthFinder and return solved flag + route dictionaries.
`run_az_with_img`	Same as `run_az` plus route images when available.
`is_basic_molecule`	Shortcut heuristic for trivial molecules to bypass heavy search.
`CacheManager` / `make_cache_key`	Explicit in-memory cache primitives for expensive library operations.
`call_LLM` / `llm_pipeline`	LiteLLM-backed retrosynthesis calls, response parsing, and pathway filtering.
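The table lists CacheManager and make_cache_key, but their signatures are not documented on this page. Below is a minimal sketch of the general pattern, assuming a key derived from a function name plus its arguments and a dict-backed store. make_cache_key_sketch and InMemoryCacheSketch are hypothetical names and are not the library's API.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def make_cache_key_sketch(func_name: str, *args: Any, **kwargs: Any) -> str:
    """Hypothetical: deterministic key from a function name and its arguments."""
    # sort_keys makes the key independent of keyword-argument order;
    # default=repr lets non-JSON arguments still contribute to the key.
    payload = json.dumps(
        {"f": func_name, "args": args, "kwargs": kwargs},
        sort_keys=True,
        default=repr,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class InMemoryCacheSketch:
    """Hypothetical process-local cache: one dict per instance, no globals."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        # Run the expensive call only on a cache miss.
        if key not in self._store:
            self._store[key] = compute()
        return self._store[key]
```

Because each InMemoryCacheSketch instance owns its own dict, two callers never share cached state unless they share the instance.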

AiZynthFinder Integration Notes

run_az and run_az_with_img use environment-configured model paths:

AZ_MODELS_PATH (preferred model-variant path)
AZ_MODEL_CONFIG_PATH (fallback config path)
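The preferred-then-fallback lookup can be sketched as follows. resolve_az_config_path_sketch is a hypothetical helper that shows only the documented precedence. Treating an empty string as unset is an assumption, and this page does not describe how the library combines the resolved path with the az_model variant name.

```python
import os
from typing import Mapping, Optional

# Checked in order: the preferred variable first, then the fallback.
_AZ_PATH_VARS = ("AZ_MODELS_PATH", "AZ_MODEL_CONFIG_PATH")


def resolve_az_config_path_sketch(
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Hypothetical: return the first non-empty AZ_* path, else None."""
    for name in _AZ_PATH_VARS:
        value = env.get(name)
        if value:  # assumption: an empty string counts as unset
            return value
    return None
```

Calling it with no argument reads the live process environment. Passing a plain dict makes the precedence easy to check in isolation.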

Behavior highlights:

Auto-bypass for trivial/basic molecules via BASIC_MOLECULES and is_basic_molecule.
Results are cached via the src.cache.cache_results decorator.
For in-memory caching without shared global state, use the explicit process-local helpers in deepretro.utils.cache.
Returns route dictionaries with metadata and scores from AiZynthFinder.
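The bypass-then-cache flow described above can be sketched roughly as follows. This sketch rests on several assumptions:

- functools.lru_cache stands in for src.cache.cache_results, whose signature is not shown here.
- The BASIC_MOLECULES contents are made up.
- A basic molecule is assumed to be reported as solved with no routes.
- expensive_template_search is a placeholder for the real AiZynthFinder call.

```python
from functools import lru_cache

# Hypothetical contents: the real BASIC_MOLECULES set is not listed in these docs.
BASIC_MOLECULES = frozenset({"O", "N", "C", "CO", "CCO"})

# Counts how often the expensive path actually runs.
SEARCH_CALLS = {"count": 0}


def is_basic_molecule_sketch(smiles: str) -> bool:
    """Stand-in heuristic: exact match against a fixed set of SMILES."""
    return smiles in BASIC_MOLECULES


def expensive_template_search(smiles: str) -> tuple[bool, list[dict]]:
    """Placeholder for the AiZynthFinder call; returns one dummy route."""
    SEARCH_CALLS["count"] += 1
    return True, [{"target": smiles, "score": 0.0}]


@lru_cache(maxsize=128)
def run_az_sketch(smiles: str) -> tuple[bool, tuple]:
    # Bypass: trivial molecules never reach the expensive search.
    if is_basic_molecule_sketch(smiles):
        return True, ()  # assumption: reported as solved with no routes
    solved, routes = expensive_template_search(smiles)
    # Freeze the route list so the cached value is not a mutable list.
    return solved, tuple(routes)
```

Repeated calls with the same SMILES hit the cache, so the placeholder search runs once per distinct non-basic molecule.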

Example: run template search

from deepretro.utils.az import run_az

solved, routes = run_az("C1CCCCC1", az_model="USPTO")
print(solved, len(routes))


API Reference

deepretro.utils.extract_domain_features_single(product_smiles, reactants_smiles)

Extract hand-crafted domain features for one product-reactant pair.

Computes atom-count deltas (C, N, O, Cl, Br), bond/ring/aromaticity deltas, molecular-weight deltas, and absolute counts.

Parameters:

product_smiles (str) – SMILES of the target product.
reactants_smiles (str) – SMILES of the proposed reactants (dot-separated when multiple).

Returns:

features – 1-D feature vector. Returns a NaN vector on any parsing failure, so invalid rows are distinguishable from real data downstream.

Return type:

np.ndarray, shape (NUM_DOMAIN_FEATURES,)

Examples

>>> from deepretro.utils import extract_domain_features_single
>>> feats = extract_domain_features_single("CCO", "CC.O")
>>> feats.shape
(15,)
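The atom-count deltas, together with the NaN-on-failure behavior, can be illustrated with a small sketch. Only the C, N, O, Cl, and Br deltas are covered; the bond, ring, aromaticity, and molecular-weight terms are not reproduced.

Assumptions in this sketch:

- atom_count_deltas_sketch is a hypothetical name.
- The product-minus-reactants sign convention is an assumption.
- The regex tokenizer is a deliberately simplified stand-in for a real SMILES parser such as RDKit's Chem.MolFromSmiles. It does not check valences or ring closures, and the library's actual parsing is not shown on this page.

```python
import re
from collections import Counter

import numpy as np

# Bracket atoms (optional isotope digits) first, then two-letter halogens,
# then organic-subset symbols, then aromatic lowercase atoms.
_ATOM_RE = re.compile(
    r"\[\d*([A-Z][a-z]?|[a-z]{1,2})[^\]]*\]|Cl|Br|[BCNOSPFI]|[bcnops]"
)
# SMILES syntax that may remain once the atom tokens are stripped out.
_ALLOWED_LEFTOVER = set("()=#$-+\\/:.%0123456789@")
ELEMENTS = ("C", "N", "O", "Cl", "Br")


def _element_counts(smiles: str) -> Counter:
    """Count atoms per element symbol; raise ValueError if tokenizing fails."""
    leftover = _ATOM_RE.sub("", smiles)
    if not set(leftover) <= _ALLOWED_LEFTOVER:
        raise ValueError(f"cannot tokenize SMILES: {smiles!r}")
    counts = Counter()
    for match in _ATOM_RE.finditer(smiles):
        symbol = match.group(1) or match.group(0)
        # Aromatic atoms ("c", "n", "[nH]") count toward their element.
        counts[symbol[0].upper() + symbol[1:]] += 1
    if not counts:
        raise ValueError(f"no atoms found in SMILES: {smiles!r}")
    return counts


def atom_count_deltas_sketch(product_smiles: str, reactants_smiles: str) -> np.ndarray:
    """Product-minus-reactants counts for C, N, O, Cl, Br (NaN vector on failure)."""
    try:
        product = _element_counts(product_smiles)
        # Dot-separated reactants are pooled into one count.
        reactants = _element_counts(reactants_smiles)
    except ValueError:
        return np.full(len(ELEMENTS), np.nan)
    return np.array([product[e] - reactants[e] for e in ELEMENTS], dtype=float)
```

For an amide formed from acetyl chloride and aniline, atom_count_deltas_sketch("CC(=O)Nc1ccccc1", "CC(=O)Cl.Nc1ccccc1") gives a Cl delta of -1 and zeros elsewhere. Returning NaNs instead of zeros on failure keeps unparseable rows distinguishable from a genuine "no change" row.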

deepretro.utils.find_optimal_threshold(y_true, probabilities)

Find the classification threshold that maximises F1-score.

Sweeps the precision-recall curve and picks the threshold where the harmonic mean of precision and recall is highest.

Parameters:

y_true (array-like, shape (n_samples,)) – True binary labels (0 or 1).
probabilities (array-like, shape (n_samples,)) – Predicted probabilities for the positive class.

Returns:

threshold (float) – Optimal classification threshold.
f1 (float) – F1-score at the optimal threshold.

Return type:

tuple[float, float]

Examples

>>> import numpy as np
>>> from deepretro.utils.metrics import find_optimal_threshold
>>> y = np.array([0, 0, 1, 1])
>>> proba = np.array([0.1, 0.4, 0.6, 0.9])
>>> thr, f1 = find_optimal_threshold(y, proba)
>>> 0.0 < thr < 1.0
True
>>> f1 > 0.0
True
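The sweep can be sketched in plain NumPy by scoring every candidate threshold with F1 = 2PR / (P + R), where P is precision and R is recall.

Assumptions and conventions in this sketch:

- find_optimal_threshold_sketch is a hypothetical name.
- The candidate thresholds are the distinct predicted probabilities.
- A sample counts as positive when its probability is >= the threshold, which is the convention scikit-learn's precision_recall_curve uses. This page does not state whether the library calls scikit-learn.
- The 0.5 fallback when no threshold yields a true positive is an assumption.

```python
import numpy as np


def find_optimal_threshold_sketch(y_true, probabilities):
    """Return (threshold, f1) maximising F1 over the distinct probabilities."""
    y = np.asarray(y_true).astype(int)
    p = np.asarray(probabilities, dtype=float)
    best_thr, best_f1 = 0.5, 0.0  # assumption: fallback if nothing scores
    for thr in np.unique(p):
        pred = p >= thr
        tp = int(np.sum(pred & (y == 1)))
        fp = int(np.sum(pred & (y == 0)))
        fn = int(np.sum(~pred & (y == 1)))
        if tp == 0:
            continue  # precision and recall are both 0, so F1 is 0
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_thr, best_f1 = float(thr), f1
    return best_thr, best_f1
```

On the toy data from the example above (labels [0, 0, 1, 1], probabilities [0.1, 0.4, 0.6, 0.9]), the sketch returns (0.6, 1.0). At threshold 0.6 both positives are kept and both negatives are rejected.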

deepretro.utils.domain_features

Domain feature extraction utilities for reaction-step featurization.

deepretro.utils.domain_features.extract_domain_features_single(product_smiles, reactants_smiles)

Same docstring as deepretro.utils.extract_domain_features_single above; the parameters, return value, and example are identical.