deepretro.utils
Utility layer for domain features and template-based retrosynthesis integration.
This module group supports three major workflows:
- Feature engineering utilities for ML-ready reaction-step vectors.
- AiZynthFinder orchestration helpers for template route generation.
- Molecule utilities for SMILES validation, substructure matching, and pathway filtering (deepretro.utils.utils_molecule).
Utility Overview
| Utility | Purpose |
|---|---|
| | Execute AiZynthFinder and return solved flag + route dictionaries. |
| | Same as |
| | Shortcut heuristic for trivial molecules to bypass heavy search. |
| | Explicit in-memory cache primitives for expensive library operations. |
| | LiteLLM-backed retrosynthesis calls, response parsing, and pathway filtering. |
AiZynthFinder Integration Notes
run_az and run_az_with_img use environment-configured model paths:
- AZ_MODELS_PATH (preferred model-variant path)
- AZ_MODEL_CONFIG_PATH (fallback config path)
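The preferred/fallback ordering of the two variables can be pictured with a small sketch. The helper name resolve_az_model_path and the error message are hypothetical, and the real lookup inside run_az is not shown on this page, so treat this as one plausible reading of "preferred" and "fallback" rather than the library's code.

```python
import os
from typing import Mapping, Optional


def resolve_az_model_path(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: return the first configured AiZynthFinder path.

    Assumption: AZ_MODELS_PATH wins when both variables are set, and
    AZ_MODEL_CONFIG_PATH is only consulted when the first is missing or empty.
    """
    env = os.environ if env is None else env
    for name in ("AZ_MODELS_PATH", "AZ_MODEL_CONFIG_PATH"):
        value = env.get(name)
        if value:  # skip unset and empty values
            return value
    raise RuntimeError(
        "Set AZ_MODELS_PATH or AZ_MODEL_CONFIG_PATH before running the search."
    )


# Demo with an explicit mapping so the result does not depend on the shell.
demo_env = {"AZ_MODEL_CONFIG_PATH": "/data/az/config.yml"}
print(resolve_az_model_path(demo_env))
```

Passing the mapping explicitly keeps the helper easy to test; calling it with no argument reads from os.environ.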
Behavior highlights:
- Auto-bypass for trivial/basic molecules via BASIC_MOLECULES and is_basic_molecule.
- Caching via the src.cache.cache_results decorator.
- Explicit process-local caching helpers live in deepretro.utils.cache when callers need in-memory caching without shared global state.
- Returns route dictionaries with metadata and scores from AiZynthFinder.
Example: run template search
from deepretro.utils.az import run_az
solved, routes = run_az("C1CCCCC1", az_model="USPTO")
print(solved, len(routes))
Submodules
API Reference
- deepretro.utils.extract_domain_features_single(product_smiles, reactants_smiles)[source]
Extract hand-crafted domain features for one product-reactant pair.
Computes atom-count deltas (C, N, O, Cl, Br), bond/ring/aromaticity deltas, molecular-weight deltas, and absolute counts.
- Parameters:
product_smiles (str) – SMILES of the target product.
reactants_smiles (str) – SMILES of the proposed reactants (dot-separated when multiple).
- Returns:
features – 1-D feature vector. Returns a NaN vector on any parsing failure, so invalid rows are distinguishable from real data downstream.
- Return type:
np.ndarray, shape (NUM_DOMAIN_FEATURES,)
Examples
>>> from deepretro.utils import extract_domain_features_single
>>> feats = extract_domain_features_single("CCO", "CC.O")
>>> feats.shape
(15,)
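Because parsing failures come back as NaN vectors, a caller can separate valid rows from failed ones before training. The sketch below builds the feature matrix by hand instead of calling the extractor, so it runs without deepretro installed. The helper split_valid_rows is hypothetical, and the width of 15 is taken from the (15,) shape in the example above.

```python
import numpy as np

NUM_FEATURES = 15  # assumed from the (15,) shape shown in the example above


def split_valid_rows(features):
    """Hypothetical helper: drop rows that contain any NaN.

    Returns the cleaned matrix and the boolean mask of rows that were kept,
    so labels can be filtered with the same mask.
    """
    features = np.asarray(features, dtype=float)
    valid_mask = ~np.isnan(features).any(axis=1)
    return features[valid_mask], valid_mask


# Three fake rows: two parsed pairs and one failure marked by a NaN vector.
rows = np.vstack([
    np.zeros(NUM_FEATURES),
    np.full(NUM_FEATURES, np.nan),
    np.ones(NUM_FEATURES),
])
clean, mask = split_valid_rows(rows)
print(clean.shape, mask.tolist())
```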
- deepretro.utils.find_optimal_threshold(y_true, probabilities)[source]
Find the classification threshold that maximises F1-score.
Sweeps the precision-recall curve and picks the threshold where the harmonic mean of precision and recall is highest.
- Parameters:
y_true (array-like, shape (n_samples,)) – True binary labels (0 or 1).
probabilities (array-like, shape (n_samples,)) – Predicted probabilities for the positive class.
- Returns:
threshold (float) – Optimal classification threshold.
f1 (float) – F1-score at the optimal threshold.
- Return type:
tuple[float, float]
Examples
>>> import numpy as np
>>> from deepretro.utils.metrics import find_optimal_threshold
>>> y = np.array([0, 0, 1, 1])
>>> proba = np.array([0.1, 0.4, 0.6, 0.9])
>>> thr, f1 = find_optimal_threshold(y, proba)
>>> 0.0 < thr < 1.0
True
>>> f1 > 0.0
True
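The threshold sweep can be illustrated in plain Python. This is not the implementation of find_optimal_threshold: the function best_f1_threshold is hypothetical, it tries each distinct predicted probability as a candidate threshold, and it breaks ties by keeping the lowest threshold, which may differ from what the library does.

```python
def best_f1_threshold(y_true, probabilities):
    """Illustrative sweep: return (threshold, f1) with the highest F1-score.

    A sample is predicted positive when its probability is >= the threshold.
    F1 is the harmonic mean 2 * P * R / (P + R) of precision P and recall R.
    """
    best_threshold, best_f1 = 0.5, 0.0  # fallback when no threshold yields a true positive
    for threshold in sorted(set(probabilities)):
        predicted = [p >= threshold for p in probabilities]
        tp = sum(1 for y, hit in zip(y_true, predicted) if y == 1 and hit)
        fp = sum(1 for y, hit in zip(y_true, predicted) if y == 0 and hit)
        fn = sum(1 for y, hit in zip(y_true, predicted) if y == 1 and not hit)
        if tp == 0:
            continue  # precision and recall are both zero, F1 is zero
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:  # strict comparison keeps the lowest threshold on ties
            best_threshold, best_f1 = threshold, f1
    return best_threshold, best_f1


# Same toy data as the doctest above.
threshold, f1 = best_f1_threshold([0, 0, 1, 1], [0.1, 0.4, 0.6, 0.9])
print(threshold, f1)
```

On this toy data the classes are perfectly separable, so the sweep lands on a threshold that gives precision and recall of 1.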
deepretro.utils.domain_features
Domain feature extraction utilities for reaction-step featurization.
- deepretro.utils.domain_features.extract_domain_features_single(product_smiles, reactants_smiles)[source]
Extract hand-crafted domain features for one product-reactant pair.
Computes atom-count deltas (C, N, O, Cl, Br), bond/ring/aromaticity deltas, molecular-weight deltas, and absolute counts.
- Parameters:
product_smiles (str) – SMILES of the target product.
reactants_smiles (str) – SMILES of the proposed reactants (dot-separated when multiple).
- Returns:
features – 1-D feature vector. Returns a NaN vector on any parsing failure, so invalid rows are distinguishable from real data downstream.
- Return type:
np.ndarray, shape (NUM_DOMAIN_FEATURES,)
Examples
>>> from deepretro.utils import extract_domain_features_single
>>> feats = extract_domain_features_single("CCO", "CC.O")
>>> feats.shape
(15,)