deepretro.models.hallucination_classifier
XGBoost-based binary classifier for detecting hallucinated retrosynthesis
reactions. Built on DeepChem’s GBDTModel, which
wraps an XGBClassifier and adds automatic early-stopping via an
internal 80/20 train/validation split.
Training a new model
Prepare a CSV with columns product, reactants, and label
(1 = hallucinated, 0 = valid). Then:
from deepretro.data import ReactionDataLoader, stratified_split
from deepretro.models import HallucinationClassifier
# Load and featurize
loader = ReactionDataLoader()
dataset = loader.create_dataset("data/hallucination_dataset.csv")
train, valid, test = stratified_split(dataset)
# Train
clf = HallucinationClassifier(model_dir="my_models/")
clf.fit(train)
# Evaluate (also sets the optimal probability threshold)
scores = clf.evaluate(test)
print(scores)
Saving and loading
The model is auto-saved to model_dir after training. To reload:
clf = HallucinationClassifier(model_dir="my_models/")
clf.load("my_models/")
The saved artifacts include the XGBoost model weights and the optimal classification threshold.
Configuration
No environment variables are required. All paths are passed as
arguments to the constructor and load() / save() methods.
API
XGBoost hallucination classifier built on DeepChem’s GBDTModel.
Provides a single class that handles training, evaluation, threshold
optimisation, single-reaction prediction, and persistence using
DeepChem APIs end-to-end. GBDTModel wraps an XGBClassifier
and adds automatic early-stopping with an 80/20 internal split.
- deepretro.models.hallucination_classifier.probability_scores(dataset, model)[source]
Compute ROC-AUC and optimal threshold from probabilities.
- Parameters:
dataset (Dataset) – Labelled dataset with ground-truth y.
model (XGBClassifier) – Fitted sklearn-compatible model with predict_proba.
- Returns:
Keys: roc_auc, optimal_threshold, optimal_f1.
- Return type:
dict
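The behaviour described above can be sketched in plain Python. This is an illustrative reimplementation of the documented outputs (ROC-AUC via the pairwise rank statistic, plus a threshold scan that maximises F1), not the library's actual code:

```python
def sketch_probability_scores(y_true, y_prob):
    """Illustrative stand-in for probability_scores (assumed behaviour)."""
    # ROC-AUC via the pairwise (Mann-Whitney) rank statistic:
    # fraction of positive/negative pairs ranked correctly.
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    roc_auc = wins / (len(pos) * len(neg))

    # Scan candidate thresholds; keep the one that maximises F1.
    best_f1, best_thr = 0.0, 0.5
    for thr in sorted(set(y_prob)):
        pred = [1 if p >= thr else 0 for p in y_prob]
        tp = sum(1 for t, y in zip(y_true, pred) if t == 1 and y == 1)
        fp = sum(1 for t, y in zip(y_true, pred) if t == 0 and y == 1)
        fn = sum(1 for t, y in zip(y_true, pred) if t == 1 and y == 0)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_f1, best_thr = f1, thr
    return {"roc_auc": roc_auc, "optimal_threshold": best_thr, "optimal_f1": best_f1}
```

For example, with labels [0, 0, 1, 1] and probabilities [0.1, 0.4, 0.35, 0.8], three of the four positive/negative pairs are ranked correctly (ROC-AUC 0.75), and the F1-optimal threshold is 0.35.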
- deepretro.models.hallucination_classifier.predict_single_reaction(clf, product_smiles, reactants_smiles)[source]
Predict whether a single reaction step is hallucinated.
This is a module-level helper so it can be used independently of the class method.
- Parameters:
clf (HallucinationClassifier) – A fitted classifier instance.
product_smiles (str) – SMILES of the target product.
reactants_smiles (str) – SMILES of the proposed reactants (dot-separated).
- Returns:
result – Keys: is_hallucination (bool), probability (float). On invalid SMILES an error key is added instead.
- Return type:
dict
Examples
>>> from deepretro.models import HallucinationClassifier
>>> clf = HallucinationClassifier()
>>> clf.load("saved_model/")
>>> predict_single_reaction(clf, "CCO", "CC.O")
{'is_hallucination': False, 'probability': 0.12}
>>> predict_single_reaction(clf, "GARBAGE", "CC.O")
{'error': 'Invalid SMILES', 'is_hallucination': None, 'probability': None}
- class deepretro.models.hallucination_classifier.HallucinationClassifier(*args, **kwargs)[source]
Binary classifier for detecting hallucinated retrosynthesis reactions.
Inherits from DeepChem’s GBDTModel, which wraps an XGBClassifier and adds automatic early stopping via an internal 80/20 train/validation split.
Training data
Prepare a CSV with at least these columns:
- product — SMILES of the target product.
- reactants — SMILES of proposed reactants (dot-separated for multiple reactants, e.g. "CC.O").
- label — 1 if the reaction is hallucinated, 0 if real.
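For illustration, a file in this shape can be written with Python's standard csv module. The rows below are made-up placeholders — only the column names are prescribed by the format:

```python
import csv

# Made-up example rows; only the column names are prescribed.
rows = [
    {"product": "CCO", "reactants": "C=C.O", "label": 0},
    {"product": "CCO", "reactants": "c1ccccc1", "label": 1},
]
with open("hallucination_dataset.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["product", "reactants", "label"])
    writer.writeheader()
    writer.writerows(rows)
```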
Then load and train:
from deepretro.data import ReactionDataLoader, stratified_split
from deepretro.models import HallucinationClassifier

loader = ReactionDataLoader()
ds = loader.create_dataset("path/to/your_dataset.csv")
train, valid, test = stratified_split(ds)

clf = HallucinationClassifier(model_dir="my_models/")
clf.fit(train)
scores = clf.evaluate(test)
print(scores)
The trained model is persisted via DeepChem’s standard joblib serialisation. To reload later:
clf = HallucinationClassifier(model_dir="my_models/")
clf.load("my_models/")
- param model_dir:
Directory for DeepChem model checkpoints. If None, a temporary directory is used (see deepchem.models.Model).
- type model_dir:
str, optional
- param early_stopping_rounds:
Rounds for early stopping during fit(). Default 50.
- type early_stopping_rounds:
int, optional
- param **xgb_kwargs:
Forwarded to the underlying XGBClassifier via GBDTModel. Defaults are tuned for the hallucination detection task.
Examples
>>> from deepretro.models import HallucinationClassifier
>>> clf = HallucinationClassifier()
>>> clf.threshold
0.5
- __init__(model_dir=None, early_stopping_rounds=50, **xgb_kwargs)[source]
- Parameters:
model_dir (str | None)
early_stopping_rounds (int)
xgb_kwargs (Any)
- Return type:
None
- fit(train_dataset)[source]
Train the model on a DeepChem Dataset. GBDTModel automatically performs an internal 80/20 train/validation split for early stopping. The model is auto-saved to model_dir after training.
- Parameters:
train_dataset (Dataset) – Training data produced by deepretro.data.loader.
- Return type:
None
Examples
>>> clf.fit(train_ds)
- evaluate(test_dataset, metrics=None)[source]
Evaluate using DeepChem Metric objects. Returns label-based metrics, plus probability-based ROC-AUC and the optimal threshold. Updates self.threshold to the optimal value and auto-saves the model state.
- Parameters:
test_dataset (Dataset) – Held-out test data.
metrics (list of dc.metrics.Metric, optional) – Label-based metrics to compute. If None, defaults to [Metric(accuracy_score, name="accuracy"), Metric(f1_score, name="f1")]. Any sklearn.metrics function that accepts (y_true, y_pred) can be wrapped with dc.metrics.Metric, e.g. Metric(precision_score, name="precision").
- Returns:
scores – Contains each requested metric name (or accuracy/f1 when defaults are used), plus roc_auc, optimal_threshold, and optimal_f1.
- Return type:
dict
Examples
>>> scores = clf.evaluate(test_ds) >>> scores["roc_auc"] 0.92
- predict_probability(dataset)[source]
Return hallucination probabilities for each sample.
- Parameters:
dataset (Dataset) – Data to score.
- Returns:
probabilities – Probability of the positive class (hallucination).
- Return type:
np.ndarray, shape (n_samples,)
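Extracting the positive-class column from a sklearn-style predict_proba matrix — presumably what this method does internally — can be sketched as follows (illustrative, not the library's code; sketch_positive_class_probabilities is a hypothetical helper name):

```python
def sketch_positive_class_probabilities(proba_rows):
    """Illustrative: keep the class-1 (hallucination) column of a
    sklearn-style predict_proba output, one [p_valid, p_hallucinated]
    pair per sample."""
    return [row[1] for row in proba_rows]
```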
- predict_with_threshold(dataset)[source]
Predict binary labels using the current threshold.
Unlike the inherited predict() (which returns raw model output), this applies self.threshold to produce binary labels.
- Parameters:
dataset (Dataset) – Data to classify.
- Returns:
labels (np.ndarray, shape (n_samples,)) – Binary predictions (0 or 1).
probabilities (np.ndarray, shape (n_samples,)) – Hallucination probabilities.
- Return type:
tuple[numpy.ndarray, numpy.ndarray]
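The thresholding step can be sketched in plain Python (illustrative; in the real method the threshold is self.threshold, as set by evaluate()):

```python
def sketch_predict_with_threshold(probabilities, threshold=0.5):
    """Illustrative: binarise hallucination probabilities at `threshold`,
    returning (labels, probabilities) as the real method does."""
    labels = [1 if p >= threshold else 0 for p in probabilities]
    return labels, list(probabilities)
```

With the default threshold of 0.5, probabilities [0.2, 0.9] map to labels [0, 1].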
- predict_single(product_smiles, reactants_smiles)[source]
Thin wrapper around predict_single_reaction().
- Parameters:
product_smiles (str)
reactants_smiles (str)
- Return type:
dict[str, Any]