deepretro.models.hallucination_classifier

XGBoost-based binary classifier for detecting hallucinated retrosynthesis reactions. Built on DeepChem’s GBDTModel, which wraps an XGBClassifier and adds automatic early-stopping via an internal 80/20 train/validation split.

Training a new model

Prepare a CSV with columns product, reactants, and label (1 = hallucinated, 0 = valid). Then:

from deepretro.data import ReactionDataLoader, stratified_split
from deepretro.models import HallucinationClassifier

# Load and featurize
loader = ReactionDataLoader()
dataset = loader.create_dataset("data/hallucination_dataset.csv")
train, valid, test = stratified_split(dataset)

# Train
clf = HallucinationClassifier(model_dir="my_models/")
clf.fit(train)

# Evaluate (also sets the optimal probability threshold)
scores = clf.evaluate(test)
print(scores)

Saving and loading

The model is auto-saved to model_dir after training. To reload:

clf = HallucinationClassifier(model_dir="my_models/")
clf.load("my_models/")

The saved artifacts include the XGBoost model weights and the optimal classification threshold.

Configuration

No environment variables are required. All paths are passed as arguments to the constructor and load() / save() methods.

API

XGBoost hallucination classifier built on DeepChem’s GBDTModel.

Provides a single class that handles training, evaluation, threshold optimisation, single-reaction prediction, and persistence end-to-end through DeepChem APIs.

deepretro.models.hallucination_classifier.probability_scores(dataset, model)[source]

Compute ROC-AUC and optimal threshold from probabilities.

Parameters:
  • dataset (Dataset) – Labelled dataset with y ground-truth.

  • model (XGBClassifier) – Fitted sklearn-compatible model with predict_proba.

Returns:

scores – Keys: roc_auc, optimal_threshold, optimal_f1.

Return type:

dict
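For intuition, the threshold search can be sketched in plain Python. This is an illustrative re-implementation, not the helper's actual code (which operates on a DeepChem Dataset and an XGBClassifier's predict_proba); the candidate grid and tie-breaking below are assumptions:

```python
def optimal_f1_threshold(y_true, probs, candidates=None):
    """Pick the probability cutoff that maximises F1 on labelled data.

    Sketch of the search that probability_scores performs; the real
    helper also reports ROC-AUC.
    """
    if candidates is None:
        # Assumed grid: every distinct predicted probability.
        candidates = sorted(set(probs))
    best_t, best_f1 = 0.5, -1.0
    for t in candidates:
        preds = [1 if p >= t else 0 for p in probs]
        tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
        fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
        fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```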

deepretro.models.hallucination_classifier.predict_single_reaction(clf, product_smiles, reactants_smiles)[source]

Predict whether a single reaction step is hallucinated.

This is a module-level helper so it can be used independently of the class method.

Parameters:
  • clf (HallucinationClassifier) – A fitted classifier instance.

  • product_smiles (str) – SMILES of the target product.

  • reactants_smiles (str) – SMILES of the proposed reactants (dot-separated).

Returns:

result – Keys: is_hallucination (bool) and probability (float). On invalid SMILES, an error key is added and both values are None.

Return type:

dict

Examples

>>> from deepretro.models import HallucinationClassifier
>>> clf = HallucinationClassifier()
>>> clf.load("saved_model/")                              
>>> predict_single_reaction(clf, "CCO", "CC.O")           
{'is_hallucination': False, 'probability': 0.12}
>>> predict_single_reaction(clf, "GARBAGE", "CC.O")       
{'error': 'Invalid SMILES', 'is_hallucination': None, 'probability': None}
class deepretro.models.hallucination_classifier.HallucinationClassifier(*args, **kwargs)[source]

Binary classifier for detecting hallucinated retrosynthesis reactions.

Inherits from DeepChem’s GBDTModel which wraps an XGBClassifier and adds automatic early-stopping via an internal 80/20 train/validation split.

Training data

Prepare a CSV with at least these columns:

  • product — SMILES of the target product.

  • reactants — SMILES of proposed reactants (dot-separated for multiple reactants, e.g. "CC.O").

  • label — 1 if the reaction is hallucinated, 0 if real.

Then load and train:

from deepretro.data import ReactionDataLoader, stratified_split
from deepretro.models import HallucinationClassifier

loader = ReactionDataLoader()
ds = loader.create_dataset("path/to/your_dataset.csv")
train, valid, test = stratified_split(ds)

clf = HallucinationClassifier(model_dir="my_models/")
clf.fit(train)
scores = clf.evaluate(test)
print(scores)

The trained model is persisted via DeepChem’s standard joblib serialisation. To reload later:

clf = HallucinationClassifier(model_dir="my_models/")
clf.load("my_models/")
Parameters:
  • model_dir (str, optional) – Directory for DeepChem model checkpoints. If None, a temporary directory is used (see deepchem.models.Model).

  • early_stopping_rounds (int, optional) – Rounds for early stopping during fit(). Default 50.

  • **xgb_kwargs – Forwarded to the underlying XGBClassifier via GBDTModel. Defaults are tuned for the hallucination detection task.

Examples

>>> from deepretro.models import HallucinationClassifier
>>> clf = HallucinationClassifier()
>>> clf.threshold
0.5
__init__(model_dir=None, early_stopping_rounds=50, **xgb_kwargs)[source]
Parameters:
  • model_dir (str | None)

  • early_stopping_rounds (int)

  • xgb_kwargs (Any)

Return type:

None

fit(train_dataset)[source]

Train the model on a DeepChem Dataset.

GBDTModel automatically performs an internal 80/20 train/validation split for early stopping. The model is auto-saved to model_dir after training.

Parameters:

train_dataset (Dataset) – Training data produced by deepretro.data.loader.

Return type:

None

Examples

>>> clf.fit(train_ds)  
evaluate(test_dataset, metrics=None)[source]

Evaluate using DeepChem Metric objects.

Returns label-based metrics, plus probability-based ROC-AUC and the optimal threshold. Updates self.threshold to the optimal value and auto-saves the model state.

Parameters:
  • test_dataset (Dataset) – Held-out test data.

  • metrics (list of dc.metrics.Metric, optional) – Label-based metrics to compute. If None, defaults to: [Metric(accuracy_score, name="accuracy"), Metric(f1_score, name="f1")]. Any sklearn.metrics function that accepts (y_true, y_pred) can be wrapped with dc.metrics.Metric, e.g. Metric(precision_score, name="precision").

Returns:

scores – Contains each requested metric name (or accuracy/f1 when defaults are used), plus roc_auc, optimal_threshold, and optimal_f1.

Return type:

dict

Examples

>>> scores = clf.evaluate(test_ds)  
>>> scores["roc_auc"]               
0.92
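The roc_auc entry is probability-based rather than label-based: it equals the chance that a randomly chosen hallucinated reaction receives a higher score than a randomly chosen valid one. A self-contained sketch of that rank-based definition (for intuition only; it is an assumption here, not the library's implementation, which may compute the same quantity differently):

```python
def roc_auc(y_true, probs):
    """Rank-based ROC-AUC: fraction of (positive, negative) pairs in
    which the positive sample gets the higher probability; ties count
    half a win.
    """
    pos = [p for y, p in zip(y_true, probs) if y == 1]
    neg = [p for y, p in zip(y_true, probs) if y == 0]
    if not pos or not neg:
        raise ValueError("need both classes to compute ROC-AUC")
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))
```

This is why roc_auc is unaffected by the choice of threshold, while accuracy and f1 are.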
predict_probability(dataset)[source]

Return hallucination probabilities for each sample.

Parameters:

dataset (Dataset) – Data to score.

Returns:

probabilities – Probability of the positive class (hallucination).

Return type:

np.ndarray, shape (n_samples,)

predict_with_threshold(dataset)[source]

Predict binary labels using the current threshold.

Unlike the inherited predict() (which returns raw model output), this applies self.threshold to produce binary labels.

Parameters:

dataset (Dataset) – Data to classify.

Returns:

  • labels (np.ndarray, shape (n_samples,)) – Binary predictions (0 or 1).

  • probabilities (np.ndarray, shape (n_samples,)) – Hallucination probabilities.

Return type:

tuple[numpy.ndarray, numpy.ndarray]
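The thresholding step itself is simple; a minimal sketch over plain Python lists (assuming a >= cutoff, which is an implementation detail not stated above, and omitting the NumPy arrays the real method returns):

```python
def apply_threshold(probs, threshold):
    """Turn hallucination probabilities into 0/1 labels.

    Sketch of what predict_with_threshold adds on top of predict():
    instead of an implicit 0.5 cutoff it applies the threshold
    selected by evaluate().
    """
    labels = [1 if p >= threshold else 0 for p in probs]
    return labels, probs
```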

predict_single(product_smiles, reactants_smiles)[source]

Thin wrapper around predict_single_reaction().

Parameters:
  • product_smiles (str)

  • reactants_smiles (str)

Return type:

dict[str, Any]

save(save_dir)[source]

Save model and threshold via DeepChem’s joblib persistence.

Parameters:

save_dir (str) – Directory to write artifacts into.

Return type:

None

Examples

>>> clf.save("saved_model/")  
load(save_dir)[source]

Reload a previously saved model.

Parameters:

save_dir (str) – Directory containing saved artifacts.

Return type:

None

Examples

>>> clf = HallucinationClassifier()
>>> clf.load("saved_model/")  