deepretro.featurizers

Feature engineering helpers for reaction-step ML workflows.

Overview

The deepretro.featurizers package currently exposes a single DeepChem-compatible featurizer:

  • ReactionStepFeaturizer for product/reactant reaction-step pairs.

API

Featurizers for reaction-step data.

class deepretro.featurizers.ReactionStepFeaturizer(*args, **kwargs)[source]

Featurize a reaction step (product + reactants) into a flat numeric vector.

Concatenates three parts:

  1. CircularFingerprint (Morgan/ECFP) for the product — size bits

  2. CircularFingerprint (Morgan/ECFP) for the reactants — size bits

  3. 15 hand-crafted domain features (optional)
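
The layout above can be illustrated with a small sketch. It assumes the three parts are concatenated in the order listed (product bits, reactant bits, then the domain features). The helper `split_features` is hypothetical and is not part of deepretro.

```python
N_DOMAIN_FEATURES = 15  # count of domain features stated in the docs above


def split_features(vector, size=2048, use_domain_features=True):
    """Split one flat feature vector into (product, reactants, domain) parts.

    Hypothetical helper for illustration only; assumes the parts are
    concatenated in the order listed above.
    """
    expected = 2 * size + (N_DOMAIN_FEATURES if use_domain_features else 0)
    if len(vector) != expected:
        raise ValueError(f"expected length {expected}, got {len(vector)}")
    product_fp = vector[:size]
    reactant_fp = vector[size:2 * size]
    domain = vector[2 * size:]  # empty when use_domain_features=False
    return product_fp, reactant_fp, domain


# Toy vector with size=4 so the slices are easy to see.
toy = [1, 0, 0, 1] + [0, 1, 1, 0] + list(range(15))
product_fp, reactant_fp, domain = split_features(toy, size=4)
```

The same slicing works on a row of the array returned by featurize, provided the ordering assumption holds.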

Parameters:
  • radius (int, optional (default 2)) – Morgan fingerprint radius. radius=2 corresponds to ECFP4.

  • size (int, optional (default 2048)) – Fingerprint bit length for each molecule.

  • use_domain_features (bool, optional (default True)) – If True, appends 15 domain features (atom, bond, ring, and molecular-weight deltas).
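
The ECFP naming convention counts the fingerprint diameter, which is twice the Morgan radius. A minimal sketch of that mapping follows; the helper `ecfp_name` is hypothetical and only illustrates the convention.

```python
def ecfp_name(radius: int) -> str:
    """Return the conventional ECFP name for a Morgan radius (diameter = 2 * radius)."""
    if radius < 0:
        raise ValueError("radius must be non-negative")
    return f"ECFP{2 * radius}"


# radius=2, the default here, maps to ECFP4.
names = [ecfp_name(r) for r in (1, 2, 3)]
```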

Notes

This class requires RDKit to be installed.

Examples

>>> from deepretro.featurizers.reactionstep import ReactionStepFeaturizer
>>> featurizer = ReactionStepFeaturizer(radius=2, size=2048)
>>> reactions = [("CCO", "CC.O"), ("c1ccccc1", "c1ccccc1.Cl")]
>>> X = featurizer.featurize(reactions)
>>> X.shape
(2, 4111)

__init__(radius=2, size=2048, use_domain_features=True)[source]

Parameters:
  • radius (int)

  • size (int)

  • use_domain_features (bool)

Return type:

None

property feature_dim: int

Total length of one feature vector.

Returns:

dim – 2 * size + 15 when use_domain_features=True, 2 * size otherwise.

Return type:

int
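
As a sanity check, the documented formula for feature_dim can be reproduced in a few lines. This is a sketch of the arithmetic only; `expected_feature_dim` is a hypothetical helper, not the library's implementation.

```python
def expected_feature_dim(size: int = 2048, use_domain_features: bool = True) -> int:
    """Mirror the documented formula: 2 * size, plus 15 if domain features are on."""
    return 2 * size + (15 if use_domain_features else 0)


dim_with = expected_feature_dim(2048, True)      # 2 * 2048 + 15
dim_without = expected_feature_dim(2048, False)  # 2 * 2048
```

With the defaults this gives 4111, which matches the second dimension of X.shape in the example above. Comparing this value against featurizer.feature_dim is one way to catch a mismatched size or use_domain_features setting before training.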