deepretro.utils.llm

LiteLLM-backed single-step retrosynthesis utilities.

The LLM code is split into three layers:

  • deepretro.utils.llm is the public workflow facade. It exposes prompt selection, model calls, response parsing, JSON validation, pathway filtering, and the llm_pipeline() orchestration function.

  • deepretro.utils.llm_interface owns provider-specific behavior. The LLMInterface base class defines prompt construction, completion-parameter construction, API calling, and response parsing. Concrete implementations handle Claude-style, OpenAI-style, DeepSeek-style, and generic responses.

  • deepretro.utils.llm_helpers contains model normalization and small parsing helpers shared by the interface layer.

Provider Interfaces

Use create_llm_interface() when code needs direct access to the provider-specific interface:

from deepretro.utils.llm_interface import LLMRequest, create_llm_interface

interface = create_llm_interface("openai/gpt-4o-mini")
request = LLMRequest(
    molecule="CCO",
    model="openai/gpt-4o-mini",
    max_output_tokens=2048,
    enable_thinking=False,
)

messages = interface.build_messages(request)
params = interface.build_completion_params(request, messages)

The factory returns one of these implementations:

  • AnthropicLLM (Anthropic / Claude): requires <cot> content with at least one <thinking> entry and a JSON payload. JSON is accepted in <json> tags or as fenced/raw JSON after </cot> because Claude can return either shape.

  • OpenAILLM (OpenAI): extracts tagged, fenced, or raw JSON and does not return thinking steps.

  • DeepSeekLLM (DeepSeek): extracts optional <think> content plus tagged, fenced, or raw JSON.

  • GenericLLM (fallback): uses the Claude-style parser for compatible providers.

Public Workflow

Most callers should use the facade functions in deepretro.utils.llm:

from deepretro.utils.llm import call_LLM, parse_response, validate_split_json

status, response_text = call_LLM(
    molecule="CCO",
    model="openai/gpt-4o-mini",
    max_output_tokens=2048,
    enable_thinking=False,
)

if status == 200:
    parse_status, thinking_steps, json_content = parse_response(
        response_text,
        "openai/gpt-4o-mini",
    )
    if parse_status == 200:
        validate_split_json(json_content)

llm_pipeline() combines the same steps into the end-to-end flow:

from deepretro.utils.llm import llm_pipeline

pathways, explanations, confidence = llm_pipeline(
    molecule="CCO",
    model="openai/gpt-4o-mini",
    stability_check=False,
    hallucination_check=False,
    max_output_tokens=2048,
    enable_thinking=False,
)
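llm_pipeline() returns three aligned sequences: the i-th pathway, explanation, and confidence score describe the same candidate route. A minimal consumption sketch, using placeholder data in place of a real call (the SMILES and explanations below are illustrative, not real model output):

```python
# Placeholder data standing in for a real llm_pipeline() result for "CCO".
pathways = [["C=C"], ["CC=O"]]
explanations = ["Alkene hydration", "Aldehyde reduction"]
confidence = [0.6, 0.8]

# Rank candidate routes by confidence while keeping the three lists aligned.
ranked = sorted(
    zip(pathways, explanations, confidence),
    key=lambda item: item[2],
    reverse=True,
)
for pathway, explanation, score in ranked:
    print(f"{score:.1f}  {explanation}: {pathway}")
```

Zipping before sorting preserves the pathway/explanation/score alignment that validate_split_json() establishes.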

Model Selection

Model identifiers are normalized before calling LiteLLM:

  • OpenAI models can be passed with a LiteLLM prefix, for example openai/gpt-4o-mini, or as names recognized by deepretro.utils.variables.OPENAI_MODELS.

  • OpenAI chat models use max_completion_tokens and receive a deterministic seed.

  • OpenAI reasoning models, such as gpt-5 and o-series models, use reasoning_effort when reasoning controls are enabled. Their output-token budget is raised to a provider-safe minimum.

  • Anthropic Claude 4 Opus/Sonnet models also receive reasoning_effort when reasoning controls are enabled. Their output-token budget is raised to the same provider-safe minimum, and their temperature is set to 1 for reasoning calls.

  • DeepSeek aliases such as fireworks/deepseek-v3p2 are normalized to the preferred Fireworks-hosted DeepSeek R1 model.

  • A :adv suffix, such as openai/gpt-4o-mini:adv, selects the advanced prompt mode unless an explicit prompt_mode argument is provided.
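The :adv handling above can be sketched as follows. This is a simplified stand-in for the real split_prompt_mode() helper documented in the API section below:

```python
def split_prompt_mode_sketch(model: str) -> tuple[str, str]:
    """Simplified stand-in for deepretro.utils.llm_helpers.split_prompt_mode()."""
    suffix = ":adv"
    if model.endswith(suffix):
        # Strip the suffix and select the advanced prompt variant.
        return model[: -len(suffix)], "advanced"
    return model, "standard"


print(split_prompt_mode_sketch("openai/gpt-4o-mini:adv"))
# ('openai/gpt-4o-mini', 'advanced')
```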

Testing

Most tests in this module do not require live LLM credentials. They exercise provider selection, completion-parameter construction, parser behavior, JSON validation, and pipeline orchestration using local test doubles.

The focused test file also includes two slow Anthropic integration tests. When ANTHROPIC_API_KEY is configured, they call the live server with the real retrosynthesis prompt for aspirin in both standard and advanced prompt modes. Those tests assert that the response contains <cot> / <thinking> tags, that JSON can be extracted from either tagged or fenced output, and that validate_split_json() can convert the payload into aligned pathways, explanations, and confidence scores.

Run the focused test file with:

uv run --project deepretro pytest deepretro/tests/test_llm.py -q

Run only the live Anthropic prompt checks with:

ANTHROPIC_API_KEY=... uv run --project deepretro pytest \
    deepretro/tests/test_llm.py::test_live_anthropic_retrosynthesis_prompt_returns_parseable_tagged_json -q

API

Helper utilities for DeepRetro LLM provider handling.

class deepretro.utils.llm_helpers.ChatMessage[source]

Chat message sent to LiteLLM.

role: str
content: str
class deepretro.utils.llm_helpers.ModelSelection(raw_model, completion_model, prompt_mode, family, provider, output_token_param, supports_reasoning_effort, supports_seed, requires_temperature_one)[source]

Normalized model configuration derived from a model identifier.

Parameters:
  • raw_model (str)

  • completion_model (str)

  • prompt_mode (Literal['standard', 'advanced'])

  • family (Literal['deepseek', 'openai', 'default'])

  • provider (Literal['anthropic', 'openai', 'deepseek'])

  • output_token_param (Literal['max_tokens', 'max_completion_tokens'])

  • supports_reasoning_effort (bool)

  • supports_seed (bool)

  • requires_temperature_one (bool)

raw_model

Original model identifier provided by the caller.

Type:

str

completion_model

Model identifier passed to LiteLLM after alias normalization.

Type:

str

prompt_mode

Prompt variant selected for the call.

Type:

{“standard”, “advanced”}

family

Prompt and parser family for the model.

Type:

{“deepseek”, “openai”, “default”}

provider

Provider inferred from the model name.

Type:

{“anthropic”, “openai”, “deepseek”}

output_token_param

Token-limit keyword expected by the provider.

Type:

{“max_tokens”, “max_completion_tokens”}

supports_reasoning_effort

Whether the model supports a reasoning_effort control.

Type:

bool

supports_seed

Whether deterministic seed should be sent.

Type:

bool

requires_temperature_one

Whether the model should be called with temperature 1.

Type:

bool

raw_model: str
completion_model: str
prompt_mode: Literal['standard', 'advanced']
family: Literal['deepseek', 'openai', 'default']
provider: Literal['anthropic', 'openai', 'deepseek']
output_token_param: Literal['max_tokens', 'max_completion_tokens']
supports_reasoning_effort: bool
supports_seed: bool
requires_temperature_one: bool
__init__(raw_model, completion_model, prompt_mode, family, provider, output_token_param, supports_reasoning_effort, supports_seed, requires_temperature_one)
Parameters:
  • raw_model (str)

  • completion_model (str)

  • prompt_mode (Literal['standard', 'advanced'])

  • family (Literal['deepseek', 'openai', 'default'])

  • provider (Literal['anthropic', 'openai', 'deepseek'])

  • output_token_param (Literal['max_tokens', 'max_completion_tokens'])

  • supports_reasoning_effort (bool)

  • supports_seed (bool)

  • requires_temperature_one (bool)

Return type:

None

deepretro.utils.llm_helpers.split_prompt_mode(model)[source]

Split a model identifier from the optional :adv prompt suffix.

Parameters:

model (str) – Model identifier, optionally suffixed with :adv.

Returns:

Base model identifier and prompt mode.

Return type:

tuple[str, PromptMode]

Examples

>>> split_prompt_mode("openai/gpt-4o-mini:adv")
('openai/gpt-4o-mini', 'advanced')
>>> split_prompt_mode("claude-opus-4-6")
('claude-opus-4-6', 'standard')
deepretro.utils.llm_helpers.strip_provider_prefix(model)[source]

Return the provider-independent model name when a known prefix exists.

Parameters:

model (str) – Model identifier that may include a LiteLLM provider prefix.

Returns:

Model identifier with known openai/ or anthropic/ prefix removed.

Return type:

str

Examples

>>> strip_provider_prefix("openai/gpt-4o-mini")
'gpt-4o-mini'
>>> strip_provider_prefix("fireworks_ai/accounts/fireworks/models/deepseek-r1")
'fireworks_ai/accounts/fireworks/models/deepseek-r1'
deepretro.utils.llm_helpers.looks_like_openai_model(model)[source]

Return whether a model identifier appears to be an OpenAI model.

Parameters:

model (str) – Model identifier to inspect.

Returns:

True for OpenAI-style model identifiers.

Return type:

bool

Examples

>>> looks_like_openai_model("openai/gpt-4o-mini")
True
>>> looks_like_openai_model("claude-opus-4-6")
False
deepretro.utils.llm_helpers.looks_like_openai_reasoning_model(model)[source]

Return whether a model is an OpenAI reasoning-capable model.

Parameters:

model (str) – Model identifier to inspect.

Returns:

True for OpenAI reasoning-model families.

Return type:

bool

Examples

>>> looks_like_openai_reasoning_model("openai/gpt-5")
True
>>> looks_like_openai_reasoning_model("openai/gpt-4o-mini")
False
deepretro.utils.llm_helpers.looks_like_anthropic_reasoning_model(model)[source]

Return whether a model is an Anthropic reasoning-capable model.

Parameters:

model (str) – Model identifier to inspect.

Returns:

True for Anthropic reasoning-model families.

Return type:

bool

Examples

>>> looks_like_anthropic_reasoning_model("anthropic/claude-sonnet-4-6")
True
>>> looks_like_anthropic_reasoning_model("claude-3-5-haiku-20241022")
False
deepretro.utils.llm_helpers.infer_provider(model)[source]

Infer the provider from a model identifier.

Parameters:

model (str) – Model identifier to classify.

Returns:

Inferred provider name.

Return type:

ProviderName

Examples

>>> infer_provider("openai/gpt-4o-mini")
'openai'
>>> infer_provider("claude-opus-4-6")
'anthropic'
deepretro.utils.llm_helpers.normalize_completion_model(model, provider)[source]

Normalize provider aliases before a LiteLLM completion call.

Parameters:
  • model (str) – Model identifier supplied by a caller.

  • provider (ProviderName) – Provider inferred for the model.

Returns:

LiteLLM-compatible model identifier.

Return type:

str

Examples

>>> normalize_completion_model("openai/gpt-4o-mini", "openai")
'openai/gpt-4o-mini'
>>> normalize_completion_model("fireworks/deepseek-v3p2", "deepseek")
'fireworks_ai/accounts/fireworks/models/deepseek-r1'
deepretro.utils.llm_helpers.resolve_model_selection(model, prompt_mode=None)[source]

Resolve provider, prompt, and capability settings for a model.

Parameters:
  • model (str) – Model identifier, optionally suffixed with :adv.

  • prompt_mode ({"standard", "advanced"}, optional) – Explicit prompt-mode override.

Returns:

Normalized model configuration.

Return type:

ModelSelection

Examples

>>> selection = resolve_model_selection("openai/gpt-4o-mini:adv")
>>> (selection.provider, selection.prompt_mode)
('openai', 'advanced')
deepretro.utils.llm_helpers.resolve_output_token_limit(selection, max_output_tokens, enable_thinking)[source]

Return a provider-safe output token budget.

Parameters:
  • selection (ModelSelection) – Normalized model configuration.

  • max_output_tokens (int) – Requested output token limit.

  • enable_thinking (bool) – Whether reasoning controls are enabled.

Returns:

Token limit adjusted for reasoning-capable models when needed.

Return type:

int

Examples

>>> selection = resolve_model_selection("openai/gpt-5")
>>> resolve_output_token_limit(selection, 32, enable_thinking=True)
8192
deepretro.utils.llm_helpers.build_completion_params(model, messages, max_completion_tokens, temperature, enable_thinking=True, thinking_effort='medium', metadata=None)[source]

Assemble provider-aware keyword arguments for litellm.completion.

Parameters:
  • model (str) – Model identifier, optionally with a LiteLLM provider prefix.

  • messages (list[ChatMessage]) – Conversation to send to the model.

  • max_completion_tokens (int) – Requested maximum output tokens.

  • temperature (float) – Sampling temperature for providers that allow it. OpenAI reasoning models and Anthropic Claude 4 reasoning-capable models are sent temperature=1 when reasoning controls are enabled.

  • enable_thinking (bool, optional) – Whether to send reasoning controls for supported models.

  • thinking_effort ({"low", "medium", "high", "max"}, optional) – Reasoning effort sent to supported models.

  • metadata (dict[str, Any], optional) – Optional LiteLLM metadata for callbacks.

Returns:

Keyword arguments for litellm.completion.

Return type:

dict[str, Any]

Examples

>>> messages = [{"role": "user", "content": "Reply OK"}]
>>> params = build_completion_params("openai/gpt-4o-mini", messages, 16, 0.0)
>>> (params["model"], params["max_completion_tokens"], params["seed"])
('openai/gpt-4o-mini', 16, 42)
>>> params = build_completion_params("anthropic/claude-sonnet-4-6", messages, 16, 0.2)
>>> (params["max_tokens"], params["temperature"], params["reasoning_effort"])
(8192, 1, 'medium')
deepretro.utils.llm_helpers.coerce_response_text(content)[source]

Convert LiteLLM response content to a plain string.

Parameters:

content (Any) – Response content returned by a LiteLLM message.

Returns:

Plain text response content.

Return type:

str

Examples

>>> coerce_response_text("OK")
'OK'
>>> coerce_response_text([{"text": "O"}, {"text": "K"}])
'OK'
deepretro.utils.llm_helpers.strip_code_fences(text)[source]

Remove one surrounding Markdown code fence from a payload.

Parameters:

text (str) – Text that may be wrapped in a Markdown code fence.

Returns:

Unwrapped text when a fence is present, otherwise stripped input text.

Return type:

str

Examples

>>> strip_code_fences('```json\n{"data": []}\n```')
'{"data": []}'
>>> strip_code_fences('{"data": []}')
'{"data": []}'
deepretro.utils.llm_helpers.extract_tag_content(text, tag)[source]

Extract content between matching XML-like tags.

Parameters:
  • text (str) – Text containing optional XML-like tags.

  • tag (str) – Tag name to extract.

Returns:

Tag body when found, otherwise None.

Return type:

str | None

Examples

>>> extract_tag_content("<json>{}</json>", "json")
'{}'
>>> extract_tag_content("missing", "json") is None
True
deepretro.utils.llm_helpers.extract_json_payload(response_text)[source]

Extract tagged, fenced, or raw JSON-like payload from model text.

Parameters:

response_text (str) – Raw model response text.

Returns:

Extracted JSON-like payload, or None when no payload is found.

Return type:

str | None

Examples

>>> extract_json_payload('<json>{"data": []}</json>')
'{"data": []}'
>>> extract_json_payload('No JSON here') is None
True
deepretro.utils.llm_helpers.is_enabled(flag)[source]

Normalize string and boolean feature flags.

Note:

This function exists for backwards compatibility. It weakens type safety, so it should be removed once all callers are known to pass real booleans.

Parameters:

flag (str | bool) – Feature flag value.

Returns:

Normalized boolean value.

Return type:

bool

Examples

>>> is_enabled("True")
True
>>> is_enabled(False)
False