deepretro.utils.cache

Explicit in-memory caching utilities for expensive library operations.

Overview

The cache module provides:

CacheEntry: public record describing a cached value, its expiry deadline, and optional eviction tag
CacheStats: snapshot of hit count, miss count, shallow size estimate, and live entry count returned by CacheManager.stats()
CacheManager: process-local cache with tag support, TTL, and statistics
make_args_hash(): deterministic argument hashing used by cache keys
make_cache_key(): deterministic cache keys for explicit cache lookups

The cache lives only for the current Python process; cached values are discarded when the process exits. CacheManager uses plain Python data structures without locking, so do not share one instance across threads unless the caller adds synchronization.
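One way a caller might add that synchronization is to route every call through a single lock. The sketch below is an assumption-laden illustration, not part of deepretro: LockedCache and DictCache are hypothetical names, and DictCache is only a small stand-in that keeps the example runnable without the library.

```python
import threading
from typing import Any


class DictCache:
    """Tiny stand-in exposing get/set; used only to keep this sketch runnable."""

    def __init__(self) -> None:
        self._data = {}

    def get(self, key: str, default: Any = None) -> Any:
        return self._data.get(key, default)

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value


class LockedCache:
    """Hypothetical wrapper that serializes every call through one lock."""

    def __init__(self, inner: Any) -> None:
        self._inner = inner
        self._lock = threading.Lock()

    def get(self, key: str, default: Any = None) -> Any:
        with self._lock:
            return self._inner.get(key, default)

    def set(self, key: str, value: Any, **options: Any) -> None:
        # Extra keyword options (for example expire or tag) are forwarded as-is.
        with self._lock:
            self._inner.set(key, value, **options)


shared = LockedCache(DictCache())
workers = [
    threading.Thread(target=shared.set, args=(f"k{i}", i)) for i in range(8)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

A single coarse lock is the simplest option; it trades some concurrency for the guarantee that no two threads touch the underlying dictionaries at once.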

Key Format

make_cache_key() returns keys in the form v<version>:<namespace>:<64-char-sha256>. For example:

from deepretro.utils.cache import make_args_hash, make_cache_key

print(make_args_hash("CCO", az_model="USPTO"))
# 6ad01e27a3a319962ad084787e060ab0fa0e661cc7d3e018e96747b06f7bacf7

print(make_cache_key("run_az", "CCO", az_model="USPTO", version=1))
# v1:run_az:6ad01e27a3a319962ad084787e060ab0fa0e661cc7d3e018e96747b06f7bacf7
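Because the layout is fixed, a key can be split back into its parts for logging or diagnostics. The helper below is a hypothetical sketch (parse_cache_key and ParsedKey are not part of deepretro.utils.cache); it assumes only the v<version>:<namespace>:<64-char-sha256> form described above.

```python
from typing import NamedTuple


class ParsedKey(NamedTuple):
    """Hypothetical container for the three parts of a cache key."""

    version: int
    namespace: str
    digest: str


def parse_cache_key(key: str) -> ParsedKey:
    # The version prefix comes first and the digest comes last, so split
    # from both ends and treat whatever remains as the namespace.
    prefix, rest = key.split(":", 1)
    namespace, digest = rest.rsplit(":", 1)
    if not prefix.startswith("v") or len(digest) != 64:
        raise ValueError(f"unexpected cache key layout: {key!r}")
    return ParsedKey(int(prefix[1:]), namespace, digest)


parsed = parse_cache_key(
    "v1:run_az:6ad01e27a3a319962ad084787e060ab0fa0e661cc7d3e018e96747b06f7bacf7"
)
print(parsed.version, parsed.namespace)
```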

Algorithm Notes

make_args_hash() serializes args and kwargs into a stable payload: it tries JSON first for common serializable values and falls back to pickle for complex Python objects. CacheManager stores each entry in _entries under its key and maintains a secondary tag -> set[key] mapping (_tags), so a single evict_tag() call can remove every key that shares a tag.
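The JSON-first, pickle-fallback idea can be sketched as follows. This is an illustration under assumptions, not the library's implementation: sketch_args_hash is a hypothetical name, the payload layout is invented for the example, and the digests it produces will not match those of make_args_hash().

```python
import hashlib
import json
import pickle
from typing import Any


def sketch_args_hash(*args: Any, **kwargs: Any) -> str:
    """Illustrative stand-in; does not reproduce make_args_hash() digests."""
    # Sort kwargs so call-site keyword order does not change the digest.
    payload = {"args": list(args), "kwargs": dict(sorted(kwargs.items()))}
    try:
        raw = json.dumps(payload, sort_keys=True).encode("utf-8")
    except (TypeError, ValueError):
        # Values JSON cannot encode (bytes, custom classes, ...) take the
        # pickle path instead.
        raw = pickle.dumps((args, sorted(kwargs.items())))
    return hashlib.sha256(raw).hexdigest()


json_digest = sketch_args_hash("CCO", az_model="USPTO")
pickle_digest = sketch_args_hash(b"raw-bytes", az_model="USPTO")
print(len(json_digest), len(pickle_digest))
```

Pickle output is only guaranteed stable for a given Python version and object layout, so a fallback like this is best treated as deterministic within one process rather than across environments.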

Usage

Create and pass cache objects explicitly:

from deepretro.utils.cache import CacheManager, make_cache_key

cache = CacheManager()
key = make_cache_key("call_llm", "CCO", model="claude-opus-4-6", version=1)
cache_miss = object()
result = cache.get(key, default=cache_miss)

if result is cache_miss:
    result = {"molecule": "CCO", "model": "claude-opus-4-6"}
    cache.set(key, result, expire=3600, tag="CCO")

# Evict all entries for a molecule from this cache instance
cache.evict_tag("CCO")

# Call maintenance helpers directly during tests or diagnostics
cache.purge_expired_entries()
print(cache.estimate_size_bytes())

# Clear this cache instance
cache.clear()

# Inspect statistics
stats = cache.stats()
print(stats.hits, stats.misses, stats.num_entries)
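The lookup-then-store pattern above can be folded into a small helper. The sketch below is hypothetical (get_or_compute is not part of deepretro); it assumes only the get(key, default=...) and set(key, value, expire=..., tag=...) call shapes shown in this section, and uses a dict-backed stand-in that ignores expire and tag so the example runs on its own.

```python
from typing import Any, Callable

_MISS = object()  # unique sentinel, so a cached None still counts as a hit


class DictCache:
    """Stand-in with the same call shapes; expire and tag are ignored here."""

    def __init__(self) -> None:
        self._data = {}

    def get(self, key: str, default: Any = None) -> Any:
        return self._data.get(key, default)

    def set(self, key: str, value: Any, *, expire=None, tag=None) -> None:
        self._data[key] = value


def get_or_compute(
    cache: Any, key: str, compute: Callable[[], Any], **set_options: Any
) -> Any:
    value = cache.get(key, default=_MISS)
    if value is _MISS:
        value = compute()
        cache.set(key, value, **set_options)
    return value


calls = []


def expensive() -> None:
    calls.append(1)
    return None  # a legitimate result that happens to be None


cache = DictCache()
first = get_or_compute(cache, "k", expensive, expire=3600, tag="CCO")
second = get_or_compute(cache, "k", expensive, expire=3600, tag="CCO")
print(len(calls))
```

Comparing against a private sentinel rather than None is what lets the second call be served from the cache even though the stored value is None.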

Tag Semantics

A tag is an arbitrary group label attached to one or more cache keys when calling cache.set(..., tag="..."). Tags are useful when many cached values should be invalidated together, such as all results derived from one molecule or all outputs from one model configuration.
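A minimal sketch of how a tag index of this kind can work, assuming a primary key -> (value, tag) mapping and a secondary tag -> set of keys mapping. TagIndexSketch is a hypothetical class written for illustration; it omits TTL and statistics and is not the CacheManager implementation.

```python
from collections import defaultdict
from typing import Any, Optional


class TagIndexSketch:
    """Illustrative two-index store: entries by key, keys by tag."""

    def __init__(self) -> None:
        self.entries = {}             # key -> (value, tag)
        self.tags = defaultdict(set)  # tag -> {key, ...}

    def set(self, key: str, value: Any, tag: Optional[str] = None) -> None:
        self.delete(key)  # drop any stale tag link before overwriting
        self.entries[key] = (value, tag)
        if tag is not None:
            self.tags[tag].add(key)

    def delete(self, key: str) -> bool:
        entry = self.entries.pop(key, None)
        if entry is None:
            return False
        _, tag = entry
        if tag is not None:
            self.tags[tag].discard(key)
            if not self.tags[tag]:
                del self.tags[tag]  # keep the tag index free of empty sets
        return True

    def evict_tag(self, tag: str) -> int:
        # Copy the key set first because delete() mutates it.
        keys = list(self.tags.get(tag, ()))
        for key in keys:
            self.delete(key)
        return len(keys)


store = TagIndexSketch()
store.set("a", 1, tag="molecule:CCO")
store.set("b", 2, tag="molecule:CCO")
store.set("c", 3, tag="molecule:CCN")
evicted = store.evict_tag("molecule:CCO")
print(evicted, sorted(store.entries))
```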

API

deepretro.utils.cache.make_args_hash(*args, **kwargs)

Generate a deterministic hash of arguments for cache keying.

Tries JSON first for common types; falls back to pickle for complex objects.

Examples

>>> make_args_hash("CCO", az_model="USPTO")
'6ad01e27a3a319962ad084787e060ab0fa0e661cc7d3e018e96747b06f7bacf7'

Parameters:

args (Any)
kwargs (Any)

Return type:

str

deepretro.utils.cache.make_cache_key(namespace, *args, version=1, **kwargs)

Build a deterministic cache key for a namespaced operation.

Parameters:

namespace (str) – Stable operation name, such as "run_az".
*args (Any) – Positional arguments that affect the cached result.
version (int, optional) – Cache version. Bump it when behavior changes and old entries should be invalidated. Defaults to 1.
**kwargs (Any) – Keyword arguments that affect the cached result.

Returns:

A deterministic key suitable for CacheManager.get and CacheManager.set.

Return type:

str

Examples

>>> make_cache_key("run_az", "CCO", az_model="USPTO", version=1)
'v1:run_az:6ad01e27a3a319962ad084787e060ab0fa0e661cc7d3e018e96747b06f7bacf7'

class deepretro.utils.cache.CacheEntry(value, expires_at, tag)

Single in-memory cache entry.

Each entry stores a cached payload, an optional expiry deadline measured with time.monotonic(), and an optional tag used for group invalidation. Tags let callers associate multiple cache keys with the same logical input such as one molecule, model configuration, or request family.

Parameters:

value (Any)
expires_at (float | None)
tag (str | None)

value

Cached payload returned by CacheManager.get.

Type: Any

expires_at

time.monotonic() deadline when the key becomes stale. None means the entry does not expire automatically.

Type: float | None

tag

Optional group label attached when calling cache.set(..., tag=...). All keys written with the same tag can be removed together with CacheManager.evict_tag, which is useful when multiple cached values should be invalidated as one group.

Type: str | None
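The relationship between a TTL in seconds and a time.monotonic() deadline can be sketched as below. EntrySketch, make_entry, and is_expired are hypothetical names, and treating an entry as expired once the clock reaches the deadline (rather than strictly passes it) is an assumption of this example.

```python
import time
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class EntrySketch:
    """Illustrative record with the same three fields as CacheEntry."""

    value: Any
    expires_at: Optional[float]
    tag: Optional[str] = None


def make_entry(
    value: Any,
    expire: Optional[float] = None,
    tag: Optional[str] = None,
    now: Optional[float] = None,
) -> EntrySketch:
    # A monotonic clock never jumps backwards, unlike wall-clock time.
    now = time.monotonic() if now is None else now
    expires_at = None if expire is None else now + expire
    return EntrySketch(value, expires_at, tag)


def is_expired(entry: EntrySketch, now: Optional[float] = None) -> bool:
    if entry.expires_at is None:
        return False  # no deadline means the entry never expires by itself
    now = time.monotonic() if now is None else now
    return now >= entry.expires_at  # boundary choice is an assumption


timed = make_entry({"smiles": "CCO"}, expire=60, tag="molecule:CCO", now=100.0)
permanent = make_entry("kept", now=100.0)
print(timed.expires_at, is_expired(timed, now=159.0), is_expired(timed, now=160.0))
```

Passing now explicitly keeps the example deterministic; real callers would let both helpers read the clock themselves.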

class deepretro.utils.cache.CacheStats(hits, misses, size_bytes, num_entries)

Snapshot of live cache statistics returned by CacheManager.stats().

The reported values describe the cache after expired entries have been purged. They are intended for diagnostics and monitoring rather than exact process-memory accounting.

Parameters:

hits (int)
misses (int)
size_bytes (int)
num_entries (int)

hits

Number of successful CacheManager.get lookups.

Type: int

misses

Number of failed CacheManager.get lookups, including expired keys.

Type: int

size_bytes

Shallow approximation of the live cache footprint in bytes. The estimate includes the top-level entry and tag dictionaries, their keys, and the immediate cached values, but does not traverse referenced objects recursively.

Type: int

num_entries

Number of live entries remaining after expired values are purged. This reflects the keys that still participate in lookups and tag eviction.

Type: int

class deepretro.utils.cache.CacheManager

Process-local in-memory cache manager with tag support and TTL.

Each instance owns two in-memory indexes: _entries maps cache keys to CacheEntry objects, and _tags maps each tag to the set of keys currently carrying that tag. get removes expired keys lazily, evict_tag removes every key associated with a tag, and stats first purges expired values so the reported counts reflect live entries only.

The cache is process-local and not thread-safe. Reuse the same CacheManager instance only when callers intentionally want to share state.

Examples

>>> cache = CacheManager()
>>> key = make_cache_key("call_llm", "CCO", model="gpt-5.4", version=1)
>>> miss = object()
>>> cache.get(key, default=miss) is miss
True
>>> cache.set(key, {"molecule": "CCO"}, expire=300, tag="molecule:CCO")
>>> cache.get(key)
{'molecule': 'CCO'}
>>> cache.evict_tag("molecule:CCO")
1

__init__()

Initialize an empty in-memory cache.

Return type: None

purge_if_expired(key)

Remove a key if its expiry deadline has passed.

Parameters: key (str) – Cache key to inspect.
Returns: True when the key existed and was removed because it had expired, otherwise False.
Return type: bool

delete_key(key)

Remove a key from the cache and tag index if present.

Parameters: key (str) – Cache key to remove.
Returns: True when an entry was removed, otherwise False.
Return type: bool

purge_expired_entries()

Remove every expired entry currently stored in the cache.

This is useful before inspecting cache size or exporting diagnostics.

Return type: None

estimate_size_bytes()

Return a shallow approximation of the current in-memory cache size.

The estimate includes the top-level dictionaries, keys, tag sets, and the immediate cached values. Referenced objects are not traversed recursively.

Return type: int
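To illustrate what "shallow" means here, the following sketch sums sys.getsizeof over two dictionaries, their keys, the tag sets, and the immediate values. shallow_size_bytes is a hypothetical function, and the exact set of objects counted by estimate_size_bytes() may differ.

```python
import sys


def shallow_size_bytes(entries: dict, tags: dict) -> int:
    """Illustrative shallow estimate; nested objects are not followed."""
    total = sys.getsizeof(entries) + sys.getsizeof(tags)
    for key, value in entries.items():
        # getsizeof reports the container itself, not what it references.
        total += sys.getsizeof(key) + sys.getsizeof(value)
    for tag, keys in tags.items():
        total += sys.getsizeof(tag) + sys.getsizeof(keys)
    return total


tags = {"molecule:CCO": {"k"}}
small = shallow_size_bytes({"k": ("x",)}, tags)
large = shallow_size_bytes({"k": ("x" * 100_000,)}, tags)
print(small == large)
```

Both calls report the same size because the one-element tuple is counted but the string inside it is not, which is why the figure suits diagnostics rather than exact memory accounting.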

get(key, default=<object object>)

Retrieve a value by key.

Parameters:

key (str) – Cache key.
default (Any, optional) – Value returned when the key is missing or has expired, by default an internal sentinel object.

Returns:

Cached value, or default if the key is missing or has expired.

Return type:

Any

Examples

>>> cache = CacheManager()
>>> miss = object()
>>> cache.get("missing", default=miss) is miss
True

set(key, value, *, expire=None, tag=None)

Store a value with optional TTL and tag.

Parameters:

key (str) – Cache key.
value (Any) – Value to store.
expire (float | None, optional) – Time-to-live in seconds. None means no expiry.
tag (str | None, optional) – Optional group label for later eviction via evict_tag. Multiple keys may share the same tag.

Return type:

None

Examples

>>> cache = CacheManager()
>>> cache.set("demo", {"smiles": "CCO"}, expire=60, tag="molecule:CCO")

evict_tag(tag)

Remove all live entries with the given tag.

Parameters: tag (str) – Group label identifying entries to remove. A single tag may be attached to multiple cache keys.
Returns: Number of entries evicted.
Return type: int

Examples

>>> cache = CacheManager()
>>> cache.set("a", 1, tag="batch:1")
>>> cache.set("b", 2, tag="batch:1")
>>> cache.evict_tag("batch:1")
2

clear()

Remove all entries from this cache instance.

Return type: None

stats()

Return cache statistics.

Returns: A snapshot containing hit count, miss count, shallow byte estimate, and live entry count.
Return type: CacheStats

Examples

>>> cache = CacheManager()
>>> cache.set("demo", 1)
>>> stats = cache.stats()
>>> (stats.hits, stats.misses, stats.num_entries)
(0, 0, 1)

close()

Release cache contents for compatibility with previous callers.

Return type: None