
[FEATURE] Stateful Feature Storage and Iterative Update Logic for Incremental Learning #9

@GongJr0

Description


Feature Details

Design and implement a stateful feature store that (a) caches per-ticker feature artifacts (lags, masks, embedding index maps, scaler states, selected lag sets, etc.), and (b) supports incremental updates as new observations arrive, without recomputing the full history. The store should be durable (disk-backed), versioned, and safe to read concurrently during training.
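A minimal sketch of what the per-ticker state payload could look like. All names here (`TickerState`, field names) are hypothetical illustrations, not an agreed schema; the actual fields should follow the payload definition in the checklist below.

```python
from dataclasses import dataclass


@dataclass
class TickerState:
    """Hypothetical per-ticker feature artifacts cached between ingestion runs."""
    window: list[float]            # rolling window buffer of recent observations
    mask: list[bool]               # validity mask aligned with the window buffer
    selected_lags: tuple[int, ...] # lag set chosen by the last selection pass
    scaler_state: dict[str, float] # running stats, e.g. {"n": ..., "mean": ...}
    embedding_ids: dict[str, int]  # categorical value -> embedding row index
    last_ts: int                   # last-ingested timestamp (epoch seconds)
    version: int = 1               # schema version for serialization parity
    content_hash: str = ""         # hash of raw-source metadata + feature config
```

Keeping the payload a plain, flat record makes it straightforward to serialize deterministically and to diff across snapshot versions.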

Core goals:

  • $O(\Delta t)$ append-only updates for rolling windows and statistics.
  • Deterministic, reproducible snapshots for training/checkpoint alignment.
  • Clean invalidation when upstream raw data changes (cache busting via checksum/version).
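To make the $O(\Delta t)$ goal concrete for the running-moment scaler states ($\bar{X}$, $\sigma^2$): Welford's online algorithm updates mean and variance in $O(1)$ per appended observation, so ingesting $\Delta t$ new points never touches the full history. This is a sketch of the technique, not the project's implementation; the $P^2$ quantile estimator mentioned below would follow the same append-only pattern but is not shown here.

```python
class RunningMoments:
    """Welford's online algorithm: O(1) per appended observation,
    so a batch of Δt new points costs O(Δt), not O(history)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Population variance of all observations seen so far."""
        return self.m2 / self.n if self.n else 0.0
```

Because the state is just `(n, mean, m2)`, it serializes trivially into the per-ticker payload and restores exactly, which is what makes incremental updates match a full recompute in the parity tests below.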

Affected Modules

As stated in the parent issue.

Implementation Checklist

  • Define per-ticker state payload (e.g., latest window buffers, masks, selected lag set, scaler states ($P^2$ / $IQR$ / $\bar{X}$ / $\sigma^2$), embedding ID maps, last-ingested timestamp).
  • Add a version field and a content hash (e.g., SHA256 of raw source metadata & feature config) for invalidation.
  • Implement a disk-backed store with durable, versioned snapshots.
  • Maintain rolling window buffers with append-only updates.
  • Update scalers incrementally (running mean/variance, $P^2$ quantiles, $IQR$).
  • Define a refresh cadence for lag selection (periodic re-fit rather than per-observation).
  • Detect upstream raw-data changes and invalidate stale state.
  • Unit tests:
    • Cold start → fit state → update with Δt > 0 → outputs match full recompute on the same window.
    • Concurrency: writer during reader; readers see consistent snapshots.
    • Invalidation: modify raw data → state invalidates and recomputes.
    • Serialization parity across platforms (endian, dtype, version).
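The checksum-based invalidation item can be sketched as follows, assuming SHA-256 over canonicalized JSON as suggested above. The function name and dict shapes are illustrative only:

```python
import hashlib
import json


def state_fingerprint(raw_metadata: dict, feature_config: dict) -> str:
    """SHA-256 over canonicalized raw-source metadata and feature config.

    If either input changes upstream, the fingerprint changes, so the cached
    per-ticker state no longer matches and is recomputed (cache busting).
    sort_keys + fixed separators make the serialization deterministic.
    """
    payload = json.dumps(
        {"raw": raw_metadata, "config": feature_config},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Storing this fingerprint in the state payload's `content hash` field lets a reader cheaply compare it against a freshly computed one before trusting the cache.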

Limitations

As stated in the parent issue.

Metadata


Labels

feature: Implementation tracking for approved features

Projects

Status: Ready
