[Shared Utils & Models] Implement tabular Q approximator with Numba acceleration #26

@saviornt

Description

Create the core tabular Q-function implementation using NumPy arrays + Numba @njit for hot paths. Include Bellman update, ε-greedy action selection, and epsilon decay logic. Keep everything pure (no I/O) so it's reusable across edge and central.
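As a rough illustration of the three pure functions named in this description, here is a minimal sketch. It assumes the standard one-step Q-learning update with a learning rate `alpha` and discount `gamma`, and multiplicative decay with a floor `epsilon_min`. All parameter names and the `done` flag are assumptions, not part of the issue. Numba is imported optionally so the sketch still runs where it is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # assumption: fall back to plain Python if Numba is absent
    def njit(fn):
        return fn


@njit
def update_q_value(q_table, state, action, reward, next_state, done, alpha, gamma):
    """Apply one in-place Q-learning (Bellman) update and return the new value.

    Args:
        q_table: float64 array of shape (n_states, n_actions).
        state, action: indices of the visited pair.
        reward: float reward observed after taking the action.
        next_state: index of the resulting state.
        done: True if next_state is terminal (no bootstrapping).
        alpha: learning rate (hypothetical name).
        gamma: discount factor (hypothetical name).
    """
    best_next = 0.0 if done else np.max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state, action] += alpha * (td_target - q_table[state, action])
    return q_table[state, action]


@njit
def select_action_greedy(q_table, state, epsilon):
    """Epsilon-greedy choice: random action with probability epsilon, else argmax."""
    if np.random.random() < epsilon:
        return np.random.randint(0, q_table.shape[1])
    return int(np.argmax(q_table[state]))


@njit
def decay_epsilon(epsilon, decay_rate, epsilon_min):
    """Exponential decay: multiply by decay_rate, never going below epsilon_min."""
    return max(epsilon_min, epsilon * decay_rate)
```

The functions take the table and hyperparameters as plain arguments and do no I/O, so the only side effect is the in-place write to `q_table`. Whether the real `core.py` mutates in place or returns a new value is a design choice the issue leaves open.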

Why: Tabular is the fastest MVP path for small discrete spaces and allows quick validation of the math.

Type

  • Feature

Focus Area (pick one)

  • Shared Utils & Models

Priority

  • Critical

Acceptance Criteria

  • TabularQ class with Numba-accelerated update and select_action methods
  • Pure functions in core.py: update_q_value, select_action_greedy, decay_epsilon
  • Exponential epsilon decay with parameters set via config
  • Convergence test passes on a simple discrete toy env (e.g. 16-state grid world or frozen-lake-like)
  • >90% test coverage for core logic and approximator
  • Google-style docstrings with types, params, returns, and small examples
  • MyPy strict passes
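One possible shape for the convergence check in the criteria above is sketched here. It assumes a deterministic 4 x 4 grid world (16 states) with a reward of 1.0 only on reaching the goal corner. The environment, hyperparameters, and helper names are all illustrative assumptions. The sketch uses plain NumPy rather than the Numba kernels so that it stays self-contained.

```python
import numpy as np

N_SIDE = 4                      # 4 x 4 grid -> 16 states
N_STATES = N_SIDE * N_SIDE
N_ACTIONS = 4                   # 0=up, 1=right, 2=down, 3=left
GOAL = N_STATES - 1             # bottom-right corner; start is state 0


def step(state, action):
    """Deterministic transition; moving into a wall leaves the state unchanged."""
    row, col = divmod(state, N_SIDE)
    if action == 0:
        row = max(row - 1, 0)
    elif action == 1:
        col = min(col + 1, N_SIDE - 1)
    elif action == 2:
        row = min(row + 1, N_SIDE - 1)
    else:
        col = max(col - 1, 0)
    next_state = row * N_SIDE + col
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.95,
          eps_decay=0.99, eps_min=0.05, seed=0):
    """Run tabular Q-learning with epsilon-greedy exploration; return the table."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, N_ACTIONS))
    epsilon = 1.0
    for _ in range(episodes):
        state = 0
        for _ in range(100):    # step cap per episode
            if rng.random() < epsilon:
                action = int(rng.integers(N_ACTIONS))
            else:
                action = int(np.argmax(q[state]))
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else q[next_state].max()
            q[state, action] += alpha * (reward + gamma * best_next - q[state, action])
            state = next_state
            if done:
                break
        epsilon = max(eps_min, epsilon * eps_decay)  # exponential decay with floor
    return q


def greedy_rollout_length(q, max_steps=N_STATES):
    """Follow argmax actions from the start; return steps to goal, or None."""
    state = 0
    for n in range(1, max_steps + 1):
        state, _, done = step(state, int(np.argmax(q[state])))
        if done:
            return n
    return None
```

A test could then train once with a fixed seed and assert that `greedy_rollout_length` returns a value rather than `None`. The shortest path in this grid is 6 moves, so a stricter test might also bound the length from above.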

Blocker / Dependencies

  • [Shared Utils & Models] Create Q-Learning config & types (Pydantic v2)

Notes / Links

  • Target: shared/src/learning/q_learning/approximators/tabular.py
  • Use @njit on update and selection; cache properties where sensible
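Numba's `@njit` does not compile the methods of an ordinary Python class directly. One common arrangement, sketched below, is a thin `TabularQ` wrapper whose methods delegate to module-level jitted kernels. The constructor arguments, attribute names, and private helper names are hypothetical, and in the real implementation the hyperparameters would presumably come from the Q-Learning config this issue depends on.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # assumption: run un-jitted if Numba is not installed
    def njit(fn):
        return fn


@njit
def _td_update(q, s, a, r, s_next, done, alpha, gamma):
    # Hot path: one in-place temporal-difference update.
    target = r if done else r + gamma * np.max(q[s_next])
    q[s, a] += alpha * (target - q[s, a])


@njit
def _epsilon_greedy(q, s, epsilon):
    # Hot path: explore with probability epsilon, otherwise exploit.
    if np.random.random() < epsilon:
        return np.random.randint(0, q.shape[1])
    return int(np.argmax(q[s]))


class TabularQ:
    """Thin stateful wrapper around the jitted kernels (illustrative only)."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.99,
                 epsilon=1.0, epsilon_decay=0.995, epsilon_min=0.01):
        self.q_table = np.zeros((n_states, n_actions), dtype=np.float64)
        self.alpha = alpha
        self.gamma = gamma
        self.epsilon = epsilon
        self.epsilon_decay = epsilon_decay
        self.epsilon_min = epsilon_min

    def select_action(self, state):
        return int(_epsilon_greedy(self.q_table, state, self.epsilon))

    def update(self, state, action, reward, next_state, done):
        _td_update(self.q_table, state, action, float(reward),
                   next_state, bool(done), self.alpha, self.gamma)

    def decay_epsilon(self):
        self.epsilon = max(self.epsilon_min, self.epsilon * self.epsilon_decay)
        return self.epsilon
```

Casting `reward` and `done` before the call keeps the argument types stable, which avoids Numba compiling a second specialization when a caller passes an `int` reward.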

Metadata

Assignees

Labels

  • needs-triage (new issue that hasn't been reviewed or prioritized yet)
  • task (general work item: implementation, setup, cleanup)

Projects

Status

Manual QA Testing

Relationships

None yet

Development

No branches or pull requests
