-
Notifications
You must be signed in to change notification settings - Fork 0
Open
Labels
needs-triageNew issue that hasn't been reviewed/prioritized yetNew issue that hasn't been reviewed/prioritized yettaskGeneral work item (implementation, setup, cleanup) – most common labelGeneral work item (implementation, setup, cleanup) – most common label
Milestone
Description
Description
Create the core tabular Q-function implementation using NumPy arrays + Numba @njit for hot paths. Include Bellman update, ε-greedy action selection, and epsilon decay logic. Keep everything pure (no I/O) so it's reusable across edge and central.
Why: Tabular is the fastest MVP path for small discrete spaces and allows quick validation of the math.
Type
- Feature
Focus Area (pick one)
- Shared Utils & Models
Priority
- Critical
Acceptance Criteria
-
TabularQclass with Numba-accelerated update and select_action methods - Pure functions in core.py: update_q_value, select_action_greedy, decay_epsilon
- Exponential epsilon decay configurable via config
- Convergence test passes on a simple discrete toy env (e.g. 16-state grid world or frozen-lake-like)
- >90% test coverage for core logic and approximator
- Google-style docstrings with types, params, returns, and small examples
- MyPy strict passes
Blocker / Dependencies
- [Shared Utils & Models] Create Q-Learning config & types (Pydantic v2)
Notes / Links
- Target: shared/src/learning/q_learning/approximators/tabular.py
- Use @njit on update and selection; cache properties where sensible
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
needs-triageNew issue that hasn't been reviewed/prioritized yetNew issue that hasn't been reviewed/prioritized yettaskGeneral work item (implementation, setup, cleanup) – most common labelGeneral work item (implementation, setup, cleanup) – most common label
Projects
Status
Manual QA Testing