[Appliance Core] Implement dummy environment & basic Q-Learning agent #28

## Description

Build the initial Appliance-side Q-Learning agent and a dummy network environment so full episodes can be run locally on the Pi (before real telemetry integration).

Why: Validates that the agent composes correctly with the shared core and allows the training loop to be tested early.

## Type

- [x] Feature

## Focus Area (pick one)

- [x] Appliance Core (Pi edge)

## Priority

- [x] Critical

## Acceptance Criteria

- [ ] `DummyNetworkEnv` with discrete state/action space and basic reward function
- [ ] `ApplianceQAgent` composes shared TabularQ + replay buffer + config
- [ ] Runs 5000+ steps/episodes end-to-end without crashing
- [ ] Logs episode stats (total reward, length, avg reward)
- [ ] Uses only shared types/config
- [ ] Simple test script shows reward trend improvement over episodes
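
As a starting point for discussion, the criteria above could be sketched roughly as follows. This is a minimal, stdlib-only sketch, not the intended implementation: `QConfig`, the plain list-of-lists Q-table, and the `deque` buffer are hypothetical stand-ins for the shared config, TabularQ, and replay buffer from Phase 1, whose real APIs are not shown in this issue. The chain-walk dynamics, reward values, epsilon decay schedule, and the `terminated`/`truncated` split are likewise assumptions chosen only to make the loop runnable.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class QConfig:
    """Hypothetical stand-in for the shared config; all fields are assumptions."""
    n_states: int = 6
    n_actions: int = 2
    alpha: float = 0.1           # learning rate
    gamma: float = 0.9           # discount factor
    epsilon_start: float = 1.0
    epsilon_min: float = 0.05
    epsilon_decay: float = 0.99  # applied once after every episode
    buffer_size: int = 1000
    max_steps: int = 20          # per-episode step limit


class DummyNetworkEnv:
    """Toy chain of states 0..n_states-1: action 1 moves right, action 0 moves left.

    Reaching the last state pays +1 and ends the episode; every other step pays -0.01.
    """

    def __init__(self, cfg: QConfig):
        self.cfg = cfg
        self.state = 0
        self.steps = 0

    def reset(self) -> int:
        self.state, self.steps = 0, 0
        return self.state

    def step(self, action: int):
        delta = 1 if action == 1 else -1
        self.state = min(max(self.state + delta, 0), self.cfg.n_states - 1)
        self.steps += 1
        terminated = self.state == self.cfg.n_states - 1
        truncated = not terminated and self.steps >= self.cfg.max_steps
        reward = 1.0 if terminated else -0.01
        return self.state, reward, terminated, truncated


class ApplianceQAgent:
    """Epsilon-greedy tabular Q-learning; table and buffer stand in for the shared core."""

    def __init__(self, cfg: QConfig, rng: random.Random):
        self.cfg, self.rng = cfg, rng
        self.epsilon = cfg.epsilon_start
        self.q = [[0.0] * cfg.n_actions for _ in range(cfg.n_states)]
        self.buffer = deque(maxlen=cfg.buffer_size)  # transitions are stored, not replayed here

    def act(self, state: int) -> int:
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.cfg.n_actions)
        best = max(self.q[state])
        # Break ties between equally good actions at random.
        return self.rng.choice([a for a, v in enumerate(self.q[state]) if v == best])

    def learn(self, s, a, r, s2, terminated):
        self.buffer.append((s, a, r, s2, terminated))
        # Bootstrap from the next state unless the episode truly terminated.
        target = r if terminated else r + self.cfg.gamma * max(self.q[s2])
        self.q[s][a] += self.cfg.alpha * (target - self.q[s][a])

    def end_episode(self):
        self.epsilon = max(self.cfg.epsilon_min, self.epsilon * self.cfg.epsilon_decay)


def run(episodes: int = 500, seed: int = 0, log_every: int = 100):
    cfg = QConfig()
    rng = random.Random(seed)
    env, agent = DummyNetworkEnv(cfg), ApplianceQAgent(cfg, rng)
    totals = []
    for ep in range(1, episodes + 1):
        s, done, total, length = env.reset(), False, 0.0, 0
        while not done:
            a = agent.act(s)
            s2, r, terminated, truncated = env.step(a)
            agent.learn(s, a, r, s2, terminated)
            s, total, length = s2, total + r, length + 1
            done = terminated or truncated
        agent.end_episode()
        totals.append(total)
        if ep % log_every == 0:
            print(f"episode={ep} total_reward={total:.2f} length={length} "
                  f"avg_reward={total / length:.3f}")
    return totals, agent


if __name__ == "__main__":
    totals, _ = run()
    print("mean of first 50:", sum(totals[:50]) / 50)
    print("mean of last 50:", sum(totals[-50:]) / 50)
```

Comparing the mean total reward of the first and last 50 episodes is one simple way a test script could show the reward trend. The real agent would swap the stand-ins for the shared types and would presumably sample from the replay buffer rather than only appending to it.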

## Blocker / Dependencies

- Phase 1 shared core issues

## Notes / Links

- Target files: `appliance/src/learning/q_learning/agent.py` and `dummy_env.py`
