refactor: ML baseline into src-layout package with DQN online training by smly · Pull Request #87 · smly/RiichiEnv

smly · 2026-02-07T17:35:12Z

Summary

Restructure demos/ml_baseline from flat scripts into a proper Python package with src/riichienv_ml/ layout, enabling clean imports and pip install -e .
Replace Actor-Critic (AWAC) online training with DQN + CQL, using Ray distributed workers for self-play data collection
Add configurable model/dataset/encoder classes via YAML configs and importlib-based dynamic loading
Add feature ablation datasets: DiscardHistoryEncoder (78ch) and DiscardHistoryShantenEncoder (94ch) alongside the baseline (74ch)
Add ChannelAttention (SE-block) to ResNet backbone for improved feature weighting
Add Boltzmann exploration with top-p (nucleus) sampling as a configurable alternative to epsilon-greedy
Add riichi declaration sequence validation test
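
The Boltzmann exploration item above can be illustrated with a small sketch. The PR text does not show the actual implementation, so the function name `boltzmann_top_p_sample`, its parameters, and the legal-action mask are assumptions made for illustration: a softmax over Q-values at a given temperature, truncated to the smallest set of actions whose probability mass reaches `top_p`.

```python
import math
import random


def boltzmann_top_p_sample(q_values, legal_mask, temperature=1.0, top_p=0.9, rng=random):
    """Sample an action index from softmax(Q / temperature), limited to the top-p nucleus.

    Hypothetical helper: assumes temperature > 0 and at least one legal action.
    """
    legal = [i for i, ok in enumerate(legal_mask) if ok]
    if not legal:
        raise ValueError("no legal actions")

    # Softmax over legal actions only; subtract the max for numerical stability.
    max_q = max(q_values[i] for i in legal)
    weights = {i: math.exp((q_values[i] - max_q) / temperature) for i in legal}
    total = sum(weights.values())
    probs = {i: w / total for i, w in weights.items()}

    # Keep the smallest set of highest-probability actions whose mass reaches top_p.
    ranked = sorted(legal, key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for i in ranked:
        nucleus.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # random.choices treats the weights as relative, so no explicit renormalization is needed.
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]


# Example call: action 3 is illegal, and action 0 falls outside the nucleus at top_p=0.9.
action = boltzmann_top_p_sample([1.0, 5.0, 4.9, -3.0], [True, True, True, False])
```

Compared with epsilon-greedy, this kind of sampler never picks actions in the low-probability tail, so exploration stays concentrated on moves the Q-network already rates as plausible.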

Copilot

Pull request overview

Refactors the demos/ml_baseline ML baseline into an installable src/-layout Python package (riichienv-ml) and updates the RL training stack to DQN(+CQL) with Ray-based self-play workers, plus adds a regression test for MJAI riichi reach→dahai sequencing.

Changes:

Convert the baseline demo into a riichienv-ml package with YAML+pydantic config and importlib-based dynamic class loading.
Replace the prior online AWAC flow with DQN(+CQL) online training using Ray workers and add new encoders/datasets for feature ablations.
Add/refresh training scripts/docs and add a riichi action-sequence validation test.
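
As a rough illustration of the importlib-based dynamic class loading mentioned in the list above, the sketch below resolves a dotted path from a config into a class. The helper name `import_class` and the config keys `class` and `kwargs` are hypothetical; the PR's actual helper in `config.py` may look different.

```python
import importlib


def import_class(dotted_path):
    """Resolve 'package.module.ClassName' to the class object it names."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(f"{module_name!r} has no attribute {class_name!r}") from exc


# A config loaded from YAML might carry a class path plus constructor kwargs.
# A standard-library class stands in here for a model or encoder class.
config = {"class": "collections.OrderedDict", "kwargs": {"a": 1}}
instance = import_class(config["class"])(**config["kwargs"])
```

With this pattern, swapping an encoder or model for an ablation run only requires changing a string in the YAML file, not the training script.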

Reviewed changes

Copilot reviewed 27 out of 35 changed files in this pull request and generated 7 comments.

Show a summary per file

| File | Description |
| --- | --- |
| `tests/env/rule_validation/test_riichi_sequence.py` | Adds regression tests for MJAI reach→dahai behavior and riichi-stage legal action handling. |
| `demos/ml_baseline/uv.lock` | Updates lockfile for renamed package and platform markers for CUDA-related deps. |
| `demos/ml_baseline/unified_model.py` | Removes legacy unified actor/critic model implementation. |
| `demos/ml_baseline/train_online.py` | Removes legacy online training entry point script. |
| `demos/ml_baseline/src/riichienv_ml/utils.py` | Introduces shared utility `AverageMeter`. |
| `demos/ml_baseline/src/riichienv_ml/training/ray_actor.py` | Adds Ray self-play worker for collecting transitions with configurable exploration. |
| `demos/ml_baseline/src/riichienv_ml/training/online_trainer.py` | Adds online DQN(+CQL) training loop coordinating Ray workers and evaluation. |
| `demos/ml_baseline/src/riichienv_ml/training/learner.py` | Adds DQN(+CQL) learner and checkpoint compatibility loading logic. |
| `demos/ml_baseline/src/riichienv_ml/training/grp_trainer.py` | Refactors GRP trainer to package imports and configurable hyperparams/output path. |
| `demos/ml_baseline/src/riichienv_ml/training/cql_trainer.py` | Refactors offline CQL trainer to config-driven dataset/model loading and adds grad clipping. |
| `demos/ml_baseline/src/riichienv_ml/training/buffer.py` | Simplifies replay buffer to a single TensorDict replay buffer. |
| `demos/ml_baseline/src/riichienv_ml/training/__init__.py` | Declares training package. |
| `demos/ml_baseline/src/riichienv_ml/models/cql_model.py` | Adds ResNet+SE backbone and Q-network for offline/online training. |
| `demos/ml_baseline/src/riichienv_ml/models/grp_model.py` | Adds GRP rank model and reward predictor used for reward shaping. |
| `demos/ml_baseline/src/riichienv_ml/models/mortal_model.py` | Adds Mortal model wrapper/engine code for integration/experimentation. |
| `demos/ml_baseline/src/riichienv_ml/models/__init__.py` | Declares models package. |
| `demos/ml_baseline/src/riichienv_ml/data/grp_dataset.py` | Adds GRP dataset for rank prediction training. |
| `demos/ml_baseline/src/riichienv_ml/data/cql_dataset.py` | Adds offline replay parsing dataset and new observation encoders/datasets for ablations. |
| `demos/ml_baseline/src/riichienv_ml/data/__init__.py` | Declares data package. |
| `demos/ml_baseline/src/riichienv_ml/config.py` | Adds pydantic config models, YAML loader, and dynamic class import helper. |
| `demos/ml_baseline/src/riichienv_ml/__init__.py` | Declares `riichienv_ml` package. |
| `demos/ml_baseline/scripts/train_online.py` | Adds config-driven CLI entry point for online training. |
| `demos/ml_baseline/scripts/train_grp.py` | Adds config-driven CLI entry point for GRP training. |
| `demos/ml_baseline/scripts/train_cql.py` | Adds config-driven CLI entry point for offline CQL training. |
| `demos/ml_baseline/configs/baseline.yml` | Adds baseline YAML configuration for 74ch setup. |
| `demos/ml_baseline/configs/discard_history.yml` | Adds 78ch discard-history ablation YAML configuration. |
| `demos/ml_baseline/configs/discard_history_shanten.yml` | Adds 94ch discard-history+shanten ablation YAML configuration. |
| `demos/ml_baseline/ray_actor.py` | Removes legacy Ray worker implementation. |
| `demos/ml_baseline/learner.py` | Removes legacy AWAC learner implementation. |
| `demos/ml_baseline/cql_model.py` | Removes legacy CQL model implementation. |
| `demos/ml_baseline/cql_dataset.py` | Removes legacy offline dataset implementation. |
| `demos/ml_baseline/actor_model.py` | Removes legacy actor-only network implementation. |
| `demos/ml_baseline/pyproject.toml` | Renames project to `riichienv-ml`, adds build-system + src-layout packaging config, and adds new deps. |
| `demos/ml_baseline/README.md` | Updates documentation to new package structure and config-driven scripts. |
| `demos/ml_baseline/.gitignore` | Ignores checkpoints, wandb runs, venv, caches, and egg-info outputs. |
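
The `training/learner.py` row describes a DQN(+CQL) learner. The loss itself is not shown anywhere in this PR page, so the following is only a sketch of how a DQN temporal-difference loss is commonly combined with the CQL(H) regularizer, written in plain Python for a single transition. The function name `dqn_cql_loss`, the `cql_alpha` weight, and the default `gamma` are assumptions, and legal-action masking is left out for brevity.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def dqn_cql_loss(q_s, action, reward, q_next_target, done, gamma=0.99, cql_alpha=1.0):
    """Return (total, td_loss, cql_penalty) for one transition.

    q_s:           online-network Q-values for the current state
    q_next_target: target-network Q-values for the next state
    """
    # DQN target: bootstrap from the target network unless the episode ended.
    bootstrap = 0.0 if done else gamma * max(q_next_target)
    td_target = reward + bootstrap
    td_loss = (q_s[action] - td_target) ** 2

    # CQL(H) penalty: logsumexp pushes all Q-values down, while the
    # subtracted term pushes the Q-value of the taken action back up.
    cql_penalty = logsumexp(q_s) - q_s[action]

    return td_loss + cql_alpha * cql_penalty, td_loss, cql_penalty


total, td, cql = dqn_cql_loss(
    q_s=[1.0, 2.0], action=1, reward=1.0, q_next_target=[0.5, 3.0], done=False, gamma=0.5
)
```

The penalty is always non-negative and shrinks toward zero as the taken action dominates the other Q-values, which is what discourages overestimating actions that are rare in the collected data.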

smly added 3 commits February 7, 2026 12:27

feat: implement ML training pipeline with src layout

b30b909

feat: add traceback import for improved error handling in mortal_model.py

2a786c6

feat: update encoder integration and configuration for DQN training pipeline

6971d73

smly self-assigned this Feb 7, 2026

smly added the enhancement (New feature or request) label Feb 7, 2026

smly added this to the v0.3.0 milestone Feb 7, 2026

refactor: remove unused imports and variables in test for Riichi action sequence

835b85f

smly requested a review from Copilot February 7, 2026 18:38

Copilot started reviewing on behalf of smly February 7, 2026 18:38

Copilot AI reviewed Feb 7, 2026

smly added 7 commits February 7, 2026 18:47

fix: update _react_batch method to handle invisible_obs as optional parameter

cc1c661

feat: shard files across DataLoader workers to avoid duplicated work

24253a2

refactor: improve test documentation for riichi_stage behavior in ankan availability

5b5c5ca

feat: add assertions for riichi declaration and scoring after reach+dahai sequence

79c9ba1

refactor: update working directory path for Ray workers and simplify assertion message in Riichi sequence test

f0c1a0d

fix: detach final_reward tensor to ensure compatibility with CPU operations

49a0c61

refactor: update training configuration parameters for improved performance and stability

c587388
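
Commit 24253a2 above shards input files across DataLoader workers so that each worker does not re-read the same data. The diff is not included here, so the sketch below only shows one common way to do this, a round-robin split by worker index. The helper name `shard_for_worker` and the file names are hypothetical. In a PyTorch `IterableDataset`, the worker index and count would typically come from `torch.utils.data.get_worker_info()`, which returns `None` in the main process.

```python
def shard_for_worker(files, worker_id, num_workers):
    """Return the subset of files that one worker should read (round-robin split)."""
    if not 0 <= worker_id < num_workers:
        raise ValueError("worker_id must be in [0, num_workers)")
    # Sort first so every worker derives the same ordering before slicing.
    return sorted(files)[worker_id::num_workers]


# Hypothetical replay-log file names split across four workers.
files = [f"log_{i:03d}.json.gz" for i in range(10)]
shards = [shard_for_worker(files, w, 4) for w in range(4)]
```

Each file lands in exactly one shard, so the union of all shards covers the dataset once per epoch instead of once per worker.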

smly merged commit ae6e65f into main Feb 8, 2026
4 checks passed

smly deleted the refactor/ml-demo-src-layout branch February 8, 2026 00:56

smly merged 11 commits into main from refactor/ml-demo-src-layout