refactor: ML baseline into src-layout package with DQN online training #87
Merged
Pull request overview
Refactors the demos/ml_baseline ML baseline into an installable src/-layout Python package (riichienv-ml) and updates the RL training stack to DQN(+CQL) with Ray-based self-play workers, plus adds a regression test for MJAI riichi reach→dahai sequencing.
Changes:
- Convert the baseline demo into a `riichienv-ml` package with YAML+pydantic config and `importlib`-based dynamic class loading.
- Replace the prior online AWAC flow with DQN(+CQL) online training using Ray workers and add new encoders/datasets for feature ablations.
- Add/refresh training scripts/docs and add a riichi action-sequence validation test.
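The reach→dahai rule that the new validation test guards can be illustrated on plain MJAI-style event dicts. This is only a sketch under assumptions: the actual test in the PR presumably drives the environment API, which is not shown on this page, and the function name `check_reach_sequence` is hypothetical. It checks a single property, namely that every `reach` event is immediately followed by a `dahai` from the same actor.

```python
def check_reach_sequence(events: list[dict]) -> list[str]:
    """Return human-readable violations; an empty list means the log is consistent."""
    problems = []
    for i, event in enumerate(events):
        if event.get("type") != "reach":
            continue
        actor = event.get("actor")
        # The event right after a reach declaration must be that player's discard.
        following = events[i + 1] if i + 1 < len(events) else None
        if (
            following is None
            or following.get("type") != "dahai"
            or following.get("actor") != actor
        ):
            problems.append(
                f"event {i}: reach by actor {actor} is not followed by that actor's dahai"
            )
    return problems


# A consistent fragment: draw, declare riichi, discard, riichi accepted.
ok_log = [
    {"type": "tsumo", "actor": 0, "pai": "5m"},
    {"type": "reach", "actor": 0},
    {"type": "dahai", "actor": 0, "pai": "9p", "tsumogiri": False},
    {"type": "reach_accepted", "actor": 0},
]

# An inconsistent fragment: another player draws before the declared discard.
bad_log = [
    {"type": "tsumo", "actor": 0, "pai": "5m"},
    {"type": "reach", "actor": 0},
    {"type": "tsumo", "actor": 1, "pai": "1s"},
]

ok_problems = check_reach_sequence(ok_log)
bad_problems = check_reach_sequence(bad_log)
```

Returning a list of messages rather than raising on the first violation makes it easy to put the full list into an assertion message when a regression appears.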
Reviewed changes
Copilot reviewed 27 out of 35 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `tests/env/rule_validation/test_riichi_sequence.py` | Adds regression tests for MJAI reach→dahai behavior and riichi-stage legal action handling. |
| `demos/ml_baseline/uv.lock` | Updates lockfile for renamed package and platform markers for CUDA-related deps. |
| `demos/ml_baseline/unified_model.py` | Removes legacy unified actor/critic model implementation. |
| `demos/ml_baseline/train_online.py` | Removes legacy online training entry point script. |
| `demos/ml_baseline/src/riichienv_ml/utils.py` | Introduces shared utility AverageMeter. |
| `demos/ml_baseline/src/riichienv_ml/training/ray_actor.py` | Adds Ray self-play worker for collecting transitions with configurable exploration. |
| `demos/ml_baseline/src/riichienv_ml/training/online_trainer.py` | Adds online DQN(+CQL) training loop coordinating Ray workers and evaluation. |
| `demos/ml_baseline/src/riichienv_ml/training/learner.py` | Adds DQN(+CQL) learner and checkpoint compatibility loading logic. |
| `demos/ml_baseline/src/riichienv_ml/training/grp_trainer.py` | Refactors GRP trainer to package imports and configurable hyperparams/output path. |
| `demos/ml_baseline/src/riichienv_ml/training/cql_trainer.py` | Refactors offline CQL trainer to config-driven dataset/model loading and adds grad clipping. |
| `demos/ml_baseline/src/riichienv_ml/training/buffer.py` | Simplifies replay buffer to a single TensorDict replay buffer. |
| `demos/ml_baseline/src/riichienv_ml/training/__init__.py` | Declares training package. |
| `demos/ml_baseline/src/riichienv_ml/models/cql_model.py` | Adds ResNet+SE backbone and Q-network for offline/online training. |
| `demos/ml_baseline/src/riichienv_ml/models/grp_model.py` | Adds GRP rank model and reward predictor used for reward shaping. |
| `demos/ml_baseline/src/riichienv_ml/models/mortal_model.py` | Adds Mortal model wrapper/engine code for integration/experimentation. |
| `demos/ml_baseline/src/riichienv_ml/models/__init__.py` | Declares models package. |
| `demos/ml_baseline/src/riichienv_ml/data/grp_dataset.py` | Adds GRP dataset for rank prediction training. |
| `demos/ml_baseline/src/riichienv_ml/data/cql_dataset.py` | Adds offline replay parsing dataset and new observation encoders/datasets for ablations. |
| `demos/ml_baseline/src/riichienv_ml/data/__init__.py` | Declares data package. |
| `demos/ml_baseline/src/riichienv_ml/config.py` | Adds pydantic config models, YAML loader, and dynamic class import helper. |
| `demos/ml_baseline/src/riichienv_ml/__init__.py` | Declares riichienv_ml package. |
| `demos/ml_baseline/scripts/train_online.py` | Adds config-driven CLI entry point for online training. |
| `demos/ml_baseline/scripts/train_grp.py` | Adds config-driven CLI entry point for GRP training. |
| `demos/ml_baseline/scripts/train_cql.py` | Adds config-driven CLI entry point for offline CQL training. |
| `demos/ml_baseline/configs/baseline.yml` | Adds baseline YAML configuration for 74ch setup. |
| `demos/ml_baseline/configs/discard_history.yml` | Adds 78ch discard-history ablation YAML configuration. |
| `demos/ml_baseline/configs/discard_history_shanten.yml` | Adds 94ch discard-history+shanten ablation YAML configuration. |
| `demos/ml_baseline/ray_actor.py` | Removes legacy Ray worker implementation. |
| `demos/ml_baseline/learner.py` | Removes legacy AWAC learner implementation. |
| `demos/ml_baseline/cql_model.py` | Removes legacy CQL model implementation. |
| `demos/ml_baseline/cql_dataset.py` | Removes legacy offline dataset implementation. |
| `demos/ml_baseline/actor_model.py` | Removes legacy actor-only network implementation. |
| `demos/ml_baseline/pyproject.toml` | Renames project to riichienv-ml, adds build-system + src-layout packaging config, and adds new deps. |
| `demos/ml_baseline/README.md` | Updates documentation to new package structure and config-driven scripts. |
| `demos/ml_baseline/.gitignore` | Ignores checkpoints, wandb runs, venv, caches, and egg-info outputs. |
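For readers unfamiliar with the DQN(+CQL) objective named in the learner and trainer rows, the following is a minimal per-transition sketch in plain Python. It assumes the commonly used discrete-action form of the CQL penalty (log-sum-exp of the Q-values minus the Q-value of the taken action) added to a squared TD error. The function names and the `gamma` and `alpha` defaults are hypothetical, and the code in `learner.py` may well differ (for example in loss type, batching, or legal-action masking).

```python
import math


def logsumexp(values: list[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def dqn_cql_loss(
    q: list[float],              # Q(s, .) from the online network
    action: int,                 # index of the action actually taken
    reward: float,
    next_q_target: list[float],  # Q_target(s', .) from the target network
    done: bool,
    gamma: float = 0.99,         # discount factor (assumed value)
    alpha: float = 1.0,          # weight of the CQL penalty (assumed value)
) -> tuple[float, float, float]:
    """Return (total, td_term, cql_term) for one transition."""
    # DQN part: bootstrap from the best next action unless the episode ended.
    target = reward if done else reward + gamma * max(next_q_target)
    td_term = (q[action] - target) ** 2
    # CQL part: push down Q-values overall while pushing up the taken action.
    cql_term = logsumexp(q) - q[action]
    return td_term + alpha * cql_term, td_term, cql_term


total, td_term, cql_term = dqn_cql_loss(
    q=[0.0, 0.0], action=0, reward=1.0, next_q_target=[0.0, 0.0], done=True
)
```

The penalty term is never negative, since the log-sum-exp of a list is at least its largest element, so setting `alpha` to zero recovers the plain DQN squared TD error.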
…assertion message in Riichi sequence test
…rmance and stability
Summary
- Refactor `demos/ml_baseline` from flat scripts into a proper Python package with `src/riichienv_ml/` layout, enabling clean imports and `pip install -e .`
- Config-driven class selection via `importlib`-based dynamic loading
- Add `DiscardHistoryEncoder` (78ch) and `DiscardHistoryShantenEncoder` (94ch) alongside the baseline (74ch)
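The `importlib`-based dynamic loading can be sketched as a helper that resolves a dotted path string to a class. The name `load_class` is hypothetical and the helper in `config.py` may look different. The example resolves a standard-library class so that it runs without the package installed; in this PR the string would presumably come from a YAML config and name an encoder such as `DiscardHistoryEncoder`, whose exact module path is not shown here.

```python
import collections
import importlib


def load_class(dotted_path: str) -> type:
    """Resolve 'package.module.ClassName' to the class object it names."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(f"{module_name!r} has no attribute {class_name!r}") from exc


# Stand-in for a class path read from a config file.
cls = load_class("collections.OrderedDict")
instance = cls(a=1)
```

Keeping the class path in configuration means an ablation run can switch encoders by editing one string rather than changing imports in the training scripts.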