refactor: ML baseline into src-layout package with DQN online training #87

Merged
smly merged 11 commits into main from refactor/ml-demo-src-layout
Feb 8, 2026

@smly (Owner) commented Feb 7, 2026

Summary

  • Restructure demos/ml_baseline from flat scripts into a proper Python package with src/riichienv_ml/ layout, enabling clean imports and pip install -e .
  • Replace Actor-Critic (AWAC) online training with DQN + CQL, using Ray distributed workers for self-play data collection
  • Add configurable model/dataset/encoder classes via YAML configs and importlib-based dynamic loading
  • Add feature ablation datasets: DiscardHistoryEncoder (78ch) and DiscardHistoryShantenEncoder (94ch) alongside the baseline (74ch)
  • Add ChannelAttention (SE-block) to ResNet backbone for improved feature weighting
  • Add Boltzmann exploration with top-p (nucleus) sampling as a configurable alternative to epsilon-greedy
  • Add riichi declaration sequence validation test
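
The Boltzmann / top-p exploration item above can be illustrated with a small sketch. This is not the PR's implementation: the function name `boltzmann_top_p`, its parameters, and the legal-action mask are assumptions made for illustration, and it uses only the standard library rather than the project's tensor code. It applies a softmax over Q-values at a temperature, keeps the smallest set of actions whose probability mass reaches `top_p`, and samples from that set.

```python
import math
import random


def boltzmann_top_p(q_values, legal_mask, temperature=1.0, top_p=0.9, rng=random):
    """Sample an action index from softmax(Q / T) truncated to a top-p nucleus.

    Hypothetical helper: names and defaults are illustrative only.
    Assumes temperature > 0 and at least one legal action.
    """
    legal = [i for i, ok in enumerate(legal_mask) if ok]
    if not legal:
        raise ValueError("no legal action")
    # Softmax over legal actions only, shifted by the max for numerical stability.
    max_q = max(q_values[i] for i in legal)
    weights = {i: math.exp((q_values[i] - max_q) / temperature) for i in legal}
    total = sum(weights.values())
    probs = {i: w / total for i, w in weights.items()}
    # Keep the smallest set of highest-probability actions whose mass reaches top_p.
    ranked = sorted(legal, key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for i in ranked:
        nucleus.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # random.choices treats weights as relative, so the nucleus is renormalised implicitly.
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]
```

With a very small `top_p` the nucleus shrinks to the single best legal action, so the sampler behaves greedily; with `top_p=1.0` it reduces to plain Boltzmann sampling over all legal actions.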

@smly self-assigned this Feb 7, 2026
@smly added the enhancement (New feature or request) label Feb 7, 2026
@smly added this to the v0.3.0 milestone Feb 7, 2026
Copilot AI (Contributor) left a comment

Pull request overview

Refactors the demos/ml_baseline ML baseline into an installable src/-layout Python package (riichienv-ml) and updates the RL training stack to DQN(+CQL) with Ray-based self-play workers, plus adds a regression test for MJAI riichi reach→dahai sequencing.
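
As a rough illustration of the reach→dahai sequencing mentioned above, the check below scans a list of MJAI-style event dicts and verifies that each `reach` is immediately followed by a `dahai` from the same actor. It is a sketch, not the PR's regression test: the function name is hypothetical, and it assumes events carry `type` and `actor` keys as in the MJAI protocol.

```python
def reach_followed_by_dahai(events):
    """Return True if every 'reach' is immediately followed by the same actor's 'dahai'.

    Hypothetical checker over MJAI-style event dicts; illustrative only.
    """
    for idx, event in enumerate(events):
        if event.get("type") != "reach":
            continue
        # A reach at the very end of the log has no discard after it.
        if idx + 1 >= len(events):
            return False
        nxt = events[idx + 1]
        if nxt.get("type") != "dahai" or nxt.get("actor") != event.get("actor"):
            return False
    return True


valid_log = [
    {"type": "tsumo", "actor": 0, "pai": "5m"},
    {"type": "reach", "actor": 0},
    {"type": "dahai", "actor": 0, "pai": "9p", "tsumogiri": False},
    {"type": "reach_accepted", "actor": 0},
]
invalid_log = [
    {"type": "reach", "actor": 0},
    {"type": "dahai", "actor": 1, "pai": "9p", "tsumogiri": False},
]
```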

Changes:

  • Convert the baseline demo into a riichienv-ml package with YAML+pydantic config and importlib-based dynamic class loading.
  • Replace the prior online AWAC flow with DQN(+CQL) online training using Ray workers and add new encoders/datasets for feature ablations.
  • Add/refresh training scripts/docs and add a riichi action-sequence validation test.
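
The importlib-based dynamic class loading listed above typically follows the pattern sketched here. The helper name `load_class` and the config keys are assumptions for illustration, not the names used in `config.py`, and a plain dict stands in for the parsed YAML so the example runs with the standard library alone.

```python
import importlib


def load_class(dotted_path):
    """Resolve a 'package.module.ClassName' string to the class object.

    Hypothetical helper; the PR's actual function may differ.
    """
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Stand-in for a parsed YAML config; the keys are illustrative.
config = {"class": "collections.OrderedDict", "kwargs": {"a": 1}}
instance = load_class(config["class"])(**config["kwargs"])
```

Keeping the class path in configuration lets an ablation swap its encoder or dataset by editing a YAML file rather than the training script.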

Reviewed changes

Copilot reviewed 27 out of 35 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| tests/env/rule_validation/test_riichi_sequence.py | Adds regression tests for MJAI reach→dahai behavior and riichi-stage legal action handling. |
| demos/ml_baseline/uv.lock | Updates lockfile for renamed package and platform markers for CUDA-related deps. |
| demos/ml_baseline/unified_model.py | Removes legacy unified actor/critic model implementation. |
| demos/ml_baseline/train_online.py | Removes legacy online training entry point script. |
| demos/ml_baseline/src/riichienv_ml/utils.py | Introduces shared utility AverageMeter. |
| demos/ml_baseline/src/riichienv_ml/training/ray_actor.py | Adds Ray self-play worker for collecting transitions with configurable exploration. |
| demos/ml_baseline/src/riichienv_ml/training/online_trainer.py | Adds online DQN(+CQL) training loop coordinating Ray workers and evaluation. |
| demos/ml_baseline/src/riichienv_ml/training/learner.py | Adds DQN(+CQL) learner and checkpoint compatibility loading logic. |
| demos/ml_baseline/src/riichienv_ml/training/grp_trainer.py | Refactors GRP trainer to package imports and configurable hyperparams/output path. |
| demos/ml_baseline/src/riichienv_ml/training/cql_trainer.py | Refactors offline CQL trainer to config-driven dataset/model loading and adds grad clipping. |
| demos/ml_baseline/src/riichienv_ml/training/buffer.py | Simplifies replay buffer to a single TensorDict replay buffer. |
| demos/ml_baseline/src/riichienv_ml/training/__init__.py | Declares training package. |
| demos/ml_baseline/src/riichienv_ml/models/cql_model.py | Adds ResNet+SE backbone and Q-network for offline/online training. |
| demos/ml_baseline/src/riichienv_ml/models/grp_model.py | Adds GRP rank model and reward predictor used for reward shaping. |
| demos/ml_baseline/src/riichienv_ml/models/mortal_model.py | Adds Mortal model wrapper/engine code for integration/experimentation. |
| demos/ml_baseline/src/riichienv_ml/models/__init__.py | Declares models package. |
| demos/ml_baseline/src/riichienv_ml/data/grp_dataset.py | Adds GRP dataset for rank prediction training. |
| demos/ml_baseline/src/riichienv_ml/data/cql_dataset.py | Adds offline replay parsing dataset and new observation encoders/datasets for ablations. |
| demos/ml_baseline/src/riichienv_ml/data/__init__.py | Declares data package. |
| demos/ml_baseline/src/riichienv_ml/config.py | Adds pydantic config models, YAML loader, and dynamic class import helper. |
| demos/ml_baseline/src/riichienv_ml/__init__.py | Declares riichienv_ml package. |
| demos/ml_baseline/scripts/train_online.py | Adds config-driven CLI entry point for online training. |
| demos/ml_baseline/scripts/train_grp.py | Adds config-driven CLI entry point for GRP training. |
| demos/ml_baseline/scripts/train_cql.py | Adds config-driven CLI entry point for offline CQL training. |
| demos/ml_baseline/configs/baseline.yml | Adds baseline YAML configuration for 74ch setup. |
| demos/ml_baseline/configs/discard_history.yml | Adds 78ch discard-history ablation YAML configuration. |
| demos/ml_baseline/configs/discard_history_shanten.yml | Adds 94ch discard-history+shanten ablation YAML configuration. |
| demos/ml_baseline/ray_actor.py | Removes legacy Ray worker implementation. |
| demos/ml_baseline/learner.py | Removes legacy AWAC learner implementation. |
| demos/ml_baseline/cql_model.py | Removes legacy CQL model implementation. |
| demos/ml_baseline/cql_dataset.py | Removes legacy offline dataset implementation. |
| demos/ml_baseline/actor_model.py | Removes legacy actor-only network implementation. |
| demos/ml_baseline/pyproject.toml | Renames project to riichienv-ml, adds build-system + src-layout packaging config, and adds new deps. |
| demos/ml_baseline/README.md | Updates documentation to new package structure and config-driven scripts. |
| demos/ml_baseline/.gitignore | Ignores checkpoints, wandb runs, venv, caches, and egg-info outputs. |
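
The file summary lists a DQN(+CQL) learner. For readers unfamiliar with the combination, the sketch below computes the loss for a single transition in plain Python: a one-step TD error plus the standard discrete-action CQL regularizer, logsumexp over Q(s, ·) minus Q(s, a). The function names, the squared-error TD term, and the default hyperparameters are assumptions; the PR's learner may differ, for example in its loss function or in masking illegal actions.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def dqn_cql_loss(q_s, action, reward, q_next_target, done, gamma=0.99, cql_alpha=1.0):
    """Single-transition DQN loss with a CQL penalty (illustrative sketch).

    q_s:            online-network Q-values for the current state
    q_next_target:  target-network Q-values for the next state
    """
    # One-step TD target; no bootstrap term on terminal transitions.
    target = reward + (0.0 if done else gamma * max(q_next_target))
    td_loss = 0.5 * (q_s[action] - target) ** 2
    # CQL term: pushes down Q-values overall while pushing up the taken action.
    cql_penalty = logsumexp(q_s) - q_s[action]
    return td_loss + cql_alpha * cql_penalty
```

Setting `cql_alpha=0.0` recovers the plain DQN squared TD error, which makes the contribution of the conservative term easy to isolate.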

@smly merged commit ae6e65f into main Feb 8, 2026
4 checks passed
@smly deleted the refactor/ml-demo-src-layout branch February 8, 2026 00:56