feat: improve ml baseline and native environment interaction by smly · Pull Request #84 · smly/RiichiEnv

smly · 2026-02-03T17:36:38Z

Resolves #34

Updated the ML baseline components (buffer.py, cql_dataset.py, cql_model.py, learner.py, ray_actor.py, train_online.py, unified_model.py) in demos/ml_baseline for enhanced performance and functionality.
Modified native observation, state handling, and Yaku checking logic in native/src to support improved data for ML models.
Revised docs/FEATURE_ENCODING.md to reflect changes in feature representation for the ML models.
Synchronized Python type stub src/riichienv/_riichienv.pyi with native API updates.


Copilot

Pull request overview

This PR implements comprehensive feature extraction APIs for Mahjong domain features to support ML model training, addressing issue #34. The changes span both native Rust code and Python ML baseline components, introducing new observation encoding methods, yaku (winning hand) possibility checking, shanten calculation, and state tracking for riichi discards and hand discards.

Changes:

Added extensive feature encoding methods to observation (74-channel standard encoding plus specialized encodings for yaku possibility, discard history decay, shanten efficiency, etc.)
Implemented rule-based yaku possibility checker with 21 yaku types
Added shanten calculation and tile efficiency features
Enhanced game state tracking with riichi_sutehais and last_tedashis fields
Migrated ML training algorithm from PPO to AWAC (off-policy actor-critic)
Updated documentation to reflect all 74 encoding channels and alternative encoding methods
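To make the idea of per-tile encoding channels concrete, here is a minimal sketch. It is not the PR's actual channel layout, which is specified in docs/FEATURE_ENCODING.md and not reproduced in this thread. It assumes 136-format tile IDs where `id / 4` yields one of 34 tile types (consistent with the `t / 4` used in the PR's Rust code); the function name `encode_hand_planes` and the four-plane "count threshold" scheme are hypothetical choices for illustration only.

```rust
/// Encode a hand (136-format tile IDs, assumed to be in 0..=135) into
/// 4 planes of 34 cells. Plane k is 1.0 at tile type t when the hand
/// holds more than k copies of t. Hypothetical layout, for illustration.
fn encode_hand_planes(hand: &[u8]) -> [[f32; 34]; 4] {
    // Count how many copies of each of the 34 tile types are held.
    let mut counts = [0usize; 34];
    for &id in hand {
        counts[(id / 4) as usize] += 1;
    }

    // Turn each count n into n leading ones across the 4 planes.
    let mut planes = [[0.0f32; 34]; 4];
    for (t, &n) in counts.iter().enumerate() {
        for plane in planes.iter_mut().take(n.min(4)) {
            plane[t] = 1.0;
        }
    }
    planes
}
```

For a hand holding three tiles of type 0 and one tile of type 10, planes 0 to 2 are set at index 0 while only plane 0 is set at index 10. A full observation would stack many such planes, one group per feature.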

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 12 comments.


File	Description
native/src/yaku_checker.rs	New module for rule-based yaku possibility detection (21 yaku types)
native/src/shanten.rs	New module for shanten calculation and tile efficiency features
native/src/observation.rs	Added 9 new encoding methods (encode_yaku_possibility, encode_discard_history_decay, etc.) and expanded standard encoding to 74 channels
native/src/state/mod.rs	Added riichi_sutehais and last_tedashis tracking fields
native/src/state/legal_actions.rs	Modified ankan logic for post-riichi scenarios
native/src/lib.rs	Registered new shanten and yaku_checker modules
native/src/tests.rs	Added tobi (bankruptcy) game ending tests
src/riichienv/_riichienv.pyi	Added type hints for new encoding methods
docs/FEATURE_ENCODING.md	Comprehensive documentation of all 74 channels plus alternative encodings
demos/ml_baseline/unified_model.py	Maintained backward compatibility with legacy 46-channel encoding
demos/ml_baseline/train_online.py	Migrated to AWAC algorithm with unified buffer
demos/ml_baseline/ray_actor.py	Updated workers to use legacy encoding
demos/ml_baseline/learner.py	Implemented AWAC with dynamic CQL alpha scheduling
demos/ml_baseline/cql_model.py	Expanded Q-Network to support 110-channel spatial + 3025 non-spatial features
demos/ml_baseline/cql_dataset.py	Added encode_legacy method for backward compatibility
demos/ml_baseline/buffer.py	Switched from prioritized to standard replay buffer
tests/env/test_honba_reset.py	Updated test hands to use correct tile IDs and added tobi prevention
tests/env/actions/test_action_to_mjai.py	Added new observation constructor parameters
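The table mentions a migration to AWAC. As a rough illustration of the core idea (not the code in demos/ml_baseline/learner.py, which is Python and is not shown in this thread), AWAC weights each sample's log-likelihood by `exp(advantage / lambda)`. The sketch below uses Rust only to match the diff in this PR; the function names, the `max_weight` clip, and the use of `Q - V` as the advantage estimate are assumptions made for the example.

```rust
/// AWAC-style per-sample weights: exp((Q - V) / lambda), clipped at
/// `max_weight` (the clip is an assumed practical safeguard).
fn awac_weights(q: &[f32], v: &[f32], lambda: f32, max_weight: f32) -> Vec<f32> {
    q.iter()
        .zip(v)
        .map(|(&q, &v)| ((q - v) / lambda).exp().min(max_weight))
        .collect()
}

/// Actor loss: negative mean of weight * log pi(a|s) over the batch.
/// Returns 0.0 for an empty batch to avoid dividing by zero.
fn awac_actor_loss(log_probs: &[f32], weights: &[f32]) -> f32 {
    if log_probs.is_empty() {
        return 0.0;
    }
    let sum: f32 = log_probs.iter().zip(weights).map(|(&lp, &w)| lp * w).sum();
    -sum / log_probs.len() as f32
}
```

Actions with a positive advantage receive a weight above 1 and are imitated more strongly; actions with a negative advantage are down-weighted rather than discarded, which is what makes the update usable on off-policy replay data.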



Copilot · 2026-02-06T00:54:09Z

native/src/state/legal_actions.rs

+                    if let Some(t) = self.drawn_tile {
+                        let t34 = t / 4;
+                        if counts[t34 as usize] == 4 {
+                            // Check waits
+                            let mut hand_pre = self.players[pid_us].hand.clone();
+                            if let Some(pos) = hand_pre.iter().position(|&x| x == t) {
+                                hand_pre.remove(pos);
+                            }
+                            let calc_pre = crate::agari_calculator::AgariCalculator::new(
+                                hand_pre,
+                                self.players[pid_us].melds.clone(),
+                            );
+                            let mut waits_pre = calc_pre.get_waits();
+                            waits_pre.sort();

-                        let mut hand_post = self.players[pid_us].hand.clone();
-                        hand_post.retain(|&x| x / 4 != t34);
-                        let mut melds_post = self.players[pid_us].melds.clone();
-                        let lowest = t34 * 4;
-                        melds_post.push(Meld::new(
-                            MeldType::Angang,
-                            vec![lowest, lowest + 1, lowest + 2, lowest + 3],
-                            false,
-                            -1,
-                        ));
-                        let calc_post =
-                            crate::agari_calculator::AgariCalculator::new(hand_post, melds_post);
-                        let mut waits_post = calc_post.get_waits();
-                        waits_post.sort();
+                            let mut hand_post = self.players[pid_us].hand.clone();
+                            hand_post.retain(|&x| x / 4 != t34);
+                            let mut melds_post = self.players[pid_us].melds.clone();
+                            let lowest = t34 * 4;
+                            melds_post.push(Meld::new(
+                                MeldType::Angang,
+                                vec![lowest, lowest + 1, lowest + 2, lowest + 3],
+                                false,
+                                -1,
+                            ));
+                            let calc_post = crate::agari_calculator::AgariCalculator::new(
+                                hand_post, melds_post,
+                            );
+                            let mut waits_post = calc_post.get_waits();
+                            waits_post.sort();


The Ankan after riichi logic has a subtle bug. Line 154 removes one copy of the drawn tile from hand_pre to simulate discarding it, but then the waits calculation should be done on the 13-tile hand (before drawing). However, hand_post on line 164 retains all tiles except those used in the kan, which would be a 10-tile hand (14 - 4). This asymmetry in the comparison may lead to incorrect wait calculation.

The correct approach: Since the player has 14 tiles (13 + drawn), for the "pre-kan" state they should check waits with the 13-tile hand (excluding the tile they would discard). For the "post-kan" state, they remove all 4 tiles for kan and check waits on the remaining hand. Both should be calculated from the same baseline state for a fair comparison.

@copilot Looking at the code:

self.players[pid_us].hand contains 14 tiles (13 + drawn tile)

counts[t34 as usize] == 4 means all 4 copies of tile type t34 are in hand

drawn_tile is one of those 4 tiles

hand_pre calculation:

Starts with 14-tile hand

Removes ONE instance of t (the drawn tile)

Results in 13 tiles, but still contains 3 copies of tile type t34

Waits are calculated on this 13-tile hand

hand_post calculation:

Starts with 14-tile hand

Removes ALL tiles where x / 4 == t34 (all 4 copies)

Results in 10 tiles

Adds ankan meld (4 tiles as a meld)

Waits are calculated on 10-tile hand + 1 meld
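The tile counts above can be checked with a small standalone sketch that mirrors the two hand manipulations from the diff. It assumes tile IDs are `u8` values in 136-format (the actual element type in `GameState` is not shown in this thread), and `split_for_ankan` is a hypothetical helper name.

```rust
/// Given a 14-tile hand and the drawn tile, return (hand_pre, hand_post):
/// hand_pre  = hand with the drawn tile removed once (13 tiles),
/// hand_post = hand with every tile of the drawn tile's type removed.
fn split_for_ankan(hand: &[u8], drawn: u8) -> (Vec<u8>, Vec<u8>) {
    let t34 = drawn / 4; // 136-format ID -> one of 34 tile types

    // Same steps as the diff: remove exactly one instance of the drawn tile.
    let mut hand_pre = hand.to_vec();
    if let Some(pos) = hand_pre.iter().position(|&x| x == drawn) {
        hand_pre.remove(pos);
    }

    // Same steps as the diff: drop all copies of that tile type.
    let mut hand_post = hand.to_vec();
    hand_post.retain(|&x| x / 4 != t34);

    (hand_pre, hand_post)
}
```

With a 14-tile hand that holds all four copies of the drawn tile's type, `hand_pre` has 13 tiles (three of that type remain) and `hand_post` has 10 tiles.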

The comparison is actually correct because:

Pre-kan scenario: Player discards the drawn tile → 13-tile hand with 3 of the same tile remaining

Post-kan scenario: Player declares ankan → 10-tile hand + ankan meld

Both represent valid game states after the player's action. The key insight is that in the pre-kan case, the 3 remaining tiles of type t34 are still in the closed hand and contribute to wait calculation. In the post-kan case, those 4 tiles are now in a meld and don't contribute to the closed hand waits.

The logic appears correct for the standard Riichi Mahjong rule: ankan after riichi is only allowed if it doesn't change the waits.
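The rule can be sketched end to end as follows. This is a simplified stand-in for `AgariCalculator::get_waits`, whose implementation is not shown here: it works on 34-type tile counts, handles only standard hands (sets plus one pair, no chiitoitsu or kokushi), and skips tile types already held four times. All names in the block (`counts`, `remove_sets`, `waits`, `ankan_keeps_waits`) are hypothetical.

```rust
/// Tile counts by type: 0-8 man, 9-17 pin, 18-26 sou, 27-33 honors.
type Counts = [u8; 34];

/// Build a count array from tile-type indices.
fn counts(tiles: &[usize]) -> Counts {
    let mut c = [0u8; 34];
    for &t in tiles {
        c[t] += 1;
    }
    c
}

/// True if `c` splits into exactly `n` sets (triplets or runs), nothing left.
fn remove_sets(c: &mut Counts, n: usize) -> bool {
    // The lowest remaining tile must start a triplet or a run.
    let i = match c.iter().position(|&x| x > 0) {
        Some(i) => i,
        None => return n == 0,
    };
    if n == 0 {
        return false;
    }
    if c[i] >= 3 {
        c[i] -= 3;
        let ok = remove_sets(c, n - 1);
        c[i] += 3;
        if ok {
            return true;
        }
    }
    // Runs exist only inside a suit and cannot start above 7.
    if i < 27 && i % 9 <= 6 && c[i + 1] > 0 && c[i + 2] > 0 {
        c[i..i + 3].iter_mut().for_each(|x| *x -= 1);
        let ok = remove_sets(c, n - 1);
        c[i..i + 3].iter_mut().for_each(|x| *x += 1);
        if ok {
            return true;
        }
    }
    false
}

/// Tile types that complete `sets` sets plus one pair (standard hands only).
fn waits(hand: &Counts, sets: usize) -> Vec<usize> {
    (0..34)
        .filter(|&t| {
            if hand[t] >= 4 {
                return false;
            }
            let mut c = *hand;
            c[t] += 1;
            // Try every possible pair, then split the rest into sets.
            (0..34).any(|p| {
                if c[p] < 2 {
                    return false;
                }
                c[p] -= 2;
                let ok = remove_sets(&mut c, sets);
                c[p] += 2;
                ok
            })
        })
        .collect()
}

/// Ankan after riichi is allowed only if the waits are unchanged.
/// `hand14` includes the drawn tile; `called_melds` counts earlier kans.
fn ankan_keeps_waits(hand14: &Counts, kan: usize, called_melds: usize) -> bool {
    if hand14[kan] != 4 || called_melds > 3 {
        return false;
    }
    let sets = 4 - called_melds;
    let mut pre = *hand14;
    pre[kan] -= 1; // discard the drawn tile: 13 tiles, 3 copies remain
    let mut post = *hand14;
    post[kan] = 0; // declare ankan: 10 tiles plus one extra meld
    let waits_pre = waits(&pre, sets);
    !waits_pre.is_empty() && waits_pre == waits(&post, sets - 1)
}
```

For example, a hand of 1m x4, 234p, 567p, 78s, and a pair of East keeps its 6s/9s wait after the kan, so `ankan_keeps_waits` returns `true`. A hand of 1m x4, 2m, 234p, 567p, 789s waits on 2m and 3m before the kan but only on 2m after it, so the kan is rejected.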


Copilot · 2026-02-06T01:46:10Z

@smly I've opened a new pull request, #86, to work on those changes. Once the pull request is ready, I'll request review from you.

feat: refine native observation, state, and yaku checking logic

bc134df

smly self-assigned this Feb 3, 2026

smly added the enhancement New feature or request label Feb 3, 2026

smly added this to the v0.3.0 milestone Feb 3, 2026

smly added 4 commits February 4, 2026 18:58

feat: enhance weight loading logic for UnifiedNetwork and QNetwork, add bankruptcy check in game state

4494c9d

feat: implement shanten number calculation and related functions

f5d81cc

refactor: apply cargo fmt

ffa7d02

feat: remove commented-out fields in GameState

758a53b

smly requested a review from Copilot February 6, 2026 00:47

Copilot started reviewing on behalf of smly February 6, 2026 00:47

Copilot AI reviewed Feb 6, 2026

smly added 6 commits February 6, 2026 00:56

feat: clean up buffer.py and unified_model.py by removing unnecessary comments and imports

b26cb67

feat: remove unnecessary comments and debug print statements across multiple files

1f32c97

feat: update tile encoding comment for clarity in observation.rs

8ca8461

feat: update Chiitoitsu yaku comment and logic for clarity in observation.rs

30461b5

feat: fix tile type checks for yaku possibility functions in yaku_checker.rs

1ea1dac

feat: simplify update_critic method and remove unused priority logic in learner and buffer

e93fab2

Copilot AI mentioned this pull request Feb 6, 2026

Clarify Ankan wait calculation logic - no code changes needed #86

Closed

smly merged commit a1e3d55 into main Feb 6, 2026
4 checks passed

smly deleted the feat/add-feature-encodes branch February 6, 2026 01:49

smly merged 11 commits into main from feat/add-feature-encodes
