Open
Description
The test TestPlanningTripleDataset::test_split_independence in tests/data/test_planning_dataset.py is flaky due to random seed sensitivity in dataset splitting.
Error
FAILED tests/data/test_planning_dataset.py::TestPlanningTripleDataset::test_split_independence
AssertionError: Train ratio 0.5711529184756392 not ~0.7
assert 0.65 < 0.5711529184756392
Root Cause
The test expects approximate split ratios (train ~0.7, val ~0.15, test ~0.15), but random sampling makes the observed ratios vary from run to run. The test uses:
assert 0.65 < train_ratio < 0.75, f"Train ratio {train_ratio} not ~0.7"
However, with a small dataset or a random number generator that is not seeded deterministically, the actual ratio can fall outside this range, as in the failure above (train ratio 0.571).
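To get a feel for how much variance to expect, here is a rough sketch. It assumes each sample is assigned to the train split independently with probability 0.7 (a binomial model). That assumption may not match how the dataset actually splits, and the helper names `ratio_std` and `min_samples_for_tolerance` are hypothetical, not part of the project.

```python
import math


def ratio_std(n_samples: int, p: float = 0.7) -> float:
    """Standard deviation of the observed split ratio.

    Assumes each sample lands in the split independently with
    probability p (binomial model); this is an assumption, not
    necessarily what the dataset code does.
    """
    return math.sqrt(p * (1.0 - p) / n_samples)


def min_samples_for_tolerance(tol: float, p: float = 0.7, z: float = 3.0) -> int:
    """Smallest sample count for which z standard deviations fit within +/- tol."""
    return math.ceil(z * z * p * (1.0 - p) / (tol * tol))


if __name__ == "__main__":
    # Show how the spread of the train ratio shrinks as the dataset grows.
    for n in (50, 100, 500, 1000):
        print(f"n={n:5d}  std of train ratio = {ratio_std(n):.4f}")
    # Sample count needed before a +/-0.05 band covers three standard deviations.
    print("samples needed for +/-0.05 at 3 sigma:", min_samples_for_tolerance(0.05))
```

Under this model the spread scales as 1/sqrt(n), so a fixed ±0.05 band is only safe once the split test uses several hundred samples.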
Impact
- Severity: Low
- Scope: Test infrastructure only
- User Impact: None - does not affect production code
- Merge Impact: Does not block merges (24/25 tests pass)
Discovered In
- PR: Phase 1b.1: Sync dataset-planning with main infrastructure #18
- Context: Branch merge validation testing
- Test Command:
pytest tests/data/test_planning_dataset.py -v
Suggested Fixes
- Widen tolerance: Change assertions to allow more variance. Note that the observed ratio of 0.571 would still fail the range below, so this alone may not be enough.
assert 0.60 < train_ratio < 0.80, f"Train ratio {train_ratio} not ~0.7"
- Fix random seed: Ensure deterministic seeding before the split
random.seed(42)
torch.manual_seed(42)
- Use larger sample: Increase dataset size for the split test to reduce variance
- Statistical approach: Use confidence intervals instead of hard thresholds
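The seeding and statistical suggestions could be combined roughly as follows. This is a minimal standard-library sketch, not the project's code: `split_indices` is a hypothetical stand-in for the real split logic, and `ratio_tolerance` assumes a binomial model for the observed ratio. It uses a local `random.Random(seed)` rather than global seeding so that other tests cannot disturb the generator state.

```python
import math
import random


def split_indices(n, ratios=(0.7, 0.15, 0.15), seed=42):
    """Hypothetical stand-in for the dataset's split logic.

    Shuffles indices with a local, seeded generator and cuts by count,
    so the result is identical on every run with the same seed.
    """
    rng = random.Random(seed)  # local RNG, independent of global random state
    indices = list(range(n))
    rng.shuffle(indices)
    n_train = round(ratios[0] * n)
    n_val = round(ratios[1] * n)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


def ratio_tolerance(n, p, z=4.0):
    """Allowed deviation: z standard deviations under a binomial model, at least 1/n."""
    return max(z * math.sqrt(p * (1.0 - p) / n), 1.0 / n)


def check_split(n=200):
    """Assert the splits are disjoint, complete, and close to the target ratios."""
    train, val, test = split_indices(n)
    # Independence: no index is shared between splits and none is lost.
    assert set(train).isdisjoint(val)
    assert set(train).isdisjoint(test)
    assert set(val).isdisjoint(test)
    assert len(train) + len(val) + len(test) == n
    # Ratio checks with a tolerance that scales with the sample size.
    for part, p in ((train, 0.7), (val, 0.15), (test, 0.15)):
        ratio = len(part) / n
        assert abs(ratio - p) <= ratio_tolerance(n, p), f"ratio {ratio} not ~{p}"
    return len(train), len(val), len(test)
```

With a fixed seed the test becomes reproducible, and the size-dependent tolerance keeps it meaningful if the split implementation later switches to per-sample random assignment.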
Related
- Tests: tests/data/test_planning_dataset.py::TestPlanningTripleDataset::test_split_independence
- Merge PRs: Phase 1a.1: Merge 3-level hierarchy into dataset-planning baseline #16, Phase 1a.2: Merge 3-level hierarchy into dataset-causal baseline #17, Phase 1b.1: Sync dataset-planning with main infrastructure #18, Phase 1b.2: Sync dataset-causal with main infrastructure #19
Labels
- bug
- tests
- flaky-test
- low-priority
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>