Flaky test: test_split_independence in planning dataset tests #20

@research-developer

Description

The test TestPlanningTripleDataset::test_split_independence in tests/data/test_planning_dataset.py is flaky due to random seed sensitivity in dataset splitting.

Error

FAILED tests/data/test_planning_dataset.py::TestPlanningTripleDataset::test_split_independence
AssertionError: Train ratio 0.5711529184756392 not ~0.7
assert 0.65 < 0.5711529184756392

Root Cause

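The test expects the split ratios to land near their targets (train ~0.7, val ~0.15, test ~0.15) and checks the train ratio against a hard ±0.05 window, but random sampling makes the realized ratios vary from run to run. The test uses: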

assert 0.65 < train_ratio < 0.75, f"Train ratio {train_ratio} not ~0.7"

However, with a small dataset or an unseeded random number generator, the actual ratio can fall outside this range, as it did in the failure above (0.571).
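To get a feel for how dataset size drives this variance, here is a rough simulation sketch. It assumes each item is assigned to the train split independently with probability 0.7; the actual splitting logic in the planning dataset is not shown in this issue and may differ (for example shuffle-and-slice or hash-based assignment). The names `simulate_train_ratio` and `failure_rate` are hypothetical helpers, not part of the codebase.

```python
import random


def simulate_train_ratio(n_items, train_frac=0.7, seed=None):
    """Train ratio when each item joins train independently with
    probability train_frac (an ASSUMED split scheme, not the real one)."""
    rng = random.Random(seed)
    n_train = sum(1 for _ in range(n_items) if rng.random() < train_frac)
    return n_train / n_items


def failure_rate(n_items, trials=2000, low=0.65, high=0.75):
    """Fraction of simulated splits whose train ratio falls outside
    the (low, high) window used by the current assertion."""
    failures = 0
    for trial in range(trials):
        ratio = simulate_train_ratio(n_items, seed=trial)
        if not (low < ratio < high):
            failures += 1
    return failures / trials


if __name__ == "__main__":
    # Compare how often the 0.65 < ratio < 0.75 check would fail
    # for a few illustrative dataset sizes.
    for n in (20, 100, 1000):
        print(f"n={n}: failure rate {failure_rate(n):.3f}")
```

Under this assumption the ±0.05 window fails far more often for small datasets than for large ones, which matches the intermittent failures described here.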

Impact

  • Severity: Low
  • Scope: Test infrastructure only
  • User Impact: None - does not affect production code
  • Merge Impact: Does not block merges (24/25 tests pass)

Discovered In

Suggested Fixes

  1. Widen tolerance: Change assertions to allow more variance

    assert 0.60 < train_ratio < 0.80, f"Train ratio {train_ratio} not ~0.7"
  2. Fix random seed: Ensure deterministic seeding before split

    random.seed(42)
    torch.manual_seed(42)
  3. Use larger sample: Increase dataset size for split test to reduce variance

  4. Statistical approach: Use confidence intervals instead of hard thresholds
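Fixes 2 and 4 could be combined roughly as sketched below: seed a local RNG so the test is deterministic, and derive the tolerance from the binomial standard error instead of hard-coding it. This is only a sketch; `split_indices` is a hypothetical stand-in for the real dataset split (assumed here to be independent per-item assignment), and `ratio_bounds` and the choice of `z=4.0` are illustrative assumptions rather than anything taken from the test suite.

```python
import math
import random


def ratio_bounds(n_items, expected=0.7, z=4.0):
    """Return (low, high): expected ratio +/- z binomial standard errors."""
    stderr = math.sqrt(expected * (1.0 - expected) / n_items)
    return max(0.0, expected - z * stderr), min(1.0, expected + z * stderr)


def split_indices(n_items, train_frac=0.7, seed=42):
    """Hypothetical stand-in for the dataset split: each index goes to
    train independently with probability train_frac."""
    rng = random.Random(seed)  # local RNG, global random state is untouched
    return [i for i in range(n_items) if rng.random() < train_frac]


def test_train_ratio_within_bounds():
    n_items = 200
    train = split_indices(n_items, seed=42)
    train_ratio = len(train) / n_items
    low, high = ratio_bounds(n_items)
    assert low <= train_ratio <= high, (
        f"Train ratio {train_ratio} outside [{low:.3f}, {high:.3f}]"
    )
```

With a fixed seed the test gives the same result on every run, and the size-aware bounds keep the check meaningful if the fixture size changes later. If the real split is not per-item random assignment, the bounds would need a different derivation.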

Related

Labels

  • bug
  • tests
  • flaky-test
  • low-priority

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
