feat: iterative LOC bounds refinement with --min-loc and --max-loc #6

@vitali87

Description

Summary

Add optional iterative refinement so the LLM re-plans until every group satisfies user-supplied [min_loc, max_loc] bounds or an iteration limit is reached. Currently max_loc is a soft limit (warnings only) and there is no min_loc. This proposal adds the missing lower bound and an EM-style feedback loop that alternates between detecting violations and re-planning until the plan is valid or the iteration limit is hit.

Motivation

When splitting large PRs, groups that are too small create noise for reviewers, and groups that are too large defeat the purpose of splitting. Users should be able to express both a floor and a ceiling on group size and have the tool automatically re-plan until all groups comply.

Proposed Behaviour

  1. CLI options: a new --min-loc (default 50) alongside the existing --max-loc (default 400).
  2. After the initial plan (single-shot or chunked), a refinement loop runs:
    • Detect violations: identify groups outside [min_loc, max_loc].
    • Build a focused prompt: include the current group catalog, a structured violation report, and only the diff hunks belonging to violating groups (keeps context small).
    • LLM re-plans: returns a complete updated plan (all groups, not just changed ones).
    • Repeat until no violations remain or iteration limit is reached.
  3. --max-refinement-iterations (default 3) caps the loop. If violations persist after exhaustion, the tool logs a warning and proceeds with the best-effort result.
  4. Validation: --min-loc must be strictly less than --max-loc; the tool errors out otherwise.
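
The loop in steps 2 to 4 could look roughly like the sketch below. It is an illustration under assumptions, not the proposed implementation: `Group`, `check_bounds`, `find_violations`, `refine`, and the `replan` callable (a stand-in for the LLM call) are hypothetical names, and "best-effort result" is taken here to mean the last plan produced, which the proposal does not specify.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Group:
    """Hypothetical stand-in for one group in the split plan."""
    name: str
    loc: int


Plan = list[Group]


def check_bounds(min_loc: int, max_loc: int) -> None:
    # Step 4: --min-loc must be strictly less than --max-loc.
    if min_loc >= max_loc:
        raise ValueError(
            f"--min-loc ({min_loc}) must be strictly less than --max-loc ({max_loc})"
        )


def find_violations(plan: Plan, min_loc: int, max_loc: int) -> list[Group]:
    # A group violates the bounds when its size falls outside [min_loc, max_loc].
    return [g for g in plan if not (min_loc <= g.loc <= max_loc)]


def refine(
    plan: Plan,
    replan: Callable[[Plan, list[Group]], Plan],
    min_loc: int = 50,
    max_loc: int = 400,
    max_iterations: int = 3,
) -> tuple[Plan, list[Group]]:
    """Re-plan until no violations remain or the iteration cap is reached.

    Returns the final plan and any violations still present, so the caller
    can log a warning and proceed (assumed meaning of "best-effort").
    """
    check_bounds(min_loc, max_loc)
    violations = find_violations(plan, min_loc, max_loc)
    for _ in range(max_iterations):
        if not violations:
            break
        # The replan callable returns a complete updated plan, not a delta.
        plan = replan(plan, violations)
        violations = find_violations(plan, min_loc, max_loc)
    return plan, violations
```

Returning the leftover violations alongside the plan keeps the warn-and-proceed decision in the caller, which would also leave room for a stricter mode later.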

Affected Areas

| Area | Change |
| --- | --- |
| constants.py | DEFAULT_MIN_LOC, DEFAULT_MAX_REFINEMENT_ITERATIONS, ViolationType enum |
| types_defs.py | LOCViolation named tuple |
| config.py | min_loc, max_refinement_iterations fields with cross-field validation |
| schemas.py | min_loc field on SplitPlan (backward compatible default) |
| logs.py | Refinement lifecycle log messages |
| exceptions.py | MIN_LOC_GE_MAX_LOC error message |
| prompts.py | Updated system prompt mentioning both bounds; new refinement prompt template |
| validator.py | detect_loc_violations() for structured violation detection; updated validate_loc_bounds |
| chunker.py | format_violation_report() formatter |
| client.py | _build_violation_diff() and _refine_groups() core loop; called after initial plan |
| cli.py | --min-loc and --max-refinement-iterations options |
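
To make the violation-related pieces concrete, here is one possible shape for them. Only the names `ViolationType`, `LOCViolation`, `detect_loc_violations`, and `format_violation_report` come from the list above; the enum members, tuple fields, function signatures, and report wording are all assumptions made for illustration.

```python
from enum import Enum
from typing import NamedTuple


class ViolationType(Enum):
    # Member names are assumed; the proposal only names the enum.
    BELOW_MIN = "below_min"
    ABOVE_MAX = "above_max"


class LOCViolation(NamedTuple):
    # Field layout is assumed; the proposal only says "named tuple".
    group: str
    loc: int
    kind: ViolationType
    bound: int


def detect_loc_violations(
    group_locs: dict[str, int], min_loc: int, max_loc: int
) -> list[LOCViolation]:
    """Return one structured record per group outside [min_loc, max_loc]."""
    violations = []
    for name, loc in group_locs.items():
        if loc < min_loc:
            violations.append(LOCViolation(name, loc, ViolationType.BELOW_MIN, min_loc))
        elif loc > max_loc:
            violations.append(LOCViolation(name, loc, ViolationType.ABOVE_MAX, max_loc))
    return violations


def format_violation_report(violations: list[LOCViolation]) -> str:
    """Render violations as plain text suitable for embedding in a prompt."""
    if not violations:
        return "No LOC violations."
    lines = []
    for v in violations:
        relation = "below min" if v.kind is ViolationType.BELOW_MIN else "above max"
        lines.append(f"- {v.group}: {v.loc} LOC ({relation} {v.bound})")
    return "\n".join(lines)
```

Keeping detection (structured records) separate from formatting (prompt text) means the same records could feed both the refinement prompt and the log messages.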

Open Questions

  • Should the refinement loop be opt-in (e.g. --refine) or always active when bounds are supplied?
  • Should there be a --strict flag that errors out instead of proceeding when refinement cannot converge?
