Investigate and optimize APE-RV LLM coordinate mapping (37% no_match rate) #43

@phtcosta

Problem

APE-RV's LLM integration (aperv:sata_mop_llm) has a 37.3% no_match rate: 3,554 of 9,525 LLM calls return coordinates that cannot be mapped to a ModelAction. Each no_match costs 1-3s of LLM overhead with no benefit, and the algorithmic fallback action may be suboptimal.
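
As a rough sanity check on these figures, the snippet below recomputes the rate and bounds the total wasted time. It assumes every no_match costs between 1 s and 3 s, as stated above; the variable names are illustrative only.

```python
# Figures quoted in the problem statement.
total_calls = 9_525
no_match_calls = 3_554

no_match_rate = no_match_calls / total_calls

# Overhead bounds, assuming each no_match costs between 1 s and 3 s.
wasted_min_s = no_match_calls * 1
wasted_max_s = no_match_calls * 3

print(f"no_match rate: {no_match_rate:.1%}")
print(f"wasted LLM time: {wasted_min_s / 60:.0f} to {wasted_max_s / 60:.0f} minutes")
```

Under that assumption, the no_match calls add up to roughly one to three hours of LLM time across the counted calls.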

Impact: in exp3, aperv:sata_mop_llm (27.60% method coverage) did NOT outperform the non-LLM baseline aperv:sata_mop_v1 (28.35%, p=0.014). Reducing no_match from 37% to <20% would let more LLM-chosen actions actually execute, which could unlock the LLM's potential.

Root causes identified (from architectural analysis in docs/20260318_aperv_coordenadas_gh46.md):

  1. Timing gap: the LLM sees a fresh screenshot, but matching uses stale bounds from an earlier UIAutomator dump
  2. Over-abstraction: GUITreeBuilder filters out widgets that exist in the accessibility tree
  3. Prompt format: the current prompt may not be optimal for the 4B model (Qwen3-VL)
  4. Matching algorithm: Fixed tolerances may miss edge cases
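
To illustrate how causes 1 and 4 interact, here is a minimal sketch of tolerance-based point-to-widget matching. It is not the actual mapToModelAction implementation: the `Widget` class, `match_point`, the 20 px tolerance, and the example bounds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Widget:
    """Hypothetical stand-in for the on-screen target of a ModelAction."""
    name: str
    left: int
    top: int
    right: int
    bottom: int

    def area(self) -> int:
        return (self.right - self.left) * (self.bottom - self.top)

    def contains(self, x: int, y: int, tolerance: int = 0) -> bool:
        # The bounds are grown by `tolerance` pixels on every side.
        return (self.left - tolerance <= x <= self.right + tolerance
                and self.top - tolerance <= y <= self.bottom + tolerance)


def match_point(x: int, y: int, widgets: Sequence[Widget],
                tolerance: int = 0) -> Optional[Widget]:
    """Return the smallest widget containing the point, or None (a no_match)."""
    hits = [w for w in widgets if w.contains(x, y, tolerance)]
    return min(hits, key=Widget.area) if hits else None


# Bounds from an earlier dump versus bounds after the layout moved 100 px down.
stale = [Widget("ok_button", 100, 900, 300, 980)]
fresh = [Widget("ok_button", 100, 1000, 300, 1080)]
click = (200, 1040)  # button center as seen in the fresh screenshot

print(match_point(*click, stale, tolerance=20))  # no widget found
print(match_point(*click, fresh, tolerance=20))  # finds ok_button
```

In this toy case a fixed tolerance cannot compensate for stale bounds: the click is correct for the fresh screenshot but falls outside the outdated rectangle.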

Approach

Create a new aperv-llm-validation module that replicates the APE-RV LLM pipeline offline (ImageProcessor, ApePromptBuilder, ToolCallParser, CoordinateNormalizer, mapToModelAction) against 468 existing screenshots with UIAutomator ground truth. This enables:

  1. Phase B (Prompt Optimization): Test 5-6 prompt variants with a reasoning parameter to understand LLM intent. Identify the optimal prompt for the 4B model.
  2. Phase A (Replay Forensic): Classify each exp3 no_match by root cause using trace replay.
  3. Phase A' (Ground Truth): Re-run a subset with enriched logging + artifact preservation.
  4. Phase C (Matching Improvements): Improve the matching algorithm based on data from A/A'/B.
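
The root-cause classification in Phase A could be sketched roughly as follows. This is only an assumption about how such a classifier might look: the labels, the two tolerance values, and the box representation are hypothetical, and separating a timing gap from over-abstraction would need more data than this sketch uses.

```python
from collections import Counter
from typing import Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels
Point = Tuple[int, int]


def hits(point: Point, boxes: Sequence[Box], tolerance: int = 0) -> bool:
    """True if the point lies inside any box grown by `tolerance` pixels."""
    x, y = point
    return any(
        left - tolerance <= x <= right + tolerance
        and top - tolerance <= y <= bottom + tolerance
        for left, top, right, bottom in boxes
    )


def classify(point: Point, model_boxes: Sequence[Box], truth_boxes: Sequence[Box],
             tolerance: int = 10, wide_tolerance: int = 40) -> str:
    """Assign one hypothetical root-cause label to a replayed LLM call."""
    if hits(point, model_boxes, tolerance):
        return "matched"
    if hits(point, model_boxes, wide_tolerance):
        return "tolerance_too_tight"
    if hits(point, truth_boxes, tolerance):
        # The widget exists on screen but not in the model (stale or filtered).
        return "missing_from_model"
    return "llm_off_target"


cases = [
    ((50, 50), [(0, 0, 100, 100)], [(0, 0, 100, 100)]),
    ((120, 50), [(0, 0, 100, 100)], [(0, 0, 100, 100)]),
    ((300, 300), [(0, 0, 100, 100)], [(250, 250, 350, 350)]),
    ((900, 900), [(0, 0, 100, 100)], [(250, 250, 350, 350)]),
]
tally = Counter(classify(p, m, t) for p, m, t in cases)
print(tally)
```

A tally like this, computed over the replayed traces, would indicate whether Phase C should focus on tolerances, on model freshness, or on the prompt.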

Success Criteria

| Metric | Baseline (exp3) | Target |
| --- | --- | --- |
| no_match rate | 37.3% | <20% |
| APKs with 100% no_match | 8 | 0 |
| match rate | 62.1% | >80% |
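
A minimal sketch of how the first two criteria might be checked from per-call logs. The record format (`(apk, outcome)` pairs) and the `summarize` helper are assumptions made for illustration; the real exp3 logs may be structured differently, and the match-rate target would need an analogous check.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize(records: Iterable[Tuple[str, str]]) -> Dict[str, object]:
    """Compute the overall no_match rate and the APKs where every call failed."""
    totals = defaultdict(int)
    misses = defaultdict(int)
    for apk, outcome in records:
        totals[apk] += 1
        if outcome == "no_match":
            misses[apk] += 1

    all_calls = sum(totals.values())
    if all_calls == 0:
        raise ValueError("no records to summarize")

    no_match_rate = sum(misses.values()) / all_calls
    fully_failing = sorted(apk for apk in totals if misses[apk] == totals[apk])
    return {
        "no_match_rate": no_match_rate,
        "fully_failing_apks": fully_failing,
        # Targets from the table: no_match rate below 20%, no APK at 100% no_match.
        "meets_targets": no_match_rate < 0.20 and not fully_failing,
    }


# Made-up sample data, not taken from exp3.
sample = [("a.apk", "match")] * 4 + [("a.apk", "no_match"), ("b.apk", "no_match")]
report = summarize(sample)
print(report)
```

Tracking fully failing APKs separately matters because a good aggregate rate can hide apps where the LLM never produces a usable action.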

References

  • Plan: docs/20260318_aperv_coordenadas_gh46.md
  • Calibration dependency: docs/20260318_rvape_calibracao.md (MICRO phase blocked on this)
  • Prior visual grounding work: docs/vision/ (rvsec-vision-llm benchmark results)
  • Exp3 results: data/results/exp3_*/
