Open
Description
Summary
fix_arrow_precedence performs regex-based rewriting over raw SQL text. This works for common cases but is structurally brittle because it is not token/AST-aware.
Affected code
src/datafusion_integration/preprocess.rs (fix_arrow_precedence, LEFT_ARROW_PATTERN, RIGHT_ARROW_PATTERN)
Current status
- Existing tests cover many happy/idempotency paths.
- Quick live checks did not reproduce literal/comment corruption in simple probes.
- This is filed as hardening (low priority), not a confirmed semantic bug.
Why track this
Regex-over-raw-SQL transforms can accidentally match inside contexts that should be inert (string literals, comments) or miss edge-case syntax forms. Even if current behavior is correct, the approach is fragile and would benefit from explicit guardrails.
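To illustrate the failure mode, here is a minimal sketch. It is not the code in preprocess.rs. The helper names (raw_arrow_offsets, inert_ranges, arrow_offsets_in_code) are hypothetical, and a plain substring search stands in for the regex patterns. The scanner is assumed to handle only single-quoted literals with '' escapes, -- line comments, and non-nested /* ... */ comments; double-quoted identifiers, dollar-quoted strings, and nested block comments are out of scope.

```rust
use std::ops::Range;

/// Raw-text search: reports every `->`, including ones inside literals
/// and comments. This stands in for a regex applied to raw SQL.
fn raw_arrow_offsets(sql: &str) -> Vec<usize> {
    sql.match_indices("->").map(|(pos, _)| pos).collect()
}

/// Returns the byte offset just past the closing quote of the
/// single-quoted literal that opens at `open` (`''` is an escaped quote).
/// An unterminated literal runs to the end of the input.
fn end_of_string(b: &[u8], open: usize) -> usize {
    let mut j = open + 1;
    while j < b.len() {
        if b[j] == b'\'' {
            if b.get(j + 1) == Some(&b'\'') {
                j += 2; // escaped quote, still inside the literal
                continue;
            }
            return j + 1;
        }
        j += 1;
    }
    b.len()
}

/// Byte ranges covering single-quoted literals, `--` comments, and
/// `/* ... */` comments. All delimiters are ASCII, so the ranges fall
/// on UTF-8 character boundaries.
fn inert_ranges(sql: &str) -> Vec<Range<usize>> {
    let b = sql.as_bytes();
    let mut ranges = Vec::new();
    let mut i = 0;
    while i < b.len() {
        let end = if b[i] == b'\'' {
            end_of_string(b, i)
        } else if b[i..].starts_with(b"--") {
            // Line comment: ends before the newline or at end of input.
            b[i..].iter().position(|&c| c == b'\n').map_or(b.len(), |p| i + p)
        } else if b[i..].starts_with(b"/*") {
            // Block comment: ends after the first `*/` or at end of input.
            b[i + 2..]
                .windows(2)
                .position(|w| w == b"*/")
                .map_or(b.len(), |p| i + 2 + p + 2)
        } else {
            i += 1;
            continue;
        };
        ranges.push(i..end);
        i = end;
    }
    ranges
}

/// Offsets of `->` (which also starts `->>`) outside every inert range.
fn arrow_offsets_in_code(sql: &str) -> Vec<usize> {
    let inert = inert_ranges(sql);
    raw_arrow_offsets(sql)
        .into_iter()
        .filter(|pos| !inert.iter().any(|r| r.contains(pos)))
        .collect()
}
```

For the input SELECT a -> 'k' FROM t -- a->b, the raw search reports two matches while the filtered search reports only the real operator. A regex-based pass could use ranges like these to skip matches that start inside a literal or comment, without moving to a full tokenizer.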
Proposed improvements
- Add explicit regression tests for:
  - arrow-like text in single-quoted literals
  - arrow-like text in -- and /* ... */ comments
  - unusual whitespace/operator layouts
- If edge breakage appears, migrate to tokenizer/AST-aware rewriting for precedence wrapping.
- Keep idempotency guarantees.
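The regression checks listed above could be sketched as a small harness. This is an illustration under stated assumptions, not the project's test suite: the signature of fix_arrow_precedence is not given in this issue, so the harness takes any Fn(&str) -> String as the rewrite under test. The names check_rewrite, INERT_CASES, and LAYOUT_CASES and all sample queries are hypothetical. It also assumes a correct rewrite leaves a query byte-for-byte unchanged when its only arrow-like text sits inside a literal or comment.

```rust
/// Queries whose only arrow-like text is inside a literal or comment.
/// Assumption: a correct rewrite returns these unchanged.
const INERT_CASES: &[&str] = &[
    "SELECT 'a -> b' AS label FROM t",
    "SELECT 'a ->> b' AS label FROM t",
    "SELECT 1 -- note: x -> y\nFROM t",
    "SELECT 1 /* x ->> y */ FROM t",
];

/// Queries with real operators in unusual layouts. The expected output
/// is not known here, so only idempotency is checked for these.
const LAYOUT_CASES: &[&str] = &[
    "SELECT a->'k' FROM t",
    "SELECT a  ->>\n  'k' FROM t",
    "SELECT a->'k'->>'j' FROM t",
];

/// Runs the rewrite over all cases and returns a description of each
/// failure; an empty vector means every check passed.
fn check_rewrite(rewrite: impl Fn(&str) -> String) -> Vec<String> {
    let mut failures = Vec::new();

    // Inert contexts must pass through untouched.
    for sql in INERT_CASES {
        if rewrite(sql) != *sql {
            failures.push(format!("inert text changed: {}", sql));
        }
    }

    // Applying the rewrite twice must equal applying it once.
    for sql in INERT_CASES.iter().chain(LAYOUT_CASES.iter()) {
        let once = rewrite(sql);
        let twice = rewrite(&once);
        if once != twice {
            failures.push(format!("not idempotent: {}", sql));
        }
    }

    failures
}
```

In the real crate, the closure passed to check_rewrite would wrap fix_arrow_precedence, adapted to whatever its actual signature is. Returning a list of failures rather than panicking on the first one makes it easier to see which contexts break together.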
Related issue
Related selector-side robustness problem (validated in live run) is tracked here:
That issue is about selector representability/validation; this issue is about SQL preprocessing robustness.