Open
Description
Summary
extract_table_names mishandles qualified table references (schema.table), capturing only the first identifier after FROM/JOIN.
Affected code
src/datafusion_integration/preprocess.rs (extract_table_names)
Root cause
The implementation takes the next Token::Word after FROM/JOIN and pushes it directly.
For qualified names like public.pods, token stream is effectively:
Word(public) Period Word(pods)
Current code records public as the table name.
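The behavior described above can be modeled with a small, self-contained sketch. This is not the code from preprocess.rs: the `Token` enum here is a two-variant stand-in for the tokenizer's real token type, and `naive_extract_table_names` and `qualified_reference_tokens` are hypothetical names used only for illustration.

```rust
/// Simplified stand-in for the SQL tokenizer's token type (assumption:
/// the real enum has many more variants).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Word(String),
    Period,
}

/// Models the reported behavior: take the first Word after FROM/JOIN
/// and push it directly, without looking at what follows.
fn naive_extract_table_names(tokens: &[Token]) -> Vec<String> {
    let mut names = Vec::new();
    let mut iter = tokens.iter();
    while let Some(tok) = iter.next() {
        if let Token::Word(w) = tok {
            if w.eq_ignore_ascii_case("FROM") || w.eq_ignore_ascii_case("JOIN") {
                // Only the next token is inspected, so `public.pods`
                // contributes `public` and the rest is ignored.
                if let Some(Token::Word(name)) = iter.next() {
                    names.push(name.clone());
                }
            }
        }
    }
    names
}

/// Token stream for `FROM public.pods`, as described in the issue.
fn qualified_reference_tokens() -> Vec<Token> {
    vec![
        Token::Word("FROM".to_string()),
        Token::Word("public".to_string()),
        Token::Period,
        Token::Word("pods".to_string()),
    ]
}
```

Calling `naive_extract_table_names(&qualified_reference_tokens())` yields the schema name rather than the table name, which is the mismatch this issue reports.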
Why this matters
extract_table_names feeds table-aware JSON column detection (preprocess_sql_with_registry).
If table extraction is wrong:
- registry lookups miss actual table metadata,
- table-specific/custom JSON columns may not be recognized,
- dot-notation preprocessing can diverge from intended behavior.
Impact
Medium correctness/robustness issue, especially for qualified references and custom JSON columns.
Proposed fix
Preferred:
1. Use SQL AST extraction (from the parsed Statement) to collect TableFactor object names and select the relation/table component correctly.
Minimal fallback:
2. Improve token-based extraction to consume dotted identifiers and use the last segment (catalog.schema.table -> table).
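A minimal sketch of the token-based fallback might look like the following. It is written under stated assumptions: `Token` is a simplified stand-in for the tokenizer's real type, `tokenize` is a hypothetical helper that exists only so the example can run on SQL strings, and quoted identifiers are not handled. The function name matches the one in the issue, but the body is an illustration rather than the project's implementation.

```rust
/// Simplified stand-in for the SQL tokenizer's token type (assumption).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Word(String),
    Period,
    Other,
}

/// Hypothetical helper: splits SQL into the simplified tokens above.
/// Whitespace is skipped; anything that is not an identifier or `.`
/// becomes `Other`.
fn tokenize(sql: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = sql.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c.is_alphanumeric() || c == '_' {
            let mut word = String::new();
            while let Some(&w) = chars.peek() {
                if w.is_alphanumeric() || w == '_' {
                    word.push(w);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Word(word));
        } else {
            chars.next();
            tokens.push(if c == '.' { Token::Period } else { Token::Other });
        }
    }
    tokens
}

/// After FROM/JOIN, consume `word (. word)*` and keep the last segment,
/// so `catalog.schema.table` is recorded as `table`.
fn extract_table_names(tokens: &[Token]) -> Vec<String> {
    let mut names = Vec::new();
    let mut i = 0;
    while i < tokens.len() {
        let is_from_or_join = matches!(&tokens[i], Token::Word(w)
            if w.eq_ignore_ascii_case("FROM") || w.eq_ignore_ascii_case("JOIN"));
        i += 1;
        if !is_from_or_join {
            continue;
        }
        let mut last: Option<&String> = None;
        while let Some(Token::Word(w)) = tokens.get(i) {
            last = Some(w);
            i += 1;
            // Continue only when a period is followed by another word.
            let dotted = matches!(tokens.get(i), Some(Token::Period))
                && matches!(tokens.get(i + 1), Some(Token::Word(_)));
            if !dotted {
                break;
            }
            i += 1; // skip the period; the loop reads the next segment
        }
        if let Some(name) = last {
            names.push(name.clone());
        }
    }
    names
}
```

For example, `extract_table_names(&tokenize("SELECT * FROM postgres.public.pods"))` returns a single entry, `pods`. Aliases such as `p` in `public.pods p` are left alone because the loop stops at the first word that is not preceded by a period.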
Tests to add
- SELECT * FROM public.pods -> pods
- SELECT * FROM postgres.public.pods -> pods
- SELECT * FROM public.pods p JOIN public.services s ON ... -> pods, services
- Quoted identifier variants where supported.