
extract_table_names should handle qualified table names (schema.table) #74

@ndenev


Summary

extract_table_names mishandles qualified table references (schema.table): it captures only the first identifier after FROM/JOIN, so for public.pods it returns the schema (public) instead of the table (pods).

Affected code

  • src/datafusion_integration/preprocess.rs (extract_table_names)

Root cause

The implementation takes the next Token::Word after FROM/JOIN and pushes it directly as the table name.
For a qualified name such as public.pods, the token stream is effectively:

  • Word(public)
  • Period
  • Word(pods)

The current code therefore records public as the table name and drops pods.
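To make the failure mode concrete, here is a minimal Rust sketch that reproduces the described logic on a hand-built token stream. The `Tok` enum is a hypothetical simplification of sqlparser's `Token` (whose `Word` variant wraps a struct rather than a `String`), and `naive_extract` only illustrates the behavior described above; it is not the actual code in preprocess.rs.

```rust
/// Hypothetical simplified token type; sqlparser's real `Token` has many
/// more variants, and its `Word` variant wraps a struct, not a `String`.
#[derive(Debug)]
enum Tok {
    Word(String),
    Period,
    Whitespace,
    Other,
}

/// Illustration of the described logic: after FROM/JOIN, the next word is
/// pushed as the table name without checking for a following `.`.
fn naive_extract(tokens: &[Tok]) -> Vec<String> {
    let mut names = Vec::new();
    let mut expect_table = false;
    for tok in tokens {
        match tok {
            Tok::Word(w) if w.eq_ignore_ascii_case("FROM") || w.eq_ignore_ascii_case("JOIN") => {
                expect_table = true;
            }
            Tok::Word(w) if expect_table => {
                names.push(w.clone()); // for `public.pods` this is `public`
                expect_table = false;
            }
            _ => {} // whitespace, `.`, and the trailing `pods` are ignored
        }
    }
    names
}

fn main() {
    // Tokens for: SELECT * FROM public.pods
    let tokens = vec![
        Tok::Word("SELECT".into()),
        Tok::Whitespace,
        Tok::Other, // `*`
        Tok::Whitespace,
        Tok::Word("FROM".into()),
        Tok::Whitespace,
        Tok::Word("public".into()),
        Tok::Period,
        Tok::Word("pods".into()),
    ];
    println!("{:?}", naive_extract(&tokens)); // ["public"]
}
```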

Why this matters

extract_table_names feeds table-aware JSON column detection (preprocess_sql_with_registry).
If table extraction is wrong:

  • registry lookups miss actual table metadata,
  • table-specific/custom JSON columns may not be recognized,
  • dot-notation preprocessing can diverge from intended behavior.

Impact

Medium-severity correctness/robustness issue, most visible with qualified table references and with table-specific/custom JSON columns.

Proposed fix

Preferred:

  1. Use SQL AST extraction: walk the parsed Statement, collect the object name of each TableFactor, and take the relation (table) component of that name, i.e. its last identifier.
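The AST walk can be sketched as follows. The structs are deliberately simplified, hypothetical stand-ins for sqlparser's `TableWithJoins`, `Join`, `TableFactor`, and `ObjectName`: the real types carry more variants and fields, and in newer sqlparser releases `ObjectName` may hold `ObjectNamePart` values rather than plain `Ident`s. In the real code the tree would come from `Parser::parse_sql`. What the sketch shows is taking the last identifier of each table factor's name and recursing into subqueries and parenthesized joins. sqlparser's `visit_relations` helper (behind its `visitor` feature) may make a hand-written walk unnecessary.

```rust
/// Simplified, hypothetical stand-ins for sqlparser AST types.
struct Ident {
    value: String, // identifier text with any quotes already stripped
}
struct ObjectName(Vec<Ident>); // e.g. postgres.public.pods -> 3 parts

enum TableFactor {
    Table { name: ObjectName },        // FROM public.pods
    Derived { subquery: Box<Select> }, // FROM (SELECT ...) AS t
    NestedJoin(Box<TableWithJoins>),   // FROM (a JOIN b)
}
struct Join {
    relation: TableFactor,
}
struct TableWithJoins {
    relation: TableFactor,
    joins: Vec<Join>,
}
struct Select {
    from: Vec<TableWithJoins>,
}

fn collect_factor(factor: &TableFactor, out: &mut Vec<String>) {
    match factor {
        // Keep only the last identifier: catalog.schema.table -> table.
        TableFactor::Table { name } => {
            if let Some(last) = name.0.last() {
                out.push(last.value.clone());
            }
        }
        // Recurse into subqueries and parenthesized joins.
        TableFactor::Derived { subquery } => collect_select(subquery, out),
        TableFactor::NestedJoin(inner) => collect_twj(inner, out),
    }
}

fn collect_twj(twj: &TableWithJoins, out: &mut Vec<String>) {
    collect_factor(&twj.relation, out);
    for join in &twj.joins {
        collect_factor(&join.relation, out);
    }
}

fn collect_select(select: &Select, out: &mut Vec<String>) {
    for twj in &select.from {
        collect_twj(twj, out);
    }
}

/// Example-only constructor: &["public", "pods"] -> public.pods.
fn table(parts: &[&str]) -> TableFactor {
    let idents = parts.iter().map(|p| Ident { value: p.to_string() }).collect();
    TableFactor::Table { name: ObjectName(idents) }
}

fn main() {
    // SELECT * FROM public.pods p JOIN public.services s ON ...
    let select = Select {
        from: vec![TableWithJoins {
            relation: table(&["public", "pods"]),
            joins: vec![Join { relation: table(&["public", "services"]) }],
        }],
    };
    let mut names = Vec::new();
    collect_select(&select, &mut names);
    println!("{:?}", names); // ["pods", "services"]
}
```

Because sqlparser's `Ident::value` holds the identifier without its quotes, quoted names such as "public"."pods" should resolve to pods with this approach as well.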

Minimal fallback:

  2. Improve token-based extraction to consume dotted identifiers and keep the last segment (catalog.schema.table -> table).
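A minimal sketch of the fallback, assuming a token stream like the one described under Root cause: after FROM/JOIN it skips whitespace, consumes `word (. word)*`, and keeps only the last segment. `Tok`, `lex`, and `extract_table_names_fallback` are hypothetical names for illustration; the toy `lex` is not sqlparser's tokenizer, and the sketch assumes quoted identifiers arrive already unquoted, as sqlparser's `Word::value` does.

```rust
/// Hypothetical simplified token type (sqlparser's `Token` is richer).
/// `Word` holds the identifier text with any quotes already stripped.
#[derive(Debug, PartialEq)]
enum Tok {
    Word(String),
    Period,
    Whitespace,
    Other,
}

/// Toy lexer for this example only (not sqlparser's tokenizer).
fn lex(sql: &str) -> Vec<Tok> {
    let mut out = Vec::new();
    let mut chars = sql.chars().peekable();
    while let Some(c) = chars.next() {
        if c.is_alphanumeric() || c == '_' {
            let mut word = String::from(c);
            while let Some(&n) = chars.peek() {
                if !(n.is_alphanumeric() || n == '_') {
                    break;
                }
                word.push(n);
                chars.next();
            }
            out.push(Tok::Word(word));
        } else if c == '"' {
            // Quoted identifier: keep the inner text; take_while also consumes the closing quote.
            out.push(Tok::Word(chars.by_ref().take_while(|&n| n != '"').collect()));
        } else if c == '.' {
            out.push(Tok::Period);
        } else if c.is_whitespace() {
            out.push(Tok::Whitespace);
        } else {
            out.push(Tok::Other);
        }
    }
    out
}

/// Fallback sketch: after FROM/JOIN, consume `word (. word)*` and keep the
/// last segment, so catalog.schema.table -> table.
fn extract_table_names_fallback(tokens: &[Tok]) -> Vec<String> {
    let mut names = Vec::new();
    let mut i = 0;
    while i < tokens.len() {
        let is_keyword = matches!(&tokens[i], Tok::Word(w)
            if w.eq_ignore_ascii_case("FROM") || w.eq_ignore_ascii_case("JOIN"));
        i += 1;
        if !is_keyword {
            continue;
        }
        // Skip whitespace between the keyword and the relation name.
        while tokens.get(i) == Some(&Tok::Whitespace) {
            i += 1;
        }
        // Consume the dotted identifier, remembering only its last segment.
        let mut last = None;
        while let Some(Tok::Word(w)) = tokens.get(i) {
            last = Some(w.clone());
            i += 1;
            if tokens.get(i) != Some(&Tok::Period) {
                break;
            }
            i += 1; // step over `.` to the next segment
        }
        // `FROM (` (a subquery) yields no word here; its inner FROM is found later.
        if let Some(name) = last {
            names.push(name);
        }
    }
    names
}

fn main() {
    for sql in [
        "SELECT * FROM public.pods",
        "SELECT * FROM postgres.public.pods",
        "SELECT * FROM public.pods p JOIN public.services s ON p.id = s.pod_id",
        r#"SELECT * FROM "public"."pods""#,
    ] {
        println!("{sql} -> {:?}", extract_table_names_fallback(&lex(sql)));
    }
}
```

Because this is still a token scan, it can misread non-relation uses of FROM: `EXTRACT(YEAR FROM ts)` would yield `ts`. That is one reason to prefer the AST-based option.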

Tests to add

  1. SELECT * FROM public.pods -> pods
  2. SELECT * FROM postgres.public.pods -> pods
  3. SELECT * FROM public.pods p JOIN public.services s ON ... -> pods, services
  4. Quoted identifier variants where the dialect supports them (e.g. SELECT * FROM "public"."pods" -> pods).

Metadata


Assignees

No one assigned

    Labels

    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
