
Opt 7: Computed index columns for expression-based predicates #13

Open
carli2 wants to merge 19 commits into master from worktree-computed-index-cols

@carli2 (Contributor) commented Mar 3, 2026

Summary

  • Add storage/compute_index.go: helpers to classify sub-expressions as "raw dataset" (only row params + pure functions) or "independent" (no row params), evaluate independent exprs at plan time, and build compute lambdas for index columns
  • Extend storage/analyzer.go: extractBoundaries now detects (op rawDataset independentExpr) patterns and emits columnboundaries with mapCols/mapFn so the planner can use computed index columns for e.g. WHERE YEAR(date_col) = 2024
  • Extend storage/index.go: add ColMapCols/ColMapFn to StorageIndex; new colGetter struct supports both raw and computed columns; buildGetters() replaces the old raw []ColumnStorage argument; getDeltaColValue() handles computed delta rows
  • Fix storage/shard.go: restore the allFound safety guard in rebuild()'s eager index path, now extended to check source columns for computed-column indexes
  • Update storage/scan_helper.go: encodeScmer now uses the Proc pointer address for stable canonical names and handles native Go functions via DeclarationForValue
  • Add tests/96_computed_index_cols.yaml: 30 test cases covering YEAR/MONTH/DAY/HOUR/MINUTE/SECOND/QUARTER/WEEK/DAYOFWEEK/WEEKDAY in SELECT, WHERE, ORDER BY, GROUP BY, EXTRACT
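To illustrate the raw-dataset/independent classification described above, here is a minimal Go sketch. It assumes a toy expression type (`Expr`), a hypothetical function registry with a purity flag in place of the real Foldable/DeclarationForValue lookup, and dates stored as Unix seconds; none of these names are the repository's code. An expression counts as "raw dataset" when it is pure and reads at least one row column, and as "independent" when it reads no row column (this sketch additionally requires purity so that folding it at plan time is safe).

```go
package main

import (
	"fmt"
	"time"
)

// Expr is a hypothetical, simplified stand-in for a Scheme expression.
type Expr struct {
	Op    string  // "col", "const", or a function name such as "year"
	Col   string  // column name when Op == "col"
	Value int64   // constant when Op == "const"
	Args  []*Expr // arguments of a function call
}

type funcInfo struct {
	pure bool                // output depends only on the inputs
	eval func([]int64) int64 // used to fold constants at plan time
}

// funcs is a hypothetical registry standing in for a Foldable-style flag.
var funcs = map[string]funcInfo{
	"year": {pure: true, eval: func(a []int64) int64 {
		return int64(time.Unix(a[0], 0).UTC().Year()) // a[0] is Unix seconds
	}},
	"+": {pure: true, eval: func(a []int64) int64 {
		var s int64
		for _, x := range a {
			s += x
		}
		return s
	}},
	"rand": {pure: false},
}

// scan reports whether e uses only pure functions and whether it reads a row column.
func scan(e *Expr) (pure, usesCol bool) {
	switch e.Op {
	case "col":
		return true, true
	case "const":
		return true, false
	}
	f, ok := funcs[e.Op]
	pure = ok && f.pure
	for _, a := range e.Args {
		p, u := scan(a)
		pure, usesCol = pure && p, usesCol || u
	}
	return pure, usesCol
}

// isRawDataset: pure and row-dependent, so it can back a computed index column.
func isRawDataset(e *Expr) bool {
	p, u := scan(e)
	return p && u
}

// isIndependent: reads no row column (and, in this sketch, is pure), so it can be folded.
func isIndependent(e *Expr) bool {
	p, u := scan(e)
	return p && !u
}

// evalIndependent folds an independent expression to a constant.
func evalIndependent(e *Expr) int64 {
	if e.Op == "const" {
		return e.Value
	}
	args := make([]int64, len(e.Args))
	for i, a := range e.Args {
		args[i] = evalIndependent(a)
	}
	return funcs[e.Op].eval(args)
}

func main() {
	// WHERE YEAR(date_col) = 2000 + 24
	lhs := &Expr{Op: "year", Args: []*Expr{{Op: "col", Col: "date_col"}}}
	rhs := &Expr{Op: "+", Args: []*Expr{{Op: "const", Value: 2000}, {Op: "const", Value: 24}}}
	if isRawDataset(lhs) && isIndependent(rhs) {
		fmt.Println("computed boundary: year(date_col) =", evalIndependent(rhs))
	}
}
```

Only a comparison whose left side is raw dataset and whose right side is independent becomes a computed-column boundary; a call to the impure `rand` is never folded.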

Test plan

  • python3 run_sql_tests.py tests/96_computed_index_cols.yaml — 30/30 pass
  • python3 run_sql_tests.py tests/88_exclusive_bounds.yaml — passes individually
  • python3 run_sql_tests.py tests/89_native_index.yaml — passes individually
  • python3 run_sql_tests.py tests/90_eager_index_rebuild.yaml — passes individually
  • CI GitHub Actions green

🤖 Generated with Claude Code

carli2 and others added 19 commits March 3, 2026 00:15
- extract_date: extend with QUARTER, WEEK, DAYOFWEEK, WEEKDAY fields
- sql-builtins: add YEAR/MONTH/DAY/HOUR/MINUTE/SECOND/DAYOFMONTH/DAYOFWEEK/WEEKDAY/WEEK/QUARTER shortcuts
- tests/96_computed_index_cols.yaml: 30 tests covering date functions in SELECT/WHERE/GROUP BY/ORDER BY

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Auto-index expressions like YEAR(dt), MONTH(dt) etc. as hidden virtual
columns in StorageIndex so that WHERE YEAR(date_col) = 2024 can use an
index rather than doing a full scan.

Key changes:
- storage/compute_index.go (new): isRawDataset/isIndependent classifiers,
  evalIndependentScmer, canonicalColName, buildComputedFn helpers
- storage/index.go: add ColMapCols/ColMapFn to StorageIndex; new colGetter
  struct with computed-column support; buildGetters() replaces raw slice
  in buildIndex(); getDeltaColValue() for computed delta values
- storage/analyzer.go: extractBoundaries detects (op rawDataset independentExpr)
  patterns and emits computed column boundaries with mapCols/mapFn
- storage/shard.go: restore allFound guard in rebuild() eager-index path,
  now extended to check computed-column source columns too
- storage/scan_helper.go: encodeScmer uses Proc pointer address for unique
  canonical names; DeclarationForValue for native Go functions
- scm/: NthLocalVar/Proc accessors and jit updates for scmer rework

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
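The colGetter idea can be sketched as follows: a getter either reads a stored column directly or applies a map function to its source columns, and the index is built and searched through that same getter. This is a hypothetical shape that only borrows the ColMapCols/ColMapFn naming; it stores dates as Unix seconds and ignores delta rows, which the commit handles in getDeltaColValue().

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// colGetter reads either a stored column or a value computed from source
// columns. Field names echo ColMapCols/ColMapFn but the shape is hypothetical.
type colGetter struct {
	raw     []int64                  // non-nil for a plain stored column
	mapCols [][]int64                // source columns of a computed column
	mapFn   func(args []int64) int64 // computes the indexed value for one row
}

func (g colGetter) get(row int) int64 {
	if g.raw != nil {
		return g.raw[row]
	}
	args := make([]int64, len(g.mapCols))
	for i, c := range g.mapCols {
		args[i] = c[row]
	}
	return g.mapFn(args)
}

// buildIndex returns row ids ordered by the getter's value.
func buildIndex(g colGetter, n int) []int {
	idx := make([]int, n)
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool { return g.get(idx[a]) < g.get(idx[b]) })
	return idx
}

// lookupEqual binary-searches the index for rows whose value equals v.
func lookupEqual(g colGetter, idx []int, v int64) []int {
	lo := sort.Search(len(idx), func(i int) bool { return g.get(idx[i]) >= v })
	hi := sort.Search(len(idx), func(i int) bool { return g.get(idx[i]) > v })
	return idx[lo:hi]
}

// yearOf is the computed column YEAR(date_col), with dates stored as Unix seconds.
var yearOf = func(a []int64) int64 { return int64(time.Unix(a[0], 0).UTC().Year()) }

func main() {
	day := func(y int, m time.Month, d int) int64 { return time.Date(y, m, d, 0, 0, 0, 0, time.UTC).Unix() }
	dates := []int64{day(2023, 5, 1), day(2024, 2, 9), day(2025, 1, 1), day(2024, 11, 30)}
	year := colGetter{mapCols: [][]int64{dates}, mapFn: yearOf}
	idx := buildIndex(year, len(dates))
	fmt.Println(lookupEqual(year, idx, 2024)) // [1 3]
}
```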
Cherry-pick 6b8f3a4: when a table has PK + UNIQUE email, the recursive
ProcessUniqueCollision call (idx=1) was unconditionally unlocking
t.uniquelock held by idx=0, then idx=0's defer also unlocked →
fatal double-unlock crash. Fix: wrap recursive flush calls in recovery
closures that clear uniquelockHeld=false before re-panicking.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
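A reduced Go sketch of the double-unlock fix, assuming a single sync.Mutex and a local held flag; flushNext is a hypothetical stand-in for the recursive ProcessUniqueCollision call that releases the lock and panics on a violation. The outer defer unlocks only while the flag is still set, and the recovery closure clears the flag before re-panicking.

```go
package main

import (
	"fmt"
	"sync"
)

// table is a hypothetical stand-in holding only the lock.
type table struct {
	uniquelock sync.Mutex
}

// flushNext simulates the recursive collision check for the next unique index.
// On a violation it releases the lock and panics, like the idx=1 call did.
func (t *table) flushNext(violation bool) {
	if violation {
		t.uniquelock.Unlock()
		panic("unique constraint violated")
	}
}

func (t *table) processUniqueCollision(violation bool) {
	t.uniquelock.Lock()
	uniquelockHeld := true
	defer func() {
		// Unlock only if we still own the lock; a second Unlock is a fatal error.
		if uniquelockHeld {
			t.uniquelock.Unlock()
		}
	}()

	// Recovery closure around the recursive call: the callee has already
	// unlocked, so clear the flag before re-panicking.
	func() {
		defer func() {
			if r := recover(); r != nil {
				uniquelockHeld = false
				panic(r)
			}
		}()
		t.flushNext(violation)
	}()
}

func main() {
	tbl := &table{}
	func() {
		defer func() { fmt.Println("recovered:", recover()) }()
		tbl.processUniqueCollision(true)
	}()
	tbl.uniquelock.Lock() // would deadlock if the lock were still held
	fmt.Println("lock released exactly once")
	tbl.uniquelock.Unlock()
}
```

In Go, unlocking an already unlocked sync.Mutex is a fatal runtime error that recover() cannot catch, so the bookkeeping itself has to be right.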
Cherry-pick of the temp-column/shard eviction race fix from MP-stabilize:
- cache.go: 1s minimum lifetime for newly registered items
- scan_helper.go: touchTempColumns now touches ALL temp columns
- shard/database/partition/index/storage: lastAccessed → atomic uint64 UnixNano

Fixes: GROUP BY after shard eviction → "index out of range [N] with length 2"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
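A minimal sketch of the two eviction changes, assuming a hypothetical cacheItem type and an idle-timeout eviction policy (the policy itself is not described in the commit): lastAccessed is an atomic.Uint64 holding UnixNano, and an item younger than one second is never evictable. Times are passed in as nanoseconds to keep the example deterministic.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// minLifetime protects newly registered items from immediate eviction.
const minLifetime = time.Second

// cacheItem is a hypothetical cache entry. lastAccessed is an atomic uint64
// of UnixNano, so scans on other goroutines can touch it without a data race.
type cacheItem struct {
	registeredAt int64 // UnixNano, written once before the item is shared
	lastAccessed atomic.Uint64
}

func newItem(nowNano int64) *cacheItem {
	it := &cacheItem{registeredAt: nowNano}
	it.lastAccessed.Store(uint64(nowNano))
	return it
}

// touch records an access; safe to call concurrently.
func (it *cacheItem) touch(nowNano int64) { it.lastAccessed.Store(uint64(nowNano)) }

// evictable reports whether the item may be dropped at nowNano, given an
// idle timeout (the timeout policy is an assumption of this sketch).
func (it *cacheItem) evictable(nowNano int64, idle time.Duration) bool {
	if nowNano-it.registeredAt < int64(minLifetime) {
		return false
	}
	return nowNano-int64(it.lastAccessed.Load()) >= int64(idle)
}

func main() {
	const t0 = int64(1_700_000_000) * int64(time.Second)
	it := newItem(t0)
	fmt.Println(it.evictable(t0+int64(500*time.Millisecond), 0))            // false: inside the 1s minimum lifetime
	fmt.Println(it.evictable(t0+int64(2*time.Second), time.Second))         // true: idle for 2s
	it.touch(t0 + int64(2*time.Second))
	fmt.Println(it.evictable(t0+int64(2500*time.Millisecond), time.Second)) // false: touched 0.5s ago
}
```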
Replace golang.org/x/text with our fork carli2/text@fix-collator-data-race.
The upstream Collator.CompareString/_Compare mutates shared _iter[0]/_iter[1]
fields, causing a data race when a single Collator is used as a Less()
closure across parallel shard scans.

Fix: iter variables are now goroutine-local (stack-allocated) in Compare,
CompareString, getColElems and getColElemsString. Collator is immutable
after construction. Upstream PR: golang/text#60

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
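The principle behind the fix (per-call state on the stack, collator immutable after construction) can be shown with a toy comparator. This is not the golang.org/x/text/collate API; foldCollator is a hypothetical stand-in that one instance shares across several goroutines as a sort Less() closure, which is the usage pattern that raced before the fork.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
	"unicode"
)

// foldCollator is a toy comparator (not golang.org/x/text/collate). It is
// immutable after construction: CompareString keeps its working state in
// local variables instead of struct fields, so one shared instance is safe
// to use as a Less() closure from several goroutines at once.
type foldCollator struct {
	ignoreCase bool // configuration only; never written after construction
}

func (c *foldCollator) CompareString(a, b string) int {
	ka, kb := c.key(a), c.key(b) // per-call, goroutine-local keys
	return strings.Compare(ka, kb)
}

func (c *foldCollator) key(s string) string {
	if !c.ignoreCase {
		return s
	}
	return strings.Map(unicode.ToLower, s)
}

func main() {
	coll := &foldCollator{ignoreCase: true}
	shards := [][]string{{"b", "A", "c"}, {"Zed", "alpha", "Beta"}}
	var wg sync.WaitGroup
	for _, s := range shards {
		wg.Add(1)
		go func(s []string) { // one goroutine per shard, all sharing coll
			defer wg.Done()
			sort.Slice(s, func(i, j int) bool { return coll.CompareString(s[i], s[j]) < 0 })
		}(s)
	}
	wg.Wait()
	fmt.Println(shards) // [[A b c] [alpha Beta Zed]]
}
```

Running such code under `go test -race` is the usual way to check that a shared comparator no longer races.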
…com)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The golang.org/x/text fork (carli2/text) required go 1.25.0,
which caused CI to install Go 1.25.0 while local development uses
Go 1.24.x. This version mismatch caused test failures on CI
(tests 14, 16, 28, 33) that passed locally.

Fix: downgrade the fork's go.mod to 1.24.0, push new commit, and
update the replace directive to pin the new fork commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Rename Gauges nav to Status; redirect / to #gauges for admin users
- Add root password warning to gauges, databases, users views with consistent styling
- Move Back link into centered content area
- Fix shard view: expose pivot values from partition schema for range display
- Fix dashboard_json_array for empty lists (was producing "[nil]")
- Fix routing timing: call route() after whoami response so isAdmin is correct
- Add TRUNCATE [TABLE] tbl as alias for DELETE in SQL and PSQL parsers
- Add test case tests/96_truncate.yaml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…actBoundaries

The Scheme optimizer replaces lambda parameter symbol references with
NthLocalVar(i) in filter lambda bodies. Previously extractBoundaries only
checked symbolmapping (symbol → column), so optimized lambdas never
produced boundaries → no indexes were ever adaptively created.

Fix:
- Add resolveColVar helper that checks both symbolmapping (symbol lookup)
  and NthLocalVar(i) → conditionCols[i] (bound parameter by index)
- Replace all direct IsSymbol+symbolmapping lookups in traverseCondition
  with resolveColVar for: equal?/equal??, </<=, >/>=, nil?, contains?, strlike
- Fix extractConstant for (outer NthLocalVar(i)) case: previously called
  mustSymbolValue which panics on NthLocalVar; now dispatches on IsSymbol
  vs IsNthLocalVar and reads from p.En.VarsNumbered for free outer variables

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
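A sketch of the resolveColVar lookup, assuming hypothetical symbol and nthLocalVar types in place of scm.Scmer; params[i] is the lambda parameter bound to conditionCols[i]. A symbol is found by a linear scan over the parameters, and NthLocalVar(i) indexes conditionCols directly.

```go
package main

import "fmt"

// Hypothetical stand-ins for the two ways an optimized filter-lambda body can
// refer to a row parameter: by symbol, or by NthLocalVar slot index.
type symbol string
type nthLocalVar int

// resolveColVar maps a variable reference to its column name, returning ""
// if it is not a bound row parameter. params[i] is bound to conditionCols[i].
func resolveColVar(v any, params []symbol, conditionCols []string) string {
	switch x := v.(type) {
	case symbol:
		for i, p := range params { // linear scan; params are usually short
			if p == x {
				return conditionCols[i]
			}
		}
	case nthLocalVar:
		if x >= 0 && int(x) < len(conditionCols) {
			return conditionCols[x]
		}
	}
	return ""
}

func main() {
	params := []symbol{"a", "b"}
	cols := []string{"email", "created"}
	fmt.Println(resolveColVar(symbol("b"), params, cols))        // created
	fmt.Println(resolveColVar(nthLocalVar(0), params, cols))     // email
	fmt.Printf("%q\n", resolveColVar(symbol("x"), params, cols)) // ""
}
```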
The Scheme optimizer replaces lambda parameter symbol references with
NthLocalVar(i) in filter lambda bodies. Without this fix, optimized
lambdas never produced boundaries → no indexes were ever adaptively
created for optimized queries.

- Add resolveColVar helper: checks symbolmapping (symbol→col) and
  NthLocalVar(i)→conditionCols[i] (bound parameter by index)
- Fix extractConstant for (outer NthLocalVar(i)): dispatch on IsSymbol
  vs IsNthLocalVar, read from p.En.VarsNumbered for outer free vars
- Replace all direct IsSymbol+symbolmapping lookups in traverseCondition
  with resolveColVar for: equal?/equal??, </<=, >/>=, nil?, contains?, strlike

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove debug fmt.Printf calls (were executing I/O on every query plan)
- Replace symbolmapping map allocation with inline linear scan in resolveColVar
  (no heap alloc; params are typically 2-5 elements, linear scan wins)
- Use SymbolEquals() instead of String()=="outer"/"list" (avoids string alloc)
- Precompute scm.Equal(lower,upper) before sort to avoid recomputation per comparison

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
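The last item amounts to hoisting an equality test out of the sort comparator. A minimal sketch, with reflect.DeepEqual standing in for scm.Equal and a hypothetical boundary type; ordering point lookups before ranges is an assumption made only for illustration.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// boundary is a hypothetical per-column filter range.
type boundary struct {
	col          string
	lower, upper any
	isPoint      bool // cached lower == upper
}

// sortBoundaries runs the (possibly expensive) equality test once per
// boundary, then sorts using only the cached bool.
func sortBoundaries(bs []boundary) {
	for i := range bs {
		bs[i].isPoint = reflect.DeepEqual(bs[i].lower, bs[i].upper)
	}
	sort.SliceStable(bs, func(i, j int) bool { return bs[i].isPoint && !bs[j].isPoint })
}

func main() {
	bs := []boundary{
		{col: "created", lower: int64(10), upper: int64(20)},
		{col: "email", lower: "a@x", upper: "a@x"},
	}
	sortBoundaries(bs)
	fmt.Println(bs[0].col, bs[0].isPoint, bs[1].col, bs[1].isPoint) // email true created false
}
```

The comparator now costs two bool reads per call instead of two equality checks.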
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-equality

When all WHERE conditions are point lookups (equality), the sort columns
from ORDER BY are appended to the adaptive index boundaries. This causes
the shard to build/use an index (eq_col..., sort_col...) that covers both
filtering and ordering — rows come out of iterateIndex already sorted, so
the cross-shard globalqueue merge only needs to merge pre-sorted runs
instead of sorting from scratch.

Only string (simple column) sort cols are handled; lambda sort cols are
left as a TODO (would require computed index treatment).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
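Merging pre-sorted runs across shards is a k-way merge. The following is a generic sketch using container/heap over int keys, not the actual globalqueue code: each shard contributes one already sorted run, and merging n rows from k shards costs O(n log k) rather than a full O(n log n) sort.

```go
package main

import (
	"container/heap"
	"fmt"
)

// cursor points into one shard's already sorted output.
type cursor struct {
	run []int
	pos int
}

// mergeHeap orders cursors by their current head value.
type mergeHeap []*cursor

func (h mergeHeap) Len() int           { return len(h) }
func (h mergeHeap) Less(i, j int) bool { return h[i].run[h[i].pos] < h[j].run[h[j].pos] }
func (h mergeHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *mergeHeap) Push(x any)        { *h = append(*h, x.(*cursor)) }
func (h *mergeHeap) Pop() any {
	old := *h
	c := old[len(old)-1]
	*h = old[:len(old)-1]
	return c
}

// mergeRuns merges k sorted runs into one sorted slice.
func mergeRuns(runs [][]int) []int {
	h := &mergeHeap{}
	total := 0
	for _, r := range runs {
		total += len(r)
		if len(r) > 0 {
			*h = append(*h, &cursor{run: r})
		}
	}
	heap.Init(h)
	out := make([]int, 0, total)
	for h.Len() > 0 {
		c := (*h)[0] // smallest head among all runs
		out = append(out, c.run[c.pos])
		c.pos++
		if c.pos == len(c.run) {
			heap.Pop(h) // run exhausted
		} else {
			heap.Fix(h, 0) // head changed; restore heap order
		}
	}
	return out
}

func main() {
	shards := [][]int{{2019, 2021, 2024}, {2020, 2024}, {}, {2018}}
	fmt.Println(mergeRuns(shards)) // [2018 2019 2020 2021 2024 2024]
}
```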
Extend the sort-col boundary appending to cover lambda sort expressions,
not just plain string column names. A sort lambda like (lambda (ts) (year ts))
is treated identically to a computed index column in extractBoundaries:
isRawDataset guards that it only depends on row params, then canonicalColName
and buildComputedFn build the (.year(ts), mapCols, mapFn) boundary entry.

This enables ORDER BY YEAR(col) with an equality filter to produce an
adaptive index (eq_col, .year(col)) that covers both filter and sort order.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
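A canonical name such as .year(ts) lets different queries over the same expression share one index column. Here is a hypothetical sketch that renders such a name together with the list of source columns (mapCols), assuming lambda parameters have already been resolved to column names; the real canonicalColName works on Scheme values and may format names differently.

```go
package main

import (
	"fmt"
	"strings"
)

// call is a hypothetical expression node: args are column names (string)
// or nested calls. Lambda parameters are assumed already resolved to columns.
type call struct {
	fn   string
	args []any
}

// canonicalColName renders e as a stable name like ".year(ts)" and collects
// the distinct source columns (mapCols) in first-seen order.
func canonicalColName(e *call) (string, []string) {
	var b strings.Builder
	var cols []string
	seen := map[string]bool{}
	var walk func(e *call)
	walk = func(e *call) {
		b.WriteString(e.fn)
		b.WriteByte('(')
		for i, a := range e.args {
			if i > 0 {
				b.WriteByte(',')
			}
			switch x := a.(type) {
			case string:
				b.WriteString(x)
				if !seen[x] {
					seen[x] = true
					cols = append(cols, x)
				}
			case *call:
				walk(x)
			}
		}
		b.WriteByte(')')
	}
	b.WriteByte('.')
	walk(e)
	return b.String(), cols
}

func main() {
	name, cols := canonicalColName(&call{fn: "year", args: []any{"ts"}})
	fmt.Println(name, cols) // .year(ts) [ts]
	name, cols = canonicalColName(&call{fn: "concat", args: []any{"a", &call{fn: "lower", args: []any{"b"}}, "a"}})
	fmt.Println(name, cols) // .concat(a,lower(b),a) [a b]
}
```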
Enables bool-valued computed boundaries (e.g. lower=true, upper=true
for a standalone boolean rawDataset predicate like contains?).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New setting IndexThreshold (default 5): when a shard has fewer rows than
the threshold, skip creating a new adaptive StorageIndex and do a direct
full scan instead. At 5 rows a binary search costs more than a linear
scan, and the 4 heap allocations for the index object dominate.

Exposed in dashboard Settings under "Query Optimizer".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
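A minimal sketch of the threshold decision, assuming a hypothetical settings struct and integer columns: below IndexThreshold rows the lookup is a direct linear scan; otherwise it builds a sorted index and binary-searches it.

```go
package main

import (
	"fmt"
	"sort"
)

// settings holds a hypothetical copy of the IndexThreshold option (default 5).
type settings struct{ IndexThreshold int }

// lookupEqual returns rows whose value equals v. Below the threshold it does a
// direct full scan; otherwise it builds a sorted index and binary-searches it.
// (A real shard would cache the index; this sketch rebuilds it per call.)
func lookupEqual(s settings, col []int, v int) (rows []int, usedIndex bool) {
	if len(col) < s.IndexThreshold {
		for i, x := range col {
			if x == v {
				rows = append(rows, i)
			}
		}
		return rows, false
	}
	idx := make([]int, len(col))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool { return col[idx[a]] < col[idx[b]] })
	lo := sort.Search(len(idx), func(i int) bool { return col[idx[i]] >= v })
	for i := lo; i < len(idx) && col[idx[i]] == v; i++ {
		rows = append(rows, idx[i])
	}
	return rows, true
}

func main() {
	s := settings{IndexThreshold: 5}
	fmt.Println(lookupEqual(s, []int{3, 1, 3}, 3))          // [0 2] false
	fmt.Println(lookupEqual(s, []int{5, 3, 5, 1, 5, 2}, 5)) // [0 2 4] true
}
```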
- Remove contains? special range rule (it was wrong for sparse IN-lists,
  and OR-merging could span too wide a range)
- Add isRawDataset fallback at end of traverseCondition: any pure
  row-column expression without a matching comparison operator becomes a
  computed bool column boundary {true, true}
- Add isRawDataset check at start of or branch: if the whole OR is a
  pure row-column expression, index it as a computed bool col instead of
  widening sub-ranges (avoids false positives from range merging)
- Remove debug prints from storage.go, scan.go, compute_proxy.go

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
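A small hypothetical sketch of why range merging misbehaves for a sparse IN-list: merging x = 1 OR x = 1000 into one range [1, 1000] admits every row in between, while indexing the whole OR as a computed bool column and looking up {true, true} returns exactly the matching rows. The integer column and function names are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// rangeLookup returns rows inside [lo, hi]: what a merged OR range would admit.
func rangeLookup(col []int, lo, hi int) []int {
	var rows []int
	for i, x := range col {
		if x >= lo && x <= hi {
			rows = append(rows, i)
		}
	}
	return rows
}

// boolIndexLookup indexes the whole predicate as a computed bool column
// (false sorts before true) and returns the rows at {true, true}.
func boolIndexLookup(col []int, pred func(int) bool) []int {
	idx := make([]int, len(col))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool { return !pred(col[idx[a]]) && pred(col[idx[b]]) })
	first := sort.Search(len(idx), func(i int) bool { return pred(col[idx[i]]) })
	return idx[first:]
}

func main() {
	col := []int{1, 500, 1000, 42}
	// WHERE x = 1 OR x = 1000 (a sparse IN-list)
	pred := func(x int) bool { return x == 1 || x == 1000 }
	fmt.Println(rangeLookup(col, 1, 1000))  // [0 1 2 3]: merged range admits every row
	fmt.Println(boolIndexLookup(col, pred)) // [0 2]: exact
}
```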
- isRawDataset: rewrite using DeclarationForValue + Foldable flag instead
  of ad-hoc symbol allow/deny lists; handles tagFunc-resolved builtins
- isRawDataset: handle !list special form (optimizer stack-allocated list)
  as foldable when count == len(items)-3 and all value exprs are rawDataset
- analyzer: replace v[0].SymbolEquals(name) with funcIs() helper that falls
  back to DeclarationForValue, fixing contains?/</>/= after optimization
- analyzer: OR-branch checks isRawDataset first → bool computed col instead
  of range merging (fixes sparse IN-lists spanning too wide a range)
- scm: mark list as Foldable=true (constant inputs → deterministic output)
- scm: add DeoptimizeExpr to rewrite !list → (list ...) before buildComputedFn
  so lambdas don't depend on VarsNumbered slots beyond their params
- scan_helper: normalize !list in encodeScmerToString → stable canonical index
  names regardless of which stack slot the optimizer chose
- scm: add IsNativeFunc() for tagFunc/tagFuncEnv detection
- settings: add ScanDebugging bool — logs db+table+boundaries+index per scan
- dashboard: expose ScanDebugging toggle in Debugging settings group
- scm: remove stale OUTER-DBG prints

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>