Opt 7: Computed index columns for expression-based predicates #13
Open
- extract_date: extend with QUARTER, WEEK, DAYOFWEEK, WEEKDAY fields
- sql-builtins: add YEAR/MONTH/DAY/HOUR/MINUTE/SECOND/DAYOFMONTH/DAYOFWEEK/WEEKDAY/WEEK/QUARTER shortcuts
- tests/96_computed_index_cols.yaml: 30 tests covering date functions in SELECT/WHERE/GROUP BY/ORDER BY
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Auto-index expressions like YEAR(dt), MONTH(dt) etc. as hidden virtual columns in StorageIndex so that WHERE YEAR(date_col) = 2024 can use an index rather than doing a full scan.
Key changes:
- storage/compute_index.go (new): isRawDataset/isIndependent classifiers, evalIndependentScmer, canonicalColName, buildComputedFn helpers
- storage/index.go: add ColMapCols/ColMapFn to StorageIndex; new colGetter struct with computed-column support; buildGetters() replaces raw slice in buildIndex(); getDeltaColValue() for computed delta values
- storage/analyzer.go: extractBoundaries detects (op rawDataset independentExpr) patterns and emits computed column boundaries with mapCols/mapFn
- storage/shard.go: restore allFound guard in rebuild() eager-index path, now extended to check computed-column source columns too
- storage/scan_helper.go: encodeScmer uses Proc pointer address for unique canonical names; DeclarationForValue for native Go functions
- scm/: NthLocalVar/Proc accessors and jit updates for scmer rework
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
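To make the idea of a hidden virtual column concrete, here is a minimal sketch of an index keyed on a computed expression. It is not the StorageIndex implementation: the computedIndex type, buildComputedIndex, lookupEq, yearOf and sampleDates are hypothetical names, and the row storage is reduced to a slice of time.Time values.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// computedIndex is a hypothetical stand-in for an index over a virtual column:
// keys[i] is the computed value of row rows[i], and keys is sorted ascending.
type computedIndex struct {
	keys []int
	rows []int
}

// buildComputedIndex sorts the row ids by mapFn(col[row]) and stores the keys.
func buildComputedIndex(col []time.Time, mapFn func(time.Time) int) computedIndex {
	rows := make([]int, len(col))
	for i := range rows {
		rows[i] = i
	}
	sort.SliceStable(rows, func(a, b int) bool {
		return mapFn(col[rows[a]]) < mapFn(col[rows[b]])
	})
	keys := make([]int, len(rows))
	for i, r := range rows {
		keys[i] = mapFn(col[r])
	}
	return computedIndex{keys: keys, rows: rows}
}

// lookupEq returns the row ids whose computed key equals want (binary search).
func (ix computedIndex) lookupEq(want int) []int {
	lo := sort.SearchInts(ix.keys, want)
	hi := sort.SearchInts(ix.keys, want+1)
	return ix.rows[lo:hi]
}

// yearOf plays the role of the compute function for YEAR(date_col).
func yearOf(t time.Time) int { return t.Year() }

// sampleDates returns a tiny illustrative column.
func sampleDates() []time.Time {
	return []time.Time{
		time.Date(2023, 5, 1, 0, 0, 0, 0, time.UTC),
		time.Date(2024, 1, 15, 0, 0, 0, 0, time.UTC),
		time.Date(2022, 12, 31, 0, 0, 0, 0, time.UTC),
		time.Date(2024, 7, 4, 0, 0, 0, 0, time.UTC),
	}
}

func main() {
	ix := buildComputedIndex(sampleDates(), yearOf)
	// WHERE YEAR(date_col) = 2024 becomes a point lookup on the computed key.
	fmt.Println(ix.lookupEq(2024))
}
```

The point of the sketch is that the predicate is answered by two binary searches over precomputed keys instead of evaluating YEAR on every row.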
Cherry-pick 6b8f3a4: when a table has PK + UNIQUE email, the recursive ProcessUniqueCollision call (idx=1) was unconditionally unlocking t.uniquelock held by idx=0, then idx=0's defer also unlocked → fatal double-unlock crash. Fix: wrap recursive flush calls in recovery closures that clear uniquelockHeld=false before re-panicking. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cherry-pick of the temp-column/shard eviction race fix from MP-stabilize:
- cache.go: 1s minimum lifetime for newly registered items
- scan_helper.go: touchTempColumns now touches ALL temp columns
- shard/database/partition/index/storage: lastAccessed → atomic uint64 UnixNano
Fixes: GROUP BY after shard eviction → "index out of range [N] with length 2"
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
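The two ingredients of this fix, an atomically updated access timestamp and a minimum lifetime before eviction, might look roughly like the sketch below. The cacheItem type and its methods are hypothetical; only the 1s minimum lifetime and the atomic uint64 UnixNano representation are taken from the commit message.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// minLifetime mirrors the 1s minimum lifetime mentioned in the commit.
const minLifetime = time.Second

// cacheItem is a hypothetical cache entry.
type cacheItem struct {
	registered   int64         // UnixNano at registration, never changes
	lastAccessed atomic.Uint64 // UnixNano of the last touch, updated lock-free
}

func newItem(nowNano int64) *cacheItem {
	it := &cacheItem{registered: nowNano}
	it.lastAccessed.Store(uint64(nowNano))
	return it
}

// touch records an access; concurrent scans may call it without a lock.
func (it *cacheItem) touch(nowNano int64) {
	it.lastAccessed.Store(uint64(nowNano))
}

// evictable reports whether the item is older than minLifetime and has been
// idle for at least idle.
func (it *cacheItem) evictable(nowNano int64, idle time.Duration) bool {
	if nowNano-it.registered < int64(minLifetime) {
		return false // newly registered items are protected
	}
	last := int64(it.lastAccessed.Load())
	return nowNano-last >= int64(idle)
}

func main() {
	start := time.Now().UnixNano()
	it := newItem(start)
	fmt.Println(it.evictable(start+int64(500*time.Millisecond), 0)) // too young
	fmt.Println(it.evictable(start+int64(2*time.Second), time.Second))
}
```

The minimum lifetime closes the window in which a freshly registered temp column could be evicted before the query that created it has touched it.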
Replace golang.org/x/text with our fork carli2/text@fix-collator-data-race. The upstream Collator.CompareString/_Compare mutates shared _iter[0]/_iter[1] fields, causing a data race when a single Collator is used as a Less() closure across parallel shard scans. Fix: iter variables are now goroutine-local (stack-allocated) in Compare, CompareString, getColElems and getColElemsString. Collator is immutable after construction. Upstream PR: golang/text#60 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…com) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The golang.org/x/text fork (carli2/text) was requiring go 1.25.0 which caused CI to install Go 1.25.0 while local development uses Go 1.24.x. This version mismatch was causing test failures on CI (tests 14, 16, 28, 33) that passed locally. Fix: downgrade the fork's go.mod to 1.24.0, push new commit, and update the replace directive to pin the new fork commit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Rename Gauges nav to Status; redirect / to #gauges for admin users
- Add root password warning to gauges, databases, users views with consistent styling
- Move Back link into centered content area
- Fix shard view: expose pivot values from partition schema for range display
- Fix dashboard_json_array for empty lists (was producing "[nil]")
- Fix routing timing: call route() after whoami response so isAdmin is correct
- Add TRUNCATE [TABLE] tbl as alias for DELETE in SQL and PSQL parsers
- Add test case tests/96_truncate.yaml
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…actBoundaries
The Scheme optimizer replaces lambda parameter symbol references with NthLocalVar(i) in filter lambda bodies. Previously extractBoundaries only checked symbolmapping (symbol → column), so optimized lambdas never produced boundaries → no indexes were ever adaptively created.
Fix:
- Add resolveColVar helper that checks both symbolmapping (symbol lookup) and NthLocalVar(i) → conditionCols[i] (bound parameter by index)
- Replace all direct IsSymbol+symbolmapping lookups in traverseCondition with resolveColVar for: equal?/equal??, </<=, >/>=, nil?, contains?, strlike
- Fix extractConstant for (outer NthLocalVar(i)) case: previously called mustSymbolValue which panics on NthLocalVar; now dispatches on IsSymbol vs IsNthLocalVar and reads from p.En.VarsNumbered for free outer variables
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Scheme optimizer replaces lambda parameter symbol references with NthLocalVar(i) in filter lambda bodies. Without this fix, optimized lambdas never produced boundaries → no indexes were ever adaptively created for optimized queries.
- Add resolveColVar helper: checks symbolmapping (symbol→col) and NthLocalVar(i)→conditionCols[i] (bound parameter by index)
- Fix extractConstant for (outer NthLocalVar(i)): dispatch on IsSymbol vs IsNthLocalVar, read from p.En.VarsNumbered for outer free vars
- Replace all direct IsSymbol+symbolmapping lookups in traverseCondition with resolveColVar for: equal?/equal??, </<=, >/>=, nil?, contains?, strlike
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
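The two resolution paths described for resolveColVar can be sketched as follows. The expr type here is a hypothetical, heavily reduced stand-in for a Scheme value, and the function signature is an assumption; the real helper works on the interpreter's own types.

```go
package main

import "fmt"

// expr is a hypothetical stand-in for a Scheme value: either a symbol
// reference or an optimizer-produced numbered local variable.
type expr struct {
	symbol   string // set for symbol references
	isNthVar bool   // true for NthLocalVar(i)
	nth      int    // the i in NthLocalVar(i)
}

// resolveColVar maps a lambda parameter reference to its column name.
// params[i] is the i-th lambda parameter, conditionCols[i] the column bound to it.
func resolveColVar(e expr, params, conditionCols []string) (string, bool) {
	if e.isNthVar {
		// Optimized lambda: the parameter is addressed by position.
		if e.nth >= 0 && e.nth < len(conditionCols) {
			return conditionCols[e.nth], true
		}
		return "", false
	}
	// Unoptimized lambda: find the symbol among the parameters.
	for i, p := range params {
		if p == e.symbol && i < len(conditionCols) {
			return conditionCols[i], true
		}
	}
	return "", false
}

func main() {
	params := []string{"a", "b"}
	cols := []string{"id", "dt"}
	fmt.Println(resolveColVar(expr{symbol: "b"}, params, cols))
	fmt.Println(resolveColVar(expr{isNthVar: true, nth: 0}, params, cols))
}
```

Both calls resolve to a column, which is what lets boundary extraction treat optimized and unoptimized lambdas the same way.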
- Remove debug fmt.Printf calls (were executing I/O on every query plan)
- Replace symbolmapping map allocation with inline linear scan in resolveColVar (no heap alloc; params are typically 2-5 elements, linear scan wins)
- Use SymbolEquals() instead of String()=="outer"/"list" (avoids string alloc)
- Precompute scm.Equal(lower,upper) before sort to avoid recomputation per comparison
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-equality When all WHERE conditions are point lookups (equality), the sort columns from ORDER BY are appended to the adaptive index boundaries. This causes the shard to build/use an index (eq_col..., sort_col...) that covers both filtering and ordering — rows come out of iterateIndex already sorted, so the cross-shard globalqueue merge only needs to merge pre-sorted runs instead of sorting from scratch. Only string (simple column) sort cols are handled; lambda sort cols are left as a TODO (would require computed index treatment). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
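When every shard already returns its rows in sort order, the cross-shard step reduces to a k-way merge. The sketch below shows that technique with container/heap on plain integer runs; it is a generic illustration and makes no claim about how globalqueue is actually implemented. The cursor, mergeHeap and mergeSortedRuns names are hypothetical.

```go
package main

import (
	"container/heap"
	"fmt"
)

// cursor tracks the read position inside one pre-sorted run (one per shard).
type cursor struct {
	run []int
	pos int
}

// mergeHeap is a min-heap of cursors ordered by their current element.
type mergeHeap []*cursor

func (h mergeHeap) Len() int           { return len(h) }
func (h mergeHeap) Less(i, j int) bool { return h[i].run[h[i].pos] < h[j].run[h[j].pos] }
func (h mergeHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *mergeHeap) Push(x any)        { *h = append(*h, x.(*cursor)) }
func (h *mergeHeap) Pop() any {
	old := *h
	n := len(old)
	c := old[n-1]
	*h = old[:n-1]
	return c
}

// mergeSortedRuns merges already-sorted runs in O(n log k) without re-sorting.
func mergeSortedRuns(runs [][]int) []int {
	h := &mergeHeap{}
	total := 0
	for _, r := range runs {
		if len(r) > 0 {
			*h = append(*h, &cursor{run: r})
			total += len(r)
		}
	}
	heap.Init(h)
	out := make([]int, 0, total)
	for h.Len() > 0 {
		c := (*h)[0] // cursor with the smallest current element
		out = append(out, c.run[c.pos])
		c.pos++
		if c.pos == len(c.run) {
			heap.Pop(h) // run exhausted
		} else {
			heap.Fix(h, 0) // restore heap order after advancing
		}
	}
	return out
}

func main() {
	shards := [][]int{{1, 4, 9}, {2, 3, 10}, {}, {5}}
	fmt.Println(mergeSortedRuns(shards))
}
```

With k shards and n rows in total, this costs O(n log k) comparisons, compared with O(n log n) for sorting the concatenated result.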
Extend the sort-col boundary appending to cover lambda sort expressions, not just plain string column names. A sort lambda like (lambda (ts) (year ts)) is treated identically to a computed index column in extractBoundaries: isRawDataset guards that it only depends on row params, then canonicalColName and buildComputedFn build the (.year(ts), mapCols, mapFn) boundary entry. This enables ORDER BY YEAR(col) with an equality filter to produce an adaptive index (eq_col, .year(col)) that covers both filter and sort order. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Enables bool-valued computed boundaries (e.g. lower=true, upper=true for a standalone boolean rawDataset predicate like contains?). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New setting IndexThreshold (default 5): when a shard has fewer rows than the threshold, skip creating a new adaptive StorageIndex and do a direct full scan instead. At 5 rows a binary search costs more than a linear scan, and the 4 heap allocations for the index object dominate. Exposed in dashboard Settings under "Query Optimizer". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
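The decision this setting controls can be expressed in a few lines. The sketch assumes the comparison is a strict "fewer rows than the threshold", as the description states; planScan and its return strings are hypothetical and only illustrate the branch.

```go
package main

import "fmt"

// defaultIndexThreshold is the default value named in the commit message.
const defaultIndexThreshold = 5

// planScan decides how a shard with rowCount rows is scanned when no
// matching index exists yet. Hypothetical helper for illustration only.
func planScan(rowCount, indexThreshold int) string {
	if rowCount < indexThreshold {
		// Tiny shard: building an index costs more than reading every row.
		return "full scan"
	}
	return "adaptive index"
}

func main() {
	for _, n := range []int{0, 4, 5, 1000} {
		fmt.Println(n, planScan(n, defaultIndexThreshold))
	}
}
```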
- Remove contains? special range rule (was wrong for sparse IN-lists and
OR-merging could span too wide)
- Add isRawDataset fallback at end of traverseCondition: any pure
row-column expression without a matching comparison operator becomes a
computed bool column boundary {true, true}
- Add isRawDataset check at start of or branch: if the whole OR is a
pure row-column expression, index it as a computed bool col instead of
widening sub-ranges (avoids false positives from range merging)
- Remove debug prints from storage.go, scan.go, compute_proxy.go
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
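The false-positive problem with range merging on sparse IN-lists is easy to see on a small example. The sketch below compares the rows visited by a widened range boundary with the rows matched by a boolean computed column; both functions are hypothetical simplifications operating on an integer column.

```go
package main

import "fmt"

// rangeCandidates returns the rows a widened range boundary would visit:
// every row whose value lies between the smallest and largest IN-list entry.
func rangeCandidates(col, in []int) []int {
	if len(in) == 0 {
		return nil
	}
	lo, hi := in[0], in[0]
	for _, v := range in[1:] {
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
	}
	var rows []int
	for i, v := range col {
		if v >= lo && v <= hi {
			rows = append(rows, i)
		}
	}
	return rows
}

// boolColumnMatches returns the rows for which the membership predicate is
// true, i.e. the rows selected by a computed bool column with boundary {true, true}.
func boolColumnMatches(col, in []int) []int {
	set := make(map[int]struct{}, len(in))
	for _, v := range in {
		set[v] = struct{}{}
	}
	var rows []int
	for i, v := range col {
		if _, ok := set[v]; ok {
			rows = append(rows, i)
		}
	}
	return rows
}

func main() {
	col := []int{1, 50, 500, 1000, 7}
	in := []int{1, 1000} // sparse IN-list
	fmt.Println(len(rangeCandidates(col, in)), "rows visited by the merged range")
	fmt.Println(boolColumnMatches(col, in), "rows matched by the bool column")
}
```

For the sparse list {1, 1000} the merged range covers all five rows, while the boolean column selects only the two that actually match.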
- isRawDataset: rewrite using DeclarationForValue + Foldable flag instead of ad-hoc symbol allow/deny lists; handles tagFunc-resolved builtins
- isRawDataset: handle !list special form (optimizer stack-allocated list) as foldable when count == len(items)-3 and all value exprs are rawDataset
- analyzer: replace v[0].SymbolEquals(name) with funcIs() helper that falls back to DeclarationForValue, fixing contains?/</>/= after optimization
- analyzer: OR-branch checks isRawDataset first → bool computed col instead of range merging (fixes sparse IN-lists spanning too wide a range)
- scm: mark list as Foldable=true (constant inputs → deterministic output)
- scm: add DeoptimizeExpr to rewrite !list → (list ...) before buildComputedFn so lambdas don't depend on VarsNumbered slots beyond their params
- scan_helper: normalize !list in encodeScmerToString → stable canonical index names regardless of which stack slot the optimizer chose
- scm: add IsNativeFunc() for tagFunc/tagFuncEnv detection
- settings: add ScanDebugging bool (logs db+table+boundaries+index per scan)
- dashboard: expose ScanDebugging toggle in Debugging settings group
- scm: remove stale OUTER-DBG prints
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
- storage/compute_index.go: helpers to classify sub-expressions as "raw dataset" (only row params + pure functions) or "independent" (no row params), evaluate independent exprs at plan time, and build compute lambdas for index columns
- storage/analyzer.go: extractBoundaries now detects (op rawDataset independentExpr) patterns and emits columnboundaries with mapCols/mapFn so the planner can use computed index columns for e.g. WHERE YEAR(date_col) = 2024
- storage/index.go: add ColMapCols/ColMapFn to StorageIndex; new colGetter struct supports both raw and computed columns; buildGetters() replaces the old raw []ColumnStorage argument; getDeltaColValue() handles computed delta rows
- storage/shard.go: restore the allFound safety guard in rebuild()'s eager index path, now extended to check source columns for computed-column indexes
- storage/scan_helper.go: encodeScmer uses Proc pointer address for stable canonical names; DeclarationForValue for native Go functions
- tests/96_computed_index_cols.yaml: 30 test cases covering YEAR/MONTH/DAY/HOUR/MINUTE/SECOND/QUARTER/WEEK/DAYOFWEEK/WEEKDAY in SELECT, WHERE, ORDER BY, GROUP BY, EXTRACT
Test plan
- python3 run_sql_tests.py tests/96_computed_index_cols.yaml: 30/30 pass
- python3 run_sql_tests.py tests/88_exclusive_bounds.yaml: passes individually
- python3 run_sql_tests.py tests/89_native_index.yaml: passes individually
- python3 run_sql_tests.py tests/90_eager_index_rebuild.yaml: passes individually
🤖 Generated with Claude Code