Opt 7: Computed index columns for expression-based predicates by carli2 · Pull Request #13 · launix-de/memcp

carli2 · 2026-03-03T11:33:12Z

Summary

- Add storage/compute_index.go: helpers that classify sub-expressions as "raw dataset" (only row params + pure functions) or "independent" (no row params), evaluate independent expressions at plan time, and build compute lambdas for index columns
- Extend storage/analyzer.go: extractBoundaries now detects (op rawDataset independentExpr) patterns and emits columnboundaries with mapCols/mapFn, so the planner can use computed index columns for predicates such as WHERE YEAR(date_col) = 2024
- Extend storage/index.go: add ColMapCols/ColMapFn to StorageIndex; new colGetter struct supports both raw and computed columns; buildGetters() replaces the old raw []ColumnStorage argument; getDeltaColValue() handles computed delta rows
- Fix storage/shard.go: restore the allFound safety guard in rebuild()'s eager index path, now extended to check source columns for computed-column indexes
- Update storage/scan_helper.go: encodeScmer uses the Proc pointer address for stable canonical names; DeclarationForValue for native Go functions
- Add tests/96_computed_index_cols.yaml: 30 test cases covering YEAR/MONTH/DAY/HOUR/MINUTE/SECOND/QUARTER/WEEK/DAYOFWEEK/WEEKDAY in SELECT, WHERE, ORDER BY, GROUP BY, EXTRACT
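
To illustrate the idea behind a computed index column, here is a minimal sketch, not the code from this PR. It assumes a simplified index that stores the result of a mapping function (here a hypothetical `yearOf`) per row in sorted order, so that a predicate like `YEAR(date_col) = 2024` becomes a binary search over the computed keys. The names `computedIndex`, `buildComputedIndex`, `lookupRange` and `sampleDates` are invented for this example.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// computedIndex is a hypothetical, simplified index over a computed column.
type computedIndex struct {
	keys []int // mapFn(row) per entry, ascending
	rows []int // row id per entry, parallel to keys
}

// buildComputedIndex sorts row ids by mapFn(col[row]) and stores the keys.
func buildComputedIndex(col []time.Time, mapFn func(time.Time) int) computedIndex {
	rows := make([]int, len(col))
	for i := range rows {
		rows[i] = i
	}
	// Stable sort keeps row ids ascending within equal keys.
	sort.SliceStable(rows, func(a, b int) bool {
		return mapFn(col[rows[a]]) < mapFn(col[rows[b]])
	})
	keys := make([]int, len(rows))
	for i, r := range rows {
		keys[i] = mapFn(col[r])
	}
	return computedIndex{keys: keys, rows: rows}
}

// lookupRange returns the row ids whose computed key lies in [lower, upper].
func (idx computedIndex) lookupRange(lower, upper int) []int {
	lo := sort.SearchInts(idx.keys, lower)
	hi := sort.SearchInts(idx.keys, upper+1)
	return idx.rows[lo:hi]
}

// yearOf plays the role of the mapFn for YEAR(date_col).
func yearOf(t time.Time) int { return t.Year() }

// sampleDates returns four rows with years 2023, 2024, 2022, 2024.
func sampleDates() []time.Time {
	day := func(y int) time.Time { return time.Date(y, time.March, 3, 0, 0, 0, 0, time.UTC) }
	return []time.Time{day(2023), day(2024), day(2022), day(2024)}
}

func main() {
	idx := buildComputedIndex(sampleDates(), yearOf)
	// WHERE YEAR(date_col) = 2024 becomes lower = upper = 2024.
	fmt.Println(idx.lookupRange(2024, 2024)) // prints [1 3]
}
```

An equality predicate maps to `lower == upper`; a range predicate on the computed value would use different bounds with the same lookup.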

Test plan

- python3 run_sql_tests.py tests/96_computed_index_cols.yaml: 30/30 pass
- python3 run_sql_tests.py tests/88_exclusive_bounds.yaml: passes individually
- python3 run_sql_tests.py tests/89_native_index.yaml: passes individually
- python3 run_sql_tests.py tests/90_eager_index_rebuild.yaml: passes individually
- CI GitHub Actions green

🤖 Generated with Claude Code

- extract_date: extend with QUARTER, WEEK, DAYOFWEEK, WEEKDAY fields
- sql-builtins: add YEAR/MONTH/DAY/HOUR/MINUTE/SECOND/DAYOFMONTH/DAYOFWEEK/WEEKDAY/WEEK/QUARTER shortcuts
- tests/96_computed_index_cols.yaml: 30 tests covering date functions in SELECT/WHERE/GROUP BY/ORDER BY

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Auto-index expressions like YEAR(dt), MONTH(dt) etc. as hidden virtual columns in StorageIndex so that WHERE YEAR(date_col) = 2024 can use an index rather than doing a full scan.

Key changes:
- storage/compute_index.go (new): isRawDataset/isIndependent classifiers, evalIndependentScmer, canonicalColName, buildComputedFn helpers
- storage/index.go: add ColMapCols/ColMapFn to StorageIndex; new colGetter struct with computed-column support; buildGetters() replaces raw slice in buildIndex(); getDeltaColValue() for computed delta values
- storage/analyzer.go: extractBoundaries detects (op rawDataset independentExpr) patterns and emits computed column boundaries with mapCols/mapFn
- storage/shard.go: restore allFound guard in rebuild() eager-index path, now extended to check computed-column source columns too
- storage/scan_helper.go: encodeScmer uses Proc pointer address for unique canonical names; DeclarationForValue for native Go functions
- scm/: NthLocalVar/Proc accessors and jit updates for scmer rework

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
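
The classification of sub-expressions can be pictured with a small sketch over a toy expression tree. This is an illustration only: the types `rowParam`, `constant` and `call`, the `pureFuncs` table, and the rule that a "raw dataset" expression must reference at least one row parameter are assumptions made for the example, not the actual isRawDataset/isIndependent code.

```go
package main

import "fmt"

// Toy expression tree (hypothetical; the real code works on Scheme values).
type expr any
type rowParam string // a parameter bound to a column of the current row
type constant int
type call struct {
	fn   string
	args []expr
}

// pureFuncs lists functions assumed to be deterministic and side-effect free.
var pureFuncs = map[string]bool{"year": true, "month": true, "+": true}

// refsRowParam reports whether e mentions any row parameter.
func refsRowParam(e expr) bool {
	switch v := e.(type) {
	case rowParam:
		return true
	case call:
		for _, a := range v.args {
			if refsRowParam(a) {
				return true
			}
		}
	}
	return false
}

// onlyPureCalls reports whether every function called in e is pure.
func onlyPureCalls(e expr) bool {
	if c, ok := e.(call); ok {
		if !pureFuncs[c.fn] {
			return false
		}
		for _, a := range c.args {
			if !onlyPureCalls(a) {
				return false
			}
		}
	}
	return true
}

// isRawDataset: depends on row params and pure functions only (assumed rule).
func isRawDataset(e expr) bool { return refsRowParam(e) && onlyPureCalls(e) }

// isIndependent: contains no row params, so it can be evaluated at plan time.
func isIndependent(e expr) bool { return !refsRowParam(e) }

func main() {
	lhs := call{fn: "year", args: []expr{rowParam("dt")}}
	rhs := call{fn: "+", args: []expr{constant(2000), constant(24)}}
	// (= (year dt) (+ 2000 24)) matches the (op rawDataset independentExpr) shape.
	fmt.Println(isRawDataset(lhs), isIndependent(rhs)) // prints true true
}
```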

Cherry-pick 6b8f3a4: when a table has PK + UNIQUE email, the recursive ProcessUniqueCollision call (idx=1) unconditionally unlocked t.uniquelock, which was held by idx=0; idx=0's defer then unlocked it again → fatal double-unlock crash.

Fix: wrap recursive flush calls in recovery closures that set uniquelockHeld=false before re-panicking.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
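
The pattern described in this fix can be sketched as follows. This is a simplified reconstruction under assumptions: `table`, `flush`, `insert` and `tryInsert` are invented for the example, and `flush` stands in for a nested call that releases the lock itself before panicking. Only the idea of a "held" flag that is cleared inside a recovery closure is taken from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

type table struct {
	uniquelock sync.Mutex
}

// flush stands in for a nested call that releases the lock itself
// before panicking on a collision (assumed behavior for this sketch).
func (t *table) flush(collide bool) {
	if collide {
		t.uniquelock.Unlock()
		panic("unique constraint violated")
	}
}

// insert takes the lock and releases it in a defer, but only if it is
// still held; the recovery closure records that the callee released it.
func (t *table) insert(collide bool) {
	t.uniquelock.Lock()
	uniquelockHeld := true
	defer func() {
		if uniquelockHeld {
			t.uniquelock.Unlock()
		}
	}()
	func() {
		defer func() {
			if r := recover(); r != nil {
				uniquelockHeld = false // callee already unlocked
				panic(r)               // propagate the original panic
			}
		}()
		t.flush(collide)
	}()
}

// tryInsert converts the panic into an error for demonstration purposes.
func tryInsert(t *table, collide bool) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	t.insert(collide)
	return nil
}

func main() {
	tbl := &table{}
	fmt.Println(tryInsert(tbl, true))  // the collision surfaces as an error
	fmt.Println(tryInsert(tbl, false)) // the lock is usable again afterwards
}
```

Without the flag, the deferred `Unlock` in `insert` would run on an already unlocked mutex, which the Go runtime reports as a fatal error that `recover` cannot catch.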

Cherry-pick of the temp-column/shard eviction race fix from MP-stabilize:
- cache.go: 1s minimum lifetime for newly registered items
- scan_helper.go: touchTempColumns now touches ALL temp columns
- shard/database/partition/index/storage: lastAccessed → atomic uint64 UnixNano

Fixes: GROUP BY after shard eviction → "index out of range [N] with length 2"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
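
A minimal sketch of the two mechanisms named above, assuming a hypothetical `cacheItem` type: the last-access time is stored as an atomic uint64 holding UnixNano, and an item is never considered evictable during its first second. The function names and the `maxIdle` parameter are invented for illustration.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// minLifetime is the minimum age before an item may be evicted (1s).
const minLifetime = uint64(time.Second)

// cacheItem is a hypothetical cache entry; times are UnixNano values.
type cacheItem struct {
	registered   uint64
	lastAccessed atomic.Uint64 // safe to update from parallel scans
}

func newCacheItem(nowNano uint64) *cacheItem {
	it := &cacheItem{registered: nowNano}
	it.lastAccessed.Store(nowNano)
	return it
}

// touch records an access without taking a lock.
func (it *cacheItem) touch(nowNano uint64) { it.lastAccessed.Store(nowNano) }

// evictable reports whether the item is past its minimum lifetime and has
// been idle for at least maxIdle nanoseconds.
func (it *cacheItem) evictable(nowNano, maxIdle uint64) bool {
	if nowNano < it.registered || nowNano-it.registered < minLifetime {
		return false
	}
	last := it.lastAccessed.Load()
	return nowNano >= last && nowNano-last >= maxIdle
}

func main() {
	now := uint64(time.Now().UnixNano())
	it := newCacheItem(now)
	// A freshly registered item is protected even with zero idle tolerance.
	fmt.Println(it.evictable(now, 0)) // prints false
}
```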

Replace golang.org/x/text with our fork carli2/text@fix-collator-data-race. The upstream Collator.CompareString/_Compare mutates shared _iter[0]/_iter[1] fields, causing a data race when a single Collator is used as a Less() closure across parallel shard scans.

Fix: iter variables are now goroutine-local (stack-allocated) in Compare, CompareString, getColElems and getColElemsString. Collator is immutable after construction.

Upstream PR: golang/text#60

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
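
The shape of this fix can be shown with a toy comparer; it does not use or reproduce the golang.org/x/text API. The assumption is simply that iterator state lives in local variables of `Compare` rather than in fields of the shared object, so one comparer value can be called from many goroutines. `byteIter`, `foldComparer` and `countLess` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// byteIter walks a string byte by byte, folding ASCII upper case to lower.
type byteIter struct {
	s   string
	pos int
}

func (it *byteIter) next() (byte, bool) {
	if it.pos >= len(it.s) {
		return 0, false
	}
	c := it.s[it.pos]
	it.pos++
	if c >= 'A' && c <= 'Z' {
		c += 'a' - 'A'
	}
	return c, true
}

// foldComparer has no mutable fields: it is immutable after construction.
type foldComparer struct{}

// Compare keeps both iterators on the stack, so concurrent calls share nothing.
func (foldComparer) Compare(a, b string) int {
	ia, ib := byteIter{s: a}, byteIter{s: b}
	for {
		ca, oka := ia.next()
		cb, okb := ib.next()
		switch {
		case !oka && !okb:
			return 0
		case !oka:
			return -1
		case !okb:
			return 1
		case ca < cb:
			return -1
		case ca > cb:
			return 1
		}
	}
}

// countLess calls Compare from several goroutines at once and counts a < b.
func countLess(c foldComparer, pairs [][2]string, workers int) int64 {
	var total atomic.Int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for _, p := range pairs {
				if c.Compare(p[0], p[1]) < 0 {
					total.Add(1)
				}
			}
		}()
	}
	wg.Wait()
	return total.Load()
}

func main() {
	pairs := [][2]string{{"a", "B"}, {"b", "A"}, {"c", "C"}}
	fmt.Println(countLess(foldComparer{}, pairs, 8)) // prints 8
}
```

Running a program like this with `go run -race` is one way to check that no shared state is written during comparison.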

…com) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The golang.org/x/text fork (carli2/text) was requiring go 1.25.0, which caused CI to install Go 1.25.0 while local development uses Go 1.24.x. This version mismatch was causing test failures on CI (tests 14, 16, 28, 33) that passed locally.

Fix: downgrade the fork's go.mod to 1.24.0, push new commit, and update the replace directive to pin the new fork commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Rename Gauges nav to Status; redirect / to #gauges for admin users
- Add root password warning to gauges, databases, users views with consistent styling
- Move Back link into centered content area
- Fix shard view: expose pivot values from partition schema for range display
- Fix dashboard_json_array for empty lists (was producing "[nil]")
- Fix routing timing: call route() after whoami response so isAdmin is correct
- Add TRUNCATE [TABLE] tbl as alias for DELETE in SQL and PSQL parsers
- Add test case tests/96_truncate.yaml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…actBoundaries

The Scheme optimizer replaces lambda parameter symbol references with NthLocalVar(i) in filter lambda bodies. Previously extractBoundaries only checked symbolmapping (symbol → column), so optimized lambdas never produced boundaries → no indexes were ever adaptively created.

Fix:
- Add resolveColVar helper that checks both symbolmapping (symbol lookup) and NthLocalVar(i) → conditionCols[i] (bound parameter by index)
- Replace all direct IsSymbol+symbolmapping lookups in traverseCondition with resolveColVar for: equal?/equal??, </<=, >/>=, nil?, contains?, strlike
- Fix extractConstant for (outer NthLocalVar(i)) case: previously called mustSymbolValue which panics on NthLocalVar; now dispatches on IsSymbol vs IsNthLocalVar and reads from p.En.VarsNumbered for free outer variables

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The Scheme optimizer replaces lambda parameter symbol references with NthLocalVar(i) in filter lambda bodies. Without this fix, optimized lambdas never produced boundaries → no indexes were ever adaptively created for optimized queries.

- Add resolveColVar helper: checks symbolmapping (symbol→col) and NthLocalVar(i)→conditionCols[i] (bound parameter by index)
- Fix extractConstant for (outer NthLocalVar(i)): dispatch on IsSymbol vs IsNthLocalVar, read from p.En.VarsNumbered for outer free vars
- Replace all direct IsSymbol+symbolmapping lookups in traverseCondition with resolveColVar for: equal?/equal??, </<=, >/>=, nil?, contains?, strlike

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
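
The resolveColVar idea can be sketched over toy types. This is a hedged illustration, not the helper from the PR: `symbol` and `nthLocalVar` are hypothetical stand-ins for the Scheme value kinds, and the signature is invented. It resolves a reference either by parameter name (linear scan over the lambda parameters) or by parameter position.

```go
package main

import "fmt"

// Hypothetical stand-ins for the two ways a lambda body can refer to a parameter.
type symbol string   // unoptimized: reference by name
type nthLocalVar int // optimized: reference by parameter index

// resolveColVar maps a parameter reference to its column name.
// params[i] is the i-th lambda parameter, conditionCols[i] the column bound to it.
func resolveColVar(e any, params []symbol, conditionCols []string) (string, bool) {
	switch v := e.(type) {
	case symbol:
		// Linear scan: parameter lists are short, so no map is needed.
		for i, p := range params {
			if p == v && i < len(conditionCols) {
				return conditionCols[i], true
			}
		}
	case nthLocalVar:
		if v >= 0 && int(v) < len(conditionCols) {
			return conditionCols[v], true
		}
	}
	return "", false
}

func main() {
	params := []symbol{"id", "ts"}
	cols := []string{"id", "created_at"}
	fmt.Println(resolveColVar(symbol("ts"), params, cols))  // prints created_at true
	fmt.Println(resolveColVar(nthLocalVar(1), params, cols)) // prints created_at true
}
```

Both forms resolve to the same column, which is what allows boundary extraction to work on optimized and unoptimized lambdas alike.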

- Remove debug fmt.Printf calls (were executing I/O on every query plan)
- Replace symbolmapping map allocation with inline linear scan in resolveColVar (no heap alloc; params are typically 2-5 elements, linear scan wins)
- Use SymbolEquals() instead of String()=="outer"/"list" (avoids string alloc)
- Precompute scm.Equal(lower,upper) before sort to avoid recomputation per comparison

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…-equality

When all WHERE conditions are point lookups (equality), the sort columns from ORDER BY are appended to the adaptive index boundaries. This causes the shard to build/use an index (eq_col..., sort_col...) that covers both filtering and ordering: rows come out of iterateIndex already sorted, so the cross-shard globalqueue merge only needs to merge pre-sorted runs instead of sorting from scratch. Only string (simple column) sort cols are handled; lambda sort cols are left as a TODO (would require computed index treatment).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
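
Merging pre-sorted runs, as opposed to sorting all rows from scratch, can be illustrated with a standard k-way merge over a min-heap. This is a generic sketch using Go's container/heap, not the globalqueue implementation; `cursor`, `mergeHeap` and `mergeSortedRuns` are names made up for the example, and each run represents the already ordered output of one shard.

```go
package main

import (
	"container/heap"
	"fmt"
)

// cursor points at the next unread element of one pre-sorted run.
type cursor struct {
	run []int
	pos int
}

// mergeHeap orders cursors by their current element (min-heap).
type mergeHeap []*cursor

func (h mergeHeap) Len() int           { return len(h) }
func (h mergeHeap) Less(i, j int) bool { return h[i].run[h[i].pos] < h[j].run[h[j].pos] }
func (h mergeHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *mergeHeap) Push(x any)        { *h = append(*h, x.(*cursor)) }
func (h *mergeHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// mergeSortedRuns merges ascending runs into one ascending slice.
func mergeSortedRuns(runs [][]int) []int {
	h := &mergeHeap{}
	total := 0
	for _, r := range runs {
		total += len(r)
		if len(r) > 0 {
			*h = append(*h, &cursor{run: r})
		}
	}
	heap.Init(h)
	out := make([]int, 0, total)
	for h.Len() > 0 {
		c := (*h)[0] // cursor with the smallest current element
		out = append(out, c.run[c.pos])
		c.pos++
		if c.pos == len(c.run) {
			heap.Pop(h) // run exhausted
		} else {
			heap.Fix(h, 0) // restore heap order after advancing
		}
	}
	return out
}

func main() {
	shards := [][]int{{1, 4, 9}, {2, 3}, {}, {5}}
	fmt.Println(mergeSortedRuns(shards)) // prints [1 2 3 4 5 9]
}
```

With k runs and n rows in total, this costs O(n log k) comparisons, compared with O(n log n) for a full sort.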

Extend the sort-col boundary appending to cover lambda sort expressions, not just plain string column names. A sort lambda like (lambda (ts) (year ts)) is treated identically to a computed index column in extractBoundaries: isRawDataset guards that it only depends on row params, then canonicalColName and buildComputedFn build the (.year(ts), mapCols, mapFn) boundary entry. This enables ORDER BY YEAR(col) with an equality filter to produce an adaptive index (eq_col, .year(col)) that covers both filter and sort order.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Enables bool-valued computed boundaries (e.g. lower=true, upper=true for a standalone boolean rawDataset predicate like contains?). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

New setting IndexThreshold (default 5): when a shard has fewer rows than the threshold, skip creating a new adaptive StorageIndex and do a direct full scan instead. At 5 rows a binary search costs more than a linear scan, and the 4 heap allocations for the index object dominate. Exposed in dashboard Settings under "Query Optimizer".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
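
The threshold decision itself is small; the sketch below only illustrates the rule as stated (fewer rows than the threshold means full scan). The `plan` type and `choosePlan` function are hypothetical and not part of memcp.

```go
package main

import "fmt"

// defaultIndexThreshold mirrors the default of 5 mentioned above.
const defaultIndexThreshold = 5

type plan string

const (
	fullScan      plan = "full scan"
	adaptiveIndex plan = "adaptive index"
)

// choosePlan skips index creation for shards below the threshold.
func choosePlan(shardRows, indexThreshold int) plan {
	if shardRows < indexThreshold {
		return fullScan
	}
	return adaptiveIndex
}

func main() {
	for _, rows := range []int{0, 4, 5, 60000} {
		fmt.Printf("%d rows: %s\n", rows, choosePlan(rows, defaultIndexThreshold))
	}
}
```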

- Remove contains? special range rule (was wrong for sparse IN-lists and OR-merging could span too wide)
- Add isRawDataset fallback at end of traverseCondition: any pure row-column expression without a matching comparison operator becomes a computed bool column boundary {true, true}
- Add isRawDataset check at start of or branch: if the whole OR is a pure row-column expression, index it as a computed bool col instead of widening sub-ranges (avoids false positives from range merging)
- Remove debug prints from storage.go, scan.go, compute_proxy.go

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
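
Why a merged range is a poor fit for a sparse IN-list can be shown with a small count. The sketch assumes that range merging means covering all listed values with one [min, max] interval, which is an interpretation of the commit message rather than a description of the removed code. It compares how many rows such a range would visit with how many rows a computed boolean column (predicate evaluates to true) would visit. All names are hypothetical.

```go
package main

import "fmt"

// rangeCandidates counts rows inside [min(values), max(values)],
// i.e. what a single merged range boundary would have to visit.
// values must be non-empty.
func rangeCandidates(col []int, values []int) int {
	lo, hi := values[0], values[0]
	for _, v := range values[1:] {
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
	}
	n := 0
	for _, x := range col {
		if x >= lo && x <= hi {
			n++
		}
	}
	return n
}

// boolColCandidates counts rows where the IN predicate is true,
// i.e. what an index on the computed bool column with {true, true} visits.
func boolColCandidates(col []int, values []int) int {
	set := make(map[int]struct{}, len(values))
	for _, v := range values {
		set[v] = struct{}{}
	}
	n := 0
	for _, x := range col {
		if _, ok := set[x]; ok {
			n++
		}
	}
	return n
}

// sampleColumn returns the values 0..n-1.
func sampleColumn(n int) []int {
	col := make([]int, n)
	for i := range col {
		col[i] = i
	}
	return col
}

func main() {
	col := sampleColumn(1000)
	in := []int{1, 900} // WHERE x IN (1, 900)
	fmt.Println(rangeCandidates(col, in), boolColCandidates(col, in)) // prints 900 2
}
```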

- isRawDataset: rewrite using DeclarationForValue + Foldable flag instead of ad-hoc symbol allow/deny lists; handles tagFunc-resolved builtins
- isRawDataset: handle !list special form (optimizer stack-allocated list) as foldable when count == len(items)-3 and all value exprs are rawDataset
- analyzer: replace v[0].SymbolEquals(name) with funcIs() helper that falls back to DeclarationForValue, fixing contains?/</>/= after optimization
- analyzer: OR-branch checks isRawDataset first → bool computed col instead of range merging (fixes sparse IN-lists spanning too wide a range)
- scm: mark list as Foldable=true (constant inputs → deterministic output)
- scm: add DeoptimizeExpr to rewrite !list → (list ...) before buildComputedFn so lambdas don't depend on VarsNumbered slots beyond their params
- scan_helper: normalize !list in encodeScmerToString → stable canonical index names regardless of which stack slot the optimizer chose
- scm: add IsNativeFunc() for tagFunc/tagFuncEnv detection
- settings: add ScanDebugging bool, which logs db+table+boundaries+index per scan
- dashboard: expose ScanDebugging toggle in Debugging settings group
- scm: remove stale OUTER-DBG prints

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

carli2 and others added 19 commits March 3, 2026 00:15

Update golang.org/x/text fork pin to amended commit (cphaensch@gmail.…

19762d7

…com) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

run_sql_tests: show ⚠️ instead of ❌ for noncritical failures

fa12d20

storage/analyzer: document why extractBoundaries sorts columns

f99727c

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

storage/analyzer: extractConstant handles bool literals

c491947

Enables bool-valued computed boundaries (e.g. lower=true, upper=true for a standalone boolean rawDataset predicate like contains?). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

carli2 wants to merge 19 commits into master from worktree-computed-index-cols
