Filtering on computed columns #6

Open
AivanF wants to merge 8 commits into main from aivan/2026-03-30-Formula-Nesting-Transform

Conversation

AivanF (Collaborator) commented Mar 30, 2026

Summary by CodeRabbit

  • New Features

    • Filters can use LIKE / NOT LIKE and may reference computed fields or inline transform expressions; such predicates run as post-filters after transforms.
    • Added "last"/latest-measure semantics and improved time-shift/change to fetch adjacent-period values outside a restricted date range.
  • Bug Fixes

    • Reject nested self-join transforms to prevent invalid computations.
  • Documentation

    • Expanded guides for filters, computed-column/post-filter semantics, transforms, and fields/datasource capabilities.
  • Tests

    • Added integration and SQL generator tests for transforms, post-filters, and last-measure behavior.
  • Chores

    • Added python-dateutil dependency.
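
The like / not like filter syntax mentioned above could, for illustration, be lowered to the internal call form like this. This is a hedged sketch, not the project's implementation (which, per the changes below, works on the parsed AST in slayer/core/formula.py); the regexes only cover the simple `name like 'pattern'` shape:

```python
import re

def preprocess_like(expr: str) -> str:
    """Rewrite `x like 'p'` / `x not like 'p'` into internal call syntax (sketch)."""
    # Handle `not like` first so the plain `like` pattern doesn't swallow it.
    expr = re.sub(r"(\w+)\s+not\s+like\s+('[^']*')", r"__notlike__(\1, \2)", expr)
    expr = re.sub(r"(\w+)\s+like\s+('[^']*')", r"__like__(\1, \2)", expr)
    return expr
```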

AivanF requested a review from ZmeiGorynych on March 30, 2026 at 13:17
coderabbitai bot (Contributor) commented Mar 30, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds support for filters that reference computed fields or inline transform expressions by extracting those transforms into hidden computed fields, marking such filters as post-filters, applying them in an outer SQL wrapper, and enhancing self-join/time-shift/change generation, SQL generation, enums, tests, and documentation.
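
The outer-wrapper idea can be sketched minimally as below. This is illustrative only: the generator builds real dialect-aware SQL, and the `_post` alias is made up for this sketch:

```python
def wrap_with_post_filters(inner_sql: str, post_filters: list[str]) -> str:
    """Apply post-filters in an outer SELECT over the fully transformed result."""
    if not post_filters:
        return inner_sql
    where = " AND ".join(f"({p})" for p in post_filters)
    return f"SELECT * FROM ({inner_sql}) AS _post WHERE {where}"
```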

Changes

• Documentation (.claude/skills/slayer-query.md, CLAUDE.md, README.md, docs/concepts/formulas.md, docs/index.md): Replaced string-helper examples with like / not like; documented that filters may reference computed columns defined in fields or inline transform expressions; inline transforms are auto-extracted into hidden computed fields, and such filters are applied as post-filters. Minor reordering of transform docs and feature-list updates.
• Core Formula Model (slayer/core/formula.py): Added ParsedFilter.is_post_filter: bool = False; preprocesses like / not like into internal calls; extended AST-to-SQL to support arithmetic ast.BinOp and internal __like__/__notlike__ rendering.
• Query Engine (slayer/engine/query_engine.py): Added _extract_filter_transforms() to rewrite filters and emit extra computed fields; _enrich flattens those transforms; _classify_filters accepts computed_names and marks computed references as post-filters; time-dimension resolution tweaks; nested self-join validation; changed default change/change_pct offset behavior.
• SQL Generator (slayer/sql/generator.py): Introduced _generate_shifted_base() and dialect-aware time shifting; row-number joins for row-based shifts and equality joins for calendar shifts; refactored self-join layering; excluded post-filters from the inner WHERE/HAVING and wrapped the SQL with an outer WHERE for is_post_filter; treated a DataType.LAST base aggregation as MAX while generating latest-value semantics elsewhere.
• Enums (slayer/core/enums.py): Added DataType.LAST and included it in aggregation-like types and the python_type mapping.
• Public API Docstring (slayer/mcp/server.py): Updated query() docstring examples to use like / not like and to note that filters may reference computed fields or include inline transforms.
• Dependencies (pyproject.toml): Added dependency python-dateutil = ">=2.8".
• Examples (examples/embedded/verify.py): Renamed "Lag" labeling to time_shift (row-based, previous period) and updated assertion names.
• Tests (tests/test_integration.py, tests/test_integration_postgres.py, tests/test_sql_generator.py): Added extensive tests for time_shift/change/change_pct with date_range, post-filter and inline-transform filter behavior, the nesting guard for self-joins, and DataType.LAST semantics; adjusted existing expectations and imports.
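
The filter-transform extraction described for the query engine can be sketched with the standard ast module. Names like ALL_TRANSFORMS and the _ft prefix mirror identifiers mentioned in this review, but the code below is an illustrative reconstruction, not the project's implementation:

```python
import ast

ALL_TRANSFORMS = {"time_shift", "change", "change_pct"}  # assumed transform names

def extract_filter_transforms(filter_str: str) -> tuple[str, dict[str, str]]:
    """Replace inline transform calls in a filter with generated field names."""
    tree = ast.parse(filter_str, mode="eval")
    extracted: dict[str, str] = {}

    class Rewriter(ast.NodeTransformer):
        def visit_Call(self, node: ast.Call) -> ast.AST:
            self.generic_visit(node)  # rewrite nested calls first
            if isinstance(node.func, ast.Name) and node.func.id in ALL_TRANSFORMS:
                name = f"_ft{len(extracted)}"
                extracted[name] = ast.unparse(node)  # becomes a hidden computed field
                return ast.copy_location(ast.Name(id=name, ctx=ast.Load()), node)
            return node

    rewritten = ast.unparse(Rewriter().visit(tree))
    return rewritten, extracted
```

The rewritten filter then references only field names, so it can be classified as a post-filter and applied in the outer query.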

Sequence Diagram

sequenceDiagram
    autonumber
    actor Client
    participant QueryEngine as Query Engine
    participant FilterExtractor as Filter Extractor
    participant FieldFlattener as Field Flattener
    participant FilterClassifier as Filter Classifier
    participant SQLGenerator as SQL Generator
    participant Database as Database

    Client->>QueryEngine: Submit SlayerQuery (includes filters)
    QueryEngine->>FilterExtractor: _extract_filter_transforms(filter_str)
    alt inline transforms present
        FilterExtractor->>FilterExtractor: Parse AST, detect transform calls
        FilterExtractor->>QueryEngine: Return rewritten_filter + generated_fields
        QueryEngine->>FieldFlattener: Flatten generated_fields into enriched fields
    else no inline transforms
        FilterExtractor->>QueryEngine: Return original_filter
    end

    QueryEngine->>FilterClassifier: _classify_filters(filters, measure_names, computed_names)
    FilterClassifier->>FilterClassifier: Mark filters referencing computed_names as is_post_filter
    FilterClassifier->>SQLGenerator: Pass classified filters

    SQLGenerator->>SQLGenerator: Build base CTEs and shifted bases (row or calendar joins)
    SQLGenerator->>SQLGenerator: Exclude post-filters from inner WHERE/HAVING
    SQLGenerator->>SQLGenerator: Wrap result with outer SELECT ... WHERE <post-filters>
    SQLGenerator->>Database: Execute SQL
    Database->>Client: Return results

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I hopped through filters, found hidden threads,

extracted transforms from predicates' beds,
I shifted months and joined them row by row,
wrapped post-filters so the results could show,
a nimble rabbit makes queries grow!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
• Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
• Title Check: ✅ Passed. The title 'Filtering on computed columns' directly and clearly summarizes the main change: introducing support for filters that reference computed/derived field values, which is the core feature across all modified files.
• Docstring Coverage: ✅ Passed. Docstring coverage is 81.97%, above the required threshold of 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/test_integration.py (1)

347-377: ⚠️ Potential issue | 🔴 Critical

Pipeline failure: Ambiguous column reference in self-join queries.

This test (test_time_shift_row_based) fails due to the same SQL generation issue. When the generator creates self-join CTEs, the ORDER BY clause references orders.created_at without qualifying which CTE it belongs to, causing ambiguity.

The test logic is correct — the fix is needed in the generator.

🧹 Nitpick comments (1)
slayer/sql/generator.py (1)

458-468: Inconsistency: change/change_pct in _build_transform_sql but also in self-join path.

The _build_transform_sql method now includes change and change_pct (lines 464-468), but these transforms are also in _SELF_JOIN_TRANSFORMS (line 32) and handled via the self-join CTE path in deferred_self_joins.

This creates potential confusion: if a change transform somehow reaches _build_transform_sql, it would generate a LAG-based window function instead of a self-join. The _SELF_JOIN_TRANSFORMS check at line 227-228 should prevent this, but having both implementations is fragile.

Consider either removing the window-function implementations here or adding a guard similar to time_shift (line 458-459).

♻️ Suggested guard for consistency
         elif t.transform == "change":
+            # Should be handled by self-join CTE path, not window function
+            raise ValueError("change should not reach _build_transform_sql; it uses self-join CTE")
-            return f"{measure} - LAG({measure}, {t.offset}) OVER ({order_clause})"
         elif t.transform == "change_pct":
+            raise ValueError("change_pct should not reach _build_transform_sql; it uses self-join CTE")
-            lag = f"LAG({measure}, {t.offset}) OVER ({order_clause})"
-            return f"CASE WHEN {lag} != 0 THEN ({measure} - {lag}) * 1.0 / {lag} END"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/sql/generator.py` around lines 458 - 468, The file has duplicate
handling for "change" and "change_pct" in _build_transform_sql while those
transforms are also listed in _SELF_JOIN_TRANSFORMS and handled via
deferred_self_joins; update _build_transform_sql to avoid generating
window-function SQL for these by either removing the "change" and "change_pct"
branches or adding a guard like the existing time_shift case (raise
ValueError("change/change_pct should not reach _build_transform_sql; it uses
self-join CTE")) so that _SELF_JOIN_TRANSFORMS + deferred_self_joins remain the
single source of truth.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@slayer/sql/generator.py`:
- Around line 352-361: The JOIN uses t.granularity directly which can be None
for calendar-based shifts; compute the effective join granularity like in the
deferred_self_joins loop (e.g., set join_granularity = t.granularity or, if None
and a date_range/time dimension exists, use the time dimension's granularity),
then pass that join_granularity into _build_time_shift_join instead of
t.granularity; update the time_shifts loop (the block building from_clause and
calling _build_time_shift_join) to compute and use the same logic so
calendar-based shifted CTEs join on time buckets rather than row numbers.
- Line 363: The ORDER BY expressions become ambiguous after adding LEFT JOINs
for time_shifts because unqualified column names (e.g., "{col_name}") can exist
in both final_cte and shifted_{name} CTEs; update the code that builds the ORDER
BY (the section assembling outer_select/from_clause/sql) to fully qualify each
ORDER BY column with the correct table/CTE alias (use final_cte alias for
original columns and shifted_{name} alias for shift columns) so generated SQL
uses alias."final_cte"."col" or "shifted_{name}"."col" as appropriate; locate
where ORDER BY is constructed (near where sql =
f"{cte_clause}\n{outer_select}\n{from_clause}") and change the column formatting
logic to prepend the proper CTE alias based on whether the column comes from the
base result or a time_shift.
- Around line 317-336: The time_shift CTEs currently just SELECT * (shift_name)
and never call _generate_shifted_base, and the final_cte reassignment at the end
clobbers the rn_cte_name; to fix, for each time_shift entry call
_generate_shifted_base(t, base_cte=final_cte,
available_aliases=available_aliases, rn_cte_name=rn_cte_name) (or similar
signature) to produce the shifted CTE SQL and append that to cte_strs instead of
the dummy SELECT *; ensure row_based_shifts use the rn_cte_name as the base when
calling _generate_shifted_base (i.e., set base_cte = rn_cte_name if rn_cte_name
was created) and remove or move the final assignment final_cte = ctes[-1][0] so
it does not overwrite the rn_cte_name before generating shifted CTEs.

In `@tests/test_integration_postgres.py`:
- Around line 285-305: The test fails from ambiguous column names when
generating self-joins for multiple time_shift/date-range shifts; update the
query generator so that when processing TimeDimension and time_shift expressions
(used by SlayerQuery/TimeDimension/ColumnRef/time_shift) it always qualifies
column references with the correct table alias and creates distinct aliases for
each self-join (e.g., t0, t1, t2) used in JOINs and SELECTs; ensure the
generator uses those unique aliases consistently in JOIN ON clauses and in
projected field references so multiple shifts (as exercised by
test_multiple_date_range_shifts and executed via pg_env.execute) do not produce
ambiguous column errors.
- Around line 225-245: The ORDER BY generation in slayer/sql/generator.py is
emitting unqualified orders (e.g., orders.created_at) causing ambiguity when
self-join CTEs exist; update the function that builds the ORDER BY clause (the
code that references time_col / ORDER BY generation in generator.py) to qualify
columns using the appropriate CTE/source alias (for example prefix with the base
CTE alias or the unique table alias used for the selected projection) or use the
final query's unique alias mapping, ensuring every column in ORDER BY is emitted
as alias.column (e.g., base.orders.created_at or <source_alias>.created_at) when
self-joins/CTEs are present; adjust any helper like build_order_clause /
render_order_by / time_col handling to retrieve and use the resolved source
alias from the query context before emitting the ORDER BY expression.

In `@tests/test_sql_generator.py`:
- Around line 498-509: The failure occurs because time_shift formulas (handled
where Field.formula containing "time_shift" is parsed) bypass the CTE generation
that creates "shifted_base_" blocks; update the time_shift handling in
SQLGenerator (the method that builds field expressions for Field(formula=...) /
the time shift code path) to reuse the same shifted CTE creation logic used for
date-range shifting: detect time_shift(...) in the field parser, register/emit a
shifted_base_ CTE referencing the model's base query (same pattern as the date
range shift code), and ensure the shifted CTE is included in the final SQL
before field substitution so tests like test_no_date_range_no_shift see
"shifted_base_" and no BETWEEN date filters.
- Around line 447-464: The test fails because time_shift transforms (identified
by time_shifts = [t for t in enriched.transforms if t.transform ==
"time_shift"]) are not invoking the date-range shifting path; the code path
handling time_shifts returns "SELECT * FROM {final_cte}" instead of calling
_generate_shifted_base(). Update the time_shift handling logic in
slayer/sql/generator.py to reuse _generate_shifted_base() (the same routine used
by deferred_self_joins) for producing the shifted CTEs: detect time_shift
transforms from enriched.transforms, compute the appropriate shift offsets, and
call _generate_shifted_base(final_cte, shift_spec, ...) so the shifted CTE
contains the adjusted date_range bounds (e.g., producing 2024-02-01/2024-02-29
for a -1 month shift).

---

Nitpick comments:
In `@slayer/sql/generator.py`:
- Around line 458-468: The file has duplicate handling for "change" and
"change_pct" in _build_transform_sql while those transforms are also listed in
_SELF_JOIN_TRANSFORMS and handled via deferred_self_joins; update
_build_transform_sql to avoid generating window-function SQL for these by either
removing the "change" and "change_pct" branches or adding a guard like the
existing time_shift case (raise ValueError("change/change_pct should not reach
_build_transform_sql; it uses self-join CTE")) so that _SELF_JOIN_TRANSFORMS +
deferred_self_joins remain the single source of truth.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8eb7856d-2ae8-4bc7-86aa-8b3564c6c8ae

📥 Commits

Reviewing files that changed from the base of the PR and between ac10960 and 94b8fd4.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (15)
  • .claude/skills/slayer-query.md
  • CLAUDE.md
  • README.md
  • docs/concepts/formulas.md
  • docs/concepts/terminology.md
  • docs/index.md
  • examples/embedded/verify.py
  • pyproject.toml
  • slayer/core/formula.py
  • slayer/engine/query_engine.py
  • slayer/mcp/server.py
  • slayer/sql/generator.py
  • tests/test_integration.py
  • tests/test_integration_postgres.py
  • tests/test_sql_generator.py

AivanF force-pushed the aivan/2026-03-30-Formula-Nesting-Transform branch from 94b8fd4 to 2ad17c9 on March 30, 2026 at 13:43
coderabbitai bot (Contributor) left a comment

🧹 Nitpick comments (2)
slayer/sql/generator.py (2)

356-356: Move import re to the top of the file.

The same module is imported locally again at line 606 in _build_where_and_having(); a single module-level import would be cleaner.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/sql/generator.py` at line 356, Move the local import of the regex
module out of the function and into module-level imports: remove the in-function
"import re" and add "import re" at the top of the file so the module-level
import is used by _build_where_and_having and any other helpers; update
references in _build_where_and_having() (and any nearby functions) to use the
top-level re symbol.

110-111: Move imports to the top of the file.

Per coding guidelines, imports should be placed at the top of files. datetime, timedelta, and relativedelta are imported inside the method.

♻️ Suggested refactor

Add to top of file with other imports:

from datetime import datetime, timedelta
from dateutil.relativedelta import relativedelta

Then remove lines 110-111 from _shift_date().
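
For context on why python-dateutil is pulled in: relativedelta performs calendar-aware month arithmetic with day clamping (e.g. 2024-03-31 minus one month yields 2024-02-29). A stdlib-only illustration of that behavior, not the project's code:

```python
import calendar
from datetime import date

def shift_months(d: date, offset: int) -> date:
    """Calendar-aware month shift with day clamping, like relativedelta(months=offset)."""
    total = d.year * 12 + (d.month - 1) + offset
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]  # clamp to end of target month
    return date(year, month0 + 1, min(d.day, last_day))
```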

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/sql/generator.py` around lines 110 - 111, The imports for datetime,
timedelta and relativedelta are declared inside the _shift_date() method; move
"from datetime import datetime, timedelta" and "from dateutil.relativedelta
import relativedelta" up to the module-level imports at the top of the file and
remove those two lines from inside the _shift_date() function so the function
uses the module-level imports.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@slayer/sql/generator.py`:
- Line 356: Move the local import of the regex module out of the function and
into module-level imports: remove the in-function "import re" and add "import
re" at the top of the file so the module-level import is used by
_build_where_and_having and any other helpers; update references in
_build_where_and_having() (and any nearby functions) to use the top-level re
symbol.
- Around line 110-111: The imports for datetime, timedelta and relativedelta are
declared inside the _shift_date() method; move "from datetime import datetime,
timedelta" and "from dateutil.relativedelta import relativedelta" up to the
module-level imports at the top of the file and remove those two lines from
inside the _shift_date() function so the function uses the module-level imports.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6cc59f16-cb61-43c6-8500-c65c11c43826

📥 Commits

Reviewing files that changed from the base of the PR and between 94b8fd4 and 2ad17c9.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
  • .claude/skills/slayer-query.md
  • CLAUDE.md
  • README.md
  • docs/concepts/formulas.md
  • docs/index.md
  • examples/embedded/verify.py
  • pyproject.toml
  • slayer/core/formula.py
  • slayer/engine/query_engine.py
  • slayer/mcp/server.py
  • slayer/sql/generator.py
  • tests/test_integration.py
  • tests/test_integration_postgres.py
  • tests/test_sql_generator.py
✅ Files skipped from review due to trivial changes (4)
  • CLAUDE.md
  • pyproject.toml
  • slayer/mcp/server.py
  • docs/index.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • README.md
  • .claude/skills/slayer-query.md
  • examples/embedded/verify.py
  • tests/test_sql_generator.py

coderabbitai bot (Contributor) left a comment

🧹 Nitpick comments (4)
tests/test_integration.py (1)

514-537: Test name may be misleading given date_range behavior.

The test is named test_forward_row_shift_with_date_range, but per the generator logic at lines 310-315, when date_range is present and no explicit granularity is given, the time dimension's granularity is used for a calendar-based join (not row-based).

Consider renaming to test_forward_shift_with_date_range or adding a clarifying comment that the join becomes calendar-based due to the date_range.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_integration.py` around lines 514 - 537, The test name
test_forward_row_shift_with_date_range is misleading because when a
TimeDimension includes a date_range the generator uses calendar-based joins (see
TimeDimension handling and related generator logic), so either rename the test
to test_forward_shift_with_date_range or update the test's docstring/comment to
state explicitly that the presence of date_range forces a calendar-based join
(affecting time_shift semantics); locate the test function
test_forward_row_shift_with_date_range and perform the rename or modify the
docstring accordingly to clarify behavior.
slayer/sql/generator.py (3)

416-436: Post-filter implementation uses string manipulation instead of sqlglot AST.

Per coding guidelines, SQL generation should use sqlglot AST building instead of string concatenation. The current approach with regex substitution for column qualification and string wrapping is brittle and may have edge cases with special characters in column names.

Additionally, import re should be at the top of the file (also appears at line 637).

💡 Suggested direction

Consider using sqlglot's exp.Select and exp.Subquery to build the outer wrapper:

# Parse the inner SQL and wrap it
inner = sqlglot.parse_one(sql, dialect=self.dialect)
subquery = exp.Subquery(this=inner, alias=exp.to_identifier("_filtered"))
where_expr = sqlglot.parse_one(where_clause, dialect=self.dialect)
outer = exp.Select().select(exp.Star()).from_(subquery).where(where_expr)
sql = outer.sql(dialect=self.dialect, pretty=True)

152-153: Imports should be at the top of the file per coding guidelines.

The datetime and dateutil.relativedelta imports are placed inside the method. Per coding guidelines, imports should be at the top of files.

♻️ Proposed fix

Move imports to top of file (after line 13):

from datetime import datetime, timedelta
from dateutil.relativedelta import relativedelta

Then update the method:

     @staticmethod
     def _shift_date(date: str, offset: int, granularity: str) -> str:
         """Shift a date string by offset units of granularity."""
-        from datetime import datetime, timedelta
-        from dateutil.relativedelta import relativedelta
-
         dt = datetime.strptime(date[:10], "%Y-%m-%d")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/sql/generator.py` around lines 152 - 153, Move the inline imports
"from datetime import datetime, timedelta" and "from dateutil.relativedelta
import relativedelta" out of the method and place them at the top of
slayer/sql/generator.py (with the other imports, e.g., after existing imports
around line 13), then remove the redundant local import lines inside the method
so only the module-level imports are used; this keeps the symbols datetime,
timedelta, and relativedelta available to the function without in-method
imports.

91-92: Silent fallback to "month" may mask configuration errors.

When no granularity is found (which "shouldn't happen" per the comment), the code silently defaults to "month". Consider logging a warning or raising an error to surface misconfiguration early rather than producing potentially incorrect results.

💡 Suggested improvement
             if not gran:
-                gran = "month"  # Shouldn't happen — transforms require a time dim
+                # Shouldn't happen — transforms require a time dim
+                logger.warning(
+                    "No granularity found for transform %s; defaulting to 'month'",
+                    transform.name,
+                )
+                gran = "month"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/sql/generator.py` around lines 91 - 92, The code silently defaults
gran to "month" when the variable gran is falsy (the line with if not gran: gran
= "month"), which can mask configuration errors; change this to surface the
issue by either logging a clear warning with context (e.g., include the
transform name/column/time dimension) via the module logger before falling back,
or preferably raise a descriptive exception (ValueError/RuntimeError)
referencing the missing time granularity so callers fail fast; update the logic
around the gran variable (the if not gran block) to implement one of these two
behaviors and include contextual info in the message.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@slayer/sql/generator.py`:
- Around line 152-153: Move the inline imports "from datetime import datetime,
timedelta" and "from dateutil.relativedelta import relativedelta" out of the
method and place them at the top of slayer/sql/generator.py (with the other
imports, e.g., after existing imports around line 13), then remove the redundant
local import lines inside the method so only the module-level imports are used;
this keeps the symbols datetime, timedelta, and relativedelta available to the
function without in-method imports.
- Around line 91-92: The code silently defaults gran to "month" when the
variable gran is falsy (the line with if not gran: gran = "month"), which can
mask configuration errors; change this to surface the issue by either logging a
clear warning with context (e.g., include the transform name/column/time
dimension) via the module logger before falling back, or preferably raise a
descriptive exception (ValueError/RuntimeError) referencing the missing time
granularity so callers fail fast; update the logic around the gran variable (the
if not gran block) to implement one of these two behaviors and include
contextual info in the message.

In `@tests/test_integration.py`:
- Around line 514-537: The test name test_forward_row_shift_with_date_range is
misleading because when a TimeDimension includes a date_range the generator uses
calendar-based joins (see TimeDimension handling and related generator logic),
so either rename the test to test_forward_shift_with_date_range or update the
test's docstring/comment to state explicitly that the presence of date_range
forces a calendar-based join (affecting time_shift semantics); locate the test
function test_forward_row_shift_with_date_range and perform the rename or modify
the docstring accordingly to clarify behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 40663c99-2ddf-4ae3-a0fd-4bbea9dded4a

📥 Commits

Reviewing files that changed from the base of the PR and between 2ad17c9 and 291829c.

📒 Files selected for processing (4)
  • slayer/core/formula.py
  • slayer/sql/generator.py
  • tests/test_integration.py
  • tests/test_sql_generator.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • slayer/core/formula.py
  • tests/test_sql_generator.py

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
slayer/engine/query_engine.py (2)

148-160: Naive substring match may cause false positives in time dimension detection.

The check if td_name in f_str on line 156 could match partial names. For example, a dimension named "at" would match "status == 'completed'". While typical time dimension names (like created_at) make this unlikely, consider using word boundary matching for robustness.

♻️ Suggested improvement
+import re
+
 if resolved_time_alias is None and query.filters:
     # Check if any filter references a time/timestamp/date dimension
     time_dim_names = {
         d.name for d in model.dimensions
         if d.type in (DataType.TIMESTAMP, DataType.DATE)
     }
     for f_str in query.filters:
         for td_name in time_dim_names:
-            if td_name in f_str:
+            if re.search(rf'\b{re.escape(td_name)}\b', f_str):
                 resolved_time_alias = f"{model.name}.{td_name}"
                 break
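
A quick standalone illustration of the false positive (the dimension name "at" is deliberately pathological, not from this PR):

```python
import re

filters = ["status == 'completed'", "created_at >= '2024-01-01'"]
td_name = "at"  # pathological time-dimension name for illustration

# Substring check: "at" occurs inside both "status" and "created_at".
naive = [f for f in filters if td_name in f]
# Word-boundary check: neither occurrence is a standalone word.
bounded = [f for f in filters if re.search(rf"\b{re.escape(td_name)}\b", f)]
```

Note that \b treats underscore as a word character, so \bat\b also skips created_at entirely, while a realistic td_name such as created_at still matches itself.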
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/engine/query_engine.py` around lines 148 - 160, The current naive
substring check in the resolved_time_alias detection (looping over
model.dimensions and checking if td_name in f_str) can yield false positives;
update the logic in the block that inspects query.filters to perform a
word-boundary-aware match instead (e.g., use a regex search with \b around
td_name or tokenize f_str by non-word characters) so only whole-dimension-name
occurrences trigger resolved_time_alias; keep references to resolved_time_alias,
query.filters, model.dimensions, DataType.TIMESTAMP, DataType.DATE, td_name and
f_str when changing the check.

383-391: Consider using nonlocal instead of mutable list for counter.

The counter = [0] pattern works but is less idiomatic than using nonlocal. This is a minor style suggestion.

♻️ Alternative pattern
-        counter = [0]
+        counter = 0

         def _replace(node):
+            nonlocal counter
             if isinstance(node, _ast.Call) and isinstance(node.func, _ast.Name) and node.func.id in ALL_TRANSFORMS:
-                name = f"_ft{counter[0]}"
-                counter[0] += 1
+                name = f"_ft{counter}"
+                counter += 1
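
A standalone demonstration of the nonlocal pattern the comment suggests (generic naming, not the project's code):

```python
def make_namer(prefix: str):
    """Return a closure that yields prefix0, prefix1, ... using nonlocal."""
    counter = 0

    def next_name() -> str:
        nonlocal counter  # rebind the enclosing int instead of mutating a list cell
        name = f"{prefix}{counter}"
        counter += 1
        return name

    return next_name
```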
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/engine/query_engine.py` around lines 383 - 391, The helper _replace
currently uses a mutable list counter = [0] to allow increments inside the inner
function; change this to a plain integer (counter = 0) in the enclosing scope
and add a nonlocal counter declaration at the top of the inner function _replace
so you can increment counter with counter += 1; keep the rest of the logic
(building name f"_ft{counter}", using _ast.unparse(node), appending to
transforms, and returning _ast.Name) unchanged and continue to reference
ALL_TRANSFORMS and transforms as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@slayer/core/enums.py`:
- Line 24: The python_type mapping is missing an entry for DataType.LAST which
causes a KeyError when accessing DataType.LAST.python_type; add an entry for
DataType.LAST in the python_type dict (the same mapping block that contains
DataType.MAX, DataType.MIN, etc.) and map it to float because LAST is an
aggregation that is translated to MAX in the SQL generator; update the mapping
near the existing keys like DataType.MAX to include DataType.LAST: float.
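The failure mode is easy to reproduce with a stripped-down enum. This is a sketch assuming `python_type` is a dict-backed property as the comment describes; the real enum has more members:

```python
from enum import Enum

class DataType(Enum):
    MAX = "max"
    MIN = "min"
    LAST = "last"

    @property
    def python_type(self):
        # Omitting DataType.LAST from this dict raises KeyError on access.
        return {
            DataType.MAX: float,
            DataType.MIN: float,
            DataType.LAST: float,  # LAST compiles to MAX in SQL, hence float
        }[self]

print(DataType.LAST.python_type)  # <class 'float'>
```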

In `@slayer/engine/query_engine.py`:
- Around line 302-321: The label isn't applied when a measure is DataType.LAST
because the code branches to add a last() transform and skips the existing elif
that sets field.label; change the LAST branch in query_engine.py so that after
calling _add_transform(name=field_name, transform="last",
measure_alias=internal_alias, offset=1) you also apply field.label (if present)
to the transform's output alias (field_name) or to the transform entry that was
just added; use the same symbols present (measure_def, DataType.LAST,
base_alias, internal_alias, known_aliases, _add_transform, field.label,
measures) to locate and update the code so the label is set for LAST-derived
measures.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2f582588-d019-480b-aba4-f8770f911238

📥 Commits

Reviewing files that changed from the base of the PR and between 291829c and 99e1371.

📒 Files selected for processing (5)
  • slayer/core/enums.py
  • slayer/engine/query_engine.py
  • slayer/sql/generator.py
  • tests/test_integration.py
  • tests/test_sql_generator.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/test_sql_generator.py
  • slayer/sql/generator.py

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/concepts/formulas.md`:
- Around line 123-145: Summary: Wording overstates what can be used as
post-filterable names — only computed/transform expressions (computed field
names) are supported, not arbitrary renamed bare measures. Update the docs text
to state that post-filters can reference computed field names produced by
transform expressions (i.e., expressions defined as fields via a formula or
auto-extracted transforms), and clarify that a bare measure renamed with "name"
(e.g., {"formula":"count","name":"n"}) is not eligible; mention the
auto-extraction of transform expressions when used directly in "filters" remains
supported.

In `@slayer/engine/query_engine.py`:
- Around line 300-320: The code only auto-wraps measures of type DataType.LAST
when the field is a top-level MeasureRef, causing nested uses (e.g.,
change(my_last_measure) or arithmetic with the measure) to bypass last(...) and
return inconsistent results; update the resolution flow so that whenever a
MeasureRef is encountered (not just when the entire Field is a bare MeasureRef)
you detect model.get_measure(spec.name) with type DataType.LAST and inject the
last transform via _add_transform using the internal alias pattern
(internal_alias = f"{query.model}._base_{spec.name}"), update measures and
known_aliases accordingly, and ensure enriched_transforms labels are adjusted as
currently done — apply this logic inside the general expression resolution path
that calls _ensure_measure(spec) so nested MeasureRef nodes get wrapped the same
way as top-level ones.
- Around line 381-384: The current except SyntaxError branch returns early and
skips inline-transform extraction when _ast.parse(filter_str, mode="eval") fails
(e.g., filters containing SQL-like operators), so change the except block to
still run the same inline-transform extraction logic used for successfully
parsed trees rather than returning filter_str, []; i.e., on SyntaxError, do not
return immediately — invoke the existing transform-extraction routine (the same
code/path that processes the parsed tree) with filter_str so computed-column
transforms are extracted even when parsing as Python fails.
- Around line 341-346: The inline-transform alias generation currently resets
inside SlayerQueryEngine._extract_filter_transforms causing duplicate
`_ft0/_ft1` across filters; make those aliases query-scoped by moving the
counter out of _extract_filter_transforms and into the caller (e.g., _enrich) or
the instance (e.g., a per-query counter/generator on the engine) so each call to
_extract_filter_transforms gets a continuing sequence; update calls in the
filter-processing loop (the block that calls _extract_filter_transforms, then
parse_formula/_flatten_spec and appends to processed_filters) and the similar
block at the other site to pass/consume the shared counter so generated names
`_ftN` are unique across the entire query.
- Around line 137-160: Restore the documented resolution chain in
resolved_time_alias: first honor query.main_time_dimension
(query.main_time_dimension → f"{model.name}.{...}"), then only use
time_dimensions[0].alias if there is exactly one time dimension in
time_dimensions (i.e., len(time_dimensions) == 1), otherwise do NOT
heuristically scan query.filters or raw strings; next fall back to
model.default_time_dimension (format as
f"{model.name}.{model.default_time_dimension}" if present), and if no time
dimension is resolved raise an explicit error (e.g., ValueError) so callers know
ordering cannot be determined; update references to resolved_time_alias,
query.main_time_dimension, time_dimensions, model.default_time_dimension, and
remove the scan over query.filters and DataType checks.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0a788864-ac8d-4467-80ca-311f6bb75688

📥 Commits

Reviewing files that changed from the base of the PR and between 99e1371 and 3288f7a.

📒 Files selected for processing (9)
  • .claude/skills/slayer-query.md
  • README.md
  • docs/concepts/formulas.md
  • slayer/core/enums.py
  • slayer/core/formula.py
  • slayer/core/query.py
  • slayer/engine/query_engine.py
  • slayer/mcp/server.py
  • tests/test_sql_generator.py
✅ Files skipped from review due to trivial changes (3)
  • .claude/skills/slayer-query.md
  • slayer/mcp/server.py
  • slayer/core/query.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • README.md
  • slayer/core/formula.py
  • tests/test_sql_generator.py

Comment on lines +123 to +145
Filters can reference field names defined in `fields`. These are applied as post-filters on the outer query, after all transforms are computed:

```json
{
  "fields": [
    {"formula": "revenue"},
    {"formula": "change(revenue)", "name": "rev_change"}
  ],
  "filters": ["rev_change < 0"]
}
```

This returns only rows where revenue decreased from the previous period.

Transform expressions can also be used **directly in filters** without defining them as fields first:

```json
{
  "filters": ["last(change(revenue)) < 0"]
}
```

This keeps only rows where the most recent period's revenue change is negative — useful for queries like "show me monthly data, but only for metrics that are declining." The transform is auto-extracted as a hidden field and applied as a post-filter.
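Mechanically, a post-filter is applied by wrapping the transform query in an outer SELECT, since a WHERE clause in the inner query would run before window transforms are computed. A simplified sketch of that shape (not the actual SQL generator):

```python
def wrap_with_post_filters(inner_sql: str, post_filters: list[str]) -> str:
    # Post-filters must live one level above the query that computes
    # the window transforms they reference.
    if not post_filters:
        return inner_sql
    return f"SELECT * FROM ({inner_sql}) AS q WHERE " + " AND ".join(post_filters)

inner = "SELECT month, revenue - LAG(revenue) OVER (ORDER BY month) AS rev_change FROM sales"
print(wrap_with_post_filters(inner, ["rev_change < 0"]))
```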

⚠️ Potential issue | 🟡 Minor

Tighten this wording to computed field names.

The engine only treats expression/transform names as post-filterable computed columns. A renamed bare measure field like {"formula": "count", "name": "n"} still won't be recognized here, so “field names defined in fields” overpromises the supported scope.

📝 Suggested wording
```diff
-Filters can reference field names defined in `fields`. These are applied as post-filters on the outer query, after all transforms are computed:
+Filters can reference computed field names defined in `fields` (expressions/transforms). These are applied as post-filters on the outer query, after all transforms are computed:
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/concepts/formulas.md` around lines 123 - 145, Summary: Wording
overstates what can be used as post-filterable names — only computed/transform
expressions (computed field names) are supported, not arbitrary renamed bare
measures. Update the docs text to state that post-filters can reference computed
field names produced by transform expressions (i.e., expressions defined as
fields via a formula or auto-extracted transforms), and clarify that a bare
measure renamed with "name" (e.g., {"formula":"count","name":"n"}) is not
eligible; mention the auto-extraction of transform expressions when used
directly in "filters" remains supported.

Comment on lines 137 to +160
```diff
 # Resolve time dimension for transforms that need ORDER BY time.
-# Resolution chain: query main_time_dimension → query time_dimensions (if exactly 1) → model default.
+# Resolution chain:
+#   1. query.main_time_dimension (explicit override)
+#   2. First time dimension in query.time_dimensions (groupby)
+#   3. First time dimension referenced in filters
+#   4. model.default_time_dimension
 resolved_time_alias = None
 if query.main_time_dimension:
     resolved_time_alias = f"{model.name}.{query.main_time_dimension}"
 if resolved_time_alias is None and time_dimensions:
-    if len(time_dimensions) == 1:
-        resolved_time_alias = time_dimensions[0].alias
+    resolved_time_alias = time_dimensions[0].alias
+if resolved_time_alias is None and query.filters:
+    # Check if any filter references a time/timestamp/date dimension
+    time_dim_names = {
+        d.name for d in model.dimensions
+        if d.type in (DataType.TIMESTAMP, DataType.DATE)
+    }
+    for f_str in query.filters:
+        for td_name in time_dim_names:
+            if td_name in f_str:
+                resolved_time_alias = f"{model.name}.{td_name}"
+                break
+        if resolved_time_alias:
+            break
```

⚠️ Potential issue | 🟠 Major

Restore the documented time-dimension resolution chain.

Picking time_dimensions[0] and then scanning raw filter strings turns this into a heuristic lookup. With multiple time dimensions, or even a filter that merely contains a time-dimension name as a substring, last()/lag()/change() can silently order by the wrong column instead of falling back to model.default_time_dimension or raising.

As per coding guidelines, "Functions needing time ordering use resolution chain: query main_time_dimension → query time_dimensions (if exactly 1) → model default_time_dimension → error".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/engine/query_engine.py` around lines 137 - 160, Restore the documented
resolution chain in resolved_time_alias: first honor query.main_time_dimension
(query.main_time_dimension → f"{model.name}.{...}"), then only use
time_dimensions[0].alias if there is exactly one time dimension in
time_dimensions (i.e., len(time_dimensions) == 1), otherwise do NOT
heuristically scan query.filters or raw strings; next fall back to
model.default_time_dimension (format as
f"{model.name}.{model.default_time_dimension}" if present), and if no time
dimension is resolved raise an explicit error (e.g., ValueError) so callers know
ordering cannot be determined; update references to resolved_time_alias,
query.main_time_dimension, time_dimensions, model.default_time_dimension, and
remove the scan over query.filters and DataType checks.
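Under the stated guideline, the strict chain could look like the sketch below. The attribute names follow the review text, and the stand-in objects are hypothetical, not the engine's real model/query classes:

```python
from types import SimpleNamespace

def resolve_time_alias(query, model, time_dimensions):
    # 1. Explicit per-query override.
    if query.main_time_dimension:
        return f"{model.name}.{query.main_time_dimension}"
    # 2. Groupby time dimension, only when unambiguous.
    if len(time_dimensions) == 1:
        return time_dimensions[0].alias
    # 3. Model-level default.
    if model.default_time_dimension:
        return f"{model.name}.{model.default_time_dimension}"
    # 4. No heuristic filter scanning: fail loudly instead.
    raise ValueError("Cannot resolve time dimension for ordered transforms")

model = SimpleNamespace(name="sales", default_time_dimension="order_date")
query = SimpleNamespace(main_time_dimension=None)
td_a = SimpleNamespace(alias="sales.order_date")
td_b = SimpleNamespace(alias="sales.ship_date")
print(resolve_time_alias(query, model, [td_a]))        # unambiguous groupby dimension
print(resolve_time_alias(query, model, [td_a, td_b]))  # ambiguous, so model default
```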

Comment on lines 300 to +320
```python
if isinstance(spec, MeasureRef):
    _ensure_measure(spec.name)
    # If the measure has type=last, auto-wrap with last() transform
    measure_def = model.get_measure(spec.name)
    if measure_def and measure_def.type == DataType.LAST:
        # Rename the base measure's alias to an internal name
        # so it doesn't collide with the transform's output alias
        base_alias = f"{query.model}.{spec.name}"
        internal_alias = f"{query.model}._base_{spec.name}"
        for m in measures:
            if m.alias == base_alias:
                m.alias = internal_alias
        known_aliases[spec.name] = internal_alias
        _add_transform(
            name=field_name, transform="last",
            measure_alias=internal_alias, offset=1,
        )
        if field.label:
            for t in enriched_transforms:
                if t.alias == f"{query.model}.{field_name}":
                    t.label = field.label
```

⚠️ Potential issue | 🟠 Major

Apply LAST wrapping at every MeasureRef, not only for top-level bare fields.

This branch only runs when the whole field is a standalone MeasureRef. Nested uses like change(my_last_measure) or my_last_measure / count still resolve through _ensure_measure(), so they bypass the generated last(...) transform and can return different results depending on how the measure was first referenced.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/engine/query_engine.py` around lines 300 - 320, The code only
auto-wraps measures of type DataType.LAST when the field is a top-level
MeasureRef, causing nested uses (e.g., change(my_last_measure) or arithmetic
with the measure) to bypass last(...) and return inconsistent results; update
the resolution flow so that whenever a MeasureRef is encountered (not just when
the entire Field is a bare MeasureRef) you detect model.get_measure(spec.name)
with type DataType.LAST and inject the last transform via _add_transform using
the internal alias pattern (internal_alias =
f"{query.model}._base_{spec.name}"), update measures and known_aliases
accordingly, and ensure enriched_transforms labels are adjusted as currently
done — apply this logic inside the general expression resolution path that calls
_ensure_measure(spec) so nested MeasureRef nodes get wrapped the same way as
top-level ones.
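One way to get uniform behavior is to rewrite the formula spec tree before resolution, wrapping every LAST-typed MeasureRef wherever it appears. A simplified sketch with stand-in node classes (not the engine's real types):

```python
from dataclasses import dataclass

@dataclass
class MeasureRef:
    name: str

@dataclass
class Call:
    func: str
    args: list

def wrap_last_refs(node, last_measures):
    # Recurse into call arguments so nested refs like change(m)
    # are caught, not just top-level bare fields.
    if isinstance(node, Call):
        node.args = [wrap_last_refs(arg, last_measures) for arg in node.args]
        return node
    if isinstance(node, MeasureRef) and node.name in last_measures:
        return Call("last", [node])
    return node

spec = Call("change", [MeasureRef("inventory")])
print(wrap_last_refs(spec, {"inventory"}))
```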

Comment on lines +341 to +346
```python
for f_str in (query.filters or []):
    rewritten, extra_fields = SlayerQueryEngine._extract_filter_transforms(f_str)
    for name, formula in extra_fields:
        spec = parse_formula(formula)
        _flatten_spec(spec, name)
    processed_filters.append(rewritten)
```

⚠️ Potential issue | 🔴 Critical

Make generated _ftN names query-scoped, not filter-scoped.

_extract_filter_transforms() restarts its counter at 0 on every call, but _enrich() invokes it once per filter string. Two different filters with inline transforms will both emit _ft0, _ft1, etc., so later filters can bind to the wrong hidden column or produce duplicate aliases in the enriched query.

Also applies to: 386-395

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/engine/query_engine.py` around lines 341 - 346, The inline-transform
alias generation currently resets inside
SlayerQueryEngine._extract_filter_transforms causing duplicate `_ft0/_ft1`
across filters; make those aliases query-scoped by moving the counter out of
_extract_filter_transforms and into the caller (e.g., _enrich) or the instance
(e.g., a per-query counter/generator on the engine) so each call to
_extract_filter_transforms gets a continuing sequence; update calls in the
filter-processing loop (the block that calls _extract_filter_transforms, then
parse_formula/_flatten_spec and appends to processed_filters) and the similar
block at the other site to pass/consume the shared counter so generated names
`_ftN` are unique across the entire query.
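A shared `itertools.count` created by the caller is one simple way to make the aliases query-scoped. This is a toy sketch of the idea, not the actual method signature:

```python
import itertools
import re

def extract_filter_transforms(filter_str, counter):
    # Toy version: pull out change(...) calls, naming each hidden field
    # from the *shared* counter so names never collide across filters.
    extracted = []

    def repl(match):
        name = f"_ft{next(counter)}"
        extracted.append((name, match.group(0)))
        return name

    return re.sub(r"change\(\w+\)", repl, filter_str), extracted

counter = itertools.count()  # created once per query, reused for every filter
f1, _ = extract_filter_transforms("change(revenue) < 0", counter)
f2, _ = extract_filter_transforms("change(cost) > 0", counter)
print(f1, "|", f2)  # _ft0 < 0 | _ft1 > 0
```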

Comment on lines +381 to +384
```python
try:
    tree = _ast.parse(filter_str, mode="eval")
except SyntaxError:
    return filter_str, []
```

⚠️ Potential issue | 🟠 Major

like / not like currently disable inline transform extraction.

This helper parses the raw filter as Python, so a mixed filter such as change(revenue) > 0 and name like '%foo%' hits the SyntaxError path here. The transform never gets extracted, which means the computed-column post-filter path stops working for that filter.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/engine/query_engine.py` around lines 381 - 384, The current except
SyntaxError branch returns early and skips inline-transform extraction when
_ast.parse(filter_str, mode="eval") fails (e.g., filters containing SQL-like
operators), so change the except block to still run the same inline-transform
extraction logic used for successfully parsed trees rather than returning
filter_str, []; i.e., on SyntaxError, do not return immediately — invoke the
existing transform-extraction routine (the same code/path that processes the
parsed tree) with filter_str so computed-column transforms are extracted even
when parsing as Python fails.
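A fallback along the lines the prompt suggests could use a regex scan when the Python parse fails. This is a sketch (the transform list and helper name are assumptions, and the regex handles only un-nested calls for brevity):

```python
import ast
import re

TRANSFORMS = ("change", "last", "lag")

def find_transform_calls(filter_str):
    """Find transform calls even when the filter is not valid Python."""
    try:
        ast.parse(filter_str, mode="eval")
        # ...an AST-based rewrite would handle pure-Python filters here...
    except SyntaxError:
        pass  # fall through to the regex path instead of giving up
    pattern = r"\b(?:" + "|".join(TRANSFORMS) + r")\([^()]*\)"
    return re.findall(pattern, filter_str)

# "like" makes this invalid Python, but the transform is still found:
print(find_transform_calls("change(revenue) > 0 and name like '%foo%'"))
```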
