4 changes: 3 additions & 1 deletion .claude/skills/slayer-query.md
Original file line number Diff line number Diff line change
@@ -53,7 +53,9 @@ filters=[

**Boolean logic**: `and`, `or`, `not` within a single string

**Functions**: `contains(col, 'val')`, `starts_with(col, 'val')`, `ends_with(col, 'val')`, `between(col, 'a', 'b')`. Filters on measures are automatically routed to HAVING.
**Pattern matching**: `like` and `not like` operators (e.g., `"name like '%acme%'"`, `"name not like '%test%'"`). Filters on measures are automatically routed to HAVING.

**Filtering on computed columns**: filters can reference field names from `fields` (e.g., `"rev_change < 0"`) or contain inline transform expressions (e.g., `"last(change(revenue)) < 0"`). These are applied as post-filters on the outer query.
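
Both filter styles can appear together in one query. A hedged sketch (the field and filter values here are illustrative, not from the PR):

```json
{
  "filters": [
    "name not like '%test%'",
    "last(change(revenue)) < 0"
  ]
}
```

The `not like` condition runs as a base filter; the transform condition is auto-extracted and applied as a post-filter on the outer query.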

## Executing

1 change: 1 addition & 0 deletions CLAUDE.md
@@ -72,6 +72,7 @@ poetry run ruff check slayer/ tests/
- Dimension/measure SQL uses bare column names (e.g., `"amount"`); `${TABLE}` for complex expressions
- Queries support `fields` — list of `{"formula": "...", "name": "...", "label": "..."}` parsed by `slayer/core/formula.py`. `label` is an optional human-readable display name (also supported on `ColumnRef` and `TimeDimension`)
- Available formula functions: cumsum, time_shift, change, change_pct, rank, last (FIRST_VALUE window), lag, lead. time_shift, change, and change_pct always use self-join CTEs (no edge NULLs, gap-safe). time_shift uses row-number-based join without granularity, date-arithmetic-based with granularity. lag/lead use LAG/LEAD window functions directly (more efficient but produce NULLs at edges)
- Filters can reference computed field names or contain inline transform expressions (e.g., `"change(revenue) > 0"`, `"last(change(revenue)) < 0"`). These are auto-extracted as hidden fields and applied as post-filters on the outer query
- Functions needing time ordering use resolution chain: query main_time_dimension -> query time_dimensions (if exactly 1) -> model default_time_dimension -> error
- SlayerModel has optional `default_time_dimension` field for time-dependent formula resolution
- SQLite dialect uses STRFTIME instead of DATE_TRUNC (handled automatically by sqlglot)
7 changes: 6 additions & 1 deletion README.md
@@ -270,7 +270,12 @@ Filters use simple formula strings — no verbose JSON objects:
"filters": ["status == 'completed' or status == 'pending'"]
```

**Functions**: `contains(col, 'val')`, `starts_with(col, 'val')`, `ends_with(col, 'val')`, `between(col, 'a', 'b')`. Filters on measures (e.g., `"count > 10"`) are automatically routed to HAVING.
**Pattern matching**: `like` and `not like` operators (e.g., `"name like '%acme%'"`, `"name not like '%test%'"`). Filters on measures (e.g., `"count > 10"`) are automatically routed to HAVING.

**Computed column filters**: filters can reference field names or contain inline transform expressions. These are applied as post-filters after all transforms are computed:
```json
"filters": ["change(revenue_sum) > 0", "last(change(revenue_sum)) < 0"]
```


## Auto-Ingestion
45 changes: 36 additions & 9 deletions docs/concepts/formulas.md
@@ -41,10 +41,10 @@ Functions apply window operations to measures:
| `time_shift(x, -n)` | Value N periods back | Self-join CTE on row number |
| `time_shift(x, 1)` | Next period's value | Self-join CTE on row number |
| `time_shift(x, offset, gran)` | Value from a different calendar time bucket | Self-join CTE on date arithmetic |
| `change(x)` | Difference from previous period | Self-join CTE (current - previous) |
| `change_pct(x)` | Percentage change from previous period | Self-join CTE ((current - previous) / previous) |
| `lag(x, n)` | Value N rows back (window function) | `LAG(x, n) OVER (ORDER BY time)` |
| `lead(x, n)` | Value N rows ahead (window function) | `LEAD(x, n) OVER (ORDER BY time)` |
| `change(x)` | Difference from previous period | Self-join CTE (current - previous) |
| `change_pct(x)` | Percentage change from previous period | Self-join CTE ((current - previous) / previous) |
| `rank(x)` | Ranking by value (descending) | `RANK() OVER (ORDER BY x DESC)` |
| `last(x)` | Most recent time bucket's value | `FIRST_VALUE(x) OVER (ORDER BY time DESC ...)` |
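
The self-join CTE rows in the table above can be made concrete with a small standalone SQLite sketch. The SQL is hand-written here to approximate the shape the engine produces for `change(x)` without a granularity; the table and column names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE monthly (month TEXT, revenue REAL);
INSERT INTO monthly VALUES ('2024-01', 100), ('2024-02', 120), ('2024-04', 90);
""")

# Row-number self-join: each row is matched to the immediately preceding row,
# so only the true first row yields NULL, and the gap (missing 2024-03) still
# pairs 2024-04 with the previous existing row rather than producing a NULL.
rows = conn.execute("""
WITH base AS (
    SELECT month, revenue,
           ROW_NUMBER() OVER (ORDER BY month) AS rn
    FROM monthly
)
SELECT cur.month, cur.revenue - prev.revenue AS revenue_change
FROM base AS cur
LEFT JOIN base AS prev ON prev.rn = cur.rn - 1
ORDER BY cur.month
""").fetchall()
print(rows)  # [('2024-01', None), ('2024-02', 20.0), ('2024-04', -30.0)]
```

A `LAG`-based implementation would give the same result on this data; the CTE form matters once the outer query needs to filter on the computed difference.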

@@ -102,6 +102,8 @@ Filter formulas define conditions for the query. They go in the `filters` parame
| `in` | `"status in ('active', 'pending')"` |
| `is None` | `"discount is None"` (IS NULL) |
| `is not None` | `"discount is not None"` (IS NOT NULL) |
| `like` | `"name like '%acme%'"` |
| `not like` | `"name not like '%test%'"` |

### Boolean Logic

@@ -116,14 +118,39 @@ Use `and`, `or`, `not` within a single filter string:

Multiple entries in the `filters` list are combined with AND.
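
For example, this list form behaves the same as the single string `"status == 'completed' and amount > 100"`:

```json
"filters": ["status == 'completed'", "amount > 100"]
```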

### Filter Functions
### Filtering on Computed Columns

| Function | Example | SQL |
|----------|---------|-----|
| `contains(col, val)` | `"contains(name, 'acme')"` | `name LIKE '%acme%'` |
| `starts_with(col, val)` | `"starts_with(name, 'A')"` | `name LIKE 'A%'` |
| `ends_with(col, val)` | `"ends_with(email, '.com')"` | `email LIKE '%.com'` |
| `between(col, low, high)` | `"between(amount, 100, 500)"` | `amount BETWEEN 100 AND 500` |
Filters can reference field names defined in `fields`. These are applied as post-filters on the outer query, after all transforms are computed:

```json
{
"fields": [
{"formula": "revenue"},
{"formula": "change(revenue)", "name": "rev_change"}
],
"filters": ["rev_change < 0"]
}
```

This returns only rows where revenue decreased from the previous period.

Transform expressions can also be used **directly in filters** without defining them as fields first:

```json
{
"filters": ["last(change(revenue)) < 0"]
}
```

This keeps only rows where the most recent period's revenue change is negative — useful for queries like "show me monthly data, but only for metrics that are declining." The transform is auto-extracted as a hidden field and applied as a post-filter.
Comment on lines +123 to +145
⚠️ Potential issue | 🟡 Minor

Tighten this wording to computed field names.

The engine only treats expression/transform names as post-filterable computed columns. A renamed bare measure field like {"formula": "count", "name": "n"} still won't be recognized here, so “field names defined in fields” overpromises the supported scope.

📝 Suggested wording
-Filters can reference field names defined in `fields`. These are applied as post-filters on the outer query, after all transforms are computed:
+Filters can reference computed field names defined in `fields` (expressions/transforms). These are applied as post-filters on the outer query, after all transforms are computed:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/concepts/formulas.md` around lines 123 - 145, Summary: Wording
overstates what can be used as post-filterable names — only computed/transform
expressions (computed field names) are supported, not arbitrary renamed bare
measures. Update the docs text to state that post-filters can reference computed
field names produced by transform expressions (i.e., expressions defined as
fields via a formula or auto-extracted transforms), and clarify that a bare
measure renamed with "name" (e.g., {"formula":"count","name":"n"}) is not
eligible; mention the auto-extraction of transform expressions when used
directly in "filters" remains supported.


Post-filters can be combined with regular filters — base filters (on dimensions/measures) are applied in the inner query, post-filters on the outer wrapper:

```json
{
"filters": ["status == 'completed'", "change(revenue) > 0"]
}
```
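
The split can be sketched against SQLite. The SQL below is hand-written to approximate the generated shape, not taken from the engine, and the names are invented: the dimension filter sits in the inner query's WHERE, while the transform condition is evaluated on the outer wrapper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (month TEXT, status TEXT, revenue REAL);
INSERT INTO orders VALUES
  ('2024-01', 'completed', 100), ('2024-02', 'completed', 150),
  ('2024-02', 'pending', 999),   ('2024-03', 'completed', 120);
""")

result = conn.execute("""
WITH base AS (
    SELECT month, SUM(revenue) AS revenue
    FROM orders
    WHERE status = 'completed'               -- base filter: inner query
    GROUP BY month
),
numbered AS (
    SELECT month, revenue,
           ROW_NUMBER() OVER (ORDER BY month) AS rn
    FROM base
)
SELECT cur.month, cur.revenue - prev.revenue AS rev_change
FROM numbered AS cur
LEFT JOIN numbered AS prev ON prev.rn = cur.rn - 1
WHERE cur.revenue - prev.revenue > 0         -- post-filter: outer wrapper
ORDER BY cur.month
""").fetchall()
print(result)  # [('2024-02', 50.0)]
```

The pending row never reaches the change computation, and the first month (whose change is NULL) is excluded by the post-filter comparison.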

---

3 changes: 2 additions & 1 deletion docs/index.md
@@ -7,7 +7,8 @@ A lightweight, open-source semantic layer by [MotleyAI](https://github.com/motle
## Key Features

- **Agent-first design** — MCP, Python SDK, and REST API interfaces
- **Datasource-agnostic** — Postgres, MySQL, BigQuery, Snowflake, and more via sqlglot
- **Datasource-agnostic** — first-class support for Postgres, MySQL, ClickHouse, and SQLite; additional support for Snowflake, BigQuery, Oracle, Redshift, DuckDB, and more via sqlglot
- **`fields` API** — derived metrics with formulas, transforms (`cumsum`, `time_shift`, `change`), and inline transform filters
- **Auto-ingestion with rollup joins** — Connect to a DB, introspect schema, generate denormalized models with FK-based LEFT JOINs automatically
- **Incremental model editing** — Add/remove measures and dimensions without replacing the full model
- **Lightweight** — Minimal dependencies, easy to set up and extend
6 changes: 3 additions & 3 deletions examples/embedded/verify.py
@@ -150,15 +150,15 @@ def check(name, condition):
cumvals = [r["orders.cumulative"] for r in result.data]
check("cumsum non-decreasing", all(a <= b for a, b in zip(cumvals, cumvals[1:])))

# Lag
# time_shift (row-based, previous period)
result = engine.execute(query=SlayerQuery(
    model="orders",
    time_dimensions=[{"dimension": {"name": "created_at"}, "granularity": "month"}],
    fields=[Field(formula="count"), Field(formula="time_shift(count, -1)", name="prev")],
    order=[{"column": {"name": "created_at"}, "direction": "asc"}],
))
check("lag first month is null", result.data[0]["orders.prev"] is None)
check("lag second month = first month count", result.data[1]["orders.prev"] == result.data[0]["orders.count"])
check("time_shift first month is null", result.data[0]["orders.prev"] is None)
check("time_shift second month = first month count", result.data[1]["orders.prev"] == result.data[0]["orders.count"])

# Change
result = engine.execute(query=SlayerQuery(
8 changes: 3 additions & 5 deletions poetry.lock


1 change: 1 addition & 0 deletions pyproject.toml
@@ -28,6 +28,7 @@ python = "^3.11"
sqlglot = ">=20.0"
sqlalchemy = ">=2.0"
pydantic = ">=2.0"
python-dateutil = ">=2.8"
pyyaml = ">=6.0"
fastapi = ">=0.100"
uvicorn = ">=0.20"
3 changes: 3 additions & 0 deletions slayer/core/enums.py
@@ -21,6 +21,7 @@ class DataType(StrEnum):
    AVERAGE = "avg"
    MIN = "min"
    MAX = "max"
    LAST = "last"

    @property
    def is_aggregation(self) -> bool:
@@ -31,6 +32,7 @@ def is_aggregation(self) -> bool:
            DataType.AVERAGE,
            DataType.MIN,
            DataType.MAX,
            DataType.LAST,
        )

@property
@@ -47,6 +49,7 @@ def python_type(self) -> type:
            DataType.AVERAGE: float,
            DataType.MIN: float,
            DataType.MAX: float,
            DataType.LAST: float,
        }[self]


78 changes: 57 additions & 21 deletions slayer/core/formula.py
@@ -194,8 +194,8 @@ def _parse_literal(node: ast.AST, original: str) -> Any:
# Filter parsing
# ---------------------------------------------------------------------------

# String filter functions (no Python operator equivalent)
FILTER_FUNCTIONS = {"contains", "starts_with", "ends_with", "between"}
# Internal filter functions (used after pre-processing operators like `like`)
FILTER_FUNCTIONS = {"__like__", "__notlike__"}


@dataclass
@@ -208,6 +208,38 @@ class ParsedFilter:
    sql: str  # e.g., "status = 'completed'"
    columns: List[str]  # Column names referenced
    is_having: bool = False  # True if this is a HAVING filter (aggregate condition)
    is_post_filter: bool = False  # True if this references a computed column (transform/expression)


def _preprocess_like(formula: str) -> str:
    """Convert `like` and `not like` operators to internal function calls for AST parsing.

    "name like '%acme%'" → "__like__(name, '%acme%')"
    "name not like '%acme%'" → "__notlike__(name, '%acme%')"
    """
    import re
    formula = re.sub(
        r'\b(\w+)\s+not\s+like\s+',
        r'__notlike__(\1, ',
        formula, flags=re.IGNORECASE,
    )
    # Close the parenthesis: find the string argument and close after it
    formula = re.sub(
        r'(__notlike__\([^,]+,\s*\'[^\']*\')',
        r'\1)',
        formula,
    )
    formula = re.sub(
        r'\b(\w+)\s+like\s+',
        r'__like__(\1, ',
        formula, flags=re.IGNORECASE,
    )
    formula = re.sub(
        r'(__like__\([^,]+,\s*\'[^\']*\')',
        r'\1)',
        formula,
    )
    return formula

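A quick standalone check of the rewrite. This copies the two regex passes of `_preprocess_like` above into a self-contained function and exercises the documented transformations:

```python
import re

def preprocess_like(formula: str) -> str:
    # `not like` is rewritten first so the plain `like` pass cannot capture it.
    formula = re.sub(r'\b(\w+)\s+not\s+like\s+', r'__notlike__(\1, ', formula,
                     flags=re.IGNORECASE)
    # Close the parenthesis after the quoted string argument.
    formula = re.sub(r"(__notlike__\([^,]+,\s*'[^']*')", r'\1)', formula)
    formula = re.sub(r'\b(\w+)\s+like\s+', r'__like__(\1, ', formula,
                     flags=re.IGNORECASE)
    formula = re.sub(r"(__like__\([^,]+,\s*'[^']*')", r'\1)', formula)
    return formula

print(preprocess_like("name like '%acme%'"))      # __like__(name, '%acme%')
print(preprocess_like("name not like '%test%'"))  # __notlike__(name, '%test%')
```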

def parse_filter(formula: str) -> ParsedFilter:
@@ -222,13 +254,13 @@ def parse_filter(formula: str) -> ParsedFilter:
    "status in ('a', 'b', 'c')" → WHERE status IN ('a', 'b', 'c')
    "status is None" → WHERE status IS NULL
    "status is not None" → WHERE status IS NOT NULL
    "name like '%acme%'" → WHERE name LIKE '%acme%'
    "name not like '%test%'" → WHERE name NOT LIKE '%test%'
    """
    # Pre-process `like` / `not like` operators into internal function calls
    processed = _preprocess_like(formula)
    try:
        tree = ast.parse(formula, mode="eval")
        tree = ast.parse(processed, mode="eval")
    except SyntaxError as e:
        raise ValueError(f"Invalid filter syntax: {formula!r} — {e}")

@@ -296,26 +328,30 @@ def _filter_node_to_sql(node: ast.AST, original: str, columns: list[str]) -> str
        elts = [_filter_node_to_sql(e, original, columns) for e in node.elts]
        return f"({', '.join(elts)})"

    # Function call → contains, starts_with, ends_with, between
    # Arithmetic expression (e.g., change / revenue in a filter LHS)
    if isinstance(node, ast.BinOp):
        op_map = {
            ast.Add: "+", ast.Sub: "-", ast.Mult: "*",
            ast.Div: "/", ast.Mod: "%", ast.Pow: "**",
        }
        op_str = op_map.get(type(node.op))
        if op_str is None:
            raise ValueError(f"Unsupported arithmetic operator in filter: {original!r}")
        left = _filter_node_to_sql(node.left, original, columns)
        right = _filter_node_to_sql(node.right, original, columns)
        return f"{left} {op_str} {right}"

    # Internal function calls for like/not like operators
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        func_name = node.func.id
        if func_name == "contains" and len(node.args) >= 2:
        if func_name == "__like__" and len(node.args) >= 2:
            col = _filter_node_to_sql(node.args[0], original, columns)
            val = _get_string_arg(node.args[1], original)
            return f"{col} LIKE '%{val}%'"
        elif func_name == "starts_with" and len(node.args) >= 2:
            return f"{col} LIKE '{val}'"
        elif func_name == "__notlike__" and len(node.args) >= 2:
            col = _filter_node_to_sql(node.args[0], original, columns)
            val = _get_string_arg(node.args[1], original)
            return f"{col} LIKE '{val}%'"
        elif func_name == "ends_with" and len(node.args) >= 2:
            col = _filter_node_to_sql(node.args[0], original, columns)
            val = _get_string_arg(node.args[1], original)
            return f"{col} LIKE '%{val}'"
        elif func_name == "between" and len(node.args) >= 3:
            col = _filter_node_to_sql(node.args[0], original, columns)
            low = _filter_node_to_sql(node.args[1], original, columns)
            high = _filter_node_to_sql(node.args[2], original, columns)
            return f"{col} BETWEEN {low} AND {high}"
            return f"{col} NOT LIKE '{val}'"
        raise ValueError(f"Unknown filter function '{func_name}' in: {original!r}")

    raise ValueError(f"Unsupported filter syntax: {original!r}")
2 changes: 1 addition & 1 deletion slayer/core/query.py
@@ -107,7 +107,7 @@ class SlayerQuery(BaseModel):
    def snap_to_whole_periods(self) -> "SlayerQuery":
        """Adjust date filters to align with period boundaries when whole_periods_only=True.

        For each time dimension with a granularity, adds a between() filter
        For each time dimension with a granularity, adds a date range filter
        to exclude the current incomplete period if no date filter exists.
        """
        if not self.whole_periods_only or not self.time_dimensions: