Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
45 commits
Select commit Hold shift + click to select a range
03f0ad4
feat: Add Snowflake Semantic Views support
cpsievert Jan 21, 2026
ad96677
`air format` (GitHub Actions)
cpsievert Jan 21, 2026
1a19d32
fix: Address Snowflake Semantic Views code review concerns
cpsievert Jan 26, 2026
c3035b5
`air format` (GitHub Actions)
cpsievert Jan 26, 2026
83c271a
refactor: Extract Snowflake semantic views into adapter pattern
cpsievert Jan 26, 2026
42b8ece
chore: Remove backwards-compat re-exports from _datasource.py
cpsievert Jan 26, 2026
2826891
refactor: Move Snowflake semantic view discovery into base classes
cpsievert Jan 26, 2026
8c40db4
refactor: Move SEMANTIC_VIEW_SYNTAX to shared prompt files
cpsievert Jan 26, 2026
78d4f3e
`air format` (GitHub Actions)
cpsievert Jan 26, 2026
df33e82
Update pkg-py/src/querychat/_querychat_base.py
cpsievert Jan 26, 2026
1f0d888
refactor: Extract Snowflake semantic views into adapter pattern
cpsievert Jan 26, 2026
05fe216
chore: Remove is_snowflake template variable
cpsievert Jan 26, 2026
a652a67
refactor: Use isinstance() checks instead of type parameter
cpsievert Jan 26, 2026
1bade8e
chore: Remove _semantic_views class attribute declarations
cpsievert Jan 26, 2026
400cc22
refactor: Merge SnowflakeSource semantic view logic into DBISource
cpsievert Jan 26, 2026
7a029aa
(local)
cpsievert Jan 26, 2026
42f6d4c
chore: Minimize docstrings and remove redundant comments
cpsievert Jan 26, 2026
1c19adb
`air format` (GitHub Actions)
cpsievert Jan 26, 2026
4c88c24
`devtools::document()` (GitHub Actions)
cpsievert Jan 26, 2026
20bb7ff
refactor: Remove _semantic_views attribute and has_semantic_views pro…
cpsievert Jan 26, 2026
eda50f8
refactor: Consolidate semantic view information into single prompt se…
cpsievert Jan 26, 2026
6aa6a63
`air format` (GitHub Actions)
cpsievert Jan 26, 2026
1a77f67
`devtools::document()` (GitHub Actions)
cpsievert Jan 26, 2026
7cb9d36
refactor: Add semantic view methods to DataSource base class
cpsievert Jan 26, 2026
65b1bdc
refactor: Move semantic view discovery into get_semantic_view_ddls
cpsievert Jan 26, 2026
b7d2347
refactor: Remove has_semantic_views(), improve prompt text
cpsievert Jan 26, 2026
4fda131
`devtools::document()` (GitHub Actions)
cpsievert Jan 26, 2026
4f70663
refactor: Simplify semantic views, adjust prompt structure
cpsievert Jan 26, 2026
69863da
Apply suggestions from code review
cpsievert Jan 26, 2026
3e7bd9c
refactor: Restructure semantic views into dedicated directory
cpsievert Jan 26, 2026
b2d6bf3
`air format` (GitHub Actions)
cpsievert Jan 26, 2026
cfbb9cb
`devtools::document()` (GitHub Actions)
cpsievert Jan 26, 2026
3c73e5d
style: Revert formatting changes in DBISource.R
cpsievert Jan 26, 2026
27867cc
fix: Use raw_sql() for Ibis backends to support SHOW commands
cpsievert Jan 27, 2026
eec140a
fix: Address PR feedback and fix pyright error
cpsievert Jan 27, 2026
8814ab3
Add brand.yml website dependency
cpsievert Jan 27, 2026
334635b
refactor: Rename semantic views methods for clarity
cpsievert Jan 27, 2026
e76c80c
`devtools::document()` (GitHub Actions)
cpsievert Jan 27, 2026
4d88a77
style: Revert formatting changes and _shiny.py modification
cpsievert Jan 27, 2026
4aee0c7
docs: Add Snowflake Semantic Views to CHANGELOG and NEWS
cpsievert Jan 27, 2026
f6bd820
refactor: Update docstrings and revert formatting changes
cpsievert Jan 27, 2026
17ce54d
`devtools::document()` (GitHub Actions)
cpsievert Jan 27, 2026
f357d39
style: Revert list comprehension formatting to original style
cpsievert Jan 27, 2026
cbe2d2f
style: Revert unnecessary R formatting changes
cpsievert Jan 27, 2026
9b995fc
chore: Remove accidentally committed files
cpsievert Jan 27, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/workflows/docs-r-pkgdown.yml
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,7 @@ jobs:

- uses: r-lib/actions/setup-r-dependencies@v2
with:
extra-packages: any::pkgdown, local::.
extra-packages: any::pkgdown, any::brand.yml, local::.
needs: website
working-directory: pkg-r

Expand Down
6 changes: 6 additions & 0 deletions pkg-py/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,12 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### New features

* Added support for Snowflake Semantic Views. When connected to Snowflake (via SQLAlchemy or Ibis), querychat automatically discovers available Semantic Views and includes their definitions in the system prompt. This helps the LLM generate correct queries using the `SEMANTIC_VIEW()` table function with certified business metrics and dimensions. (#200)

## [0.5.1] - 2026-01-23

### New features
Expand Down
28 changes: 24 additions & 4 deletions pkg-py/src/querychat/_datasource.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,10 @@
from sqlalchemy.sql import sqltypes

from ._df_compat import read_sql
from ._snowflake import (
discover_semantic_views,
format_semantic_views,
)
from ._utils import as_narwhals, check_query

if TYPE_CHECKING:
Expand Down Expand Up @@ -179,6 +183,10 @@ def cleanup(self) -> None:

"""

def get_semantic_views_description(self) -> str:
"""Get information about semantic views (if any) for the system prompt."""
return ""


class DataFrameSource(DataSource[IntoDataFrameT]):
"""A DataSource implementation that wraps a DataFrame using DuckDB."""
Expand Down Expand Up @@ -489,6 +497,13 @@ def get_schema(self, *, categorical_threshold: int) -> str:
self._add_column_stats(columns, categorical_threshold)
return format_schema(self.table_name, columns)

def get_semantic_views_description(self) -> str:
"""Get information about semantic views (if any) for the system prompt."""
if self._engine.dialect.name.lower() != "snowflake":
return ""
views = discover_semantic_views(self._engine)
return format_semantic_views(views)

@staticmethod
def _make_column_meta(name: str, sa_type: sqltypes.TypeEngine) -> ColumnMeta:
"""Create ColumnMeta from SQLAlchemy type."""
Expand Down Expand Up @@ -895,8 +910,7 @@ def _add_column_stats(

# Find text columns that qualify as categorical
categorical_cols = [
col
for col in columns
col for col in columns
if col.kind == "text"
and (nunique := stats.get(f"{col.name}__nunique"))
and nunique <= categorical_threshold
Expand Down Expand Up @@ -960,6 +974,13 @@ def get_schema(self, *, categorical_threshold: int) -> str:
self._add_column_stats(columns, self._table, categorical_threshold)
return format_schema(self.table_name, columns)

def get_semantic_views_description(self) -> str:
"""Get information about semantic views (if any) for the system prompt."""
if self._backend.name.lower() != "snowflake":
return ""
views = discover_semantic_views(self._backend)
return format_semantic_views(views)

@staticmethod
def _make_column_meta(name: str, dtype: IbisDataType) -> ColumnMeta:
"""Create ColumnMeta from an ibis dtype."""
Expand Down Expand Up @@ -1018,8 +1039,7 @@ def _add_column_stats(
col.max_val = stats.get(f"{col.name}__max")

categorical_cols = [
col
for col in columns
col for col in columns
if col.kind == "text"
and (nunique := stats.get(f"{col.name}__nunique"))
and nunique <= categorical_threshold
Expand Down
126 changes: 126 additions & 0 deletions pkg-py/src/querychat/_snowflake.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,126 @@
"""
Snowflake-specific utilities for semantic view discovery.

This module provides functions for discovering Snowflake Semantic Views,
supporting both SQLAlchemy engines and Ibis backends via isinstance() checks.
"""

from __future__ import annotations

import logging
import os
from dataclasses import dataclass
from typing import TYPE_CHECKING, Any

import sqlalchemy

if TYPE_CHECKING:
from ibis.backends.sql import SQLBackend

logger = logging.getLogger(__name__)


@dataclass
class SemanticViewInfo:
"""Metadata for a Snowflake Semantic View."""

name: str
"""Fully qualified name (database.schema.view_name)."""

ddl: str
"""The DDL definition from GET_DDL()."""


def execute_raw_sql(
query: str,
backend: sqlalchemy.Engine | SQLBackend,
) -> list[dict[str, Any]]:
"""Execute raw SQL and return results as list of row dicts."""
if isinstance(backend, sqlalchemy.Engine):
with backend.connect() as conn:
result = conn.execute(sqlalchemy.text(query))
keys = list(result.keys())
return [dict(zip(keys, row, strict=False)) for row in result.fetchall()]
else:
with backend.raw_sql(query) as cursor: # type: ignore[union-attr]
columns = [desc[0] for desc in cursor.description]
return [dict(zip(columns, row, strict=False)) for row in cursor.fetchall()]


def discover_semantic_views(
backend: sqlalchemy.Engine | SQLBackend,
) -> list[SemanticViewInfo]:
"""Discover semantic views in the current schema."""
if os.environ.get("QUERYCHAT_DISABLE_SEMANTIC_VIEWS"):
return []

rows = execute_raw_sql("SHOW SEMANTIC VIEWS", backend)

if not rows:
logger.debug("No semantic views found in current schema")
return []

views: list[SemanticViewInfo] = []
for row in rows:
db = row.get("database_name")
schema = row.get("schema_name")
name = row.get("name")

if not name:
continue

fq_name = f"{db}.{schema}.{name}"
ddl = get_semantic_view_ddl(backend, fq_name)
if ddl:
views.append(SemanticViewInfo(name=fq_name, ddl=ddl))

return views


def get_semantic_view_ddl(
backend: sqlalchemy.Engine | SQLBackend,
fq_name: str,
) -> str | None:
"""Get DDL for a semantic view by fully qualified name."""
safe_name = fq_name.replace("'", "''")
rows = execute_raw_sql(f"SELECT GET_DDL('SEMANTIC_VIEW', '{safe_name}')", backend)
if rows:
return str(next(iter(rows[0].values())))
return None


def format_semantic_view_ddls(semantic_views: list[SemanticViewInfo]) -> str:
"""Format just the DDL definitions for semantic views."""
lines: list[str] = []

for sv in semantic_views:
lines.append(f"### Semantic View: `{sv.name}`")
lines.append("")
lines.append("```sql")
lines.append(sv.ddl)
lines.append("```")
lines.append("")

return "\n".join(lines)


def format_semantic_views(semantic_views: list[SemanticViewInfo]) -> str:
"""Build the complete semantic views section for the prompt."""
if not semantic_views:
return ""

from importlib.resources import files

prompts = files("querychat.prompts.semantic-views")
prompt_text = (prompts / "prompt.md").read_text()
syntax_text = (prompts / "syntax.md").read_text()
ddls_text = format_semantic_view_ddls(semantic_views)

return f"""{prompt_text}

{syntax_text}

<semantic_views>
{ddls_text}
</semantic_views>
"""
6 changes: 4 additions & 2 deletions pkg-py/src/querychat/_system_prompt.py
Original file line number Diff line number Diff line change
Expand Up @@ -65,11 +65,13 @@ def render(self, tools: tuple[TOOL_GROUPS, ...] | None) -> str:
Fully rendered system prompt string

"""
is_duck_db = self.data_source.get_db_type().lower() == "duckdb"
db_type = self.data_source.get_db_type()
is_duck_db = db_type.lower() == "duckdb"

context = {
"db_type": self.data_source.get_db_type(),
"db_type": db_type,
"is_duck_db": is_duck_db,
"semantic_views": self.data_source.get_semantic_views_description(),
"schema": self.schema,
"data_description": self.data_description,
"extra_instructions": self.extra_instructions,
Expand Down
1 change: 1 addition & 0 deletions pkg-py/src/querychat/prompts/prompt.md
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,7 @@ quantile_cont(salary, 0.5)
```

{{/is_duck_db}}
{{{semantic_views}}}
## Your Capabilities

You can handle these types of requests:
Expand Down
5 changes: 5 additions & 0 deletions pkg-py/src/querychat/prompts/semantic-views/prompt.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
## Semantic Views

**IMPORTANT**: This database has Semantic Views available. Semantic Views provide a curated layer over raw data with pre-defined metrics, dimensions, and relationships. They encode business logic and calculation rules that ensure consistent, accurate results. When a Semantic View covers the data you need, prefer it over raw tables to benefit from these certified definitions (that is, use the `SEMANTIC_VIEW()` table function where appropriate when generating SQL).

**Real-world example**: A legacy ERP database had a revenue column (`X_AMT`) with hidden business rules—only status code 90 transactions count as realized revenue, and a discount factor (`ADJ_FCTR`) must be applied. Querying raw tables for "external customer revenue" returned **$184B**. The same query using the semantic model's certified `NET_REVENUE` metric returned **$84.5B**—the correct answer. The raw query was **2x+ too high** because it ignored discounts and included invalid transaction codes.
96 changes: 96 additions & 0 deletions pkg-py/src/querychat/prompts/semantic-views/syntax.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@
### SEMANTIC_VIEW() Query Syntax

#### Basic Syntax

```sql
SELECT * FROM SEMANTIC_VIEW(
{view_name}
METRICS {logical_table}.{metric_name}
DIMENSIONS {logical_table}.{dimension_name}
[WHERE {dimension} = 'value'] -- Optional: pre-aggregation filter
)
[WHERE {column} = 'value'] -- Optional: post-aggregation filter
```

#### Key Rules

1. **Use `SEMANTIC_VIEW()` function** - Not direct SELECT FROM the view
2. **No GROUP BY needed** - Semantic layer handles aggregation via DIMENSIONS
3. **No JOINs needed within model** - Relationships are pre-defined
4. **No aggregate functions needed** - Metrics are pre-aggregated
5. **Use DDL-defined names** - Metrics and dimensions must match the DDL exactly

#### WHERE Clause: Inside vs Outside

- **Inside** (pre-aggregation): Filters base data BEFORE metrics are computed
- **Outside** (post-aggregation): Filters results AFTER metrics are computed

```sql
-- Pre-aggregation: only include 'EXT' accounts in the calculation
SELECT * FROM SEMANTIC_VIEW(
MODEL_NAME
METRICS T_DATA.NET_REVENUE
DIMENSIONS REF_ENTITIES.ACC_TYPE_CD
WHERE REF_ENTITIES.ACC_TYPE_CD = 'EXT'
)

-- Post-aggregation: compute all, then filter results
SELECT * FROM SEMANTIC_VIEW(
MODEL_NAME
METRICS T_DATA.NET_REVENUE
DIMENSIONS REF_ENTITIES.ACC_TYPE_CD
)
WHERE NET_REVENUE > 1000000
```

#### Common Patterns

**Single metric (total):**
```sql
SELECT * FROM SEMANTIC_VIEW(MODEL_NAME METRICS T_DATA.NET_REVENUE)
```

**Metric by dimension:**
```sql
SELECT * FROM SEMANTIC_VIEW(
MODEL_NAME
METRICS T_DATA.NET_REVENUE
DIMENSIONS REF_ENTITIES.ACC_TYPE_CD
)
```

**Multiple metrics and dimensions:**
```sql
SELECT * FROM SEMANTIC_VIEW(
MODEL_NAME
METRICS T_DATA.NET_REVENUE, T_DATA.GROSS_REVENUE
DIMENSIONS REF_ENTITIES.ACC_TYPE_CD, T_DATA.LOG_DT
)
ORDER BY LOG_DT ASC
```

**Time series:**
```sql
SELECT * FROM SEMANTIC_VIEW(
MODEL_NAME
METRICS T_DATA.NET_REVENUE
DIMENSIONS T_DATA.LOG_DT
)
ORDER BY LOG_DT ASC
```

**Join results with other data:**
```sql
SELECT sv.*, lookup.category_name
FROM SEMANTIC_VIEW(
MODEL_NAME
METRICS T_DATA.NET_REVENUE
DIMENSIONS REF_ENTITIES.ACC_TYPE_CD
) AS sv
JOIN category_lookup AS lookup ON sv.ACC_TYPE_CD = lookup.code
```

#### Troubleshooting

- **"Invalid identifier"**: Verify metric/dimension names match exactly what's in the DDL
- **Syntax error**: Use SEMANTIC_VIEW() function, GROUP BY isn't needed
Loading