Commit ffa1659

Add design spec for expression evaluator (sub-project 2)
1 parent df21f0f commit ffa1659

1 file changed

+284
-0
lines changed

@@ -0,0 +1,284 @@
1+
# SQL Engine Expression Evaluator — Design Specification
2+
3+
## Overview
4+
5+
The expression evaluator connects the SQL parser to the query engine. It takes a parser AST expression node, resolves column references via a caller-provided callback, and returns a `Value` using the type system from sub-project 1.
6+
7+
This is sub-project 2 of the query engine. It depends on sub-project 1 (type system) and the SQL parser.
8+
9+
### Goals
10+
11+
- **Evaluate any SQL expression** from the parser's AST: literals, column refs, arithmetic, comparison, logical operators, function calls, LIKE, IS NULL, BETWEEN, IN, CASE/WHEN
12+
- **Dialect-aware** via `CoercionRules<D>` for type promotion and `FunctionRegistry<D>` for function dispatch
13+
- **Correct NULL handling** via three-valued logic throughout
14+
- **Column resolution via callback** — no row format imposed on the caller
15+
- **Integration milestone** — first time parser and engine connect end-to-end
16+
17+
### Constraints
18+
19+
- C++17
20+
- Uses parser's AST (`AstNode`, `NodeType`) and arena
21+
- Uses type system's `Value`, `CoercionRules<D>`, `null_semantics`, `FunctionRegistry<D>`
22+
- Header-only (`expression_eval.h`, `like.h`)
23+
- No exceptions: errors return `value_null()`, except an invalid literal under MySQL, which yields `value_int(0)` (see Error Handling)
24+
25+
### Non-Goals (deferred)
26+
27+
- Row/tuple representation (sub-project 4)
28+
- Aggregate function evaluation (needs executor with row-group state)
29+
- Subquery evaluation (needs full executor — `IN (SELECT ...)` returns NULL for now)
30+
- Window functions
31+
- ORDER BY / LIMIT execution
32+
33+
---
34+
35+
## Core Interface
36+
37+
```cpp
template <Dialect D>
Value evaluate_expression(const AstNode* expr,
                          const std::function<Value(StringRef)>& resolve,
                          FunctionRegistry<D>& functions,
                          Arena& arena);
```
44+
45+
**Parameters:**
46+
- `expr` — AST expression node from the parser
47+
- `resolve` — callback that maps column names to values. Called when the evaluator hits `NODE_COLUMN_REF` or `NODE_QUALIFIED_NAME`. For qualified names (`table.column`), the callback receives the full qualified string.
48+
- `functions` — function registry with built-in functions registered
49+
- `arena` — for allocating intermediate string results
50+
51+
**Returns:** A `Value`. Returns `value_null()` on error or unsupported node types.
52+
53+
---
54+
55+
## AST Node Dispatch
56+
57+
The evaluator switches on `expr->type`:
58+
59+
### Leaf nodes (no recursion)
60+
61+
| NodeType | Action |
|---|---|
| `NODE_LITERAL_INT` | Parse `expr->value()` string to int64 via `strtoll` |
| `NODE_LITERAL_FLOAT` | Parse `expr->value()` string to double via `strtod` |
| `NODE_LITERAL_STRING` | `value_string(expr->value())` |
| `NODE_LITERAL_NULL` | `value_null()` |
| `NODE_COLUMN_REF` | `resolve(expr->value())` |
| `NODE_ASTERISK` | `value_string(StringRef{"*", 1})` (for `COUNT(*)`) |
| `NODE_PLACEHOLDER` | `value_null()` (unresolved placeholder) |
| `NODE_IDENTIFIER` | `resolve(expr->value())` (column or keyword-as-value) |
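
The integer-literal row depends on `strtoll`, which needs NUL-terminated input and reports failures through `errno` and the end pointer. The sketch below shows one way to wrap it. It is an illustration under assumptions: `parse_int_literal` is a hypothetical name, `std::string_view` stands in for the parser's `StringRef`, and `std::nullopt` stands in for "invalid literal" (which the spec maps to a dialect-specific value).

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: parse the text of an integer literal into int64.
// Returns std::nullopt when the text is not a complete, in-range integer.
inline std::optional<std::int64_t> parse_int_literal(std::string_view text) {
    if (text.empty()) return std::nullopt;
    std::string buf(text);  // strtoll requires a NUL-terminated buffer
    char* end = nullptr;
    errno = 0;
    long long v = std::strtoll(buf.c_str(), &end, 10);
    if (errno == ERANGE) return std::nullopt;                  // outside the int64 range
    if (end != buf.c_str() + buf.size()) return std::nullopt;  // unparsed trailing characters
    return static_cast<std::int64_t>(v);
}

inline void demo_parse_int_literal() {
    assert(parse_int_literal("42") == std::optional<std::int64_t>(42));
    assert(!parse_int_literal("12x").has_value());
}
```

The temporary `std::string` copy keeps the sketch simple; a header-only implementation with an arena could copy into a small stack buffer instead.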
71+
72+
### Qualified name
73+
74+
`NODE_QUALIFIED_NAME` has two children (table, column). Combine into `"table.column"` and call `resolve()`.
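
As a rough illustration of the qualified-name path, the sketch below joins the two child names and passes the result to a caller-owned resolver. Everything here is a stand-in: `qualified_key`, `Resolver`, and `NullableInt` are hypothetical names, `std::string_view` replaces `StringRef`, and returning NULL for an unknown column is an assumption the spec does not state.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <string_view>

// Stand-in for the spec's Value: std::nullopt models SQL NULL.
using NullableInt = std::optional<std::int64_t>;
// Stand-in for the resolve callback (the spec passes a StringRef).
using Resolver = std::function<NullableInt(std::string_view)>;

// Join the two children of a qualified name into "table.column".
inline std::string qualified_key(std::string_view table, std::string_view column) {
    std::string key;
    key.reserve(table.size() + 1 + column.size());
    key.append(table);
    key.push_back('.');
    key.append(column);
    return key;
}

inline void demo_qualified_name() {
    // Hypothetical row store owned by the caller; the evaluator never sees it.
    const std::map<std::string, std::int64_t> row{{"orders.qty", 3}, {"orders.price", 40}};
    Resolver resolve = [&row](std::string_view name) -> NullableInt {
        auto it = row.find(std::string(name));
        if (it == row.end()) return std::nullopt;  // assumption: unknown column yields NULL
        return it->second;
    };
    assert(qualified_key("orders", "qty") == "orders.qty");
    assert(resolve(qualified_key("orders", "qty")) == NullableInt(3));
    assert(!resolve("orders.missing").has_value());
}
```

Because the callback only sees a name, the caller is free to back it with a map, a tuple, or a columnar batch.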
75+
76+
### Binary operators
77+
78+
`NODE_BINARY_OP` has value = operator text, two children (left, right).
79+
80+
```
1. If operator is AND/OR: use short-circuit evaluation (see below)
2. Evaluate left child → left_val
3. Evaluate right child → right_val
4. NULL propagation: if either NULL → return value_null()
5. Find common type: CoercionRules<D>::common_type(left_val.tag, right_val.tag)
6. Coerce both to common type
7. Apply operator → return result
```
89+
90+
**Short-circuit for AND/OR:**
91+
- `AND`: if left is FALSE → return FALSE without evaluating right. If left is TRUE → evaluate right and return its value. If left is NULL → evaluate right; if right is FALSE → FALSE, else NULL.
92+
- `OR`: if left is TRUE → return TRUE without evaluating right. If left is FALSE → evaluate right and return its value. If left is NULL → evaluate right; if right is TRUE → TRUE, else NULL.
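
The short-circuit rules can be sketched with `std::optional<bool>` as a three-valued boolean. This is a minimal illustration, not the evaluator's code: `Tri`, `eval_and`, and `eval_or` are hypothetical names, and the right operand is passed as a callable so that skipping its evaluation is observable.

```cpp
#include <cassert>
#include <functional>
#include <optional>

// Stand-in for a SQL boolean Value: std::nullopt models NULL.
using Tri = std::optional<bool>;

// AND with a lazy right operand: `right` runs only when the left side is not FALSE.
inline Tri eval_and(const Tri& left, const std::function<Tri()>& right) {
    if (left.has_value() && !*left) return false;  // FALSE AND x = FALSE
    Tri r = right();
    if (r.has_value() && !*r) return false;        // x AND FALSE = FALSE
    if (!left.has_value() || !r.has_value()) return std::nullopt;
    return true;                                   // TRUE AND TRUE
}

// OR with a lazy right operand: `right` runs only when the left side is not TRUE.
inline Tri eval_or(const Tri& left, const std::function<Tri()>& right) {
    if (left.has_value() && *left) return true;    // TRUE OR x = TRUE
    Tri r = right();
    if (r.has_value() && *r) return true;          // x OR TRUE = TRUE
    if (!left.has_value() || !r.has_value()) return std::nullopt;
    return false;                                  // FALSE OR FALSE
}

inline void demo_short_circuit() {
    bool called = false;
    auto right = [&called]() -> Tri { called = true; return true; };
    assert(eval_and(false, right) == Tri(false));
    assert(!called);                                      // right side was skipped
    assert(!eval_and(std::nullopt, right).has_value());   // NULL AND TRUE = NULL
    assert(called);
}
```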
93+
94+
**Arithmetic operators** (`+`, `-`, `*`, `/`, `%`, `DIV`, `MOD`):
95+
- Operate on coerced numeric values (int64 or double)
96+
- Division by zero → NULL
97+
- `%` / `MOD` → integer remainder
98+
- `DIV` → integer division (truncate toward zero)
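
A minimal sketch of the integer cases, assuming `std::optional<std::int64_t>` as a stand-in for `Value` (with `std::nullopt` as NULL) and hypothetical helper names `int_div` and `int_mod`. The handling of `INT64_MIN DIV -1` is an assumption: the spec only says integer overflow wraps, so the sketch returns the wrapped result rather than executing the undefined C++ division.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

using NullableInt = std::optional<std::int64_t>;  // std::nullopt models SQL NULL

// DIV: integer division truncating toward zero; NULL on a zero divisor.
inline NullableInt int_div(std::int64_t a, std::int64_t b) {
    if (b == 0) return std::nullopt;
    // INT64_MIN / -1 is undefined behaviour in C++. Assumption: return the
    // wrapped result, which is INT64_MIN itself.
    if (a == std::numeric_limits<std::int64_t>::min() && b == -1) return a;
    return a / b;  // C++11 and later truncate toward zero
}

// % / MOD: integer remainder; NULL on a zero divisor.
inline NullableInt int_mod(std::int64_t a, std::int64_t b) {
    if (b == 0) return std::nullopt;
    if (b == -1) return 0;  // remainder is always 0; also avoids INT64_MIN % -1
    return a % b;           // the sign of the result follows the dividend
}

inline void demo_int_arithmetic() {
    assert(int_div(10, 3) == NullableInt(3));
    assert(int_div(-7, 2) == NullableInt(-3));  // truncation, not floor
    assert(int_mod(-7, 2) == NullableInt(-1));
    assert(!int_div(1, 0).has_value());
}
```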
99+
100+
**Comparison operators** (`=`, `<>`, `!=`, `<`, `>`, `<=`, `>=`):
101+
- Compare coerced values
102+
- Return `value_bool(result)`
103+
104+
**String operators:**
105+
- `||` in PostgreSQL: string concatenation
106+
- `||` in MySQL: logical OR
107+
- `LIKE`: delegate to `match_like<D>()`
108+
109+
### Unary operators
110+
111+
`NODE_UNARY_OP` has value = operator text, one child.
112+
113+
| Operator | Action |
|---|---|
| `-` | Negate: int → `-int_val`, double → `-double_val`. NULL → NULL. |
| `NOT` | `null_semantics::eval_not(child_val)` |
| `+` | No-op (unary plus) |
118+
119+
### IS NULL / IS NOT NULL
120+
121+
`NODE_IS_NULL` / `NODE_IS_NOT_NULL` have one child.
122+
123+
- Evaluate child → `value_bool(child.is_null())` or `value_bool(!child.is_null())`
124+
- **Never returns NULL** — IS NULL always returns TRUE or FALSE
125+
126+
### BETWEEN
127+
128+
`NODE_BETWEEN` has three children: expr, low, high.
129+
130+
- Evaluate all three
131+
- Equivalent to `expr >= low AND expr <= high`
132+
- Uses coercion for comparison
133+
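- NULL propagation: each of the two comparisons follows the standard NULL rules, and the results combine with three-valued AND. For example, `5 BETWEEN NULL AND 3` is FALSE (the upper bound already fails), while `5 BETWEEN NULL AND 10` is NULL.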
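
The equivalence with `expr >= low AND expr <= high` can be sketched directly. The names `between`, `tri_and`, `Tri`, and `NullableInt` are hypothetical stand-ins for the evaluator's `Value`-based code, and coercion is left out by restricting the example to integers.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

using NullableInt = std::optional<std::int64_t>;  // std::nullopt models SQL NULL
using Tri = std::optional<bool>;                  // three-valued boolean

// Three-valued AND: FALSE dominates, then NULL, then TRUE.
inline Tri tri_and(Tri a, Tri b) {
    if ((a.has_value() && !*a) || (b.has_value() && !*b)) return false;
    if (!a.has_value() || !b.has_value()) return std::nullopt;
    return true;
}

// x BETWEEN low AND high, evaluated as (x >= low) AND (x <= high).
inline Tri between(NullableInt x, NullableInt low, NullableInt high) {
    Tri ge = (x && low) ? Tri(*x >= *low) : std::nullopt;    // NULL if either side is NULL
    Tri le = (x && high) ? Tri(*x <= *high) : std::nullopt;
    return tri_and(ge, le);
}

inline void demo_between() {
    assert(between(5, 1, 10) == Tri(true));
    assert(between(5, std::nullopt, 3) == Tri(false));     // upper bound fails regardless
    assert(!between(5, std::nullopt, 10).has_value());     // outcome depends on the NULL
}
```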
134+
135+
### IN list
136+
137+
`NODE_IN_LIST` has N children: first is the expression, rest are values.
138+
139+
- Evaluate the expression
140+
- If expression is NULL → return NULL
141+
- Evaluate each value, compare with `=`
142+
- If any match → TRUE
143+
- If no match but any comparison was NULL → NULL
144+
- If no match and no NULLs → FALSE
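
The IN-list rules above translate into a single pass that remembers whether any comparison was NULL. A minimal sketch, assuming integer values only and hypothetical names (`eval_in`, `Tri`, `NullableInt`); the real evaluator would compare coerced `Value`s.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

using NullableInt = std::optional<std::int64_t>;  // std::nullopt models SQL NULL
using Tri = std::optional<bool>;                  // three-valued boolean

// needle IN (list...): TRUE on any match, NULL if no match but a NULL was
// seen, FALSE otherwise. A NULL needle yields NULL immediately.
inline Tri eval_in(const NullableInt& needle, const std::vector<NullableInt>& list) {
    if (!needle) return std::nullopt;
    bool saw_null = false;
    for (const NullableInt& item : list) {
        if (!item) { saw_null = true; continue; }  // needle = NULL compares as NULL
        if (*item == *needle) return true;
    }
    if (saw_null) return std::nullopt;
    return false;
}

inline void demo_in_list() {
    assert(eval_in(1, std::vector<NullableInt>{1, 2, 3}) == Tri(true));
    assert(!eval_in(4, std::vector<NullableInt>{1, std::nullopt, 3}).has_value());
    assert(eval_in(4, std::vector<NullableInt>{1, 2}) == Tri(false));
}
```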
145+
146+
### CASE/WHEN
147+
148+
`NODE_CASE_WHEN` children are interleaved: [case_expr], when1, then1, when2, then2, ..., [else_expr].
149+
150+
**Searched CASE** (no case_expr — first child is a WHEN condition):
151+
```
For each WHEN/THEN pair:
    evaluate WHEN condition
    if TRUE → evaluate and return THEN value
If no match and ELSE exists → evaluate and return ELSE
If no match and no ELSE → NULL
```
158+
159+
**Simple CASE** (first child is case_expr):
160+
```
Evaluate case_expr
For each WHEN/THEN pair:
    evaluate WHEN value
    if case_expr = WHEN value → evaluate and return THEN
If no match → ELSE or NULL
```
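
The searched form can be sketched with lazily evaluated branches, which makes it visible that only the chosen THEN (or ELSE) is evaluated and that a NULL condition falls through like FALSE. All names here (`WhenThen`, `searched_case`, `Tri`, `NullableInt`) are hypothetical; the real evaluator would walk the interleaved AST children instead of a vector of callables.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

using Tri = std::optional<bool>;                  // three-valued boolean
using NullableInt = std::optional<std::int64_t>;  // std::nullopt models SQL NULL

struct WhenThen {
    std::function<Tri()> when;          // condition, evaluated in order
    std::function<NullableInt()> then;  // result, evaluated only if selected
};

// Searched CASE: the first arm whose condition is TRUE wins.
// An empty else_branch models a CASE without ELSE.
inline NullableInt searched_case(const std::vector<WhenThen>& arms,
                                 const std::function<NullableInt()>& else_branch) {
    for (const WhenThen& arm : arms) {
        Tri cond = arm.when();
        if (cond.has_value() && *cond) return arm.then();  // NULL and FALSE fall through
    }
    if (else_branch) return else_branch();
    return std::nullopt;
}

inline void demo_searched_case() {
    std::vector<WhenThen> arms;
    arms.push_back({[]() -> Tri { return std::nullopt; }, []() -> NullableInt { return 1; }});
    arms.push_back({[]() -> Tri { return true; }, []() -> NullableInt { return 2; }});
    assert(searched_case(arms, nullptr) == NullableInt(2));
}
```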
167+
168+
### Function calls
169+
170+
`NODE_FUNCTION_CALL` has value = function name, children = arguments.
171+
172+
```
1. Lookup function in FunctionRegistry<D> by name
2. If not found → return value_null()
3. Evaluate each argument child into Value array
4. Call function(args, arg_count, arena)
5. Return result
```
179+
180+
### Subquery (deferred)
181+
182+
`NODE_SUBQUERY` → return `value_null()`. Full subquery evaluation requires the executor.
183+
184+
---
185+
186+
## LIKE Pattern Matching
187+
188+
```cpp
template <Dialect D>
bool match_like(StringRef text, StringRef pattern, char escape_char = '\\');
```
192+
193+
**Pattern rules:**
194+
- `%` matches zero or more characters
195+
- `_` matches exactly one character
196+
- Escape character (default `\`) makes the next character literal
197+
198+
**Dialect differences:**
199+
- MySQL: case-insensitive by default
200+
- PostgreSQL: case-sensitive (`ILIKE` for insensitive)
201+
202+
**Algorithm:** Iterative two-pointer approach. O(n*m) worst case (n = text length, m = pattern length), O(n) typical. No regex.
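
One common way to realize an iterative two-pointer matcher is to remember the position of the most recent `%` and retry from there on a mismatch. The sketch below follows that idea and is not necessarily the approach the implementation takes. `like_match` is a hypothetical name, `std::string_view` stands in for `StringRef`, a `bool` flag stands in for the dialect parameter, and treating a trailing escape character as a literal is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

inline bool like_match(std::string_view text, std::string_view pattern,
                       bool case_insensitive = false, char escape_char = '\\') {
    auto same = [case_insensitive](char a, char b) {
        if (!case_insensitive) return a == b;
        return std::tolower(static_cast<unsigned char>(a)) ==
               std::tolower(static_cast<unsigned char>(b));
    };
    std::size_t t = 0, p = 0;
    std::size_t star_p = std::string_view::npos;  // pattern index just after the last '%'
    std::size_t star_t = 0;                       // text index where that '%' started matching
    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '%') {
            star_p = ++p;  // tentatively let '%' match zero characters
            star_t = t;
            continue;
        }
        bool matched = false;
        std::size_t advance = 1;
        if (p < pattern.size()) {
            if (pattern[p] == escape_char && p + 1 < pattern.size()) {
                matched = same(text[t], pattern[p + 1]);  // escaped char is literal
                advance = 2;
            } else if (pattern[p] == '_') {
                matched = true;                           // any single character
            } else {
                matched = same(text[t], pattern[p]);
            }
        }
        if (matched) {
            ++t;
            p += advance;
        } else if (star_p != std::string_view::npos) {
            p = star_p;      // backtrack: let the last '%' absorb one more character
            t = ++star_t;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '%') ++p;  // trailing '%' match empty
    return p == pattern.size();
}

inline void demo_like() {
    assert(like_match("test", "t%"));
    assert(like_match("50%", "50\\%"));
    assert(!like_match("500", "50\\%"));
    assert(like_match("Test", "t%", /*case_insensitive=*/true));
}
```

The worst case arises with patterns such as `%a%a%a` against long runs of near-matches, where each mismatch rewinds to the last `%`.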
203+
204+
---
205+
206+
## Error Handling
207+
208+
The evaluator does not throw exceptions. Error cases:
209+
210+
| Error | Behavior |
|---|---|
| Unknown node type | Return `value_null()` |
| Division by zero | Return `value_null()` |
| Function not found | Return `value_null()` |
| Type coercion failure | Return `value_null()` |
| Integer overflow | Wrap (int64 arithmetic) |
| Invalid literal parse | `value_int(0)` for MySQL, `value_null()` for PostgreSQL |
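
One caveat for the overflow row: signed integer overflow is undefined behaviour in C++17, so wrapping has to be requested explicitly. A minimal sketch, assuming hypothetical helpers `wrapping_add` and `wrapping_mul`, does the arithmetic in `uint64_t` (where overflow is defined to wrap) and copies the bits back.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Reinterpret the bits of a uint64_t as int64_t. std::int64_t is required to
// be two's complement, so this yields the wrapped signed value.
inline std::int64_t to_signed(std::uint64_t bits) {
    std::int64_t out;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}

// Wrapping addition: unsigned overflow is well defined (modulo 2^64).
inline std::int64_t wrapping_add(std::int64_t a, std::int64_t b) {
    return to_signed(static_cast<std::uint64_t>(a) + static_cast<std::uint64_t>(b));
}

// Wrapping multiplication, using the same unsigned round trip.
inline std::int64_t wrapping_mul(std::int64_t a, std::int64_t b) {
    return to_signed(static_cast<std::uint64_t>(a) * static_cast<std::uint64_t>(b));
}

inline void demo_wrapping() {
    constexpr std::int64_t max = std::numeric_limits<std::int64_t>::max();
    constexpr std::int64_t min = std::numeric_limits<std::int64_t>::min();
    assert(wrapping_add(max, 1) == min);
    assert(wrapping_mul(max, 2) == -2);
    assert(wrapping_add(1, 2) == 3);
}
```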
218+
219+
---
220+
221+
## File Organization
222+
223+
```
include/sql_engine/
    expression_eval.h          # evaluate_expression<D>() template function
    like.h                     # match_like<D>() pattern matcher

tests/
    test_expression_eval.cpp   # Unit tests: each node type
    test_like.cpp              # LIKE pattern matching tests
    test_eval_integration.cpp  # End-to-end: parse SQL → evaluate → check result
```
233+
234+
---
235+
236+
## Testing Strategy
237+
238+
### Unit tests (test_expression_eval.cpp)
239+
240+
- **Literals:** INT, FLOAT, STRING, NULL, BOOL → correct Value
241+
- **Arithmetic:** `1 + 2`, `10 / 3`, `10 % 3`, `10 DIV 3`, division by zero, NULL + 1
242+
- **Comparison:** `1 = 1`, `1 < 2`, `'a' > 'b'`, cross-type (`1 = '1'` MySQL vs PgSQL)
243+
- **Logical:** AND/OR/NOT truth tables with NULL, short-circuit verification
244+
- **IS NULL / IS NOT NULL:** NULL and non-NULL inputs
245+
- **BETWEEN:** normal, NULL boundary, NULL expression
246+
- **IN list:** match, no match, NULL in list, NULL expression
247+
- **CASE/WHEN:** searched and simple forms, NULL handling, no ELSE
248+
- **Function calls:** known function, unknown function, NULL args
249+
- **Column resolution:** callback called with correct names, qualified names
250+
251+
### LIKE tests (test_like.cpp)
252+
253+
- Exact match, prefix `%`, suffix `%`, wildcard `_`
254+
- Case sensitivity per dialect
255+
- Escape characters
256+
- Empty string / empty pattern edge cases
257+
258+
### Integration tests (test_eval_integration.cpp)
259+
260+
Parse SQL → navigate to expression AST node → evaluate → verify result:
261+
262+
- `SELECT 1 + 2` → 3
263+
- `SELECT UPPER('hello')` → `'HELLO'`
264+
- `SELECT COALESCE(NULL, NULL, 42)` → 42
265+
- `SELECT CASE WHEN 1 > 2 THEN 'a' ELSE 'b' END` → `'b'`
266+
- `SELECT 1 IN (1, 2, 3)` → true
267+
- `SELECT 5 BETWEEN 1 AND 10` → true
268+
- `SELECT NULL IS NULL` → true
269+
- `SELECT 'test' LIKE 't%'` → true
270+
271+
---
272+
273+
## Performance Targets
274+
275+
| Operation | Target |
|---|---|
| Literal evaluation | <10ns |
| Binary arithmetic (int + int) | <15ns |
| Comparison (int = int) | <15ns |
| NULL check + propagation | <5ns |
| Function call (simple, e.g., ABS) | <30ns |
| LIKE simple pattern | <100ns |
| CASE/WHEN (3 branches) | <50ns |
| Full expression: `price * qty > 100` | <50ns (excluding column resolution) |
