|
| 1 | +# SQL Engine Expression Evaluator — Design Specification |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +The expression evaluator connects the SQL parser to the query engine. It takes a parser AST expression node, resolves column references via a caller-provided callback, and returns a `Value` using the type system from sub-project 1. |
| 6 | + |
| 7 | +This is sub-project 2 of the query engine. It depends on sub-project 1 (type system) and the SQL parser. |
| 8 | + |
| 9 | +### Goals |
| 10 | + |
| 11 | +- **Evaluate any SQL expression** from the parser's AST: literals, column refs, arithmetic, comparison, logical operators, function calls, LIKE, IS NULL, BETWEEN, IN, CASE/WHEN |
| 12 | +- **Dialect-aware** via `CoercionRules<D>` for type promotion and `FunctionRegistry<D>` for function dispatch |
| 13 | +- **Correct NULL handling** via three-valued logic throughout |
| 14 | +- **Column resolution via callback** — no row format imposed on the caller |
| 15 | +- **Integration milestone** — first time parser and engine connect end-to-end |
| 16 | + |
| 17 | +### Constraints |
| 18 | + |
| 19 | +- C++17 |
| 20 | +- Uses parser's AST (`AstNode`, `NodeType`) and arena |
| 21 | +- Uses type system's `Value`, `CoercionRules<D>`, `null_semantics`, `FunctionRegistry<D>` |
| 22 | +- Header-only (`expression_eval.h`, `like.h`) |
| 23 | +- No exceptions — errors return `value_null()` |
| 24 | + |
| 25 | +### Non-Goals (deferred) |
| 26 | + |
| 27 | +- Row/tuple representation (sub-project 4) |
| 28 | +- Aggregate function evaluation (needs executor with row-group state) |
| 29 | +- Subquery evaluation (needs full executor — `IN (SELECT ...)` returns NULL for now) |
| 30 | +- Window functions |
| 31 | +- ORDER BY / LIMIT execution |
| 32 | + |
| 33 | +--- |
| 34 | + |
| 35 | +## Core Interface |
| 36 | + |
| 37 | +```cpp |
| 38 | +template <Dialect D> |
| 39 | +Value evaluate_expression(const AstNode* expr, |
| 40 | + const std::function<Value(StringRef)>& resolve, |
| 41 | + FunctionRegistry<D>& functions, |
| 42 | + Arena& arena); |
| 43 | +``` |
| 44 | +
|
| 45 | +**Parameters:** |
| 46 | +- `expr` — AST expression node from the parser |
| 47 | +- `resolve` — callback that maps column names to values. Called when the evaluator hits `NODE_COLUMN_REF` or `NODE_QUALIFIED_NAME`. For qualified names (`table.column`), the callback receives the full qualified string. |
| 48 | +- `functions` — function registry with built-in functions registered |
| 49 | +- `arena` — for allocating intermediate string results |
| 50 | +
|
| 51 | +**Returns:** A `Value`. Returns `value_null()` on error or unsupported node types. |
| 52 | +
|
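To make the callback contract concrete, here is a minimal sketch of a resolver built over an in-memory row. The `StringRef`, `Value`, and `Row` aliases and the `make_resolver` helper are hypothetical stand-ins, not the real types from sub-project 1; the only point is that the evaluator sees a `std::function<Value(StringRef)>` and never the row layout.

```cpp
#include <functional>
#include <string>
#include <string_view>
#include <unordered_map>
#include <variant>

// Hypothetical stand-ins for the real StringRef and Value types.
using StringRef = std::string_view;
using Value = std::variant<std::monostate, long long, double, std::string>;  // monostate = NULL

// Hypothetical row format chosen by the caller; the evaluator never sees it.
using Row = std::unordered_map<std::string, Value>;

// Builds a resolver over one row. Unknown columns resolve to NULL, mirroring
// the "errors return value_null()" rule. The row must outlive the resolver
// because it is captured by reference.
inline std::function<Value(StringRef)> make_resolver(const Row& row) {
    return [&row](StringRef name) -> Value {
        auto it = row.find(std::string(name));
        return it == row.end() ? Value{} : it->second;
    };
}
```

Qualified names need no special handling in this sketch: the key is simply the full `"table.column"` string that the evaluator passes in.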
| 53 | +--- |
| 54 | +
|
| 55 | +## AST Node Dispatch |
| 56 | +
|
| 57 | +The evaluator switches on `expr->type`: |
| 58 | +
|
| 59 | +### Leaf nodes (no recursion) |
| 60 | +
|
| 61 | +| NodeType | Action | |
| 62 | +|---|---| |
| 63 | +| `NODE_LITERAL_INT` | Parse `expr->value()` string to int64 via `strtoll` | |
| 64 | +| `NODE_LITERAL_FLOAT` | Parse `expr->value()` string to double via `strtod` | |
| 65 | +| `NODE_LITERAL_STRING` | `value_string(expr->value())` | |
| 66 | +| `NODE_LITERAL_NULL` | `value_null()` | |
| 67 | +| `NODE_COLUMN_REF` | `resolve(expr->value())` | |
| 68 | +| `NODE_ASTERISK` | `value_string(StringRef{"*", 1})` (for `COUNT(*)`) | |
| 69 | +| `NODE_PLACEHOLDER` | `value_null()` (unresolved placeholder) | |
| 70 | +| `NODE_IDENTIFIER` | `resolve(expr->value())` (column or keyword-as-value) | |
| 71 | +
|
| 72 | +### Qualified name |
| 73 | +
|
| 74 | +`NODE_QUALIFIED_NAME` has two children (table, column). Combine into `"table.column"` and call `resolve()`. |
| 75 | +
|
| 76 | +### Binary operators |
| 77 | +
|
| 78 | +`NODE_BINARY_OP` has value = operator text, two children (left, right). |
| 79 | +
|
| 80 | +``` |
| 81 | +1. If operator is AND/OR: use short-circuit evaluation (see below) |
| 82 | +2. Evaluate left child → left_val |
| 83 | +3. Evaluate right child → right_val |
| 84 | +4. NULL propagation: if either operand is NULL → return value_null()
| 85 | +5. Find common type: CoercionRules<D>::common_type(left_val.tag, right_val.tag) |
| 86 | +6. Coerce both to common type |
| 87 | +7. Apply operator → return result |
| 88 | +``` |
| 89 | +
|
| 90 | +**Short-circuit for AND/OR:** |
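| 91 | +- `AND`: if left is FALSE → return FALSE without evaluating right. If left is TRUE → the result is whatever right evaluates to (TRUE, FALSE, or NULL). If left is NULL → evaluate right; if right is FALSE → FALSE, else NULL.
| 92 | +- `OR`: if left is TRUE → return TRUE without evaluating right. If left is FALSE → the result is whatever right evaluates to (TRUE, FALSE, or NULL). If left is NULL → evaluate right; if right is TRUE → TRUE, else NULL.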
| 93 | +
|
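The short-circuit rules above can be sketched with `std::optional<bool>` standing in for a nullable boolean `Value`. The names `Tri`, `eval_and`, and `eval_or` are illustrative assumptions, and the right operand is passed as a thunk so that skipping its evaluation is observable.

```cpp
#include <functional>
#include <optional>

// Three-valued boolean: std::nullopt stands in for SQL NULL.
using Tri = std::optional<bool>;

// `right` is a thunk so the right operand is evaluated only when needed.
inline Tri eval_and(Tri left, const std::function<Tri()>& right) {
    if (left.has_value() && !*left) return false;  // FALSE AND x -> FALSE, right skipped
    Tri r = right();
    if (r.has_value() && !*r) return false;        // x AND FALSE -> FALSE
    if (!left.has_value() || !r.has_value()) return std::nullopt;
    return true;                                   // TRUE AND TRUE
}

inline Tri eval_or(Tri left, const std::function<Tri()>& right) {
    if (left.has_value() && *left) return true;    // TRUE OR x -> TRUE, right skipped
    Tri r = right();
    if (r.has_value() && *r) return true;          // x OR TRUE -> TRUE
    if (!left.has_value() || !r.has_value()) return std::nullopt;
    return false;                                  // FALSE OR FALSE
}
```

Passing a thunk also gives the unit tests a simple way to verify short-circuiting: count how many times the right-hand callable runs.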
| 94 | +**Arithmetic operators** (`+`, `-`, `*`, `/`, `%`, `DIV`, `MOD`): |
| 95 | +- Operate on coerced numeric values (int64 or double) |
| 96 | +- Division by zero → NULL |
| 97 | +- `%` / `MOD` → integer remainder |
| 98 | +- `DIV` → integer division (truncate toward zero) |
| 99 | +
|
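As a rough illustration of the integer rules above, the sketch below uses `std::optional<std::int64_t>` in place of `Value`. The helper names are hypothetical, and two details are assumptions rather than statements from this spec: a zero divisor for `MOD` is treated like division by zero, and `INT64_MIN DIV -1` returns the wrapped value because the plain C++ expression would overflow.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

using NullableInt = std::optional<std::int64_t>;  // std::nullopt stands in for SQL NULL

// DIV: integer division, truncating toward zero. A NULL operand or a zero
// divisor yields NULL.
inline NullableInt int_div(NullableInt a, NullableInt b) {
    if (!a || !b || *b == 0) return std::nullopt;
    // INT64_MIN / -1 overflows in C++; return the wrapped value instead (assumption).
    if (*b == -1 && *a == std::numeric_limits<std::int64_t>::min()) return *a;
    return *a / *b;  // C++ integer division truncates toward zero
}

// % / MOD: integer remainder; the sign follows the dividend, as in C++.
inline NullableInt int_mod(NullableInt a, NullableInt b) {
    if (!a || !b || *b == 0) return std::nullopt;  // zero divisor -> NULL (assumption)
    if (*b == -1) return std::int64_t{0};          // avoids the INT64_MIN % -1 overflow case
    return *a % *b;
}
```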
| 100 | +**Comparison operators** (`=`, `<>`, `!=`, `<`, `>`, `<=`, `>=`): |
| 101 | +- Compare coerced values |
| 102 | +- Return `value_bool(result)` |
| 103 | +
|
| 104 | +**String operators:** |
| 105 | +- `||` in PostgreSQL: string concatenation |
| 106 | +- `||` in MySQL: logical OR |
| 107 | +- `LIKE`: delegate to `match_like<D>()` |
| 108 | +
|
| 109 | +### Unary operators |
| 110 | +
|
| 111 | +`NODE_UNARY_OP` has value = operator text, one child. |
| 112 | +
|
| 113 | +| Operator | Action | |
| 114 | +|---|---| |
| 115 | +| `-` | Negate: int → `-int_val`, double → `-double_val`. NULL → NULL. | |
| 116 | +| `NOT` | `null_semantics::eval_not(child_val)` | |
| 117 | +| `+` | No-op (unary plus) | |
| 118 | +
|
| 119 | +### IS NULL / IS NOT NULL |
| 120 | +
|
| 121 | +`NODE_IS_NULL` / `NODE_IS_NOT_NULL` have one child. |
| 122 | +
|
| 123 | +- Evaluate child → `value_bool(child.is_null())` or `value_bool(!child.is_null())` |
| 124 | +- **Never returns NULL** — IS NULL always returns TRUE or FALSE |
| 125 | +
|
| 126 | +### BETWEEN |
| 127 | +
|
| 128 | +`NODE_BETWEEN` has three children: expr, low, high. |
| 129 | +
|
| 130 | +- Evaluate all three |
| 131 | +- Equivalent to `expr >= low AND expr <= high` |
| 132 | +- Uses coercion for comparison |
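| 133 | +- NULL propagation: a NULL operand makes its comparison NULL, and the two comparisons are then combined with three-valued AND. So `5 BETWEEN NULL AND 3` is FALSE (NULL AND FALSE), while `5 BETWEEN NULL AND 10` is NULL.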
| 134 | +
|
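The BETWEEN rule can be sketched as two nullable comparisons joined by three-valued AND. This is a simplified illustration on integers only; `NullableInt`, `Tri`, and the helper functions are hypothetical names, and the real evaluator would coerce operands through `CoercionRules<D>` first.

```cpp
#include <cstdint>
#include <optional>

using NullableInt = std::optional<std::int64_t>;  // std::nullopt = NULL
using Tri = std::optional<bool>;                  // std::nullopt = NULL

// Comparisons return NULL when either operand is NULL.
inline Tri cmp_ge(NullableInt a, NullableInt b) {
    if (!a.has_value() || !b.has_value()) return std::nullopt;
    return *a >= *b;
}

inline Tri cmp_le(NullableInt a, NullableInt b) {
    if (!a.has_value() || !b.has_value()) return std::nullopt;
    return *a <= *b;
}

// Three-valued AND: FALSE dominates NULL.
inline Tri tri_and(Tri a, Tri b) {
    if ((a.has_value() && !*a) || (b.has_value() && !*b)) return false;
    if (!a.has_value() || !b.has_value()) return std::nullopt;
    return true;
}

// expr BETWEEN low AND high  ==  expr >= low AND expr <= high
inline Tri eval_between(NullableInt expr, NullableInt low, NullableInt high) {
    return tri_and(cmp_ge(expr, low), cmp_le(expr, high));
}
```

Because FALSE dominates NULL in AND, a NULL bound does not always produce NULL: the other comparison can still decide the result.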
| 135 | +### IN list |
| 136 | +
|
| 137 | +`NODE_IN_LIST` has N children: first is the expression, rest are values. |
| 138 | +
|
| 139 | +- Evaluate the expression |
| 140 | +- If expression is NULL → return NULL |
| 141 | +- Evaluate each value, compare with `=` |
| 142 | +- If any match → TRUE |
| 143 | +- If no match but any comparison was NULL → NULL |
| 144 | +- If no match and no NULLs → FALSE |
| 145 | +
|
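The IN-list rules above amount to a small loop with a "saw a NULL" flag. The following is a sketch on integers, with `std::optional` standing in for `Value`; `eval_in` and the type aliases are illustrative names, and this version returns on the first match rather than evaluating every remaining item.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

using NullableInt = std::optional<std::int64_t>;  // std::nullopt = NULL
using Tri = std::optional<bool>;                  // std::nullopt = NULL

inline Tri eval_in(NullableInt expr, const std::vector<NullableInt>& list) {
    if (!expr.has_value()) return std::nullopt;  // NULL IN (...) -> NULL
    bool saw_null = false;
    for (const NullableInt& item : list) {
        if (!item.has_value()) {                 // expr = NULL compares as NULL
            saw_null = true;
            continue;
        }
        if (*item == *expr) return true;         // any match -> TRUE
    }
    if (saw_null) return std::nullopt;           // no match, but a NULL comparison
    return false;                                // no match and no NULLs
}
```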
| 146 | +### CASE/WHEN |
| 147 | +
|
| 148 | +`NODE_CASE_WHEN` children are interleaved: [case_expr], when1, then1, when2, then2, ..., [else_expr]. |
| 149 | +
|
| 150 | +**Searched CASE** (no case_expr — first child is a WHEN condition): |
| 151 | +``` |
| 152 | +For each WHEN/THEN pair: |
| 153 | + evaluate WHEN condition |
| 154 | + if TRUE → evaluate and return THEN value |
| 155 | +If no match and ELSE exists → evaluate and return ELSE |
| 156 | +If no match and no ELSE → NULL |
| 157 | +``` |
| 158 | +
|
| 159 | +**Simple CASE** (first child is case_expr): |
| 160 | +``` |
| 161 | +Evaluate case_expr |
| 162 | +For each WHEN/THEN pair: |
| 163 | + evaluate WHEN value |
| 164 | + if case_expr = WHEN value → evaluate and return THEN |
| 165 | +If no match → ELSE or NULL |
| 166 | +``` |
| 167 | +
|
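A searched CASE can be sketched as a loop over lazily evaluated branches, so that only the selected THEN expression runs. All names here (`WhenThen`, `eval_searched_case`, the nullable aliases) are hypothetical; in the evaluator itself the thunks would be recursive calls on the interleaved child nodes.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

using Tri = std::optional<bool>;                 // std::nullopt = NULL
using NullableStr = std::optional<std::string>;  // std::nullopt = NULL

// One WHEN/THEN pair; both sides are thunks so THEN runs only if selected.
struct WhenThen {
    std::function<Tri()> when;
    std::function<NullableStr()> then;
};

// An empty `else_branch` models a CASE with no ELSE clause.
inline NullableStr eval_searched_case(const std::vector<WhenThen>& branches,
                                      const std::function<NullableStr()>& else_branch) {
    for (const WhenThen& b : branches) {
        Tri cond = b.when();
        // Only TRUE selects a branch; FALSE and NULL both fall through.
        if (cond.has_value() && *cond) return b.then();
    }
    if (else_branch) return else_branch();
    return std::nullopt;
}
```

A simple CASE follows the same shape, with the WHEN thunk replaced by an `=` comparison against the already evaluated case expression.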
| 168 | +### Function calls |
| 169 | +
|
| 170 | +`NODE_FUNCTION_CALL` has value = function name, children = arguments. |
| 171 | +
|
| 172 | +``` |
| 173 | +1. Lookup function in FunctionRegistry<D> by name |
| 174 | +2. If not found → return value_null() |
| 175 | +3. Evaluate each argument child into Value array |
| 176 | +4. Call function(args, arg_count, arena) |
| 177 | +5. Return result |
| 178 | +``` |
| 179 | +
|
| 180 | +### Subquery (deferred) |
| 181 | +
|
| 182 | +`NODE_SUBQUERY` → return `value_null()`. Full subquery evaluation requires the executor. |
| 183 | +
|
| 184 | +--- |
| 185 | +
|
| 186 | +## LIKE Pattern Matching |
| 187 | +
|
| 188 | +```cpp |
| 189 | +template <Dialect D> |
| 190 | +bool match_like(StringRef text, StringRef pattern, char escape_char = '\\'); |
| 191 | +``` |
| 192 | + |
| 193 | +**Pattern rules:** |
| 194 | +- `%` matches zero or more characters |
| 195 | +- `_` matches exactly one character |
| 196 | +- Escape character (default `\`) makes the next character literal |
| 197 | + |
| 198 | +**Dialect differences:** |
| 199 | +- MySQL: case-insensitive by default |
| 200 | +- PostgreSQL: case-sensitive (`ILIKE` for insensitive) |
| 201 | + |
| 202 | +**Algorithm:** Iterative two-pointer approach. O(n*m) worst case and O(n) typical, where n is the text length and m is the pattern length. No regex. |
| 203 | + |
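One way to realize an iterative two-pointer matcher is to remember the position of the most recent `%` and retry from there on a mismatch. The sketch below is an assumption about how such a matcher could look, not the contents of `like.h`: it is named `like_match` to avoid confusion with the real `match_like<D>()`, uses `std::string_view` instead of `StringRef`, and takes a `case_insensitive` flag where the real function would consult the dialect. It compares bytes, so it is ASCII-only, and a trailing lone escape character is treated as a literal.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

inline bool like_match(std::string_view text, std::string_view pattern,
                       bool case_insensitive, char escape_char = '\\') {
    auto same = [case_insensitive](char a, char b) {
        if (!case_insensitive) return a == b;
        return std::tolower(static_cast<unsigned char>(a)) ==
               std::tolower(static_cast<unsigned char>(b));
    };
    constexpr std::size_t npos = std::string_view::npos;
    std::size_t t = 0, p = 0;
    std::size_t star_p = npos;  // pattern index just after the most recent '%'
    std::size_t star_t = 0;     // text index where that '%' currently stops matching

    while (t < text.size()) {
        bool matched = false;
        if (p < pattern.size()) {
            char c = pattern[p];
            if (c == '%') {          // record the wildcard, initially matching nothing
                star_p = ++p;
                star_t = t;
                continue;
            }
            if (c == '_') {          // any single character
                ++p; ++t; matched = true;
            } else {
                std::size_t lit = p; // index of the literal character to compare
                if (c == escape_char && p + 1 < pattern.size()) lit = p + 1;
                if (same(pattern[lit], text[t])) { p = lit + 1; ++t; matched = true; }
            }
        }
        if (matched) continue;
        if (star_p == npos) return false;  // mismatch and no '%' to fall back on
        p = star_p;                        // let the last '%' absorb one more character
        t = ++star_t;
    }
    while (p < pattern.size() && pattern[p] == '%') ++p;  // trailing '%' match empty
    return p == pattern.size();
}
```

Each mismatch restarts from the last `%` only, which is what keeps the common case linear while still allowing the O(n*m) worst case on patterns with many wildcards.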
| 204 | +--- |
| 205 | + |
| 206 | +## Error Handling |
| 207 | + |
| 208 | +The evaluator does not throw exceptions. Error cases: |
| 209 | + |
| 210 | +| Error | Behavior | |
| 211 | +|---|---| |
| 212 | +| Unknown node type | Return `value_null()` | |
| 213 | +| Division by zero | Return `value_null()` | |
| 214 | +| Function not found | Return `value_null()` | |
| 215 | +| Type coercion failure | Return `value_null()` | |
| 216 | +| Integer overflow | Wrap (two's-complement int64 arithmetic; compute via `uint64_t`, since signed overflow is undefined behavior in C++) | |
| 217 | +| Invalid literal parse | `value_int(0)` for MySQL, `value_null()` for PostgreSQL | |
| 218 | + |
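Two rows of the table above are easy to get subtly wrong in C++, so here is a hedged sketch of each. `wrapping_add`, `parse_int_literal`, and the `Dialect` enum are hypothetical stand-ins, and `std::optional` replaces `Value`. The parse helper copies the text into a `std::string` on the assumption that the parser's `StringRef` is not NUL-terminated, which `strtoll` requires.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <string>
#include <string_view>

enum class Dialect { MySQL, PostgreSQL };  // hypothetical stand-in for the real Dialect

// Wrapping add: unsigned overflow is well defined, signed overflow is not.
// The cast back to int64_t is modular on mainstream compilers (and by
// definition from C++20 onward).
inline std::int64_t wrapping_add(std::int64_t a, std::int64_t b) {
    return static_cast<std::int64_t>(static_cast<std::uint64_t>(a) +
                                     static_cast<std::uint64_t>(b));
}

// Parses an integer literal. On failure: 0 for MySQL, NULL for PostgreSQL.
inline std::optional<std::int64_t> parse_int_literal(std::string_view text, Dialect d) {
    std::string buf(text);  // strtoll needs a NUL-terminated string
    char* end = nullptr;
    errno = 0;
    long long v = std::strtoll(buf.c_str(), &end, 10);
    // Valid only if the whole input was consumed and the value was in range.
    bool ok = !buf.empty() && end == buf.c_str() + buf.size() && errno != ERANGE;
    if (ok) return static_cast<std::int64_t>(v);
    if (d == Dialect::MySQL) return std::int64_t{0};
    return std::nullopt;
}
```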
| 219 | +--- |
| 220 | + |
| 221 | +## File Organization |
| 222 | + |
| 223 | +``` |
| 224 | +include/sql_engine/ |
| 225 | + expression_eval.h — evaluate_expression<D>() template function |
| 226 | + like.h — match_like<D>() pattern matcher |
| 227 | +
|
| 228 | +tests/ |
| 229 | + test_expression_eval.cpp — Unit tests: each node type |
| 230 | + test_like.cpp — LIKE pattern matching tests |
| 231 | + test_eval_integration.cpp — End-to-end: parse SQL → evaluate → check result |
| 232 | +``` |
| 233 | + |
| 234 | +--- |
| 235 | + |
| 236 | +## Testing Strategy |
| 237 | + |
| 238 | +### Unit tests (test_expression_eval.cpp) |
| 239 | + |
| 240 | +- **Literals:** INT, FLOAT, STRING, NULL, BOOL → correct Value |
| 241 | +- **Arithmetic:** `1 + 2`, `10 / 3`, `10 % 3`, `10 DIV 3`, division by zero, NULL + 1 |
| 242 | +- **Comparison:** `1 = 1`, `1 < 2`, `'a' > 'b'`, cross-type (`1 = '1'` MySQL vs PgSQL) |
| 243 | +- **Logical:** AND/OR/NOT truth tables with NULL, short-circuit verification |
| 244 | +- **IS NULL / IS NOT NULL:** NULL and non-NULL inputs |
| 245 | +- **BETWEEN:** normal, NULL boundary, NULL expression |
| 246 | +- **IN list:** match, no match, NULL in list, NULL expression |
| 247 | +- **CASE/WHEN:** searched and simple forms, NULL handling, no ELSE |
| 248 | +- **Function calls:** known function, unknown function, NULL args |
| 249 | +- **Column resolution:** callback called with correct names, qualified names |
| 250 | + |
| 251 | +### LIKE tests (test_like.cpp) |
| 252 | + |
| 253 | +- Exact match, prefix `%`, suffix `%`, wildcard `_` |
| 254 | +- Case sensitivity per dialect |
| 255 | +- Escape characters |
| 256 | +- Empty string / empty pattern edge cases |
| 257 | + |
| 258 | +### Integration tests (test_eval_integration.cpp) |
| 259 | + |
| 260 | +Parse SQL → navigate to expression AST node → evaluate → verify result: |
| 261 | + |
| 262 | +- `SELECT 1 + 2` → 3 |
| 263 | +- `SELECT UPPER('hello')` → `'HELLO'` |
| 264 | +- `SELECT COALESCE(NULL, NULL, 42)` → 42 |
| 265 | +- `SELECT CASE WHEN 1 > 2 THEN 'a' ELSE 'b' END` → `'b'` |
| 266 | +- `SELECT 1 IN (1, 2, 3)` → true |
| 267 | +- `SELECT 5 BETWEEN 1 AND 10` → true |
| 268 | +- `SELECT NULL IS NULL` → true |
| 269 | +- `SELECT 'test' LIKE 't%'` → true |
| 270 | + |
| 271 | +--- |
| 272 | + |
| 273 | +## Performance Targets |
| 274 | + |
| 275 | +| Operation | Target | |
| 276 | +|---|---| |
| 277 | +| Literal evaluation | <10ns | |
| 278 | +| Binary arithmetic (int + int) | <15ns | |
| 279 | +| Comparison (int = int) | <15ns | |
| 280 | +| NULL check + propagation | <5ns | |
| 281 | +| Function call (simple, e.g., ABS) | <30ns | |
| 282 | +| LIKE simple pattern | <100ns | |
| 283 | +| CASE/WHEN (3 branches) | <50ns | |
| 284 | +| Full expression: `price * qty > 100` | <50ns (excluding column resolution) | |