|
| 1 | +# Reproducing Parser Comparison Benchmarks |
| 2 | + |
| 3 | +Step-by-step instructions to reproduce all comparison benchmarks from scratch on a Linux x86_64 machine. The commands assume a Debian/Ubuntu system with `apt-get`. |
| 4 | + |
| 5 | +--- |
| 6 | + |
| 7 | +## Prerequisites |
| 8 | + |
| 9 | +```bash |
| 10 | +# Required tools |
| 11 | +sudo apt-get update |
| 12 | +sudo apt-get install -y build-essential git curl |
| 13 | + |
| 14 | +# Rust (for sqlparser-rs benchmark) |
| 15 | +curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y |
| 16 | +source ~/.cargo/env |
| 17 | + |
| 18 | +# Verify |
| 19 | +g++ --version # need GCC 8+ (C++17 support) |
| 20 | +cargo --version # need Rust 1.70+ |
| 21 | +git --version |
| 22 | +``` |
| 23 | + |
| 24 | +--- |
| 25 | + |
| 26 | +## Step 1: Clone ParserSQL |
| 27 | + |
| 28 | +```bash |
| 29 | +git clone https://github.com/ProxySQL/ParserSQL.git |
| 30 | +cd ParserSQL |
| 31 | +``` |
| 32 | + |
| 33 | +--- |
| 34 | + |
| 35 | +## Step 2: Build ParserSQL (release mode) |
| 36 | + |
| 37 | +```bash |
| 38 | +# Create release Makefile (-O3, no debug symbols) |
| 39 | +sed 's/-g -O2/-O3/' Makefile.new > Makefile.release |
| 40 | + |
| 41 | +# Build the parser library |
| 42 | +make -f Makefile.release lib |
| 43 | + |
| 44 | +# Verify: run unit tests |
| 45 | +make -f Makefile.release test |
| 46 | +# Expected output: [ PASSED ] 430 tests. |
| 47 | +``` |
| 48 | + |
| 49 | +--- |
| 50 | + |
| 51 | +## Step 3: Build libpg_query |
| 52 | + |
| 53 | +libpg_query is PostgreSQL's parser extracted as a standalone C library. |
| 54 | + |
| 55 | +```bash |
| 56 | +# libpg_query is expected under third_party/ |
| 57 | +# Clone it only if it is not already present: |
| 58 | +if [ ! -d third_party/libpg_query ]; then |
| 59 | + git clone --depth 1 https://github.com/pganalyze/libpg_query.git third_party/libpg_query |
| 60 | +fi |
| 61 | + |
| 62 | +# Build libpg_query |
| 63 | +cd third_party/libpg_query |
| 64 | +make clean |
| 65 | +make -j$(nproc) |
| 66 | +cd ../.. |
| 67 | + |
| 68 | +# Verify the static library was built |
| 69 | +ls -la third_party/libpg_query/libpg_query.a |
| 70 | +# Expected: ~30MB static library |
| 71 | +``` |
| 72 | + |
| 73 | +**What libpg_query is:** This is PostgreSQL's actual parser (Bison-generated), extracted from the PostgreSQL source code by pganalyze. It compiles PostgreSQL's parser, lexer, memory management, and node types into a standalone library. The `pg_query_parse()` function takes a SQL string and returns a JSON-serialized parse tree. The `pg_query_raw_parse()` function (internal API) returns the raw AST without JSON serialization. |
| 74 | + |
| 75 | +--- |
| 76 | + |
| 77 | +## Step 4: Build the comparison benchmark |
| 78 | + |
| 79 | +```bash |
| 80 | +# Build the comparison benchmark binary |
| 81 | +make -f Makefile.release bench-compare |
| 82 | + |
| 83 | +# Verify the binary exists |
| 84 | +ls -la run_bench_compare |
| 85 | +``` |
| 86 | + |
| 87 | +**What gets built:** A single binary (`run_bench_compare`) that links the Google Benchmark harness, ParserSQL, and libpg_query. It runs the same SQL queries through both parsers for a direct comparison. |
| 88 | + |
| 89 | +--- |
| 90 | + |
| 91 | +## Step 5: Run ParserSQL vs libpg_query benchmark |
| 92 | + |
| 93 | +```bash |
| 94 | +./run_bench_compare --benchmark_format=console |
| 95 | +``` |
| 96 | + |
| 97 | +**Expected output** (numbers will vary by machine): |
| 98 | + |
| 99 | +``` |
| 100 | +--------------------------------------------------------------------- |
| 101 | +Benchmark Time CPU Iterations |
| 102 | +--------------------------------------------------------------------- |
| 103 | +BM_Ours_Select_Simple 223 ns 223 ns 3043637 |
| 104 | +BM_PgQuery_Select_Simple 1872 ns 1871 ns 374142 |
| 105 | +BM_PgQueryRaw_Select_Simple 684 ns 684 ns 1025094 |
| 106 | +BM_Ours_Select_Join 579 ns 579 ns 1210315 |
| 107 | +BM_PgQuery_Select_Join 4509 ns 4506 ns 154785 |
| 108 | +BM_PgQueryRaw_Select_Join 1646 ns 1646 ns 425588 |
| 109 | +... etc |
| 110 | +``` |
| 111 | + |
| 112 | +**Reading the results:** |
| 113 | +- `BM_Ours_*` — ParserSQL (this project) |
| 114 | +- `BM_PgQuery_*` — libpg_query with JSON serialization (`pg_query_parse()`) |
| 115 | +- `BM_PgQueryRaw_*` — libpg_query parse-only, no JSON (`pg_query_raw_parse()`) |
| 116 | +- The `BM_PgQueryRaw_*` numbers are the **fair comparison** (parse-only vs parse-only) |
| 117 | + |
| 118 | +**To save results as JSON** (for automated comparison): |
| 119 | +```bash |
| 120 | +./run_bench_compare --benchmark_format=json > comparison_results.json |
| 121 | +``` |
| 122 | + |
| 123 | +--- |
| 124 | + |
| 125 | +## Step 6: Build and run sqlparser-rs benchmark |
| 126 | + |
| 127 | +```bash |
| 128 | +cd bench/sqlparser_rs_bench |
| 129 | + |
| 130 | +# Build and run (Rust criterion benchmark) |
| 131 | +cargo bench |
| 132 | + |
| 133 | +cd ../.. |
| 134 | +``` |
| 135 | + |
| 136 | +**Expected output:** |
| 137 | + |
| 138 | +``` |
| 139 | +sqlparser_rs_mysql_simple_select |
| 140 | + time: [4.6 µs 4.7 µs 4.7 µs] |
| 141 | +sqlparser_rs_mysql_select_join |
| 142 | + time: [10.8 µs 10.9 µs 11.0 µs] |
| 143 | +... etc |
| 144 | +``` |
| 145 | + |
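| 146 | +**Reading the results:** The `time:` line shows `[lower_bound estimate upper_bound]`: the bounds of criterion's confidence interval around its best estimate of the per-iteration time. criterion reports microseconds here, so convert to nanoseconds (1 µs = 1,000 ns) before comparing the middle value against ParserSQL's numbers from Step 5. |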
| 147 | + |
| 148 | +**Note:** criterion outputs results to `bench/sqlparser_rs_bench/target/criterion/` with HTML reports you can open in a browser: |
| 149 | +```bash |
| 150 | +open bench/sqlparser_rs_bench/target/criterion/report/index.html # macOS |
| 151 | +xdg-open bench/sqlparser_rs_bench/target/criterion/report/index.html # Linux |
| 152 | +``` |
| 153 | + |
| 154 | +--- |
| 155 | + |
| 156 | +## Step 7: Optional — ReadySet (nom-sql) benchmark |
| 157 | + |
| 158 | +ReadySet uses `nom-sql`, a nom-based SQL parser. However, ReadySet is transitioning to sqlparser-rs as its primary parser (default: `both-prefer-sqlparser`). Benchmarking nom-sql separately has limited value since it is being phased out. |
| 159 | + |
| 160 | +If you still want to run it: |
| 161 | + |
| 162 | +```bash |
| 163 | +# Clone ReadySet (full repo required for workspace dependencies, ~1.3GB) |
| 164 | +git clone --depth 1 https://github.com/readysettech/readyset.git /tmp/readyset_full |
| 165 | + |
| 166 | +# Build and run ReadySet's own comparison benchmark (nom-sql vs sqlparser-rs) |
| 167 | +cd /tmp/readyset_full |
| 168 | +cargo bench -p readyset-sql-parsing --bench parse_comparison 2>&1 | grep -E "time:" |
| 169 | +cd - |
| 170 | +``` |
| 171 | + |
| 172 | +**Note:** ReadySet's benchmark already includes sqlparser-rs comparisons internally, so this mostly confirms our sqlparser-rs numbers. |
| 173 | + |
| 174 | +--- |
| 175 | + |
| 176 | +## Step 8: Run the automated comparison script |
| 177 | + |
| 178 | +The `scripts/run_comparison.sh` script runs all comparisons in sequence: |
| 179 | + |
| 180 | +```bash |
| 181 | +./scripts/run_comparison.sh |
| 182 | +``` |
| 183 | + |
| 184 | +This produces console output with all three parser comparisons side by side. |
| 185 | + |
| 186 | +--- |
| 187 | + |
| 188 | +## Step 9: Generate the full benchmark report |
| 189 | + |
| 190 | +```bash |
| 191 | +./scripts/run_benchmarks.sh docs/benchmarks/latest.md |
| 192 | +``` |
| 193 | + |
| 194 | +This generates the complete performance report including: |
| 195 | +- Single-threaded benchmarks (18 operations) |
| 196 | +- Multi-threaded scaling (1/2/4/8 threads) |
| 197 | +- Percentile latency (avg/p50/p95/p99) |
| 198 | +- Corpus test results (9 corpora, 86K+ queries) |
| 199 | + |
| 200 | +--- |
| 201 | + |
| 202 | +## Troubleshooting |
| 203 | + |
| 204 | +### libpg_query build fails |
| 205 | + |
| 206 | +```bash |
| 207 | +# libpg_query needs standard C build tools |
| 208 | +sudo apt-get install -y make gcc flex bison |
| 209 | + |
| 210 | +# If you get "redefinition" errors, rebuild from a clean state: |
| 211 | +cd third_party/libpg_query |
| 212 | +make clean |
| 213 | +make -j$(nproc) |
| 214 | +``` |
| 215 | + |
| 216 | +### Rust build fails |
| 217 | + |
| 218 | +```bash |
| 219 | +# Ensure Rust is up to date |
| 220 | +rustup update stable |
| 221 | + |
| 222 | +# If criterion fails to download, check network/proxy |
| 223 | +cd bench/sqlparser_rs_bench |
| 224 | +cargo update |
| 225 | +cargo bench |
| 226 | +``` |
| 227 | + |
| 228 | +### Benchmark numbers are unstable |
| 229 | + |
| 230 | +For the most reliable results: |
| 231 | + |
| 232 | +```bash |
| 233 | +# 1. Set the CPU frequency governor to "performance" to avoid frequency scaling during runs |
| 234 | +echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor |
| 235 | + |
| 236 | +# 2. Disable turbo boost (Intel) |
| 237 | +echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo |
| 238 | +# Or (AMD) |
| 239 | +echo 0 | sudo tee /sys/devices/system/cpu/cpufreq/boost |
| 240 | + |
| 241 | +# 3. Pin to a specific CPU core |
| 242 | +taskset -c 0 ./run_bench_compare |
| 243 | + |
| 244 | +# 4. Increase the minimum run time per benchmark for more stable results |
| 245 | +./run_bench_compare --benchmark_min_time=5s |
| 246 | +``` |
| 247 | + |
| 248 | +### Comparing across different machines |
| 249 | + |
| 250 | +Absolute numbers vary by CPU. Compare **ratios** (other parser's time divided by ParserSQL's time) instead: |
| 251 | +- libpg_query (raw) should take roughly 3x as long as ParserSQL on typical queries |
| 252 | +- sqlparser-rs should take roughly 15-20x as long as ParserSQL on typical queries |
| 253 | + |
| 254 | +--- |
| 255 | + |
| 256 | +## Queries Used in Benchmarks |
| 257 | + |
| 258 | +All parsers are benchmarked on the same set of queries: |
| 259 | + |
| 260 | +```sql |
| 261 | +-- simple_select |
| 262 | +SELECT col FROM t WHERE id = 1 |
| 263 | + |
| 264 | +-- select_join |
| 265 | +SELECT u.id, o.total FROM users u JOIN orders o ON u.id = o.user_id WHERE o.status = 'active' |
| 266 | + |
| 267 | +-- select_complex |
| 268 | +SELECT u.id, u.name, COUNT(o.id) AS order_count |
| 269 | +FROM users u LEFT JOIN orders o ON u.id = o.user_id |
| 270 | +WHERE u.status = 'active' |
| 271 | +GROUP BY u.id, u.name |
| 272 | +HAVING COUNT(o.id) > 5 |
| 273 | +ORDER BY order_count DESC LIMIT 50 |
| 274 | + |
| 275 | +-- insert_values |
| 276 | +INSERT INTO users (name, email) VALUES ('John', 'john@example.com') |
| 277 | + |
| 278 | +-- update_simple |
| 279 | +UPDATE users SET status = 'inactive' WHERE last_login < '2024-01-01' |
| 280 | + |
| 281 | +-- delete_simple |
| 282 | +DELETE FROM users WHERE id = 42 |
| 283 | + |
| 284 | +-- set_simple |
| 285 | +SET @@session.wait_timeout = 600 |
| 286 | + |
| 287 | +-- set_names |
| 288 | +SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci |
| 289 | + |
| 290 | +-- begin |
| 291 | +BEGIN |
| 292 | + |
| 293 | +-- create_table |
| 294 | +CREATE TABLE IF NOT EXISTS users (id INT PRIMARY KEY, name VARCHAR(255), email VARCHAR(255)) |
| 295 | +``` |
| 296 | + |
| 297 | +--- |
| 298 | + |
| 299 | +## Reference Results |
| 300 | + |
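| 301 | +These were measured on an AMD Ryzen 9 5950X (Linux 6.17, GCC 13.3, `-O3`). Multipliers in parentheses show how many times slower each parser is than ParserSQL on the same query: |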
| 302 | + |
| 303 | +| Query | ParserSQL | pg_query (raw) | pg_query (+JSON) | sqlparser-rs | |
| 304 | +|---|---|---|---|---| |
| 305 | +| SELECT simple | 223 ns | 684 ns (3.1x) | 1,872 ns (8.4x) | 4,687 ns (21x) | |
| 306 | +| SELECT JOIN | 579 ns | 1,646 ns (2.8x) | 4,509 ns (7.8x) | 10,684 ns (18x) | |
| 307 | +| SELECT complex | 1,189 ns | 3,304 ns (2.8x) | 8,675 ns (7.3x) | 23,411 ns (20x) | |
| 308 | +| INSERT | 244 ns | 781 ns (3.2x) | 1,831 ns (7.5x) | 3,784 ns (16x) | |
| 309 | +| BEGIN | 36 ns | 230 ns (6.4x) | 421 ns (11.7x) | 412 ns (11x) | |
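
The multipliers in the reference table can be recomputed from raw timings. The sketch below is a minimal, hypothetical helper, not part of the ParserSQL repository: the function names `slowdown` and `us_to_ns` are made up for illustration, and it assumes `awk` is installed. It divides another parser's time by ParserSQL's time, after converting criterion's microsecond readings to nanoseconds where needed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing results; not part of the ParserSQL repo.

# Convert a criterion reading in microseconds to nanoseconds.
us_to_ns() {
    LC_ALL=C awk -v us="$1" 'BEGIN { printf "%.0f\n", us * 1000 }'
}

# Print how many times slower another parser is than the baseline.
# $1 = baseline time in ns (ParserSQL), $2 = other parser's time in ns.
slowdown() {
    LC_ALL=C awk -v base="$1" -v other="$2" 'BEGIN { printf "%.1fx\n", other / base }'
}

# Values taken from the "SELECT simple" row of the reference table.
slowdown 223 684                  # pg_query (raw) vs ParserSQL
slowdown 223 "$(us_to_ns 4.687)"  # sqlparser-rs, from a 4.687 µs reading
```

With the "SELECT simple" values this prints `3.1x` and `21.0x`, in line with the table. Running the same calculation on your own numbers gives ratios that can be compared across machines even when the absolute timings differ.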