- Overview
- What is DBGEN
- DBGEN Output
- Building DBGEN
- Command-Line Options
- Limitations and Compliance
- Sample Executions
- What is QGEN
- QGEN Output
- Building QGEN
- Command-Line Options for QGEN
- Query Template Syntax
- Sample QGEN Executions
- Environment Variables
- Version Numbering
- Validated Platforms
The TPC-H Benchmark is an industry-standard decision support benchmark designed to evaluate the performance of database systems executing complex, ad hoc queries against large volumes of data.
It measures both query processing and data refresh performance, which makes it a widely used yardstick for database systems in enterprise environments.
TPC-H comprises:
- A suite of business-oriented analytical queries
- A set of data population and refresh operations
- Standardized result metrics, including QphH@Size (composite queries per hour at a given scale factor) and Price/QphH@Size, as defined by the Transaction Processing Performance Council (TPC). Source: TPC-H Homepage, Version 3.0, 2024.
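QphH@Size combines two results from a benchmark run. The sketch below follows our reading of the TPC-H specification and should be treated as illustrative. Power@Size is 3600 × SF divided by the geometric mean of the 22 query timings and 2 refresh-function timings from the power test. Throughput@Size is (S × 22 × 3600 / T_s) × SF for S query streams completing in T_s seconds. QphH@Size is the geometric mean of the two. The function names are hypothetical, and the sketch omits the specification's rules for adjusting very short timings, so it cannot stand in for the official calculation.

```python
import math
from typing import Sequence


def power_at_size(sf: float, query_secs: Sequence[float],
                  refresh_secs: Sequence[float]) -> float:
    """Power@Size: 3600 * SF over the geometric mean of 22 query + 2 refresh timings."""
    if len(query_secs) != 22 or len(refresh_secs) != 2:
        raise ValueError("expected 22 query timings and 2 refresh timings")
    timings = list(query_secs) + list(refresh_secs)
    # Geometric mean computed in log space to avoid overflow/underflow.
    geo_mean = math.exp(sum(math.log(t) for t in timings) / len(timings))
    return 3600.0 * sf / geo_mean


def throughput_at_size(sf: float, streams: int, elapsed_secs: float) -> float:
    """Throughput@Size: queries per hour across all streams, scaled by SF."""
    return (streams * 22 * 3600.0 / elapsed_secs) * sf


def qphh_at_size(power: float, throughput: float) -> float:
    """Composite metric: geometric mean of Power@Size and Throughput@Size."""
    return math.sqrt(power * throughput)
```

The geometric mean keeps a single very slow or very fast query from dominating the power number. This is the reason the sketch uses a product-based mean rather than an arithmetic one.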
This repository contains DBGEN and QGEN, the reference tools for dataset generation and query creation in TPC-H version 2.x/3.x.
DBGEN (Database Generator) is the data generation utility used to populate the TPC-H schema.
It is written in ANSI C for platform portability and ensures reproducibility and compliance across systems.
Users may generate data with other tools, but the resulting dataset must match DBGEN output exactly to comply with the TPC-H specification.
By default, DBGEN generates eight ASCII flat files, one for each table in the TPC-H schema (.tbl format, pipe-delimited).
Example: `customer.tbl` contains the data for the CUSTOMER table.

- Default scale factor (`-s`): 1, representing approximately 1 GB of data.
- Output destination: the current working directory.
- File naming convention: `<table>.tbl`
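Because each `.tbl` file is plain pipe-delimited text, it can be inspected without loading it into a database. The Python sketch below assumes that every row ends with a trailing `|` (as DBGEN rows typically do) and that files follow the `<table>.tbl` convention above. `TPCH_TABLES`, `parse_tbl_line`, and `read_tbl` are illustrative names and are not part of DBGEN.

```python
from pathlib import Path
from typing import Iterator, List

# The eight TPC-H tables; DBGEN writes one <table>.tbl file per table.
TPCH_TABLES = ("customer", "lineitem", "nation", "orders",
               "part", "partsupp", "region", "supplier")


def parse_tbl_line(line: str, delimiter: str = "|") -> List[str]:
    """Split one .tbl row into its column values.

    Assumption: rows end with a trailing delimiter, so the empty string
    that split() produces after it is dropped.
    """
    fields = line.rstrip("\r\n").split(delimiter)
    if fields and fields[-1] == "":
        fields.pop()
    return fields


def read_tbl(directory: str, table: str) -> Iterator[List[str]]:
    """Yield parsed rows from <directory>/<table>.tbl."""
    path = Path(directory) / f"{table}.tbl"
    with path.open(encoding="ascii") as fh:
        for line in fh:
            yield parse_tbl_line(line)
```

For example, `for row in read_tbl(".", "region"): print(row)` prints each REGION row as a list of strings. Type conversion is left to the caller.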
For update functionality (-U flag), DBGEN also produces:
- Update files (`u_<table>.tbl.<n>`)
- Delete SQL scripts (`delete.<n>`)
- Edit `makefile.suite` as needed for your environment.
- Run `make`.
- For details on compiler optimizations or cross-platform builds, refer to `Porting.Notes`.
| Option | Argument | Default | Description |
|---|---|---|---|
| `-h` | — | — | Show usage summary |
| `-f` | — | — | Overwrite existing files |
| `-F` | — | yes | Flat file output |
| `-D` | — | — | Direct database load (requires custom loader) |
| `-s` | `<scale>` | 1 | Specifies scale factor (1.0 ≈ 1GB) |
| `-T` | `<table>` | — | Generate data for specified table only |
| `-O` | `<mode>` | — | Modify output behavior (headings, files, etc.) |
| `-r` | `<percentage>` | 10 | Percentage for update file scaling |
| `-v` | — | — | Verbose mode with progress output |
| `-C` | `<children>` | — | Use `<children>` parallel processes |
| `-S` | `<n>` | — | Segment index for multi-part load |
| `-U` | `<updates>` | — | Number of update sets to create |
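The `-C` and `-S` options work together. Every invocation gets the same `-C` value and a different `-S` index, so a small driver can generate the full set of invocations. This sketch assumes that segment indexes run from 1 to the `-C` value (as the sample executions suggest) and that the binary is `./dbgen`. `dbgen_chunk_commands` and `run_all` are hypothetical helpers that use only the options listed in the table.

```python
import subprocess
from typing import List, Optional


def dbgen_chunk_commands(scale: float, chunks: int, table: Optional[str] = None,
                         binary: str = "./dbgen") -> List[List[str]]:
    """Build one dbgen argument list per segment for a -C/-S split load."""
    if chunks < 1:
        raise ValueError("chunks must be >= 1")
    commands = []
    for step in range(1, chunks + 1):  # assumption: -S is 1-based
        args = [binary, "-s", str(scale), "-C", str(chunks), "-S", str(step)]
        if table is not None:
            args += ["-T", table]
        commands.append(args)
    return commands


def run_all(commands: List[List[str]]) -> None:
    """Run the commands one after another, stopping at the first failure."""
    for args in commands:
        subprocess.run(args, check=True)
```

`run_all` runs segments sequentially for simplicity. Because segments are independent, the same argument lists could also be launched concurrently, for example with `subprocess.Popen`.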
The TPC-H specification only recognizes compliant runs at certain scale factors and refresh percentages.
Compliant DBGEN configurations:
- Scale factors: 1, 10, 30, 100, 300, 1000, 3000, 10000, 30000, 100000
- Refresh percentage: 10 (`-r 10`)
Using non-standard values will produce non-compliant datasets.
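A pre-flight check can catch a non-standard configuration before a long generation run starts. In this sketch, the scale-factor set mirrors the fixed list in the TPC-H specification, the refresh value is the one passed to `-r`, and `is_compliant`/`check_config` are hypothetical helper names.

```python
# Fixed scale factors from the TPC-H specification.
COMPLIANT_SCALE_FACTORS = frozenset(
    {1, 10, 30, 100, 300, 1000, 3000, 10000, 30000, 100000})
COMPLIANT_REFRESH = 10  # value passed to dbgen as -r


def is_compliant(scale: float, refresh: int = COMPLIANT_REFRESH) -> bool:
    """Return True if a scale/refresh combination is on the compliant list."""
    return scale in COMPLIANT_SCALE_FACTORS and refresh == COMPLIANT_REFRESH


def check_config(scale: float, refresh: int = COMPLIANT_REFRESH) -> None:
    """Raise before generating data that could not be used for a compliant run."""
    if not is_compliant(scale, refresh):
        raise ValueError(f"non-compliant configuration: -s {scale} -r {refresh}")
```

Float scales such as `100.0` compare equal to the integer entries, so values parsed from a command line still match.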
```shell
# Generate the default 1GB dataset
dbgen -s 1

# Generate only the lineitem table at scale factor 10, overwriting existing files
dbgen -s 10 -f -T L

# Generate the part data (-T p) of a 100GB dataset in 100 chunks; chunks 1 and 2 shown
dbgen -s 100 -S 1 -C 100 -T p -v
dbgen -s 100 -S 2 -C 100 -T p -v

# Generate 4 update sets for the throughput test
dbgen -s 100 -U 4 -C 8
```

QGEN (Query Generator) transforms the TPC-H query templates into executable SQL queries.
The templates contain placeholders (parameter tags) that are replaced by QGEN with valid values based on scale, stream, and random seed configurations.
Generated query files contain substituted parameters and can include:
- Database setup statements
- Output instructions
- Query plan generation directives
QGEN reads input templates stored in `$DSS_QUERY/<query>.sql` and outputs standard SQL files ready for execution.
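The reproducibility that QGEN gets from its random seed can be illustrated with a seeded pseudo-random generator: the same seed and stream always produce the same substitution values. This is a hypothetical sketch, not QGEN's algorithm. The way seed and stream are combined, the use of Python's `random` module, and the `pick_parameters` name are all assumptions. The five CUSTOMER market-segment values serve only as sample candidates for a single parameter tag.

```python
import random
from typing import Dict, Sequence

# Sample candidate values for one parameter tag (illustrative only).
MARKET_SEGMENTS = ("AUTOMOBILE", "BUILDING", "FURNITURE", "HOUSEHOLD", "MACHINERY")


def pick_parameters(seed: int, stream: int,
                    candidates: Sequence[str] = MARKET_SEGMENTS) -> Dict[int, str]:
    """Choose a value for parameter tag :1 deterministically from seed and stream."""
    # Assumption: fold seed and stream into one RNG seed; QGEN's scheme may differ.
    rng = random.Random(seed * 1000 + stream)
    return {1: rng.choice(candidates)}
```

Because the generator is seeded with an integer, repeated runs with the same seed and stream yield identical parameters. Changing the stream gives each query stream its own, still reproducible, values.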
QGEN is built with the same makefile as DBGEN: run `make`, and consult `Porting.Notes` for environment-specific adjustments.
| Option | Argument | Default | Description |
|---|---|---|---|
| `-h` | — | — | Display help |
| `-c` | — | — | Retain comments in output |
| `-d` | — | — | Use default parameter substitution |
| `-i` | `<file>` | — | Initialize query from file |
| `-l` | `<file>` | — | Save query parameters |
| `-n` | `<db>` | — | Specify database name |
| `-p` | `<stream>` | — | Use specified query stream |
| `-r` | `<seed>` | — | Set random seed |
| `-s` | `<scale>` | 1 | Specify scale factor |
| `-o` | `<path>` | — | Output query files to path |
| `-x` | — | — | Include query explain plan output |
| `-v` | — | — | Verbose generation messages |
QGEN processes templates line-by-line, replacing special tokens:
| Token | Replacement | Description |
|---|---|---|
| `:c` | `database <dbname>;` | Database connection command |
| `:q` | Query number | Identifies current query |
| `:s` | Stream number | Identifies query stream |
| `:n` | Row count | Affects number of returned rows |
| `:b` | `BEGIN WORK;` | Marks start of transaction |
| `:e` | `COMMIT WORK;` | Marks end of transaction |
| `:<int>` | Parameter value | Query-specific substitutions |
| `:x` | `set explain on;` | Generates query plan output |
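The token table maps onto a line-by-line substitution pass. The sketch below is a simplified, hypothetical model of that pass. `expand_template` and its keyword arguments are illustrative names, the regular expression is an assumption about how tokens are delimited, and `:x` is emitted only when an `explain` flag is set, loosely mirroring QGEN's `-x` option.

```python
import re
from typing import Mapping

# Matches :c, :q, :s, :n, :b, :e, :x, or :<int>, ending at a word boundary
# so that text such as ":count" is left alone.
TOKEN = re.compile(r":(c|q|s|n|b|e|x|\d+)\b")


def expand_template(template: str, *, query: int, stream: int, dbname: str,
                    rows: int, params: Mapping[int, str],
                    explain: bool = False) -> str:
    """Replace QGEN-style tokens in each template line (simplified model)."""
    fixed = {
        "c": f"database {dbname};",
        "q": str(query),
        "s": str(stream),
        "n": str(rows),
        "b": "BEGIN WORK;",
        "e": "COMMIT WORK;",
        "x": "set explain on;" if explain else "",
    }

    def replace(m):
        token = m.group(1)
        if token in fixed:
            return fixed[token]
        return str(params[int(token)])  # :<int> parameter substitution

    return "\n".join(TOKEN.sub(replace, line) for line in template.splitlines())
```

For example, expanding `":b\nselect * from customer where c_mktsegment = ':1';\n:e"` with `params={1: "BUILDING"}` wraps the query in `BEGIN WORK;` / `COMMIT WORK;` and fills in the segment value.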
```shell
# Generate query 1 for the default database
qgen 1

# Generate query 1 for database "dss1" with default parameter substitutions, keeping comments
qgen -d -c -n dss1 1

# Same, with explain plan output, writing query files to ./queries
qgen -d -c -n dss1 -x -o ./queries 1
```

| Variable | Default | Description |
|---|---|---|
| `DSS_PATH` | `.` | Directory for generated flat files |
| `DSS_CONFIG` | `.` | Path to configuration files |
| `DSS_DIST` | `dists.dss` | Distribution definitions |
| `DSS_QUERY` | `.` | Directory for query templates |
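A wrapper script that drives DBGEN and QGEN can resolve these variables with the documented defaults before invoking either tool. This Python sketch is an assumption about how such a wrapper might do the lookup. It does not reproduce DBGEN's own code, and `dss_settings` and `template_path` are hypothetical names.

```python
import os
from typing import Dict, Mapping, Optional

# Defaults from the environment-variable table.
DSS_DEFAULTS = {
    "DSS_PATH": ".",          # generated flat files
    "DSS_CONFIG": ".",        # configuration files
    "DSS_DIST": "dists.dss",  # distribution definitions
    "DSS_QUERY": ".",         # query templates
}


def dss_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Resolve each DSS_* variable, falling back to its documented default."""
    source = os.environ if env is None else env
    return {name: source.get(name, default) for name, default in DSS_DEFAULTS.items()}


def template_path(query: int, env: Optional[Mapping[str, str]] = None) -> str:
    """Location of a query template: $DSS_QUERY/<query>.sql."""
    return os.path.join(dss_settings(env)["DSS_QUERY"], f"{query}.sql")
```

With `DSS_QUERY` unset, `template_path(14)` resolves to `./14.sql` on POSIX systems. Passing an explicit mapping makes the lookup easy to test without touching the real environment.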
Each build follows the V.R.P.M numbering pattern:
| Field | Description |
|---|---|
| V | Major version |
| R | Specification release |
| P | Patch level |
| M | Minor change identifier |
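When scripts need to compare builds, the V.R.P.M fields can be parsed into a tuple that sorts in version order. This sketch assumes that a three-part string such as `2.4.0` omits the M field, and it treats the missing field as 0. `BuildVersion` and `parse_version` are hypothetical names.

```python
from typing import NamedTuple


class BuildVersion(NamedTuple):
    major: int      # V
    release: int    # R
    patch: int      # P
    minor: int = 0  # M (assumed 0 when absent)


def parse_version(text: str) -> BuildVersion:
    """Parse 'V.R.P' or 'V.R.P.M' into a comparable tuple."""
    parts = [int(p) for p in text.strip().split(".")]
    if not 3 <= len(parts) <= 4:
        raise ValueError(f"expected V.R.P or V.R.P.M, got {text!r}")
    return BuildVersion(*parts)
```

Because `BuildVersion` is a tuple, ordinary comparison operators order builds field by field, for example `parse_version("2.4.0") < parse_version("2.4.1")`.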
Current Versions:

- DBGEN: 2.4.0
- QGEN: 2.4.0

| Processor | OS | Compiler | Version | Flags |
|---|---|---|---|---|
| POWER5 | AIX 5.3 (64-bit) | IBM XL C | v7 | -q64 |
| IA-64 | HP-UX 64-bit | Intel ICC | — | — |
| x86 | Linux 32-bit | GCC | — | Default |
- Transaction Processing Performance Council. "TPC-H Benchmark Specification (Version 3.0)", 2024.
- Apache Doris. "TPC-H Benchmark Documentation", 2024.
- Springer LNCS. "A PDGF Implementation for TPC-H", 2011.