ishatak/tpch-dbgen

TPC-H Database Generation Tools (DBGEN & QGEN)

Table of Contents

  1. Overview
  2. What is DBGEN?
  3. DBGEN Output
  4. Building DBGEN
  5. Command-Line Options
  6. Limitations and Compliance
  7. Sample Executions
  8. What is QGEN?
  9. QGEN Output
  10. Building QGEN
  11. Command-Line Options for QGEN
  12. Query Template Syntax
  13. Sample QGEN Executions
  14. Environment Variables
  15. Version Numbering
  16. Validated Platforms

Overview

The TPC-H Benchmark is an industry-standard decision support benchmark designed to evaluate the performance of database systems when executing complex, ad-hoc queries on large volumes of data.
It measures both query processing and data refresh performance, making it a critical evaluation tool for database systems in enterprise environments.

TPC-H comprises:

  • A suite of business-oriented analytical queries
  • A set of data population and refresh operations
  • Standardized result metrics, including QphH@Size (the composite queries-per-hour metric at a given database size) and Price/QphH@Size, as defined by the Transaction Processing Performance Council (TPC)

Source: TPC-H Homepage, Version 3.0, 2024.

This repository contains DBGEN and QGEN, the reference tools for dataset generation and query creation in TPC-H version 2.x/3.x.


What is DBGEN?

DBGEN (Database Generator) is the data generation utility used to populate the TPC-H schema.
It is written in ANSI C for platform portability and ensures reproducibility and compliance across systems.

Users may generate data with other tools, but the resulting dataset must match DBGEN output exactly to comply with the TPC-H specification.


DBGEN Output

By default, DBGEN generates eight ASCII flat files, one for each table in the TPC-H schema, in pipe-delimited .tbl format. For example, customer.tbl contains the data for the CUSTOMER table.

  • Default scale factor (-s): 1, representing approximately 1 GB of data.
  • Output destination: current working directory.
  • File naming convention: <table>.tbl
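
Because each table is written as pipe-delimited text, ordinary Unix tools are enough to inspect a generated file. The sketch below counts the fields in each row with awk. It assumes every row ends with a trailing `|`, and the sample row is made up for illustration rather than taken from real DBGEN output; `tbl_field_count` is a hypothetical helper name.

```shell
# Count the fields in each row of a pipe-delimited .tbl stream.
# Assumption: every row ends with a trailing '|', so awk sees one extra
# empty field after the last delimiter; subtract it.
tbl_field_count() {
    awk -F'|' '{ print NF - 1 }'
}

# Illustrative row only (hypothetical values).
printf '%s\n' '0|AFRICA|sample comment|' | tbl_field_count
```

Running `tbl_field_count < customer.tbl | sort -u` on a real file should print a single number if every row has the same shape.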

With the -U flag, DBGEN also produces the following for each update set <n>:

  • Update files (orders.tbl.u<n> and lineitem.tbl.u<n>)
  • Delete files (delete.<n>) listing the keys of the orders to remove

Building DBGEN

  1. Edit makefile.suite to match your environment.
  2. Run:
make
  3. For details on compiler optimizations or cross-platform builds, refer to Porting.Notes.
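
A plain `make` looks for a file named makefile or Makefile, so if the edited file keeps the name makefile.suite it has to be passed explicitly (or copied to makefile first). The wrapper below is a minimal sketch of that idea; `build_dbgen` is a hypothetical helper, and which variables need to be set depends on the contents of makefile.suite in this repository.

```shell
# Build with makefile.suite directly instead of renaming it.
# Any arguments are passed through to make unchanged, for example
# variable overrides given as NAME=value.
build_dbgen() {
    if [ ! -f makefile.suite ]; then
        echo "makefile.suite not found in $(pwd)" >&2
        return 1
    fi
    make -f makefile.suite "$@"
}
```

For example, `build_dbgen CC=gcc` would override the compiler for a single build, assuming the makefile reads a CC variable.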

Command-Line Options

| Option | Argument | Default | Description |
|---|---|---|---|
| -h | | | Show usage summary |
| -f | | | Overwrite existing files |
| -F | | yes | Flat file output |
| -D | | | Direct database load (requires custom loader) |
| -s | <scale> | 1 | Scale factor (1.0 ≈ 1 GB) |
| -T | <table> | | Generate data for the specified table only |
| -O | <mode> | | Modify output behavior (headings, files, etc.) |
| -r | <percentage> | 10 | Size of each update set, in basis points of the data set (10 = 0.1%) |
| -v | | | Verbose mode with progress output |
| -C | <children> | | Use <children> parallel processes |
| -S | <n> | | Segment index for multi-part load |
| -U | <updates> | | Number of update sets to create |

Limitations and Compliance

The TPC-H specification only recognizes compliant runs at certain scale factors and refresh percentages.

Compliant DBGEN configurations:

  • Scale factors: 1, 10, 100, 300, 1000, 3000, 10000, 30000, 100000
  • Refresh percentage: 10 (-r 10), expressed in basis points, i.e. 0.1% of the data set per update set

Using non-standard values will produce non-compliant datasets.
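
A wrapper script can guard against accidentally generating a non-compliant dataset. The following is a minimal sketch that checks a scale factor against the list given in this section; `COMPLIANT_SFS` and `is_compliant_sf` are hypothetical names, and the list is copied from the text above rather than from the specification itself.

```shell
# List copied from the section above; check it against the TPC-H
# specification revision you are targeting before relying on it.
COMPLIANT_SFS="1 10 100 300 1000 3000 10000 30000 100000"

# Return 0 if the given scale factor appears in COMPLIANT_SFS, 1 otherwise.
is_compliant_sf() {
    for candidate in $COMPLIANT_SFS; do
        [ "$candidate" = "$1" ] && return 0
    done
    return 1
}

for sf in 1 50 300; do
    if is_compliant_sf "$sf"; then
        echo "SF $sf: in the compliant list"
    else
        echo "SF $sf: not in the compliant list"
    fi
done
```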


Sample Executions

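# Generate the default 1 GB dataset
dbgen -s 1

# Generate only the lineitem table at scale factor 10, overwriting existing files
dbgen -s 10 -f -T L

# Generate a 100 GB dataset in 100 chunks of about 1 GB each (steps 1 and 2 shown),
# restricted to the tables selected by -T p, with verbose output
dbgen -s 100 -S 1 -C 100 -T p -v
dbgen -s 100 -S 2 -C 100 -T p -v

# Generate 4 update sets for a throughput test at scale factor 100,
# matching a load that used 8 parallel processes
dbgen -s 100 -U 4 -C 8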
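
Typing one command per chunk becomes tedious at 100 chunks. The sketch below prints the full series of commands for a chunked run without executing anything, so the output can be reviewed first. `dbgen_chunk_cmds` is a hypothetical helper; it only combines the -s, -S and -C options in the same way as the examples above.

```shell
# Print (do not run) one dbgen command per chunk.
# Usage: dbgen_chunk_cmds <scale> <chunks> [extra dbgen options...]
dbgen_chunk_cmds() {
    scale=$1
    chunks=$2
    shift 2
    extra="$*"
    step=1
    while [ "$step" -le "$chunks" ]; do
        # Append the extra options only when some were given.
        echo "dbgen -s $scale -S $step -C $chunks${extra:+ $extra}"
        step=$((step + 1))
    done
}

dbgen_chunk_cmds 100 4 -T p -v
```

Once the printed commands look right, they can be piped to `sh` to run them one after another.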

What is QGEN?

QGEN (Query Generator) is used to transform the TPC-H query templates into executable SQL queries.
The templates contain placeholders (parameter tags) that are replaced by QGEN with valid values based on scale, stream, and random seed configurations.


QGEN Output

Generated query files contain substituted parameters and can include:

  • Database setup statements
  • Output instructions
  • Query plan generation directives

QGEN reads input templates stored in $DSS_QUERY/<query>.sql and outputs standard SQL files ready for execution.


Building QGEN

QGEN is built using the same makefile as DBGEN:

make

Consult Porting.Notes for environment-specific adjustments.


Command-Line Options for QGEN

| Option | Argument | Default | Description |
|---|---|---|---|
| -h | | | Display help |
| -c | | | Retain comments in output |
| -d | | | Use default parameter substitution |
| -i | <file> | | Initialize query from file |
| -l | <file> | | Save query parameters |
| -n | <db> | | Specify database name |
| -p | <stream> | | Use specified query stream |
| -r | <seed> | | Set random seed |
| -s | <scale> | 1 | Specify scale factor |
| -o | <path> | | Output query files to path |
| -x | | | Include query explain plan output |
| -v | | | Verbose generation messages |

Query Template Syntax

QGEN processes templates line-by-line, replacing special tokens:

| Token | Replacement | Description |
|---|---|---|
| :c | database <dbname>; | Database connection command |
| :q | Query number | Identifies current query |
| :s | Stream number | Identifies query stream |
| :n | Row count | Affects number of returned rows |
| :b | BEGIN WORK; | Marks start of transaction |
| :e | COMMIT WORK; | Marks end of transaction |
| :<int> | Parameter value | Query-specific substitutions |
| :x | set explain on; | Generates query plan output |
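
To make the line-by-line replacement concrete, the sketch below mimics three of the fixed-text tokens from the table using sed. It is an illustration only and not how QGEN is implemented: it assumes each token sits alone on its line, the template fragment is hypothetical, and `expand_tokens` is a made-up helper name.

```shell
# Illustration only: replace a few template tokens with the fixed text
# listed in the table above. Tokens are assumed to occupy a whole line.
expand_tokens() {
    sed -e 's/^:b$/BEGIN WORK;/' \
        -e 's/^:e$/COMMIT WORK;/' \
        -e 's/^:x$/set explain on;/'
}

# Hypothetical template fragment.
printf '%s\n' ':x' ':b' 'select count(*) from lineitem;' ':e' | expand_tokens
```

Lines that do not match a token, such as the SQL statement itself, pass through unchanged.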

Sample QGEN Executions

# Generate query 1 for the default database
qgen 1

# Generate query 1 for database "dss1" using default parameter substitutions
qgen -d -n dss1 1

# As above, with explain plan output, written to ./queries
qgen -d -n dss1 -x -o ./queries 1
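
TPC-H defines 22 queries, so a full query set is usually produced in a loop. The sketch below prints one command per query instead of running them. It assumes that qgen writes the generated query to standard output; the q<N>.sql file names and the `qgen_all_cmds` helper are arbitrary choices made for this example.

```shell
# Print (do not run) one qgen command per TPC-H query, 1 through 22.
# Assumption: qgen writes the query text to standard output, so each
# command redirects it into its own file under the output directory.
qgen_all_cmds() {
    outdir=${1:-./queries}
    q=1
    while [ "$q" -le 22 ]; do
        echo "qgen -d $q > $outdir/q$q.sql"
        q=$((q + 1))
    done
}

qgen_all_cmds ./queries
```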

Environment Variables

| Variable | Default | Description |
|---|---|---|
| DSS_PATH | . | Directory for generated flat files |
| DSS_CONFIG | . | Path to configuration files |
| DSS_DIST | dists.dss | Distribution definitions |
| DSS_QUERY | . | Directory for query templates |
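
A wrapper script may want to know the effective value of each variable before calling the tools. The sketch below applies the defaults from the table whenever a variable is unset or empty. Treating DSS_DIST as a file name inside DSS_CONFIG is an assumption made for the printed summary.

```shell
# Apply the defaults from the table above when a variable is unset or empty.
: "${DSS_PATH:=.}"
: "${DSS_CONFIG:=.}"
: "${DSS_DIST:=dists.dss}"
: "${DSS_QUERY:=.}"
export DSS_PATH DSS_CONFIG DSS_DIST DSS_QUERY

echo "flat files:      $DSS_PATH"
# Assumption: the distribution file is looked up inside DSS_CONFIG.
echo "distributions:   $DSS_CONFIG/$DSS_DIST"
echo "query templates: $DSS_QUERY"
```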

Version Numbering

Each build is numbered using the pattern V.R.P.M:

| Field | Description |
|---|---|
| V | Major version |
| R | Specification release |
| P | Patch level |
| M | Minor change identifier |

Current Versions:

DBGEN: 2.4.0
QGEN: 2.4.0
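
A script that needs to compare versions can split a version string into its fields. The sketch below does this with the shell's own word splitting; `parse_version` is a hypothetical helper. The version strings listed above have only three components, so the M field comes out empty for them.

```shell
# Split a V.R.P.M version string into its fields.
# Missing trailing fields are reported as empty.
parse_version() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the argument on dots
    IFS=$old_ifs
    echo "V=${1:-} R=${2:-} P=${3:-} M=${4:-}"
}

parse_version 2.4.0
```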

Validated Platforms

| Processor | OS | Compiler Version | Flags |
|---|---|---|---|
| POWER5 | AIX 5.3 (64-bit) | IBM XL C v7 | -q64 |
| IA-64 | HP-UX 64-bit | Intel ICC | |
| x86 | Linux 32-bit | GCC | Default |
