141 changes: 128 additions & 13 deletions scripts/README.md
## Scripts for Benchmark Analysis

This directory contains scripts for processing and visualizing benchmark results.

### Prerequisites

- The ability to build and run the C++ benchmark
- Python 3.6+
- Required Python packages: `pandas`, `numpy`, `matplotlib`, `seaborn`

You can install the required packages with:

```bash
pip install pandas numpy matplotlib seaborn
```

### Creating LaTeX Tables

#### Basic Table Generation

Run your benchmark and convert the output to a LaTeX table:

```bash
# Run benchmark
cmake -B build .
cmake --build build
./build/benchmarks/benchmark -f data/canada.txt > myresults.txt

# Convert to LaTeX table
./scripts/latex_table.py myresults.txt
```

This will print a LaTeX table to stdout with numbers rounded to two significant digits.
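
For reference, each generated table is roughly this shape (an illustrative sketch with made-up algorithm names and numbers; the column layout matches what the combining and plotting scripts below expect, with ns/f, ins/f, and ins/c presumably denoting nanoseconds per float, instructions per float, and instructions per cycle):

```latex
\begin{table}
  \centering
  \begin{tabular}{lccc}
    \toprule
    Name & {ns/f} & {ins/f} & {ins/c} \\ \midrule
    some\_algorithm & 12 & 34 & 5.6 \\
    another\_algorithm & 7.8 & 90 & 1.2 \\
    \bottomrule
  \end{tabular}
\end{table}
```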

#### Automated Multiple Table Generation

Instead of manually running benchmarks and generating tables, you can use the `generate_multiple_tables.py` script to automate the entire process:

```bash
# Basic usage with g++ compiler
./scripts/generate_multiple_tables.py g++
```

This script:

- Automatically compiles the benchmark code with the specified compiler
- Runs multiple benchmarks with different configurations
- Generates LaTeX tables for each benchmark result
- Saves all tables to the output directory

Options:

- First argument: Compiler to use (g++, clang++)
- `--build-dir`: Build directory (default: build)
- `--output-dir`: Output directory for tables (default: ./outputs)
- `--clean`: Clean build directory before compilation
- `--march`: Architecture target for -march flag (default: native)

The script also has several configurable variables at the top of the file:

- Benchmark datasets (canada, mesh, uniform_01)
- Algorithm filters
- Number of runs
- Volume size

This is the recommended approach for generating comprehensive benchmark results.
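
For example, a fully specified invocation using the options above might look like this (the build directory name is illustrative):

```bash
# Clean rebuild with clang++, targeting the x86-64-v3 baseline
./scripts/generate_multiple_tables.py clang++ --clean --march x86-64-v3 \
    --build-dir build-clang --output-dir ./outputs
```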

### Combining Tables

The `concat_tables.py` script combines separate benchmark tables (mesh, canada, uniform_01) into comprehensive tables:

```bash
# Basic usage, using tables in ./outputs
./scripts/concat_tables.py
```

The combined tables are written as `.tex` files to the output directory.

Options:

- `--input-dir`, `-i`: Directory containing benchmark .tex files (default: ./outputs)
- `--output-dir`, `-o`: Output directory for combined tables (default: same as input)
- `--exclude`, `-e`: Algorithms to exclude from the output tables
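
For example, to combine the tables from `./outputs` into a separate directory while excluding two algorithms (the output directory is illustrative; the algorithm names appear in the script's default exclude list):

```bash
./scripts/concat_tables.py -i ./outputs -o ./combined -e netlib snprintf
```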

### Generating Visualization Figures

The `generate_figures.py` script creates heatmaps and relative performance plots:

```bash
# Generate figures for nanoseconds per float metric
./scripts/generate_figures.py nsf ./outputs
```

Options:

- First argument: Metric to visualize (`nsf`, `insf`, or `insc`)
- Second argument: Directory containing benchmark result .tex files
- `--output-dir`, `-o`: Directory to save generated figures (default: same as input directory)
- `--exclude`, `-e`: Algorithms to exclude from visualization
- `--cpus`, `-c`: CPUs to include in relative performance plots
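
For example, to plot the instructions-per-float metric, excluding one algorithm and restricting the relative performance plots to a single CPU (the CPU name is illustrative):

```bash
./scripts/generate_figures.py insf ./outputs -o ./figures -e snprintf -c Ryzen9900x
```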

### Extracting Summary Metrics

The `get_summary_metrics.py` script analyzes raw benchmark files to extract performance metrics:

```bash
# Analyze all CPUs
./scripts/get_summary_metrics.py
```

Options:

- `--cpu`: CPU folder name to restrict analysis
- `--input-dir`, `-i`: Directory containing benchmark .raw files (default: ./outputs)
- `--outlier-threshold`, `-t`: Threshold for reporting outliers (default: 5.0%)
- `--dedicated-cpus`, `-d`: CPU folder names considered dedicated (non-cloud)
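
For example, to restrict the analysis to a single CPU folder with a stricter outlier threshold (the folder name is illustrative):

```bash
./scripts/get_summary_metrics.py --cpu Ryzen9900x -t 2.0
```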

### Running Tests on Amazon AWS

It is possible to run the benchmarks on Amazon AWS:

```bash
./scripts/aws_tests.bash
```

This script creates new EC2 instances, runs the
`./scripts/generate_multiple_tables.py` script for both g++ and clang++ builds,
saves each output to a separate folder, and then terminates the instances.

Prerequisites and some user-configurable variables are in the script itself.

### Workflow Example

A typical complete workflow might look like:

1. **Generate benchmark results and tables automatically**:
   ```bash
   # For g++ compiler (compiles and runs benchmarks)
   ./scripts/generate_multiple_tables.py g++ --clean

   # For clang++ compiler (compiles and runs benchmarks)
   ./scripts/generate_multiple_tables.py clang++ --clean
   ```
2. **Combine tables for better comparison**:
   ```bash
   ./scripts/concat_tables.py
   ```
3. **Generate visualization figures**:
   ```bash
   ./scripts/generate_figures.py nsf ./outputs
   ```
4. **Extract summary metrics**:
   ```bash
   ./scripts/get_summary_metrics.py
   ```

This automated workflow handles the entire process from compilation to visualization with minimal manual intervention.
177 changes: 177 additions & 0 deletions scripts/concat_tables.py
#!/usr/bin/env python3
"""
Concatenate multiple benchmark result tables into a single comprehensive table.

This script finds and combines related benchmark results from different datasets
(mesh, canada, uniform_01) into a single LaTeX table for easier comparison.
"""
import os
import re
import argparse
import pandas as pd


def parse_tex_table(filepath):
    """Parse a LaTeX table file into a pandas DataFrame."""
    with open(filepath, 'r') as file:
        lines = file.readlines()
    data_start = False
    parsed = []
    for line in lines:
        if "\\midrule" in line:
            data_start = True
            continue
        if "\\bottomrule" in line:
            break
        if data_start and '&' in line:
            row = [x.strip().strip('\\') for x in line.split('&')]
            if len(row) == 4:
                parsed.append({
                    'algorithm': row[0],
                    'ns/f': row[1],
                    'ins/f': row[2],
                    'ins/c': row[3]
                })
    return pd.DataFrame(parsed)


def clean_cpu_name(cpu_name):
"""Clean CPU name for better display in tables."""
    cpu_cleaned = cpu_name.replace("Ryzen9900x", "Ryzen 9900X")
    cpu_cleaned = cpu_cleaned.replace("_Platinum", "")
    cpu_cleaned = re.sub(r"_\d+-Core_Processor", "", cpu_cleaned)
    cpu_cleaned = re.sub(r"_CPU__\d+\.\d+GHz", "", cpu_cleaned)
    cpu_cleaned = re.sub(r"\(R\)", "", cpu_cleaned)
    # Turn underscores into spaces and collapse any resulting double spaces
    return cpu_cleaned.replace("_", " ").replace("  ", " ").strip()


def format_latex_table(df, cpu_name, compiler, float_bits, microarch=None,
                       exclude_algos=None):
    """Format the combined data as a LaTeX table."""
    if exclude_algos is None:
        exclude_algos = set()

    cpu_cleaned = clean_cpu_name(cpu_name)
    caption = f"{cpu_cleaned} results ({compiler}, {float_bits}-bit floats"
    if microarch:
        caption += f", {microarch}"
    caption += ")"
    label = f"tab:{re.sub(r'[^a-zA-Z0-9]+', '', cpu_name.lower())}results"
    header = (
        "\\begin{table}\n"
        " \\centering\n"
        f" \\caption{{{caption}}}%\n"
        f" \\label{{{label}}}\n"
        " \\begin{tabular}{lccccccccc}\n"
        " \\toprule\n"
        " \\multirow{1}{*}{Name} & \\multicolumn{3}{c|}{mesh} & "
        "\\multicolumn{3}{c|}{canada} & \\multicolumn{3}{c}{unit} \\\\\n"
        " & {ns/f} & {ins/f} & {ins/c} & "
        "{ns/f} & {ins/f} & {ins/c} & {ns/f} & {ins/f} & {ins/c} \\\\ "
        "\\midrule\n"
    )
    body = ""
    for _, row in df.iterrows():
        if row['algorithm'] in exclude_algos:
            continue
        line = (
            f" {row['algorithm']} & {row['ns/f_mesh']} & "
            f"{row['ins/f_mesh']} & {row['ins/c_mesh']} & "
            f"{row['ns/f_canada']} & {row['ins/f_canada']} & "
            f"{row['ins/c_canada']} & "
            f"{row['ns/f_unit']} & {row['ins/f_unit']} & "
            f"{row['ins/c_unit']} \\\\\n"
        )
        body += line
    footer = (
        " \\bottomrule\n"
        " \\end{tabular}\\restartrowcolors\n"
        "\\end{table}\n"
    )
    return header + body + footer


def find_combinations(root, pattern=None):
    """Find all combinations of benchmark result files that can be combined."""
    if pattern is None:
        pattern = re.compile(
            r"(.*?)_(g\+\+|clang\+\+)_(mesh|canada|uniform_01)_(none|s)"
            r"(?:_(x86-64|x86-64-v2|x86-64-v3|x86-64-v4|native))?\.tex"
        )
    # group(1)=cpu, 2=compiler, 3=dataset, 4=variant, 5=microarch (optional)
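    # Illustrative match: "Ryzen9900x_g++_canada_none_native.tex"
    # -> cpu="Ryzen9900x", compiler="g++", dataset="canada",
    #    variant="none", microarch="native"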

    combos = []
    for dirpath, _, filenames in os.walk(root):
        tex_files = [f for f in filenames if f.endswith('.tex')]
        table = {}
        for f in tex_files:
            m = pattern.match(f)
            if m:
                cpu, compiler, dataset, variant, microarch = m.groups()
                key = (dirpath, cpu, compiler, variant, microarch)
                if key not in table:
                    table[key] = {}
                table[key][dataset] = os.path.join(dirpath, f)
        for (dirpath, cpu, compiler, variant, microarch), files in table.items():
            if {"mesh", "canada", "uniform_01"}.issubset(files.keys()):
                combos.append((dirpath, cpu, compiler, variant, microarch, files))
    return combos


def main():
    parser = argparse.ArgumentParser(
        description="Concatenate benchmark tables into comprehensive tables")
    parser.add_argument(
        "--input-dir", "-i", default="./outputs",
        help="Directory containing benchmark .tex files")
    parser.add_argument(
        "--output-dir", "-o",
        help="Output directory for combined tables (defaults to input directory)")
    parser.add_argument(
        "--exclude", "-e", nargs="+",
        default=["netlib", "teju\\_jagua", "yy\\_double", "snprintf", "abseil"],
        help="Algorithms to exclude from the output tables")
    args = parser.parse_args()

    input_dir = args.input_dir
    output_dir = args.output_dir if args.output_dir else input_dir
    exclude_algos = set(args.exclude)

    # Create output directory if it doesn't exist
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    combos = find_combinations(input_dir)
    if not combos:
        print(f"No matching benchmark files found in {input_dir}")
        return

    print(f"Found {len(combos)} combinations to process")

    for dirpath, cpu, compiler, variant, microarch, paths in combos:
        df_mesh = parse_tex_table(paths['mesh'])
        df_canada = parse_tex_table(paths['canada'])
        df_unit = parse_tex_table(paths['uniform_01'])
        df_merged = df_mesh.merge(
            df_canada, on='algorithm', suffixes=('_mesh', '_canada'))
        df_merged = df_merged.merge(df_unit, on='algorithm')
        df_merged.rename(columns={
            'ns/f': 'ns/f_unit',
            'ins/f': 'ins/f_unit',
            'ins/c': 'ins/c_unit'
        }, inplace=True)

        float_bits = "32" if variant == "s" else "64"
        tex_code = format_latex_table(
            df_merged, cpu, compiler, float_bits, microarch, exclude_algos)

        suffix = f"_{microarch}" if microarch else ""
        out_path = os.path.join(
            output_dir, f"{cpu}_{compiler}_all_{variant}{suffix}.tex")
        with open(out_path, "w") as f:
            f.write(tex_code)
        print(f"[OK] {out_path}")


if __name__ == "__main__":
    main()