Symbolic Tensor Graph (STG) Generator

Author: Changhai Man @ SynergyLab
Contact: cman8@gatech.edu

Overview

The Symbolic Tensor Graph is a generator for Chakra Execution Trace (ET) files. This tool is designed to generate synthetic workload traces for use in parallel strategy exploration without gathering data from a real system or implementing actual workload codes. It supports various parallelization strategies like Data Parallelism (DP), Tensor Parallelism (TP), Pipeline Parallelism (PP) and Sequence Parallelism (SP).

Key Features

Generate synthetic transformer workloads in Chakra ET format.
Supports multiple parallelism strategies (DP, TP, PP, SP).
Support customized model dimensions for Transformer Models (batch, seq, dmodel, dff, n_head)

Installation

To set up the environment and install the required dependencies, follow these steps:

# Clone the repository
git clone git@github.com:astra-sim/symbolic_tensor_graph.git

# Navigate to the project directory
cd symbolic_tensor_graph

# Install dependencies via conda
conda create -n <env_name>
conda activate <env_name>
conda install numpy sympy python-graphviz protobuf pandas -c conda-forge

Usage

To generate symbolic workloads, use the following command:

python main.py –h

This will show all available options and their descriptions. Example of running the generator:

python main.py --output_dir generated/ \
               --output_name workload.%d.et \
               --comm_group_file comm_group.json \
               --dp 2 --tp 2 --pp 2 \
               --weight_sharded 0

Example Output:

$ ls generated/
comm_group.json  workload.0.et  workload.1.et  workload.2.et  workload.3.et

Parameters

| Argument               | Type    | Required | Default    | Description                                                                 |
|------------------------|---------|----------|------------|-----------------------------------------------------------------------------|
| --output_dir           | str     | Yes      | -          | Directory to store output traces.                                           |
| --output_name          | str     | Yes      | -          | Name of the output traces.                                                  |
| --dp                   | int     | No       | 1          | Data parallel degree.                                                       |
| --tp                   | int     | No       | 1          | Tensor parallel degree.                                                     |
| --sp                   | int     | No       | 1          | Sequence parallel degree.                                                   |
| --ep                   | int     | No       | 1          | Expert parallel degree.                                                     |
| --pp                   | int     | No       | 1          | Pipeline parallel degree.                                                   |
| --weight_sharded       | bool    | No       | False      | Whether weights are sharded.                                                |
| --activation_recompute | bool    | No       | False      | Whether to recompute activations.                                           |
| --tpsp                 | bool    | No       | True       | Use tensor parallel + sequence parallel or tensor parallel only.            |
| --dvocal               | int     | No       | 32000      | Vocabulary size.                                                            |
| --dmodel               | int     | No       | 8192       | Model dimension.                                                            |
| --dff                  | int     | No       | 28672      | Feed-forward dimension.                                                     |
| --batch                | int     | No       | 64         | Batch size.                                                                 |
| --micro_batch          | int     | No       | -1         | Micro-batch size. Default is -1 (same as batch size).                       |
| --seq                  | int     | No       | 1024       | Sequence length.                                                            |
| --head                 | int     | No       | 64         | Number of attention heads.                                                  |
| --kvhead               | int     | No       | 8          | Number of key-value heads.                                                  |
| --num_stacks           | int     | No       | 80         | Number of transformer layers.                                               |
| --experts              | int     | No       | 8          | Number of experts in MoE.                                                   |
| --kexperts             | int     | No       | 2          | Number of selected experts per token.                                       |
| --chakra_schema_version| str     | No       | "v0.0.4"   | Chakra schema version.                                                      |
| --model_type           | str     | No       | "llama"    | Type of model to assemble ("llama", "gpt", "moe", or "debug").              |
| --mixed_precision      | bool    | No       | False      | Whether to use mixed precision.                                             |
| --print_gpu_vram       | bool    | No       | False      | Whether to print per-GPU VRAM footprint.                                    |

*: We do not specify number of total NPUs, which will be infered from the parallel degree as: num_NPUs=DP*TP*PP*SP

Example Commands

Generate with DP=8, TP=4, PP=4, no FSDP:

python main.py --output_dir generated/ --output_name workload_1.%d.et --comm_group_file comm_group_1.json --dp 8 --tp 4 --pp 4 --sp 1 --weight_sharded 0 --chakra_schema_version v0.0.4

Generate with DP=64, TP=1, PP=1, FSDP:

python main.py --output_dir generated/ --output_name workload_2.%d.et --comm_group_file comm_group_2.json --dp 64 --tp 1 --pp 1 --sp 1 --weight_sharded 1 --chakra_schema_version v0.0.4

Generate with DP=4, TP=4, PP=2, SP=2, FSDP, output in JSON format:

python main.py --output_dir generated/ --output_name workload_3.%d.json --comm_group_file comm_group_3.json --dp 4 --tp 4 --pp 2 --sp 2 --weight_sharded 1 --chakra_schema_version json

Tool workflow

Here is a breif workflow about how stg generate traces step by step.

Chakra Schema Version

The schema version used determines compatibility with different tools and repositories:

v0.0.4: Current latest chakra version (by Oct.6 2024).
v0.0.1: Supported for lagacy, not fully tested.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 146 Commits
.vscode		.vscode
docs		docs
eg_simulator		eg_simulator
experiment		experiment
models		models
sharding_spreadsheets		sharding_spreadsheets
symbolic_tensor_graph		symbolic_tensor_graph
test_cases		test_cases
.gitignore		.gitignore
CODEOWNERS		CODEOWNERS
LICENSE		LICENSE
README.md		README.md
environment.yml		environment.yml
main.py		main.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Symbolic Tensor Graph (STG) Generator

Overview

Key Features

Installation

Usage

Example Output:

Parameters

Example Commands

Tool workflow

Chakra Schema Version

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Symbolic Tensor Graph (STG) Generator

Overview

Key Features

Installation

Usage

Example Output:

Parameters

Example Commands

Tool workflow

Chakra Schema Version

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages