This repository contains the Voyager DNN accelerator generator, a tool for generating and evaluating deep neural network accelerators.
For more details about the methodology and architecture, please refer to the paper: https://arxiv.org/abs/2509.15205
```
git clone https://github.com/StanfordAccelerate/voyager.git
cd voyager
git submodule update --init --recursive
```

Voyager does not require commercial tools for accuracy testing, SystemC simulations, or design space exploration.
However, Voyager requires the following EDA tools for RTL generation and simulation:
- Catapult HLS (tested with version 2024.2_1) - Used for high-level synthesis
- VCS (tested with T-2022.06-SP2) - Verilog simulator for RTL verification
- Verdi (tested with T-2022.06-SP2) - Debug tool for waveform analysis
- VCS_GNU_PACKAGE (tested with S-2021.09) - Required compiler toolchain for VCS
Note: You must load these tools every time you want to use this project. The method for activating these tools depends on your EDA environment setup (e.g., module system, environment scripts, etc.).
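As an illustrative sketch only, on a site that uses environment modules the tool loading might look like the following. The module names and version strings below are placeholders, not part of this project; substitute whatever your EDA environment provides.

```shell
# Hypothetical tool-loading snippet for a site using environment modules.
# Module names/versions are placeholders; use your site's actual names.
module load catapult/2024.2_1
module load vcs/T-2022.06-SP2
module load verdi/T-2022.06-SP2
module load vcs_gnu_package/S-2021.09
```

Sites without a module system typically source vendor-provided setup scripts instead.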
Using Conda for setup is preferred due to its portability. Install Conda via Miniforge, which is recommended because it uses conda-forge as the default channel, providing better compatibility with the required packages.
Create a Conda environment from the environment.yml file:
```
conda env create -f environment.yml
```

Activate the environment:

```
conda activate accelerator-env
```

Note: You must activate this environment every time you want to use this project.
After activating the environment, install the required submodule packages:
```
pip install ./interstellar
pip install ./voyager-compiler
```

Additionally, add the project root path to the conda environment's stored variables, then reactivate the environment:

```
conda env config vars set PROJECT_ROOT=$(pwd)
conda deactivate
conda activate accelerator-env
```

In the future, when activating accelerator-env, you may see a warning about overwriting PROJECT_ROOT and CODEGEN_DIR. This is expected.
If conda is not being used, then please install the packages listed in the environment.yml file manually.
This includes installing interstellar and voyager-compiler from their corresponding subdirectories.
Please also source env.sh in the top directory every time you want to use the project.
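For reference, a non-conda session might be initialized as follows. This is a sketch that assumes the packages listed in environment.yml are already installed in your Python environment (e.g., a virtualenv); only the paths and the env.sh script come from this repository.

```shell
# Manual setup sketch (no conda): install the bundled submodule packages
# from the repository root, then source the environment script.
# Repeat the `source` step in every new shell session.
pip install ./interstellar
pip install ./voyager-compiler
source env.sh
```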
To test on established AI workloads, users first need to set up an account on HuggingFace.
To run the tests used in the examples, users must also request and be granted access to the following assets on HuggingFace:
Finally, to use these assets in the code, create a HuggingFace access token with read permissions.
Use the access token to authenticate using the HuggingFace CLI (installed as part of the environment):
```
hf auth login
```

The repository is organized as follows:

- /data: Test datasets and input data used for simulation and accuracy evaluation
- /lib: External libraries and dependencies required by the project
- /models: Pre-trained DNN model definitions and weights used for verification and evaluation
- /scripts: Technology-specific `.tcl` scripts for RTL generation and synthesis configuration
- /src: SystemC implementation of the accelerator architecture
- /test: Testing infrastructure, including SystemC/C++ testbenches that invoke the accelerator
- Makefile: Build configuration for compiling the SystemC source code
- run_regression.py: Main regression testing script that orchestrates accuracy evaluation, functional simulations, and RTL generation
Accelerator configurations are specified through environment variables. The following variables control the accelerator architecture:
- DATATYPE: The datatype/quantization scheme used for computations. Some of the supported values include BF16, E4M3, INT8, and MXINT8
- IC_DIMENSION: The systolic array dimension size for the input channel (reduction) dimension
- OC_DIMENSION: The systolic array dimension size for the output channel dimension
- INPUT_BUFFER_SIZE: The depth (number of entries) of the input activation buffer
- WEIGHT_BUFFER_SIZE: The depth (number of entries) of the weight buffer
- ACCUM_BUFFER_SIZE: The depth (number of entries) of the accumulation buffer
Additional variables for RTL generation:
- CLOCK_PERIOD: Target clock period in nanoseconds for RTL synthesis
- TECHNOLOGY: Technology name (must match a `.tcl` file in `scripts/tech/`)
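Since all of these are ordinary environment variables, a configuration can be sketched as a short shell snippet. The variable names below come from this README; the specific values (BF16, 8x8, the buffer depths, the 2.0 ns clock) are placeholders chosen for illustration, not recommended settings.

```shell
# Hypothetical configuration: a BF16 8x8 array targeting a 2.0 ns clock.
# Variable names are from this README; values are placeholders.
export DATATYPE=BF16            # quantization scheme
export IC_DIMENSION=8           # input-channel (reduction) dimension
export OC_DIMENSION=8           # output-channel dimension
export INPUT_BUFFER_SIZE=2048   # input activation buffer depth
export WEIGHT_BUFFER_SIZE=2048  # weight buffer depth
export ACCUM_BUFFER_SIZE=1024   # accumulation buffer depth
export CLOCK_PERIOD=2.0         # target clock period in ns (RTL only)
export TECHNOLOGY=generic       # must match scripts/tech/generic.tcl
echo "Configured ${DATATYPE} ${IC_DIMENSION}x${OC_DIMENSION} accelerator"
```

The variables can also be set inline on the command line for a single run, as the examples below do.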
The run_regression.py script is the main entry point for running various types of evaluations and simulations, including:
- Accuracy evaluation on full models and datasets
- Functional SystemC simulations for verification
- Cycle-accurate RTL simulations after HLS synthesis
For detailed information about available options and underlying steps, refer to the script's help message: python run_regression.py --help. Upon completion, results and log files are generated in the regression_results/latest/ directory.
Accuracy evaluation runs inference on full models and datasets to measure the impact of quantization and accelerator configuration on model accuracy. This is useful for validating that the accelerator maintains acceptable accuracy compared to floating-point baselines.
Supported models and datasets:
- BERT on SST-2 (sentiment classification)
- MobileBERT on SST-2 (sentiment classification)
- MobileBERT on SQuAD (question answering)
- ResNet18 on ImageNet (image classification)
- ResNet50 on ImageNet (image classification)
- ViT (Vision Transformer) on ImageNet (image classification)
- MobileNetV2 on ImageNet (image classification)
Example: To evaluate the accuracy of an INT8 16×16 systolic array accelerator on the ResNet-18 model using the ImageNet dataset with 32 parallel processes:
```
DATATYPE=INT8 IC_DIMENSION=16 OC_DIMENSION=16 python run_regression.py --models resnet18 --dataset imagenet --sims accuracy --num_processes 32
```

Functional SystemC simulations verify the correctness of the SystemC accelerator implementation by comparing its outputs against a reference (golden) model. These simulations are not cycle-accurate but validate that the accelerator produces functionally correct results. The simulations run the SystemC model on every layer of the specified model(s) and compare outputs with the functional gold model.
This is typically the first verification step before proceeding to RTL generation, as it's faster than RTL simulation and catches functional bugs early.
Example: To verify an E4M3 32×32 accelerator on ResNet-18 and MobileBERT layers:
```
DATATYPE=E4M3 IC_DIMENSION=32 OC_DIMENSION=32 python run_regression.py --models resnet18,mobilebert_encoder --sims systemc --num_processes 32
```

This workflow generates synthesizable RTL from the SystemC model using Catapult HLS and then runs cycle-accurate RTL simulations. This is the most detailed verification step and produces actual hardware RTL that can be pushed through downstream EDA tools.
Prerequisites: Before running RTL generation, you must create a technology-specific configuration file. In the scripts/tech/ directory, create a .tcl file named after your technology (e.g., tsmc40.tcl). This file configures Catapult HLS with your target technology library.
Technology file template:
```
# Point to your Catapult-generated characterized library.
# Refer to the Catapult library builder documentation for more details.
solution options set ComponentLibs/SearchPath {/path/to/catapult_library} -append
solution library add catapult_library_name

# Implement the following function that returns the name of the memory instance
# for a given depth and width. Refer to the Catapult memory library generation
# documentation for more details.
proc get_memory_name {is_sp depth width} {
    # Return the appropriate memory name based on parameters
    # is_sp: whether it's a single-port memory
    # depth: memory depth
    # width: memory width in bits
    ...
}
```

Example: To run RTL generation on an INT8 32x32 accelerator with a 5.0 ns clock period using the generic technology and evaluate runtime on MobileBERT:
```
DATATYPE=INT8 IC_DIMENSION=32 OC_DIMENSION=32 CLOCK_PERIOD=5.0 TECHNOLOGY=generic python run_regression.py --models mobilebert_encoder --sims rtl --num_processes 32 --uniquify_layers
```

The --uniquify_layers flag can be used to reduce the evaluation runtime by only running the layers with unique shapes.
Output: The generated RTL files can be found in the build/ directory, organized by configuration (datatype, dimensions, etc.) in separate subdirectories. Each configuration directory contains the synthesized Verilog RTL and other Catapult-generated collateral.