ventus-pytorch

ventus-pytorch is a Ventus-enabled PyTorch fork for running small-model inference on the Ventus software stack. This repository includes the PyTorch-side integration, Ventus kernel assets, and a minimal QEMU runtime image for reproducible execution.

Release Artifacts

Do not commit the runtime image into git. Publish it through GitHub Releases instead.

Recommended release assets:

  • ventus-runtime/ventus-runtime.qcow2
  • ventus-runtime/Makefile

Users should download the runtime image from GitHub Releases and place it at:

ventus-runtime/ventus-runtime.qcow2

so that make qemu can find it directly.
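
As a quick sanity check before launching the VM, you can verify the image is in place (a sketch; `check_image` is an illustrative helper, not part of the repository, and the path matches the layout above):

```shell
# Sketch: verify the runtime image is where `make qemu` expects it.
# check_image is an illustrative helper, not part of the repository.
check_image() {
    if [ -f "$1" ]; then
        echo "found: $1"
    else
        echo "missing: $1 (download it from GitHub Releases)"
        return 1
    fi
}

# usage (from the repository root):
#   check_image ventus-runtime/ventus-runtime.qcow2
```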

The qcow2 image is prepared as a minimal runtime VM. On boot it:

  • logs in automatically on the serial console as root
  • enters /opt/ventus
  • sources ./env.sh

Runtime Requirements

For stable execution, allocate at least:

  • 10 CPU cores
  • 8G memory

rtlsim was built with a Verilator configuration that uses 8 threads, so an undersized CPU allocation is not recommended and will noticeably slow simulation.

The provided Makefile defaults are higher than the minimum.
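
A minimal pre-flight check along these lines can catch an undersized host before a long simulation run (a sketch; `check_cores` is illustrative, and 10 is the minimum recommended above):

```shell
# Sketch: warn when the host has fewer CPUs than the recommended minimum.
# check_cores is illustrative; 10 matches the minimum stated above.
check_cores() {
    if [ "$1" -lt 10 ]; then
        echo "warning: $1 CPUs available; rtlsim runs best with at least 10"
    else
        echo "ok: $1 CPUs"
    fi
}

check_cores "$(nproc)"
```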

Run The VM

From ventus-runtime/:

make qemu

Or specify resources explicitly:

make qemu VM_CPUS=10 VM_MEMORY=8G

After boot, the shell is already in /opt/ventus with env.sh loaded.

Packaged Model Layout

The runtime image keeps the execution environment small:

  • /opt/ventus/install/
  • /opt/ventus/ventus-pytorch/torch/
  • /opt/ventus/models/gpt2/run_generate.py
  • /opt/ventus/models/pythia-410m-deduped/run_generate.py
  • /opt/ventus/models/qwen2.5-0.5b-instruct/run_generate.py

Model weights are intentionally not bundled into the qcow2 image. Before running a model, download its Hugging Face checkpoint files into the matching directory under /opt/ventus/models/.

Examples:

/opt/ventus/models/gpt2/
/opt/ventus/models/pythia-410m-deduped/
/opt/ventus/models/qwen2.5-0.5b-instruct/

Each directory should contain the model files plus the packaged run_generate.py.
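
To confirm a model directory is complete before launching, a check like the following can help (a sketch; `check_model_dir` is illustrative, and since exact weight file names vary by checkpoint, only the common files are verified):

```shell
# Sketch: check that a model directory has the packaged script and a
# Hugging Face config. Weight file names vary by model, so they are
# not checked here.
check_model_dir() {
    status=0
    for f in run_generate.py config.json; do
        if [ ! -f "$1/$f" ]; then
            echo "missing: $1/$f"
            status=1
        fi
    done
    if [ "$status" -eq 0 ]; then
        echo "ok: $1"
    fi
    return $status
}

# check_model_dir /opt/ventus/models/gpt2
```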

Verified Native Inference

Native Ventus inference has been validated in three precisions:

  • GPT-2: TF32
  • Pythia-410m-deduped: FP16
  • Qwen2.5-0.5B-Instruct: BF16

Current reference results:

  • GPT-2 remains the minimal text-generation smoke test.
  • Qwen2.5-0.5B-Instruct on rtlsim generated one token from the prompt "Hello, my name is", producing "Hello, my name is Alex" in about 7:00:56.18 (hh:mm:ss).
  • Pythia-410m-deduped on rtlsim generated one token from the prompt "Hello, my name is", producing "Hello, my name is John" in about 4:54:40.50 (hh:mm:ss).

Stability Notes

Pythia-410m-deduped currently runs in FP16, but it is not yet stable enough for KV-cache-based decoding. Keep use_cache=False for this model: with the KV cache enabled, small numerical errors can accumulate and propagate into NaN values.

Run The Packaged Scripts

Inside the VM:

export VENTUS_BACKEND=spike
python models/gpt2/run_generate.py
python models/pythia-410m-deduped/run_generate.py
python models/qwen2.5-0.5b-instruct/run_generate.py

For rtlsim, switch the backend before launching:

export VENTUS_BACKEND=rtlsim
python models/qwen2.5-0.5b-instruct/run_generate.py
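
The two steps above can be wrapped in a small helper (a sketch; `run_model` is illustrative, and the VENTUS_BACKEND values are the ones shown above):

```shell
# Sketch: run one packaged model under a chosen backend.
# run_model is illustrative; VENTUS_BACKEND values shown in this
# README are spike and rtlsim.
run_model() {
    VENTUS_BACKEND="$1" python "models/$2/run_generate.py"
}

# run_model spike gpt2
# run_model rtlsim qwen2.5-0.5b-instruct
```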
