Tenstorrent Inference Server (tt-inference-server) is the repository of model APIs available for deployment on Tenstorrent hardware.
https://github.com/tenstorrent/tt-inference-server
Please follow the setup instructions for the model you want to serve; the Model Name entries in the tables below link to the corresponding implementations.
Note: models with Status [🛠️ Experimental] are under active development. If you encounter setup or stability problems with any model, please file an issue and our team will address it.
For an automated, pre-configured vLLM inference server using Docker, please see the Model Readiness Workflows User Guide. The list below shows the default model implementations supported.
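Once a vLLM inference server is running, it can be queried over vLLM's OpenAI-compatible HTTP API. The sketch below is a minimal example of such a request; the base URL, port, model ID, and auth token are placeholders and will depend on how your particular deployment is configured.

```python
# Minimal sketch: querying a running vLLM-based inference server via its
# OpenAI-compatible chat completions endpoint. The host, port, model name,
# and auth header are assumptions -- check your deployment's configuration.
import requests

BASE_URL = "http://localhost:8000"          # assumed default; adjust to your deployment
MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # hypothetical model ID; use the model you deployed

response = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={"Authorization": "Bearer <your-token>"},  # only if your server requires auth
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "Hello, what can you do?"}],
        "max_tokens": 64,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```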