flexaihq/tt-inference-server

 
 

TT-Inference-Server

Tenstorrent Inference Server (tt-inference-server) is the repository of model APIs available for deployment on Tenstorrent hardware.

Official Repository

https://github.com/tenstorrent/tt-inference-server

Getting Started

Please follow the setup instructions for the model you want to serve. Each model name in the table below links to its corresponding implementation.

Note: models with the status 🛠️ Experimental are under active development. If you encounter setup or stability problems with any model, please file an issue and our team will address it.

Model Support

For an automated, pre-configured vLLM inference server using Docker, see the Model Readiness Workflows User Guide. The table below lists the default supported model implementations.

| Model Weights | Hardware | Status | tt-metal commit | vLLM commit | Docker Image |
| --- | --- | --- | --- | --- | --- |
| AFM-4.5B | n300, WH-QuietBox/WH-LoudBox (T3K) | 🛠️ Experimental | ae65ee5 | 35f023f | 0.2.0-ae65ee5-35f023f |
| gemma-3-1b-it | n150 | 🛠️ Experimental | dc85f59 | 87fe4a4 | 0.2.0-dc85f59-87fe4a4 |
| gemma-3-4b-it, medgemma-4b-it | n300, n150 | 🛠️ Experimental | dc85f59 | 87fe4a4 | 0.2.0-dc85f59-87fe4a4 |
| gemma-3-27b-it, medgemma-27b-it | WH-QuietBox/WH-LoudBox (T3K) | 🛠️ Experimental | 17a5973 | aa4ae1e | 0.2.0-17a5973-aa4ae1e |
| Qwen3-8B | n150, n300, WH-QuietBox/WH-LoudBox (T3K), Galaxy | 🟢 Complete | 2496be4 | 2dcee0c | 0.2.0-2496be4-2dcee0c |
| Qwen3-32B | WH-QuietBox/WH-LoudBox (T3K), Galaxy | 🟡 Functional | 2496be4 | 2dcee0c | 0.2.0-2496be4-2dcee0c |
| Mistral-7B-Instruct-v0.3 | n300, WH-QuietBox/WH-LoudBox (T3K), n150 | 🟢 Complete | 9b67e09 | a91b644 | 0.2.0-9b67e09-a91b644 |
| QwQ-32B | WH-QuietBox/WH-LoudBox (T3K), Galaxy | 🟡 Functional | 2496be4 | 2dcee0c | 0.2.0-2496be4-2dcee0c |
| Qwen2.5-72B, Qwen2.5-72B-Instruct | WH-QuietBox/WH-LoudBox (T3K), Galaxy | 🟢 Complete | 2496be4 | 2dcee0c | 0.2.0-2496be4-2dcee0c |
| Qwen2.5-7B, Qwen2.5-7B-Instruct | n300 | 🛠️ Experimental | v0.62.0-rc10 | c348d08 | 0.2.0-v0.62.0-rc10-c348d08 |
| Llama-3.3-70B-Instruct, Llama-3.1-70B, Llama-3.1-70B-Instruct, DeepSeek-R1-Distill-Llama-70B | Galaxy | 🟢 Complete | 2496be4 | 2dcee0c | 0.2.0-2496be4-2dcee0c |
| Llama-3.3-70B-Instruct, Llama-3.1-70B, Llama-3.1-70B-Instruct, DeepSeek-R1-Distill-Llama-70B | WH-QuietBox/WH-LoudBox (T3K) | 🟡 Functional | 9b67e09 | a91b644 | 0.2.0-9b67e09-a91b644 |
| Llama-3.3-70B-Instruct, Llama-3.1-70B, Llama-3.1-70B-Instruct, DeepSeek-R1-Distill-Llama-70B | BH-QuietBox (P150X4) | 🟡 Functional | 55fd115 | aa4ae1e | 0.2.0-55fd115-aa4ae1e |
| Llama-3.2-11B-Vision, Llama-3.2-11B-Vision-Instruct | n300, WH-QuietBox/WH-LoudBox (T3K) | 🟡 Functional | v0.61.1-rc1 | 5cbc982 | 0.2.0-v0.61.1-rc1-5cbc982 |
| Llama-3.2-90B-Vision, Llama-3.2-90B-Vision-Instruct | WH-QuietBox/WH-LoudBox (T3K) | 🟡 Functional | v0.61.1-rc1 | 5cbc982 | 0.2.0-v0.61.1-rc1-5cbc982 |
| Llama-3.2-1B, Llama-3.2-1B-Instruct | n300, WH-QuietBox/WH-LoudBox (T3K), n150 | 🟡 Functional | 9b67e09 | a91b644 | 0.2.0-9b67e09-a91b644 |
| Llama-3.2-3B, Llama-3.2-3B-Instruct | n300, WH-QuietBox/WH-LoudBox (T3K), n150 | 🟡 Functional | 20edc39 | 03cb300 | 0.2.0-20edc39-03cb300 |
| Llama-3.1-8B, Llama-3.1-8B-Instruct | n300, WH-QuietBox/WH-LoudBox (T3K), n150 | 🟢 Complete | 9b67e09 | a91b644 | 0.2.0-9b67e09-a91b644 |
| Llama-3.1-8B, Llama-3.1-8B-Instruct | p100, p150 | 🛠️ Experimental | 55fd115 | aa4ae1e | 0.2.0-55fd115-aa4ae1e |
| Llama-3.1-8B, Llama-3.1-8B-Instruct | BH-QuietBox (P150X4) | 🟢 Complete | 55fd115 | aa4ae1e | 0.2.0-55fd115-aa4ae1e |
| Llama-3.1-8B, Llama-3.1-8B-Instruct | Galaxy | 🟡 Functional | 2496be4 | 2dcee0c | 0.2.0-2496be4-2dcee0c |
| Qwen2.5-Coder-32B-Instruct | WH-QuietBox/WH-LoudBox (T3K) | 🛠️ Experimental | 17a5973 | aa4ae1e | 0.2.0-17a5973-aa4ae1e |
| stable-diffusion-xl-base-1.0 | n300, WH-QuietBox/WH-LoudBox (T3K), Galaxy, n150 | 🟢 Complete | 2496be4 | N/A | 0.2.0-2496be4518bca0a7a5b497a4cda3cfe7e2f59756 |
| stable-diffusion-3.5-large | WH-QuietBox/WH-LoudBox (T3K), Galaxy | 🛠️ Experimental | 2496be4 | N/A | 0.2.0-2496be4518bca0a7a5b497a4cda3cfe7e2f59756 |
| openai-whisper-large-v3 | Galaxy, n150 | 🛠️ Experimental | 2496be4 | N/A | 0.2.0-2496be4518bca0a7a5b497a4cda3cfe7e2f59756 |
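
The Docker Image values in the table above appear to follow a pattern: a release version, then the tt-metal commit, then the vLLM commit, joined by hyphens. Rows with no vLLM commit (N/A) omit the last part and use the full tt-metal commit hash. The sketch below composes a tag under that assumption. The function name `build_image_tag` is hypothetical and not part of this repository, and the pattern is inferred from the table rather than documented here.

```python
from typing import Optional


def build_image_tag(
    release: str,
    tt_metal_commit: str,
    vllm_commit: Optional[str] = None,
) -> str:
    """Compose an image tag in the pattern inferred from the table.

    Hypothetical helper: the "<release>-<tt-metal>-<vLLM>" layout is an
    assumption based on the Docker Image column, not a documented API.
    """
    parts = [release, tt_metal_commit]
    # Rows without a vLLM commit (N/A) have only two components.
    if vllm_commit is not None:
        parts.append(vllm_commit)
    return "-".join(parts)


# Values taken from the AFM-4.5B and Qwen2.5-7B rows of the table.
print(build_image_tag("0.2.0", "ae65ee5", "35f023f"))
print(build_image_tag("0.2.0", "v0.62.0-rc10", "c348d08"))
```

Because a tt-metal reference such as v0.62.0-rc10 itself contains hyphens, splitting a tag back into its parts on "-" is ambiguous. Composing the tag from known components, as above, avoids that problem.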
