Tenstorrent Inference Server (tt-inference-server) is the repository of model APIs available for deployment on Tenstorrent hardware.
https://github.com/tenstorrent/tt-inference-server
Please follow the setup instructions for the model you want to serve; the Model Name entries in the tables below link to the corresponding implementations.
Note: models with Status [🛠️ Experimental] are under active development. If you encounter setup or stability problems with any model, please file an issue and our team will address it.
For an automated, pre-configured vLLM inference server using Docker, please see the Model Readiness Workflows User Guide. The list below shows the default model implementations supported.
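Once a vLLM inference server is running, it can be queried over vLLM's OpenAI-compatible HTTP API. The sketch below is a minimal example of such a request; the base URL, port, model ID, and auth token are placeholders and will depend on how your particular deployment is configured.

```python
# Minimal sketch: querying a running vLLM-based inference server via its
# OpenAI-compatible chat completions endpoint. The host, port, model name,
# and auth header are assumptions -- check your deployment's configuration.
import requests

BASE_URL = "http://localhost:8000"          # assumed default; adjust to your deployment
MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # hypothetical model ID; use the model you deployed

response = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={"Authorization": "Bearer <your-token>"},  # only if your server requires auth
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "Hello, what can you do?"}],
        "max_tokens": 64,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```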