Repository for Alpa's multi-model serving system.
This is the official implementation of our OSDI'23 paper: AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving.
To reproduce all the main results in our paper, please check the artifact folder and follow the instructions in it.
Thanks to the AlpaServe team, whose work made AdaptServe possible. This repo also contains AdaptServe's dynamic multi-model serving system.
This is the official implementation of our HPSC'25 paper: AdaptServe: Auto-Scalable DL Serving with Dynamic Model Parallelism. The paper link is coming soon.
To reproduce all the main results in our paper, please check the artifact folder.