Add sglang router minimal/experimental support #3194
- Add an example of how to test the new router
- Please ensure auto-scaling works (including downscaling to 0), and that dstack uses the router's API to add/remove workers without restarting the gateway
- Only after that, refactor the code to move the sgl-router implementation into a separate sglang-related subclass, so that the normal gateway code has no sgl-router-specific code, similar to how each backend encapsulates its own logic
- Ensure tests are working
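
The add/remove requirement above can be sketched as a small reconciliation step. This is only an illustration under stated assumptions: `diff_workers` is a hypothetical helper, not code from this PR, and the worker URLs are made up. It only computes which workers would need to be registered or deregistered and deliberately makes no calls to the router itself.

```python
from typing import Iterable, Set, Tuple


def diff_workers(
    registered: Iterable[str], desired: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (to_add, to_remove) so the router's worker set matches `desired`.

    Hypothetical helper: `registered` is what the router currently knows about,
    `desired` is the set of worker URLs for the replicas that are running now.
    """
    registered_set, desired_set = set(registered), set(desired)
    to_add = desired_set - registered_set
    to_remove = registered_set - desired_set
    return to_add, to_remove


# Example: one replica was scaled down and another one was scaled up.
to_add, to_remove = diff_workers(
    registered=["http://127.0.0.1:10001", "http://127.0.0.1:10002"],
    desired=["http://127.0.0.1:10002", "http://127.0.0.1:10003"],
)
```

Because only the difference is applied, downscaling to 0 is just the case where `desired` is empty: every registered worker ends up in `to_remove`, and nothing on the gateway needs to be restarted.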
Step 1. E.g.:

Step 2. Apply the gateway config below.

Step 3.

Step 4. Apply the service config below.

Step 5. Logs:

Step 6. You can query from the terminal.

Note: You can check sglang-router logs: cat ~/dstack/router_logs/sgl-router. Also, maybe in the future we can show sglang-router's log instead of the replica's log in the dstack CLI. E.g.:

Will send a new PR
Intro
We want to make it possible to create a gateway that extends the standard gateway functionality with additional features (all sgl-router features, such as cache-aware routing) while keeping all the standard gateway features (such as authentication and rate limits).
For the user, using such a gateway should be very simple, e.g. setting `router` to `sglang` in the gateway configuration.
The rest should look the same to the user: the same service endpoint, authentication and rate limits working, etc.

While this first experimental version should only bring minimum features (routing replica traffic through the router: dstack's gateway/nginx -> sglang-router -> replica workers), in the future this may be extended with router-specific scaling metrics such as TTFT and e2e latency, Prefill-Decode Disaggregation, etc.

For the first experimental version, the most critical thing is to come up with the minimum set of thoroughly tested changes that allow embedding `router: sglang` without breaking any existing functionality.

Note:
In this version, pip and sglang-router are installed on the gateway machine regardless of whether `router: sglang` is in the gateway config. To make this conditional in the future, it should be implemented across all backends that support gateways.

Modified the upstream block of `src/dstack/_internal/proxy/gateway/resources/nginx/service.jinja2` to respect `router: sglang` in the gateway config.

`src/dstack/_internal/proxy/gateway/resources/nginx/sglang_workers.jinja2`: This nginx conf forwards HTTP to Unix sockets. dstack workers listen on Unix sockets, while sglang-router speaks HTTP, so this bridge lets the router reach each worker via a local TCP port.
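
To make the bridge idea concrete, here is a rough sketch of what rendering such a config could look like. It is not the actual `sglang_workers.jinja2` template: the function name `render_worker_bridges`, the port numbers, and the socket paths are all assumptions for illustration, and plain string formatting stands in for Jinja2. Each generated `server` block listens on a local TCP port and proxies to one worker's Unix socket.

```python
from typing import Dict

# One nginx server block per worker: local TCP port -> worker's Unix socket.
# Doubled braces are literal braces in str.format().
SERVER_TEMPLATE = """server {{
    listen 127.0.0.1:{port};
    location / {{
        proxy_pass http://unix:{socket}:/;
    }}
}}
"""


def render_worker_bridges(workers: Dict[int, str]) -> str:
    """Render nginx server blocks for a {tcp_port: unix_socket_path} mapping.

    Hypothetical helper; ports and socket paths are illustrative only.
    """
    return "\n".join(
        SERVER_TEMPLATE.format(port=port, socket=socket)
        for port, socket in sorted(workers.items())
    )


conf = render_worker_bridges(
    {
        10002: "/run/example/replica-b.sock",  # assumed path
        10001: "/run/example/replica-a.sock",  # assumed path
    }
)
print(conf)
```

With a mapping like this, the router would be pointed at `http://127.0.0.1:10001` and `http://127.0.0.1:10002`, and nginx would take care of reaching the sockets behind them.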