Add sglang router minimal/experimental support #3194

Bihan · 2025-10-15T13:09:53Z

Intro
We want to make it possible to create a gateway that adds sgl-router features (cache-aware routing, etc.) while keeping all the standard gateway features (authentication, rate limits, etc.).

For the user, using such a gateway should be very simple: set router to sglang in the gateway configuration. E.g.:

type: gateway
name: sglang-gateway

backend: aws
region: eu-west-1

domain: example.com
router: sglang

The rest for the user should look the same - the same service endpoint, authentication and rate limits working, etc.

This first experimental version should bring only the minimum: routing replica traffic through the router (dstack's gateway/nginx -> sglang-router -> replica workers). In the future, this may be extended with router-specific scaling metrics (such as ttft and e2e), Prefill-Decode Disaggregation, etc.

For the first experimental version, the most critical thing is to find the minimum set of thoroughly tested changes that allow embedding router: sglang without breaking any existing functionality.

Note:

In this version, pip and sglang-router are installed on the gateway machine regardless of whether router: sglang is set in the gateway config. Making this conditional in the future would require implementing it across all backends that support gateways.
Modified the upstream block of src/dstack/_internal/proxy/gateway/resources/nginx/service.jinja2 to respect router: sglang in the gateway config.

upstream {{ domain }}.upstream {
    {% if router == "sglang" %}
    server 127.0.0.1:3000;  # SGLang router on the gateway
    {% else %}
    {% for replica in replicas %}
    server unix:{{ replica.socket }};  # replica {{ replica.id }}
    {% endfor %}
    {% endif %}
}
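
The branching in this upstream block can be sketched in plain Python. This is only an illustration of the template's logic, not dstack code: `render_upstream`, its parameters, and the `Replica` tuple are hypothetical names, and port 3000 is taken from the template above.

```python
from typing import NamedTuple, Optional


class Replica(NamedTuple):
    # Hypothetical stand-in for the replica objects passed to the template.
    id: str
    socket: str


def render_upstream(domain: str, replicas: list[Replica], router: Optional[str] = None) -> str:
    """Mimic the upstream block of service.jinja2 for illustration."""
    if router == "sglang":
        # All traffic goes to the SGLang router running on the gateway.
        servers = ["    server 127.0.0.1:3000;  # SGLang router on the gateway"]
    else:
        # Default behaviour: one server entry per replica socket.
        servers = [
            f"    server unix:{r.socket};  # replica {r.id}" for r in replicas
        ]
    return "\n".join([f"upstream {domain}.upstream {{", *servers, "}"])
```

With router="sglang" the block contains a single server entry no matter how many replicas exist; without it, the number of entries follows the replica list.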

Created a new nginx conf: src/dstack/_internal/proxy/gateway/resources/nginx/sglang_workers.jinja2

This nginx conf bridges HTTP to Unix sockets. dstack workers listen on Unix sockets, while sglang-router speaks HTTP, so each worker gets a local TCP port (127.0.0.1:10001, 127.0.0.1:10002, ...) that proxies to its socket, letting the router reach every worker over HTTP.

# Worker 1
upstream sglang_worker_1_upstream {
    server unix:/tmp/tmpazynu7m5/replica.sock;
}

server {
    listen 127.0.0.1:10001;
    access_log off; # disable access logs for this internal endpoint

    proxy_read_timeout 300s;
    proxy_send_timeout 300s;

    location / {
        proxy_pass http://sglang_worker_1_upstream;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Connection "";
        proxy_set_header Upgrade $http_upgrade;
    }
}

# Worker 2
upstream sglang_worker_2_upstream {
    server unix:/tmp/tmpazynu7m6/replica.sock;
}

server {
    listen 127.0.0.1:10002;
    access_log off; # disable access logs for this internal endpoint

    proxy_read_timeout 300s;
    proxy_send_timeout 300s;

    location / {
        proxy_pass http://sglang_worker_2_upstream;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Connection "";
        proxy_set_header Upgrade $http_upgrade;
    }
}
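
As a rough sketch of how such per-worker bridges could be generated from a list of replica sockets: the helper below is hypothetical and is not the actual sglang_workers.jinja2 rendering code. It assumes worker n is bound to port 10000 + n, which matches the two examples above but is not confirmed as the rule dstack uses, and it omits the timeout and header directives for brevity.

```python
from string import Template

# One bridge: a loopback TCP port in front of a replica's Unix socket.
WORKER_BLOCK = Template("""\
# Worker $n
upstream sglang_worker_${n}_upstream {
    server unix:$socket;
}

server {
    listen 127.0.0.1:$port;
    access_log off;

    location / {
        proxy_pass http://sglang_worker_${n}_upstream;
    }
}
""")

BASE_PORT = 10000  # assumption: worker n listens on BASE_PORT + n


def render_worker_bridges(sockets: list[str]) -> tuple[str, list[str]]:
    """Return the nginx config text and the HTTP URLs the router would use."""
    blocks, urls = [], []
    for n, socket in enumerate(sockets, start=1):
        port = BASE_PORT + n
        blocks.append(WORKER_BLOCK.substitute(n=n, socket=socket, port=port))
        urls.append(f"http://127.0.0.1:{port}")
    return "\n".join(blocks), urls
```

The returned URL list is what the router would be given as its worker addresses, which is why it is produced alongside the config text.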

peterschmidt85

Add an example of how to test the new router
Please ensure auto-scaling works (incl. downscaling to 0), and also that dstack uses the router's API to add/remove workers without restarting the gateway
And only after that, refactor the code to move the sgl-router implementation to a separate sglang-related subclass, so that the normal gateway code doesn't contain any sgl-router-specific code, similar to how each backend encapsulates its own logic
Ensure tests are working
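
One way to approach the add/remove point is to reconcile the router's worker list against the current replicas instead of restarting anything. A minimal sketch of that diffing step follows; `plan_worker_changes` is a hypothetical helper, and the actual calls to the router's API are deliberately left out because the endpoints depend on the sglang-router version in use.

```python
def plan_worker_changes(registered: set[str], desired: set[str]) -> tuple[list[str], list[str]]:
    """Return (to_add, to_remove) worker URLs, each sorted for stable output."""
    to_add = sorted(desired - registered)
    to_remove = sorted(registered - desired)
    return to_add, to_remove
```

Downscaling to 0 is the case where desired is empty: every registered worker ends up in to_remove and nothing is added.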

Bihan · 2025-10-16T07:23:52Z

@peterschmidt85

Add an example of how to test the new router

Step 1
In the method get_dstack_gateway_wheel (exact path see here), replace the return value as shown in the example below.

E.g.:

def get_dstack_gateway_wheel(build: str) -> str:
    channel = "release" if settings.DSTACK_RELEASE else "stgn"
    base_url = f"https://dstack-gateway-downloads.s3.amazonaws.com/{channel}"
    if build == "latest":
        r = requests.get(f"{base_url}/latest-version", timeout=5)
        r.raise_for_status()
        build = r.text.strip()
        logger.debug("Found the latest gateway build: %s", build)
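    # Original return value, disabled for this test:
    # return f"{base_url}/dstack_gateway-{build}-py3-none-any.whl"
    # Temporary override for testing: always install the test wheel.
    return "https://bihan-test-bucket.s3.eu-west-1.amazonaws.com/dstack_gateway-0.0.0-py3-none-any.whl"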

Step 2

Apply the gateway config below.

type: gateway
name: sglang-gateway

backend: aws
region: eu-west-1

domain: example.com
router: sglang

Step 3
Update DNS

Step 4

Apply the service config below.

type: service
name: sglang-service


image: lmsysorg/sglang:latest
env:
  - HF_TOKEN
  - MODEL_ID=meta-llama/Llama-3.2-3B-Instruct

commands:
  - python -m sglang.launch_server --model-path $MODEL_ID --host 0.0.0.0 --port 8000 --enable-metrics

port: 8000
model: meta-llama/Llama-3.2-3B-Instruct

resources:
  gpu: 24GB

replicas: 2

Step 5
Once the /health endpoint returns 200, as shown in the logs below, the service is ready to be queried.

Logs:

[2025-10-16 07:01:38] INFO:     Application startup complete.
[2025-10-16 07:01:38] INFO:     Uvicorn running on https://sglang-service.bihan-gateway.dstack.ai (Press CTRL+C to quit)
[2025-10-16 07:01:39] INFO:     127.0.0.1:3580 - "GET /get_model_info HTTP/1.1" 200 OK
[2025-10-16 07:01:39] Prefill batch. #new-seq: 1, #new-token: 7, #cached-token: 0, token usage: 0.00, #running-req: 0, #queue-req: 0, 
[2025-10-16 07:02:07] INFO:     127.0.0.1:3906 - "GET /health HTTP/1.1" 503 Service Unavailable
[2025-10-16 07:02:46] INFO:     127.0.0.1:3592 - "POST /generate HTTP/1.1" 200 OK
[2025-10-16 07:02:46] The server is fired up and ready to roll!
[2025-10-16 07:03:07] Prefill batch. #new-seq: 1, #new-token: 1, #cached-token: 0, token usage: 0.00, #running-req: 0, #queue-req: 0, 
[2025-10-16 07:03:08] INFO:     127.0.0.1:3516 - "GET /health HTTP/1.1" 200 OK
[2025-10-16 07:03:08] Prefill batch. #new-seq: 1, #new-token: 1, #cached-token: 0, token usage: 0.00, #running-req: 0, #queue-req: 0, 
[2025-10-16 07:03:09] INFO:     127.0.0.1:3790 - "GET /health HTTP/1.1" 200 OK
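
Instead of watching the logs by hand, the wait for a 200 from /health could be scripted. The sketch below is a generic polling loop, not part of dstack; the function names, attempt count, and delay are assumptions, and whether /health is reachable through the gateway (and whether it needs the Authorization header) is not confirmed here.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request, including error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # e.g. 503 while the model is still loading


def wait_until_healthy(
    url: str,
    attempts: int = 60,
    delay: float = 5.0,
    fetch: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` until it returns 200 or the attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch(url) == 200:
                return True
        except OSError:
            pass  # connection refused or timed out; the server may not be up yet
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

The fetch and sleep parameters exist only so the loop can be exercised without a network or real waiting.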

Step 6
You can then either use the dstack frontend at http://localhost:3000/projects/main/models/sglang-service
or query the service from the terminal:

curl https://sglang-service.example.com/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer <token>" \
  --data '{
    "model": "meta-llama/Llama-3.2-3B-Instruct",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is Deep Learning?"
      }
    ]
  }'
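
The same request can be built from Python using only the standard library. This is a sketch equivalent to the curl call above; `build_chat_request` is a hypothetical helper, and the base URL and token are placeholders to be replaced with real values.

```python
import json
import urllib.request


def build_chat_request(base_url: str, token: str, model: str, question: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request like the curl call above."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question},
        ],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

To send it, pass the returned object to urllib.request.urlopen and decode the JSON body of the response.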

Note: You can check sglang-router logs: cat ~/dstack/router_logs/sgl-router.

Also, maybe in the future we can show the sglang-router logs instead of the replica logs in the dstack CLI

Eg:

sglang-service provisioning completed (running)
Service is published at:
  https://sglang-service.bihan-gateway.dstack.ai
Model meta-llama/Llama-3.2-3B-Instruct is published at:
  https://gateway.bihan-gateway.dstack.ai


2025-10-16 06:59:05  INFO sglang_router_rs::core::worker_manager: src/core/worker_manager.rs:1077: Waiting for 2 workers to become healthy. Unhealthy: ["http://127.0.0.1:10002", "http://127.0.0.1:10001"]
...
...
2025-10-16 07:03:08  INFO sglang_router_rs::core::worker_manager: src/core/worker_manager.rs:1111: All 2 workers are healthy: ["http://127.0.0.1:10002", "http://127.0.0.1:10001"]
...
...
2025-10-16 07:03:08  INFO sglang_router_rs::server: src/server.rs:1066: Router ready | workers: ["http://127.0.0.1:10002", "http://127.0.0.1:10001"]
2025-10-16 07:03:08  INFO sglang_router_rs::server: src/server.rs:1094: Starting server on 0.0.0.0:3000
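
If router readiness needs to be checked from the log file rather than by eye, a small parser is enough. The sketch below relies only on the "All N workers are healthy" line format shown in the excerpt above; that format is an observation from this log and may differ in other sglang-router versions, and `healthy_workers` is a hypothetical helper.

```python
import json
import re
from typing import Optional

# Matches the "All N workers are healthy: [...]" line shown in the log excerpt.
HEALTHY_RE = re.compile(r"All (\d+) workers are healthy: (\[.*\])")


def healthy_workers(log_text: str) -> Optional[list[str]]:
    """Return the worker URLs from the last 'all healthy' line, or None if absent."""
    result = None
    for line in log_text.splitlines():
        match = HEALTHY_RE.search(line)
        if match:
            # The list is printed with double-quoted strings, so it parses as JSON.
            result = json.loads(match.group(2))
    return result
```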

Bihan · 2025-10-20T05:55:23Z

Will send a new PR

Bihan Rana added 11 commits October 13, 2025 14:56

Add SGLang Router Min Support

b28898a

Add Test Log to check Registration conf

82bae8e

Add start sglang-router

27c5204

Add sglang_workers jinga template

b2f1093

Modify service.jinja2 upstream block

3871f05

Add sglang log file

fa6d992

Add sglang router clean up in unregister method

4a47d86

Add test log to check unregister

ccae80e

Increase sglang router-request-timeout

0b7a6a1

Change sglang process to sglang::router

b32c8dd

Clean development code

36e84e6

Bihan requested review from jvstme and peterschmidt85 October 15, 2025 13:10

peterschmidt85 requested changes Oct 16, 2025

Add sglang autoscaling

79eef94

Bihan closed this Oct 20, 2025
