Current vidur backend support #60

@nba556677go

Description

We’re attempting to reproduce the simulation results and observed that when comparing against vLLM 0.9.1 benchmarks, the P50 latency differs by 700%. May I ask if vLLM v1 is supported by Vidur? If not, which framework and version does Vidur use to reproduce the published results?

Specifically, when running the example command in the README (shown below), which LLM engine should we use to validate the simulation output? Is Vidur’s simulation based on vLLM or Sarathi-Serve?

When using vLLM 0.9.1, the mooncake_conversation_trace.csv trace fails because the total token length exceeds the max_model_len = 8192 limit for Meta-Llama-3-8B. Even after scaling down the token length and rerunning, the simulated latency still does not match vLLM’s measurements. Which framework does Vidur currently support, and what trace/configuration settings would you recommend for reproducing the results accurately?
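As a stopgap while scaling the trace, one way to sidestep the max_model_len failure is to drop the offending requests before replay. This is a minimal stdlib-only sketch; the column names (num_prefill_tokens / num_decode_tokens) are assumptions, so check the actual header of mooncake_conversation_trace.csv before adapting it:

```python
MAX_MODEL_LEN = 8192  # vLLM's context limit for Meta-Llama-3-8B

# Stand-in rows for the trace; in practice, load them from the CSV
# with csv.DictReader and cast the token counts to int.
trace = [
    {"num_prefill_tokens": 4000, "num_decode_tokens": 200},
    {"num_prefill_tokens": 9000, "num_decode_tokens": 500},
    {"num_prefill_tokens": 7000, "num_decode_tokens": 2000},
]

# Keep only requests whose total token count fits in the context window.
kept = [
    r for r in trace
    if r["num_prefill_tokens"] + r["num_decode_tokens"] <= MAX_MODEL_LEN
]
print(f"kept {len(kept)}/{len(trace)} requests")  # kept 1/3 requests
```

Note that filtering (rather than scaling) changes the request mix, so simulated and measured latencies are only comparable if both sides replay the same filtered trace.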

python -m vidur.main \
--time_limit 10800 \
--replica_config_model_name meta-llama/Meta-Llama-3-8B \
--replica_config_device h100 \
--replica_config_network_device h100_dgx \
--cluster_config_num_replicas 8 \
--replica_config_tensor_parallel_size 1 \
--replica_config_num_pipeline_stages 1 \
--request_generator_config_type synthetic \
--synthetic_request_generator_config_num_requests 128 \
--length_generator_config_type trace \
--trace_request_length_generator_config_trace_file ./data/processed_traces/mooncake_conversation_trace.csv \
--interval_generator_config_type poisson \
--poisson_request_interval_generator_config_qps 8.0 \
--global_scheduler_config_type round_robin \
--replica_scheduler_config_type vllm_v1 \
--vllm_v1_scheduler_config_chunk_size 512 \
--vllm_v1_scheduler_config_batch_size_cap 512 \
--cache_config_enable_prefix_caching
