When I run the command

```
python -m lm_eval --model vllm_speculative \
  --model_args "service_script_path=./spec_service.py,spec_reason=True,big_model=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B,small_model=akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-GRPO-SpeculativeReasoner,max_tokens=16384,big_model_gpus=0|1,small_model_gpus=2,pretrained=meta-llama/Llama-2-7b-chat-hf" \
  --tasks gsm8k --batch_size auto --apply_chat_template \
  --output_path log_traces/SPECR_32B --log_samples \
  --gen_kwargs "max_gen_toks=16384,thinking_start=\n<think>,thinking_end=\n</think>"
```

it produces several responses and then gets stuck: GPU memory is still allocated, but utilization stays at 0%. Why is this happening, and how can I resolve it?
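One way to narrow down where the hang occurs is to capture a stack trace of the evaluation process once the GPUs go idle. Below is a minimal watchdog sketch, assuming `py-spy` is installed and you pass the PID of the stuck process; the poll interval and idle threshold are arbitrary choices, not part of lm_eval:

```python
import subprocess
import sys
import time

PID = int(sys.argv[1])  # PID of the lm_eval / service process to inspect
IDLE_SECONDS = 60       # how long utilization must stay at 0 before dumping
POLL = 5                # seconds between nvidia-smi polls

idle = 0
while idle < IDLE_SECONDS:
    # Query per-GPU utilization as plain integers, one per line
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    utils = [int(x) for x in out.split()]
    idle = idle + POLL if max(utils) == 0 else 0
    time.sleep(POLL)

# All GPUs idle for IDLE_SECONDS: dump Python stacks to see where it blocks
subprocess.run(["py-spy", "dump", "--pid", str(PID)])
```

If the dump shows every thread blocked on a queue or socket read, that would point to a deadlock between the big-model and small-model workers rather than slow generation.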