Description
2025-10-11 22:12:51,592 INFO worker.py:1951 -- Started a local Ray instance.
(WorkerDict pid=4130101) Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
(WorkerDict pid=4130101) You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
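Note: the two transformers warnings above are unrelated to the crash below. They can be silenced by requesting the fast image processor and an explicit dtype at load time. A minimal sketch (the checkpoint path is a placeholder, not the path from the run script):

```python
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_path = "Qwen/Qwen2-VL-7B-Instruct"  # placeholder; use the checkpoint configured for the run

# use_fast=True picks the fast image processor and silences the first warning.
processor = AutoProcessor.from_pretrained(model_path, use_fast=True)

# An explicit torch_dtype avoids the "Flash Attention 2.0 without specifying a
# torch dtype" warning; FA2 expects fp16/bf16 weights.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
```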
(WorkerDict pid=4129551)
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
(WorkerDict pid=4130101) [rank3]:[W1011 22:13:34.127253242 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 3] using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
(WorkerDict pid=4129551)
Loading checkpoint shards: 50%|█████ | 1/2 [00:03<00:03, 3.01s/it]
(WorkerDict pid=4129551)
Loading checkpoint shards: 100%|██████████| 2/2 [00:05<00:00, 2.74s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:05<00:00, 2.78s/it]
(WorkerDict pid=4129551) Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False. [repeated 3x across cluster]
(WorkerDict pid=4130099) You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour [repeated 2x across cluster]
(WorkerDict pid=4130099) [rank1]:[W1011 22:13:34.127245472 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 1] using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id. [repeated 2x across cluster]
(WorkerDict pid=4129551)
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
(WorkerDict pid=4129551)
Loading checkpoint shards: 50%|█████ | 1/2 [00:02<00:02, 2.92s/it]
(WorkerDict pid=4129551) [rank0]:[W1011 22:13:40.446092096 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
(WorkerDict pid=4129551)
Loading checkpoint shards: 100%|██████████| 2/2 [00:05<00:00, 2.68s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:05<00:00, 2.72s/it]
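Note: the ProcessGroupNCCL barrier warnings above are also benign here, but they go away if each rank pins its GPU before (or when) the process group is created. A minimal sketch using the standard LOCAL_RANK convention, not verl's actual init code:

```python
import os
import torch
import torch.distributed as dist

local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Passing device_id tells NCCL the rank -> GPU mapping up front,
# so barrier() no longer has to guess GPU 0.
dist.init_process_group(backend="nccl", device_id=torch.device(f"cuda:{local_rank}"))

# Or be explicit at each collective:
dist.barrier(device_ids=[local_rank])
```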
(WorkerDict pid=4129551)
Loading safetensors checkpoint shards: 0% Completed | 0/2 [00:00<?, ?it/s]
(WorkerDict pid=4129551)
Loading safetensors checkpoint shards: 50% Completed | 1/2 [00:01<00:01, 1.43s/it]
(WorkerDict pid=4129551)
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:03<00:00, 1.59s/it]
(WorkerDict pid=4129551)
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:03<00:00, 1.57s/it]
(WorkerDict pid=4129551)
(WorkerDict pid=4129551)
Capturing CUDA graph shapes: 0%| | 0/35 [00:00<?, ?it/s]
(WorkerDict pid=4129551)
Capturing CUDA graph shapes: 3%|▎ | 1/35 [00:00<00:22, 1.52it/s]
(WorkerDict pid=4130100)
Capturing CUDA graph shapes: 0%| | 0/35 [00:00<?, ?it/s]
(WorkerDict pid=4129551)
Capturing CUDA graph shapes: 31%|███▏ | 11/35 [00:05<00:12, 1.91it/s] [repeated 20x across cluster]
(WorkerDict pid=4129551)
Capturing CUDA graph shapes: 60%|██████ | 21/35 [00:11<00:07, 1.85it/s] [repeated 20x across cluster]
(WorkerDict pid=4129551)
Capturing CUDA graph shapes: 89%|████████▊ | 31/35 [00:16<00:02, 1.83it/s] [repeated 20x across cluster]
(WorkerDict pid=4129551)
Capturing CUDA graph shapes: 91%|█████████▏| 32/35 [00:17<00:01, 1.87it/s]
(WorkerDict pid=4129551)
Capturing CUDA graph shapes: 100%|██████████| 35/35 [00:20<00:00, 1.05s/it]
Capturing CUDA graph shapes: 100%|██████████| 35/35 [00:20<00:00, 1.70it/s]
(WorkerDict pid=4129551) /seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:690: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
(WorkerDict pid=4129551) warnings.warn(
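Note: this FutureWarning is only a deprecation notice from PyTorch and not the cause of the failure. For reference, the replacement API it points to looks roughly like this (a sketch, not verl's code; model and optimizer are whatever the trainer builds):

```python
import torch
from torch.distributed.checkpoint.state_dict import StateDictOptions, get_state_dict


def gather_full_state(model: torch.nn.Module, optimizer: torch.optim.Optimizer):
    # Replacement for FSDP.state_dict_type(): works across FSDP1, FSDP2 and DDP.
    options = StateDictOptions(full_state_dict=True, cpu_offload=True)
    return get_state_dict(model, optimizer, options=options)
```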
(WorkerDict pid=4130100)
Capturing CUDA graph shapes: 89%|████████▊ | 31/35 [00:16<00:02, 1.85it/s]
(WorkerDict pid=4130100)
Capturing CUDA graph shapes: 97%|█████████▋| 34/35 [00:18<00:00, 1.90it/s] [repeated 5x across cluster]
Traceback (most recent call last):
File "", line 198, in _run_module_as_main
File "", line 88, in _run_code
File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/trainer/main.py", line 183, in <module>
main()
File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/trainer/main.py", line 45, in main
ray.get(main_task.remote(ppo_config))
File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/ray/_private/worker.py", line 2882, in get
values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/ray/_private/worker.py", line 968, in get_objects
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::main_task() (pid=4129158, ip=172.22.4.103)
File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/trainer/main.py", line 96, in main_task
trainer.fit()
File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/trainer/ray_trainer.py", line 779, in fit
gen_batch_output = self.actor_rollout_wg.generate_sequences(gen_batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/single_controller/ray/base.py", line 46, in func
output = ray.get(output)
^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ray.exceptions.RayTaskError(RuntimeError): ray::WorkerDict.actor_rollout_generate_sequences() (pid=4129551, ip=172.22.4.103, actor_id=94e8fdbcef4df9cf94ab1dfa01000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7ef371281190>)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/single_controller/ray/base.py", line 404, in func
return getattr(self.worker_dict[key], name)(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/single_controller/base/decorator.py", line 209, in inner
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/workers/fsdp_workers.py", line 417, in generate_sequences
with self.rollout_sharding_manager:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/workers/sharding_manager/fsdp_vllm.py", line 64, in enter
load_dtensor_weights(
File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/workers/rollout/vllm_rollout/dtensor_weight_loaders.py", line 390, in load_dtensor_weights
weight_loader(actor_weights, vllm_model)
File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/workers/rollout/vllm_rollout/dtensor_weight_loaders.py", line 255, in qwen2vl_dtensor_weight_loader
weight_loader(vllm_param, local_actor_weight.to(dtype=vllm_param.dtype))
File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/vllm/model_executor/model_loader/weight_utils.py", line 587, in default_weight_loader
param.data.copy_(loaded_weight)
File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/_compile.py", line 32, in inner
return disable_fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 745, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/tensor/_api.py", line 346, in torch_dispatch
return DTensor._op_dispatcher.dispatch(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/tensor/_dispatch.py", line 167, in dispatch
op_info = self.unwrap_to_op_info(op_call, args, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/tensor/_dispatch.py", line 372, in unwrap_to_op_info
self._try_replicate_spec_for_scalar_tensor(op_call, arg, mesh)
File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/tensor/_dispatch.py", line 473, in try_replicate_spec_for_scalar_tensor
raise RuntimeError(
RuntimeError: aten.copy_.default: got mixed torch.Tensor and DTensor, need to convert all torch.Tensor to DTensor before calling distributed operators!
(main_task pid=4129158) Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::WorkerDict.actor_rollout_generate_sequences() (pid=4130101, ip=172.22.4.103, actor_id=ee12783c35d9f6b46fc7b79601000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7fc757258e00>)
(main_task pid=4129158) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=4129158) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=4129158) File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/single_controller/ray/base.py", line 404, in func
(main_task pid=4129158) return getattr(self.worker_dict[key], name)(*args, **kwargs)
(main_task pid=4129158) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=4129158) File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/single_controller/base/decorator.py", line 209, in inner
(main_task pid=4129158) return func(*args, **kwargs)
(main_task pid=4129158) ^^^^^^^^^^^^^^^^^^^^^
(main_task pid=4129158) File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/workers/fsdp_workers.py", line 417, in generate_sequences
(main_task pid=4129158) with self.rollout_sharding_manager:
(main_task pid=4129158) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=4129158) File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/workers/sharding_manager/fsdp_vllm.py", line 64, in __enter__
(main_task pid=4129158) load_dtensor_weights(
(main_task pid=4129158) File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/workers/rollout/vllm_rollout/dtensor_weight_loaders.py", line 390, in load_dtensor_weights
(main_task pid=4129158) weight_loader(actor_weights, vllm_model)
(main_task pid=4129158) File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/workers/rollout/vllm_rollout/dtensor_weight_loaders.py", line 255, in qwen2vl_dtensor_weight_loader
(main_task pid=4129158) weight_loader(vllm_param, local_actor_weight.to(dtype=vllm_param.dtype))
(main_task pid=4129158) File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/vllm/model_executor/model_loader/weight_utils.py", line 587, in default_weight_loader
(main_task pid=4129158) param.data.copy_(loaded_weight)
(main_task pid=4129158) File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/_compile.py", line 32, in inner
(main_task pid=4129158) return disable_fn(*args, **kwargs)
(main_task pid=4129158) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=4129158) File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 745, in _fn
(main_task pid=4129158) return fn(*args, **kwargs)
(main_task pid=4129158) ^^^^^^^^^^^^^^^^^^^
(main_task pid=4129158) File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/tensor/_api.py", line 346, in __torch_dispatch__
(main_task pid=4129158) return DTensor._op_dispatcher.dispatch(
(main_task pid=4129158) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=4129158) File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/tensor/_dispatch.py", line 167, in dispatch
(main_task pid=4129158) op_info = self.unwrap_to_op_info(op_call, args, kwargs)
(main_task pid=4129158) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=4129158) File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/tensor/_dispatch.py", line 372, in unwrap_to_op_info
(main_task pid=4129158) self._try_replicate_spec_for_scalar_tensor(op_call, arg, mesh)
(main_task pid=4129158) File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/tensor/_dispatch.py", line 473, in _try_replicate_spec_for_scalar_tensor
(main_task pid=4129158) raise RuntimeError(
(main_task pid=4129158) RuntimeError: aten.copy_.default: got mixed torch.Tensor and DTensor, need to convert all torch.Tensor to DTensor before calling distributed operators!
(main_task pid=4129158) Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::WorkerDict.actor_rollout_generate_sequences() (pid=4130100, ip=172.22.4.103, actor_id=60ce5139a15e106b4341d87601000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7f5db100ce60>)
(main_task pid=4129158) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=4129158) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=4129158) File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/single_controller/ray/base.py", line 404, in func
(main_task pid=4129158) return getattr(self.worker_dict[key], name)(*args, **kwargs)
(main_task pid=4129158) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=4129158) File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/single_controller/base/decorator.py", line 209, in inner
(main_task pid=4129158) return func(*args, **kwargs)
(main_task pid=4129158) ^^^^^^^^^^^^^^^^^^^^^
(main_task pid=4129158) File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/workers/fsdp_workers.py", line 417, in generate_sequences
(main_task pid=4129158) with self.rollout_sharding_manager:
(main_task pid=4129158) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=4129158) File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/workers/sharding_manager/fsdp_vllm.py", line 64, in __enter__
(main_task pid=4129158) load_dtensor_weights(
(main_task pid=4129158) File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/workers/rollout/vllm_rollout/dtensor_weight_loaders.py", line 390, in load_dtensor_weights
(main_task pid=4129158) weight_loader(actor_weights, vllm_model)
(main_task pid=4129158) File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/workers/rollout/vllm_rollout/dtensor_weight_loaders.py", line 255, in qwen2vl_dtensor_weight_loader
(main_task pid=4129158) weight_loader(vllm_param, local_actor_weight.to(dtype=vllm_param.dtype))
(main_task pid=4129158) File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/vllm/model_executor/model_loader/weight_utils.py", line 587, in default_weight_loader
(main_task pid=4129158) param.data.copy_(loaded_weight)
(main_task pid=4129158) File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/_compile.py", line 32, in inner
(main_task pid=4129158) return disable_fn(*args, **kwargs)
(main_task pid=4129158) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=4129158) File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 745, in _fn
(main_task pid=4129158) return fn(*args, **kwargs)
(main_task pid=4129158) ^^^^^^^^^^^^^^^^^^^
(main_task pid=4129158) File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/tensor/_api.py", line 346, in __torch_dispatch__
(main_task pid=4129158) return DTensor._op_dispatcher.dispatch(
(main_task pid=4129158) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=4129158) File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/tensor/_dispatch.py", line 167, in dispatch
(main_task pid=4129158) op_info = self.unwrap_to_op_info(op_call, args, kwargs)
(main_task pid=4129158) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=4129158) File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/tensor/_dispatch.py", line 372, in unwrap_to_op_info
(main_task pid=4129158) self._try_replicate_spec_for_scalar_tensor(op_call, arg, mesh)
(main_task pid=4129158) File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/tensor/_dispatch.py", line 473, in _try_replicate_spec_for_scalar_tensor
(main_task pid=4129158) raise RuntimeError(
(main_task pid=4129158) RuntimeError: aten.copy_.default: got mixed torch.Tensor and DTensor, need to convert all torch.Tensor to DTensor before calling distributed operators!
(main_task pid=4129158) Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::WorkerDict.actor_rollout_generate_sequences() (pid=4130099, ip=172.22.4.103, actor_id=94a30e1935c25512dd98571301000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7f6c20210f50>)
(main_task pid=4129158) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=4129158) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=4129158) File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/single_controller/ray/base.py", line 404, in func
(main_task pid=4129158) return getattr(self.worker_dict[key], name)(*args, **kwargs)
(main_task pid=4129158) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=4129158) File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/single_controller/base/decorator.py", line 209, in inner
(main_task pid=4129158) return func(*args, **kwargs)
(main_task pid=4129158) ^^^^^^^^^^^^^^^^^^^^^
(main_task pid=4129158) File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/workers/fsdp_workers.py", line 417, in generate_sequences
(main_task pid=4129158) with self.rollout_sharding_manager:
(main_task pid=4129158) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=4129158) File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/workers/sharding_manager/fsdp_vllm.py", line 64, in __enter__
(main_task pid=4129158) load_dtensor_weights(
(main_task pid=4129158) File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/workers/rollout/vllm_rollout/dtensor_weight_loaders.py", line 390, in load_dtensor_weights
(main_task pid=4129158) weight_loader(actor_weights, vllm_model)
(main_task pid=4129158) File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/workers/rollout/vllm_rollout/dtensor_weight_loaders.py", line 255, in qwen2vl_dtensor_weight_loader
(main_task pid=4129158) weight_loader(vllm_param, local_actor_weight.to(dtype=vllm_param.dtype))
(main_task pid=4129158) File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/vllm/model_executor/model_loader/weight_utils.py", line 587, in default_weight_loader
(main_task pid=4129158) param.data.copy_(loaded_weight)
(main_task pid=4129158) File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/_compile.py", line 32, in inner
(main_task pid=4129158) return disable_fn(*args, **kwargs)
(main_task pid=4129158) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=4129158) File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 745, in _fn
(main_task pid=4129158) return fn(*args, **kwargs)
(main_task pid=4129158) ^^^^^^^^^^^^^^^^^^^
(main_task pid=4129158) File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/tensor/_api.py", line 346, in __torch_dispatch__
(main_task pid=4129158) return DTensor._op_dispatcher.dispatch(
(main_task pid=4129158) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=4129158) File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/tensor/_dispatch.py", line 167, in dispatch
(main_task pid=4129158) op_info = self.unwrap_to_op_info(op_call, args, kwargs)
(main_task pid=4129158) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=4129158) File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/tensor/_dispatch.py", line 372, in unwrap_to_op_info
(main_task pid=4129158) self._try_replicate_spec_for_scalar_tensor(op_call, arg, mesh)
(main_task pid=4129158) File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/tensor/_dispatch.py", line 473, in _try_replicate_spec_for_scalar_tensor
(main_task pid=4129158) raise RuntimeError(
(main_task pid=4129158) RuntimeError: aten.copy_.default: got mixed torch.Tensor and DTensor, need to convert all torch.Tensor to DTensor before calling distributed operators!
(WorkerDict pid=4130100)
Capturing CUDA graph shapes: 100%|██████████| 35/35 [00:20<00:00, 1.04s/it]
Capturing CUDA graph shapes: 100%|██████████| 35/35 [00:20<00:00, 1.70it/s]
(WorkerDict pid=4130099) /seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:690: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . [repeated 3x across cluster]
(WorkerDict pid=4130099) warnings.warn( [repeated 3x across cluster]
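The actual failure is the RuntimeError raised while the rollout sharding manager syncs FSDP weights into vLLM: default_weight_loader ends up calling param.data.copy_() with a plain torch.Tensor parameter on one side and a DTensor on the other, and DTensor rejects the mixed operation. In other words, local_actor_weight is still a DTensor by the time it reaches qwen2vl_dtensor_weight_loader. A likely workaround is to materialize the DTensor into a plain tensor before handing it to the vLLM weight loader; a minimal sketch (illustrative only, not the repository's actual code, and the helper name is made up):

```python
import torch
from torch.distributed.tensor import DTensor


def to_plain_tensor(w: torch.Tensor) -> torch.Tensor:
    # Gather a DTensor into an ordinary torch.Tensor on this rank so that
    # param.data.copy_() never sees a mixed torch.Tensor / DTensor pair.
    return w.full_tensor() if isinstance(w, DTensor) else w


# Hypothetical use around dtensor_weight_loaders.py line 255:
# local_actor_weight = to_plain_tensor(actor_weight)
# weight_loader(vllm_param, local_actor_weight.to(dtype=vllm_param.dtype))
```

With that conversion in place the copy into the vLLM parameter becomes a plain tensor-to-tensor copy_, so the dispatcher error above should no longer trigger.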