
An error occurs: RuntimeError: aten.copy_.default: got mixed torch.Tensor and DTensor, need to convert all torch.Tensor to DTensor before calling distributed operators! #105

@ddzovo

Description


2025-10-11 22:12:51,592 INFO worker.py:1951 -- Started a local Ray instance.
(WorkerDict pid=4130101) Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
(WorkerDict pid=4130101) You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
(WorkerDict pid=4129551) Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
(WorkerDict pid=4130101) [rank3]:[W1011 22:13:34.127253242 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 3] using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
(WorkerDict pid=4129551) Loading checkpoint shards: 50%|█████ | 1/2 [00:03<00:03, 3.01s/it]
(WorkerDict pid=4129551) Loading checkpoint shards: 100%|██████████| 2/2 [00:05<00:00, 2.74s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:05<00:00, 2.78s/it]
(WorkerDict pid=4129551) Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False. [repeated 3x across cluster]
(WorkerDict pid=4130099) You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour [repeated 2x across cluster]
(WorkerDict pid=4130099) [rank1]:[W1011 22:13:34.127245472 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 1] using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id. [repeated 2x across cluster]
(WorkerDict pid=4129551) Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
(WorkerDict pid=4129551) Loading checkpoint shards: 50%|█████ | 1/2 [00:02<00:02, 2.92s/it]
(WorkerDict pid=4129551) [rank0]:[W1011 22:13:40.446092096 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
(WorkerDict pid=4129551) Loading checkpoint shards: 100%|██████████| 2/2 [00:05<00:00, 2.68s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:05<00:00, 2.72s/it]
(WorkerDict pid=4129551) Loading safetensors checkpoint shards: 0% Completed | 0/2 [00:00<?, ?it/s]
(WorkerDict pid=4129551) Loading safetensors checkpoint shards: 50% Completed | 1/2 [00:01<00:01, 1.43s/it]
(WorkerDict pid=4129551) Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:03<00:00, 1.59s/it]
(WorkerDict pid=4129551) Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:03<00:00, 1.57s/it]
(WorkerDict pid=4129551) Capturing CUDA graph shapes: 0%| | 0/35 [00:00<?, ?it/s]
(WorkerDict pid=4129551) Capturing CUDA graph shapes: 3%|▎ | 1/35 [00:00<00:22, 1.52it/s]
(WorkerDict pid=4130100) Capturing CUDA graph shapes: 0%| | 0/35 [00:00<?, ?it/s]
(WorkerDict pid=4129551) Capturing CUDA graph shapes: 31%|███▏ | 11/35 [00:05<00:12, 1.91it/s] [repeated 20x across cluster]
(WorkerDict pid=4129551) Capturing CUDA graph shapes: 60%|██████ | 21/35 [00:11<00:07, 1.85it/s] [repeated 20x across cluster]
(WorkerDict pid=4129551) Capturing CUDA graph shapes: 89%|████████▊ | 31/35 [00:16<00:02, 1.83it/s] [repeated 20x across cluster]
(WorkerDict pid=4129551) Capturing CUDA graph shapes: 91%|█████████▏| 32/35 [00:17<00:01, 1.87it/s]
(WorkerDict pid=4129551) Capturing CUDA graph shapes: 100%|██████████| 35/35 [00:20<00:00, 1.05s/it]
Capturing CUDA graph shapes: 100%|██████████| 35/35 [00:20<00:00, 1.70it/s]
(WorkerDict pid=4129551) /seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:690: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict . Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
(WorkerDict pid=4129551)   warnings.warn(
(WorkerDict pid=4130100) Capturing CUDA graph shapes: 89%|████████▊ | 31/35 [00:16<00:02, 1.85it/s]
(WorkerDict pid=4130100) Capturing CUDA graph shapes: 97%|█████████▋| 34/35 [00:18<00:00, 1.90it/s] [repeated 5x across cluster]
(main_task pid=4129158) Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::WorkerDict.actor_rollout_generate_sequences() (pid=4130101, ip=172.22.4.103, actor_id=ee12783c35d9f6b46fc7b79601000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7fc757258e00>)
(main_task pid=4129158)   File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/single_controller/ray/base.py", line 404, in func
(main_task pid=4129158)     return getattr(self.worker_dict[key], name)(*args, **kwargs)
(main_task pid=4129158)   File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/single_controller/base/decorator.py", line 209, in inner
(main_task pid=4129158)     return func(*args, **kwargs)
(main_task pid=4129158)   File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/workers/fsdp_workers.py", line 417, in generate_sequences
(main_task pid=4129158)     with self.rollout_sharding_manager:
(main_task pid=4129158)   File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/workers/sharding_manager/fsdp_vllm.py", line 64, in __enter__
(main_task pid=4129158)     load_dtensor_weights(
(main_task pid=4129158)   File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/workers/rollout/vllm_rollout/dtensor_weight_loaders.py", line 390, in load_dtensor_weights
(main_task pid=4129158)     weight_loader(actor_weights, vllm_model)
(main_task pid=4129158)   File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/workers/rollout/vllm_rollout/dtensor_weight_loaders.py", line 255, in qwen2vl_dtensor_weight_loader
(main_task pid=4129158)     weight_loader(vllm_param, local_actor_weight.to(dtype=vllm_param.dtype))
(main_task pid=4129158)   File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/vllm/model_executor/model_loader/weight_utils.py", line 587, in default_weight_loader
(main_task pid=4129158)     param.data.copy_(loaded_weight)
(main_task pid=4129158)   File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/_compile.py", line 32, in inner
(main_task pid=4129158)     return disable_fn(*args, **kwargs)
(main_task pid=4129158)   File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 745, in _fn
(main_task pid=4129158)     return fn(*args, **kwargs)
(main_task pid=4129158)   File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/tensor/_api.py", line 346, in __torch_dispatch__
(main_task pid=4129158)     return DTensor._op_dispatcher.dispatch(
(main_task pid=4129158)   File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/tensor/_dispatch.py", line 167, in dispatch
(main_task pid=4129158)     op_info = self.unwrap_to_op_info(op_call, args, kwargs)
(main_task pid=4129158)   File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/tensor/_dispatch.py", line 372, in unwrap_to_op_info
(main_task pid=4129158)     self._try_replicate_spec_for_scalar_tensor(op_call, arg, mesh)
(main_task pid=4129158)   File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/tensor/_dispatch.py", line 473, in _try_replicate_spec_for_scalar_tensor
(main_task pid=4129158)     raise RuntimeError(
(main_task pid=4129158) RuntimeError: aten.copy_.default: got mixed torch.Tensor and DTensor, need to convert all torch.Tensor to DTensor before calling distributed operators!

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/trainer/main.py", line 183, in <module>
    main()
  File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/trainer/main.py", line 45, in main
    ray.get(main_task.remote(ppo_config))
  File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
    return func(*args, **kwargs)
  File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/ray/_private/worker.py", line 2882, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/ray/_private/worker.py", line 968, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::main_task() (pid=4129158, ip=172.22.4.103)
  File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/trainer/main.py", line 96, in main_task
    trainer.fit()
  File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/trainer/ray_trainer.py", line 779, in fit
    gen_batch_output = self.actor_rollout_wg.generate_sequences(gen_batch)
  File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/single_controller/ray/base.py", line 46, in func
    output = ray.get(output)
ray.exceptions.RayTaskError(RuntimeError): ray::WorkerDict.actor_rollout_generate_sequences() (pid=4129551, ip=172.22.4.103, actor_id=94e8fdbcef4df9cf94ab1dfa01000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7ef371281190>)
  File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/single_controller/ray/base.py", line 404, in func
    return getattr(self.worker_dict[key], name)(*args, **kwargs)
  File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/single_controller/base/decorator.py", line 209, in inner
    return func(*args, **kwargs)
  File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/workers/fsdp_workers.py", line 417, in generate_sequences
    with self.rollout_sharding_manager:
  File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/workers/sharding_manager/fsdp_vllm.py", line 64, in __enter__
    load_dtensor_weights(
  File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/workers/rollout/vllm_rollout/dtensor_weight_loaders.py", line 390, in load_dtensor_weights
    weight_loader(actor_weights, vllm_model)
  File "/seu_share/home/220242341/seg-zero/Seg-Zero-main_20250905150799/Seg-Zero-main/verl/workers/rollout/vllm_rollout/dtensor_weight_loaders.py", line 255, in qwen2vl_dtensor_weight_loader
    weight_loader(vllm_param, local_actor_weight.to(dtype=vllm_param.dtype))
  File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/vllm/model_executor/model_loader/weight_utils.py", line 587, in default_weight_loader
    param.data.copy_(loaded_weight)
  File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/_compile.py", line 32, in inner
    return disable_fn(*args, **kwargs)
  File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 745, in _fn
    return fn(*args, **kwargs)
  File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/tensor/_api.py", line 346, in __torch_dispatch__
    return DTensor._op_dispatcher.dispatch(
  File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/tensor/_dispatch.py", line 167, in dispatch
    op_info = self.unwrap_to_op_info(op_call, args, kwargs)
  File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/tensor/_dispatch.py", line 372, in unwrap_to_op_info
    self._try_replicate_spec_for_scalar_tensor(op_call, arg, mesh)
  File "/seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/tensor/_dispatch.py", line 473, in _try_replicate_spec_for_scalar_tensor
    raise RuntimeError(
RuntimeError: aten.copy_.default: got mixed torch.Tensor and DTensor, need to convert all torch.Tensor to DTensor before calling distributed operators!
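
For what it's worth, the failure does not look specific to verl or vLLM: any aten op that mixes a plain torch.Tensor with a DTensor is rejected the same way, and here vllm_param.data is a plain tensor while local_actor_weight is still an FSDP DTensor. Below is a minimal sketch (not from the repo, all names illustrative) that reproduces the same message on a single CPU rank, assuming PyTorch ≥ 2.5 where torch.distributed.tensor is public:

```python
# Minimal sketch: trigger "aten.copy_.default: got mixed torch.Tensor and DTensor"
# with a single-rank gloo process group on CPU.
import os
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import distribute_tensor, Replicate

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

mesh = init_device_mesh("cpu", (1,))
# A DTensor, standing in for the FSDP-sharded actor weight.
sharded_weight = distribute_tensor(torch.randn(4, 4), mesh, [Replicate()])
# A plain tensor, standing in for vllm_param.data inside default_weight_loader.
vllm_param = torch.empty(4, 4)

# Equivalent to param.data.copy_(loaded_weight) with a DTensor source:
# raises RuntimeError: aten.copy_.default: got mixed torch.Tensor and DTensor, ...
vllm_param.copy_(sharded_weight)
```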
(The same unhandled-error traceback, ending in the same RuntimeError, is reported by main_task for ray::WorkerDict.actor_rollout_generate_sequences() on pid=4130100, actor_id=60ce5139a15e106b4341d87601000000, and on pid=4130099, actor_id=94a30e1935c25512dd98571301000000.)
(WorkerDict pid=4130100) Capturing CUDA graph shapes: 100%|██████████| 35/35 [00:20<00:00, 1.04s/it]
Capturing CUDA graph shapes: 100%|██████████| 35/35 [00:20<00:00, 1.70it/s]
(WorkerDict pid=4130099) /seu_share/home/220242341/miniconda3/envs/visionreasoner/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:690: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict . Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . [repeated 3x across cluster]
(WorkerDict pid=4130099)   warnings.warn( [repeated 3x across cluster]
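
In case it helps triage: I have not verified this against the code base, but since the traceback shows a DTensor being handed to vLLM's default_weight_loader, one way to make the types consistent would be to gather the DTensor into a regular tensor before the copy in qwen2vl_dtensor_weight_loader (dtensor_weight_loaders.py, around line 255). A sketch of that idea follows; the helper name _to_full_tensor is mine, not from the repo, and DTensor.full_tensor() is the stock PyTorch API that gathers the shards into a plain torch.Tensor.

```python
# Hypothetical helper sketched against the traceback above: ensure the weight
# passed to vLLM's weight_loader is a plain torch.Tensor, not a DTensor.
import torch
from torch.distributed.tensor import DTensor


def _to_full_tensor(weight: torch.Tensor) -> torch.Tensor:
    # full_tensor() gathers all shards and returns a regular torch.Tensor.
    if isinstance(weight, DTensor):
        return weight.full_tensor()
    return weight


# Inside qwen2vl_dtensor_weight_loader, the failing call
#     weight_loader(vllm_param, local_actor_weight.to(dtype=vllm_param.dtype))
# would then be fed a gathered weight, e.g.:
#     local_actor_weight = _to_full_tensor(actor_weight)
#     weight_loader(vllm_param, local_actor_weight.to(dtype=vllm_param.dtype))
```

The error message suggests the opposite direction (wrapping the destination as a DTensor) would also satisfy the dispatcher, but gathering to a full tensor matches what default_weight_loader's param.data.copy_ expects. Has anyone else hit this with the Qwen2-VL rollout?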
