Description
Hi author, thanks for your great code. When I run your training script in a multi-GPU setting, I get the error below. Could you check the code? Here is the command I used:
uv run accelerate launch --num_processes 2 scripts/train.py configs/ltx2_av_lora_low_vram.yaml
Training 0/2000 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Starting... 0:03:59 ETA: --:--
[rank1]: Traceback (most recent call last):
[rank1]: File "/home/vsw/Desktop/LTX-2/packages/ltx-trainer/scripts/train.py", line 64, in <module>
[rank1]: app()
[rank1]: File "/home/vsw/Desktop/LTX-2/.venv/lib/python3.10/site-packages/typer/main.py", line 336, in __call__
[rank1]: raise e
[rank1]: File "/home/vsw/Desktop/LTX-2/.venv/lib/python3.10/site-packages/typer/main.py", line 319, in __call__
[rank1]: return get_command(self)(*args, **kwargs)
[rank1]: File "/home/vsw/Desktop/LTX-2/.venv/lib/python3.10/site-packages/click/core.py", line 1485, in __call__
[rank1]: return self.main(*args, **kwargs)
[rank1]: File "/home/vsw/Desktop/LTX-2/.venv/lib/python3.10/site-packages/typer/core.py", line 719, in main
[rank1]: return _main(
[rank1]: File "/home/vsw/Desktop/LTX-2/.venv/lib/python3.10/site-packages/typer/core.py", line 189, in _main
[rank1]: rv = self.invoke(ctx)
[rank1]: File "/home/vsw/Desktop/LTX-2/.venv/lib/python3.10/site-packages/click/core.py", line 1269, in invoke
[rank1]: return ctx.invoke(self.callback, **ctx.params)
[rank1]: File "/home/vsw/Desktop/LTX-2/.venv/lib/python3.10/site-packages/click/core.py", line 824, in invoke
[rank1]: return callback(*args, **kwargs)
[rank1]: File "/home/vsw/Desktop/LTX-2/.venv/lib/python3.10/site-packages/typer/main.py", line 706, in wrapper
[rank1]: return callback(**use_params)
[rank1]: File "/home/vsw/Desktop/LTX-2/packages/ltx-trainer/scripts/train.py", line 60, in main
[rank1]: trainer.train(disable_progress_bars=disable_progress_bars)
[rank1]: File "/home/vsw/Desktop/LTX-2/packages/ltx-trainer/src/ltx_trainer/trainer.py", line 169, in train
[rank1]: loss = self._training_step(batch)
[rank1]: File "/home/vsw/Desktop/LTX-2/packages/ltx-trainer/src/ltx_trainer/trainer.py", line 312, in _training_step
[rank1]: video_embeds, audio_embeds, attention_mask = self._text_encoder._run_connectors(
[rank1]: File "/home/vsw/Desktop/LTX-2/packages/ltx-core/src/ltx_core/text_encoders/gemma/encoders/av_encoder.py", line 53, in _run_connectors
[rank1]: encoded, encoded_connector_attention_mask = self.embeddings_connector(
[rank1]: File "/home/vsw/Desktop/LTX-2/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
[rank1]: return self._call_impl(*args, **kwargs)
[rank1]: File "/home/vsw/Desktop/LTX-2/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
[rank1]: return forward_call(*args, **kwargs)
[rank1]: File "/home/vsw/Desktop/LTX-2/packages/ltx-core/src/ltx_core/text_encoders/gemma/embeddings_connector.py", line 173, in forward
[rank1]: hidden_states, attention_mask = self._replace_padded_with_learnable_registers(hidden_states, attention_mask)
[rank1]: File "/home/vsw/Desktop/LTX-2/packages/ltx-core/src/ltx_core/text_encoders/gemma/embeddings_connector.py", line 148, in _replace_padded_with_learnable_registers
[rank1]: hidden_states = flipped_mask * adjusted_hidden_states + (1 - flipped_mask) * learnable_registers
[rank1]: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
Training 0/2000 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Starting... 0:04:16 ETA: --:--
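The failing line blends the padded hidden states with a learnable register tensor, and the error says its operands live on different GPUs: rank 1 holds activations on cuda:1, while at least one operand is still on cuda:0. The traceback does not say which one; the `learnable_registers` parameter or the mask are plausible candidates. Below is a minimal sketch of a device-safe version of that blend. The function name and tensor shapes are hypothetical and this is not the ltx-core implementation; as in the failing expression, a mask value of 1 keeps the hidden state and 0 substitutes the register.

```python
import torch


def replace_padded_with_registers(
    adjusted_hidden_states: torch.Tensor,  # (batch, seq, dim), hypothetical shape
    flipped_mask: torch.Tensor,            # (batch, seq, 1): 1 = keep token, 0 = padding
    learnable_registers: torch.Tensor,     # (seq, dim), broadcast over the batch
) -> torch.Tensor:
    """Hypothetical stand-in for the blend in _replace_padded_with_learnable_registers.

    Every operand is moved to the device (and dtype) of the hidden states first,
    so a parameter left on cuda:0 cannot clash with activations on cuda:1.
    Tensor.to() is differentiable, so gradients still reach the register parameter.
    """
    device, dtype = adjusted_hidden_states.device, adjusted_hidden_states.dtype
    flipped_mask = flipped_mask.to(device=device, dtype=dtype)
    learnable_registers = learnable_registers.to(device=device, dtype=dtype)
    return flipped_mask * adjusted_hidden_states + (1 - flipped_mask) * learnable_registers


if __name__ == "__main__":
    # CPU-only demo: two sequences of length 4, the first with two padded tokens.
    hidden = torch.ones(2, 4, 3)
    mask = torch.tensor([[1, 1, 0, 0], [1, 1, 1, 0]]).unsqueeze(-1)
    registers = torch.zeros(4, 3)
    out = replace_padded_with_registers(hidden, mask, registers)
    print(out[0, :, 0])  # tensor([1., 1., 0., 0.])
```

Moving operands inside `forward` avoids the crash but copies the parameter on every call. If the root cause is that the text encoder's connector was loaded onto a fixed device (an assumption, not confirmed by the traceback), a cleaner fix would be to place that module on each process's `accelerator.device` from Hugging Face Accelerate before training, so every rank keeps its own copy on its own GPU.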