Size mismatch between 'packed_noisy_model_input' and 'latent_image_ids'.

When I run `train_tdd_adv.sh`, I get the following error.
`
Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
Traceback (most recent call last):
  File "/maindata/data/shared/public/songtao.tian/test_code/Target-Driven-Distillation/train/FLUX/train_tdd_adv.py", line 1705, in <module>
    main(args)
  File "/maindata/data/shared/public/songtao.tian/test_code/Target-Driven-Distillation/train/FLUX/train_tdd_adv.py", line 1406, in main
    model_pred = transformer(
  File "/maindata/data/shared/public/songtao.tian/anaconda3/envs/tdd/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/maindata/data/shared/public/songtao.tian/anaconda3/envs/tdd/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/maindata/data/shared/public/songtao.tian/anaconda3/envs/tdd/lib/python3.10/site-packages/accelerate/utils/operations.py", line 823, in forward
    return model_forward(*args, **kwargs)
  File "/maindata/data/shared/public/songtao.tian/anaconda3/envs/tdd/lib/python3.10/site-packages/accelerate/utils/operations.py", line 811, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/maindata/data/shared/public/songtao.tian/anaconda3/envs/tdd/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
  File "/maindata/data/shared/public/songtao.tian/anaconda3/envs/tdd/lib/python3.10/site-packages/diffusers/models/transformers/transformer_flux.py", line 512, in forward
    encoder_hidden_states, hidden_states = torch.utils.checkpoint.checkpoint(
  File "/maindata/data/shared/public/songtao.tian/anaconda3/envs/tdd/lib/python3.10/site-packages/torch/_compile.py", line 32, in inner
    return disable_fn(*args, **kwargs)
  File "/maindata/data/shared/public/songtao.tian/anaconda3/envs/tdd/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
    return fn(*args, **kwargs)
  File "/maindata/data/shared/public/songtao.tian/anaconda3/envs/tdd/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 496, in checkpoint
    ret = function(*args, **kwargs)
  File "/maindata/data/shared/public/songtao.tian/anaconda3/envs/tdd/lib/python3.10/site-packages/diffusers/models/transformers/transformer_flux.py", line 507, in custom_forward
    return module(*inputs)
  File "/maindata/data/shared/public/songtao.tian/anaconda3/envs/tdd/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/maindata/data/shared/public/songtao.tian/anaconda3/envs/tdd/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/maindata/data/shared/public/songtao.tian/anaconda3/envs/tdd/lib/python3.10/site-packages/diffusers/models/transformers/transformer_flux.py", line 180, in forward
    attention_outputs = self.attn(
  File "/maindata/data/shared/public/songtao.tian/anaconda3/envs/tdd/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/maindata/data/shared/public/songtao.tian/anaconda3/envs/tdd/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/maindata/data/shared/public/songtao.tian/anaconda3/envs/tdd/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 588, in forward
    return self.processor(
  File "/maindata/data/shared/public/songtao.tian/anaconda3/envs/tdd/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 2318, in __call__
    query = apply_rotary_emb(query, image_rotary_emb)
  File "/maindata/data/shared/public/songtao.tian/anaconda3/envs/tdd/lib/python3.10/site-packages/diffusers/models/embeddings.py", line 1208, in apply_rotary_emb
    out = (x.float() * cos + x_rotated.float() * sin).to(x.dtype)
RuntimeError: The size of tensor a (4464) must match the size of tensor b (16320) at non-singleton dimension 2
`
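When I print the shapes of the tensors passed to the following call in `train_tdd_adv.py`, I find that the shape of `latent_image_ids` does not match the shape of `packed_noisy_model_input`: `packed_noisy_model_input.shape` is `torch.Size([4, 3952, 64])`, while `latent_image_ids.shape` is `torch.Size([15808, 3])`. Note that 15808 = 4 × 3952, so `latent_image_ids` seems to contain one row per token for every sample in the batch rather than one row per token.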

`                model_pred = transformer(
                    hidden_states=packed_noisy_model_input,
                    # YiYi notes: divide it by 1000 for now because we scale it by 1000 in the transformer model (we should not keep it but I want to keep the inputs same for the model for testing)
                    timestep=timesteps / 1000,
                    guidance=guidance,
                    pooled_projections=pooled_prompt_embeds,
                    encoder_hidden_states=prompt_embeds,
                    txt_ids=text_ids,
                    img_ids=latent_image_ids,
                    return_dict=False,
                )[0]`
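
The numbers in the traceback and in the printed shapes can be related with plain arithmetic. The sketch below is only a diagnostic aid, not a fix: `diagnose` and `check_img_ids` are hypothetical helpers that are not part of `train_tdd_adv.py` or diffusers, and the guard assumes that `img_ids` is expected to be 2D with one row per packed image token, by analogy with the 2D requirement that the deprecation warning states for `txt_ids`.

```python
# Shapes reported in this issue (copied from the printout and the RuntimeError).
BATCH, IMG_TOKENS, CHANNELS = 4, 3952, 64      # packed_noisy_model_input.shape
IMG_ID_ROWS = 15808                            # latent_image_ids.shape[0]
ERR_QUERY_LEN, ERR_ROTARY_LEN = 4464, 16320    # "tensor a" and "tensor b" in the error


def diagnose(batch, img_tokens, img_id_rows, query_len, rotary_len):
    """Return plain-arithmetic facts relating the reported sizes."""
    return {
        # True if the id table has one row per (sample, token) pair
        # instead of one row per token.
        "ids_repeated_per_sample": img_id_rows == batch * img_tokens,
        # Tokens left over once the image tokens are subtracted
        # from each side of the size mismatch.
        "extra_tokens_in_query": query_len - img_tokens,
        "extra_tokens_in_rotary": rotary_len - img_id_rows,
    }


def check_img_ids(hidden_shape, img_ids_shape):
    """Hypothetical guard (assumption): img_ids is 2D, one row per packed token."""
    _, seq_len, _ = hidden_shape
    if len(img_ids_shape) != 2:
        return f"img_ids should be 2D, got {len(img_ids_shape)}D"
    if img_ids_shape[0] != seq_len:
        return f"img_ids has {img_ids_shape[0]} rows, expected {seq_len}"
    return "ok"


report = diagnose(BATCH, IMG_TOKENS, IMG_ID_ROWS, ERR_QUERY_LEN, ERR_ROTARY_LEN)
print(report)
print(check_img_ids((BATCH, IMG_TOKENS, CHANNELS), (IMG_ID_ROWS, 3)))
print(check_img_ids((BATCH, IMG_TOKENS, CHANNELS), (IMG_TOKENS, 3)))
```

Both sides of the mismatch exceed their image-token counts by the same 512 (4464 - 3952 and 16320 - 15808). That would be consistent with 512 text tokens being concatenated on each side, although the prompt length is not stated in this issue. If that reading is right, the rotary embedding is 4 times too long on the image part only, which points at how `latent_image_ids` is built rather than at `packed_noisy_model_input`.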

Size mismatch between 'packed_noisy_model_input' and 'latent_image_ids'. #15
