@Affan88888
When running the demo with a study that contains only one DICOM/MP4, encode_study crashes with:
RuntimeError: Tensors must have same number of dimensions: got 2 and 1
This PR makes get_views (in model.py) always return a 2-D tensor of shape (N, 11), even when N=1, so concatenation with the video features of shape (N, 512) succeeds for any batch size.

get_views previously did:
stack_of_view_encodings = torch.stack([torch.nn.functional.one_hot(out_views,11)]).squeeze().to(self.device)

  • F.one_hot(out_views, 11) → (N, 11)
  • torch.stack([ ... ]) → (1, N, 11)
  • .squeeze() removes all size-1 dims, so for N=1 the (1, 1, 11) result collapses to (11,) (rank-1), which cannot be concatenated with the (1, 512) feature tensor along dim=1.
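The collapse can be reproduced in isolation. A minimal sketch (CPU only, with hypothetical view-index and feature values standing in for the real model outputs):

```python
import torch

# A 1-item study (N=1), as in the failing demo run.
out_views = torch.tensor([3])          # hypothetical view-class index
features = torch.randn(1, 512)         # stand-in for the (N, 512) video features

# Original line: stack adds a leading dim, squeeze then strips ALL size-1 dims.
enc = torch.stack([torch.nn.functional.one_hot(out_views, 11)]).squeeze()
print(enc.shape)                       # torch.Size([11]) — the batch dim is gone

try:
    torch.cat([features, enc], dim=1)
except RuntimeError as e:
    print(e)                           # same dimension-mismatch error as above
```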

What’s changed:
Replaced the above with a shape-safe line that preserves the batch dimension and matches dtype/device:
stack_of_view_encodings = torch.nn.functional.one_hot(out_views, num_classes=11).float().to(self.device) # (N, 11)

  • No stack / no squeeze → batch dim is retained for N=1.
  • .float() → matches the feature tensor dtype for torch.cat.
  • .to(self.device) → keeps tensors on the same device.

Expected behavior after this change:
The same demo run completes without error on a single-file study.
Shapes:

  • features: torch.Size([1, 512])
  • views: torch.Size([1, 11])
  • concatenated: torch.Size([1, 523])
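A minimal sketch of the replacement line and the resulting shapes (the .to(self.device) move is omitted here so the example runs on CPU; the view index is a hypothetical stand-in):

```python
import torch

out_views = torch.tensor([3])          # hypothetical view-class index, N=1
features = torch.randn(1, 512)         # stand-in for the video features

# Shape-safe replacement: one_hot alone keeps the (N, 11) batch dim,
# and .float() matches the feature dtype for torch.cat.
views = torch.nn.functional.one_hot(out_views, num_classes=11).float()
combined = torch.cat([features, views], dim=1)

print(features.shape)   # torch.Size([1, 512])
print(views.shape)      # torch.Size([1, 11])
print(combined.shape)   # torch.Size([1, 523])
```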

Backwards compatibility:

  • For N>1, return shape remains (N, 11). No changes to callers.
  • Only behavioral change is avoiding accidental rank-1 tensors for N=1.
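The multi-file case can be checked the same way. A sketch with a hypothetical 3-file study (N=3), showing the return shape callers already expect:

```python
import torch

# With N>1 the replacement returns the same (N, 11) shape the old code did.
out_views = torch.tensor([3, 0, 7])    # hypothetical view-class indices
views = torch.nn.functional.one_hot(out_views, num_classes=11).float()
features = torch.randn(3, 512)
combined = torch.cat([features, views], dim=1)

print(views.shape)      # torch.Size([3, 11])
print(combined.shape)   # torch.Size([3, 523])
```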
