Skip to content

Comments

Feature/performance#18

Open
davidetorre99 wants to merge 6 commits intoteamtomo:mainfrom
davidetorre99:feature/performance
Open

Feature/performance#18
davidetorre99 wants to merge 6 commits intoteamtomo:mainfrom
davidetorre99:feature/performance

Conversation

@davidetorre99
Copy link
Contributor

@McHaillet @alisterburt with this commit I tried to address #12 #13 #15 #16. Sorry if it comes as a single commit, but I thought that they were largely interconnected. A short summary:

RE: #12 get DFT patches from torch-subpixel-crop:
Added support for Fourier space patch extraction using return_rfft=True in subpixel_crop_2d. Patches are now extracted directly in Fourier space and passed to insert_central_slices_rfft_3d_multichannel for reconstruction. That is the same as before, with backproject_2d_to_3d from torch-fourier-slice, but skipping one FFT step. I like the clarity here, but maybe it's too redundant. Do you think it would be better to directly add the option to backproject_2d_to_3d, or do you have a more streamlined way of reconstruction in mind?

RE: #13 batched reconstruction:
All patches at all positions are now reconstructed simultaneously using the multichannel from insert_central_slices_rfft_3d_multichannel. Added reconstruct_patches_batched() method that should processes everything in parallel. For tiling I swapped the for loop with einops.rearrange.

RE #15 class methods to load etomo and aretomo data directly:
Added Tomogram.from_etomo_directory() and Tomogram.from_aretomo_aln() class methods. I like the idea of only providing the path and then having everything handled within torch-tomogram (I updated the examples for both etomo and aretomo reconstruction to show what it would look like). However, perhaps providing the dataframe as input might offer more flexibility?

RE #16 define translations in Angstroms:
Added optional pixel_spacing parameter. When provided, sample_translations are stored in Angstroms and converted to pixels internally via sample_translations_px. Not sure if that was the way you were envisioning this, though

I tested it on my data and it all works fine.
I'd love to know what you guys think!

Copy link

@alisterburt alisterburt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work here as always @davidetorre99 - will follow up with another comment shortly

corrected_shifts_yx = corrected_shifts_yx * pixel_spacing

tilt_stack_full = mrcfile.read(tilt_stack_path)
included_indices = df["sec"].values - 1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this makes sense, maybe a comment would be useful

alnfile docs say sec corresponds to zero indexed indices into the tilt stack, this - 1 implies the values are one indexed?

also should be made clear that stack is expected to be aretomo output stack (avoid users supplying aretomo input stack and this code implicitly accounting for dark frames which haven't been removed from that stack)

Suggested change
included_indices = df["sec"].values - 1
idx_valid = df["sec"].values - 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies, my mistake. I incorrectly wrote '0 indexed' in the alnfile readme, but it is actually '1 indexed'.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no big deal, glad we caught it in review! can hopefully avoid someone getting confused in the future 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it looks like AreTomo3 switched the SEC indexing from 0-based to 1-based starting in v2.0.10. This isn’t documented in their manual yet, but I noticed this inconsistency by testing alnfile on different versions (I think this is the commit when they introduced ToOneBased: czimaginginstitute/AreTomo3@048b161)

I’ll update alnfile so it automatically handles both conventions internally, while keeping the SEC output normalized to 0-based. For most downstream applications, 0-based indexing seems to be the more practical and consistent choice, but if you have a preference for keeping it 1-based for some workflows, I’m happy to discuss it!

Comment on lines 72 to 77
tilt_stack_full = mrcfile.read(tilt_stack_path)
included_indices = df["sec"].values - 1
tilt_stack = tilt_stack_full[included_indices]
tilt_stack = tilt_stack.astype(np.float32)
tilt_stack -= np.mean(tilt_stack, axis=(-2, -1), keepdims=True)
tilt_stack /= np.std(tilt_stack, axis=(-2, -1), keepdims=True)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hardcoding normalization logic here feels icky...

also I'm used to normalizing on a central 25% crop these days to avoid most cases of hole edges coming into view

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree, not really sure where to include it though. Perhaps a separate function would make it a bit cleaner?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spitballing: maybe the class doesn't hold a reference to the stack and instead any methods which require the image data take the image as input? This could be nice for serialization too... what do you think?

@alisterburt
Copy link

alisterburt commented Nov 13, 2025

RE: #12 get DFT patches from torch-subpixel-crop: Added support for Fourier space patch extraction using return_rfft=True in subpixel_crop_2d. Patches are now extracted directly in Fourier space and passed to insert_central_slices_rfft_3d_multichannel for reconstruction. That is the same as before, with backproject_2d_to_3d from torch-fourier-slice, but skipping one FFT step. I like the clarity here, but maybe it's too redundant. Do you think it would be better to directly add the option to backproject_2d_to_3d, or do you have a more streamlined way of reconstruction in mind?

great q, love that you're thinking about this!
I think dropping to the lower level here for performance is the right move

RE: #13 batched reconstruction: All patches at all positions are now reconstructed simultaneously using the multichannel from insert_central_slices_rfft_3d_multichannel. Added reconstruct_patches_batched() method that should processes everything in parallel. For tiling I swapped the for loop with einops.rearrange.

batching is a really great addition here! I hesitated initially because this locks all patches into reconstructing with identical rotations but... nobody has implemented spatially varying tilt angles so let's not get ahead of ourselves 😂 this is great.

During tomogram reconstruction I imagine this blows up memory usage fairly quickly if tomograms are large and you reconstruct everything at once... might be worth taking a batch size parameter there for when people inevitably run into problems

RE #15 class methods to load etomo and aretomo data directly: Added Tomogram.from_etomo_directory() and Tomogram.from_aretomo_aln() class methods. I like the idea of only providing the path and then having everything handled within torch-tomogram (I updated the examples for both etomo and aretomo reconstruction to show what it would look like). However, perhaps providing the dataframe as input might offer more flexibility?

I'm also not sure what's best here and again, love that you're thinking critically about these things. The purist in me says this package should operate on in memory data only, the pragmatist in me says that the use case for quickly enabling people with AT/Etomo alignments to get up and running is a clear win

RE #16 define translations in Angstroms: Added optional pixel_spacing parameter. When provided, sample_translations are stored in Angstroms and converted to pixels internally via sample_translations_px. Not sure if that was the way you were envisioning this, though

Wonderful, suggested making it a required input to avoid later issues with CTF accuracy 🙂

I tested it on my data and it all works fine. I'd love to know what you guys think!

I think this is incredible, really exciting to have an actual easy to use torch interface to reconstructing tomograms/local subvolumes

@alisterburt
Copy link

@davidetorre99 thinking about this, now that you've got pixel spacing we should probably

  1. make sure positions for local reconstruction are also defined in angstroms relative to the tomogram center
  2. make sure we can do local reconstructions at arbitrary pixel sizes

With 2 we should be able to use torch-fourier-rescale, two notes there:

  • might be a good chance to properly verify and merge the unmerged API change in add ability to specify target shape in rescale API torch-fourier-rescale#19
  • errors in pixel sizes can get large when rescaling in fourier space with small box sizes... I'd look at how meaningful those errors are at typical pixel sizes (stack at 0.7-4apx, recon at 0.7-20apx) and consider working in a larger box for initial rescaling if significant

no pressure to add this now/as part of this transition but if you're working in the codebase these are things I'd be thinking about 🙂

… size for reconstruction, normalize on central 25%, normalization on the methods, update tests, update examples
@davidetorre99
Copy link
Contributor Author

@alisterburt just want to make sure we're on the same page about the current implementation before splitting it into torch-tilt-series and torch-reconstruct-tomogram.
In this last commit I should have addressed all your previous comments, but I'm still not sure about a couple of things:

  • Normalization is now in the class methods from_aretomo_output() and from_etomo_directory() (normalizing on central 25% crop to avoid edge artifacts). Maybe still feels hardcoded?

  • batch_size parameter for large reconstruction: I tried it on (4096, 4096, 2560) volumes, batch size 1 and it took 3.62 GB GPU memory. It works ok, but I'm not entirely sure if this is the "most optimal" implementation.

In the next steps, I would split it into:

  • torch-tilt-series (data structure + tilt-series operations)
    --TiltSeries class
    --Data storage: images, tilt angles, tilt axis angles, translations, pixel spacing
    --import alignment methods: from_aretomo_output(), from_etomo_directory()
    --geometric operations: projection_matrices property, project_points(), extract_particle_tilt_series()
    --utility methods: to(device), coordinate transformations

  • torch-reconstruct-tomogram
    --reconstruct_subvolume(tilt_series, points_zyx, sidelength) - rank-polymorphic reconstruction
    --reconstruct_tomogram(tilt_series, volume_shape, sidelength, batch_size=None) - full volume reconstruction

I'd love to know what you think!

@McHaillet
Copy link
Contributor

@davidetorre99 Really awesome work here!

In the next steps, I would split it into:

* torch-tilt-series (data structure + tilt-series operations)
  --TiltSeries class
  --Data storage: images, tilt angles, tilt axis angles, translations, pixel spacing
  --import alignment methods: from_aretomo_output(), from_etomo_directory()
  --geometric operations: projection_matrices property, project_points(), extract_particle_tilt_series()
  --utility methods: to(device), coordinate transformations

* torch-reconstruct-tomogram
  --reconstruct_subvolume(tilt_series, points_zyx, sidelength) - rank-polymorphic reconstruction
  --reconstruct_tomogram(tilt_series, volume_shape, sidelength, batch_size=None) - full volume reconstruction

This sounds perfect. Can I help you out with one of these things?

@davidetorre99
Copy link
Contributor Author

Hey @McHaillet! Thanks a lot for the feedback! At this point it seems mainly about refactoring as specified in the previous comment, so I’ll go ahead and continue with that. If you can review the result afterward to ensure I’ve interpreted everything correctly and everything still makes sense (especially in the structure), that would be really really helpful! Thanks a lot!

@McHaillet
Copy link
Contributor

Of course, I'll review afterwards! Looking forward to see the updates :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants