
MACSIMA: Parsing subfolders is not intuitive when the machine writes subfolder for each cycle #378

@MSHelm


Hey everyone,
As initially reported by @nbonine, when preprocessing is performed during acquisition of MACSima data, a separate folder is created for each cycle.
The current reader handles this with the preprocessed_multiple_folders parsing style, which produces separate Image, Table and coordinate system elements for each cycle. This is not what a user typically expects, because all these cycles belong together and are usually analyzed together. There is currently no straightforward way for the user to specify this.
For example:

my_data
- 3_Scan2
--- some_images.tif
- 6_Cycle1
--- some_more_images.tif
- 7_Cycle2
--- even_more_images.tif

is parsed into:

SpatialData object
├── Images
│     ├── '3_Scan2_image': DataTree[cyx] (4, 15275, 27678), (4, 7637, 13839), (4, 3818, 6919), (4, 1909, 3459), (4, 954, 1729)
│     ├── '6_Cycle1_image': DataTree[cyx] (4, 15275, 27678), (4, 7637, 13839), (4, 3818, 6919), (4, 1909, 3459), (4, 954, 1729)
│     └── '7_Cycle2_image': DataTree[cyx] (4, 15275, 27678), (4, 7637, 13839), (4, 3818, 6919), (4, 1909, 3459), (4, 954, 1729)
└── Tables
      ├── '3_Scan2_table': AnnData (0, 4)
      ├── '6_Cycle1_table': AnnData (0, 4)
      └── '7_Cycle2_table': AnnData (0, 4)
with coordinate systems:
    ▸ '3_Scan2', with elements:
        3_Scan2_image (Images)
    ▸ '6_Cycle1', with elements:
        6_Cycle1_image (Images)
    ▸ '7_Cycle2', with elements:
        7_Cycle2_image (Images)

I propose the following:

  • Deprecate the auto-discovery of the parsing style (the current default!).
  • Make preprocessed_single_folder the new default instead, and change it so that all tifs in the specified path and in all of its subdirectories are parsed together into 1 Image element. This would handle both the regular case (1 folder with all tifs) and the case that occurs when preprocessing is run during acquisition (several subfolders, each with the tifs of 1 cycle).
  • Keep the preprocessed_multiple_folders option for users who want to do batch analysis of multiple ROIs. For example, a user could have multiple ROIs of a single well that are saved into separate folders. In these cases the images of the subfolders should stay separated, because they describe different image stacks.
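
To make the difference between the two styles concrete, here is a minimal sketch of the file discovery step only. It is not the reader's actual implementation: the function names `collect_single_stack` and `collect_per_subfolder` are hypothetical, and I am assuming tifs are identified by their `.tif`/`.tiff` extension.

```python
from pathlib import Path

TIF_SUFFIXES = {".tif", ".tiff"}  # assumption: tifs are recognised by extension only


def collect_single_stack(root):
    """Proposed default: every tif under root, subfolders included, forms one stack."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in TIF_SUFFIXES
    )


def collect_per_subfolder(root):
    """Batch style: each direct subfolder of root is its own stack (e.g. one per ROI)."""
    root = Path(root)
    stacks = {}
    for sub in sorted(d for d in root.iterdir() if d.is_dir()):
        files = collect_single_stack(sub)
        if files:  # skip subfolders that contain no tifs
            stacks[sub.name] = files
    return stacks
```

With the `my_data` layout from above, `collect_single_stack("my_data")` would return all three tifs as one list (so one Image element), while `collect_per_subfolder("my_data")` would return one entry each for `3_Scan2`, `6_Cycle1` and `7_Cycle2`, which matches the current behaviour.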

I will submit an example implementation of this. But since this touches the default settings of the reader, and I am not sure what @berombau originally intended with the different parsing styles, I would love to have a discussion on this.
