feat/on-the-fly inference by YoniSchirris · Pull Request #87 · NKI-AI/ahcore

YoniSchirris · 2024-06-06T08:56:49Z

Fixes #73.

The commit contains some minor comments that need quick fixing.

This PR implements generating an in-memory database on-the-fly.

This is useful if you want to, e.g., run inference with a segmentation model on a set of slides for which you do not wish to generate a complete database.

That is exactly the use-case it is designed for: running inference with a segmentation model on a glob of slides from a directory, taking only the slide as input (no masks, annotations, labels, or patient information).

To achieve this, I have:

- Implemented an OnTheFlyDataDescription class, which takes fewer arguments than DataDescription.
- The most important difference in OnTheFlyDataDescription is that data_dir is used together with glob_pattern to search for WSIs. The in-memory DB is populated with the matches on-the-fly and is then used by the inference pipeline.
- To achieve this, the DataManager gets an _on_the_fly property that is set by checking the class of the data_description.
- During initialization, it populates the DB and sets an engine, in contrast to the saved DB, where a uri from the data_description is used to load the DB.
- In the session, it opens the DB either from an engine or from a uri, depending on the use-case.
- The on-the-fly population utilizes the DataManager's get_all_images function, which is only implemented for a MinimalImage table.
- The MinimalImage table is the sole, and very minimal, part of the on-the-fly DB.
- Creating the dataset from the data_description now uses a wrapper function, create_datasets_from_data_description, which, based on the (OnTheFly)DataDescription class, generates a dataset with either datasets_from_data_description_with_uri, which assumes a fully populated DB, or datasets_from_on_the_fly_data_description, which assumes a minimally populated DB with only the MinimalImage table.
- Three small test svs files are added. These come from openslide, but I could not find them in the installed openslide-python directories and decided to add them explicitly here, which is likely easier, and they may be used for any test.
- populate_minimal_db_for_inference.py provides a very simple test for DB population.
- tests/test_run_segmentation_inference_with_on_the_fly_in_memory_database.sh is a very detailed example / documentation of how to run the inference, including configs and required env variables.
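The glob-then-populate flow described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the stdlib sqlite3 module stands in for the engine/session machinery, the minimal_image table and the populate_minimal_db helper are hypothetical names, and empty placeholder files are used instead of real slides.

```python
import sqlite3
import tempfile
from pathlib import Path
from typing import List


def populate_minimal_db(data_dir: Path, glob_pattern: str) -> sqlite3.Connection:
    """Create an in-memory DB holding one row per file matched by the glob."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical single-table schema standing in for the MinimalImage table.
    conn.execute(
        "CREATE TABLE minimal_image (id INTEGER PRIMARY KEY, filename TEXT UNIQUE)"
    )
    rows = [(str(path),) for path in sorted(data_dir.glob(glob_pattern))]
    conn.executemany("INSERT INTO minimal_image (filename) VALUES (?)", rows)
    conn.commit()
    return conn


def get_all_images(conn: sqlite3.Connection) -> List[str]:
    """Return every registered slide path, in insertion order."""
    cursor = conn.execute("SELECT filename FROM minimal_image ORDER BY id")
    return [row[0] for row in cursor]


# Demo on a throwaway directory; the .svs files here are empty placeholders.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a").mkdir()
    for name in ("a/slide_1.svs", "a/slide_2.svs", "notes.txt"):
        (root / name).touch()
    conn = populate_minimal_db(root, "**/*.svs")
    image_names = [Path(p).name for p in get_all_images(conn)]
    conn.close()

print(image_names)
```

Only files matching the glob pattern end up in the table, so unrelated files in the directory (notes.txt above) are ignored.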

Possible limitations

- Can't give a mask yet, although you may want to run segmentation on only part of the image, e.g. when you have a tumor bed mask.
- I didn't know exactly which config variables to keep/remove. LMK if the choice makes sense.
- Some naming can be improved.
- May want to refactor DataManager to use an engine for both DataDescriptions, so that a session is always opened from the same kind of input. This may be easier to read and may make it easier to add new features or refactors to the DataManager. In one case it would create an engine by creating a DB and populating it; in the other case, by reading it from the uri. The session would then just be opened from the engine, instead of first creating an engine from the uri and then returning the session to which the engine is bound.
- The current DLUP version has a bug concerning the pyramidal format of the generated segmentation map.
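The refactor suggested in the fourth limitation (one way of obtaining the DB handle for both description types) might look something like the sketch below. It is only an illustration under assumptions: stdlib sqlite3 connections stand in for engines, and StoredDescription, OnTheFlyDescription, and make_connection are hypothetical names rather than ahcore classes or functions.

```python
import sqlite3
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Union


@dataclass
class StoredDescription:
    """Hypothetical stand-in for a description that points at a saved DB."""
    db_path: Path


@dataclass
class OnTheFlyDescription:
    """Hypothetical stand-in for a description that globs slides from a directory."""
    data_dir: Path
    glob_pattern: str


def make_connection(
    description: Union[StoredDescription, OnTheFlyDescription],
) -> sqlite3.Connection:
    """Single entry point: both branches return a ready-to-query connection."""
    if isinstance(description, OnTheFlyDescription):
        # Build and populate the DB in memory.
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE minimal_image (filename TEXT)")
        paths = sorted(description.data_dir.glob(description.glob_pattern))
        conn.executemany(
            "INSERT INTO minimal_image VALUES (?)", [(str(p),) for p in paths]
        )
        conn.commit()
        return conn
    # Otherwise open the previously saved DB from disk.
    return sqlite3.connect(str(description.db_path))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "slide.svs").touch()

    # Seed a small on-disk DB so the "saved" branch has something to read.
    saved = root / "saved.db"
    seed = sqlite3.connect(str(saved))
    seed.execute("CREATE TABLE minimal_image (filename TEXT)")
    seed.executemany("INSERT INTO minimal_image VALUES (?)", [("x.svs",), ("y.svs",)])
    seed.commit()
    seed.close()

    counts = []
    for description in (StoredDescription(saved), OnTheFlyDescription(root, "*.svs")):
        conn = make_connection(description)  # same call for both cases
        counts.append(conn.execute("SELECT COUNT(*) FROM minimal_image").fetchone()[0])
        conn.close()

print(counts)
```

The calling code no longer needs to know which kind of description it received; the branching lives in one place.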

YoniSchirris · 2024-06-07T12:00:26Z

In practice, I've noticed that populating the DB can take 10 minutes for 1500 WSIs. This happens when initializing the datamodule, which initializes the DataManager, which immediately populates the DB.

The biggest problem here was opening each slide with dlup to extract the mpp, width, and height, which are completely irrelevant for our task here. The image.mpp was only used in the overwrite_mpp when constructing a dataset, which is also not interesting, because if the image has no mpp, there's nothing to overwrite it with.

For now I've made the minimal image even more minimal; it only contains the file path to the slide.
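The cost difference between registering a path and opening a slide can be illustrated with a small sketch. Note that this shows a different option from the one taken above: instead of dropping the metadata entirely, it defers the read until first access. LazySlide and fake_read_metadata are hypothetical, and the returned values are made up; no real slide is opened.

```python
from functools import cached_property
from pathlib import Path
from typing import Callable, Dict, List

opened: List[Path] = []  # records which slides were actually "opened"


def fake_read_metadata(path: Path) -> Dict[str, float]:
    """Placeholder for an expensive slide open; returns made-up values."""
    opened.append(path)
    return {"mpp": 0.5, "width": 1000, "height": 800}


class LazySlide:
    """Holds only the file path; metadata is read on first access and cached."""

    def __init__(self, filename: Path, reader: Callable[[Path], Dict[str, float]]):
        self.filename = filename
        self._reader = reader

    @cached_property
    def metadata(self) -> Dict[str, float]:
        return self._reader(self.filename)


# Registering 1500 slides touches no files at all.
slides = [LazySlide(Path(f"slide_{i}.svs"), fake_read_metadata) for i in range(1500)]
opened_after_registration = len(opened)

# The first access triggers one read; the second access hits the cache.
first_mpp = slides[0].metadata["mpp"]
_ = slides[0].metadata
opened_after_access = len(opened)

print(opened_after_registration, opened_after_access, first_mpp)
```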

We may also want to think about when to populate the DB; doing so during datamodule initialization may be a bad design choice.

Honestly, we can even forgo the entire DB generation and, within datasets_from_on_the_fly_data_description, just do for image in data_description.image_dir.glob(data_description.glob_pattern): and do the rest.

No database models, no engine, no session required.
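A rough sketch of that DB-free variant, assuming a description object that exposes image_dir and glob_pattern as in the loop above. The OnTheFlyDescription dataclass, the datasets_from_glob name, and the build_dataset callable are hypothetical; in practice build_dataset would construct the per-slide dataset, whereas here it only returns the file stem.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


@dataclass
class OnTheFlyDescription:
    """Hypothetical description carrying only what the loop needs."""
    image_dir: Path
    glob_pattern: str


def datasets_from_glob(
    description: OnTheFlyDescription, build_dataset: Callable[[Path], T]
) -> Iterator[T]:
    """Yield one dataset per matched slide, without any database in between."""
    for image in sorted(description.image_dir.glob(description.glob_pattern)):
        yield build_dataset(image)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("slide_b.svs", "slide_a.svs", "readme.md"):
        (root / name).touch()
    description = OnTheFlyDescription(image_dir=root, glob_pattern="*.svs")
    # Placeholder builder: a real one would create a tiled dataset from the path.
    stems = list(datasets_from_glob(description, lambda path: path.stem))

print(stems)
```

Sorting the glob results keeps the slide order deterministic across runs, since the order returned by Path.glob is not guaranteed.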

If we do want to keep the database, because it might add some more functionality later (e.g. when doing feature extraction w/ a mask?), we may want to open it, populate it, and close it, all during the call of datasets_from_on_the_fly_data_description.
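One way to scope the DB to a single call is a context manager that opens, populates, and closes it. The sketch below is an illustration under assumptions: stdlib sqlite3 stands in for the real engine and session, and temporary_slide_db and slide_stems_via_scoped_db are hypothetical names.

```python
import sqlite3
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, List


@contextmanager
def temporary_slide_db(image_dir: Path, glob_pattern: str) -> Iterator[sqlite3.Connection]:
    """Open an in-memory DB, populate it from the glob, and always close it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE minimal_image (filename TEXT)")
        conn.executemany(
            "INSERT INTO minimal_image VALUES (?)",
            [(str(p),) for p in image_dir.glob(glob_pattern)],
        )
        conn.commit()
        yield conn
    finally:
        conn.close()


def slide_stems_via_scoped_db(image_dir: Path, glob_pattern: str) -> List[str]:
    """The DB exists only for the duration of this call."""
    with temporary_slide_db(image_dir, glob_pattern) as conn:
        cursor = conn.execute("SELECT filename FROM minimal_image ORDER BY filename")
        # The list is fully built before the connection is closed.
        return [Path(row[0]).stem for row in cursor]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("b.svs", "a.svs"):
        (root / name).touch()
    scoped_stems = slide_stems_via_scoped_db(root, "*.svs")

print(scoped_stems)
```

With this shape, nothing is populated at datamodule initialization; the cost is paid only when the datasets are actually requested.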

YoniSchirris · 2024-06-12T10:15:20Z

ahcore/callbacks/abstract_writer_callback.py

-            assert current_dataset.slide_image.identifier
-            self._dataset_sizes[current_dataset.slide_image.identifier] = len(current_dataset)
+            curr_filename = current_dataset._path
+            assert curr_filename


The assertion is redundant though, since a TiledWsiDataset always has a path.

YoniSchirris added 7 commits June 5, 2024 16:38

feat: add on-the-fly inference

4a524bc

add tox and pre-commit to dev install

9361c7d

re-add type ignore since this does not solve mypy issue

7e7e149

reduce code redundancy for table creation

e9b0c0e

fix docstring type in get_populated_in_memory_db

8d5340b

fix circular import error

b1de6da

refactor for circular imports

e68b5b2

YoniSchirris mentioned this pull request Jun 7, 2024

improvement: reduce setup time of AbstractWriterCallback #88

Open

reduce setup time of abstractwritercallback

b1f747e

YoniSchirris wants to merge 8 commits into main from on-the-fly-inference
