improvement: reduce setup time of AbstractWriterCallback

**Describe the bug**
When running inference, `AbstractWriterCallback` loops over all datasets to construct the `_dataset_sizes` dict. Each iteration opens a slide from cache several times, which can take 1-3 seconds. For a dataset of 1500 WSIs this often takes 20 minutes.

**To Reproduce**
Run inference on-the-fly (#87) with your `data_dir` and `glob_pattern` set up to find many whole-slide images.

**Expected behavior**
After the dataset statistics are printed, the callback workers should start almost immediately. Instead, you'll find that it takes a long time before they start being set up.

In my case, roughly five minutes pass between the two log lines:
```
[2024-06-07 12:24:32,332][ahcore.data.dataset.DlupDataModule][INFO] - Dataset for stage predict has 773079 samples and the following statistics:
 - Mean: 485.30
 - Std: 145.56
 - Min: 48.00
 - Max: 1056.00
[2024-06-07 12:29:30,294][ahcore.callbacks.converters.common][INFO] - Starting worker for TiffConverterCallback
```

**Environment**
dlup version: 0.3.38
How installed: unsure
Python version: 3.11.9
Operating System: Linux


Quick solution to reduce the time by half:
in https://github.com/NKI-AI/ahcore/blob/93274e5ed0859813011b81979367189a0b80a932/ahcore/callbacks/abstract_writer_callback.py#L181 change
```
assert current_dataset.slide_image.identifier
self._dataset_sizes[current_dataset.slide_image.identifier] = len(current_dataset)
```

to

```
# Access slide_image once and reuse the identifier for both the assert and the dict key
current_dataset_slide_id = current_dataset.slide_image.identifier
assert current_dataset_slide_id
self._dataset_sizes[current_dataset_slide_id] = len(current_dataset)
```
This accesses `current_dataset.slide_image` once per dataset instead of twice, which will likely reduce the time by half.
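To illustrate why the change should help, here is a minimal, self-contained sketch. It assumes that every access to `slide_image` reopens the slide, which is what the timings above suggest but which I have not verified in dlup. `FakeSlide`, `FakeSlideDataset`, `sizes_original` and `sizes_proposed` are hypothetical stand-ins written for this example, not ahcore or dlup APIs.

```python
from dataclasses import dataclass


@dataclass
class FakeSlide:
    """Hypothetical stand-in for an opened slide image."""
    identifier: str


class FakeSlideDataset:
    """Hypothetical stand-in for a per-slide dataset (not the dlup class)."""

    def __init__(self, identifier: str, n_tiles: int) -> None:
        self._identifier = identifier
        self._n_tiles = n_tiles
        self.open_count = 0  # number of times the slide was "opened"

    @property
    def slide_image(self) -> FakeSlide:
        # Assumption: each property access reopens the slide (the slow step).
        self.open_count += 1
        return FakeSlide(self._identifier)

    def __len__(self) -> int:
        return self._n_tiles


def sizes_original(datasets):
    """Mirrors the current code: slide_image is accessed twice per dataset."""
    sizes = {}
    for current_dataset in datasets:
        assert current_dataset.slide_image.identifier
        sizes[current_dataset.slide_image.identifier] = len(current_dataset)
    return sizes


def sizes_proposed(datasets):
    """Mirrors the proposed code: slide_image is accessed once per dataset."""
    sizes = {}
    for current_dataset in datasets:
        current_dataset_slide_id = current_dataset.slide_image.identifier
        assert current_dataset_slide_id
        sizes[current_dataset_slide_id] = len(current_dataset)
    return sizes


if __name__ == "__main__":
    before = [FakeSlideDataset(f"slide_{i}", 100 + i) for i in range(3)]
    after = [FakeSlideDataset(f"slide_{i}", 100 + i) for i in range(3)]
    sizes_original(before)
    sizes_proposed(after)
    print("opens (original):", sum(d.open_count for d in before))
    print("opens (proposed):", sum(d.open_count for d in after))
```

Both functions build the same dict; only the number of simulated slide opens differs. If `len(current_dataset)` also opens the slide in the real code, the saving would be smaller than half.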



