
Volumina be lazy! #339

Merged

k-dominik merged 8 commits into ilastik:main from k-dominik:volumina-be-lazy on Oct 16, 2025

Conversation

@k-dominik (Contributor) commented Oct 11, 2025

ilastik faces performance problems with bigger data, especially when zooming out. The background is the way volumina requests data from ilastik: whatever is visible will be requested. While this is super cool when looking at small parts of images, this laziness is too eager for zoomed-out views of larger data. As a result, the user would experience an unresponsive UI for a long time, with often tens of thousands of requests submitted simultaneously. This is also a drain on RAM, since intermediate results of waiting requests must be stored there.

This PR adds the LazyflowRequestBuffer class, which closely interacts with TileProvider. TileProvider no longer submits its tile requests directly to the request pool, but goes via the lazyflow request buffer. The buffer takes care of submitting requests up to a maximum number, which greatly reduces the number of simultaneously running requests.

This PR furthermore addresses "cancellation". Running tasks are never cancelled, but with the queue inside the request buffer it is possible to remove unsubmitted requests, which dramatically speeds up ilastik in interactive mode when scrolling through z, through time, or in plane.
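The mechanism described in the two paragraphs above can be sketched roughly like this. This is a minimal illustration only: the class name `RequestBufferSketch`, the `submit_to_pool` callback, and `clear_unsubmitted` are made-up names, not volumina's actual API.

```python
import heapq
import itertools
import threading


class RequestBufferSketch:
    """Sketch of the buffering idea: queue tile requests, hand at most
    `max_running` of them to the real pool, and allow dropping requests
    that were never submitted (hypothetical names throughout)."""

    def __init__(self, max_running, submit_to_pool):
        self._max_running = max_running
        # submit_to_pool(fn, on_done) stands in for the lazyflow request pool.
        self._submit_to_pool = submit_to_pool
        self._lock = threading.RLock()  # re-entrant: on_done may fire inside submit
        self._queue = []  # heap of (priority, seq, fn)
        self._seq = itertools.count()  # unique tie-breaker for equal priorities
        self._running = 0

    def submit(self, fn, priority):
        with self._lock:
            heapq.heappush(self._queue, (priority, next(self._seq), fn))
            self._maybe_submit()

    def _maybe_submit(self):
        # Caller holds self._lock.
        while self._running < self._max_running and self._queue:
            _, _, fn = heapq.heappop(self._queue)
            self._running += 1
            self._submit_to_pool(fn, self._on_done)

    def _on_done(self):
        with self._lock:
            self._running -= 1
            self._maybe_submit()

    def clear_unsubmitted(self, keep=lambda fn: False):
        # Running requests are never cancelled; only queued ones are
        # dropped (e.g. tiles that scrolled out of view).
        with self._lock:
            self._queue = [t for t in self._queue if keep(t[2])]
            heapq.heapify(self._queue)
```

With a capacity of 1, a second submitted request waits in the queue and can still be dropped by `clear_unsubmitted` before it ever reaches the pool, which is the behavior that makes scrolling cheap.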

As a result, the UI stays more or less interactive all the time. Scrolling/zooming is no longer an issue. In my benchmark, memory consumption was reduced by half.

Benchmark results (currently my machine only, but in earlier versions I could see similar results on win/linux)

| Step | duration main | duration PR | speedup |
| --- | --- | --- | --- |
| zoom out | 0.4 | 0.2 | 2 |
| live update | 12.9 | 13.5 | 0.95 |
| scroll-live-40 | 42.3 | 16.7 | 2.5 |
| scroll-live+80 | 117.0 | 17.1 | 6.8 |

Benchmark on linux (alienware). Could not run scroll-live+80, as it crashed (out of memory) on main, so I reduced it to +60:

| Step | duration main | duration PR | speedup |
| --- | --- | --- | --- |
| zoom out | 0.7 | 0.4 | 1.75 |
| live update | 47.7 | 43.7 | 1.09 |
| scroll-live-40 | 351.0 | 41.1 | 8.5 |
| scroll-live+60 | 169.8 | 45.6 | 3.7 |

I don't really have the time to also run the tests on a Windows machine, but there is nothing that would lead me to expect different behavior there.

Note: the more scrolling is done, the greater the speedup, because more computations are omitted.

Demo: scrolling through cremi with live update on (I wouldn't recommend trying this on main, it goes out of RAM on my machine; note this also uses #338).

requestbuffer.mp4

Bottom line: this does not speed up any individual computation, but by being smarter about what to compute, it results in a much more responsive experience and overall reduction in computations.

Todos:

  • benchmark other machines
  • make benchmark project/script available (made available internally)

Edit: Updated with benchmark results on linux.

* make volumina request tiles more lazily by first submitting to a queue
* tasks are submitted up to a limit to the lazyflow request pool
* mechanism for cancelling/removing unsubmitted requests from the queue
* compared to before, ilastik will use way less memory (maybe half)

This enables much better interactive usage of ilastik, as the number of requests
that are submitted stays limited. Changing conditions (scrolling, dirtiness)
trigger cancellation of queued requests that are no longer relevant.

Fixes ilastik#135
Fixes ilastik/ilastik#1735
Fixes ilastik/ilastik#1376, at least to a degree
@codecov bot commented Oct 11, 2025

Codecov Report

❌ Patch coverage is 84.02367% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 36.90%. Comparing base (fe15d9e) to head (ff73255).
⚠️ Report is 9 commits behind head on main.

| Files with missing lines | Patch % | Lines missing |
| --- | --- | --- |
| volumina/tiling/tileprovider.py | 77.77% | 8 ⚠️ |
| volumina/utility/prioritizedThreadPool.py | 33.33% | 8 ⚠️ |
| volumina/volumeEditorWidget.py | 0.00% | 8 ⚠️ |
| volumina/imageScene2D.py | 66.66% | 2 ⚠️ |
| volumina/pixelpipeline/datasources/cachesource.py | 0.00% | 1 ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #339      +/-   ##
==========================================
+ Coverage   36.33%   36.90%   +0.57%
==========================================
  Files         108      109       +1
  Lines       11439    11565     +126
==========================================
+ Hits         4156     4268     +112
- Misses       7283     7297      +14
```


@emilmelnikov (Member) left a comment:

Wow, this is a neat solution to a long-standing request contention/cancellation problem!

Comment thread volumina/utility/lazyflowRequestBuffer.py Outdated
Comment thread volumina/tiling/tileprovider.py Outdated
@k-dominik k-dominik marked this pull request as ready for review October 13, 2025 06:59
k-dominik and others added 5 commits October 13, 2025 20:14
* interface for tile submission slightly changed, getTiles takes second
  argument for viewport Rect
* test_lazy: not really a fix, but stricter testing that helped debug some
  issues with cancellation
* note to `test_tiling.py`: Seems to be necessary to clean up the threadpool,
  otherwise the test fails

FIXUP somewhere: skips tests for lazyflowrequestbuffer if there's no lazyflow; patch rendererpool; probably the sequence of tests is now different, so it used to work coincidentally
time.time() is not guaranteed to be unique.
almost everything (except submit) was already operating under a lock, so the
locking overhead of PriorityQueue can be omitted.

Co-Authored-By: Emil Melnikov <emilmelnikov@users.noreply.github.com>
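The tie-breaker issue fixed in the commit above ("time.time() is not guaranteed to be unique") can be demonstrated in isolation. This is an illustrative snippet; the two plain objects stand in for task payloads that do not define ordering.

```python
import heapq
import itertools

# Two payloads that, like task objects, do not define ordering.
task_a, task_b = object(), object()

# With a non-unique tie-breaker (e.g. a colliding time.time() value),
# tuple comparison falls through to the payloads and raises TypeError.
same_tiebreak = 1.0
collided = False
try:
    (0, same_tiebreak, task_a) < (0, same_tiebreak, task_b)
except TypeError:
    collided = True
assert collided

# itertools.count() yields a strictly increasing, unique sequence, so
# the comparison is always decided before reaching the payload.
seq = itertools.count()
heap = []
heapq.heappush(heap, (0, next(seq), task_a))
heapq.heappush(heap, (0, next(seq), task_b))  # no TypeError
_, first_seq, first_task = heapq.heappop(heap)
assert first_seq == 0 and first_task is task_a
```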
@btbest (Contributor) commented Oct 15, 2025

It's bonkers how much nicer this feels.

One unfortunate downside is that rendering requests for labels are now also subject to lazyflow queuing, which I think they were pretty much independent of before (?). Now newly drawn labels can take a while to show up, and in my tryout they usually showed up after the updated predictions (even within the same tile, visible in the last yellow brush stroke in the video). Feels a bit unintuitive, but I'd be happy to accept it for the speedup. I guess it might even be useful as feedback for the user that maybe they should consider not drawing more labels 😁

labels-after-predictions.mp4

@k-dominik (Contributor, Author) replied:

> It's bonkers how much nicer this feels.
>
> One unfortunate downside is that rendering requests for labels are now also subject to lazyflow queuing, which I think they were pretty much independent of before (?). Now newly drawn labels can take a while to show up, and in my tryout they usually showed up after the updated predictions (even within the same tile, visible in the last yellow brush stroke in the video). Feels a bit unintuitive, but I'd be happy to accept it for the speedup. I guess it might even be useful as feedback for the user that maybe they should consider not drawing more labels 😁
>
> labels-after-predictions.mp4

For this there is #338: with it we can prioritize, e.g., raw highest, then labels, then the rest.
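One way such layer-based prioritization can be encoded is with tuple priorities, where a leading layer rank dominates any later tie-breakers. This is an illustrative sketch only; the ranks, names, and `tile_priority` helper are made up, not #338's actual scheme.

```python
import heapq

# Hypothetical layer ranks: smaller sorts first, so raw wins.
LAYER_RANK = {"raw": 0, "labels": 1, "prediction": 2}


def tile_priority(layer, tile_no):
    # The leading rank dominates; tile_no only breaks ties within a layer.
    return (LAYER_RANK[layer], tile_no)


heap = []
for layer, tile in [("prediction", 0), ("labels", 3), ("raw", 7), ("labels", 1)]:
    heapq.heappush(heap, (tile_priority(layer, tile), layer, tile))

drain_order = []
while heap:
    _, layer, tile = heapq.heappop(heap)
    drain_order.append(layer)
assert drain_order == ["raw", "labels", "labels", "prediction"]
```

Because tuples compare lexicographically, a raw tile always drains before any label tile, regardless of tile numbers.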

@btbest (Contributor) left a comment:

🚀 let's gooooo

Comment thread tests/lazyflowRequestBuffer_test.py Outdated
Comment on lines +210 to +211

@btbest (Contributor) suggested:

```diff
-request_buffer = buffer = LazyflowRequestBuffer(1)
-finished_evt = install_finish_even(buffer, "decr", n_calls=1)
+request_buffer = LazyflowRequestBuffer(1)
+finished_evt = install_finish_even(request_buffer, "decr", n_calls=1)
```

not sure what this is for

@k-dominik (Contributor, Author) replied:

cooooooopy paste :)

Comment thread tests/lazyflowRequestBuffer_test.py Outdated
Comment on lines +233 to +234

@btbest (Contributor) suggested:

```diff
-request_buffer = buffer = LazyflowRequestBuffer(1)
-finished_evt = install_finish_even(buffer, "decr", n_calls=2)
+request_buffer = LazyflowRequestBuffer(1)
+finished_evt = install_finish_even(request_buffer, "decr", n_calls=2)
```

```python
# submit with higher priority for good measure
request_buffer.submit(
    waiting_func2,
    priority=(-120,),
```

@btbest (Contributor) suggested:

```diff
-    priority=(-120,),
+    priority=(-120,),  # higher prio than context.priority (-100)
```

if I understand correctly..?

@k-dominik (Contributor, Author) replied:

yeah well, before I called it default_context, I had those comments, too. But figured it would be much clearer by establishing a set of default values, that are used in all the tests, except for the one that should be varied. But I guess this was not obvious...

```python
    default_context.priority,
    viewport_ref=default_context.view_port,
    stack_id=default_context.stack_id,
    tile_no=1,
```

@btbest (Contributor) suggested:

```diff
-    tile_no=1,
+    tile_no=1,  # not context.tile_no (which is 0)
```

@btbest (Contributor) commented Oct 15, 2025:

I did a bunch of renaming in this file for my reading, feel free to take or discard any of these:

  • default_context --> context
  • finished_evt --> buffer_finished_evt
  • default_waiting_func --> buffer_pre_running_func
  • test_requests_are_cancelled_bc_keep_tiles --> test_clear_requests_for_other_tile
  • test_requests_are_cancelled_bc_keep_different_stack --> test_clear_requests_for_other_stack
  • test_requests_other_vp_not_cancelled --> test_clear_keeps_requests_for_other_vp
  • test_requests_are_cancelled_bc_priority --> test_clear_duplicates_of_keep_tiles (I couldn't see the role priority plays anywhere...)

Comment thread volumina/tiling/tileprovider.py Outdated
```python
if USE_LAZYFLOW_THREADPOOL:
    from volumina.utility.lazyflowRequestBuffer import LazyflowRequestBuffer
    from lazyflow.request import Request
```

@btbest (Contributor) suggested:

```diff
-    from lazyflow.request import Request
```

duplicate of import above

Comment thread volumina/tiling/tileprovider.py Outdated

```python
renderer_pool = LazyflowRequestBuffer(Request.global_thread_pool.num_workers)

def clear_threadpool_vp(vp: "TileProvider", stack_id: StackId, keep_tiles: list[int]):
```

@btbest (Contributor):

If you feel like doing some refactoring, the renderer instantiations and two functions here look like they want to be methods of the respective threadpool, with a common interface abc...

@k-dominik (Contributor, Author) replied:

I think in the context of this PR it's fine to keep the actual threadpools hidden. But will consider this refactor for #340, as there this would be definitely motivated.

```python
        self._cleared_tasks += 1
        self._queue = []

    def clear_vp_res(self, viewport: "TileProvider", stack_id: StackId, keep_tiles: list[int]):
```

@btbest (Contributor):

What's a "res"?

Maybe "clear_queue_outside"

@k-dominik (Contributor, Author) replied:

honestly don't remember. maybe I'll go full verbose clear_non_submitted_tiles_outside_field_of_view :)

Comment on lines +175 to +176

@btbest (Contributor) suggested removing this commented-out code:

```diff
-        # task_o = tmp_queue.get((stack_id, task.tile_no))
-        # if task_o and task_o < task:
```

Comment on lines +182 to +192

@btbest (Contributor) suggested merging the two branches:

```diff
-            # Remove task outside current 2d slice from the queue
-            if task.vp == viewport and task.stack_id != stack_id:
-                task.cancel()
-                self._cleared_tasks += 1
-                continue
-
-            # Remove task in the current slice, but no longer visible (not in keep_tiles)
-            if task.vp == viewport and task.stack_id == stack_id and task.tile_no not in keep_tiles:
-                task.cancel()
-                self._cleared_tasks += 1
-                continue
+            # Remove
+            # - tasks outside current 2d slice from the queue
+            # - tasks in the current slice, but no longer visible (not in keep_tiles)
+            if task.stack_id != stack_id or task.tile_no not in keep_tiles:
+                task.cancel()
+                self._cleared_tasks += 1
+                continue
```

Comment thread volumina/tiling/tileprovider.py Outdated
Summary:
* remove superfluous buffer variable in tests
* add more comments to tests that deviate from the default_context
* remove superfluous import of Request
* rename clear_threadpool_vp ->
  clear_non_submitted_tiles_outside_field_of_view
* simplify boolean logic in cancel function
* rename finished_evt -> buffer_finished_evt
* get rid of the only warning in volumina tests?!

Co-authored-by: Benedikt Best <63287233+btbest@users.noreply.github.com>
@k-dominik k-dominik merged commit 637cf63 into ilastik:main Oct 16, 2025
20 checks passed