Skip to content

Creates a unified interface for task specification for python_api#484

Open
josegcpa wants to merge 12 commits intowasserth:masterfrom
josegcpa:master
Open

Creates a unified interface for task specification for python_api#484
josegcpa wants to merge 12 commits intowasserth:masterfrom
josegcpa:master

Conversation

@josegcpa
Copy link

@josegcpa josegcpa commented Jun 23, 2025

Hey! Big fan of the project here :-)

This PR significantly factorises how tasks are specified and creates a simple interface to add new tasks without adding too much clutter to totalsegmentator.python_api. This is part of a larger effort to support multi-task specification using the CLI as it is quite useful across multiple applications. Let me know if this is OK and if this is desired or not.

I tried running all of the tests but some files appear to be missing (at least for tests/test_end_to_end.py) - let me know if I missed something.

EDIT: I also bumped the version management to use uv as, in my experience, it features smoother version/package versioning

@wasserth
Copy link
Owner

Thanks for this PR. I will have a look at it.

@wasserth
Copy link
Owner

I had a more detailed look at the PR. Thank you! Overally, it adds quite a few lines of code and makes the code a little bit less straight forward by introducing the Task class. I do not see the advantage of this yet. So far it has also been easy to add a new class in the python_api in my opinion. Maybe you can elaborate a bit more what you are missing at the moment. Do you want to be able to run multiple tasks without having to call the python_api multiple times?
This would save a little bit of time since the CT image has only to be loaded once and the fast model to find the crop region also has to run only once. But the cropping, resampling and running of the actual model stays the same. This takes the majority of the time (depends on the model). Therefore, I am not sure if this change is worth the effort. You would also have to be able to specify multiple output files in this case. This would require a lot of refactoring and make the code more verbose at several places.
So far I have been fine with running the python_api several times for several models.

@josegcpa
Copy link
Author

I had a more detailed look at the PR. Thank you! Overally, it adds quite a few lines of code and makes the code a little bit less straight forward by introducing the Task class. I do not see the advantage of this yet. So far it has also been easy to add a new class in the python_api in my opinion. Maybe you can elaborate a bit more what you are missing at the moment. Do you want to be able to run multiple tasks without having to call the python_api multiple times? This would save a little bit of time since the CT image has only to be loaded once and the fast model to find the crop region also has to run only once. But the cropping, resampling and running of the actual model stays the same. This takes the majority of the time (depends on the model). Therefore, I am not sure if this change is worth the effort. You would also have to be able to specify multiple output files in this case. This would require a lot of refactoring and make the code more verbose at several places. So far I have been fine with running the python_api several times for several models.

Hey! It actually reduces the code and repetition, it introduces more code lines because it introduces PEP 8 compliant formatting, I should've been clearer about this.

As for the why:
Right now the main point is to compartmentalise everything such that there are no unnecessary repetitions (i.e. checks for whether tasks support fast/fastest, or whether they are commercial can be checked quickly through lists specified at the beginning of tasks.py. This is similar for task-to-task-ID conversions. Task is simply a straightforward data model which immediately makes clear what requires specification for interoperability with the Python API as this tends to significantly reduce errors due to lack of specification.

Eventually, the point was to i) further specify the weights_path and WEIGHTS_URL directly as part of the Task so that adding a new model is fully self-contained and ii) create a simple YAML-based interface where tasks can be specified as an auxiliary file rather than as part of the code.

The main point of specifying multiple tasks/roi_subsets is to reuse crops as some tasks have fairly overlapping crops and there is no need to perform the same computation 2x. This reduces compute time, it is helpful when being billed for compute.

But I fully understand if this is out of the scope of the project! It is not so much an addition to the functionalities so feel free to disregard.

@wasserth
Copy link
Owner

Thank you very much for these additional explanations! I will have to think about it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants