Description
The services module was intended to loosely follow the Dependency Injection pattern: the services module itself would serve as the container, and providers for services could be registered with it. That way the framework's dependencies could always be imported neatly, and only once they were actually used would they resolve themselves from the registered providers. The code is now kinda messy, I don't think the intended use cases are entirely covered, and the semantics are quite unclear.
- Why does it even exist? Because it wants to make sure the framework/user code can safely import all its dependencies at import time. There are 2 tricky cases where that was usually impossible to do at import time:
  - Optional dependencies brought in by sub-features that not all users would have installed -> resolved mostly by the plugin pattern we have now; for example, nest is no longer an optional dependency of the framework (bsb-core), but is used only within the plugin bsb-nest. 1 case remains, which is MPI.
  - Dependencies that could differ based on user needs, for which a user might want to choose their own provider. For example, we parallelize job distribution over mpipool, but one might want to use multiprocessing or something else (although at this point I think the JobPool relies on too many implementation details of the MPIExecutor to unravel those); the lock service could also use something other than MPILock, etc.
In order to have a stable import but lazy-load the implementation based on user choices like this, the entire services module tries to proxy everything, and can't be used at import time. I'm not sure this has been a great design choice. It's how the dependency injection pattern works, but those patterns are always bound to their container, and do not work globally at import time. So to address our need I suggest we break away from that part of the pattern. The reason it exists in DI is that it makes it easy to swap out providers in different contexts, like testing. Since we're dealing with things that are only environment specific, and not application-context specific, it doesn't add any benefit for us: whether we need to use mpi4py depends only on what's installed on the machine, and doesn't differ between compiling and testing. All our providers can therefore be resolved very easily at import time, and don't require a complicated DI provider hierarchy.
So if we drop that part of what inspired the bsb.services system, we can simplify it a lot.
The general idea
The framework defines a set of services: services are submodules of the bsb.services module, and each represents an abstraction around a package dependency. When the bsb.services module is constructed, it resolves the providers the user configured (or the framework default) for each service. Then, still at import time, we swap out the submodules before any consumer can touch any of the submodule items (this is possible because importing bsb.services.* first imports bsb.services).
The proposal
The bsb.services.* submodules will serve as "reference modules". They can provide all the necessary stubs and/or type hints for IDE tools etc. to work. For example, from bsb.services.mpi import MPI would work, and tools would know which elements exist on the singleton. And developers would know what they'd have to implement.
The user can configure an ordered list of providers to use for each service. If none of the providers are available for a service the framework errors.
Service providers can be registered through the typical plugin package metadata entry point group bsb.providers, by advertising a module according to the Python entry points specification with the following special convention:
<bsb_service_name>_<provider_name> = "my.module"
with an optional 2nd paired entry point:
<bsb_service_name>_<provider_name>_loader = "other.module:my_loader"
The first entry point specifies the provider module, which will be loaded if the provider is used. The service name is the name of the submodule. The provider name is what the user uses to choose which providers to use, in either env or project options (script and CLI options are unavailable because they cannot be determined at import time). Examples:
BSB_PROVIDE_MPI=mpi4py bsb compile # Run the framework with the service provider `mpi4py` or error
BSB_PROVIDE_MPI=mpi4py,default bsb compile # Run the framework with the service provider `mpi4py` or use the framework default (which noops in serial and errors in parallel)
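As a rough sketch of how such a setting could be read at import time, the helper below splits the comma-separated value into an ordered preference list. The function name `provider_preferences` and the fallback to `default` when the variable is unset are assumptions made for illustration; the real lookup would presumably go through bsb.options.

```python
import os


def provider_preferences(service, environ=None):
    """Return the ordered provider names requested for `service`."""
    environ = os.environ if environ is None else environ
    raw = environ.get(f"BSB_PROVIDE_{service.upper()}", "")
    # Ignore empty items so "mpi4py," or stray spaces do not add blank names.
    names = [name.strip() for name in raw.split(",") if name.strip()]
    # Assumption: an unset or empty option means "use the framework default".
    return names or ["default"]


print(provider_preferences("mpi", {"BSB_PROVIDE_MPI": "mpi4py,default"}))
# prints ['mpi4py', 'default']
```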
Then we need to find the "loader". The loader can do things that the provider needs done when it is actually going to be used, such as setting up or configuring things. The loader can also raise a ProviderUnavailableError, in which case we'll skip it and go to the next provider. The loader is resolved in this order:
- Check for a 2nd paired entry point, use the advertised object as loader.
- Import the module and check for a _bsb_load_provider method, if it exists, use it as loader.
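The lookup order above, combined with the skip-on-error behaviour, could be sketched roughly as follows. This is not an existing bsb API: `find_loader`, `resolve_provider`, the `eps` mapping of entry point names to entry points, the argument-less loader call, and the use of RuntimeError for the "no provider" case are all assumptions for illustration.

```python
from importlib.metadata import entry_points


class ProviderUnavailableError(Exception):
    """Stand-in for the error a loader raises to ask to be skipped."""


def find_loader(service, provider, eps):
    """Return the loader for a provider, or None if it has none."""
    # 1. A paired `<service>_<provider>_loader` entry point takes priority.
    paired = eps.get(f"{service}_{provider}_loader")
    if paired is not None:
        return paired.load()
    # 2. Otherwise import the provider module and look for the hook.
    module = eps[f"{service}_{provider}"].load()
    return getattr(module, "_bsb_load_provider", None)


def resolve_provider(service, preferences, eps=None):
    """Return the module of the first usable provider in `preferences`."""
    if eps is None:
        # Keyword selection on entry_points() needs Python 3.10+.
        eps = {ep.name: ep for ep in entry_points(group="bsb.providers")}
    for provider in preferences:
        if f"{service}_{provider}" not in eps:
            continue  # assumption: providers that are not advertised are skipped
        try:
            loader = find_loader(service, provider, eps)
            if loader is not None:
                loader()  # assumption: loaders take no arguments
        except ProviderUnavailableError:
            continue  # try the next provider in the list
        return eps[f"{service}_{provider}"].load()
    raise RuntimeError(f"No provider available for service '{service}'.")
```

Passing `eps` explicitly keeps the sketch easy to exercise without installing any plugin packages.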
If no loader exists, we unconditionally and immediately try to use the module as provider for the service. A service provider that expects to be used conditionally and has long import times, complicated initialization logic, or impure side effects (I'm basically talking about import mpi4py.MPI for every point in the list here) should therefore provide a loader via the paired entry point, so that it doesn't cause import errors or side effects when we're only trying to check whether the provider is available.
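A loader for such a provider might look something like the sketch below. It checks for the package with importlib.util.find_spec, which locates a top-level package without executing it, and only imports mpi4py.MPI once the check has passed. The names `require_installed` and `load_mpi4py_provider` are hypothetical, and the proposal does not specify a loader signature, so the argument-less form is an assumption.

```python
import importlib.util


class ProviderUnavailableError(Exception):
    """Stand-in for the error named in the proposal."""


def require_installed(package):
    """Raise ProviderUnavailableError unless `package` can be found."""
    # For a top-level name, find_spec locates the package without running it.
    if importlib.util.find_spec(package) is None:
        raise ProviderUnavailableError(f"'{package}' is not installed")


def load_mpi4py_provider():
    """Hypothetical loader, advertised through the paired entry point."""
    require_installed("mpi4py")
    # Only at this point do we accept the import cost and side effects.
    import mpi4py.MPI  # noqa: F401
```

Because the loader lives in a different module than the provider (other.module:my_loader in the convention above), checking availability never has to import the provider module itself.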
I think this system is quite simple to implement, in bsb/services/__init__.py:
import pkgutil
from bsb.options import get_module_option
for service in (name for finder, name, ispkg in pkgutil.iter_modules(__path__)):
    resolve_provider_module(service, get_module_option(f"provide_{service}"))
with the logic described above then implemented using importlib.metadata and importlib. If we place a module in sys.modules[f"bsb.services.{service}"], the import system should never actually begin importing the submodule's file, and should use the module object we put there instead. If that doesn't work because the import machinery is already going, we can still add a call in each submodule to replace itself in sys.modules, which is supported and endorsed.
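A small self-contained experiment suggests the first approach does work on CPython: the import system imports the parent package first and then re-checks sys.modules before looking for the submodule's file. The package name `demo_services` and its contents are made up for this demo and have nothing to do with the actual bsb layout.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Build a throwaway package whose __init__ pre-registers a replacement
# for its own submodule.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_services"
pkg.mkdir()
(pkg / "__init__.py").write_text(textwrap.dedent("""
    import sys
    import types

    replacement = types.ModuleType(__name__ + ".mpi")
    replacement.MPI = "provider object"
    sys.modules[replacement.__name__] = replacement
    mpi = replacement  # keep attribute access on the package working
"""))
# The reference module on disk would fail loudly if it were ever executed.
(pkg / "mpi.py").write_text("raise RuntimeError('reference module was executed')\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

from demo_services.mpi import MPI

print(MPI)  # prints: provider object
```

The explicit `mpi = replacement` assignment matters because the import system only binds a submodule onto its parent package when it loads the submodule itself, which it never does here.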