
Refactor the services module #200

@Helveg

The services module was intended to loosely follow the Dependency Injection pattern: the services module itself would serve as the container, and providers for services could be registered, so that dependencies of the framework could always be imported neatly and would only resolve themselves from the registered providers once they're actually used. The code is now kinda messy, I don't think the intended use cases are entirely covered, and the semantics are quite unclear.

  • Why does it even exist? Because it wants to make sure the framework/user-code can safely import all its dependencies at import time. There are 2 tricky cases where that was usually impossible to do at import time:
    • Optional dependencies brought in by sub-features that not all users would have installed -> resolved mostly by the plugin pattern we have now; for example, nest is no longer an optional dependency of the framework (bsb-core), but is used only within the plugin bsb-nest. 1 case remains, which is MPI.
    • Dependencies that could differ based on user needs, for which a user might want to choose their own provider. For example, we parallelize job distribution over mpipool, but one might want to use multiprocessing or something else (although at this point I think the JobPool relies on too many implementation details of the MPIExecutor to unravel those); the lock service could also use something other than MPILock, etc.

In order to have a stable import, but lazy-load the implementation based on user choices like this, the entire services module tries to proxy everything, and can't be used at import time. I'm not sure this has been a great design choice; it's how the dependency injection pattern works, but those patterns are always bound to their container, and do not work globally at import time. So to address our need, I suggest we break away from that part of the pattern. We can actually simplify a lot if we drop that. The reason it exists in DI is that it makes it easy to swap out providers in different contexts, like testing. Since we're dealing with things that are only environment specific, and not application context specific, it doesn't even add any benefits (whether we need to use mpi4py depends only on what's installed on the machine, and doesn't differ between compiling and testing), so all our providers can be very easily resolved at import time and don't require a complicated DI provider hierarchy.

So if we drop that part of what inspired the bsb.services system, we can simplify it a lot.

The general idea

The framework defines a set of services: services are submodules of the bsb.services module, and each represents an abstraction around a package dependency. When the bsb.services module is constructed, it resolves the providers the user configured (or the framework default) for each service. Then, still at import time, we swap out the submodules before any consumer can touch any of the submodule items (because importing bsb.services.* first imports bsb.services).

The proposal

The bsb.services.* submodules will serve as "reference modules". They can provide all the necessary stubs and/or type hints for IDE tools etc. to work; for example, from bsb.services.mpi import MPI would work and know which elements exist on the singleton. And developers would know what they'd have to implement.

The user can configure an ordered list of providers to use for each service. If none of the providers are available for a service the framework errors.
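The ordered fallback could look roughly like the sketch below. The function name `first_available` and the mapping of availability checks are hypothetical stand-ins, not existing framework API; the real checks would come from the provider loaders.

```python
from typing import Callable, Mapping, Sequence


def first_available(
    service: str,
    requested: Sequence[str],
    is_available: Mapping[str, Callable[[], bool]],
) -> str:
    """Return the first requested provider that reports itself as available."""
    for name in requested:
        check = is_available.get(name)
        # Unknown provider names are treated as unavailable (an assumption).
        if check is not None and check():
            return name
    raise RuntimeError(
        f"No provider available for service '{service}' "
        f"(tried: {', '.join(requested)})"
    )


# Hypothetical availability checks for two providers of the `mpi` service.
checks = {"mpi4py": lambda: False, "default": lambda: True}
chosen = first_available("mpi", ["mpi4py", "default"], checks)
```

Here the order of the list is the user's preference order, and the error is only raised once every listed provider has been tried.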

Service providers can be registered through the typical plugin package metadata entry point group bsb.providers, by advertising a module according to the Python entry points specification with the following special convention:

<bsb_service_name>_<provider_name> = "my.module"

with an optional 2nd paired entry point:

<bsb_service_name>_<provider_name>_loader = "other.module:my_loader"

The first entry point specifies the provider module, which will be loaded if the provider is used. The service name is the name of the submodule. The provider name will be used by the user to choose which providers to use, in either env or project options (script and CLI options are unavailable because they cannot be determined at import time). Examples:

BSB_PROVIDE_MPI=mpi4py bsb compile  # Run the framework with the service provider `mpi4py` or error
BSB_PROVIDE_MPI=mpi4py,default bsb compile  # Run the framework with the service provider `mpi4py` or use the framework default (which noops in serial and errors in parallel)
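As a minimal sketch, reading such an environment variable into an ordered provider list might look like this. The helper name `requested_providers` is made up for illustration, and falling back to a provider called `default` when the variable is unset is an assumption based on the examples above.

```python
import os


def requested_providers(service, environ=None, fallback=("default",)):
    """Parse BSB_PROVIDE_<SERVICE> into an ordered list of provider names."""
    environ = os.environ if environ is None else environ
    raw = environ.get(f"BSB_PROVIDE_{service.upper()}", "")
    # Split on commas, ignoring surrounding whitespace and empty items.
    names = [part.strip() for part in raw.split(",") if part.strip()]
    # Assumption: an unset or empty variable means "use the framework default".
    return names or list(fallback)


from_env = requested_providers("mpi", {"BSB_PROVIDE_MPI": "mpi4py,default"})
unset = requested_providers("mpi", {})
```

Passing the environment as a mapping keeps the helper easy to test; the project options would need their own, similar lookup.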

Then we need to find the "loader". The loader can do things that the provider needs to do when it is actually going to be used, such as setting up or configuring things. The loader can also raise a ProviderUnavailableError, in which case we'll skip it and go to the next provider. The loader is resolved in this order:

  • Check for a 2nd paired entry point; if it exists, use the advertised object as loader.
  • Import the module and check for a _bsb_load_provider function; if it exists, use it as loader.
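The resolution order above can be sketched as follows. To keep the sketch runnable without any installed plugins, `advertised` is a plain dict standing in for the contents of the bsb.providers entry point group, and the `demo_*` modules are made up; `resolve_loader` and `load_object` are hypothetical helper names.

```python
import importlib
import sys
import types


def load_object(spec):
    """Load 'pkg.module:attr' (or just 'pkg.module'), as entry points do."""
    module_name, _, attr = spec.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attr) if attr else module


def resolve_loader(service, provider, advertised):
    """Return the loader callable for a provider, or None if it has none."""
    key = f"{service}_{provider}"
    # 1. A paired `<service>_<provider>_loader` entry point takes precedence,
    #    and the provider module itself is not imported in this branch.
    paired = advertised.get(f"{key}_loader")
    if paired is not None:
        return load_object(paired)
    # 2. Otherwise import the provider module and look for the conventional hook.
    module = importlib.import_module(advertised[key])
    return getattr(module, "_bsb_load_provider", None)


# Made-up modules registered in sys.modules so that nothing has to be installed.
with_hook = types.ModuleType("demo_with_hook")
with_hook._bsb_load_provider = lambda: "hook"
loaders = types.ModuleType("demo_loaders")
loaders.my_loader = lambda: "paired"
plain = types.ModuleType("demo_plain")
sys.modules.update({m.__name__: m for m in (with_hook, loaders, plain)})

advertised = {
    "mpi_a": "demo_with_hook",
    "mpi_b": "demo_plain",
    "mpi_b_loader": "demo_loaders:my_loader",
    "mpi_c": "demo_plain",
}
```

In a real implementation the dict would presumably be built from importlib.metadata, and a result of None would correspond to the "no loader" case.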

If no loader exists, we unconditionally and immediately try to use the module as provider for the service. A service provider that expects to be used conditionally and has long import times, complicated initialization logic, or impure side effects (I'm basically talking about import mpi4py.MPI for every point in the list here) should therefore provide a loader via the paired entry point, so that it doesn't cause import errors or side effects when we're only trying to check whether the provider is available.
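One way such a loader could check availability without triggering the expensive import is importlib.util.find_spec, which locates a top-level package without executing it. The sketch below is illustrative only: `make_loader` and `is_usable` are hypothetical helpers, and the `ProviderUnavailableError` class is a local stand-in for the error named above.

```python
import importlib.util


class ProviderUnavailableError(Exception):
    """Local stand-in for the error a loader raises to say 'skip me'."""


def make_loader(required_package):
    """Build a loader that only checks whether a top-level package is installed."""

    def loader():
        # find_spec on a top-level name does not execute the package itself.
        if importlib.util.find_spec(required_package) is None:
            raise ProviderUnavailableError(
                f"'{required_package}' is not installed"
            )

    return loader


def is_usable(loader):
    """Run a loader and report whether the provider can be used."""
    try:
        loader()
    except ProviderUnavailableError:
        return False
    return True
```

A loader for the mpi4py provider could, under this approach, call `make_loader("mpi4py")` and defer the actual import of mpi4py.MPI until the provider module is used.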

I think this system is quite simple to implement, in bsb/services/__init__.py:

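import pkgutil

from bsb.options import get_module_option

# Every submodule of bsb.services is a service; resolve its provider at import time.
# resolve_provider_module is not defined here: it stands for the logic described above.
for _finder, service, _ispkg in pkgutil.iter_modules(__path__):
    resolve_provider_module(service, get_module_option(f"provide_{service}"))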

with the logic described above then implemented using importlib.metadata and importlib. If we place a module in sys.modules[f"bsb.services.{service}"], the import system should never actually begin importing that file, and will use the module object we put there instead. If that doesn't work because the import machinery is already going, we can still add a call in each submodule to replace itself in sys.modules, which is supported and endorsed.
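The sys.modules behaviour this relies on can be demonstrated in isolation. The sketch uses made-up top-level module names (`demo_provider`, `demo_service`) so it runs without bsb installed; the real code would register the provider under the bsb.services.<service> name instead.

```python
import importlib
import sys
import types

# Build a stand-in provider module (a real one would come from an entry point).
provider = types.ModuleType("demo_provider")
provider.answer = 42

# Register it under the name that consumers will import.
sys.modules["demo_service"] = provider

# The import system consults sys.modules first, so no file lookup takes place.
imported = importlib.import_module("demo_service")
from demo_service import answer
```

For a dotted name such as bsb.services.mpi, it is probably also worth setting the attribute on the parent package (for example with setattr), so that attribute-style access to the submodule sees the same object.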
