Introduce support for Dask backend #104
Open
This PR introduces the Dask module, which lets a PyRDF analysis run through a dask.distributed scheduler. The module connects to a remote scheduler when the user provides a scheduler address in the configuration of the Dask instance, and starts a local one otherwise.
The graph is executed through the dask.delayed mechanism, which wraps both the mapper and the reducer functions. Data ranges are mapped, and the results are reduced recursively until only one list of merged action results remains. A call to dask.distributed.Future.compute returns the final result to the user.
A new entry has been added to the options of PyRDF.use accordingly.
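The map and recursive-reduce pattern described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `mapper`, `reducer`, `build_graph` and the sample ranges are hypothetical placeholders, and the graph is computed with Dask's local synchronous scheduler so that the snippet runs without a dask.distributed cluster.

```python
import dask


def mapper(data_range):
    # Hypothetical stand-in for the PyRDF mapper: process one range of
    # entries and return a list of partial action results.
    start, end = data_range
    return [sum(range(start, end)), end - start]


def reducer(left, right):
    # Hypothetical stand-in for the PyRDF reducer: merge two lists of
    # partial results element by element.
    return [a + b for a, b in zip(left, right)]


def build_graph(ranges):
    # Wrap both functions so that calling them only records tasks.
    dmapper = dask.delayed(mapper)
    dreducer = dask.delayed(reducer)

    # One delayed mapper call per data range.
    pending = [dmapper(r) for r in ranges]

    # Merge pairwise until a single delayed list of results remains.
    while len(pending) > 1:
        pending.append(dreducer(pending.pop(0), pending.pop(0)))

    return pending.pop()


ranges = [(0, 25), (25, 50), (50, 75), (75, 100)]
final = build_graph(ranges)

# Nothing has run yet; compute() triggers execution of the whole graph.
# Assumption: a local synchronous scheduler instead of dask.distributed.
result = final.compute(scheduler="synchronous")
print(result)
```

With a distributed client active, the same delayed graph would be submitted to that scheduler instead; only the final `compute` step changes.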
TODO: