GH-3510: Sparql Adapter System #3511
base: main
Overall I'd say the proposal is in a state where it can be reviewed. The part that I am not totally sure about is whether the (public) …
53ff0ac to 413a5ee
eab080d to e8d0931
    import org.apache.jena.sparql.exec.UpdateExecBuilder;

    public interface SparqlAdapter {
        QueryExecBuilder newQuery();
There was a suggestion createAdapter.
But this is the adapter?
Isn't this QueryExecBuilder newQueryExecBuilder();? or shorter: newBuilder?
The basic idea of this proposal is to introduce SparqlAdapter as the one place in ARQ where eventually all the SPARQL-level machinery over a specific DatasetGraph implementation would go. In other words, it adapts a DatasetGraph to the SPARQL layer (it could also be called a bridge).
Currently, this comprises the SPARQL subset covered by the query and update exec builders. Streaming updates (UpdateExecSteaming) and GSP would require additional methods on SparqlAdapter.
Conceptually, for a specific dataset implementation, there would be one specific SparqlAdapter instance - possibly dynamically assembled by a SparqlAdapterProvider.
However, I think "real" custom implementations should build upon RDFLink - so the only ARQ-level adapter that needs to exist (besides the one for the ARQ engine) is the bridge to RDFLink, which is based on DatasetGraphOverRDFLink and SparqlAdapterProviderForDatasetGraphOverRDFLink.java (registered within jena-rdfconnection).
In the example above, DatasetGraphOverRDFLink is handled by a specific SparqlAdapterProvider.
Using custom wrappers with DatasetGraphs (without registering custom providers) will fall back to the default ARQ engine provider SparqlAdapterProviderMain.
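The provider lookup with a fallback to the default engine could be sketched roughly as follows. This is only an illustration of the dispatch idea: StubDataset, Adapter, AdapterProvider and AdapterRegistry are hypothetical stand-ins invented for this sketch, not the interfaces from the PR or from Jena, and the real registration mechanism may look different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-ins for illustration; none of these are Jena types.
interface StubDataset { }                                   // role of DatasetGraph
final class PlainDataset implements StubDataset { }         // any dataset without a custom provider
final class LinkBackedDataset implements StubDataset { }    // role of DatasetGraphOverRDFLink

interface Adapter {                                         // role of SparqlAdapter
    String describe();
}

interface AdapterProvider {                                 // role of SparqlAdapterProvider
    // Returns an adapter if this provider handles the dataset, otherwise empty.
    Optional<Adapter> create(StubDataset dataset);
}

final class AdapterRegistry {
    private final List<AdapterProvider> providers = new ArrayList<>();
    private final AdapterProvider fallback;

    AdapterRegistry(AdapterProvider fallback) {
        this.fallback = fallback;
    }

    void register(AdapterProvider provider) {
        providers.add(provider);
    }

    // The first registered provider that accepts the dataset wins;
    // if none does, the fallback provider is asked.
    Adapter adapt(StubDataset dataset) {
        for (AdapterProvider provider : providers) {
            Optional<Adapter> adapter = provider.create(dataset);
            if (adapter.isPresent()) {
                return adapter.get();
            }
        }
        return fallback.create(dataset)
                .orElseThrow(() -> new IllegalStateException("fallback must accept every dataset"));
    }
}

final class AdapterRegistrySketch {
    static AdapterRegistry newRegistry() {
        Adapter defaultAdapter = () -> "default engine";
        AdapterRegistry registry = new AdapterRegistry(ds -> Optional.of(defaultAdapter));
        registry.register(ds -> {
            if (ds instanceof LinkBackedDataset) {
                Adapter linkAdapter = () -> "link-backed";
                return Optional.of(linkAdapter);
            }
            return Optional.empty();
        });
        return registry;
    }

    public static void main(String[] args) {
        AdapterRegistry registry = newRegistry();
        System.out.println(registry.adapt(new LinkBackedDataset()).describe());
        System.out.println(registry.adapt(new PlainDataset()).describe());
    }
}
```

In this sketch a dataset type without a matching provider silently ends up on the default path, which mirrors the fallback behaviour described for custom wrappers.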
An alternative design for SparqlAdapter is to keep QueryExecBuilder, UpdateExecBuilder and GSP in separate registries:
QueryExecBuilder.adapt(dsg) would go to a QueryExecBuilderRegistry backed by a list of QueryExecBuilderProviders and the first match creates the final QueryExecBuilder. Same for update.
I think collecting this related functionality in a single SparqlAdapter system might be nicer.
> Isn't this QueryExecBuilder newQueryExecBuilder();? or shorter: newBuilder?
I took the naming from QueryExecBuilder RDFLink.newQuery. In essence, SparqlAdapter is an ARQ-level variant of RDFLink - though ARQ doesn't have links; transactions are so far managed by the DatasetGraph (per thread).
I'm sorry but this just does not feel right... I mean, just look at the class name of SparqlAdapterProviderForDatasetGraphOverRDFLink :) To me it's a clear indication that two if not more concerns that are orthogonal got conflated/"flattened" into one.
I don't think that I am mixing orthogonal concerns - but I am open to criticism. To recap:
- SparqlAdapterProvider provides a "SPARQL layer" implementation for a dataset graph implementation.
- A SPARQL layer implementation comprises builders for query, update and GSP requests.
- DatasetGraphOverRDFLink is a specific DatasetGraph implementation backed by a factory of RDFLinks. Much like a JDBC data source is a factory for connections.
- SparqlAdapterProviderDatasetGraphOverRDFLink is the adapter for DatasetGraphOverRDFLink. For example, when querying, this provider will supply specialized QueryExecBuilderOverRDFLink instances that will pass on a query statement to the link - instead of evaluating it against the dataset graph API.
Note that the SPARQL adapter system of this PR works under the hood - conventional code should never have to interact with it directly.
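To make the delegation idea concrete, here is a minimal sketch of a dataset that is a factory of links and a query builder that forwards the query string to a link instead of evaluating it locally. StubLink, StubLinkDataset and StubQueryBuilder are hypothetical names made up for this illustration; they only mimic the roles of RDFLink, DatasetGraphOverRDFLink and QueryExecBuilderOverRDFLink and are not the PR's actual code.

```java
import java.util.function.Supplier;

// Hypothetical stand-ins for illustration; not the Jena API.
interface StubLink extends AutoCloseable {      // role of RDFLink
    String query(String queryString);           // pretend remote execution
    @Override
    void close();
}

// Role of DatasetGraphOverRDFLink: a factory of links, much like a JDBC data source.
final class StubLinkDataset {
    private final Supplier<StubLink> linkCreator;

    StubLinkDataset(Supplier<StubLink> linkCreator) {
        this.linkCreator = linkCreator;
    }

    StubLink newLink() {
        return linkCreator.get();
    }
}

// Role of QueryExecBuilderOverRDFLink: collects the query, then hands it to a link.
final class StubQueryBuilder {
    private final StubLinkDataset dataset;
    private String queryString;

    StubQueryBuilder(StubLinkDataset dataset) {
        this.dataset = dataset;
    }

    StubQueryBuilder query(String queryString) {
        this.queryString = queryString;
        return this;
    }

    // The query string is forwarded unchanged; nothing is evaluated locally.
    String execute() {
        if (queryString == null) {
            throw new IllegalStateException("no query set");
        }
        try (StubLink link = dataset.newLink()) {
            return link.query(queryString);
        }
    }
}

final class LinkDelegationSketch {
    static String run(String queryString) {
        StubLinkDataset dataset = new StubLinkDataset(() -> new StubLink() {
            @Override
            public String query(String q) {
                return "remote result for: " + q;
            }

            @Override
            public void close() { }
        });
        return new StubQueryBuilder(dataset).query(queryString).execute();
    }

    public static void main(String[] args) {
        System.out.println(run("SELECT * { ?s ?p ?o } LIMIT 1"));
    }
}
```

Because the query travels as a string, engine-specific syntax that a local parser might reject can still reach the backing link in such a design.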
7baa3e0 to 2dd1dbb
2dd1dbb to 116e788
I think there are three meaningful policies for how to pass on transactions when doing

    DatasetGraphOverRDFLink dsg = new DatasetGraphOverRDFLink(linkCreator);
    RDFLink frontLink = RDFLink.connect(dsg);
    RDFLink backingLink = dsg.newLink(); // Calls linkCreator.create()

I updated the code of

    // Pass 'true' for supportsTransactions to enable thread local links:
    boolean supportsTransactions = true;
    DatasetGraphOverRDFLink dsg = new DatasetGraphOverRDFLink(linkCreator,
        supportsTransactions, supportsTransactionAbort);

[Update]
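One possible reading of "thread local links" is sketched below: while a thread is inside a transaction it keeps reusing a single backing link, and outside a transaction every request gets a fresh one. This is an assumption made for illustration only. CountingLink, TxnLinkDataset and the method names are hypothetical, and the actual policy implemented by DatasetGraphOverRDFLink in the PR may differ.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-ins; the real classes in the PR may behave differently.
final class CountingLink {
    final int id;
    boolean closed;

    CountingLink(int id) {
        this.id = id;
    }

    void close() {
        closed = true;
    }
}

final class TxnLinkDataset {
    private final Supplier<CountingLink> linkCreator;
    private final boolean supportsTransactions;
    // Holds one link per thread while that thread is inside a transaction.
    private final ThreadLocal<CountingLink> txnLink = new ThreadLocal<>();

    TxnLinkDataset(Supplier<CountingLink> linkCreator, boolean supportsTransactions) {
        this.linkCreator = linkCreator;
        this.supportsTransactions = supportsTransactions;
    }

    void begin() {
        if (!supportsTransactions) {
            throw new UnsupportedOperationException("transactions disabled");
        }
        if (txnLink.get() != null) {
            throw new IllegalStateException("already in a transaction");
        }
        txnLink.set(linkCreator.get());
    }

    void end() {
        CountingLink link = txnLink.get();
        if (link != null) {
            link.close();
            txnLink.remove();
        }
    }

    // Inside a transaction: the calling thread's link. Outside: a fresh link per call.
    CountingLink linkForCurrentThread() {
        CountingLink link = txnLink.get();
        return link != null ? link : linkCreator.get();
    }
}

final class ThreadLocalLinkSketch {
    static TxnLinkDataset newDataset() {
        AtomicInteger counter = new AtomicInteger();
        return new TxnLinkDataset(() -> new CountingLink(counter.incrementAndGet()), true);
    }

    public static void main(String[] args) {
        TxnLinkDataset dsg = newDataset();
        dsg.begin();
        CountingLink first = dsg.linkForCurrentThread();
        CountingLink second = dsg.linkForCurrentThread();
        System.out.println("same link inside transaction: " + (first == second));
        dsg.end();
        System.out.println("link closed after end: " + first.closed);
    }
}
```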
116e788 to 5637051
5637051 to cc1fc9b
GitHub issue resolved #3510
Pull request description: Updated proposal for custom SPARQL adapters over DatasetGraphs based on #3184.
The goal is to unify local and remote query execution on the DatasetGraph level in a way that allows for efficient query/update execution by possibly offloading the execution work to an external or remote engine. This way, execution tracking at the dataset graph level (using its Context) should work for both local and remote workloads.
The main difference to the previous proposal is that unification happens at the builder level - i.e. QueryExecBuilder and UpdateExecBuilder. SparqlAdapter, as the DatasetGraph-specific factory for those builders, is the ARQ-level driver interface for implementing custom SPARQL execution over a DatasetGraph.
A generic SparqlAdapter for RDFLink is provided. This adapter design is aimed at making any future vendor-specific RDFLink extension immediately available on the ARQ level. Conversely, new implementations of SparqlAdapter - while possible - should be avoided in favor of RDFLink-based implementations.
The class ExampleDBpediaViaRemoteDataset.java demonstrates the system: A query with Virtuoso-specific features is passed through the Jena API to the DBpedia endpoint, a custom execution wrapper is applied, and yet the outcome is a QueryExecHTTP instance that allows for inspecting certain HTTP fields.
Probably the most critical changes are:
- QueryExecBuilderDataset and QueryExecHTTP are now interfaces.
- [ ] Documentation change and updates are provided for the Apache Jena website
By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the Contributor's Agreement.
See the Apache Jena "Contributing" guide.