Contributor

@Aklakan Aklakan commented Oct 14, 2025

GitHub issue resolved #3510

Pull request Description: Updated proposal for custom SPARQL adapters over DatasetGraphs based on #3184 .
The goal is to unify local and remote query execution at the DatasetGraph level so that query/update execution can be offloaded efficiently to an external or remote engine. Execution tracking at the dataset graph level (using its Context) should then work for both local and remote workloads.

The main difference from the previous proposal is that unification happens at the builder level, i.e. QueryExecBuilder and UpdateExecBuilder. SparqlAdapter, as the DatasetGraph-specific factory for those builders, is the ARQ-level driver interface for implementing custom SPARQL execution over a DatasetGraph.

A generic SparqlAdapter for RDFLink is provided. This adapter design is aimed at making any future vendor-specific RDFLink extension immediately available on the ARQ level. Conversely, new implementations of SparqlAdapter - while possible - should be avoided in favor of RDFLink-based implementations.

The class ExampleDBpediaViaRemoteDataset.java demonstrates the system: A query with Virtuoso-specific features is passed through the Jena API to the DBpedia endpoint, a custom execution wrapper is applied, and yet the outcome is a QueryExecHTTP instance that allows for inspecting certain HTTP fields.

String queryString =
	"SELECT * FROM <http://dbpedia.org> { ?s rdfs:label ?o . ?o bif:contains 'Leipzig' } LIMIT 3";

DatasetGraph dsg = new DatasetGraphOverRDFLink(() ->
	RDFLinkHTTP.newBuilder().destination("http://dbpedia.org/sparql").build());

try (QueryExec qe = QueryExec.newBuilder().dataset(dsg).query(queryString)
		.timeout(10, TimeUnit.SECONDS).transformExec(e -> new QueryExecWrapperDemo(label, e)).build()) {
    // ...
}
Remote Execution Deferred: Dataset type: DatasetGraphOverRDFLink
Remote Execution Deferred: QueryExecBuilder type: QueryExecDatasetBuilderDeferred
Remote Execution Deferred: QueryExec type: QueryExecHTTPWrapper
Remote Execution Deferred: Execution result object type: RowSetBuffered
---------------------------------------------------------------------------------------------------------------------
| s                                                                        | o                                      |
=====================================================================================================================
| <http://dbpedia.org/resource/1._FC_Lokomotive_Leipzig>                   | "1. FC Lokomotive Leipzig"@en          |
| <http://dbpedia.org/resource/Category:1._FC_Lokomotive_Leipzig>          | "1. FC Lokomotive Leipzig"@en          |
| <http://dbpedia.org/resource/Category:1._FC_Lokomotive_Leipzig_managers> | "1. FC Lokomotive Leipzig managers"@en |
---------------------------------------------------------------------------------------------------------------------

Probably the most critical changes are:

  • QueryExecBuilderDataset and QueryExecHTTP are now interfaces.

  • Tests are included.
  • [ ] Documentation change and updates are provided for the Apache Jena website
  • Commits have been squashed to remove intermediate development commit messages.
  • Key commit messages start with the issue number (GH-xxxx)

By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the Contributor's Agreement.


See the Apache Jena "Contributing" guide.

Contributor Author

Aklakan commented Oct 14, 2025

Overall I'd say the proposal is in a state where it can be reviewed.

The part that I am not totally sure about is whether the (public) transformExec methods are really needed at this stage, because post-transformations of execs could also be hard-wired into the build method of the builders.
I.e. during build, the builder would check the context for some key(s) and apply the exec transforms registered there.
However, the indirection with QueryExecHTTPWrapper would still be needed in order to transparently modify a QueryExec while still exposing the HTTP fields.
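
To make the trade-off concrete, here is a minimal, self-contained sketch of the "hard-wired into build" variant. All types below (Exec, HttpExec, HttpExecWrapper, ExecBuilder) and the context key are hypothetical stand-ins, not the Jena classes of this PR. The sketch only illustrates the idea: build looks up a transform in the context, and a delegating wrapper keeps the HTTP-specific view intact.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a query execution; not Jena's QueryExec.
interface Exec extends AutoCloseable {
    String select();
    @Override default void close() {}
}

// Hypothetical stand-in for an execution that exposes an HTTP-specific field.
interface HttpExec extends Exec {
    String httpResponseContentType();
}

// Wrapper that modifies the result but still exposes the HTTP view by delegation.
class HttpExecWrapper implements HttpExec {
    private final HttpExec delegate;
    private final UnaryOperator<String> resultTransform;

    HttpExecWrapper(HttpExec delegate, UnaryOperator<String> resultTransform) {
        this.delegate = delegate;
        this.resultTransform = resultTransform;
    }

    @Override public String select() { return resultTransform.apply(delegate.select()); }
    @Override public String httpResponseContentType() { return delegate.httpResponseContentType(); }
    @Override public void close() { delegate.close(); }
}

class ExecBuilder {
    // Hypothetical context key under which a transform may be registered.
    static final String TRANSFORM_KEY = "example:execTransform";

    private final Map<String, Object> context = new HashMap<>();

    ExecBuilder set(String key, Object value) {
        context.put(key, value);
        return this;
    }

    HttpExec build() {
        HttpExec base = new HttpExec() {
            @Override public String select() { return "raw result"; }
            @Override public String httpResponseContentType() { return "application/sparql-results+json"; }
        };
        // Hard-wired step: consult the context and apply the transform if one is present.
        Object t = context.get(TRANSFORM_KEY);
        if (t instanceof UnaryOperator) {
            @SuppressWarnings("unchecked")
            UnaryOperator<String> transform = (UnaryOperator<String>) t;
            return new HttpExecWrapper(base, transform);
        }
        return base;
    }
}
```

In this sketch the caller never invokes a transformExec-style method; registering the transform under the context key is enough, and the result is still an HttpExec.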

@Aklakan Aklakan force-pushed the 2025-05-11-sparqladapter branch from 53ff0ac to 413a5ee Compare October 15, 2025 02:11
@Aklakan Aklakan force-pushed the 2025-05-11-sparqladapter branch 2 times, most recently from eab080d to e8d0931 Compare October 27, 2025 13:56
import org.apache.jena.sparql.exec.UpdateExecBuilder;

public interface SparqlAdapter {
    QueryExecBuilder newQuery();
Member

There was a suggestion createAdapter.

But this is the adapter?

Isn't this QueryExecBuilder newQueryExecBuilder();? or shorter: newBuilder?

Contributor Author

@Aklakan Aklakan Nov 29, 2025

The basic idea of this proposal is to introduce SparqlAdapter as the one place in ARQ where eventually all the SPARQL-level machinery over a specific DatasetGraph implementation would go. In other words, it adapts a DatasetGraph to the SPARQL layer (it could also be called a bridge).
Currently, this comprises the SPARQL subset covered by query and update exec builders. Streaming updates (UpdateExecSteaming) and GSP would require additional methods on SparqlAdapter.

Conceptually, for a specific dataset implementation, there would be one specific SparqlAdapter instance - possibly dynamically assembled by a SparqlAdapterProvider.
However, I think "real" custom implementations should build upon RDFLink, so the only ARQ-level adapter that needs to exist (besides the one for the ARQ engine) is the bridge to RDFLink, which is based on DatasetGraphOverRDFLink and SparqlAdapterProviderForDatasetGraphOverRDFLink.java (registered within jena-rdfconnection).

In the example above, DatasetGraphOverRDFLink is handled by a specific SparqlAdapterProvider.
Using custom wrappers with DatasetGraphs (without registering custom providers) will fall back to the default ARQ engine provider SparqlAdapterProviderMain.
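
The provider lookup described above can be sketched roughly as follows. Every type here (Dataset, Adapter, AdapterProvider, AdapterRegistry and the dataset classes) is a hypothetical stand-in, and the first-match-then-fallback order is an assumption about how such a registry would behave, not the PR's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-ins for dataset implementations; not Jena's DatasetGraph types.
interface Dataset {}
class LinkBackedDataset implements Dataset {}
class WrappedDataset implements Dataset {
    final Dataset inner;
    WrappedDataset(Dataset inner) { this.inner = inner; }
}

// Hypothetical stand-in for a SPARQL adapter.
interface Adapter {
    String describe();
}

class NamedAdapter implements Adapter {
    private final String name;
    NamedAdapter(String name) { this.name = name; }
    @Override public String describe() { return name; }
}

interface AdapterProvider {
    // Returns an adapter if this provider handles the dataset, otherwise empty.
    Optional<Adapter> create(Dataset ds);
}

class AdapterRegistry {
    private final List<AdapterProvider> providers = new ArrayList<>();
    private final AdapterProvider fallback;

    AdapterRegistry(AdapterProvider fallback) { this.fallback = fallback; }

    void register(AdapterProvider provider) { providers.add(provider); }

    // The first registered provider that accepts the dataset wins; otherwise the default engine is used.
    Adapter adapt(Dataset ds) {
        for (AdapterProvider p : providers) {
            Optional<Adapter> adapter = p.create(ds);
            if (adapter.isPresent()) {
                return adapter.get();
            }
        }
        return fallback.create(ds)
                .orElseThrow(() -> new IllegalStateException("fallback must handle every dataset"));
    }
}

class AdapterDemo {
    static AdapterRegistry newRegistry() {
        // Default provider: accepts any dataset and evaluates locally.
        AdapterProvider main = ds -> Optional.<Adapter>of(new NamedAdapter("local engine"));
        // Specialized provider: only accepts link-backed datasets.
        AdapterProvider overLink = ds -> ds instanceof LinkBackedDataset
                ? Optional.<Adapter>of(new NamedAdapter("remote link"))
                : Optional.<Adapter>empty();
        AdapterRegistry registry = new AdapterRegistry(main);
        registry.register(overLink);
        return registry;
    }
}
```

Note how a WrappedDataset around a LinkBackedDataset is not recognized by the specialized provider in this sketch and therefore ends up with the default engine, which mirrors the fallback behavior described above.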

An alternative design for SparqlAdapter is to keep QueryExecBuilder, UpdateExecBuilder and GSP in separate registries:
QueryExecBuilder.adapt(dsg) would go to a QueryExecBuilderRegistry backed by a list of QueryExecBuilderProviders and the first match creates the final QueryExecBuilder. Same for update.
I think collecting this related functionality in a single SparqlAdapter system might be nicer.

Isn't this QueryExecBuilder newQueryExecBuilder();? or shorter: newBuilder?

I took the naming from QueryExecBuilder RDFLink.newQuery. In essence, SparqlAdapter is an ARQ-level variant of RDFLink, though ARQ doesn't have links: transactions are so far handled by the DatasetGraph (per thread).

Contributor

I'm sorry but this just does not feel right... I mean, just look at the class name of SparqlAdapterProviderForDatasetGraphOverRDFLink :) To me it's a clear indication that two if not more concerns that are orthogonal got conflated/"flattened" into one.

Contributor Author

@Aklakan Aklakan Dec 9, 2025

I don't think that I am mixing orthogonal concerns - but I am open to criticism. To recap:

  • SparqlAdapterProvider provides a "SPARQL layer" implementation for a dataset graph implementation.
    • A SPARQL layer implementation comprises builders for query, update and GSP requests.
  • DatasetGraphOverRDFLink is a specific DatasetGraph implementation backed by a factory of RDFLinks. Much like a JDBC data source is a factory for connections.
  • SparqlAdapterProviderForDatasetGraphOverRDFLink is the adapter provider for DatasetGraphOverRDFLink. For example, when querying, this provider will supply specialized QueryExecBuilderOverRDFLink instances that pass a query statement on to the link instead of evaluating it against the dataset graph API.
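
The recap above can be illustrated with a small sketch of a builder that forwards the query string to a link rather than evaluating it locally. RemoteLink, RecordingLink, LinkDataSource and ForwardingQueryBuilder are hypothetical names chosen for illustration; they only mimic the roles of RDFLink, DatasetGraphOverRDFLink and QueryExecBuilderOverRDFLink and do not reproduce their APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a link to a (possibly remote) SPARQL engine.
interface RemoteLink extends AutoCloseable {
    String query(String queryString);
    @Override void close();
}

// Test double that records what it receives so the forwarding can be observed.
class RecordingLink implements RemoteLink {
    final List<String> received = new ArrayList<>();
    boolean closed = false;

    @Override public String query(String queryString) {
        received.add(queryString);
        return "result of remote evaluation";
    }

    @Override public void close() { closed = true; }
}

// Dataset backed by a factory of links, comparable to a JDBC data source.
class LinkDataSource {
    private final Supplier<RemoteLink> linkCreator;

    LinkDataSource(Supplier<RemoteLink> linkCreator) { this.linkCreator = linkCreator; }

    RemoteLink newLink() { return linkCreator.get(); }
}

// Builder that hands the query string to a link unchanged.
class ForwardingQueryBuilder {
    private final LinkDataSource source;
    private String queryString;

    ForwardingQueryBuilder(LinkDataSource source) { this.source = source; }

    ForwardingQueryBuilder query(String queryString) {
        this.queryString = queryString;
        return this;
    }

    String execute() {
        // No local parsing or evaluation: vendor-specific syntax reaches the link untouched.
        try (RemoteLink link = source.newLink()) {
            return link.query(queryString);
        }
    }
}
```

Because the string is never parsed on the client side in this sketch, a query using vendor extensions such as bif:contains would reach the backing engine as written.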

Note that the SPARQL adapter system of this PR works under the hood; conventional code should never have to interact with it directly.

@Aklakan Aklakan force-pushed the 2025-05-11-sparqladapter branch 2 times, most recently from 7baa3e0 to 2dd1dbb Compare November 30, 2025 17:06
@Aklakan Aklakan force-pushed the 2025-05-11-sparqladapter branch from 2dd1dbb to 116e788 Compare December 9, 2025 17:17
Contributor Author

Aklakan commented Dec 9, 2025

I think there are three meaningful policies for how to pass on transactions when doing

DatasetGraphOverRDFLink dsg = new DatasetGraphOverRDFLink(linkCreator);
RDFLink frontLink = RDFLink.connect(dsg);

RDFLink backingLink = dsg.newLink(); // Calls linkCreator.create()
  • Link-per-execution: Every query exec obtained from the front facing link, such as via QueryExec qe = frontLink.query(...), is backed by a fresh backing link (dsg.newLink()) that is closed when qe is closed.
  • Pass-through: RDFLink.connect(dsg) returns the same link returned by dsg.newLink(). This is currently unsupported because (a) it would require adapting RDFLink.connect and (b) the behavior should be covered by the thread-local-links policy.
  • Thread-local-links: dsg.begin() will open a link via dsg.newLink() and place it into a ThreadLocal of dsg. All further API calls will go to that backing link until dsg.end() is called.
    The behavior should be the same as pass-through. The only difference is that the link returned by RDFLink.connect is a wrapper that delegates to the backing link in dsg's thread local.
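
As a rough illustration of the thread-local-links policy, the following self-contained sketch binds a backing link to the current thread between begin and end. BackingLink and ThreadLocalLinkDataset are hypothetical stand-ins, not the TransactionalOverRDFLink class of this PR. Falling back to link-per-execution outside of a transaction is an assumption made for the sketch.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a backing link; not Jena's RDFLink.
class BackingLink {
    boolean closed = false;

    String query(String q) {
        if (closed) {
            throw new IllegalStateException("link is closed");
        }
        return "answer to: " + q;
    }

    void close() { closed = true; }
}

class ThreadLocalLinkDataset {
    private final Supplier<BackingLink> linkCreator;
    private final ThreadLocal<BackingLink> txnLink = new ThreadLocal<>();

    ThreadLocalLinkDataset(Supplier<BackingLink> linkCreator) {
        this.linkCreator = linkCreator;
    }

    // begin(): open a backing link and bind it to the current thread.
    void begin() {
        if (txnLink.get() != null) {
            throw new IllegalStateException("transaction already active on this thread");
        }
        txnLink.set(linkCreator.get());
    }

    // end(): close the bound link, so its life cycle equals that of the transaction.
    void end() {
        BackingLink link = txnLink.get();
        if (link != null) {
            link.close();
            txnLink.remove();
        }
    }

    boolean isInTransaction() { return txnLink.get() != null; }

    // Inside a transaction the thread's link is reused; outside, a fresh link is used per execution.
    String query(String q) {
        BackingLink bound = txnLink.get();
        if (bound != null) {
            return bound.query(q);
        }
        BackingLink fresh = linkCreator.get();
        try {
            return fresh.query(q);
        } finally {
            fresh.close();
        }
    }
}
```

In this sketch, two queries issued between begin and end share one backing link, while a query issued afterwards opens and closes its own.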

I updated the code of DatasetGraphOverRDFLink with an internal TransactionalOverRDFLink class for the thread-local-links policy.

// Pass 'true' for supportsTransactions to enable thread local links:
boolean supportsTransactions = true;
DatasetGraphOverRDFLink dsg = new DatasetGraphOverRDFLink(linkCreator,
    supportsTransactions, supportsTransactionAbort);

[Update]

  • Note: Placing a link into a dataset graph's thread local on begin and closing it on end ties the link's life cycle to that of the transaction. Consequently, beginning a transaction may incur connection overhead. This shouldn't be a problem, but it is a possible caveat when using DatasetGraphOverRDFLink.

@Aklakan Aklakan force-pushed the 2025-05-11-sparqladapter branch from 116e788 to 5637051 Compare December 18, 2025 16:08
@Aklakan Aklakan force-pushed the 2025-05-11-sparqladapter branch from 5637051 to cc1fc9b Compare December 18, 2025 22:58
Development

Successfully merging this pull request may close these issues.

Sparql Adapter System
