Basic query I've been using to test things fails:
curl -X POST \
http://localhost:8080/query \
-H 'Content-Type: application/json' \
--data '{
"parameters": { "tiers": [ 0 ], "timeout": -1 },
"bypass_cache": true,
"submitter": "bte-dev-tester-manual",
"message": {
"query_graph": {
"nodes": {
"nB": {
"categories": [ "biolink:Disease" ],
"ids": [ "MONDO:0005015" ]
},
"nA": {
"categories": [ "biolink:Drug" ],
"ids": [ "CHEBI:6801" ]
}
},
"edges": {
"e1": {
"subject": "nA",
"object": "nB",
"predicates": [ "biolink:treats_or_applied_or_studied_to_treat" ]
}
}
}
}
}
'
2026-03-02T16:19:53.226-05:00 34540 ERROR 2bdf97f5 Unhandled exception occurred while processing Tier 0 query. See logs for details. retriever.utils.logs:exception():161
Traceback (most recent call last):
File "/Users/jcallaghan/Projects/retriever-dev/.venv/bin/retriever", line 10, in <module>
sys.exit(main())
File "/Users/jcallaghan/Projects/retriever-dev/src/retriever/__main__.py", line 83, in main
uvloop.run(_main_inner())
File "/Users/jcallaghan/Projects/retriever-dev/.venv/lib/python3.13/site-packages/uvloop/__init__.py", line 109, in run
return __asyncio.run(
File "/Users/jcallaghan/.local/share/uv/python/cpython-3.13.2-macos-x86_64-none/lib/python3.13/asyncio/runners.py", line 195, in run
return runner.run(main)
File "/Users/jcallaghan/.local/share/uv/python/cpython-3.13.2-macos-x86_64-none/lib/python3.13/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
File "/Users/jcallaghan/Projects/retriever-dev/.venv/lib/python3.13/site-packages/opentelemetry/util/_decorator.py", line 71, in async_wrapper
return await func(*args, **kwargs) # type: ignore
> File "/Users/jcallaghan/Projects/retriever-dev/src/retriever/data_tiers/tier_0/base_query.py", line 50, in execute
backend_results = await self.get_results(self.qgraph)
File "/Users/jcallaghan/Projects/retriever-dev/src/retriever/data_tiers/tier_0/dgraph/query.py", line 33, in get_results
result = transpiler.convert_results(qgraph, backend_record.data["q0"])
File "/Users/jcallaghan/Projects/retriever-dev/src/retriever/data_tiers/tier_0/dgraph/transpiler.py", line 1589, in convert_results
partials = self._build_results(node, qgraph)
File "/Users/jcallaghan/Projects/retriever-dev/src/retriever/data_tiers/tier_0/dgraph/transpiler.py", line 1525, in _build_results
for partial in self._build_results(edge.node, qg):
File "/Users/jcallaghan/Projects/retriever-dev/src/retriever/data_tiers/tier_0/dgraph/transpiler.py", line 1499, in _build_results
constraints = qg["nodes"][original_node_id].get("constraints", []) or []
KeyError: 'intermediate'
A more advanced test case, which runs in ~30 seconds on Tier 1 but seems to time out on the current Tier 0 even without this PR:
curl -X POST \
http://localhost:8080/query \
-H 'Content-Type: application/json' \
--data '{
"parameters": { "tiers": [ 0 ], "timeout": -1 },
"bypass_cache": true,
"submitter": "bte-dev-tester-manual",
"message": {
"query_graph": {
"nodes": {
"nB": { "categories": [ "biolink:Disease" ], "ids": [ "MONDO:0005015" ] },
"nI": { "categories": [ "biolink:Gene" ] },
"nA": { "categories": [ "biolink:Drug" ], "ids": [ "CHEBI:6801" ] }
},
"edges": {
"e1": {
"subject": "nA",
"object": "nI",
"predicates": [ "biolink:affects" ]
},
"e2": { "subject": "nI", "object": "nB" }
}
}
}
}
'
I pushed a commit to fix this error.
Force-pushed from f6a064e to 8b00255
Correction for my 2-hop test case: Having no predicate on
curl -X POST \
http://localhost:8080/query \
-H 'Content-Type: application/json' \
--data '{
"parameters": { "tiers": [ 0 ], "timeout": -1 },
"bypass_cache": true,
"submitter": "bte-dev-tester-manual",
"message": {
"query_graph": {
"nodes": {
"nB": { "categories": [ "biolink:Disease" ], "ids": [ "MONDO:0005015" ] },
"nI": { "categories": [ "biolink:Gene" ] },
"nA": { "categories": [ "biolink:Drug" ], "ids": [ "CHEBI:6801" ] }
},
"edges": {
"e1": {
"subject": "nA",
"object": "nI",
"predicates": [ "biolink:affects" ]
},
"e2": {
"subject": "nI",
"object": "nB",
"predicates": [ "biolink:gene_associated_with_condition" ]
}
}
}
}
}
'
I've rebased the PR onto the latest main and added a commit that should handle subclass transformation to TRAPI up to the 'phase 1' state, that is, before a subclass backmap is used to reorganize the response with construct edges. This works for the simple 1-hop case: it produces 12 results, each binding a different subclass of diabetes. For the 2-hop case (the corrected version above), it incorrectly creates results that bind only the first hop. I've checked, and this is not due to my transformation code; rather, the result models provided don't continue past the first hop. The
Force-pushed from 6c85c02 to f2b229d
Description
This PR implements subclassing expansion in the Dgraph transpiler to enhance query capabilities.
Feature Flags
Two new feature flags have been added to the DgraphTranspiler class, configurable via constructor parameters or the global config CONFIG.tier0.dgraph.
Resolution order (highest to lowest priority):
1. Constructor argument, when True or False is passed
2. Config value, when the constructor argument is None and the key exists in CONFIG.tier0.dgraph
3. True (enabled): fallback default when neither constructor arg nor config key is set
Both flags can be overridden at instantiation time.
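A minimal sketch of the resolution order described above. The helper `resolve_flag`, the plain dict standing in for CONFIG.tier0.dgraph, and the config key name are all assumptions made for illustration; the actual DgraphTranspiler constructor is not reproduced here.

```python
from typing import Mapping, Optional

# Hypothetical stand-in for CONFIG.tier0.dgraph; the key name is assumed.
FAKE_DGRAPH_CONFIG: Mapping[str, bool] = {"subclassing_enabled": False}


def resolve_flag(
    constructor_arg: Optional[bool],
    config: Mapping[str, bool],
    key: str,
) -> bool:
    """Resolve one feature flag: constructor arg, then config key, then True."""
    if constructor_arg is not None:
        # 1. An explicit True or False passed to the constructor wins.
        return constructor_arg
    if key in config:
        # 2. Constructor arg is None and the key exists in the config.
        return bool(config[key])
    # 3. Neither is set: fall back to enabled.
    return True


# Explicit argument overrides the config value.
print(resolve_flag(True, FAKE_DGRAPH_CONFIG, "subclassing_enabled"))
# No argument: the config value is used.
print(resolve_flag(None, FAKE_DGRAPH_CONFIG, "subclassing_enabled"))
# No argument and no config key: the default applies.
print(resolve_flag(None, {}, "subclassing_enabled"))
```

Using None as the "not specified" sentinel is what lets an explicit False from the caller be told apart from an omitted argument.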
Subclass Expansion
- subclassing_enabled flag to DgraphTranspiler (default True) to emit subclass-based subqueries per hop.
- subclass_of (ID→subclass_of→ID or ID→subclass_of→CAT).
- (ID→P→ID): generate three forms
  - A' ← subclass_of ← A; A' → R → B
  - A → P → B'; B' ← subclass_of ← B
  - A' ← subclass_of ← A; A' → P → B'; B' ← subclass_of ← B
- (ID→P→CAT-only): generate Form B only (A' ← subclass_of ← A; A' → P → CAT), and only when target has categories but no IDs.
- (CAT→P→ID): Mirrored Form B only (CAT → P → B'; B' ← subclass_of ← B), when source has categories but no IDs and target has IDs. This handles queries traversing in either direction (forward from CAT or backward from ID).
- predicate R segments in subclass forms.
- subclass_of edges.
- @cascade to omit reverse fields for symmetric and subclass edges (those use OR-logic post-filtering instead).
- _filter_cascaded_with_or post-processing step to enforce OR logic across symmetric and subclass expansion paths.
- _detect_symmetric_and_subclass_edges to pre-detect all special edges before query generation, enabling correct @cascade clause construction.
Dgraph Subclassing Traversal Logic
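As a rough illustration of the expansion rules listed under Subclass Expansion, the sketch below picks which subclass forms would be generated for a single hop. The function names, the "biolink:subclass_of" CURIE, and the rule that hops already asking for subclass_of are left unexpanded are assumptions for illustration, not the transpiler's actual code.

```python
from typing import Sequence

SUBCLASS_OF = "biolink:subclass_of"  # assumed predicate CURIE


def node_kind(qnode: dict) -> str:
    """Classify a TRAPI query node as "ID", "CAT" (categories, no IDs), or "OTHER"."""
    if qnode.get("ids"):
        return "ID"
    if qnode.get("categories"):
        return "CAT"
    return "OTHER"


def subclass_forms(source: dict, target: dict, predicates: Sequence[str]) -> list[str]:
    """List the subclass forms for one hop, in the arrow notation of this description."""
    if SUBCLASS_OF in predicates:
        # Assumption: a hop that already traverses subclass_of is not expanded.
        return []
    case = (node_kind(source), node_kind(target))
    if case == ("ID", "ID"):
        return [
            "A' ← subclass_of ← A; A' → P → B",
            "A → P → B'; B' ← subclass_of ← B",
            "A' ← subclass_of ← A; A' → P → B'; B' ← subclass_of ← B",
        ]
    if case == ("ID", "CAT"):
        return ["A' ← subclass_of ← A; A' → P → CAT"]
    if case == ("CAT", "ID"):
        return ["CAT → P → B'; B' ← subclass_of ← B"]
    return []


# Nodes taken from the 2-hop test query earlier in this conversation.
drug = {"categories": ["biolink:Drug"], "ids": ["CHEBI:6801"]}
gene = {"categories": ["biolink:Gene"]}
disease = {"categories": ["biolink:Disease"], "ids": ["MONDO:0005015"]}

for form in subclass_forms(drug, gene, ["biolink:affects"]):
    print("e1:", form)
for form in subclass_forms(gene, disease, ["biolink:gene_associated_with_condition"]):
    print("e2:", form)
```

A node with both ids and categories is treated as ID here, which matches the "categories but no IDs" condition on the CAT cases.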
The following describes the Dgraph traversal logic for each expansion form, showing how nodes are connected via their subject and object edge fields.
- ID:A → predicate1 → ID:B
- ID:A → ~subject → predicate1 → object → ID:B
- ID:A → ~object → subclass_of → subject → A' → ~subject → predicate1 → object → ID:B
- ID:A → ~subject → predicate1 → object → B' → ~subject → subclass_of → object → ID:B
- ID:A → ~object → subclass_of → subject → A' → ~subject → predicate1 → object → B' → ~subject → subclass_of → object → ID:B
- ID:A → ~subject → predicate1 → object → CAT:B
- ID:A → ~object → subclass_of → subject → A' → ~subject → predicate1 → object → CAT:B
- CAT:A → ~subject → P → object → ID:B
- CAT:A → ~object → P → subject → B' → ~subject → subclass_of → object → ID:B
Data Representation in Dgraph
Query starting from Subject Node to Edge
Question: Given a subject node, how do I find all edges that start from it?
SubjectNode → ~subject → Edge
Query starting from Object Node to Edge
Question: Given an object node, how do I find all edges that point to it?
ObjectNode → ~object → Edge
Query starting from Edge to Subject Node
Question: Given an edge, how do I find its subject (source) node?
Edge → subject → SubjectNode
Query starting from Edge to Object Node
Question: Given an edge, how do I find its object (target) node?
Edge → object → ObjectNode
Query starting from Edge to Both Subject and Object Nodes
Question: Given an edge, how do I find both its subject and object nodes in a single query?
SubjectNode ← subject ← Edge → object → ObjectNode
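The lookups above can be mimicked with a small in-memory model, shown here only to make the direction of each field concrete. `ToyGraph`, `Edge`, and the node IDs are made up for illustration; this is not the Dgraph schema or client API.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Edge(NamedTuple):
    predicate: str
    subject: str  # ID of the subject (source) node
    object: str   # ID of the object (target) node


class ToyGraph:
    """In-memory stand-in: edges are records that point at their two nodes."""

    def __init__(self, edges: Iterable[Edge]) -> None:
        self._by_subject = defaultdict(list)  # plays the role of ~subject
        self._by_object = defaultdict(list)   # plays the role of ~object
        for edge in edges:
            self._by_subject[edge.subject].append(edge)
            self._by_object[edge.object].append(edge)

    def edges_from(self, node_id: str) -> list[Edge]:
        """SubjectNode → ~subject → Edge"""
        return list(self._by_subject.get(node_id, []))

    def edges_to(self, node_id: str) -> list[Edge]:
        """ObjectNode → ~object → Edge"""
        return list(self._by_object.get(node_id, []))

    def subclasses_of(self, node_id: str) -> list[str]:
        """ID:A → ~object → subclass_of → subject → A'"""
        return [e.subject for e in self.edges_to(node_id) if e.predicate == "subclass_of"]


graph = ToyGraph([
    Edge("treats", "drug:1", "disease:1"),
    Edge("subclass_of", "disease:2", "disease:1"),
])

# Edge → subject and Edge → object are plain field reads on the edge record.
for edge in graph.edges_to("disease:1"):
    print(edge.subject, edge.predicate, edge.object)
print(graph.subclasses_of("disease:1"))
```

Storing the node references on the edge and indexing them in reverse is what lets one traversal start from either end of a hop.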