feat(tier0): Implement Subclassing#134

Open
everaldorodrigo wants to merge 22 commits into main from tier0-feat-subclassing

@everaldorodrigo
Contributor

@everaldorodrigo everaldorodrigo commented Feb 26, 2026

Description

This PR implements subclass expansion in the Dgraph transpiler: each hop is also queried through subclass_of edges, so that subclasses of the queried IDs can be matched.

Feature Flags

Two new feature flags, enable_symmetric_edges and enable_subclass_edges, have been added to the DgraphTranspiler class. Each can be set via a constructor parameter or the global config CONFIG.tier0.dgraph.

Resolution order (highest to lowest priority):

  1. Explicit constructor argument — always wins when True or False is passed
  2. Config value — used when constructor arg is None and the key exists in CONFIG.tier0.dgraph
  3. True (enabled) — fallback default when neither constructor arg nor config key is set

Both flags can be overridden at instantiation time:

# Both flags explicitly set
transpiler = DgraphTranspiler(
    enable_symmetric_edges=True,
    enable_subclass_edges=False,
)

# Falls back to config or default (True)
transpiler = DgraphTranspiler()
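
The resolution order above can be sketched as a small helper. This is a minimal illustration, not the transpiler's actual code: `resolve_flag` and the plain dict standing in for CONFIG.tier0.dgraph are hypothetical names introduced here.

```python
from typing import Mapping, Optional


def resolve_flag(
    explicit: Optional[bool],
    config: Mapping[str, bool],
    key: str,
    default: bool = True,
) -> bool:
    """Resolve a feature flag: constructor argument, then config, then default."""
    if explicit is not None:
        # An explicit True or False always wins.
        return explicit
    if key in config:
        # The constructor argument was None, so use the configured value.
        return bool(config[key])
    # Neither source set the flag: fall back to the default (enabled).
    return default


# Hypothetical stand-in for CONFIG.tier0.dgraph
config = {"enable_subclass_edges": False}

print(resolve_flag(None, config, "enable_subclass_edges"))   # value from config
print(resolve_flag(True, config, "enable_subclass_edges"))   # explicit argument wins
print(resolve_flag(None, config, "enable_symmetric_edges"))  # key missing: default
```

Using None as the "not provided" sentinel is what lets an explicit False override a config value of True.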

Subclass Expansion

  • Adds an enable_subclass_edges flag to DgraphTranspiler (default True) to emit subclass-based subqueries per hop.
  • Implements expansions:
    • Case 0a/0b: No expansion when original predicate is subclass_of (ID→subclass_of→ID or ID→subclass_of→CAT).
    • Case 1 (ID→P→ID): generate three forms
      • Form B: A' ← subclass_of ← A; A' → P → B
      • Form C: A → P → B'; B' ← subclass_of ← B
      • Form D: A' ← subclass_of ← A; A' → P → B'; B' ← subclass_of ← B
    • Case 2 (ID→P→CAT-only): generate Form B only (A' ← subclass_of ← A; A' → P → CAT), and only when target has categories but no IDs.
    • Case 3 (CAT→P→ID): Mirrored Form B only (CAT → P → B'; B' ← subclass_of ← B), when source has categories but no IDs and target has IDs. This handles queries traversing in either direction (forward from CAT or backward from ID).
  • Preserves constraints:
    • Applies attribute/qualifier constraints only to the original predicate (P) segments in subclass forms.
    • Does not apply constraints to subclass_of edges.
  • Updates node-level @cascade to omit reverse fields for symmetric and subclass edges (those use OR-logic post-filtering instead).
  • Adds _filter_cascaded_with_or post-processing step to enforce OR logic across symmetric and subclass expansion paths.
  • Adds _detect_symmetric_and_subclass_edges to pre-detect all special edges before query generation, enabling correct @cascade clause construction.
  • Adds tests.
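
As a rough sketch of the case analysis listed above, the choice of expansion forms for one hop could look like the following. The function name, its boolean parameters, and the bare "subclass_of" predicate string are assumptions made for illustration; the behavior for node combinations the description does not cover (returning no expansion) is also an assumption.

```python
from typing import List

SUBCLASS_OF = "subclass_of"  # predicate name as written in the description


def expansion_forms(
    predicate: str,
    source_has_ids: bool,
    source_has_categories: bool,
    target_has_ids: bool,
    target_has_categories: bool,
) -> List[str]:
    """Return the subclass expansion forms to generate for one hop."""
    if predicate == SUBCLASS_OF:
        # Case 0a/0b: never expand a hop that is itself subclass_of.
        return []
    if source_has_ids and target_has_ids:
        # Case 1 (ID -> P -> ID): all three expansion forms.
        return ["B", "C", "D"]
    if source_has_ids and target_has_categories and not target_has_ids:
        # Case 2 (ID -> P -> CAT-only): expand the source side only.
        return ["B"]
    if source_has_categories and not source_has_ids and target_has_ids:
        # Case 3 (CAT -> P -> ID): expand the target side only.
        return ["mirrored B"]
    # Assumption: any other combination gets no subclass expansion.
    return []


print(expansion_forms("treats", True, True, True, True))
print(expansion_forms("treats", True, False, False, True))
print(expansion_forms("treats", False, True, True, False))
print(expansion_forms(SUBCLASS_OF, True, False, True, False))
```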

Dgraph Subclassing Traversal Logic

The following describes the Dgraph traversal logic for each expansion form, showing how nodes are connected via their subject and object edge fields.

  • Case 1: ID:A → predicate1 → ID:B
    • Form A: ID:A → ~subject → predicate1 → object → ID:B
    • Form B: ID:A → ~object → subclass_of → subject → A' → ~subject → predicate1 → object → ID:B
    • Form C: ID:A → ~subject → predicate1 → object → B' → ~subject → subclass_of → object → ID:B
    • Form D: ID:A → ~object → subclass_of → subject → A' → ~subject → predicate1 → object → B' → ~subject → subclass_of → object → ID:B
  • Case 2: ID:A → predicate1 → CAT:B
    • Form A: ID:A → ~subject → predicate1 → object → CAT:B
    • Form B: ID:A → ~object → subclass_of → subject → A' → ~subject → predicate1 → object → CAT:B
  • Case 3: CAT:A → predicate1 → ID:B
    • Form A: CAT:A → ~subject → predicate1 → object → ID:B
    • Mirrored Form B: CAT:A → ~subject → predicate1 → object → B' → ~subject → subclass_of → object → ID:B
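
To make the pattern in the Case 1 paths explicit, here is a small sketch that assembles them from two reusable steps: crossing an edge from subject to object, and stepping from a class to one of its subclasses. The helper names are hypothetical, and the output is the arrow notation used above rather than real DQL.

```python
from typing import Dict, List

SUBCLASS_OF = "subclass_of"


def edge_hop(predicate: str) -> List[str]:
    """Cross one edge from its subject side to its object side."""
    return ["~subject", predicate, "object"]


def to_subclass() -> List[str]:
    """Step from a class to a subclass: the class is the object of subclass_of."""
    return ["~object", SUBCLASS_OF, "subject"]


def case1_paths(a: str, b: str, predicate: str) -> Dict[str, str]:
    """Return the Form A-D traversal paths for an ID -> predicate -> ID hop."""
    forms = {
        "A": [a, *edge_hop(predicate), b],
        "B": [a, *to_subclass(), "A'", *edge_hop(predicate), b],
        "C": [a, *edge_hop(predicate), "B'", *edge_hop(SUBCLASS_OF), b],
        "D": [
            a, *to_subclass(), "A'",
            *edge_hop(predicate), "B'",
            *edge_hop(SUBCLASS_OF), b,
        ],
    }
    return {name: " → ".join(steps) for name, steps in forms.items()}


for name, path in case1_paths("ID:A", "ID:B", "predicate1").items():
    print(f"Form {name}: {path}")
```

On the target side, B' is the subject of the subclass_of edge whose object is ID:B, so the step from B' to ID:B is an ordinary subject-to-object hop.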

Data Representation in Dgraph

  1. Query starting from Subject Node to Edge
    Question: Given a subject node, how do I find all edges that start from it?
    SubjectNode → ~subject → Edge

  2. Query starting from Object Node to Edge
    Question: Given an object node, how do I find all edges that point to it?
    ObjectNode → ~object → Edge

  3. Query starting from Edge to Subject Node
    Question: Given an edge, how do I find its subject (source) node?
    Edge → subject → SubjectNode

  4. Query starting from Edge to Object Node
    Question: Given an edge, how do I find its object (target) node?
    Edge → object → ObjectNode

  5. Query starting from Edge to Both Subject and Object Nodes
    Question: Given an edge, how do I find both its subject and object nodes in a single query?
    SubjectNode ← subject ← Edge → object → ObjectNode
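
The five lookups above can be mimicked with a toy in-memory model in which each edge is its own record pointing at a subject and an object, and the reverse indexes play the role of ~subject and ~object. This is only an illustration of the data shape; the class names and node identifiers are made up and nothing here talks to Dgraph.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Edge:
    predicate: str
    subject: str  # Edge -> subject -> SubjectNode
    object: str   # Edge -> object -> ObjectNode


class EdgeStore:
    """Toy store with reverse indexes standing in for ~subject and ~object."""

    def __init__(self, edges: Iterable[Edge]) -> None:
        self._by_subject: Dict[str, List[Edge]] = defaultdict(list)
        self._by_object: Dict[str, List[Edge]] = defaultdict(list)
        for edge in edges:
            self._by_subject[edge.subject].append(edge)
            self._by_object[edge.object].append(edge)

    def edges_from(self, node: str) -> List[Edge]:
        """SubjectNode -> ~subject -> Edge."""
        return list(self._by_subject.get(node, []))

    def edges_to(self, node: str) -> List[Edge]:
        """ObjectNode -> ~object -> Edge."""
        return list(self._by_object.get(node, []))


def endpoints(edge: Edge) -> Tuple[str, str]:
    """SubjectNode <- subject <- Edge -> object -> ObjectNode."""
    return edge.subject, edge.object


store = EdgeStore([
    Edge("predicate1", "node:A", "node:B"),
    Edge("subclass_of", "node:A1", "node:A"),
])

print(store.edges_from("node:A"))  # edges whose subject is node:A
print(store.edges_to("node:A"))    # edges whose object is node:A
print(endpoints(store.edges_to("node:A")[0]))
```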

@sentry

sentry bot commented Feb 27, 2026

Codecov Report

❌ Patch coverage is 62.54682% with 100 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
...c/retriever/data_tiers/tier_0/dgraph/transpiler.py | 61.90% | 79 Missing and 17 partials ⚠️
...etriever/data_tiers/tier_0/dgraph/result_models.py | 69.23% | 3 Missing and 1 partial ⚠️

Files with missing lines | Coverage Δ
src/retriever/config/general.py | 100.00% <100.00%> (ø)
...etriever/data_tiers/tier_0/dgraph/result_models.py | 87.42% <69.23%> (-1.77%) ⬇️
...c/retriever/data_tiers/tier_0/dgraph/transpiler.py | 70.01% <61.90%> (-5.22%) ⬇️

@everaldorodrigo everaldorodrigo marked this pull request as ready for review February 28, 2026 00:27
@tokebe
Contributor

tokebe commented Mar 2, 2026

Basic query I've been using to test things fails:

curl -X POST \
http://localhost:8080/query \
-H 'Content-Type: application/json' \
--data '{
  "parameters": { "tiers": [ 0 ], "timeout": -1 },
  "bypass_cache": true,
  "submitter": "bte-dev-tester-manual",
  "message": {
    "query_graph": {
      "nodes": {
        "nB": {
          "categories": [ "biolink:Disease" ],
          "ids": [ "MONDO:0005015" ]
        },
        "nA": {
          "categories": [ "biolink:Drug" ],
          "ids": [ "CHEBI:6801" ]
        }
      },
      "edges": {
        "e1": {
          "subject": "nA",
          "object": "nB",
          "predicates": [ "biolink:treats_or_applied_or_studied_to_treat" ]
        }
      }
    }
  }
}
'
2026-03-02T16:19:53.226-05:00 34540 ERROR   2bdf97f5 Unhandled exception occurred while processing Tier 0 query. See logs for details. retriever.utils.logs:exception():161
Traceback (most recent call last):
  File "/Users/jcallaghan/Projects/retriever-dev/.venv/bin/retriever", line 10, in <module>
    sys.exit(main())
  File "/Users/jcallaghan/Projects/retriever-dev/src/retriever/__main__.py", line 83, in main
    uvloop.run(_main_inner())
  File "/Users/jcallaghan/Projects/retriever-dev/.venv/lib/python3.13/site-packages/uvloop/__init__.py", line 109, in run
    return __asyncio.run(
  File "/Users/jcallaghan/.local/share/uv/python/cpython-3.13.2-macos-x86_64-none/lib/python3.13/asyncio/runners.py", line 195, in run
    return runner.run(main)
  File "/Users/jcallaghan/.local/share/uv/python/cpython-3.13.2-macos-x86_64-none/lib/python3.13/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "/Users/jcallaghan/Projects/retriever-dev/.venv/lib/python3.13/site-packages/opentelemetry/util/_decorator.py", line 71, in async_wrapper
    return await func(*args, **kwargs)  # type: ignore
> File "/Users/jcallaghan/Projects/retriever-dev/src/retriever/data_tiers/tier_0/base_query.py", line 50, in execute
    backend_results = await self.get_results(self.qgraph)
  File "/Users/jcallaghan/Projects/retriever-dev/src/retriever/data_tiers/tier_0/dgraph/query.py", line 33, in get_results
    result = transpiler.convert_results(qgraph, backend_record.data["q0"])
  File "/Users/jcallaghan/Projects/retriever-dev/src/retriever/data_tiers/tier_0/dgraph/transpiler.py", line 1589, in convert_results
    partials = self._build_results(node, qgraph)
  File "/Users/jcallaghan/Projects/retriever-dev/src/retriever/data_tiers/tier_0/dgraph/transpiler.py", line 1525, in _build_results
    for partial in self._build_results(edge.node, qg):
  File "/Users/jcallaghan/Projects/retriever-dev/src/retriever/data_tiers/tier_0/dgraph/transpiler.py", line 1499, in _build_results
    constraints = qg["nodes"][original_node_id].get("constraints", []) or []
KeyError: 'intermediate'

@tokebe
Contributor

tokebe commented Mar 2, 2026

A more advanced test case, which runs in ~30 seconds on Tier 1 but seems to time out on the current Tier 0 even without this PR:

curl -X POST \
http://localhost:8080/query \
-H 'Content-Type: application/json' \
--data '{
  "parameters": { "tiers": [ 0 ], "timeout": -1 },
  "bypass_cache": true,
  "submitter": "bte-dev-tester-manual",
  "message": {
    "query_graph": {
      "nodes": {
        "nB": { "categories": [ "biolink:Disease" ], "ids": [ "MONDO:0005015" ] },
        "nI": { "categories": [ "biolink:Gene" ] },
        "nA": { "categories": [ "biolink:Drug" ], "ids": [ "CHEBI:6801" ] }
      },
      "edges": {
        "e1": {
          "subject": "nA",
          "object": "nI",
          "predicates": [ "biolink:affects" ]
        },
        "e2": { "subject": "nI", "object": "nB" }
      }
    }
  }
}
'

@everaldorodrigo
Contributor Author

> Basic query I've been using to test things fails: [...]
> KeyError: 'intermediate'

I pushed a commit to fix this error.
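
For readers following the traceback: the failing line indexes qg["nodes"] with the key 'intermediate', which is not one of the query graph's node IDs (nA, nB). The sketch below shows one generic way such a lookup can be guarded. It is not the fix from the commit mentioned above, and `node_constraints` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def node_constraints(qg: Dict[str, Any], node_id: str) -> List[Any]:
    """Return a query-graph node's constraints, or [] if the node is absent."""
    node = qg.get("nodes", {}).get(node_id)
    if node is None:
        # Assumption: IDs missing from the query graph (for example a
        # synthetic node added during expansion) carry no constraints.
        return []
    # Treat a missing or null "constraints" entry as an empty list.
    return node.get("constraints") or []


qg = {"nodes": {"nA": {"ids": ["CHEBI:6801"]}, "nB": {"constraints": None}}}
print(node_constraints(qg, "nA"))
print(node_constraints(qg, "nB"))
print(node_constraints(qg, "intermediate"))
```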

@tokebe tokebe force-pushed the tier0-feat-subclassing branch from f6a064e to 8b00255 Compare March 10, 2026 20:40
@tokebe
Contributor

tokebe commented Mar 10, 2026

Correction for my 2-hop test case:

With no predicate on e2, related_to is used by default. Because subclass_of is a descendant of related_to, subclassing does not occur in Tier 1. The small change below causes Tier 1 to use subclassing:

curl -X POST \
http://localhost:8080/query \
-H 'Content-Type: application/json' \
--data '{
  "parameters": { "tiers": [ 0 ], "timeout": -1 },
  "bypass_cache": true,
  "submitter": "bte-dev-tester-manual",
  "message": {
    "query_graph": {
      "nodes": {
        "nB": { "categories": [ "biolink:Disease" ], "ids": [ "MONDO:0005015" ] },
        "nI": { "categories": [ "biolink:Gene" ] },
        "nA": { "categories": [ "biolink:Drug" ], "ids": [ "CHEBI:6801" ] }
      },
      "edges": {
        "e1": {
          "subject": "nA",
          "object": "nI",
          "predicates": [ "biolink:affects" ]
        },
        "e2": {
          "subject": "nI",
          "object": "nB",
          "predicates": [ "biolink:gene_associated_with_condition" ]
        }
      }
    }
  }
}
'
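
The reasoning in this correction can be sketched as a predicate-hierarchy check: subclassing is skipped whenever a hop's effective predicate is subclass_of or one of its ancestors. The parent map below is a deliberately simplified, illustrative hierarchy (intermediate levels are collapsed and it is not copied from the Biolink Model), and the function names are hypothetical rather than Tier 1's actual code.

```python
from typing import Dict, List, Optional

RELATED_TO = "biolink:related_to"
SUBCLASS_OF = "biolink:subclass_of"

# Simplified, illustrative parent map: child predicate -> parent predicate.
PARENT: Dict[str, str] = {
    SUBCLASS_OF: RELATED_TO,
    "biolink:affects": RELATED_TO,
    "biolink:gene_associated_with_condition": RELATED_TO,
}


def is_same_or_descendant(predicate: str, ancestor: str) -> bool:
    """True if `predicate` equals `ancestor` or sits below it in PARENT."""
    current: Optional[str] = predicate
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


def subclassing_applies(predicates: Optional[List[str]]) -> bool:
    """Skip subclassing when any effective predicate already covers subclass_of."""
    # A hop without predicates falls back to related_to, as described above.
    effective = predicates or [RELATED_TO]
    return not any(is_same_or_descendant(SUBCLASS_OF, p) for p in effective)


print(subclassing_applies(None))
print(subclassing_applies(["biolink:gene_associated_with_condition"]))
```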

@tokebe
Contributor

tokebe commented Mar 10, 2026

I've rebased the PR onto the latest main and added a commit that should, in theory, handle the subclass transformation to TRAPI up to a sort of 'phase 1' state, i.e. before a subclass backmap is used to reorganize the response with construct edges.

This works for the simple 1-hop case; we can see that it's producing 12 results, each one binding a different subclass of diabetes.

For the 2-hop case (the corrected version above), it incorrectly creates results that bind only the first hop. I've checked, and this is not due to my transformation code; instead, the result models provided don't continue past the first hop: the edges fields on the gene nodes are empty. @everaldorodrigo can you look into this?

@tokebe tokebe force-pushed the tier0-feat-subclassing branch from 6c85c02 to f2b229d Compare March 11, 2026 18:04