Arax pathfinder by mohsenht · Pull Request #64 · BioPack-team/shepherd

mohsenht · 2026-01-21T16:17:24Z

Please review this pull request.

codecov · 2026-01-21T16:19:59Z

Codecov Report

❌ Patch coverage is 2.82486% with 172 lines in your changes missing coverage. Please review.
✅ Project coverage is 34.48%. Comparing base (928e4d8) to head (ce1a5e6).
⚠️ Report is 5 commits behind head on main.

Files with missing lines	Patch %	Lines
workers/arax_pathfinder/worker.py	0.00%	111 Missing ⚠️
workers/arax/worker.py	0.00%	35 Missing ⚠️
shepherd_utils/inject_shepherd_arax_provenance.py	0.00%	26 Missing ⚠️

Files with missing lines	Coverage Δ
shepherd_server/main.py	`0.00% <ø> (ø)`
shepherd_utils/config.py	`100.00% <100.00%> (ø)`
shepherd_utils/inject_shepherd_arax_provenance.py	`0.00% <0.00%> (ø)`
workers/arax/worker.py	`0.00% <0.00%> (ø)`
workers/arax_pathfinder/worker.py	`0.00% <0.00%> (ø)`
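
The numbers in the report above can be cross-checked with a little arithmetic. The sketch below sums the per-file missing lines and recomputes the patch percentage. The figure of 5 covered lines is an assumption inferred from the reported percentage; it does not appear in the report itself.

```python
# Missing-line counts copied from the Codecov table above.
missing_by_file = {
    "workers/arax_pathfinder/worker.py": 111,
    "workers/arax/worker.py": 35,
    "shepherd_utils/inject_shepherd_arax_provenance.py": 26,
}

# Assumption: 5 covered lines in the patch (not stated in the report;
# chosen because it reproduces the reported percentage).
ASSUMED_COVERED_LINES = 5


def patch_coverage(covered: int, missing: int) -> float:
    """Return patch coverage as a percentage of changed lines."""
    total = covered + missing
    return 100.0 * covered / total if total else 0.0


total_missing = sum(missing_by_file.values())
coverage = patch_coverage(ASSUMED_COVERED_LINES, total_missing)

if __name__ == "__main__":
    print(f"{total_missing} lines missing, patch coverage {coverage:.5f}%")
```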

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

maximusunc · 2026-02-04T16:19:48Z

I tried running a query through the ARAX pathfinder and it's unclear what happened. I got these logs:

arax_pathfinder  | 2026-02-04T16:02:05.361576: DEBUG: lookup map not here! /tmp/biolink/biolink_lookup_map_4.2.5_v5.pickle
arax_pathfinder  | 2026-02-04T16:02:07.267560: INFO: Building local Biolink 4.2.5 ancestor/descendant lookup map because one doesn't yet exist
arax_pathfinder  | [2026-02-04 16:02:07,299: INFO/shepherd.arax.pathfinder.e3355332.00527f4e]: Model release date: 12/01/2025
arax_pathfinder  | [2026-02-04 16:02:07,299: INFO/shepherd.arax.pathfinder.e3355332.00527f4e]: Finding paths process has started
arax_pathfinder  | [2026-02-04 16:02:07,300: INFO/shepherd.arax.pathfinder.e3355332.00527f4e]: Expanding CHEBI:45783
arax_pathfinder  | [2026-02-04 16:02:07,301: INFO/shepherd.arax.pathfinder.e3355332.00527f4e]: Expanding MONDO:0004979

but nothing else appeared before it timed out after 5 minutes. I sent it Imatinib->Asthma.

mohsenht · 2026-02-04T18:52:46Z

Hi @maximusunc,

I changed the parameters to make it faster for now. I will get back to Shepherd-pathfinder, probably next week, to figure out what the problem is.

maximusunc

When testing with Imatinib->Asthma, your Pathfinder is returning 0 paths. Is this intended?

maximusunc · 2026-02-16T17:38:47Z

workers/arax_pathfinder/worker.py

+    try:
+        start = time.perf_counter()
+        logger.info("Starting pathfinder.get_paths()")
+        result, aux_graphs, knowledge_graph = pathfinder.get_paths(


I'm not sure if your pathfinder code is asynchronous or not, but this call is blocking and so your pathfinder implementation can only handle one query at a time. Is this intended?
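
For illustration, one common way to keep a blocking call from stalling an async worker is to hand it to a thread with `asyncio.to_thread`. This is only a sketch: `get_paths_blocking` is a hypothetical stand-in, not the actual `pathfinder.get_paths()` signature, and the sleep simply simulates work.

```python
import asyncio
import time


def get_paths_blocking(query_id: str, seconds: float = 0.2) -> str:
    """Hypothetical stand-in for a blocking call such as pathfinder.get_paths()."""
    time.sleep(seconds)  # simulates work that blocks the calling thread
    return f"paths for {query_id}"


async def handle_query(query_id: str) -> str:
    # asyncio.to_thread runs the blocking function in a worker thread,
    # so the event loop stays free to accept other queries in the meantime.
    return await asyncio.to_thread(get_paths_blocking, query_id)


async def main() -> list[str]:
    # Three queries are in flight at once instead of running back to back.
    return await asyncio.gather(*(handle_query(f"q{i}") for i in range(3)))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(main())
    print(results, f"{time.perf_counter() - start:.2f}s")
```

Threads help mostly when the call waits on IO; for heavily CPU-bound work the GIL limits the benefit, and a process pool may be the better fit.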

mohsenht · 2026-02-18

Hi @maximusunc,

Could you please share the JSON query that you sent and that returned 0 paths?

Here is my query, which did return results:

{ "message": { "query_graph": { "nodes": { "n0": { "ids": [ "CHEBI:31690" ] }, "n1": { "ids": [ "MONDO:0004979" ] } }, "paths": { "p0": { "subject": "n0", "object": "n1", "predicates": [ "biolink:related_to" ], "constraints": [] } } } } }

It can now handle multiple queries.

maximusunc · 2026-03-12

So now we have the opposite effect. It looks like we're now trying to handle every query that comes in at once; running this locally, the RAM and CPU usage of this worker shot way up and hit my Docker limits. I think we need to tune this so that CPU and RAM stay reasonable. What do we think is reasonable, @mohsenht @dkoslicki?

I'm also a little concerned about the query you sent that does return results. That CHEBI CURIE is Imatinib, but it is not normalized. Are you doing some normalization in your pathfinder code somewhere?

mohsenht · 2026-03-12

The worker pool is capped at 4, and each query takes under 1 GB of RAM.
Since requests already take 2 to 4 minutes, I'm worried that reducing the number of workers to save resources will make end users wait far too long.

maximusunc · 2026-03-13

I'm not sure I follow. What worker pool are you talking about? Shepherd is designed to scale horizontally, so if one arax_pathfinder worker can't handle all the incoming requests, Shepherd can just spin up another one. So when we build these individual workers, we should understand where the bottlenecks are and pick a reasonable threshold for what a single worker should handle; then we can replicate workers at the Kubernetes level to handle more load.

mohsenht · 2026-03-13

Ah, that makes sense regarding Shepherd's horizontal scaling. The local resource spike comes from num_cores = min(multiprocessing.cpu_count(), 4) inside the pathfinder library.

I added that multiprocessing specifically to speed up path expansion and keep query times down to 2-4 minutes. If we restrict each worker to 1 core to reduce its footprint, individual queries will take significantly longer.

We have a trade-off: Are we okay with longer end-user wait times so Shepherd can scale smaller workers? Or should we set a higher baseline per worker (e.g., 4 cores) to keep queries fast, and let Shepherd scale those larger pods?
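
One way to avoid hard-coding either side of this trade-off would be to make the core cap configurable per deployment. The sketch below keeps the `min(multiprocessing.cpu_count(), 4)` default quoted above and lets an environment variable override the cap. The variable name `PATHFINDER_MAX_CORES` and the function `resolve_num_cores` are hypothetical; neither exists in the pathfinder library or in Shepherd.

```python
import multiprocessing
import os


def resolve_num_cores(default_cap: int = 4,
                      env_var: str = "PATHFINDER_MAX_CORES") -> int:
    """Return how many worker processes to use for one pathfinder worker.

    PATHFINDER_MAX_CORES is a hypothetical setting used for illustration.
    """
    raw = os.environ.get(env_var)
    try:
        cap = int(raw) if raw is not None else default_cap
    except ValueError:
        cap = default_cap  # fall back to the default on malformed input
    cap = max(1, cap)  # never go below one process
    # Same shape as the expression quoted above, with a configurable cap.
    return min(multiprocessing.cpu_count(), cap)


if __name__ == "__main__":
    print("num_cores =", resolve_num_cores())
```

A small pod could then set the cap to 1 and a larger pod could keep 4, without changing code.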

maximusunc · 2026-02-26T15:15:29Z

I ran some tests last night and ran into some issues. I was able to run your query and get back results, but then I tried sending 5 concurrent queries and while they all fired off, I got this error for all of them:

arax_pathfinder  | requests.exceptions.ConnectionError: HTTPSConnectionPool(host='kg2cplover3.rtx.ai', port=9990): Max retries exceeded with url: /query (Caused by NewConnectionError("HTTPSConnection(host='kg2cplover3.rtx.ai', port=9990): Failed to establish a new connection: [Errno 111] Connection refused"))

And then I tried backing off and just sending one query and got this error:

arax_pathfinder  | [2026-02-26 02:18:32,536: ERROR/shepherd.arax.pathfinder.d18b72ad.f07114e6]: Path MONDO:0004979MONDO:0011786 raised an exception: MySQL connection failed: 2003 (HY000): Can't connect to MySQL server on 'arax-databases-mysql.rtx.ai:3306' (111)
arax_pathfinder  | [2026-02-26 02:18:34,166: ERROR/shepherd.arax.pathfinder.d18b72ad.f07114e6]: Path CHEBI:31690NCBIGene:1544 raised an exception: MySQL connection failed: 2003 (HY000): Can't connect to MySQL server on 'arax-databases-mysql.rtx.ai:3306' (111)
arax_pathfinder  | [2026-02-26 02:18:34,186: ERROR/shepherd.arax.pathfinder.d18b72ad.f07114e6]: PathFinder failed to find paths between on and sn. Error message is: MySQL connection failed: 2003 (HY000): Can't connect to MySQL server on 'arax-databases-mysql.rtx.ai:3306' (111)

This doesn't seem to be an issue with Shepherd; it looks like it is coming from these external services. So my follow-up questions are:
Can these external services handle concurrent queries? If they can't, then your Pathfinder shouldn't either.
Is your Pathfinder heavily CPU-bound? If so, we will potentially want to set up some multiprocessing.
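
If the external services turn out to tolerate only limited concurrency, the worker could cap how many upstream calls are in flight at once. The sketch below uses `asyncio.Semaphore` for that. The limit of 2 and the names `call_upstream` and `run_queries` are assumptions for illustration; the sleep stands in for the real network request.

```python
import asyncio

MAX_IN_FLIGHT = 2  # assumed limit; tune to what the upstream service tolerates


async def call_upstream(query_id: str, semaphore: asyncio.Semaphore, stats: dict) -> str:
    # Only MAX_IN_FLIGHT coroutines may be inside this block at the same time.
    async with semaphore:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.05)  # placeholder for the real network request
        stats["active"] -= 1
    return f"done {query_id}"


async def run_queries(n: int) -> tuple[list[str], int]:
    semaphore = asyncio.Semaphore(MAX_IN_FLIGHT)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(call_upstream(f"q{i}", semaphore, stats) for i in range(n))
    )
    return list(results), stats["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(run_queries(5))
    print(results, "peak concurrency:", peak)
```

All five queries are accepted, but at most two hit the upstream service at any moment; the rest wait their turn.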

mohsenht · 2026-02-26T15:30:26Z

PloverDB Concurrency: The error came from PloverDB, which is actually designed to handle thousands of requests in parallel for Pathfinder and other services. The KG2 team is currently working hard on its stability.

Pathfinder Performance: Pathfinder is both CPU-bound and IO-bound. It already uses multiprocessing in its core code to calculate rankings and expand nodes while building paths and trees.

The subsequent failure of the single query shows that the database connection to arax-databases-mysql was also down. I will ping the KG2 team about this one.

Thanks Max

maximusunc · 2026-03-10T13:15:33Z

Hey @mohsenht is this ready for another review?

mohsenht · 2026-03-11T00:51:44Z

Hi @maximusunc

Yes, it is ready for review.

maximusunc

With these changes, I was able to start a pathfinder query, but I still got an error when connecting to kg2cplover3.rtx.ai.

mohsenht · 2026-03-11T16:12:52Z

Hi @maximusunc,

Since these changes mostly come from your work in the main branch, could you please resolve these conflicts yourself?

maximusunc · 2026-03-11T16:26:37Z

OK, I think the code looks good now; only the external Plover error is still an issue.

mohsenht · 2026-03-11T17:25:22Z

I asked the KGX group, and it turns out the CI instance is up and running again, so I updated the PloverDB URL to point to the CI instance.

mohsenht · 2026-03-11T17:41:39Z

@maximusunc

Forgot to mention you :)

mohsenht · 2026-03-11T18:24:31Z

@maximusunc

I ran the branch and tested it with this query, and it worked.

POST:
http://localhost:5439/arax/query

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids": [
                        "CHEBI:31690"
                    ]
                },
                "n1": {
                    "ids": [
                        "MONDO:0004979"
                    ]
                }
            },
            "paths": {
                "p0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": [
                        "biolink:related_to"
                    ],
                    "constraints": []
                }
            }
        }
    }
}
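
For anyone who wants to script this test, the sketch below builds the same query body in Python and prepares a POST request with the standard library. The helper names `build_pathfinder_query` and `build_request` are hypothetical; the URL and payload are the ones shown above. The request is only constructed here, since sending it requires a locally running Shepherd instance.

```python
import json
import urllib.request


def build_pathfinder_query(subject_curie: str, object_curie: str) -> dict:
    """Build a query body with the same shape as the JSON shown above."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {"ids": [subject_curie]},
                    "n1": {"ids": [object_curie]},
                },
                "paths": {
                    "p0": {
                        "subject": "n0",
                        "object": "n1",
                        "predicates": ["biolink:related_to"],
                        "constraints": [],
                    }
                },
            }
        }
    }


def build_request(url: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    query = build_pathfinder_query("CHEBI:31690", "MONDO:0004979")
    request = build_request("http://localhost:5439/arax/query", query)
    # To actually send it against a running local instance:
    # with urllib.request.urlopen(request, timeout=600) as response:
    #     print(response.status)
    print(request.get_method(), request.full_url)
```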

mohsenht added 3 commits January 20, 2026 10:16

Using Pathfinder package with local sqlite files

cba10f7

Using Pathfinder package with mysql server

5539e3d

Settings for arax pathfinder

cf94d6d

mohsenht requested a review from maximusunc January 21, 2026 16:17

dkoslicki reviewed Jan 21, 2026

workers/arax/worker.py (outdated, resolved)

dkoslicki reviewed Jan 21, 2026

workers/arax_pathfinder/worker.py (outdated, resolved)

mohsenht added 4 commits January 21, 2026 13:55

Black style errors

4999dff

Black style errors

767e51b

New pathfinder package release update.

540f220

Inject shepherd-arax provenance in all edge sources field

cd4cf5a

Temporary faster pathfinder by decreasing parameters

ce1a5e6

mohsenht added 2 commits February 11, 2026 11:09

Merge branch 'main' of github.com:BioPack-team/shepherd into arax-pathfinder

7cd663a

Arax Pathfinder tested with 4 hops

8ec57a9

maximusunc requested changes Feb 16, 2026

Async Arax Pathfinder

82852c4

mohsenht added 2 commits March 2, 2026 22:47

Pathfinder package updated

6b537e5

Pathfinder package updated

9919306

mohsenht added 2 commits March 10, 2026 20:40

Merge branch 'main' of github.com:BioPack-team/shepherd into arax-pathfinder # Conflicts: # workers/arax/worker.py

ab7049e

resolved conflicts

4771f90

mohsenht added 2 commits March 10, 2026 20:55

resolved conflicts

107614b

resolved conflicts

268d226

maximusunc requested changes Mar 11, 2026

workers/arax/worker.py (outdated, resolved)

workers/arax/worker.py (outdated, resolved)

workers/arax_pathfinder/worker.py (outdated, resolved)

workers/arax_pathfinder/worker.py (outdated, resolved)

maximusunc added 2 commits March 11, 2026 12:24

Update to latest main code

c6b79c3

Run black

f98c6e9

mohsenht added 2 commits March 11, 2026 13:24

PloverDB url updated to point to CI

298aee5

Merge remote-tracking branch 'origin/arax-pathfinder' into arax-pathfinder

17d4187

PRUNE more

70758cb
