perf(core): implement HTTP connection pooling for SPARQL endpoint #36
## Conversation
Replaced the localized `with httpx.Client() as client:` blocks with a global, module-level HTTP client. This eliminates the overhead of performing a new TCP handshake and TLS negotiation for every database query, preparing the API for high-frequency MCP Agent workloads.
Thanks @kaiprodevops for addressing this issue. Connection pooling is a good idea; however, Claude gave me some advice:

What do you think?
…security best practices per review
Thank you @wiresio for the excellent review and constructive feedback. I completely agree with the points raised and have updated the PR accordingly.
I have just pushed the requested changes. Please let me know if everything looks good to merge!
Thanks @kaiprodevops!
## Description
This PR addresses a critical performance bottleneck in the core SPARQL communication layer (`tdd/sparql.py`) by replacing the localized HTTP context managers with a globally pooled HTTP client.

**Changes Included:**
- Replaced the `with httpx.Client() as client:` blocks inside the `query` function with a persistent, module-level `http_client` instance.
- Updated the `TODO` regarding SPARQL injection to clarify that sanitization is the architectural responsibility of upstream input validators, not the HTTP transport layer.

## Motivation & Impact
While this refactoring is a prerequisite for the upcoming Model Context Protocol (MCP) integration—where AI Agents will execute complex reasoning loops involving multiple rapid, sequential queries to the graph database—it also delivers massive immediate benefits to the existing REST API.
Previously, every single CRUD operation via the REST API forced the application to establish a new TCP 3-way handshake and TLS negotiation. By leveraging HTTP Keep-Alive through connection pooling, we eliminate this overhead entirely for subsequent queries, drastically reducing the Time-to-First-Byte (TTFB) and overall system latency for all current users.
## Security & Stability Considerations
To ensure that introducing a global, long-lived HTTP client does not expand our attack surface or introduce instability, the `http_client` has been strictly hardened:

- **Connection limits:** `httpx.Limits(max_keepalive_connections=50, max_connections=100)` ensures high-concurrency spikes do not drain server file descriptors.
- **Timeouts:** `Timeout(10.0, connect=5.0)` prevents stale connections from blocking application worker threads.
- **Environment isolation:** `trust_env=False` prevents potential environment variable pollution (e.g., malicious `HTTP_PROXY` injection) from hijacking internal database traffic.
- **No redirects:** `follow_redirects=False` ensures the client strictly communicates with the known SPARQL endpoint and cannot be tricked into pivoting to other internal services.

## Testing Performed
- **Unit tests:** Ran the full suite (`pytest tdd/tests/`) -> all 41 tests passed successfully.
- **Integration testing:** Ran the local `docker-compose` environment and verified that high-frequency data seeding (via `scripts/import_all_plugfest.py`) and basic REST API CRUD operations execute flawlessly without socket hangs or connection drops.