Skip to content

feat(security): implement strict upstream validation for URIs and pagination to mitigate SPARQL injection#37

Merged
wiresio merged 9 commits intoeclipse-thingweb:mainfrom
kaiprodevops:security/upstream-input-validation
Mar 31, 2026
Merged

feat(security): implement strict upstream validation for URIs and pagination to mitigate SPARQL injection#37
wiresio merged 9 commits intoeclipse-thingweb:mainfrom
kaiprodevops:security/upstream-input-validation

Conversation

@kaiprodevops
Copy link
Copy Markdown
Contributor

@kaiprodevops kaiprodevops commented Mar 28, 2026

Overview

As part of an ongoing architectural review to enhance TD Directories with MCP-driven capabilities, I conducted a security audit of the TDD API. While this audit was motivated by the need to build robust guardrails for future AI agents (which can unpredictably hallucinate malformed parameters), the vulnerabilities discovered are critical for the existing REST API security.

This PR implements a Zero-Trust upstream validation layer, effectively mitigating SPARQL Injection across multiple attack vectors, and resolves a sophisticated streaming architecture bypass.

Vulnerabilities Discovered

During Red Team penetration testing, I identified two critical injection vectors:

  1. URI & Identifier Injection: Unsanitized URIs were being directly interpolated into SPARQL templates. An attacker could craft a URI with structural characters (e.g., > } ;) to break the Abstract Syntax Tree (AST) and execute unauthorized administrative commands like DROP GRAPH.
  2. Pagination Injection & Generator Bypass: The sort_order parameter was vulnerable to injection. More critically, I discovered an architectural flaw: because the GET /things endpoint uses a Python streaming generator (yield), validation occurring inside the database layer was executed after Flask had sent the initial HTTP headers. This bypassed the global error handler entirely, causing the WSGI server to crash mid-stream and leak raw HTML 500 Internal Server Error traces.
  3. Race Condition in Concurrent TD Retrieval: The get_paginated_tds() function used ThreadPoolExecutor to fetch TDs concurrently but failed to wait for all tasks to complete before returning results.
  4. UTF-8 Encoding Issue on Windows: The subprocess calling Node.js for JSON-LD framing used the system default encoding (cp1252 on Windows), causing UnicodeDecodeError when processing TDs with international characters. This resulted in silent failures for TDs containing UTF-8 characters.

The Fix: "Shift-Left" Validation + Concurrency Hardening

To address this without disrupting the core business logic, I implemented the following:

  • Centralized Sanitization (tdd/validators.py): Created a dedicated validation module using strict RFC 3986-compliant Regex for URIs and explicit allowlists (ASC/DESC) for pagination.
  • Controller-Level Interception: Shifted the pagination validation upstream directly into the routing layer (tdd/__init__.py). By validating parameters before the generator is instantiated, we completely eliminated the lazy evaluation bypass.
  • Structured Error Handling (tdd/errors.py): To ensure robust and consistent error reporting, I implemented a dedicated SecurityValidationErrorclass, wired the new validators to trigger this specific exception, ensuring that malicious inputs are elegantly caught and converted into structured JSON-LD 400 Bad Request responses, maintaining the API's contract consistency.
  • Concurrency Fix (tdd/td.py): Added explicit task completion waiting using concurrent.futures.as_completed() to ensure all concurrent TD retrieval tasks finish before returning results. This maintains the parallel execution performance while guaranteeing data integrity.
  • Encoding Fix (tdd/common.py): Added explicit encoding='utf-8' parameter to the subprocess call, ensuring consistent UTF-8 handling across all platforms, particularly Windows.

Red Team Test Results (Proof of Concept)

I tested the endpoints using a custom-crafted AST breakout payload(CONSTRUCT { ?s ?p ?o } WHERE { GRAPH urn:test { ?s ?p ?o } } ; DROP SILENT GRAPH ; #> { ?s ?p ?o } }):
urn:test%3E%20%7B%20%3Fs%20%3Fp%20%3Fo%20%7D%20%7D%20%3B%20DROP%20SILENT%20GRAPH%20%3CALL%3E%20%3B%20%23

  • Before the Fix: The payload successfully reached the database, crashing the Fuseki parser and returning an unhandled 500 error (or crashing the stream for pagination).
  • After the Fix: The API gateway intercepts the structural characters immediately, returning a safe, structured 400 Bad Request.

[Before]
Screenshot 2026-03-27 234139

[After]
1
2

@kaiprodevops
Copy link
Copy Markdown
Contributor Author

Hi @wiresio ,
I’ve just submitted this PR to implement a strict "Shift-Left" upstream validation layer for the TDD API.

A major driver for this architectural update is laying the groundwork for Enhancing TD Directories with MCP-driven Capabilities. Since future LLM agents interacting via MCP can unpredictably hallucinate parameters or inject structural characters, it was critical to secure the API boundary. This upstream validation ensures that malformed tool calls are intercepted before reaching the SPARQL engine or Python generators, returning a structured 400 Bad Request to enable LLM self-correction rather than causing a 500 Internal Server Error crash.

Additionally, this PR introduces a workaround for the RDFlib Blank Node (BNode) pagination data loss and fixes the Windows UTF-8 subprocess encoding issue.

I would be incredibly grateful for your feedback whenever you have a chance to look at this. Please let me know if anything needs to be changed. Thanks again for maintaining such an amazing project!

@wiresio
Copy link
Copy Markdown
Member

wiresio commented Mar 30, 2026

Thanks @kaiprodevops! Please allow me some time to have a close look.

@wiresio
Copy link
Copy Markdown
Member

wiresio commented Mar 30, 2026

Hi @kaiprodevops, here is my Claude powered feedback:

The changes are directionally correct and address real SPARQL injection risks. The approach (allowlist at the trust boundary before interpolation into SPARQL templates) is sound. However, there are several issues worth raising:

IMHO it is great to prepare the API before making use of it in an MCP server!

Issues found

  1. No tests for the new validators (high priority)
    validators.py is entirely untested. Given this is security-critical code, unit tests should cover:
  • Valid URIs passing through validate_uri
  • URIs containing <, >, {, }, spaces being rejected
  • validate_sort_order with "asc", "ASC", "Desc", empty string, and invalid values
  • validate_uris with mixed valid/invalid lists
  1. Malicious input echoed in error messages and logs
    In validators.py:28:
logger.warning(f"SECURITY ALERT: Malformed or unsafe URI blocked: {uri}")
raise SecurityValidationError(f"Malformed or unsafe URI detected: {uri}")
  • Log injection: A crafted URI containing \n can corrupt log entries.
  • Information leakage: The raw (attacker-controlled) URI is included in the HTTP 400 response body, enabling attackers to probe the allowlist efficiently.
    Recommendation: log a sanitized/truncated/repr of the URI, and return a generic error message without the input.
  1. Double validate_sort_order call
    validate_sort_order is called in the route handler in tdd/init.py, and then again inside get_paginated_tds in td.py:355. The second call is redundant. Additionally, init.py calls .lower() on the already-uppercase result of validate_sort_order, which is inconsistent (though harmless functionally).

  2. validate_uri applied to database-sourced URIs
    In sparql.py:249 (delete_named_graph) and td.py:271 (delete_graphs), validate_uri is called on URIs that were retrieved from the SPARQL store — not from user input. If any legitimate stored graph URI contains a character outside the allowlist (e.g. a fragment # combined with percent-encoding anomalies, or a URN), these operations will break silently with a 400 rather than propagating a meaningful error. The validate_uri guard should be applied at the external trust boundary (request parameters, TD payload), not on round-tripped database values.

  3. int() cast in get_paginated_tds is redundant
    td.py:359-360's safe_limit = int(limit) / safe_offset = int(offset) are already done in init.py before calling get_paginated_tds. The redundancy is harmless but adds noise. If the intent is to protect all callers, a ValueError here would surface as an unhandled 500 rather than a clean 400.

  4. Thread-safety of all_tds list (pre-existing, but not fixed)
    In td.py:384, send_request appends to all_tds from multiple threads without a lock. The PR adds correct exception propagation via task.result() (a genuine fix), but the shared mutation is still present. In CPython this works due to the GIL, but it's not guaranteed behavior. Consider returning results from the futures directly instead of using a shared list.

So, maybe despite 1., only minor changes needed for this PR.

@kaiprodevops
Copy link
Copy Markdown
Contributor Author

Hi @wiresio ,
Thank you so much for taking the time to provide such a detailed and insightful review.
I will implement these fixes and ping you once the updated commits are ready for another look. Thanks again for the excellent guidance!

Signed-off-by: kaiprodev <warmtigerca@gmail.com>
Signed-off-by: kaiprodev <warmtigerca@gmail.com>
Signed-off-by: kaiprodev <warmtigerca@gmail.com>
@kaiprodevops
Copy link
Copy Markdown
Contributor Author

Hi @wiresio,
quick update: all suggested fixes are in and the latest commits are pushed. Appreciate another review when you get a chance.

@wiresio wiresio merged commit 623998e into eclipse-thingweb:main Mar 31, 2026
3 checks passed
@wiresio
Copy link
Copy Markdown
Member

wiresio commented Mar 31, 2026

Thanks @kaiprodevops, looks really good now!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants