Part of ToxMCP Suite -> https://github.com/ToxMCP/toxmcp
Public MCP endpoint for EPA Computational Toxicology (CompTox) evidence federation. Expose chemical identity, hazard, exposure, bioactivity, metadata, screening-prioritization summaries, contract-manifest discovery, and cross-suite handoff builders to any MCP-aware agent (Codex CLI, Gemini CLI, Claude Code, etc.).
```mermaid
flowchart LR
    subgraph Clients["Clients and Agents"]
        Codex["Codex CLI / Desktop"]
        Gemini["Gemini CLI"]
        Claude["Claude Code"]
        Scripts["Scripts / notebooks"]
    end
    subgraph API["FastAPI MCP Service"]
        Router["HTTP + WebSocket entrypoints\n/healthz, /readyz, /mcp, /mcp/ws"]
        Registry["Tool registry\ninputSchema + outputSchema"]
        Tools["Tool handlers\nretrieval, validation, handoff"]
    end
    subgraph Evidence["Tier-0 Evidence and Federation Layer"]
        Chemical["Chemical identity"]
        Hazard["Hazard datasets"]
        Exposure["Exposure + HTTK"]
        Bioactivity["Bioactivity + AOP link-outs"]
        Metadata["Model cards + applicability"]
        Prioritization["Screening prioritization\nAED / exposure signals"]
        Manifest["Contract manifest\nresources / tools / schemas"]
        Interop["Portable evidence packs\nAOP / PBPK handoff builders"]
    end
    subgraph Contracts["Contract and Artifact Layer"]
        McpSchemas["MCP response schemas\n/docs/contracts/schemas"]
        Portable["Portable object schemas\n/schemas"]
        Tests["Catalog, schema, and handoff tests"]
    end
    subgraph Upstream["Upstream Sources"]
        CTX["EPA CTX APIs"]
        Bundles["Packaged metadata bundles"]
    end
    Clients --> Router
    Router --> Registry
    Registry --> Tools
    Tools --> Chemical
    Tools --> Hazard
    Tools --> Exposure
    Tools --> Bioactivity
    Tools --> Metadata
    Tools --> Prioritization
    Tools --> Manifest
    Tools --> Interop
    Chemical --> CTX
    Hazard --> CTX
    Exposure --> CTX
    Bioactivity --> CTX
    Metadata --> Bundles
    Tools --> McpSchemas
    Interop --> Portable
    McpSchemas --> Tests
    Portable --> Tests
```
The current implementation follows a layered model:
- FastAPI + JSON-RPC expose `/mcp` and `/mcp/ws`, with `/healthz` and `/readyz` kept separate from domain logic.
- Retrieval resources own CompTox-native evidence access for chemical, hazard, exposure, bioactivity, cheminformatics, and metadata.
- Screening prioritization stays separate from interop builders and emits explicitly caveated AED/exposure prioritization summaries instead of final risk decisions.
- Contract manifest publishes the live public catalog plus schema inventory in machine-readable form for downstream MCP consumers.
- Interop tools package portable evidence objects for downstream MCP consumers without cloning AOP OECD semantics or PBPK execution semantics.
- Contract layers are split intentionally: `docs/contracts/schemas/` for MCP response wrappers, `schemas/` for cross-suite portable evidence objects.
- Regression gates keep README, live discovery, published schemas, and AOP/PBPK handoff fixtures aligned before release.
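The router, registry, and handler layering can be illustrated with a minimal standard-library sketch. It is not the server's actual code: `Tool`, `REGISTRY`, `handle_request`, and the `echo_identifier` tool are hypothetical names chosen for illustration, and the real service is a FastAPI app whose internals may differ.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Tool:
    """Hypothetical registry entry: a handler plus its declared schemas."""
    name: str
    input_schema: Dict[str, Any]
    output_schema: Dict[str, Any]
    handler: Callable[[Dict[str, Any]], Dict[str, Any]]


def echo_identifier(arguments: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder handler; a real tool would query an upstream source.
    return {"identifier": arguments["identifier"]}


REGISTRY: Dict[str, Tool] = {
    "echo_identifier": Tool(
        name="echo_identifier",
        input_schema={"type": "object", "required": ["identifier"]},
        output_schema={"type": "object", "required": ["identifier"]},
        handler=echo_identifier,
    )
}


def handle_request(request: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch one JSON-RPC 2.0 request against the registry."""
    req_id = request.get("id")
    method = request.get("method")
    params = request.get("params") or {}

    if method == "tools/list":
        tools = [
            {"name": t.name, "inputSchema": t.input_schema, "outputSchema": t.output_schema}
            for t in REGISTRY.values()
        ]
        return {"jsonrpc": "2.0", "id": req_id, "result": {"tools": tools}}

    if method == "tools/call":
        tool = REGISTRY.get(params.get("name"))
        if tool is None:
            # -32602 is the JSON-RPC "invalid params" code.
            return {"jsonrpc": "2.0", "id": req_id,
                    "error": {"code": -32602, "message": "unknown tool"}}
        data = tool.handler(params.get("arguments") or {})
        return {"jsonrpc": "2.0", "id": req_id,
                "result": {"structuredContent": {"data": data}}}

    # -32601 is the JSON-RPC "method not found" code.
    return {"jsonrpc": "2.0", "id": req_id,
            "error": {"code": -32601, "message": "method not found"}}
```

Keeping the registry as plain data is what lets discovery (`tools/list`) and execution (`tools/call`) share one source of truth for schemas.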
This release focuses on audit hardening, privacy controls, provenance capture, and workflow governance in response to the ToxMCP internal audit review. No public MCP boundary changes were made.
- Deterministic audit hashing: every audit event emitted to a registered sink now carries tamper-evident metadata (`contentHash`, `previousHash`, `sequence`, `timestamp`) and can be verified with `audit.verify_event_hash()`.
- Sensitive identifier scrubbing: audit logs no longer record raw DTXSID, CASRN, SMILES, InChI, or InChIKey values in plaintext. Identifiers are hashed with a deterministic salt so the same value maps to the same hash across events.
- Privacy-aware parameter logging: `MCPServer._scrub_params_for_audit()` inspects tool parameters before audit emission and automatically hashes chemical identifiers and identifier-like query strings.
- Response hash capture: `BaseResource` now records a SHA-256 hash of every serialized upstream response alongside `retrieved_at` and `retry_count` via `get_last_provenance()`.
- Bundle chain integrity: `AuditBundleStore.save()` links each persisted bundle to the previous bundle hash. `verify_chain()` detects tampering, checksum mismatches, or missing files.
- Distributed trace propagation: HTTP transport extracts or generates a W3C-style `traceId` from the `traceparent` header and passes it through tool execution audit events.
- Runtime provenance envelope: every orchestrator bundle now includes a `provenance` section with `serverVersion`, `runtimeEnvironment`, `traceId`, `createdAt`, and `upstreamProvenance`.
- AD hard-gating by default: `GenRAOrchestrator.run_workflow()` now defaults `require_ad_clearance` to `True` when predictive tasks are present. Callers who explicitly pass `requireAdClearance=False` remain unaffected.
- Clearer denied vs error semantics: hard applicability-domain failures now set bundle `status` to `"denied"`, while generic predictive errors continue to map to `"error"`.
- Advisory review checkpoints: every bundle now includes `reviewCheckpoints` metadata (`chemical_id_confirmation`, `ad_assessment`, `final_report`) to seed future pause/approve UX without breaking synchronous callers.
- Added `test_audit_hardening.py`, `test_audit_privacy.py`, `test_provenance_capture.py`, `test_trace_propagation.py`, `test_bundle_provenance.py`, and `test_orchestrator_ad_gating.py` to cover the new controls end-to-end.
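The hash-chaining and identifier-scrubbing ideas in these notes can be sketched with the standard library alone. This is an illustration of the general technique, not the repository's implementation: `hash_identifier`, `append_event`, `verify_event_chain`, the all-zero genesis hash, and the canonical-JSON encoding are assumptions, and only the field names `contentHash`, `previousHash`, `sequence`, and `timestamp` come from the notes above.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

GENESIS_HASH = "0" * 64  # assumed placeholder for the first event's previousHash


def hash_identifier(value: str, salt: str) -> str:
    """Deterministic salted hash: the same value and salt always give the same digest."""
    return hashlib.sha256(f"{salt}:{value}".encode("utf-8")).hexdigest()


def _content_hash(event: Dict[str, Any]) -> str:
    # Hash every field except contentHash itself, using canonical JSON.
    body = {k: v for k, v in event.items() if k != "contentHash"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_event(chain: List[Dict[str, Any]], payload: Dict[str, Any],
                 timestamp: Optional[str] = None) -> Dict[str, Any]:
    """Append an event whose previousHash points at the prior event's contentHash."""
    event: Dict[str, Any] = {
        "payload": payload,
        "sequence": len(chain),
        "previousHash": chain[-1]["contentHash"] if chain else GENESIS_HASH,
        "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
    }
    event["contentHash"] = _content_hash(event)
    chain.append(event)
    return event


def verify_event_chain(chain: List[Dict[str, Any]]) -> bool:
    """Return True only if sequence numbers, links, and content hashes all agree."""
    expected_previous = GENESIS_HASH
    for index, event in enumerate(chain):
        if event["sequence"] != index:
            return False
        if event["previousHash"] != expected_previous:
            return False
        if event["contentHash"] != _content_hash(event):
            return False
        expected_previous = event["contentHash"]
    return True
```

Because each event's hash covers its `previousHash`, editing any earlier event invalidates every later link, which is what makes the log tamper-evident rather than merely checksummed.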
- Published a clean `v0.2.2` patch-release layer over the already-shipped `0.2.1` public-surface hardening work, without changing the default MCP boundary.
- Aligned README, architecture notes, changelog, and release metadata around the current patch version so downstream users see one coherent release story.
- Kept the protected-branch release path CI-clean by removing broken docs-link assumptions and normalizing formatting/import ordering across the touched Python and test files.
- Left the public server role unchanged: CompTox MCP remains an evidence-federation and screening-prioritization surface, while predictive and orchestrator modules stay experimental.
See the full release notes in docs/releases/v0.2.2_release_description.md.
The portable CompTox handoff objects are now published as machine-readable JSON Schemas under schemas/, with matching examples under schemas/examples/.
Published object family:
- `schemas/chemicalIdentityRecord.v1.json`
- `schemas/hazardEvidenceSummary.v1.json`
- `schemas/exposureEvidenceSummary.v1.json`
- `schemas/bioactivityEvidenceSummary.v1.json`
- `schemas/aopLinkageSummary.v1.json`
- `schemas/pbpkContextBundle.v1.json`
- `schemas/comptoxEvidencePack.v1.json`
Design intent:
- keep the stable core fields required and allow additive convenience fields
- keep AOP OECD normalization outside CompTox MCP
- keep PBPK execution, qualification, and internal exposure objects outside CompTox MCP
- make the portable evidence layer consumable by downstream validators and orchestrators without scraping examples out of tests
See schemas/README.md, tests/test_portable_schemas.py, and tests/test_cross_suite_handoffs.py for the maintainer gates that keep published objects aligned with live payload generation.
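The "required core, additive extras" design intent can be illustrated with a small check. The sketch below is not a JSON Schema validator and does not reproduce the published schemas: `EXAMPLE_SCHEMA` and its field names (`dtxsid`, `preferredName`) are hypothetical stand-ins, and a full validator library would be the better choice for real use.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for a published schema; the real field names live in
# schemas/chemicalIdentityRecord.v1.json and are not reproduced here.
EXAMPLE_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "required": ["dtxsid", "preferredName"],
    "additionalProperties": True,
}


def missing_required_fields(schema: Dict[str, Any], instance: Dict[str, Any]) -> List[str]:
    """Return top-level names listed in `required` but absent from the instance."""
    return [name for name in schema.get("required", []) if name not in instance]


def accepts(schema: Dict[str, Any], instance: Dict[str, Any]) -> bool:
    """Check only the top-level required/additionalProperties contract."""
    if missing_required_fields(schema, instance):
        return False
    if schema.get("additionalProperties", True) is False:
        # Closed schema: every key must be declared under `properties`.
        return set(instance) <= set(schema.get("properties", {}))
    return True
```

Under this contract a record carrying an extra convenience field still passes, while one that drops a core field does not.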
Regulatory and research teams rely on the CompTox API for high-quality chemical, exposure, and hazard data. Traditional workflows involve bespoke scripts or manual dashboard exports that are hard to share with AI copilots.
The EPA CompTox MCP server wraps those workflows in a secure, programmable interface:
- One MCP surface (`/mcp` HTTP + `/mcp/ws` WebSocket) delivers discovery and execution across chemical, bioactivity, exposure, hazard, metadata, interop, and supporting utility catalogues.
- Screening prioritization adds a separate, caveated signal-ranking path built from CompTox AED and exposure sources without claiming final NGRA decisions.
- Contract manifest discovery exposes the live public resources, tools, and schema inventory so downstream MCPs do not need to scrape docs to integrate safely.
- Evidence federation role – CompTox acts as the suite's source-grounded evidence ingress layer for downstream AOP, PBPK, O-QT, and orchestration workflows.
- Guardrails + provenance – JSON Schema validation, metadata attachments, transport audit hooks, and signed release attestations improve downstream reproducibility.
- Agent friendly – tested with Codex CLI, Gemini CLI, and Claude (see integration guide).
Experimental predictive and orchestrator components still exist in this repository, but they are not part of the default public MCP tool catalog exposed by the server today.
| Capability | Description |
|---|---|
| 🌐 Dual MCP Transports | JSON-RPC over HTTP (/mcp) and WebSocket (/mcp/ws) with identical tool catalogues. |
| 🧬 CompTox Tooling | Chemical, bioactivity, exposure, hazard, metadata, and supporting utility helpers mapped to structured MCP tools. |
| 🔗 Evidence Federation | Designed as the suite's Tier-0 evidence ingress layer, packaging source-grounded CompTox outputs for downstream consumers. |
| 🛡️ Guardrail Enforcement | JSON Schema response validation, metadata attachments, audit hooks, and transport safety controls improve reproducibility. |
| ⚙️ Configurable by Design | Pydantic settings with .env support for API keys, retries, auth bypass, transport tuning, and observability. |
| 🤖 Agent Ready | Verified with Codex CLI, Gemini CLI, and Claude Code; includes quick-start config snippets. |
- Architecture
- Published schemas
- Quick start
- Release verification
- Configuration
- Tool catalog
- Running the server
- Integrating with coding agents
- Output artifacts
- Security checklist
- Current limitations
- Development notes
- Contributing
- Security policy
- Support
- Code of conduct
- Citation
- Roadmap
- License
```bash
# 1) install
git clone https://github.com/ToxMCP/comptox-mcp.git
cd comptox-mcp
pip install -e .

# 2) configure
cp .env.example .env
# set CTX_API_KEY in .env

# 3) run
uvicorn epacomp_tox.transport.websocket:app --host 0.0.0.0 --port 8000 --reload

# 4) verify
curl -s http://localhost:8000/healthz | jq .
curl -s http://localhost:8000/mcp \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | jq '.result.tools | length'
```

The same setup without the verification step:

```bash
git clone https://github.com/ToxMCP/comptox-mcp.git
cd comptox-mcp
pip install -e .
cp .env.example .env
uvicorn epacomp_tox.transport.websocket:app --reload
```

Important: The server needs a valid EPA CompTox API key. Set `CTX_API_KEY` (preferred) or `EPA_COMPTOX_API_KEY` in `.env` before starting the transport.
With the server running, MCP clients can connect to http://localhost:8000/mcp (HTTP) or ws://localhost:8000/mcp/ws (WebSocket).
Once the server is running:
- HTTP MCP endpoint: `http://localhost:8000/mcp`
- WebSocket MCP endpoint: `ws://localhost:8000/mcp/ws`
- Health check: `http://localhost:8000/healthz`
- Readiness check: `http://localhost:8000/readyz`
- Architecture docs: `docs/architecture_overview.md`
- Contract docs: `docs/contracts/README.md`
- Release verification guide: `docs/releases/release_artifact_verification.md`
Once the server is running:
```bash
# health
curl -s http://localhost:8000/healthz | jq .

# list MCP tools
curl -s http://localhost:8000/mcp \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | jq '.result.tools | length'

# live interop smoke
python scripts/mcp_interop_smoke.py --endpoint http://localhost:8000/mcp --json

# release-oriented smoke
python scripts/release_smoke.py --endpoint http://localhost:8000/mcp --json
```

For published GitHub releases, signed provenance/SBOM attestation verification is documented in `docs/releases/release_artifact_verification.md`.
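For scripts and notebooks, the same `tools/list` check can be made from Python. The following is a minimal standard-library sketch; `build_request`, `post_jsonrpc`, and `count_tools` are hypothetical helper names, and the sketch assumes the local endpoint shown above with auth bypassed or not required.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_request(method: str, params: Optional[Dict[str, Any]] = None,
                  request_id: int = 1) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request body."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params or {}}


def post_jsonrpc(endpoint: str, payload: Dict[str, Any], timeout: float = 30.0) -> Dict[str, Any]:
    """POST the payload as JSON and decode the JSON reply."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def count_tools(endpoint: str = "http://localhost:8000/mcp") -> int:
    """Mirror the curl + jq check: number of tools in the tools/list reply."""
    reply = post_jsonrpc(endpoint, build_request("tools/list"))
    return len(reply["result"]["tools"])
```

With the server running, `count_tools()` should report the same number as the `jq '.result.tools | length'` pipeline.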
Settings are resolved via pydantic-settings with .env/.env.local support. Key environment variables:
| Variable | Required | Default | Description |
|---|---|---|---|
| `CTX_API_KEY` | ✅ | – | CompTox API key used for all downstream requests. Fallbacks: `EPA_COMPTOX_API_KEY`, `ctx_x_api_key`. |
| `CTX_API_BASE_URL` | Optional | `https://comptox.epa.gov/ctx-api` | Base URL for CompTox API. |
| `CTX_USE_LEGACY` | Optional | `0` | Set to `1` to use the legacy `https://api-ccte.epa.gov` endpoint. |
| `CTX_RETRY_ATTEMPTS` | Optional | `3` | Number of retry attempts for transient errors. |
| `CTX_RETRY_BASE` | Optional | `0.5` | Base sleep (seconds) used in exponential backoff. |
| `ENVIRONMENT` | Optional | `development` | Controls defaults like permissive CORS. |
| `LOG_LEVEL` | Optional | `INFO` | Application log level. |
| `BYPASS_AUTH` | Optional | `0` | Set to `1` to disable auth (development only). |
| `CORS_ALLOW_ORIGINS` | Optional | – | Comma-separated origins for HTTP transport. Defaults to `*` in development. |
| `EPACOMP_MCP_HEARTBEAT_TIMEOUT_SECONDS` | Optional | `120` | Minimum heartbeat timeout negotiated with WebSocket clients. |
| `EPACOMP_MCP_HANDSHAKE_TIMEOUT_SECONDS` | Optional | `30` | Minimum handshake timeout negotiated with WebSocket clients. |
| `EPACOMP_MCP_METRICS_ENABLED` | Optional | `1` | Toggle `/metrics` endpoint exposure. |
See docs/deployment.md for production hardening tips and expanded configuration.
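To make the key-fallback and retry settings concrete, here is a small sketch. It is an assumption-laden illustration, not the project's settings code: `resolve_api_key` and `backoff_delays` are hypothetical helpers, treating `ctx_x_api_key` as an environment variable is a guess, and the doubling factor is assumed since the table only says "exponential backoff".

```python
import os
from typing import List, Mapping, Optional

# Lookup order taken from the table above: preferred name first, then fallbacks.
API_KEY_VARS = ("CTX_API_KEY", "EPA_COMPTOX_API_KEY", "ctx_x_api_key")


def resolve_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty API key found, or None if none is set."""
    for name in API_KEY_VARS:
        value = env.get(name)
        if value:
            return value
    return None


def backoff_delays(attempts: int = 3, base: float = 0.5) -> List[float]:
    """Sleep durations before each retry, assuming the delay doubles every attempt."""
    return [base * (2 ** attempt) for attempt in range(attempts)]
```

With the documented defaults (`CTX_RETRY_ATTEMPTS=3`, `CTX_RETRY_BASE=0.5`) and the doubling assumption, `backoff_delays()` yields `[0.5, 1.0, 2.0]` seconds.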
| Category | Highlight tools | Notes |
|---|---|---|
| Chemical discovery | `search_chemical`, `batch_search_chemical`, `resolve_chemical_identifier`, `get_chemical_details` | Resolve identifiers deterministically, inspect ambiguous matches, and fetch structures/details with CTX retry/backoff baked in. |
| Bioactivity & AOP link-outs | `search_bioactivity_terms`, `get_bioactivity_summary_by_dtxsid`, `get_bioactivity_aop` | Surface ToxCast/Tox21 summaries, assay metadata, and AOP crosswalks from CompTox bioactivity APIs. |
| Exposure & hazard | `search_cpdat`, `search_httk`, `search_hazard`, `get_hazard_toxval` | Batch-normalized access to CTX exposure datasets plus granular hazard endpoints (ToxValDB, ToxRefDB, cancer, genetox, ADME/IVIVE, IRIS, PPRTV, HAWC). |
| Screening prioritization | `prioritize_risk_signals` | Build an explicitly caveated screening-priority summary from AED, SEEM, HTTK, MMDB, and CPDat signals without presenting it as a regulatory risk decision. |
| Contract manifest | `get_contract_manifest` | Publish a machine-readable inventory of the live public resources, tools, MCP response schemas, portable schemas, and boundary notes. |
| Metadata & governance | `metadata_get_model_card`, `metadata_list_applicability_domain`, `metadata_get_applicability_domain` | Fetch model cards, applicability-domain policies, and explicit guardrail metadata describing documented vs locally enforced criteria. |
| Interop handoff builders | `assemble_comptox_evidence_pack`, `build_aop_linkage_summary`, `build_pbpk_context_bundle` | Package portable evidence objects and downstream-ready handoff summaries for AOP and PBPK MCP consumers without duplicating their semantics. |
| Utility helpers | `opsin_convert_name`, `indigo_convert_molfile` | Provide supporting conversions for downstream automations. |
The default server currently registers ten public resources: chemical, bioactivity, exposure, hazard, chemical list, cheminformatics, metadata, interop, prioritization, and manifest. Full schema definitions (input and output) are returned via the MCP tools/list call. See tests/test_resources.py for examples of exercising each category.
The repository also contains predictive and orchestrator code under src/epacomp_tox/predictive/ and src/epacomp_tox/orchestrator/. Treat those modules as experimental until they are registered in the default server, documented as part of the canonical tool catalog, and backed by stable public response contracts.
```bash
# install and start the dual-transport server
pip install -e .
uvicorn epacomp_tox.transport.websocket:app --host 0.0.0.0 --port 8000 --reload
```

The FastAPI app exposes both transports:

- HTTP JSON-RPC: `http://localhost:8000/mcp`
- WebSocket JSON-RPC: `ws://localhost:8000/mcp/ws`
Quick handshake + tool discovery via HTTP:
```bash
curl -s http://localhost:8000/mcp \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"capabilities":{}}}'

curl -s http://localhost:8000/mcp \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":2,"method":"tools/list"}' | jq '.result.tools | length'
```

Validate the hazard suite once transports are online:

```bash
# Bisphenol A toxval summary (expect a 40 mg/kg-day NOEL among the records)
curl -s http://localhost:8000/mcp \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"search_hazard","arguments":{"data_type":"toxval","dtxsid":"DTXSID7020182","summary":true}}}' | jq '.result.structuredContent.data[0]'

# Perfluorooctanoic acid cancer classification (expect CalEPA and IARC calls)
curl -s http://localhost:8000/mcp \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":4,"method":"tools/call","params":{"name":"search_hazard","arguments":{"data_type":"cancer","dtxsid":"DTXSID8031865","summary":true}}}' | jq '.result.structuredContent.data'
```

Bisphenol A should return HESS and HPVIS toxicity values (including the 40 mg/kg-day NOEL), while Perfluorooctanoic acid surfaces the ATSDR MRL alongside CalEPA and IARC cancer classifications. Errors typically indicate missing API credentials or upstream CompTox outages; inspect the returned metadata for rate-limit status when troubleshooting.
Before exposing the MCP server, run the endpoint checker to verify the upstream CompTox APIs are reachable:
```bash
python scripts/check_endpoints.py
# add --json for machine-readable output
```

The script pings each endpoint listed in `docs/contracts/endpoint-matrix.md` and reports latency plus HTTP status. Provide `CTX_API_KEY`/`EPA_COMPTOX_API_KEY` in the environment to avoid 401/403 responses.
A scheduled GitHub Action (.github/workflows/endpoint-check.yml) runs python scripts/check_endpoints.py --json every day at 06:00 UTC using the CTX_API_KEY secret. The workflow uploads endpoint_status.json as an artifact so operators can review upstream availability without rerunning the checker locally. Maintainers can also trigger the workflow for a specific pull request by applying the run-endpoint-check label (the job only executes for internal branches so secrets stay protected).
- Run via Gunicorn: `gunicorn epacomp_tox.transport.websocket:app -c deploy/gunicorn_conf.py`
- Container image: see `deploy/Dockerfile` for a hardened, non-root runtime.
- Probes: `/healthz` (liveness) and `/readyz` (performs CTX connectivity check). Non-200 responses should trigger restarts.
- Metrics: `/metrics` exposes Prometheus gauges derived from `MCPServer.get_transport_metrics()`. Sample scrape/OTEL configs live in `deploy/prometheus_scrape.yaml` and `deploy/otel_collector_metrics.yaml`.
- Additional rollout guidance (TLS, ingress, scaling) lives in `docs/deployment.md`.
The repository includes step-by-step instructions in docs/integration_guides/mcp_integration.md. Highlights:
- Codex CLI: add an HTTP provider pointing to `http://localhost:8000/mcp` with the `Authorization: Bearer <token>` header when auth is enabled.
- Gemini CLI: configure the provider transport to `http` with the same endpoint and optional headers.
- Claude Code / Cursor: update the MCP provider JSON to point to the HTTP endpoint; WebSocket is optional when streaming events are required.
Each guide covers tool listing, sample calls, binary payload handling, and troubleshooting tips (timeouts, auth failures, unexpected 4xx responses).
Every successful tool invocation returns structured payloads designed for agents:
- `content`: human-readable JSON wrapped as text for chat surfaces.
- `structuredContent.data`: machine-readable results (lists, dicts, or arrays) for programmatic chaining.
- `structuredContent.metadata`: when available, includes rate-limit information, validation metadata, and session metadata.
- Default registered tools are retrieval and federation oriented; experimental predictive/orchestrator modules in this repository are not part of the canonical public surface yet.
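A client chaining tool calls mostly needs `structuredContent.data` and, when present, `structuredContent.metadata`. The helper below is a minimal sketch of that unpacking; `unpack_tool_result` is a hypothetical name, and raising `RuntimeError` on a JSON-RPC `error` member is one possible convention rather than anything the server prescribes.

```python
from typing import Any, Dict, Tuple


def unpack_tool_result(reply: Dict[str, Any]) -> Tuple[Any, Dict[str, Any]]:
    """Return (data, metadata) from a tools/call reply, raising on JSON-RPC errors."""
    if "error" in reply:
        err = reply["error"]
        raise RuntimeError(f"JSON-RPC error {err.get('code')}: {err.get('message')}")
    structured = reply["result"].get("structuredContent", {})
    # metadata is optional, so fall back to an empty dict for easier chaining.
    return structured.get("data"), structured.get("metadata") or {}
```

Returning an empty dict for absent metadata lets callers write `metadata.get(...)` without first checking whether the server attached any.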
- Disable `BYPASS_AUTH` and front the MCP server with OAuth/OIDC once deployed beyond local development.
- Restrict `CORS_ALLOW_ORIGINS` to approved hosts when exposing the HTTP transport.
- Rotate `CTX_API_KEY` regularly and store secrets outside the repository (e.g. cloud secret manager or OS keychain).
- Monitor `/metrics` for negotiated capability changes and unexpected spikes in `tools/call` failures.
- Enable HTTPS/TLS at the ingress or reverse proxy layer.
- Keep GitHub branch protection, dependency review, and CodeQL scanning enabled on the canonical repository.
- Pin GitHub Actions workflows to immutable commit SHAs and update them intentionally during maintenance windows.
- Generate and retain a CycloneDX SBOM for release artifacts so downstream consumers can audit package composition.
- Publish signed provenance and SBOM attestations for release artifacts so consumers can verify what was built and released.
- Follow coordinated vulnerability disclosure guidance in `SECURITY.md`.
```text
┌────────────────┐       ┌────────────────────────────┐       ┌──────────────────────┐
│ MCP Client     │  MCP  │ FastAPI App                │  MCP  │ CompTox Resources    │
│ (CLI / IDE)    │──────▶│ HTTP (/mcp) & WS (/mcp/ws) │──────▶│ • chemical           │
└────────────────┘       │ • tool registry            │       │ • bioactivity        │
        │                │ • JSON-RPC dispatch        │       │ • exposure / hazard  │
        ▼                │ • response validation      │       │ • metadata / interop │
                         └────────────────────────────┘       │ • utility catalogs   │
                                                              └──────────────────────┘
```
- Applicability-domain definitions, policy defaults, and remediation steps live under `metadata/` with JSON Schema validation.
- Response contracts live under `docs/contracts/schemas/` (see `docs/contracts/README.md`) and are enforced before MCP responses are returned; upstream failover policies are summarized in `docs/contracts/endpoint-matrix.md`.
- Experimental predictive/orchestrator modules remain in-repo design and implementation assets; they are not part of the default public tool catalog until explicitly registered and documented.
- `tests/test_mcp_conformance_suite.py` covers handshake, catalog discovery, and streaming behaviours.
- `tests/test_tool_contracts.py` enforces output schema declarations for the registered resources.
- `scripts/smoke_ctx.sh` runs integration smoke tests against the live CTX API.
- `scripts/mcp_http_smoke.sh` performs a quick JSON-RPC handshake and tool listing against the HTTP transport.
- `scripts/mcp_interop_smoke.py` validates the public interop tool path end-to-end over the HTTP transport.
- `scripts/release_smoke.py` exercises authenticated readiness, manifest discovery, deterministic identifier resolution, screening prioritization, interop builders, and WebSocket parity in one release-oriented pass.
- `.github/workflows/live-interop-smoke.yml` runs the interop smoke path in GitHub Actions on demand or on a weekly schedule when `CTX_API_KEY` is configured.
- Documentation builds (`scripts/build_docs.sh`) and CI workflows keep diagrams and links healthy.
- Experimental predictive/orchestrator suites remain valuable internal regression coverage, but they should not be presented as canonical public-surface checks.
- Completed: `v0.2.3` release cleanup (audit hardening, privacy controls, provenance capture, and workflow governance).
- Completed: `v0.2.2` release cleanup
- Completed: `v0.2.1` stabilization and the matching `v0.2.1` release description
- `v0.2.4` should focus on CI/release automation (promote `scripts/release_smoke.py` and live interop capture into repeatable GitHub Actions), documentation polish around deterministic identifier usage, and expanding screening prioritization signals without overstating risk semantics.
- Revisit predictive/orchestrator publication only after the default server, contracts, and docs all agree.
- Predictive and orchestrator code still exists in-repo, but it is not part of the default public MCP tool catalog.
- CompTox MCP publishes AOP linkage summaries, but OECD-style mechanistic normalization still belongs in `aop-mcp`.
- CompTox MCP publishes PBPK context bundles, but PBPK execution, qualification, uncertainty synthesis, and internal exposure objects still belong in `pbpk-mcp`.
- BER logic, stop/continue/refine policy, and final NGRA decisions remain out of scope for this server.
- Live evidence retrieval still depends on upstream CTX availability and API credentials.
See CONTRIBUTING.md for development workflow, coding standards, and PR expectations.
See SECURITY.md for coordinated disclosure guidance and supported reporting channels.
See SUPPORT.md for public support, bug-reporting, and non-security guidance.
See CODE_OF_CONDUCT.md for collaboration expectations across the project and suite.
If you use this project in research or derived tooling, please cite:
- Ivo Djidrovski. BioRxiv preprint. DOI / link: 10.64898/2026.02.06.703989v1
This project is licensed under the Apache License 2.0. See LICENSE for details.
- EPA's Center for Computational Toxicology and Exposure (CCTE)
- The ctx-python project for the official CompTox Python bindings
- The Model Context Protocol community for defining the automation surface we target