diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 00000000..3cce0fb5 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,91 @@ +# CLAUDE.md + +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. + +## Project Overview + +NameRes (Name Resolver) is a biomedical entity name resolution service that maps lexical strings to CURIEs from vocabularies/ontologies. It is part of the NCATS Translator ecosystem. The API is built with FastAPI and backed by Apache Solr. + +## Commands + +### Running Tests +```bash +# Load test data into Solr first (requires Solr running) +./data-loading/setup-and-load-solr.sh tests/data/test-synonyms.json + +# Run all tests +python -m pytest tests/ + +# Run a single test +python -m pytest tests/test_service.py::test_function_name +``` + +### Running Locally +```bash +# Start full stack (Solr + API) +docker-compose up + +# Run API directly (requires Solr already running) +uvicorn api.server:app --host 0.0.0.0 --port 2433 +``` + +### Data Loading +```bash +# Download synonyms, start Solr, load data (production) +cd data-loading && make all + +# Individual steps +make start-solr-backup +make check-solr-backup +make stop-solr +``` + +### Dependencies +```bash +pip install -r requirements.txt +``` + +## Architecture + +### Request Flow +1. Client sends query to FastAPI endpoint on port 2433 +2. `api/server.py` constructs Solr query parameters +3. `httpx` async client queries the Solr instance (default: `localhost:8983`) +4. Results are scored, normalized, and returned as JSON + +### Key Files +- `api/server.py` - Core FastAPI application (~717 lines): all endpoints, Pydantic models, Solr query construction, environment config +- `api/apidocs.py` - Custom OpenAPI schema construction +- `api/resources/.openapi.yml` - OpenAPI 3.0.2 spec with service metadata +- `main.py` / `main.sh` - WSGI/ASGI entry points (port 2433) +- `tests/test_service.py` - Integration tests using FastAPI `TestClient` +- `tests/data/test-synonyms.json` - Test dataset for Solr + +### Environment Variables +- `SOLR_HOST` / `SOLR_PORT` - Solr connection (default: `localhost:8983`) +- `LOGLEVEL` - Logging level +- `SERVER_ROOT` - API root path prefix +- `MATURITY_VALUE` / `LOCATION_VALUE` - TRAPI metadata fields + +### API Endpoints +- `GET/POST /lookup` - Primary name-to-CURIE lookup with scoring +- `POST /bulk-lookup` - Batch queries via `NameResQuery` model +- `GET /reverse-lookup` - CURIE-to-names lookup +- `POST /synonyms` - Get synonyms for a list of CURIEs +- `POST /lookup-curies` - Filter existing CURIEs with type subsetting +- `GET /status` - Health check with Solr document counts + +### Data Model +Solr documents contain: `curie`, `preferred_name`, `names` (synonym list), and biolink type information. Lookup results are `LookupResult` objects with scoring fields. Results are conflated using GeneProtein and DrugChemical conflation rules. + +### Infrastructure +- **Stateless API container** - Python 3.11.5/FastAPI +- **Persistent Solr 9.1** - Data in volume-mounted `./data/solr` +- **Data loading** - Separate pipeline in `data-loading/` (Makefile-driven, also has Kubernetes configs) +- **CI/CD** - GitHub Actions: runs tests on push, publishes Docker image to GitHub Packages on release + +## Documentation +- `documentation/API.md` - Endpoint reference +- `documentation/Deployment.md` - Docker/Kubernetes deployment guide +- `documentation/Scoring.md` - Scoring algorithm details +- `documentation/NameResolution.ipynb` - Interactive usage examples \ No newline at end of file