This repository contains a Dockerfile for running Scispacy in a containerized environment. Scispacy is a Python package for biomedical and clinical natural language processing (NLP) that provides pre-trained models for various tasks.
To run it, you can use the prepackaged Docker image available on GHCR:

    docker run -p "8000:8000" ghcr.io/nanth-uw/dockerized-scispacy:latest

Then open http://localhost:8000/docs in your browser to view the Swagger UI for the API.
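If you start the container from a script, it can help to wait until the docs page responds before sending any requests. The following is a minimal sketch using only the Python standard library; the helper name `wait_for_docs` and its timeout values are illustrative and not part of this repository.

```python
import time
import urllib.error
import urllib.request


def wait_for_docs(url="http://localhost:8000/docs", timeout=30.0, interval=1.0):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds elapse.

    Hypothetical helper: the name and defaults are assumptions for illustration.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # the server is not accepting connections yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_docs()` right after `docker run` returns `True` once the Swagger UI is reachable, or `False` if the container has not come up within the timeout.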
See the client example for how to use the API. The client is a simple Python script that sends a request to the Dockerized API after reading in generated notes using pandas. The API will return the processed notes with the relevant information extracted.
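As a rough illustration of that flow, here is a dependency-free sketch that uses the standard library's `csv` and `urllib` modules in place of pandas. It is not the repository's client: the endpoint path `/process`, the `{"texts": [...]}` payload shape, and the `note_text` column name are all assumptions, so check the Swagger UI for the real route and schema.

```python
import csv
import json
import urllib.request

# Assumptions (not confirmed by the repository): the endpoint path, the JSON
# payload shape, and the CSV column name below are hypothetical.
API_URL = "http://localhost:8000/process"
NOTE_COLUMN = "note_text"


def load_notes(csv_path, column=NOTE_COLUMN):
    """Read one text column from a CSV file into a list of strings."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [row[column] for row in csv.DictReader(f)]


def build_request(notes, url=API_URL):
    """Build a JSON POST request carrying the notes (payload shape is assumed)."""
    body = json.dumps({"texts": notes}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def process_notes(notes, url=API_URL, timeout=60):
    """Send the notes to the running container and return the decoded JSON reply."""
    with urllib.request.urlopen(build_request(notes, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the container running, something like `process_notes(load_notes("data/fake_biomedical_notes.csv"))` would send the generated notes in one request, provided the assumed route and payload match the actual API.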
NOTE: To use this example as-is, you first need to generate the fake notes with the generator script. This creates a CSV file with 100 fake notes, saved as fake_biomedical_notes.csv in the data/ directory. To change the number of notes generated, adjust the num_notes parameter in the script. You can then use this file as input to the client example.
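For orientation, a generator along these lines might look like the sketch below. Only the `num_notes` parameter, the `data` directory, and the `fake_biomedical_notes.csv` file name come from the description above; the column names, the vocabulary lists, and the function names are hypothetical, and the actual generator script may work quite differently.

```python
import csv
import random
from pathlib import Path

# Hypothetical vocabulary and column names, chosen only for illustration.
CONDITIONS = ["hypertension", "type 2 diabetes", "asthma"]
DRUGS = ["lisinopril", "metformin", "albuterol"]
FIELDNAMES = ["note_id", "note_text"]


def generate_notes(num_notes=100, seed=0):
    """Return `num_notes` rows of templated fake clinical text."""
    rng = random.Random(seed)  # seeded so repeated runs give the same notes
    rows = []
    for i in range(num_notes):
        text = (
            f"Patient presents with {rng.choice(CONDITIONS)}. "
            f"Started on {rng.choice(DRUGS)}."
        )
        rows.append({"note_id": i, "note_text": text})
    return rows


def write_notes(rows, out_dir="data", filename="fake_biomedical_notes.csv"):
    """Write the rows to `out_dir/filename`, creating the directory if needed."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    target = out_path / filename
    with open(target, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
    return target
```

Calling `write_notes(generate_notes(num_notes=100))` writes the file to `data/fake_biomedical_notes.csv`; passing a different `num_notes` changes how many rows are produced.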
All of this is expected to be run using uv :)