This project explores industry best practices for structuring ML projects while building an efficient medication Named Entity Recognition (NER) solution with relatively small language models (SLMs) such as SpaCy and DistilBERT. It was developed to address the limitations of a previous LLM-based approach:
- Prior Work - A two-stage retrieval RAG system using a 3B parameter LLM for medication NER
- Limitations Addressed - High latency due to LLM communication overhead and significant compute requirements
- Current Approach - Training smaller, specialized models (SpaCy and DistilBERT) on LLM-generated datasets for:
- Lower latency inference
- Better performance on resource-constrained hardware
- Production-ready deployment capabilities
- Features
- Getting Started
- Quick Start
- Usage
- Performance
- Project Structure
- Tech Stack
- System Architecture
- Development
- License
- Dual model architecture supporting both SpaCy and Transformer-based NER
- Experiment tracking and model registry with MLflow
- Model serving with BentoML
- Containerized deployment with Docker
- Config-based ML pipeline architecture for reproducibility
- Comprehensive testing suite
- Developer-friendly setup with UV package manager and Just command runner
- CI/CD with GitHub Actions
- Python 3.12
- UV package manager
- Docker and Docker Compose (for MLflow services)
- Just command runner (optional but recommended)
- Clone the repository:

  ```bash
  git clone https://github.com/JackLeeJM/slm-medication-ner.git
  cd slm-medication-ner
  ```
- Install the `just` command runner. The command below assumes you are using a Debian or Ubuntu derivative; otherwise, please refer to the `just` GitHub repository for more installation options.

  ```bash
  sudo apt install just
  ```
- Install project dependencies for the development setup:

  ```bash
  just dev-setup
  ```
- Start the MLflow tracking server and PostgreSQL database:

  ```bash
  just up
  ```
Pre-built, containerized images are also available (e.g., for CPU workloads):
- Pull the images from Docker Hub:

  ```bash
  # SpaCy model
  docker pull jackleejm/spacy-medication-ner:v1.0.0

  # DistilBERT model
  docker pull jackleejm/distilbert-medication-ner:v1.0.0
  ```
- Run the Docker images:

  ```bash
  # SpaCy model
  docker run -it --rm -p 3000:3000 jackleejm/spacy-medication-ner:v1.0.0

  # DistilBERT model
  docker run -it --rm -p 3000:3000 jackleejm/distilbert-medication-ner:v1.0.0
  ```
- Go to http://localhost:3000 to access the BentoML service API endpoints for inference. Here is an example of the input/output of the `/predict` API endpoint:

  Request Body

  ```json
  {
    "texts": ["Acetaminophen 325 MG Oral Tablet"]
  }
  ```

  Response Body

  ```json
  [
    [
      { "word": "Acetaminophen", "entity_group": "DRUG", "start": 0, "end": 13 },
      { "word": "325 MG", "entity_group": "DOSAGE", "start": 14, "end": 20 },
      { "word": "Oral Tablet", "entity_group": "ROUTE", "start": 21, "end": 32 }
    ]
  ]
  ```
SpaCy
- Assuming the `spacy` Python package is already installed in your local environment, install the packaged SpaCy NER model:

  ```bash
  pip install "en_spacy_medication_ner @ https://huggingface.co/jackleejm/en_spacy_medication_ner/resolve/main/en_spacy_medication_ner-any-py3-none-any.whl"
  ```
- Load the model (published on the Hugging Face repository) using either approach:

  ```python
  # Option 1: using spacy.load()
  import spacy
  nlp = spacy.load("en_spacy_medication_ner")

  # Option 2: importing the installed package as a module
  import en_spacy_medication_ner
  nlp = en_spacy_medication_ner.load()
  ```
- Run inference:

  ```python
  text_inputs = ["Acetaminophen 325 MG Oral Tablet"]
  docs = list(nlp.pipe(text_inputs))
  predictions = [
      [
          {
              "word": ent.text,
              "entity_group": ent.label_,
              "start": ent.start_char,
              "end": ent.end_char,
          }
          for ent in doc.ents
      ]
      for doc in docs
  ]
  ```
DistilBERT
- Load the model from the Hugging Face repository:

  ```python
  from transformers import AutoTokenizer, AutoModelForTokenClassification

  model_name = "jackleejm/distilbert-medication-ner"
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForTokenClassification.from_pretrained(model_name)
  ```
- Set up a pipeline with standard parameters:

  ```python
  from transformers import pipeline

  ner_pipeline = pipeline(
      task="token-classification",
      model=model,
      tokenizer=tokenizer,
      aggregation_strategy="simple",
      device_map="auto",
  )
  ```
- Run inference:

  ```python
  text_inputs = ["Acetaminophen 325 MG Oral Tablet"]
  predictions = ner_pipeline(text_inputs)
  ```
Train models using the provided scripts:

```bash
# Train SpaCy model
just train-spacy

# Train DistilBERT model
just train-distilbert

# Train both models sequentially
just train-all
```
Evaluate trained models:

```bash
# Evaluate SpaCy model
just evaluate-spacy

# Evaluate DistilBERT model
just evaluate-distilbert

# Evaluate both models
just evaluate-all
```
Register trained models with the MLflow registry:

```bash
# Register SpaCy model
just register-spacy

# Register DistilBERT model
just register-distilbert

# Register both models
just register-all
```
Deploy models locally using BentoML:

```bash
# Deploy SpaCy model locally
just deploy-spacy-local

# Deploy DistilBERT model for CPU locally
just deploy-distilbert-cpu-local

# Deploy DistilBERT model for GPU locally
just deploy-distilbert-gpu-local
```
For production deployment, BentoML services can be containerized with the model artifacts baked in and deployed to your infrastructure of choice:

```bash
# Build and containerize production-ready SpaCy model for CPU deployment
just containerize-spacy <REGISTERED_MLFLOW_MODEL_NAME> <MLFLOW_TRACKING_SERVER_URI>

# Build and containerize production-ready DistilBERT model for CPU deployment
just containerize-distilbert-cpu <REGISTERED_MLFLOW_MODEL_NAME> <MLFLOW_TRACKING_SERVER_URI>

# Build and containerize production-ready DistilBERT model for GPU deployment
just containerize-distilbert-gpu <REGISTERED_MLFLOW_MODEL_NAME> <MLFLOW_TRACKING_SERVER_URI>
```
Important
Key Highlights:
- DistilBERT slightly outperforms SpaCy in F1 score across all entities.
- SpaCy is significantly faster in CPU inference.
- Optimized GPU DistilBERT narrows the latency gap with SpaCy.
Limitations:
- Both models were trained and evaluated on a limited dataset (train n=309, eval n=335).
- The small sample size carries a high risk of overfitting.
- Use these models with caution.
Note
- This project is an exploratory effort to better understand industry best practices for NER projects, hence the relatively small datasets used to keep iteration fast.
- Developers looking to adopt this project's methodologies are encouraged to expand the dataset further for their own use cases.
Quantitative performance (Precision, Recall, F1) for SpaCy and DistilBERT, both overall and per entity.
Overall Entity-Level Metrics
Model | Precision | Recall | F1-Score |
---|---|---|---|
SpaCy | 0.99876 | 0.98262 | 0.99055 |
DistilBERT | 0.99877 | 0.98324 | 0.99086 |
Entity-Level Breakdown
- SpaCy
Entity Type | Precision | Recall | F1-Score |
---|---|---|---|
DRUG | 0.99381 | 0.99074 | 0.99227 |
DOSAGE | 1 | 0.99692 | 0.99846 |
ROUTE | 1 | 0.99688 | 0.99844 |
BRAND | 1 | 0.95454 | 0.97674 |
QUANTITY | 1 | 0.97402 | 0.98684 |
- DistilBERT
Entity Type | Precision | Recall | F1-Score |
---|---|---|---|
DRUG | 0.99382 | 0.99382 | 0.99382 |
DOSAGE | 1 | 0.99692 | 0.99846 |
ROUTE | 1 | 0.99688 | 0.99844 |
BRAND | 1 | 0.95454 | 0.97674 |
QUANTITY | 1 | 0.97403 | 0.98684 |
Note

DistilBERT (GPU) Optimized refers to inference with a `batch_size` of 2000 and half-precision (`float16`).
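As a rough sketch of what that optimized setting could look like with the Hugging Face pipeline from the Usage section (assuming a CUDA-capable GPU; this is illustrative, not the exact benchmarking code):

```python
import torch
from transformers import pipeline

# Sketch of the "Optimized" configuration: half-precision weights plus large batches.
ner_pipeline = pipeline(
    task="token-classification",
    model="jackleejm/distilbert-medication-ner",
    aggregation_strategy="simple",
    device_map="auto",          # place the model on the available GPU
    torch_dtype=torch.float16,  # half-precision inference
)

# Simulated workload; large batches amortize per-call overhead on the GPU.
text_inputs = ["Acetaminophen 325 MG Oral Tablet"] * 10_000
predictions = ner_pipeline(text_inputs, batch_size=2000)
```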
Raw timing and throughput results across CPU/GPU setups, evaluated on a simulated dataset of 10,000 examples.
Latency (Seconds per Batch Size)
Model | n10 | n100 | n1000 | n10000 |
---|---|---|---|---|
SpaCy (CPU) | 0.004 | 0.021 | 0.184 | 1.606 |
DistilBERT (CPU) | 0.117 | 0.838 | 7.894 | 81.179 |
DistilBERT (GPU) | 0.063 | 0.288 | 3.101 | 28.832 |
DistilBERT (GPU) Optimized | 0.032 | 0.096 | 0.77 | 7.312 |
Throughput (Samples per Second)
Model | n10 | n100 | n1000 | n10000 |
---|---|---|---|---|
SpaCy (CPU) | 2657 | 4858 | 5432 | 6225 |
DistilBERT (CPU) | 85 | 119 | 126 | 123 |
DistilBERT (GPU) | 157 | 346 | 322 | 346 |
DistilBERT (GPU) Optimized | 313 | 1036 | 1297 | 1367 |
```text
slm-medication-ner/
├── configs/          # Configuration files for models and services
│   └── experiments/  # Model-specific configuration
├── data/             # Data files
│   ├── processed/    # Processed datasets
│   └── raw/          # Raw training and evaluation data
├── deployment/       # BentoML service definitions
├── docker/           # Docker configurations for services
├── models/           # Directory for saving trained models
├── notebooks/        # Jupyter notebooks for exploration
├── scripts/          # Entrypoint scripts for training/evaluation/registration
├── src/              # Source code
│   ├── core/         # Core functionality
│   ├── data/         # Data processing
│   ├── models/       # Model implementations
│   ├── registry/     # Model registry functionality
│   ├── training/     # Training and evaluation
│   └── utils/        # Utility functions
└── tests/            # Unit tests
```
- SpaCy
- Transformers
- MLflow
- BentoML
- Docker
- Python
The codebase takes a flexible, modular approach: components can be plugged into their respective factories for custom modifications while sharing a unified interface, which the dedicated entrypoint scripts use to run specific pipelines.
The project follows a configuration-first approach where all model parameters and experiment settings are defined in YAML files:

- `configs/experiments/spacy.yaml`: Configuration for the SpaCy model
- `configs/experiments/distilbert.yaml`: Configuration for the DistilBERT model
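As a small illustration, such a config can be read with PyYAML before being handed to the pipeline components (a sketch only; the keys inside the YAML files are defined by the project and not reproduced here):

```python
import yaml

# Minimal sketch: load an experiment config into a plain dict.
# The actual schema of the YAML file is defined by the project's configs.
with open("configs/experiments/distilbert.yaml") as f:
    config = yaml.safe_load(f)

print(sorted(config.keys()))
```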
The codebase utilizes the Factory design pattern to instantiate the appropriate components based on the model type:

- `ModelFactory`: Creates model instances (SpaCy or DistilBERT)
- `DatasetFactory`: Creates dataset objects compatible with the selected model
- `TrainerFactory`: Creates model-specific training workflows
- `EvaluatorFactory`: Creates appropriate evaluation strategies
- `ModelRegistryFactory`: Creates model registration workflows
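As an illustration of the pattern (a sketch only, not the project's actual implementation; the registration mechanism and method signatures shown are hypothetical), a factory can dispatch on the model type declared in the config:

```python
# Illustrative sketch of the factory pattern described above; names and
# signatures are hypothetical and do not mirror the project's actual code.
from typing import Protocol


class NERModel(Protocol):
    def train(self, dataset: list[dict]) -> None: ...
    def predict(self, texts: list[str]) -> list[list[dict]]: ...


class ModelFactory:
    """Creates a model instance based on the model type declared in a config."""

    _registry: dict[str, type] = {}

    @classmethod
    def register(cls, model_type: str):
        """Register a model class under a model-type key."""
        def decorator(model_cls: type) -> type:
            cls._registry[model_type] = model_cls
            return model_cls
        return decorator

    @classmethod
    def create(cls, model_type: str, **kwargs) -> NERModel:
        """Instantiate the model class registered for `model_type`."""
        try:
            return cls._registry[model_type](**kwargs)
        except KeyError:
            raise ValueError(f"Unknown model type: {model_type!r}") from None


@ModelFactory.register("spacy")
class SpacyNERModel:
    def train(self, dataset: list[dict]) -> None:
        ...  # e.g. build and train a spaCy pipeline

    def predict(self, texts: list[str]) -> list[list[dict]]:
        return [[] for _ in texts]  # placeholder


# Entrypoint scripts can then stay model-agnostic:
model = ModelFactory.create("spacy")
```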
MLflow is integrated for comprehensive experiment tracking:
- Parameters: Model hyperparameters and configuration
- Metrics: Performance metrics from training and evaluation
- Artifacts: Trained models and supplementary files
- Model Registry: Versioning and stage transitions for models
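For illustration, experiment tracking with the MLflow client API typically looks like the sketch below; the experiment name, run name, parameter values, and artifact path are hypothetical, and the tracking URI assumes the local server started with `just up` on MLflow's default port:

```python
import mlflow

# Adjust the URI to wherever the tracking server from `just up` is exposed.
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("medication-ner")  # hypothetical experiment name

with mlflow.start_run(run_name="distilbert-baseline"):
    # Parameters: model hyperparameters and configuration (illustrative values)
    mlflow.log_params({"model_type": "distilbert", "epochs": 3, "learning_rate": 5e-5})

    # Metrics: performance metrics from training and evaluation
    mlflow.log_metrics({"precision": 0.99877, "recall": 0.98324, "f1": 0.99086})

    # Artifacts: trained models and supplementary files (path is illustrative)
    mlflow.log_artifact("models/distilbert/config.json")
```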
BentoML provides production-ready API endpoints for model inference:
- Standardized API contract
- Containerized deployment
- Optimized inference performance
- Health monitoring and metrics
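For illustration, a service in BentoML's Python API could look roughly like the sketch below; it assumes BentoML 1.2+ and the Hugging Face DistilBERT model from the Usage section, and it is not the project's actual service definition under `deployment/`:

```python
import bentoml
from transformers import pipeline


@bentoml.service
class MedicationNERService:
    """Sketch of a BentoML service wrapping the DistilBERT NER pipeline."""

    def __init__(self) -> None:
        self.ner = pipeline(
            task="token-classification",
            model="jackleejm/distilbert-medication-ner",
            aggregation_strategy="simple",
        )

    @bentoml.api
    def predict(self, texts: list[str]) -> list[list[dict]]:
        results = self.ner(texts)
        # Cast numpy scores to plain floats so the response serializes cleanly.
        return [
            [{**ent, "score": float(ent["score"])} for ent in ents]
            for ents in results
        ]
```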
Run the test suite:

```bash
just test
```
Maintain code quality:

```bash
# Run linter
just lint

# Format code
just format

# Run all quality checks and tests
just dev-all
```
See the LICENSE file for details.