Small Language Models for Medication NER

This project explores industry best practices for structuring ML projects while building an efficient solution for medication Named Entity Recognition (NER) with relatively small language models (SLMs) such as SpaCy and DistilBERT. It was developed to address the limitations of a previous LLM-based approach:

- Prior Work: A two-stage retrieval RAG system using a 3B-parameter LLM for medication NER.
- Limitations Addressed: High latency due to LLM communication overhead, and significant compute requirements.
- Current Approach: Training smaller, specialized models (SpaCy and DistilBERT) on LLM-generated datasets for:
  - Lower-latency inference
  - Better performance on resource-constrained hardware
  - Production-ready deployment capabilities

Features

Dual model architecture supporting both SpaCy and Transformer-based NER
Experiment tracking and model registry with MLflow
Model serving with BentoML
Containerized deployment with Docker
Config-based ML pipeline architecture for reproducibility
Comprehensive testing suite
Developer-friendly setup with UV package manager and Just command runner
CI/CD with GitHub Actions

Getting Started

Requirements

Python 3.12
UV package manager
Docker and Docker Compose (for MLflow services)
Just command runner (optional but recommended)

Installation

Clone the repository:

git clone https://github.com/JackLeeJM/slm-medication-ner.git
cd slm-medication-ner

Install the just command runner:

The command below assumes a Debian or Ubuntu derivative; for other platforms, see the just GitHub repository for more installation options.
```
sudo apt install just
```
Install project dependencies for development setup:
```
just dev-setup
```
Start MLflow tracking server and PostgreSQL database:
```
just up
```

Quick Start

Docker Hub

Pre-built container images (for CPU workloads) are available on Docker Hub.

Pull images from Docker Hub:

# SpaCy Model
docker pull jackleejm/spacy-medication-ner:v1.0.0

# DistilBERT Model
docker pull jackleejm/distilbert-medication-ner:v1.0.0

Run the Docker images (one at a time, since both commands bind host port 3000):

# SpaCy Model
docker run -it --rm -p 3000:3000 jackleejm/spacy-medication-ner:v1.0.0

# DistilBERT Model
docker run -it --rm -p 3000:3000 jackleejm/distilbert-medication-ner:v1.0.0

Open http://localhost:3000 to access the BentoML service API endpoints for inference.

Example request and response for the "/predict" API endpoint:

Request Body

{
   "texts": [
     "Acetaminophen 325 MG Oral Tablet"
   ]
}

Response Body

[
   [
     {
       "word": "Acetaminophen",
       "entity_group": "DRUG",
       "start": 0,
       "end": 13
     },
     {
       "word": "325 MG",
       "entity_group": "DOSAGE",
       "start": 14,
       "end": 20
     },
     {
       "word": "Oral Tablet",
       "entity_group": "ROUTE",
       "start": 21,
       "end": 32
     }
   ]
]
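
The request and response shapes above can be exercised from Python using only the standard library. This is a minimal sketch, assuming the container from the previous step is listening on http://localhost:3000 and that "/predict" accepts a JSON POST body. The helper names (predict, check_offsets, group_by_label) are illustrative and not part of the project.

```python
import json
from urllib import request

# Assumption: host port taken from the "docker run -p 3000:3000" command above.
BASE_URL = "http://localhost:3000"


def predict(texts, base_url=BASE_URL):
    """POST a batch of texts to /predict and return the decoded JSON response."""
    payload = json.dumps({"texts": texts}).encode("utf-8")
    req = request.Request(
        f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def check_offsets(text, entities):
    """Return True if every entity's start/end offsets slice its 'word' out of text."""
    return all(text[e["start"]:e["end"]] == e["word"] for e in entities)


def group_by_label(entities):
    """Collect entity surface forms under their entity_group label."""
    grouped = {}
    for e in entities:
        grouped.setdefault(e["entity_group"], []).append(e["word"])
    return grouped


# Usage (requires the running container):
#   text = "Acetaminophen 325 MG Oral Tablet"
#   entities = predict([text])[0]   # one list of entities per input text
#   print(check_offsets(text, entities), group_by_label(entities))
```

The response holds one list of entities per input text, so the first element corresponds to the first string in "texts".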

Hugging Face

SpaCy

Assuming the spacy Python package is already installed in your local environment, install the packaged NER model (the leading ! is for notebook cells; omit it in a terminal):

!pip install "en_spacy_medication_ner @ https://huggingface.co/jackleejm/en_spacy_medication_ner/resolve/main/en_spacy_medication_ner-any-py3-none-any.whl"

Load the installed model using either of the following:

# Using spacy.load().
import spacy
nlp = spacy.load("en_spacy_medication_ner")

# Importing as module.
import en_spacy_medication_ner
nlp = en_spacy_medication_ner.load()

Run inference:

text_inputs = ["Acetaminophen 325 MG Oral Tablet"]
docs = list(nlp.pipe(text_inputs))
predictions = [
  [
    {
      "word": ent.text,
      "entity_group": ent.label_,
      "start": ent.start_char,
      "end": ent.end_char,
    }
    for ent in doc.ents
  ]
  for doc in docs
]

DistilBERT

Load the model from the Hugging Face repository:

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_name = "jackleejm/distilbert-medication-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

Set up a pipeline with standard parameters:

ner_pipeline = pipeline(
  task="token-classification",
  model=model,
  tokenizer=tokenizer,
  aggregation_strategy="simple",
  device_map="auto",
)

Run inference:

text_inputs = ["Acetaminophen 325 MG Oral Tablet"]
predictions = ner_pipeline(text_inputs)
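
With aggregation_strategy="simple", each prediction from the Transformers pipeline is a dict with the keys entity_group, score, word, start, and end; the score is typically a NumPy float, and word is rebuilt from tokens, so it can differ in casing or spacing from the input. The sketch below shows one way to bring that output into the same shape as the "/predict" response shown earlier. The helper name to_response and the min_score parameter are assumptions for illustration; whether the project's BentoML service post-processes predictions this way is not confirmed.

```python
def to_response(texts, raw_predictions, min_score=0.0):
    """Convert pipeline output (one list of dicts per text) into plain, JSON-ready dicts.

    min_score is a hypothetical confidence threshold; 0.0 keeps every entity.
    """
    results = []
    for text, entities in zip(texts, raw_predictions):
        cleaned = []
        for ent in entities:
            if float(ent["score"]) < min_score:
                continue  # drop low-confidence spans
            start, end = int(ent["start"]), int(ent["end"])
            cleaned.append(
                {
                    # Slice the original text so casing and spacing match the input.
                    "word": text[start:end],
                    "entity_group": ent["entity_group"],
                    "start": start,
                    "end": end,
                }
            )
        results.append(cleaned)
    return results
```

For example, to_response(text_inputs, predictions) returns one list of entity dicts per input string, with the score removed.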

Usage

Model Training

Train models using the provided scripts:

# Train SpaCy model
just train-spacy

# Train DistilBERT model
just train-distilbert

# Train both models sequentially
just train-all

Model Evaluation

Evaluate trained models:

# Evaluate SpaCy model
just evaluate-spacy

# Evaluate DistilBERT model
just evaluate-distilbert

# Evaluate both models
just evaluate-all

Model Registration

Register trained models with MLflow registry:

# Register SpaCy model
just register-spacy

# Register DistilBERT model
just register-distilbert

# Register both models
just register-all

Deployment

Deploy models locally using BentoML:

# Deploy SpaCy model locally
just deploy-spacy-local

# Deploy DistilBERT model for CPU locally
just deploy-distilbert-cpu-local

# Deploy DistilBERT model for GPU locally
just deploy-distilbert-gpu-local

For production deployment, BentoML services can be containerized with the model artifacts baked in and deployed to your infrastructure of choice:

# Build and containerize production-ready SpaCy model for CPU deployment
just containerize-spacy <REGISTERED_MLFLOW_MODEL_NAME> <MLFLOW_TRACKING_SERVER_URI>

# Build and containerize production-ready DistilBERT model for CPU deployment
just containerize-distilbert-cpu <REGISTERED_MLFLOW_MODEL_NAME> <MLFLOW_TRACKING_SERVER_URI>

# Build and containerize production-ready DistilBERT model for GPU deployment
just containerize-distilbert-gpu <REGISTERED_MLFLOW_MODEL_NAME> <MLFLOW_TRACKING_SERVER_URI>

Performance

Evaluation Summary

Important

Key Highlights:

- DistilBERT slightly outperforms SpaCy in overall F1 (0.99086 vs. 0.99055); per entity, F1 differs only on DRUG.
- SpaCy is significantly faster for CPU inference (1.606 s vs. 81.179 s for 10,000 samples).
- The optimized GPU DistilBERT setup narrows the latency gap with SpaCy (7.312 s for 10,000 samples).

Limitations:

- Both models are trained and evaluated on a limited dataset (train n=309, eval n=335).
- The small sample size carries a high risk of overfitting.
- Use these models with caution.

Note

This project is an exploratory effort to better understand industry best practices for NER projects, which is why relatively small datasets were used to keep iteration fast.
Developers looking to adopt this project's methodology are encouraged to expand the dataset for their own use cases.

Quantitative performance (Precision, Recall, F1) for SpaCy and DistilBERT, both overall and per entity.

Overall Entity-Level Metrics

| Model | Precision | Recall | F1-Score |
|---|---|---|---|
| SpaCy | 0.99876 | 0.98262 | 0.99055 |
| DistilBERT | 0.99877 | 0.98324 | 0.99086 |

Entity-Level Breakdown

SpaCy

| Entity Type | Precision | Recall | F1-Score |
|---|---|---|---|
| DRUG | 0.99381 | 0.99074 | 0.99227 |
| DOSAGE | 1 | 0.99692 | 0.99846 |
| ROUTE | 1 | 0.99688 | 0.99844 |
| BRAND | 1 | 0.95454 | 0.97674 |
| QUANTITY | 1 | 0.97402 | 0.98684 |

DistilBERT

| Entity Type | Precision | Recall | F1-Score |
|---|---|---|---|
| DRUG | 0.99382 | 0.99382 | 0.99382 |
| DOSAGE | 1 | 0.99692 | 0.99846 |
| ROUTE | 1 | 0.99688 | 0.99844 |
| BRAND | 1 | 0.95454 | 0.97674 |
| QUANTITY | 1 | 0.97403 | 0.98684 |
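
As a sanity check on the tables above, the sketch below recomputes each per-entity F1 as the harmonic mean of precision and recall, and compares the overall rows with an unweighted (macro) average of the per-entity rows. The agreement to within rounding suggests, but does not confirm, that the overall figures are macro-averaged; the evaluation code remains the authority on that. All names in the snippet are illustrative.

```python
# Per-entity (precision, recall, f1) values copied from the tables above.
SPACY = {
    "DRUG": (0.99381, 0.99074, 0.99227),
    "DOSAGE": (1.0, 0.99692, 0.99846),
    "ROUTE": (1.0, 0.99688, 0.99844),
    "BRAND": (1.0, 0.95454, 0.97674),
    "QUANTITY": (1.0, 0.97402, 0.98684),
}
DISTILBERT = {
    "DRUG": (0.99382, 0.99382, 0.99382),
    "DOSAGE": (1.0, 0.99692, 0.99846),
    "ROUTE": (1.0, 0.99688, 0.99844),
    "BRAND": (1.0, 0.95454, 0.97674),
    "QUANTITY": (1.0, 0.97403, 0.98684),
}
# Overall (precision, recall, f1) rows copied from the overall table.
OVERALL = {
    "SpaCy": (0.99876, 0.98262, 0.99055),
    "DistilBERT": (0.99877, 0.98324, 0.99086),
}
TOLERANCE = 2e-5  # allows for rounding to five decimal places in the tables


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def macro_average(rows):
    """Unweighted mean of each metric column across entity types."""
    n = len(rows)
    return tuple(sum(row[i] for row in rows.values()) / n for i in range(3))


def max_abs_diff(a, b):
    """Largest absolute difference between two equally long tuples."""
    return max(abs(x - y) for x, y in zip(a, b))


for name, rows in (("SpaCy", SPACY), ("DistilBERT", DISTILBERT)):
    f1_ok = all(abs(f1(p, r) - f) < TOLERANCE for p, r, f in rows.values())
    macro_ok = max_abs_diff(macro_average(rows), OVERALL[name]) < TOLERANCE
    print(f"{name}: per-entity F1 consistent={f1_ok}, overall matches macro average={macro_ok}")
```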

Inference Benchmark

Note

DistilBERT (GPU) Optimized refers to inference with a batch_size of 2000 and half precision (float16).

Raw timing and throughput results across CPU and GPU setups, evaluated on a simulated dataset of 10,000 examples.

Latency (Seconds per Batch Size)

| Model | n=10 | n=100 | n=1000 | n=10000 |
|---|---|---|---|---|
| SpaCy (CPU) | 0.004 | 0.021 | 0.184 | 1.606 |
| DistilBERT (CPU) | 0.117 | 0.838 | 7.894 | 81.179 |
| DistilBERT (GPU) | 0.063 | 0.288 | 3.101 | 28.832 |
| DistilBERT (GPU) Optimized | 0.032 | 0.096 | 0.77 | 7.312 |

Throughput (Samples per Second)

| Model | n=10 | n=100 | n=1000 | n=10000 |
|---|---|---|---|---|
| SpaCy (CPU) | 2657 | 4858 | 5432 | 6225 |
| DistilBERT (CPU) | 85 | 119 | 126 | 123 |
| DistilBERT (GPU) | 157 | 346 | 322 | 346 |
| DistilBERT (GPU) Optimized | 313 | 1036 | 1297 | 1367 |
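
The two tables are related by throughput = samples / latency. The sketch below recomputes the n=10000 throughput column from the latency table above and derives how much slower each setup is than SpaCy on CPU. Small differences from the reported throughput are expected, presumably because the latencies are rounded to three decimals. The function names are illustrative.

```python
# Seconds to process 10,000 samples, copied from the latency table above.
LATENCY_10K = {
    "SpaCy (CPU)": 1.606,
    "DistilBERT (CPU)": 81.179,
    "DistilBERT (GPU)": 28.832,
    "DistilBERT (GPU) Optimized": 7.312,
}


def throughput(n_samples, seconds):
    """Samples processed per second."""
    return n_samples / seconds


def slowdown_vs(baseline, latencies):
    """Latency of each setup expressed as a multiple of the baseline's latency."""
    base = latencies[baseline]
    return {name: secs / base for name, secs in latencies.items()}


ratios = slowdown_vs("SpaCy (CPU)", LATENCY_10K)
for name, secs in LATENCY_10K.items():
    print(f"{name}: {throughput(10_000, secs):.0f} samples/s, {ratios[name]:.1f}x SpaCy latency")
```

On these figures, half precision with a batch size of 2000 cuts DistilBERT GPU latency by roughly a factor of four (28.832 s to 7.312 s), which still leaves it several times slower than SpaCy on CPU.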

Project Structure

slm-medication-ner/
├── configs/                  # Configuration files for models and services
│   └── experiments/          # Model-specific configuration
├── data/                     # Data files
│   ├── processed/            # Processed datasets
│   └── raw/                  # Raw training and evaluation data
├── deployment/               # BentoML service definitions
├── docker/                   # Docker configurations for services
├── models/                   # Directory for saving trained models
├── notebooks/                # Jupyter notebooks for exploration
├── scripts/                  # Entrypoint scripts for training/evaluation/registration
├── src/                      # Source code
│   ├── core/                 # Core functionality
│   ├── data/                 # Data processing
│   ├── models/               # Model implementations
│   ├── registry/             # Model registry functionality
│   ├── training/             # Training and evaluation
│   └── utils/                # Utility functions
└── tests/                    # Unit tests

Tech Stack

SpaCy
Transformers
MLflow
BentoML
Docker
Python

System Architecture

The codebase is flexible and modular: components plug into their respective factories for custom modifications, while the dedicated entrypoint scripts share a unified interface for running specific pipelines.

Config-Based ML Pipeline

The project follows a configuration-first approach where all model parameters and experiment settings are defined in YAML files:

configs/experiments/spacy.yaml: Configuration for SpaCy model
configs/experiments/distilbert.yaml: Configuration for DistilBERT model
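
To illustrate the configuration-first idea, the sketch below validates a parsed config into a typed object before any training code runs. Every key, default, and value here (model_type, base_model, epochs, learning_rate, and the example base model name) is hypothetical; the actual schema lives in the YAML files above. In practice the dictionary would come from a YAML parser such as yaml.safe_load from PyYAML; a literal dict stands in here so the example has no third-party dependencies.

```python
from dataclasses import dataclass, fields

SUPPORTED_MODEL_TYPES = {"spacy", "distilbert"}


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical experiment settings; not the project's real schema."""

    model_type: str
    base_model: str
    epochs: int = 10
    learning_rate: float = 5e-5

    @classmethod
    def from_dict(cls, raw):
        """Build a config from a parsed mapping, rejecting unknown keys early."""
        known = {f.name for f in fields(cls)}
        unknown = set(raw) - known
        if unknown:
            raise ValueError(f"Unknown config keys: {sorted(unknown)}")
        config = cls(**raw)
        if config.model_type not in SUPPORTED_MODEL_TYPES:
            raise ValueError(f"Unsupported model_type: {config.model_type!r}")
        return config


# Stand-in for the result of parsing a YAML experiment file.
raw = {"model_type": "distilbert", "base_model": "distilbert-base-uncased", "epochs": 3}
config = ExperimentConfig.from_dict(raw)
print(config)
```

Failing fast on unknown keys is one way to catch typos in a config file before a long training run starts.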

Factory Pattern Implementation

The codebase utilizes the Factory design pattern to instantiate the appropriate components based on the model type:

ModelFactory: Creates model instances (SpaCy or DistilBERT)
DatasetFactory: Creates dataset objects compatible with the selected model
TrainerFactory: Creates model-specific training workflows
EvaluatorFactory: Creates appropriate evaluation strategies
ModelRegistryFactory: Creates model registration workflows
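
A minimal sketch of how such a factory might dispatch on model type is shown here. The registry decorator, the class names SpacyTrainer and DistilBertTrainer, and the create signature are assumptions for illustration; only the factory names in the list above come from the project.

```python
from abc import ABC, abstractmethod


class Trainer(ABC):
    """Common interface that every model-specific trainer implements."""

    @abstractmethod
    def train(self) -> str:
        ...


class TrainerFactory:
    """Maps a model-type string to the trainer class registered for it."""

    _registry = {}

    @classmethod
    def register(cls, model_type):
        """Class decorator that records a trainer under the given model type."""
        def decorator(trainer_cls):
            cls._registry[model_type] = trainer_cls
            return trainer_cls
        return decorator

    @classmethod
    def create(cls, model_type, **kwargs):
        """Instantiate the trainer registered for model_type."""
        try:
            trainer_cls = cls._registry[model_type]
        except KeyError:
            raise ValueError(f"No trainer registered for {model_type!r}") from None
        return trainer_cls(**kwargs)


@TrainerFactory.register("spacy")
class SpacyTrainer(Trainer):
    def train(self) -> str:
        return "training spacy pipeline"  # placeholder for the real workflow


@TrainerFactory.register("distilbert")
class DistilBertTrainer(Trainer):
    def train(self) -> str:
        return "fine-tuning distilbert"  # placeholder for the real workflow


trainer = TrainerFactory.create("distilbert")
print(trainer.train())
```

With this shape, an entrypoint script only needs the model type from the config, and supporting a new model means registering one more class rather than editing the script.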

Experiment Tracking

MLflow is integrated for comprehensive experiment tracking:

Parameters: Model hyperparameters and configuration
Metrics: Performance metrics from training and evaluation
Artifacts: Trained models and supplementary files
Model Registry: Versioning and stage transitions for models

Model Serving

BentoML provides production-ready API endpoints for model inference:

Standardized API contract
Containerized deployment
Optimized inference performance
Health monitoring and metrics

Development

Running Tests

Run the test suite:

just test

Code Quality

Maintain code quality:

# Run linter
just lint

# Format code
just format

# Run all quality checks and tests
just dev-all

License

See the LICENSE file for details.
