JackLeeJM/slm-medication-ner

Small Language Models for Medication NER

This project explores industry best practices for structuring ML projects while building an efficient solution for medication Named Entity Recognition (NER) with relatively small language models (SLMs) such as SpaCy and DistilBERT. It was developed to address the limitations of a previous LLM-based approach:

  • Prior Work - A two-stage retrieval RAG system built on a 3B-parameter LLM for medication NER
  • Limitations Addressed - High latency from LLM communication overhead, plus significant compute requirements
  • Current Approach - Training smaller, specialized models (SpaCy and DistilBERT) on LLM-generated datasets for:
    • Lower latency inference
    • Better performance on resource-constrained hardware
    • Production-ready deployment capabilities

Features

  • Dual model architecture supporting both SpaCy and Transformer-based NER
  • Experiment tracking and model registry with MLflow
  • Model serving with BentoML
  • Containerized deployment with Docker
  • Config-based ML pipeline architecture for reproducibility
  • Comprehensive testing suite
  • Developer-friendly setup with UV package manager and Just command runner
  • CI/CD with GitHub Actions

Getting Started

Requirements

  • Python 3.12
  • UV package manager
  • Docker and Docker Compose (for MLflow services)
  • Just command runner (optional but recommended)

Installation

  1. Clone the repository:

    git clone https://github.com/JackLeeJM/slm-medication-ner.git
    cd slm-medication-ner
  2. Install the just command runner:

    On Debian or Ubuntu derivatives, install it with apt. For other platforms, refer to the just GitHub repository for more installation options.

    sudo apt install just
  3. Install project dependencies for development setup:

    just dev-setup
  4. Start the MLflow tracking server and PostgreSQL database:

    just up

Quick Start

Docker Hub

Pre-built container images (for CPU workloads) are available on Docker Hub:

  1. Pull images from Docker Hub:

    # SpaCy Model
    docker pull jackleejm/spacy-medication-ner:v1.0.0
    
    # DistilBERT Model
    docker pull jackleejm/distilbert-medication-ner:v1.0.0
  2. Run the Docker images (one at a time, since both map to port 3000):

    # SpaCy Model
    docker run -it --rm -p 3000:3000 jackleejm/spacy-medication-ner:v1.0.0
    
    # DistilBERT Model
    docker run -it --rm -p 3000:3000 jackleejm/distilbert-medication-ner:v1.0.0
  3. Open http://localhost:3000 to access the BentoML service API endpoints for inference.

    Here is an example of the input and output of the "/predict" API endpoint:

    Request Body

    {
       "texts": [
         "Acetaminophen 325 MG Oral Tablet"
       ]
    }

    Response Body

    [
       [
         {
           "word": "Acetaminophen",
           "entity_group": "DRUG",
           "start": 0,
           "end": 13
         },
         {
           "word": "325 MG",
           "entity_group": "DOSAGE",
           "start": 14,
           "end": 20
         },
         {
           "word": "Oral Tablet",
           "entity_group": "ROUTE",
           "start": 21,
           "end": 32
         }
       ]
    ]
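The request and response shapes above can be exercised with a small client. The following is a minimal sketch using only the Python standard library. It assumes the container from step 2 is listening on http://localhost:3000 and that "/predict" accepts a POST request with a JSON body. The helper names (build_predict_request, predict, check_offsets) are illustrative and not part of this project.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # assumption: container started as in step 2


def build_predict_request(texts, base_url=BASE_URL):
    """Build a POST request for /predict with the documented body shape."""
    body = json.dumps({"texts": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(texts, base_url=BASE_URL, timeout=30):
    """Send the texts to a running service and return the decoded JSON."""
    request = build_predict_request(texts, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def check_offsets(texts, predictions):
    """Return True if every entity's start/end slice reproduces its word."""
    return all(
        text[ent["start"]:ent["end"]] == ent["word"]
        for text, ents in zip(texts, predictions)
        for ent in ents
    )
```

With the service running, predict(["Acetaminophen 325 MG Oral Tablet"]) should return a list with one entry per input text. check_offsets is a quick sanity check that "start" and "end" are character offsets into the input, as they are in the sample response above.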

Hugging Face

SpaCy

  1. With the "spacy" Python package already installed in your local environment, install the packaged NER model (the leading "!" is for notebook cells; omit it in a shell):

    !pip install "en_spacy_medication_ner @ https://huggingface.co/jackleejm/en_spacy_medication_ner/resolve/main/en_spacy_medication_ner-any-py3-none-any.whl"
  2. Load the installed model using either of the following options:

    # Option 1: load by package name with spacy.load().
    import spacy
    nlp = spacy.load("en_spacy_medication_ner")
    
    # Option 2: import the package as a module and call its load().
    import en_spacy_medication_ner
    nlp = en_spacy_medication_ner.load()
  3. Run inference and collect the entities in the same shape as the "/predict" response:

    text_inputs = ["Acetaminophen 325 MG Oral Tablet"]
    docs = list(nlp.pipe(text_inputs))
    predictions = [
      [
        {
          "word": ent.text,
          "entity_group": ent.label_,
          "start": ent.start_char,
          "end": ent.end_char,
        }
        for ent in doc.ents
      ]
      for doc in docs
    ]

DistilBERT

  1. Load the model and tokenizer from the Hugging Face repository:

    from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
    
    model_name = "jackleejm/distilbert-medication-ner"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForTokenClassification.from_pretrained(model_name)
  2. Set up a pipeline with standard parameters:

    from transformers import pipeline
    
    ner_pipeline = pipeline(
      task="token-classification",
      model=model,
      tokenizer=tokenizer,
      aggregation_strategy="simple",
      device_map="auto",
    )
  3. Run inference:

    text_inputs = ["Acetaminophen 325 MG Oral Tablet"]
    predictions = ner_pipeline(text_inputs)
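The pipeline output is not identical to the "/predict" response shown in the Docker Hub section. As an assumption based on the Transformers token-classification pipeline, each entity returned with aggregation_strategy="simple" is a dict with the keys "entity_group", "score", "word", "start" and "end", where "score" is a NumPy float and "word" may be normalized by the tokenizer. The sketch below reduces that output to the four documented fields. The function name to_api_schema is hypothetical and not part of this project.

```python
def to_api_schema(texts, predictions):
    """Reduce pipeline output to word/entity_group/start/end records.

    `predictions` is expected to be a list with one list of entity dicts
    per input text, which is what the pipeline returns for a list input.
    """
    return [
        [
            {
                # Slice the input text so the word keeps its original casing.
                "word": text[ent["start"]:ent["end"]],
                "entity_group": ent["entity_group"],
                # Cast to plain ints so the result is JSON-serializable.
                "start": int(ent["start"]),
                "end": int(ent["end"]),
            }
            for ent in ents
        ]
        for text, ents in zip(texts, predictions)
    ]


# Usage, continuing from step 3:
# records = to_api_schema(text_inputs, predictions)
```

Dropping "score" is a design choice made here only to match the documented response body. Keep it, cast to float, if downstream code needs confidence values.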

Usage

Model Training

Train models using the provided scripts:

# Train SpaCy model
just train-spacy

# Train DistilBERT model
just train-distilbert

# Train both models sequentially
just train-all

Model Evaluation

Evaluate trained models:

# Evaluate SpaCy model
just evaluate-spacy

# Evaluate DistilBERT model
just evaluate-distilbert

# Evaluate both models
just evaluate-all

Model Registration

Register trained models with MLflow registry:

# Register SpaCy model
just register-spacy

# Register DistilBERT model
just register-distilbert

# Register both models
just register-all

Deployment

Deploy models locally using BentoML:

# Deploy SpaCy model locally
just deploy-spacy-local

# Deploy DistilBERT model for CPU locally
just deploy-distilbert-cpu-local

# Deploy DistilBERT model for GPU locally
just deploy-distilbert-gpu-local

For production deployment, BentoML services can be containerized with the model artifacts baked in and deployed to your infrastructure of choice:

# Build and containerize production-ready SpaCy model for CPU deployment
just containerize-spacy <REGISTERED_MLFLOW_MODEL_NAME> <MLFLOW_TRACKING_SERVER_URI>

# Build and containerize production-ready DistilBERT model for CPU deployment
just containerize-distilbert-cpu <REGISTERED_MLFLOW_MODEL_NAME> <MLFLOW_TRACKING_SERVER_URI>

# Build and containerize production-ready DistilBERT model for GPU deployment
just containerize-distilbert-gpu <REGISTERED_MLFLOW_MODEL_NAME> <MLFLOW_TRACKING_SERVER_URI>

Performance

Evaluation Summary

Important

Key Highlights:

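  • DistilBERT slightly outperforms SpaCy in overall F1 score (0.99086 vs. 0.99055); the per-entity F1 scores differ only on DRUG.
  • SpaCy is significantly faster in CPU inference (1.606 s vs. 81.179 s for 10,000 samples).
  • Optimized GPU DistilBERT narrows the latency gap with SpaCy (7.312 s for 10,000 samples).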

Limitations:

  • Both models are trained and evaluated on a limited dataset (309 training examples, 335 evaluation examples).
  • The small sample size carries a high risk of overfitting.
  • Use these models with caution.

Note

  • This project is an exploratory effort to better understand industry best practices for NER projects, hence the relatively small datasets, which keep iteration fast.
  • Developers looking to adopt this project's methodology are encouraged to expand the dataset for their own use cases.

The tables below report quantitative performance (Precision, Recall, F1) for SpaCy and DistilBERT, both overall and per entity type.

Overall Entity-Level Metrics

Model        Precision   Recall    F1-Score
SpaCy        0.99876     0.98262   0.99055
DistilBERT   0.99877     0.98324   0.99086

Entity-Level Breakdown

  • SpaCy

Entity Type   Precision   Recall    F1-Score
DRUG          0.99381     0.99074   0.99227
DOSAGE        1           0.99692   0.99846
ROUTE         1           0.99688   0.99844
BRAND         1           0.95454   0.97674
QUANTITY      1           0.97402   0.98684

  • DistilBERT

Entity Type   Precision   Recall    F1-Score
DRUG          0.99382     0.99382   0.99382
DOSAGE        1           0.99692   0.99846
ROUTE         1           0.99688   0.99844
BRAND         1           0.95454   0.97674
QUANTITY      1           0.97403   0.98684
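The README does not state how the overall row is derived from the per-entity rows. The sketch below recomputes F1 as the harmonic mean of precision and recall and then takes an unweighted (macro) average across entity types. The macro-average interpretation is an assumption inferred from the reported numbers, not a documented detail of the evaluation code.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) per entity type, copied from the SpaCy table above.
SPACY = {
    "DRUG": (0.99381, 0.99074),
    "DOSAGE": (1.0, 0.99692),
    "ROUTE": (1.0, 0.99688),
    "BRAND": (1.0, 0.95454),
    "QUANTITY": (1.0, 0.97402),
}


def macro_average(rows):
    """Unweighted mean of per-entity precision, recall, and F1."""
    n = len(rows)
    precision = sum(p for p, _ in rows.values()) / n
    recall = sum(r for _, r in rows.values()) / n
    f1_score = sum(f1(p, r) for p, r in rows.values()) / n
    return precision, recall, f1_score


precision, recall, f1_score = macro_average(SPACY)
print(f"precision={precision:.5f} recall={recall:.5f} f1={f1_score:.5f}")
```

Rounded to five decimals, the result agrees with the SpaCy row of the overall table (0.99876, 0.98262, 0.99055), which is what suggests macro averaging. It also means BRAND and QUANTITY weigh as much as DRUG in the overall score, regardless of how many examples each entity type has.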

Inference Benchmark

Note

DistilBERT (GPU) Optimized refers to inference with a batch_size of 2000 and float16 half precision.

Raw timing and throughput results across CPU/GPU setups, evaluated on a simulated dataset of 10,000 examples.

Latency (Seconds per Batch Size)

Model                        n=10    n=100   n=1000   n=10000
SpaCy (CPU)                  0.004   0.021   0.184    1.606
DistilBERT (CPU)             0.117   0.838   7.894    81.179
DistilBERT (GPU)             0.063   0.288   3.101    28.832
DistilBERT (GPU) Optimized   0.032   0.096   0.77     7.312

Throughput (Samples per Second)

Model                        n=10   n=100   n=1000   n=10000
SpaCy (CPU)                  2657   4858    5432     6225
DistilBERT (CPU)             85     119     126      123
DistilBERT (GPU)             157    346     322      346
DistilBERT (GPU) Optimized   313    1036    1297     1367
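The two tables are related by throughput = samples / latency. The sketch below recomputes throughput and the latency ratio against SpaCy from the 10,000-sample latency column. The helper names are illustrative. Because the latencies are rounded to three decimals, recomputed throughput can differ from the table by a few samples per second.

```python
# Latency in seconds for 10,000 samples, copied from the latency table.
LATENCY_10K = {
    "SpaCy (CPU)": 1.606,
    "DistilBERT (CPU)": 81.179,
    "DistilBERT (GPU)": 28.832,
    "DistilBERT (GPU) Optimized": 7.312,
}


def throughput(n_samples, seconds):
    """Samples processed per second."""
    return n_samples / seconds


def latency_ratio(baseline, latencies):
    """How many times slower each setup is than the baseline setup."""
    base = latencies[baseline]
    return {name: seconds / base for name, seconds in latencies.items()}


ratios = latency_ratio("SpaCy (CPU)", LATENCY_10K)
for name, seconds in LATENCY_10K.items():
    rate = throughput(10_000, seconds)
    print(f"{name}: {rate:.0f} samples/s, {ratios[name]:.1f}x SpaCy latency")
```

On these figures DistilBERT on CPU takes roughly 50 times as long as SpaCy for 10,000 samples, and the optimized GPU configuration brings that down to between 4 and 5 times.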

Project Structure

slm-medication-ner/
├── configs/                  # Configuration files for models and services
│   └── experiments/          # Model-specific configuration
├── data/                     # Data files
│   ├── processed/            # Processed datasets
│   └── raw/                  # Raw training and evaluation data
├── deployment/               # BentoML service definitions
├── docker/                   # Docker configurations for services
├── models/                   # Directory for saving trained models
├── notebooks/                # Jupyter notebooks for exploration
├── scripts/                  # Entrypoint scripts for training/evaluation/registration
├── src/                      # Source code
│   ├── core/                 # Core functionality
│   ├── data/                 # Data processing
│   ├── models/               # Model implementations
│   ├── registry/             # Model registry functionality
│   ├── training/             # Training and evaluation
│   └── utils/                # Utility functions
└── tests/                    # Unit tests

Tech Stack

  • SpaCy
  • Transformers
  • MLflow
  • BentoML
  • Docker
  • Python

System Architecture

The codebase is modular: components are plugged into their respective factories for custom modifications, while the dedicated entrypoint scripts expose a unified interface for running specific pipelines.

Config-Based ML Pipeline

The project follows a configuration-first approach where all model parameters and experiment settings are defined in YAML files:

  • configs/experiments/spacy.yaml: Configuration for SpaCy model
  • configs/experiments/distilbert.yaml: Configuration for DistilBERT model
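As a minimal sketch of the configuration-first idea, the code below validates a parsed configuration into a typed, immutable object before any pipeline code runs. The keys (model_type, epochs, learning_rate) and the class name ExperimentConfig are hypothetical and do not describe the actual contents of the YAML files above. The dict stands in for what a YAML parser such as PyYAML's yaml.safe_load would return.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ExperimentConfig:
    """Typed view of one experiment file (all field names are hypothetical)."""

    model_type: str               # e.g. "spacy" or "distilbert"
    epochs: int = 10
    learning_rate: float = 5e-5

    @classmethod
    def from_dict(cls, raw):
        """Build a config, rejecting keys the pipeline does not know about."""
        known = {f.name for f in fields(cls)}
        unknown = set(raw) - known
        if unknown:
            raise ValueError(f"Unknown config keys: {sorted(unknown)}")
        return cls(**raw)


# Stand-in for the result of parsing a YAML experiment file.
raw_config = {"model_type": "distilbert", "epochs": 3}
config = ExperimentConfig.from_dict(raw_config)
print(config)
```

Rejecting unknown keys catches typos in a config file early, and freezing the dataclass keeps the settings logged for a run identical to the settings actually used.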

Factory Pattern Implementation

The codebase utilizes the Factory design pattern to instantiate the appropriate components based on the model type:

  • ModelFactory: Creates model instances (SpaCy or DistilBERT)
  • DatasetFactory: Creates dataset objects compatible with the selected model
  • TrainerFactory: Creates model-specific training workflows
  • EvaluatorFactory: Creates appropriate evaluation strategies
  • ModelRegistryFactory: Creates model registration workflows
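The following sketch shows one common way to implement such a factory, using a registry keyed by model type. It is an illustration of the pattern only, not the repository's actual code: the class names SpacyModel and DistilBertModel, the register decorator, and the placeholder predict methods are all assumptions.

```python
from abc import ABC, abstractmethod


class NERModel(ABC):
    """Shared interface that every model type implements."""

    @abstractmethod
    def predict(self, texts):
        """Return one list of entities per input text."""


class ModelFactory:
    """Maps a model type string to the class that implements it."""

    _registry = {}

    @classmethod
    def register(cls, model_type):
        def decorator(model_cls):
            cls._registry[model_type] = model_cls
            return model_cls
        return decorator

    @classmethod
    def create(cls, model_type, **kwargs):
        try:
            model_cls = cls._registry[model_type]
        except KeyError:
            known = sorted(cls._registry)
            raise ValueError(
                f"Unknown model type {model_type!r}; expected one of {known}"
            ) from None
        return model_cls(**kwargs)


@ModelFactory.register("spacy")
class SpacyModel(NERModel):
    def predict(self, texts):
        return [[] for _ in texts]  # placeholder: no real model is loaded


@ModelFactory.register("distilbert")
class DistilBertModel(NERModel):
    def predict(self, texts):
        return [[] for _ in texts]  # placeholder: no real model is loaded


model = ModelFactory.create("spacy")
print(type(model).__name__, model.predict(["Acetaminophen 325 MG Oral Tablet"]))
```

With this layout an entrypoint script only needs the model type from the config to obtain a working object, and adding a new model type means registering one more class rather than editing the scripts.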

Experiment Tracking

MLflow is integrated for comprehensive experiment tracking:

  • Parameters: Model hyperparameters and configuration
  • Metrics: Performance metrics from training and evaluation
  • Artifacts: Trained models and supplementary files
  • Model Registry: Versioning and stage transitions for models

Model Serving

BentoML provides production-ready API endpoints for model inference:

  • Standardized API contract
  • Containerized deployment
  • Optimized inference performance
  • Health monitoring and metrics

Development

Running Tests

Run the test suite:

just test

Code Quality

Maintain code quality:

# Run linter
just lint

# Format code
just format

# Run all quality checks and tests
just dev-all

License

See the LICENSE file for details.
