A script to extract Lung RADS from Radiology Notes using LLM

vikram0230/Lung-RADS--Extractor

Lung RADS Score Extractor

This tool extracts Lung RADS (Lung Imaging Reporting and Data System) scores from medical notes using a large language model that runs locally through Ollama. It reads a CSV file of medical notes and writes a CSV file with the extracted scores.

📋 Prerequisites

Before you begin, make sure you have the following installed:

  1. Python 3.8 or higher - Download Python
  2. Ollama - Download Ollama
  3. Git (if cloning from GitHub) - Download Git

🚀 Step-by-Step Setup Instructions

Step 1: Install Ollama

  1. Go to https://ollama.ai/download
  2. Download Ollama for your operating system (Windows, Mac, or Linux)
  3. Install Ollama by running the installer
  4. Verify installation by opening a terminal/command prompt and running:
    ollama --version

Step 2: Download the Required AI Model

  1. Open a terminal/command prompt

  2. Run the following command to download the recommended model:

    ollama pull llama3.2:3b

    Note: This may take a few minutes depending on your internet connection. The model is approximately 2GB in size.

    Alternative models (if you want to use a different one):

    • ollama pull llama3.1:8b (more accurate, but larger and slower)
    • ollama pull phi3:latest (good alternative)
    • ollama pull mistral:latest (another good option)

Step 3: Clone or Download This Repository

Option A: Using Git (Recommended)

git clone https://github.com/vikram0230/Lung-RADS--Extractor.git
cd Lung-RADS--Extractor

Option B: Download as ZIP

  1. Click the "Code" button on GitHub
  2. Select "Download ZIP"
  3. Extract the ZIP file to a folder
  4. Open a terminal/command prompt in that folder

Step 4: Install Python Dependencies

  1. Open a terminal/command prompt in the project folder

  2. Install the required Python packages:

    pip install -r requirements.txt

    If you encounter permission errors, try:

    pip install --user -r requirements.txt

Step 5: Prepare Your Input File

  1. Create a folder named data in the project directory (if it doesn't exist)

  2. Place your CSV file in the data folder

  3. Important: Your CSV file must have a column named NOTE_CONTENTS containing the medical notes

    Example CSV structure:

    PATIENT_ID,ENCOUNTER_ID,NOTE_DATE,NOTE_CONTENTS
    12345,67890,2022-01-01,"Medical note text here..."
    12346,67891,2022-01-02,"Another medical note..."
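Radiology notes often contain commas, quotes, and line breaks, so it can be safer to generate the input file with a CSV library than to assemble it by hand. The following is a minimal sketch using Python's standard csv module. The function name write_notes_csv, the SAMPLE_ROWS data, and the note text are placeholders for illustration and are not part of this repository.

```python
import csv
from pathlib import Path

# Column order from the example above; only NOTE_CONTENTS is required by the script.
FIELDNAMES = ["PATIENT_ID", "ENCOUNTER_ID", "NOTE_DATE", "NOTE_CONTENTS"]

# Placeholder rows (made-up text, not real patient data).
SAMPLE_ROWS = [
    {
        "PATIENT_ID": "12345",
        "ENCOUNTER_ID": "67890",
        "NOTE_DATE": "2022-01-01",
        "NOTE_CONTENTS": 'Impression: stable nodule, no change.\nLung-RADS category "2".',
    },
    {
        "PATIENT_ID": "12346",
        "ENCOUNTER_ID": "67891",
        "NOTE_DATE": "2022-01-02",
        "NOTE_CONTENTS": "Another medical note...",
    },
]


def write_notes_csv(path, rows):
    """Write rows to path, letting the csv module quote commas, quotes, and newlines."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # creates the data folder if missing
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
    return path
```

Calling write_notes_csv("data/sample_notes.csv", SAMPLE_ROWS) would produce a file that can then be passed to the script with --input.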

Step 6: Run the Script

Basic usage (using default file paths):

python lung_rads_extractor.py

This will:

  • Read from: data/notes_filtered.csv
  • Write to: data/lung_rads_extracted.csv

Custom input/output files:

python lung_rads_extractor.py --input data/my_notes.csv --output data/my_results.csv

Using a different AI model:

python lung_rads_extractor.py --model llama3.1:8b

Get help:

python lung_rads_extractor.py --help

📊 Understanding the Output

The script will create a new CSV file with all the original columns plus a new column called Lung Rad Score. This column will contain:

  • The extracted score (e.g., "4B", "4A", "3", "1") if found
  • Empty/blank if no score was found in the note
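To get a quick overview of a finished run, the output file can be tallied by score. This is a small sketch using only the standard library. It assumes the column is named Lung Rad Score as described above and that blank cells are written as empty strings; summarize_scores is a hypothetical helper, not a function provided by the script.

```python
import csv
from collections import Counter


def summarize_scores(path, column="Lung Rad Score"):
    """Count how often each score appears; empty cells are counted as '(blank)'."""
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            score = (row.get(column) or "").strip()
            counts[score or "(blank)"] += 1
    return counts
```

For example, summarize_scores("data/lung_rads_extracted.csv").most_common() lists the scores from most to least frequent, which makes it easy to see how many notes had no score extracted.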

📝 Log Files

The script automatically creates log files in the logs/lung_rads_extraction/ folder. Each run creates a new log file with a timestamp. These logs contain detailed information about the extraction process.

⚙️ Command-Line Options

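| Option   | Short | Description          | Default                      |
| -------- | ----- | -------------------- | ---------------------------- |
| --input  | -i    | Input CSV file path  | data/notes_filtered.csv      |
| --output | -o    | Output CSV file path | data/lung_rads_extracted.csv |
| --model  | -m    | Ollama model name    | llama3.2:3b                  |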
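For readers who want to wrap or extend the script, the options above map naturally onto Python's argparse module. The sketch below reproduces the documented flags and defaults; it is an illustration and not necessarily how lung_rads_extractor.py defines them. The function name build_parser is an assumption.

```python
import argparse


def build_parser():
    """Build a parser with the three documented options and their defaults."""
    parser = argparse.ArgumentParser(
        description="Extract Lung RADS scores from medical notes in a CSV file."
    )
    parser.add_argument("--input", "-i", default="data/notes_filtered.csv",
                        help="Input CSV file path")
    parser.add_argument("--output", "-o", default="data/lung_rads_extracted.csv",
                        help="Output CSV file path")
    parser.add_argument("--model", "-m", default="llama3.2:3b",
                        help="Ollama model name")
    return parser
```

With this parser, build_parser().parse_args(["-m", "llama3.1:8b"]) keeps the default input and output paths and changes only the model.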

🔄 Progress Saving

The script automatically saves progress every 20 rows, so if the process is interrupted, you won't lose all your work. The output file will contain all processed rows up to the point of interruption.
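The idea behind periodic saving can be sketched as follows. This is a simplified illustration written with the standard library, not the script's actual code: extract is a hypothetical callable standing in for the model call, and the function names are assumptions.

```python
import csv


def save_rows(rows, output_path):
    """Rewrite output_path with every row processed so far."""
    if not rows:
        return
    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def process_with_checkpoints(rows, extract, output_path, every=20):
    """Apply extract() to each note and save the results every `every` rows."""
    done = []
    for count, row in enumerate(rows, start=1):
        result = dict(row)
        # extract() is a hypothetical stand-in for the call to the model.
        result["Lung Rad Score"] = extract(row["NOTE_CONTENTS"])
        done.append(result)
        if count % every == 0:
            save_rows(done, output_path)  # checkpoint
    save_rows(done, output_path)  # final save after the last row
    return done
```

In this sketch, if extract raises partway through, the output file holds the rows written at the most recent checkpoint; the script itself may handle interruptions differently.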

❓ Troubleshooting

Problem: "Cannot connect to Ollama"

Solution: Make sure Ollama is running. Open a terminal and run:

ollama serve

Then run the script again in a different terminal window.
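To confirm that the server is up before starting a long run, a short check from Python can help. The sketch below assumes Ollama is listening on its default local address, http://localhost:11434; adjust base_url if your setup differs. The helper name ollama_is_reachable is hypothetical.

```python
import urllib.error
import urllib.request


def ollama_is_reachable(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers at base_url with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

If ollama_is_reachable() returns False, start the server with ollama serve and try again.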

Problem: "Model 'llama3.2:3b' not found"

Solution: Download the model by running:

ollama pull llama3.2:3b

Problem: "Input file not found"

Solution:

  • Check that your input file exists at the specified path
  • Make sure you're running the script from the correct directory
  • Use the full path: python lung_rads_extractor.py --input /full/path/to/your/file.csv

Problem: "Required column 'NOTE_CONTENTS' not found"

Solution:

  • Make sure your CSV file has a column named exactly NOTE_CONTENTS
  • Check for typos or extra spaces in the column name
  • The column name is case-sensitive
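The checks above can be automated by reading only the header row and looking for names that differ from NOTE_CONTENTS by case or surrounding spaces. This is a minimal sketch using the standard csv module; diagnose_header is a hypothetical helper and not part of the script.

```python
import csv


def diagnose_header(path, required="NOTE_CONTENTS"):
    """Return (found, near_misses) for the header row of the CSV file at path."""
    with open(path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f), [])  # empty list if the file has no rows
    found = required in header
    # Names that would match after trimming spaces and ignoring case.
    near_misses = [
        name for name in header
        if name != required and name.strip().upper() == required.upper()
    ]
    return found, near_misses
```

A result such as (False, [' note_contents ']) indicates that the column exists but needs to be renamed to exactly NOTE_CONTENTS.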

Problem: "ModuleNotFoundError: No module named 'ollama'"

Solution: Install the required packages:

pip install -r requirements.txt

Problem: Script runs very slowly

Solutions:

  • Use a smaller model: --model llama3.2:3b (default, fastest)
  • Make sure Ollama is running locally (not over network)
  • Close other applications to free up system resources

📞 Getting Help

If you encounter issues:

  1. Check the log files in logs/lung_rads_extraction/ for detailed error messages
  2. Make sure all prerequisites are installed correctly
  3. Verify your input CSV file format matches the requirements
  4. Check that Ollama is running and the model is downloaded

🙏 Acknowledgments

This tool uses Ollama for local AI processing and pandas for data handling.
