DigiScribe MedChat - An automated ICD code extractor from Clinical Notes

Overview

This is a Python-based app that converts clinical notes into a series of ICD-10 codes. In healthcare, EMR/EHR systems communicate with other systems through codes, and insurance claims require diseases to be coded in the ICD format. A large number of medical coders are involved in translating clinical notes into ICD codes. This automated solution can reduce manual work, cost, response time, and coding errors.

LLM Guided Tree-Search Algorithm

This project uses off-the-shelf Large Language Models (LLMs) for ICD coding through an approach called LLM-guided tree-search. Each ICD code is part of a hierarchical relationship where parent codes encompass broader conditions and child codes represent specific ailments. The search begins at the root and uses the LLM to choose which branches to investigate, proceeding iteratively until no further paths remain. This process determines the most relevant ICD codes, adding them as predicted labels for the clinical note.

Process of Tree Search

The search process operates as follows:

Initiation: Begins at the ontology's root.
Navigation: Uses the LLM to determine which branches to explore.
Iteration: Proceeds iteratively, exploring further until no viable paths remain.
Conclusion: Identifies the most relevant ICD codes and adds them as predicted labels for the medical note.
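
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code in search_tree.py: the toy ontology, the keyword table, and the names tree_search and keyword_selector are all hypothetical, and the keyword match only stands in for the LLM call that would decide which branches to explore. The sketch also assumes that the codes recorded as predictions are the leaves the search reaches.

```python
from typing import Callable, Dict, List

# Hypothetical toy ontology: each node maps to its children (empty list = leaf).
TOY_TREE: Dict[str, List[str]] = {
    "ROOT": ["Diseases of the circulatory system",
             "Diseases of the respiratory system"],
    "Diseases of the circulatory system": ["I10 Essential (primary) hypertension"],
    "Diseases of the respiratory system": ["J45 Asthma"],
    "I10 Essential (primary) hypertension": [],
    "J45 Asthma": [],
}

# Hypothetical keyword table used by the stand-in selector below.
KEYWORDS: Dict[str, List[str]] = {
    "Diseases of the circulatory system": ["hypertension", "blood pressure"],
    "I10 Essential (primary) hypertension": ["hypertension"],
    "Diseases of the respiratory system": ["asthma", "wheez"],
    "J45 Asthma": ["asthma"],
}


def keyword_selector(note: str, candidates: List[str]) -> List[str]:
    """Stand-in for the LLM: keep candidates whose keywords appear in the note."""
    text = note.lower()
    return [c for c in candidates if any(k in text for k in KEYWORDS.get(c, []))]


def tree_search(note: str,
                tree: Dict[str, List[str]],
                select: Callable[[str, List[str]], List[str]],
                root: str = "ROOT") -> List[str]:
    """Explore the tree from the root, following only the selected branches."""
    predicted: List[str] = []
    frontier = [root]                      # Initiation: start at the root
    while frontier:                        # Iteration: until no paths remain
        node = frontier.pop(0)
        children = tree.get(node, [])
        if not children:
            if node != root:
                predicted.append(node)     # Conclusion: record the reached leaf
            continue
        # Navigation: the selector decides which children to explore next
        frontier.extend(select(note, children))
    return predicted


if __name__ == "__main__":
    print(tree_search("Patient has long-standing hypertension.",
                      TOY_TREE, keyword_selector))
```

Passing the selector as a function keeps the traversal independent of how branches are chosen, so the keyword stub could be swapped for a function that prompts an LLM with the note and the candidate descriptions.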

Running the code

Configure the OpenAI credentials for the client

To run this code, you need to configure the credentials for accessing the LLM via its API.
For GPT-3.5-turbo, you need to create an account with OpenAI and create an API key.
The LLM is accessed through the openai library, and the client needs to be configured in helpers.py as follows:

For GPT-3.5-turbo

from openai import OpenAI
client = OpenAI(api_key="<OPEN_AI_API_KEY>")  # replace the placeholder with your own key
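
Since os and dotenv appear in the package list, one plausible way to avoid hardcoding the key is to read it from the environment. The sketch below is an assumption about how that could look, not the contents of helpers.py: the function name read_api_key and the variable name OPENAI_API_KEY are illustrative choices.

```python
import os
from typing import Mapping, Optional


def read_api_key(env: Optional[Mapping[str, str]] = None,
                 name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored under `name`, or fail with a clear message.

    If python-dotenv is installed, calling `load_dotenv()` (from the `dotenv`
    package) beforehand copies entries from a local .env file into os.environ.
    """
    source = os.environ if env is None else env
    key = source.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it or add it to a .env file")
    return key
```

The returned value could then be passed to the client, for example OpenAI(api_key=read_api_key()), which keeps the key out of version control.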

Run the Application

streamlit run Client.py

Accessing Demo Application and Presentation

Demo App Link : https://digiscribe-9yddqupanzft34vnatyblu.streamlit.app/

Presentation : https://docs.google.com/presentation/d/1C0ltPk751WjfcOuT5vCfVFZeboKnTd1O/edit?usp=sharing&ouid=115531925957283463722&rtpof=true&sd=true

Using the App

The system prompt gives the context of the app, which helps the LLM during prompt construction.
In the user input box, type the clinical notes as they would generally be entered by a medical doctor.
On clicking the submit button, the LLM extracts the diseases from the clinical notes and finds the appropriate ICD-10 codes.
The tree-search algorithm traverses the ICD-10 code tree to retrieve the respective parent and child codes.
The output is displayed as a series of codes separated by commas.

Extracting the predicted code descriptions from the LLM's predictions:

In this implementation, the predicted ICD code descriptions are parsed on a per-line basis, and each line is split by ":". This is because the predictions should ideally be of the format "<ICD Description>: <yes/no> ...", so splitting by the first colon should give the description and the LLM prediction. I drop a predicted line if the description extracted from the LLM output is not an exact match against the ICD tree. This may have an impact on the final performance.
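
The parsing rule described above can be illustrated with a short sketch. It is not the project's actual parser: the function name parse_predictions and the treatment of any answer starting with "yes" as a positive prediction are assumptions made for the example.

```python
from typing import Dict, Iterable


def parse_predictions(llm_output: str,
                      valid_descriptions: Iterable[str]) -> Dict[str, bool]:
    """Map each recognised ICD description to the LLM's yes/no answer."""
    valid = set(valid_descriptions)
    results: Dict[str, bool] = {}
    for line in llm_output.splitlines():
        if ":" not in line:
            continue  # not in the "<description>: <yes/no>" shape
        # Split on the first colon only
        description, answer = line.split(":", 1)
        description = description.strip()
        if description not in valid:
            continue  # dropped: no exact match in the ICD tree
        results[description] = answer.strip().lower().startswith("yes")
    return results


if __name__ == "__main__":
    output = "Asthma: yes\nEssential (primary) hypertension: No\nAshtma: yes"
    print(parse_predictions(output, ["Asthma", "Essential (primary) hypertension"]))
```

Note that a description which itself contains a colon would be cut at that colon and then dropped by the exact-match check, which is one way this parsing can lose predictions.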

While I've used the versions of the GPT-3.5 model and the parameters mentioned in the paper for reproducibility, some randomness inherent to the LLM calls may still impact the final performance.
I've reconstructed the prompts based on the examples, so there could be minor differences in the prompts that may affect the performance.

Python Packages used

streamlit
re
simple_icd_10_cm
openai
os
dotenv
