smraji/DigiScribe

DigiScribe MedChat - An automated ICD code extractor from Clinical Notes

Overview

This is a Python-based app that converts clinical notes into a series of ICD-10 codes. In healthcare, EMR/EHR systems communicate with other systems through codes, and insurance claims require diseases to be coded in the ICD format. A large number of medical coders are involved in translating clinical notes into ICD codes. This automated solution can reduce manual work, cost, response time, and coding errors.

LLM Guided Tree-Search Algorithm

This project uses off-the-shelf Large Language Models (LLMs) for ICD coding through an approach called LLM-guided tree search. Each ICD code is part of a hierarchy in which parent codes cover broader conditions and child codes represent more specific ailments. The search begins at the root and uses the LLM to choose which branches to investigate, proceeding iteratively until no further paths remain. This process determines the most relevant ICD codes and adds them as predicted labels for the clinical note.

Process of Tree Search

The search process operates as follows:

  1. Initiation: Begins at the ontology's root.
  2. Navigation: Uses the LLM to determine which branches to explore.
  3. Iteration: Proceeds iteratively, exploring further until no viable paths remain.
  4. Conclusion: Identifies the most relevant ICD codes and adds them as predicted labels for the medical note.
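
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: `TOY_TREE`, `tree_search`, and `keyword_selector` are hypothetical names, the tiny hierarchy is a stand-in for the real ICD-10 tree, and the selector is a keyword-matching stub in place of an actual LLM call. The sketch also assumes that only the leaf codes reached by the search are kept as predictions.

```python
from typing import Callable, Dict, List

# Toy hierarchy standing in for the ICD-10 tree (illustrative subset only).
TOY_TREE: Dict[str, List[str]] = {
    "ROOT": ["E08-E13", "I10-I16"],
    "E08-E13": ["E11"],
    "E11": ["E11.9"],
    "I10-I16": ["I10"],
    "E11.9": [],
    "I10": [],
}

# A selector receives the note and the candidate children and returns
# the children worth exploring. In the real app this would be an LLM call.
Selector = Callable[[str, List[str]], List[str]]


def tree_search(note: str, tree: Dict[str, List[str]],
                select: Selector, root: str = "ROOT") -> List[str]:
    predicted: List[str] = []
    frontier = [root]                      # 1. Initiation: start at the root
    while frontier:                        # 3. Iteration: until no paths remain
        node = frontier.pop(0)
        children = tree.get(node, [])
        if not children:
            # Assumption of this sketch: leaves reached become predictions.
            if node != root and node not in predicted:
                predicted.append(node)     # 4. Conclusion
            continue
        for child in select(note, children):   # 2. Navigation
            if child in children:          # ignore anything not in the tree
                frontier.append(child)
    return predicted


def keyword_selector(note: str, children: List[str]) -> List[str]:
    # Stub standing in for the LLM: pick children whose keyword is in the note.
    keywords = {"E08-E13": "diabetes", "E11": "type 2", "E11.9": "type 2",
                "I10-I16": "hypertension", "I10": "hypertension"}
    text = note.lower()
    return [c for c in children if c in keywords and keywords[c] in text]


print(tree_search("Patient with type 2 diabetes.", TOY_TREE, keyword_selector))
```

Swapping `keyword_selector` for a function that prompts the LLM with the note and the child descriptions would give the LLM-guided variant described above.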

Running the code

Configure the OpenAI credentials for the client

  • To run this code, you need to configure credentials for accessing the LLM via its API. This requires an API key for GPT-3.5-turbo.

  • To obtain the key, create an account with OpenAI and generate an API key.

  • The LLM is accessed through the openai library, and the client needs to be configured in helpers.py as follows:

# For GPT-3.5-turbo
from openai import OpenAI
client = OpenAI(api_key="<OPEN_AI_API_KEY>")
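
Since os and dotenv appear in the package list, one way to avoid hard-coding the key is to read it from the environment. The sketch below is an assumption about how that could look, not the contents of helpers.py: `load_api_key` and the variable name `OPENAI_API_KEY` are hypothetical choices for illustration.

```python
import os

try:
    # python-dotenv is treated as optional here; when installed, it loads
    # variables from a local .env file into the process environment.
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} in the environment or in a .env file.")
    return key


# The key could then be passed to the client, for example:
#   from openai import OpenAI
#   client = OpenAI(api_key=load_api_key())
```

Keeping the key in a .env file that is excluded from version control avoids committing credentials to the repository.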

Run the Application

streamlit run Client.py

Accessing Demo Application and Presentation

Demo App Link : https://digiscribe-9yddqupanzft34vnatyblu.streamlit.app/

Presentation : https://docs.google.com/presentation/d/1C0ltPk751WjfcOuT5vCfVFZeboKnTd1O/edit?usp=sharing&ouid=115531925957283463722&rtpof=true&sd=true

Using the App

  • The system prompt gives the LLM the context of the app and helps with prompt construction.
  • In the user input box, type the clinical notes as a medical doctor would typically enter them.
  • On clicking the Submit button, the LLM extracts the diseases from the clinical notes and finds the appropriate ICD-10 codes.
  • The tree-search algorithm traverses the ICD-10 code tree to retrieve the respective parent and child codes.
  • The output is displayed as a series of codes separated by commas.

Extracting the predicted code descriptions from the LLM's predictions:

In this implementation, the LLM's output is processed line by line, and each line is split on ":". Ideally, each prediction has the format "<ICD Description>: <yes/no> ...", so splitting on the first colon yields the description and the LLM's prediction. I drop a predicted line if the description extracted from it is not an exact match against the ICD tree. This may have an impact on the final performance.

  • Although I've used the GPT-3.5 model versions and parameters mentioned in the paper for reproducibility, LLM calls may still involve some inherent randomness, which can affect the final performance.
  • I've reconstructed the prompts based on the examples, so there could be minor differences in the prompts that affect the performance.
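
The line-by-line parsing described in this section can be sketched as below. This is an illustration under stated assumptions rather than the repository's code: `parse_predictions` is a hypothetical name, the valid descriptions are passed in as a plain collection instead of being read from the ICD tree, and an answer is treated as positive when it starts with "yes" (case-insensitive).

```python
from typing import Dict, Iterable


def parse_predictions(llm_output: str,
                      valid_descriptions: Iterable[str]) -> Dict[str, bool]:
    """Map each recognised ICD description to the LLM's yes/no answer."""
    valid = set(valid_descriptions)
    results: Dict[str, bool] = {}
    for line in llm_output.splitlines():
        if ":" not in line:
            continue                      # not in the expected format
        # Split on the first colon only: description before, answer after.
        description, _, answer = line.partition(":")
        description = description.strip()
        if description not in valid:
            continue                      # drop non-exact matches
        results[description] = answer.strip().lower().startswith("yes")
    return results


example_output = (
    "Diabetes mellitus: yes, mentioned in the note\n"
    "Hypertensive diseases: No\n"
    "Made up disease: yes\n"
    "a line without a colon"
)
print(parse_predictions(example_output,
                        ["Diabetes mellitus", "Hypertensive diseases"]))
```

Because matching is exact, a description that the LLM rephrases even slightly is discarded, which is the source of the performance caveat noted above.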

Python Packages used

  • streamlit
  • re
  • simple_icd_10_cm
  • openai
  • os
  • dotenv
