This project aims to enhance medical information retrieval for remote patients in the era of Healthcare 5.0. It leverages Large Language Model (LLM) fine-tuning and semantic chunking within a Retrieval-Augmented Generation (RAG) chatbot framework to provide personalized information from medical documents while addressing the challenge of LLM hallucinations.
- Localized Setup: The entire setup, including the LLM and the database, runs locally on the CPU.
- Qdrant Vector Database: The project uses a Qdrant database, running in Docker, to store chunks of the medical documents.
- Semantic Chunking: Semantic chunking techniques are employed to derive chunks from the provided medical documents.
- Finetuned LLM: The project uses BioMistral-7B, a medical-domain fine-tuned model, for enhanced understanding of medical information.
- Word Embeddings: PubMedBERT is used to obtain dense embeddings.
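The semantic chunking idea above can be sketched in a few lines: embed consecutive sentences and start a new chunk whenever the similarity to the previous sentence drops below a threshold. This is a minimal stdlib-only illustration; the bag-of-words `embed` function and the `0.2` threshold are stand-in assumptions, since the actual project uses PubMedBERT dense embeddings.

```python
import math
from collections import Counter

def embed(text):
    # Stand-in embedding: bag-of-words term counts.
    # The real project uses PubMedBERT dense embeddings instead.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(sentences, threshold=0.2):
    """Group consecutive sentences; start a new chunk when the
    similarity to the previous sentence falls below the threshold."""
    chunks, current = [], [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(sent)) < threshold:
            chunks.append(" ".join(current))
            current = [sent]
        else:
            current.append(sent)
    chunks.append(" ".join(current))
    return chunks

sentences = [
    "Metastatic disease is cancer that has spread.",
    "Metastatic disease often involves distant organs.",
    "Billing codes are listed in appendix B.",
]
print(semantic_chunks(sentences))  # two chunks: the topic shifts at the last sentence
```

With real sentence embeddings the same loop produces chunks that follow topic boundaries rather than a fixed token count.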
- Docker (for running the Qdrant database)
- Python 3.x
- Required Python packages (specified in the `requirements.txt` file)
- Clone the repository: `git clone https://github.com/your-repo/semantic-chunking-rag-chatbot.git`
- Install the required Python packages: `pip install -r requirements.txt`
- Download the `BioMistral-7B.Q4_K_M.gguf` model from Hugging Face and place it in the project directory.
- Download Docker Desktop.
- Open PowerShell.
- Run `docker pull qdrant/qdrant` to pull the Qdrant Docker image.
- Run `docker images` to check the available images.
- Run `docker ps` to check the running containers.
- Run `docker run -p 6333:6333 qdrant/qdrant` to start the Qdrant container.
- Run `python ingest.py` to create the database and ingest the medical documents. The Qdrant dashboard will be available at http://localhost:6333/dashboard.
- Run `python retriever.py` to check if the model is responding well.
Top 2 retrieved chunks with metadata for the question: What is metastatic disease?
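The retrieval step can be illustrated with a small stdlib-only sketch: score every stored chunk embedding against the query embedding by cosine similarity and return the top-2 payloads with their metadata. The toy 3-dimensional vectors and payloads here are invented stand-ins; in the project, Qdrant performs this nearest-neighbor search over PubMedBERT embeddings ingested by `ingest.py`.

```python
import math

# Toy (embedding, payload) pairs standing in for the Qdrant collection.
# Real embeddings come from PubMedBERT; filenames/pages here are made up.
points = [
    ([0.9, 0.1, 0.0], {"text": "Metastatic disease is cancer that has spread.",
                       "source": "oncology.pdf", "page": 4}),
    ([0.8, 0.2, 0.1], {"text": "Distant metastases commonly involve liver and lung.",
                       "source": "oncology.pdf", "page": 7}),
    ([0.0, 0.1, 0.9], {"text": "Hospital billing procedures are described here.",
                       "source": "admin.pdf", "page": 1}),
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(query_vector, limit=2):
    """Return the top-`limit` payloads by cosine similarity,
    mirroring a Qdrant search over the ingested chunks."""
    scored = sorted(points, key=lambda p: cosine(query_vector, p[0]), reverse=True)
    return [payload for _, payload in scored[:limit]]

query = [1.0, 0.0, 0.0]  # stand-in embedding of "What is metastatic disease?"
for hit in search(query):
    print(hit["source"], hit["page"], hit["text"])
```

Both top hits come from the oncology document, which is the behavior `retriever.py` checks for.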

- Run `uvicorn rag:app` to start the FastAPI and Flask-based application (check it at http://127.0.0.1:8000).
- The app responds in 25-30 seconds on average because the LLM runs locally on the CPU. Along with the answer, it returns the retrieved context and metadata such as the source document and page number.

If you'd like to contribute to this project, please follow these steps:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Make your changes and commit them with descriptive commit messages.
- Push your changes to your forked repository.
- Create a pull request, describing your changes in detail.
This project is licensed under the MIT License.
- Qdrant for the vector database.
- Hugging Face for the BioMistral-7B model and Pubmed-bert.
- FastAPI and Flask for the web framework.


