This is a Python GUI application that demonstrates how to build a custom PDF chatbot using LangChain and GPT-3.5 / Llama 2.
- The application GUI is built using Streamlit
- The application reads text from PDF files and splits it into chunks
- Uses the OpenAI Embeddings API to generate embedding vectors, which are used to find the content most relevant to a user's question
- Builds a conversational retrieval chain using LangChain
- Uses the OpenAI GPT API to generate responses based on the PDF content
- Install the following Python packages:
pip install streamlit pypdf2 langchain python-dotenv faiss-cpu openai sentence_transformers
- Create a .env file in the root directory of the project and add the following environment variable:
OPENAI_API_KEY= # Your OpenAI API key
The code is structured as follows:
app_db.py: The main application file that defines the Streamlit GUI app and the user interface.
- get_pdf_text function: reads text from PDF files
- get_text_chunks function: splits text into chunks
- get_vectorstore function: creates a FAISS vectorstore from text chunks and their embeddings
- get_conversation_chain function: creates a retrieval chain from vectorstore
- handle_userinput function: generates a response from the OpenAI GPT API
- create_connection: connects to the MySQL database
- initialize_db: creates the tables if they do not exist
- create_new_session: creates a new session ID to identify a conversation
- get_previous_sessions: loads previous sessions from the DB
- load_chat_history_for_session: loads a session's chat history from the DB to display in the Streamlit app
- save_message_to_db: saves chat messages to the DB
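The persistence helpers above can be sketched as follows. app_db.py talks to MySQL; this sketch uses the stdlib sqlite3 module instead so it runs standalone, but the function shapes and SQL carry over. The table and column names here are illustrative, not taken from create_chat_history_db.sql.

```python
import sqlite3
import uuid

def create_connection(path=":memory:"):
    # app_db.py would open a MySQL connection here instead
    return sqlite3.connect(path)

def initialize_db(conn):
    # Create the chat-history table if it does not exist
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               session_id TEXT,
               role TEXT,
               content TEXT
           )"""
    )
    conn.commit()

def create_new_session():
    # A random UUID identifies one conversation across app reruns
    return str(uuid.uuid4())

def save_message_to_db(conn, session_id, role, content):
    conn.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )
    conn.commit()

def load_chat_history_for_session(conn, session_id):
    cur = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    return cur.fetchall()

def get_previous_sessions(conn):
    cur = conn.execute("SELECT DISTINCT session_id FROM messages")
    return [row[0] for row in cur.fetchall()]
```

On each Streamlit rerun the app can reconnect, load the selected session's history, render it, and append new user/assistant messages as they arrive.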
htmlTemplates.py: A module that defines HTML templates for the user interface.
create_chat_history_db.sql: SQL script that creates the chat history DB and tables to store/retrieve chat data.
- Run the application:
streamlit run app_db.py
- Install Python bindings for llama.cpp library
pip install llama-cpp-python
- Download the Llama 2 7B GGML model from https://huggingface.co/TheBloke/LLaMa-7B-GGML/blob/main/llama-7b.ggmlv3.q4_1.bin and place it in the models folder
- Switch the language model to Llama 2, loaded via LlamaCpp
- Switch the embedding model to MiniLM-L6-v2 using HuggingFaceEmbeddings