MultiPDF Chatbot is a Streamlit application that lets users query multiple PDF documents through a conversational interface. It extracts the text from the uploaded PDFs, indexes it in a vector store, and answers questions about the content of the documents.
- PDF Upload: Users can upload multiple PDF documents.
- Text Extraction: The application extracts text from the uploaded PDFs.
- Conversational Interface: Users can ask questions about the content of their PDFs and receive answers in a chat-like interface.
- Vector Store: The application uses a vector store to manage and retrieve text chunks efficiently.
- Memory Management: The conversation history is managed to provide context-aware responses.
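The memory feature can be pictured as a bounded buffer of past turns that is prepended to each new question. The following is a minimal sketch of that idea only; `ChatHistory`, `build_prompt`, and the `max_turns` limit are hypothetical names chosen for illustration and are not taken from the application's code or from `langchain`.

```python
from dataclasses import dataclass, field


@dataclass
class ChatHistory:
    """Hypothetical stand-in for the app's conversation memory."""

    max_turns: int = 5  # assumed limit; must be at least 1
    turns: list = field(default_factory=list)

    def add(self, question: str, answer: str) -> None:
        """Record one question/answer pair and drop the oldest turns."""
        self.turns.append((question, answer))
        self.turns = self.turns[-self.max_turns:]

    def as_context(self) -> str:
        """Render the stored turns as plain text for the next prompt."""
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.turns)


def build_prompt(history: ChatHistory, question: str) -> str:
    """Prefix the new question with earlier turns so follow-ups keep context."""
    context = history.as_context()
    prefix = f"{context}\n" if context else ""
    return f"{prefix}User: {question}\nAssistant:"
```

Capping the number of stored turns keeps the prompt size bounded, at the cost of forgetting the oldest exchanges.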
- PDF Upload: Users upload their PDF documents through the Streamlit interface.
- Text Extraction: The text from the PDFs is extracted using the `PyPDF2` library.
- Text Chunking: The extracted text is split into manageable chunks using the `CharacterTextSplitter` from the `langchain` library.
- Vector Store: The text chunks are embedded using `GoogleGenerativeAIEmbeddings` and stored in a vector store using `FAISS`.
- Conversational Chain: A conversational retrieval chain is created using the `ConversationalRetrievalChain` from the `langchain` library, which retrieves relevant text chunks based on the user's questions.
- User Interaction: Users can ask questions, and the application provides answers based on the retrieved text chunks.
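The chunk-then-retrieve idea behind these steps can be sketched with the standard library alone. This is an illustration under stated assumptions, not the application's code: fixed-size overlapping windows stand in for `CharacterTextSplitter`, word counts stand in for `GoogleGenerativeAIEmbeddings`, and a linear cosine-similarity scan stands in for `FAISS`. The function names and the `chunk_size`, `overlap`, and `k` defaults are hypothetical.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character windows that overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

In the real pipeline, the retrieved chunks would be passed to the language model together with the question; the overlap between windows reduces the chance that a relevant sentence is cut in half at a chunk boundary.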
The application relies on the following dependencies:
- `streamlit`: For creating the web interface.
- `PyPDF2`: For extracting text from PDFs.
- `langchain`: For text splitting, embedding, and conversational retrieval.
- `faiss`: For efficient similarity search and clustering.
- `dotenv`: For loading environment variables.
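Since `dotenv` is listed for loading environment variables, the application presumably reads its API credentials from a `.env` file. A minimal sketch of that pattern follows; the variable name `GOOGLE_API_KEY` and the helper `get_api_key` are assumptions for illustration, as the source does not say which variables the app expects.

```python
import os

try:
    # load_dotenv is provided by the python-dotenv package.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None  # fall back to the process environment only


def get_api_key(name="GOOGLE_API_KEY"):
    """Return the named variable, reading a local .env file first if possible.

    The default name is an assumption; adjust it to whatever the app expects.
    """
    if load_dotenv is not None:
        load_dotenv()  # by default does not override variables already set
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env or the environment")
    return value
```

Failing early with a clear message is usually friendlier than letting the embedding call fail later with an authentication error.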
- Clone the repository:
git clone <repository_url>
- Install the required dependencies:
pip install -r requirements.txt
- Run the application:
streamlit run app.py
- Upload your PDF documents and start asking questions.