- Introduction
- Features
- Demo
- Installation
- Configuration
- Usage
- Project Structure
- Dependencies
- Environment Variables
- Running the Application
- Contributing
- License
Austin's Gemma Model Document Q&A is a Streamlit-based web application that uses LangChain, LangChain-Groq, and Google Generative AI Embeddings to provide an interactive question-and-answer system over a collection of PDF documents. Users place their PDFs in a local directory, embed them into a vector store, and then ask questions that are answered from the most relevant passages of those documents.
- Interactive UI: Built with Streamlit for a seamless user experience.
- Document Ingestion: Loads and processes PDF documents from a specified directory.
- Text Splitting: Breaks down large documents into manageable chunks for efficient processing.
- Vector Embeddings: Utilizes Google Generative AI Embeddings for creating vector representations of documents.
- Semantic Search: Implements FAISS for efficient similarity searches within the document embeddings.
- Custom Prompting: Uses LangChain's `ChatPromptTemplate` to generate accurate, context-based responses.
- Performance Tracking: Measures and displays response times for queries.
- Expandable Insights: Provides detailed document similarity search results within expandable sections.
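Chunking with overlap is easiest to see on a toy string. The sketch below is a dependency-free illustration of fixed-size character windows that overlap; it is not LangChain's splitter (LangChain's splitters also take separators such as paragraph breaks into account), and the `chunk_size` and `chunk_overlap` defaults are assumptions rather than the app's actual settings.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    The overlap keeps a sentence that straddles a chunk boundary visible in
    both neighbouring chunks. The default sizes are illustrative assumptions.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in split_text("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3):
        print(chunk)
```

Run as a script, this prints four chunks (`abcdefghij`, `hijklmnopq`, `opqrstuvwx`, `vwxyz`); each one repeats the last three characters of the previous chunk.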
Replace path_to_demo_screenshot.png with the actual path to your screenshot.
- Clone the Repository

  ```bash
  git clone https://github.com/yourusername/austins-gemma-model-document-qa.git
  cd austins-gemma-model-document-qa
  ```

- Set Up a Virtual Environment

  It's recommended to use a virtual environment to manage dependencies.

  ```bash
  python3 -m venv venv
  source venv/bin/activate   # On Unix or macOS
  # or
  venv\Scripts\activate      # On Windows
  ```
- Install Dependencies

  ```bash
  pip install -r requirements.txt
  ```
- Environment Variables

  Create a `.env` file in the root directory of the project with the following content:

  ```
  GROQ_API_KEY="your_groq_api_key_here"
  GOOGLE_API_KEY="your_google_api_key_here"
  ```

  - Replace `"your_groq_api_key_here"` with your actual Groq API key.
  - Replace `"your_google_api_key_here"` with your actual Google API key.
- .gitignore

  Ensure that your `.env` file is listed in `.gitignore` to prevent sensitive information from being committed to the repository.

  ```
  # Environment Variables
  .env
  ```
- Prepare Your Documents

  Place your PDF documents in the `./us_census` directory. Ensure that the directory exists and contains the PDFs you want to use for the Q&A system.

- Run the Application

  ```bash
  streamlit run app.py
  ```

  Replace `app.py` with the name of your main Python script if it's different.

- Interact with the App

  - Title: "Austin's Gemma Model Document Q&A"
  - Enter Your Question: Input your query in the provided text box.
  - Embed Documents: Click the "Documents Embedding" button to process and embed your documents.
  - View Answers: After embedding, enter a question to receive an answer based on your documents.
  - Expand for Similar Documents: Click the "Document Similarity Search" expander to view relevant document excerpts.
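Once the documents are embedded, each question goes through a retrieve, prompt, generate loop, and the response time shown in the UI measures that loop. The following is a minimal sketch of the prompt and timing part, assuming the retrieved excerpts are already available as strings and that a hypothetical `call_llm` callable stands in for the Groq-hosted model. The prompt wording and the `{context}`/`{input}` placeholder names are illustrative, not taken from `app.py`.

```python
import time
from typing import Callable, List, Tuple

# Illustrative prompt text; the app builds its prompt with LangChain's
# ChatPromptTemplate, and its exact wording may differ.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n"
    "<context>\n{context}\n</context>\n"
    "Question: {input}"
)


def build_prompt(question: str, excerpts: List[str]) -> str:
    """Stuff the retrieved excerpts into the prompt as context."""
    return PROMPT_TEMPLATE.format(context="\n\n".join(excerpts), input=question)


def timed_answer(
    question: str,
    excerpts: List[str],
    call_llm: Callable[[str], str],  # hypothetical stand-in for the model call
) -> Tuple[str, float]:
    """Return the model's answer and the wall-clock seconds the call took."""
    prompt = build_prompt(question, excerpts)
    start = time.perf_counter()
    answer = call_llm(prompt)
    elapsed = time.perf_counter() - start
    return answer, elapsed


if __name__ == "__main__":
    def echo_llm(prompt: str) -> str:
        # Fake model for demonstration: returns the last line of the prompt.
        return prompt.splitlines()[-1]

    answer, seconds = timed_answer(
        "What does the report say about household income?",
        ["Excerpt A ...", "Excerpt B ..."],
        echo_llm,
    )
    print(answer)
    print(f"Response time: {seconds:.4f} s")
```

Timing only the model call (not prompt assembly) keeps the reported number focused on the slowest step; widening the timed region to include retrieval is an equally reasonable choice.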
```
austins-gemma-model-document-qa/
├── .env
├── requirements.txt
├── app.py
├── us_census/
│   ├── document1.pdf
│   ├── document2.pdf
│   └── ...
├── venv/
├── README.md
└── LICENSE
```
- .env: Contains environment variables for API keys.
- requirements.txt: Lists all Python dependencies.
- app.py: Main Streamlit application script.
- us_census/: Directory containing PDF documents for Q&A.
- venv/: Virtual environment directory.
- README.md: Project documentation.
- LICENSE: License information.
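Because the app reads its PDFs from `us_census/`, a quick pre-flight check can catch a missing or empty directory before you click "Documents Embedding". A minimal standard-library sketch; the helper name `find_pdfs` is hypothetical and not part of the project.

```python
from pathlib import Path
from typing import List


def find_pdfs(directory: str = "./us_census") -> List[Path]:
    """Return the PDF files directly inside `directory`, sorted by name.

    Raises FileNotFoundError if the directory is missing and ValueError if it
    contains no PDFs, so problems surface before any embedding work starts.
    """
    root = Path(directory)
    if not root.is_dir():
        raise FileNotFoundError(f"Document directory not found: {root}")
    # Match the extension case-insensitively so "report.PDF" is also found.
    pdfs = sorted(p for p in root.iterdir() if p.is_file() and p.suffix.lower() == ".pdf")
    if not pdfs:
        raise ValueError(f"No PDF files found in {root}")
    return pdfs
```

Called from the project root, `find_pdfs()` would return the paths of `document1.pdf`, `document2.pdf`, and so on, or raise with a readable message if the folder is missing or holds no PDFs.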
All dependencies are listed in the requirements.txt file. Key dependencies include:
- Streamlit: For building the web application interface.
- LangChain: For chaining language model calls and managing prompts.
- LangChain-Groq: Integration for calling language models hosted on Groq's inference API.
- FAISS: For efficient similarity search and vector storage.
- Google Generative AI Embeddings: For creating vector embeddings of documents.
- PyPDF2 & pypdf: For loading and processing PDF documents.
- python-dotenv: For loading environment variables, such as the API keys, from the `.env` file.
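FAISS's job in this stack is nearest-neighbour search over embedding vectors. The sketch below shows the idea with a brute-force cosine-similarity search; it is not FAISS (which adds index structures and optimized kernels for large collections), and the three-dimensional vectors are made-up stand-ins for Google Generative AI embeddings, which have many more dimensions.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], vectors: List[Sequence[float]], k: int = 4) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k vectors most similar to `query`."""
    scored = [(i, cosine(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


if __name__ == "__main__":
    # Made-up "embeddings" for three document chunks.
    chunk_vectors = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
    query_vector = [1.0, 0.1, 0.0]
    print(top_k(query_vector, chunk_vectors, k=2))
```

Scanning every vector is fine for a handful of chunks; FAISS exists so the same lookup stays fast as the number of embedded chunks grows.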
Ensure that the following environment variables are set in your .env file:
- GROQ_API_KEY: Your Groq API key.
- GOOGLE_API_KEY: Your Google API key.

Example `.env` file:

```
GROQ_API_KEY="gsk_..."
GOOGLE_API_KEY="AIza..."
```

Security Note: Never expose your API keys publicly. Always keep the `.env` file secure and exclude it from version control.
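A missing key usually surfaces later as an authentication error from Groq or Google, so it can help to fail fast at startup. A minimal sketch, assuming the keys have already been loaded into the process environment (for example by python-dotenv's `load_dotenv()`); the helper name `missing_env_vars` is hypothetical.

```python
import os
from typing import Iterable, List, Mapping, Optional

REQUIRED_VARS = ("GROQ_API_KEY", "GOOGLE_API_KEY")


def missing_env_vars(
    names: Iterable[str] = REQUIRED_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the required variable names that are unset or blank.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]
```

In a Streamlit script, one option would be to call this right after `load_dotenv()` and, if the returned list is non-empty, show it with `st.error(...)` and halt with `st.stop()`.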
- Activate Virtual Environment

  ```bash
  source venv/bin/activate   # On Unix or macOS
  # or
  venv\Scripts\activate      # On Windows
  ```

- Run Streamlit App

  ```bash
  streamlit run app.py
  ```

- Access the App

  Open your web browser and navigate to http://localhost:8501 to interact with the application.
Contributions are welcome! Please follow these steps:
- Fork the Repository

- Create a Feature Branch

  ```bash
  git checkout -b feature/YourFeatureName
  ```

- Commit Your Changes

  ```bash
  git commit -m "Add Your Feature"
  ```

- Push to the Branch

  ```bash
  git push origin feature/YourFeatureName
  ```

- Open a Pull Request
- Performance Optimization: The application uses Streamlit's caching mechanisms to optimize performance, especially during the embedding and vector store creation processes.
- Error Handling: Basic error handling is implemented to ensure the application runs smoothly even if certain operations fail.
- Extensibility: The application is modular, allowing for easy addition of new features or integration with other services.
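The build-once idea behind the performance note, together with the kind of guard the error-handling note describes, can be sketched with the standard library. This is an analogy under stated assumptions: `functools.lru_cache` stands in for Streamlit's caching (Streamlit provides `st.cache_resource` for long-lived objects such as vector stores), and `build_vector_store` and `run_safely` are hypothetical names whose bodies only simulate the real work.

```python
import functools
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


@functools.lru_cache(maxsize=None)
def build_vector_store(directory: str) -> Tuple[str, ...]:
    """Hypothetical stand-in for the slow load -> split -> embed step.

    lru_cache memoises the result per `directory`, so repeated calls with the
    same argument reuse the first result instead of re-embedding. A real
    implementation would read the PDFs and call the embedding API here.
    """
    return (f"{directory}#chunk0", f"{directory}#chunk1")


def run_safely(action: Callable[[], T], what: str) -> Tuple[Optional[T], Optional[str]]:
    """Run `action` and return (result, None), or (None, message) if it raises.

    A UI layer can display the message (for example with st.error) instead of
    letting the exception take down the whole page.
    """
    try:
        return action(), None
    except Exception as exc:  # broad on purpose: this is the outermost UI guard
        return None, f"{what} failed: {exc}"


if __name__ == "__main__":
    first, _ = run_safely(lambda: build_vector_store("./us_census"), "Embedding")
    second, _ = run_safely(lambda: build_vector_store("./us_census"), "Embedding")
    print(first is second)                           # True: second call hit the cache
    print(build_vector_store.cache_info().misses)    # 1: the slow step ran once
    print(run_safely(lambda: 1 / 0, "Division")[1])  # Division failed: division by zero
```

Returning the error as a value rather than re-raising keeps the decision about how to present it (banner, log, retry button) in the UI code.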
If you encounter any issues or have suggestions for improvements, feel free to open an issue or submit a pull request!
Happy Coding!
