Austin's Gemma Model Document Q&A

Introduction

Austin's Gemma Model Document Q&A is a Streamlit-based web application that uses LangChain, LangChain-Groq, and Google Generative AI Embeddings to provide an interactive question-and-answer system over a collection of PDF documents. Users place their PDFs in a local directory, embed them into a vector store, and ask questions. The app retrieves the most relevant passages through semantic search and answers from that content.
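The following dependency-free toy makes the retrieval flow concrete. It is a sketch under stated assumptions, not the code in app.py. A bag-of-words term counter stands in for Google Generative AI Embeddings, and a brute-force cosine-similarity scan stands in for FAISS. The function names (`embed`, `cosine`, `top_k`, `build_prompt`) and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Brute-force nearest-neighbour search over all chunks.

    FAISS performs this step with optimized (and optionally approximate) indexes.
    """
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical prompt wording; the app's actual ChatPromptTemplate may differ.
PROMPT = (
    "Answer the question based only on the provided context.\n"
    "<context>\n{context}\n</context>\n"
    "Question: {input}"
)


def build_prompt(question: str, chunks: list[str], k: int = 2) -> str:
    """Insert the k most similar chunks into the prompt as context."""
    context = "\n\n".join(top_k(question, chunks, k))
    return PROMPT.format(context=context, input=question)


if __name__ == "__main__":
    docs = [
        "The census counts every resident in the United States.",
        "Median household income rose in several states.",
        "Health insurance coverage varies by state.",
    ]
    print(build_prompt("How is household income changing?", docs, k=1))
```

In the real application, these three steps (embed, search, fill the prompt) are handled by the embedding model, the FAISS index, and LangChain's prompt and chain classes. Only the overall shape of the flow carries over from this toy.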

Features

Interactive UI: Built with Streamlit for a seamless user experience.
Document Ingestion: Loads and processes PDF documents from a specified directory.
Text Splitting: Breaks down large documents into manageable chunks for efficient processing.
Vector Embeddings: Utilizes Google Generative AI Embeddings for creating vector representations of documents.
Semantic Search: Uses FAISS for efficient similarity search over the document embeddings.
Custom Prompting: Uses LangChain's ChatPromptTemplate to instruct the model to answer from the retrieved document context.
Performance Tracking: Measures and displays response times for queries.
Expandable Insights: Provides detailed document similarity search results within expandable sections.
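The Text Splitting step can be pictured as a sliding window with overlap. Because of the overlap, a sentence cut at one chunk boundary still appears whole in the neighbouring chunk. The sketch below is a simplified, hypothetical stand-in for LangChain's `RecursiveCharacterTextSplitter`. The defaults of 1000 characters per chunk and 200 characters of overlap are illustrative assumptions, not values taken from app.py.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified stand-in for LangChain's RecursiveCharacterTextSplitter, which
    additionally prefers to break on separators such as blank lines, newlines,
    and spaces instead of at an arbitrary character position.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

For example, `split_text("abcdefghij", chunk_size=4, chunk_overlap=2)` yields `["abcd", "cdef", "efgh", "ghij"]`: each chunk repeats the last two characters of the previous one.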

Demo

Replace path_to_demo_screenshot.png with the actual path to your screenshot.

Installation

Clone the Repository

```
git clone https://github.com/yourusername/austins-gemma-model-document-qa.git
cd austins-gemma-model-document-qa
```

Set Up a Virtual Environment

It's recommended to use a virtual environment to manage dependencies.

```
python3 -m venv venv
source venv/bin/activate  # On Unix or macOS
# or
venv\Scripts\activate     # On Windows
```

Install Dependencies
```
pip install -r requirements.txt
```

Configuration

Environment Variables

Create a .env file in the root directory of the project with the following content:
```
GROQ_API_KEY="your_groq_api_key_here"
GOOGLE_API_KEY="your_google_api_key_here"
```
- Replace "your_groq_api_key_here" with your actual Groq API key.
- Replace "your_google_api_key_here" with your actual Google API key.
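A missing or empty key usually surfaces later as an opaque authentication error from the Groq or Google client, so it can help to check both variables at startup. The helper below is hypothetical. It assumes the app loads `.env` with python-dotenv's `load_dotenv()` before running. It reads `os.environ` by default and accepts any mapping, which makes it easy to test.

```python
import os

# Key names from the .env file above.
REQUIRED_KEYS = ("GROQ_API_KEY", "GOOGLE_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS) -> list[str]:
    """Return the names of required keys that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in required if not env.get(key)]
```

After `load_dotenv()`, a check such as `if missing_keys(): st.error(...)` could stop the app with a clear message. That wiring is an assumption, not existing app behaviour.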
.gitignore

Ensure that your .env file is listed in .gitignore to prevent sensitive information from being committed to the repository.
```
# Environment Variables
.env
```

Usage

Prepare Your Documents

Place your PDF documents in the ./us_census directory. Ensure that the directory exists and contains the PDFs you want to use for the Q&A system.
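If the directory is missing or empty, embedding fails later with a less obvious error. The hypothetical `find_pdfs` helper below is a pre-flight check you could run before embedding. It is not part of app.py, and it assumes the default `./us_census` path used in this README.

```python
from pathlib import Path


def find_pdfs(directory: str = "./us_census") -> list[Path]:
    """Return the PDF files in `directory`, failing early with a clear message."""
    folder = Path(directory)
    if not folder.is_dir():
        raise FileNotFoundError(f"Document directory not found: {folder}")
    # Match .pdf case-insensitively so files like REPORT.PDF are included.
    pdfs = sorted(p for p in folder.iterdir() if p.suffix.lower() == ".pdf")
    if not pdfs:
        raise FileNotFoundError(f"No PDF files found in {folder}")
    return pdfs
```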
Run the Application
```
streamlit run app.py
```
Replace app.py with the name of your main Python script if it's different.
Interact with the App
- Title: The page opens with the heading "Austin's Gemma Model Document Q&A".
- Embed Documents: Click the "Documents Embedding" button to process and embed your documents. Do this before asking questions.
- Enter Your Question: Type your query in the provided text box.
- View Answers: The answer is generated from your documents, and the response time is displayed alongside it.
- Expand for Similar Documents: Click the "Document Similarity Search" expander to view the relevant document excerpts.
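Response time is one of the tracked metrics. When timing a call to a remote model, a wall-clock timer such as `time.perf_counter()` is the appropriate choice. By contrast, `time.process_time()` counts only the CPU time of the local process, so it would leave out the time spent waiting on the network. The helper below is a minimal sketch. `timed` is a hypothetical name, and the README does not state how app.py actually measures time.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds) measured with a wall clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

Wrapping a LangChain chain might look like `response, elapsed = timed(retrieval_chain.invoke, {"input": question})`. Here `retrieval_chain` and the `"input"` key are assumptions about the app's chain setup.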

Project Structure

```
austins-gemma-model-document-qa/
├── .env
├── requirements.txt
├── app.py
├── us_census/
│   ├── document1.pdf
│   ├── document2.pdf
│   └── ...
├── venv/
├── README.md
└── LICENSE
```

.env: Contains environment variables for API keys.
requirements.txt: Lists all Python dependencies.
app.py: Main Streamlit application script.
us_census/: Directory containing PDF documents for Q&A.
venv/: Virtual environment directory.
README.md: Project documentation.
LICENSE: License information.

Dependencies

All dependencies are listed in the requirements.txt file. Key dependencies include:

Streamlit: For building the web application interface.
LangChain: For chaining language model calls and managing prompts.
LangChain-Groq: Integration with Groq's hosted inference API, used to run the language model.
FAISS: For efficient similarity search and vector storage.
Google Generative AI Embeddings: For creating vector embeddings of documents.
PyPDF2 & pypdf: For loading and processing PDF documents.
python-dotenv: For loading the API keys from the .env file into environment variables.
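Some of these packages are imported under a different name than the one used to install them. For example, python-dotenv provides the `dotenv` module, and the `faiss-cpu` distribution provides `faiss`. The sketch below checks whether the import names are available in the active virtual environment. The module list is an assumption derived from the list above, not read from requirements.txt.

```python
import importlib.util

# Import names assumed for the packages listed above (not install names).
EXPECTED_MODULES = (
    "streamlit",
    "langchain",
    "langchain_groq",
    "langchain_google_genai",
    "faiss",
    "pypdf",
    "PyPDF2",
    "dotenv",
)


def missing_modules(names=EXPECTED_MODULES) -> list[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Running `print(missing_modules())` inside the activated virtual environment lists anything that `pip install -r requirements.txt` did not provide.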

Environment Variables

Ensure that the following environment variables are set in your .env file:

GROQ_API_KEY: Your Groq API key.
GOOGLE_API_KEY: Your Google API key.

Example .env file:
```
GROQ_API_KEY="gsk_..."
GOOGLE_API_KEY="AIza..."
```

Security Note: Never expose your API keys publicly. Always keep the .env file secure and exclude it from version control.

Running the Application

Activate Virtual Environment

```
source venv/bin/activate  # On Unix or macOS
# or
venv\Scripts\activate     # On Windows
```

Run Streamlit App
```
streamlit run app.py
```
Access the App

Open your web browser and navigate to http://localhost:8501 to interact with the application.

Contributing

Contributions are welcome! Please follow these steps:

Fork the Repository
Create a Feature Branch
```
git checkout -b feature/YourFeatureName
```
Commit Your Changes
```
git commit -m "Add Your Feature"
```
Push to the Branch
```
git push origin feature/YourFeatureName
```
Open a Pull Request

Additional Notes

Performance Optimization: The application uses Streamlit's caching mechanisms to optimize performance, especially during the embedding and vector store creation processes.
Error Handling: Basic error handling is implemented to ensure the application runs smoothly even if certain operations fail.
Extensibility: The application is modular, allowing for easy addition of new features or integration with other services.
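Streamlit re-executes the whole script on every interaction, so an expensive step such as building the vector store must be stored somewhere that survives reruns. The README does not say which mechanism app.py uses. The sketch below shows the `st.session_state` pattern, with a plain dict standing in for `st.session_state` so that it runs without Streamlit. The helper name `get_or_build` and the key `"vectors"` are hypothetical.

```python
def get_or_build(state: dict, key: str, build):
    """Build an expensive object once and reuse it on later reruns.

    `state` stands in for st.session_state, which supports the same `in` test
    and item assignment and persists across reruns within one browser session.
    """
    if key not in state:
        state[key] = build()
    return state[key]
```

In a Streamlit script this would be called as `get_or_build(st.session_state, "vectors", build_index)`, where `build_index` is a hypothetical function that loads, splits, and embeds the PDFs. If one object should instead be shared by all sessions, Streamlit's `@st.cache_resource` decorator is the usual alternative.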

If you encounter any issues or have suggestions for improvements, feel free to open an issue or submit a pull request!

Happy Coding!

Repository: Algorithmia-SE/gemma-model-document-qa