This repository provides a Q&A application that allows users to upload a PDF, parse its content, and query it in natural language. By combining the Google Gemini API for embeddings and content generation with ChromaDB for efficient text storage and retrieval, the system offers a seamless way to interact with static documents.
## Table of Contents

- Features
- Libraries Used
- Architecture
- Installation
- Usage
- Challenges and Pitfalls
- Safeguards for a Commercial Product
- Future Improvements
## Features

- PDF Parsing: Extracts text from PDFs, processing page-by-page for modularity.
- Semantic Search: Leverages embeddings to identify relevant document passages based on user queries.
- Dynamic Q&A: Generates answers using Google Gemini API based on user queries and relevant document content.
## Libraries Used

- Google Gemini API (`google-generativeai`)
  - Used for generating text embeddings and natural-language responses.
  - Enables semantic understanding of the document and queries.
- ChromaDB (`chromadb`)
  - A vector database for embedding storage and retrieval.
  - Offers scalability and fast similarity searches for document querying.
- PyPDF2 (`PyPDF2`)
  - A robust library for extracting text from PDFs.
  - Splits documents into manageable chunks (pages).
- python-dotenv (`python-dotenv`)
  - Manages environment variables securely, ensuring API keys are not hardcoded.
## Architecture

The system consists of three main components:
1. PDF Parsing
   - Extracts text from the PDF and organizes it page-by-page using `PyPDF2`.
   - Each page is stored as a document in the vector database (`ChromaDB`).
2. Semantic Embedding and Storage
   - Text embeddings are generated using the Google Gemini API.
   - These embeddings are stored in ChromaDB for similarity-based retrieval.
3. Q&A Workflow
   - The user's query is embedded using the same model and matched against stored embeddings in ChromaDB.
   - The most relevant passage is used as context for generating an answer via the Google Gemini API.
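The three components above can be sketched end-to-end as follows. This is a minimal illustration, not the repository's actual code: the function names, embedding model name, and prompt wording are assumptions, and the third-party imports are done lazily inside each function so the pure helpers can be used on their own.

```python
def extract_pages(pdf_path):
    """Component 1: parse the PDF page-by-page with PyPDF2."""
    from PyPDF2 import PdfReader  # lazy import: keeps pure helpers standalone
    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]

def embed(text, task_type="retrieval_document"):
    """Component 2: embed a passage (or query) via the Gemini API.
    The model name is illustrative."""
    import google.generativeai as genai
    result = genai.embed_content(model="models/text-embedding-004",
                                 content=text, task_type=task_type)
    return result["embedding"]

def page_ids(n):
    """Stable per-page document IDs for the ChromaDB collection."""
    return [f"page-{i}" for i in range(n)]

def index_pdf(collection, pdf_path):
    """Store one document per page, with its embedding, in ChromaDB."""
    pages = extract_pages(pdf_path)
    collection.add(ids=page_ids(len(pages)), documents=pages,
                   embeddings=[embed(p) for p in pages])

def build_prompt(passage, query):
    """Component 3: ground the Gemini generation step in the retrieved passage."""
    return ("Answer the question using only the passage below.\n\n"
            f"Passage: {passage}\n\nQuestion: {query}")
```

For queries, `embed(query, task_type="retrieval_query")` is matched against the stored page embeddings, and the best hit is passed through `build_prompt` to the generation model.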
## Installation

Prerequisites:

- Python 3.8 or higher.
- Google Cloud account with Gemini API access.

1. Install the required libraries:

   ```bash
   pip install google-generativeai chromadb PyPDF2 python-dotenv
   ```

2. Clone the repository:

   ```bash
   git clone https://github.com/your-username/pdf-qna.git
   cd pdf-qna
   ```

3. Create a `.env` file in the root directory and add your Google API key:

   ```
   GOOGLE_API_KEY=your-google-api-key
   ```

4. Run the application:

   ```bash
   python main.py
   ```
## Usage

1. If the database is empty, the system prompts you to upload a PDF:

   ```
   Enter the path to your PDF: example.pdf
   ```

   The text is parsed and stored as embeddings in ChromaDB.

2. Ask questions in natural language, such as:

   ```
   Your question: What is discussed in the introduction?
   ```

3. Receive an AI-generated answer and the relevant passage:

   ```
   Answer: The introduction outlines the importance of...
   Passage: In the introduction, the author discusses...
   ```

4. Type `exit` to quit the program.
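The interaction loop described above can be sketched as a small function. `answer_fn` stands in for the retrieval-plus-generation step and is an assumption, not the repository's actual API:

```python
def repl(answer_fn, input_fn=input, print_fn=print):
    """Prompt loop: ask for questions until the user types `exit`.
    answer_fn(query) is expected to return (answer, passage)."""
    while True:
        query = input_fn("Your question: ")
        if query.strip().lower() == "exit":
            break
        answer, passage = answer_fn(query)
        print_fn(f"Answer: {answer}")
        print_fn(f"Passage: {passage}")
```

Injecting `input_fn` and `print_fn` keeps the loop testable without a terminal.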
## Challenges and Pitfalls

- Handling Large PDFs
  - Large PDFs can overwhelm memory or processing capabilities.
  - Solution: Process and embed text page-by-page for modularity.
- Text Retrieval Accuracy
  - Embedding models may misinterpret queries or retrieve irrelevant passages.
  - Solution: Use high-quality embeddings and fine-tune retrieval parameters.
- API Dependency: Relies heavily on the Google Gemini API, making the system vulnerable to changes in service or pricing.
- Limited Context Window: Only one passage is retrieved at a time, which might miss broader context.
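Page-by-page processing can be pushed one step further when individual pages are still too long for the embedding model's input limit. A common mitigation (not taken from this repository; the sizes are illustrative) is to split each page into overlapping fixed-size chunks:

```python
def chunk_text(text, max_chars=2000, overlap=200):
    """Split text into chunks of at most max_chars characters.
    Consecutive chunks overlap so a sentence cut at one boundary
    still appears whole in the neighboring chunk."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    step = max_chars - overlap
    return [text[i:i + max_chars] for i in range(0, len(text), step)] or [""]
```

Each chunk would then be embedded and stored as its own ChromaDB document, keeping any single embedding request small.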
## Safeguards for a Commercial Product

If this system were to be developed into a commercial product, the following safeguards would be critical:
- Encrypt PDF uploads and text data stored in databases.
- Ensure compliance with data privacy laws like GDPR and HIPAA for sensitive documents.
- Implement filters to prevent misuse of the system (e.g., inappropriate or malicious queries).
- Restrict API calls to prevent excessive usage or abuse, reducing costs and ensuring system availability.
- Use backup databases and failover mechanisms to ensure reliability in case of server or service downtime.
- Require users to authenticate before using the system to maintain security and track usage.
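Restricting API calls is commonly done with a token-bucket rate limiter. A stdlib-only sketch (the rates are illustrative, and per-user bookkeeping is left out):

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity` requests, refilling at `rate`
    tokens per second; each allowed request consumes one token."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self):
        """Return True if a request may proceed right now."""
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Each Gemini call would then be gated behind `bucket.allow()`, returning an error (or a retry-after hint) when the budget is exhausted.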
## Future Improvements

- FAISS Integration: Use FAISS for faster and more scalable vector searches, improving performance on large datasets.
- LangChain Framework: Streamline interaction logic with LangChain's prompt engineering and chained workflows.
- Multi-Passage Retrieval: Retrieve multiple relevant passages for more comprehensive answers.
- Web Interface: Build a web app with frameworks like Flask or React for a better user experience.
- Mobile Support: Extend the system to support mobile platforms for on-the-go document querying.
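Of these, multi-passage retrieval is a small change, since ChromaDB's `query` already accepts an `n_results` parameter. A hedged sketch (the helper name is an assumption):

```python
def retrieve_passages(collection, query_embedding, k=3):
    """Return the top-k most similar stored passages instead of one,
    so the generation step sees broader context."""
    hits = collection.query(query_embeddings=[query_embedding], n_results=k)
    return hits["documents"][0]  # documents for the first (only) query
```

The retrieved passages would then be concatenated into the prompt in place of the single passage used today.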
This project demonstrates the potential of combining cutting-edge AI tools with efficient storage solutions to unlock static document content. Engineers and developers are encouraged to fork the repository, experiment, and contribute to the system’s growth!