Contract Answerer

An AI-powered document question-answering application that lets users ask natural-language questions about long PDF documents and receive answers grounded in the source text.

This project demonstrates a production-minded implementation of retrieval-augmented generation (RAG) using modern LLM tooling.

Demo

90-second walkthrough and setup video:

Why this project matters

Many real-world workflows depend on large, complex documents that are difficult to search or reason about manually. This project explores how large language models, embeddings, and vector search can be combined to make long-form documents immediately useful while minimizing hallucinations.

Key features

Ask natural-language questions over arbitrary PDF documents
Embedding-based retrieval for precise context selection
Context-constrained LLM responses to improve accuracy
Simple, fast UI designed for iteration and experimentation

Tech stack

Python
Streamlit
OpenAI API
ChromaDB
LangChain

Quickstart (two minutes)

Step 1: Clone the repository

git clone https://github.com/Aeh961/Contract-Answerer.git
cd Contract-Answerer

Step 2: Create and activate a virtual environment

python -m venv .venv
source .venv/bin/activate

Step 3: Install dependencies

pip install -r requirements.txt

Step 4: Add your documents

mkdir -p data/contracts

Place any PDF files you want to query into the data/contracts folder.
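Before launching the app, it can help to confirm that the PDFs are where the loader will look. The following is a minimal sketch, not code from this repository: the helper name `list_pdfs` is hypothetical, and it assumes the documents sit directly inside data/contracts as described above.

```python
from pathlib import Path


def list_pdfs(folder: str = "data/contracts") -> list[Path]:
    """Return the PDF files found directly inside `folder`, sorted by path.

    `list_pdfs` is a hypothetical helper for illustration; it is not part
    of the project's own modules.
    """
    root = Path(folder)
    if not root.is_dir():
        # A missing folder is treated as "no documents yet".
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    pdfs = list_pdfs()
    print(f"Found {len(pdfs)} PDF(s)")
    for pdf in pdfs:
        print(" -", pdf.name)
```

If the script reports zero PDFs, check that the files were copied into data/contracts and not into a subfolder.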

Step 5: Set your API key

export OPENAI_API_KEY="your_key_here"
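A missing key usually only surfaces once the first request fails, so a quick check at startup can save time. This is a sketch under assumptions: `require_api_key` is a hypothetical helper, and the project itself may read the variable differently.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return OPENAI_API_KEY from the environment, or raise a clear error.

    `require_api_key` is a hypothetical helper for illustration. The `env`
    parameter exists only so the check can be exercised without touching
    the real environment.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before starting the app."
        )
    return key
```

Calling `require_api_key()` at the top of the application would fail fast with a readable message instead of an authentication error later on.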

Step 6: Run the application

streamlit run src/app.py

How it works

PDF documents are loaded and split into semantic chunks
Each chunk is embedded and stored in a vector database
User questions retrieve the most relevant chunks
The language model generates answers using only retrieved context
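The four steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in src/load.py or src/app.py: a word-count vector stands in for a real embedding model, an in-memory list stands in for ChromaDB, fixed-size overlapping word windows stand in for semantic chunking, and all function names are hypothetical. The final call to the language model is omitted; the sketch stops at the context-constrained prompt.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=40, overlap=10):
    """Split text into overlapping word windows (stand-in for semantic chunking)."""
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text):
    """Toy embedding: lowercase token counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Build a prompt that instructs the model to answer from the context only."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, 1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    chunks = [
        "The tenant shall pay rent of 1000 dollars on the first day of each month.",
        "Either party may terminate this agreement with thirty days written notice.",
        "The landlord is responsible for structural repairs.",
    ]
    question = "When may a party terminate the agreement?"
    print(build_prompt(question, retrieve(question, chunks, k=1)))
```

Numbering the chunks in the prompt is a design choice that makes it easier to add source citations to answers later.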

Project structure

src
  app.py          Streamlit application
  load.py         Document ingestion and embedding
data
  contracts       User-provided PDFs
README.md
requirements.txt

Future improvements

Inline citations with highlighted source text
Streaming responses for improved user experience
Document metadata filtering
Cost and latency optimizations

Author

Built by Abdallah Elhamawi as part of a broader exploration into practical and reliable AI systems.
