Document Classifier and Discrepancy Detector

A multi-agent system that analyzes multiple documents on the same topic to identify discrepancies between them: contradictions, inconsistencies, and omissions.

Features

Multi-document analysis (3-5 documents)
Cross-document reasoning and comparison
Discrepancy classification (contradictions, inconsistencies, omissions)
Alignment scoring (0-100 scale)
Human-readable explanations
REST API for web integration
Streamlit web interface for easy document analysis
CLI for command-line usage
Checkpoint system for resumable processing
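
To make the discrepancy classification and the 0-100 alignment score more concrete, here is a minimal sketch of how the two could relate. It is an illustration only: the `Discrepancy` type, the `alignment_score` function, and the penalty weights are hypothetical and are not taken from the project's code.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class DiscrepancyType(Enum):
    """The three discrepancy categories listed under Features."""
    CONTRADICTION = "contradiction"
    INCONSISTENCY = "inconsistency"
    OMISSION = "omission"


# Hypothetical penalty weights; the project's real scoring may differ.
PENALTIES = {
    DiscrepancyType.CONTRADICTION: 15,
    DiscrepancyType.INCONSISTENCY: 8,
    DiscrepancyType.OMISSION: 4,
}


@dataclass(frozen=True)
class Discrepancy:
    kind: DiscrepancyType
    doc_ids: tuple[str, ...]  # documents involved in the discrepancy
    description: str


def alignment_score(discrepancies: list[Discrepancy]) -> int:
    """Start from 100, subtract a penalty per discrepancy, clamp to 0-100."""
    penalty = sum(PENALTIES[d.kind] for d in discrepancies)
    return max(0, min(100, 100 - penalty))
```

Under these assumed weights, one contradiction plus one omission would give a score of 81, and a set of documents with no discrepancies would score 100.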

Installation

# Install core dependencies
uv sync

# For API functionality, also install:
pip install -r requirements-api.txt

Usage

Streamlit Web Interface (Easiest)

Start both the API server and Streamlit frontend:

# Start both services at once
./start_full_app.sh

# Or start them separately:
# Terminal 1 - Start API server
python -m src.doc_classifier.run_api

# Terminal 2 - Start Streamlit frontend
python src/doc_classifier/frontend/run_frontend.py

Then open your browser to http://localhost:8501 to use the web interface.

Features:

Upload files or enter text directly
View results with formatted output
Track execution history
Download results in multiple formats
Monitor API status

See frontend/README.md for detailed frontend documentation.

REST API

Start the FastAPI server:

python -m src.doc_classifier.run_api

Then use the API at http://localhost:8000:

import requests

# Process documents via API
response = requests.post("http://localhost:8000/process/content", json={
    "documents": [
        {"id": "doc1", "content": "Policy text 1..."},
        {"id": "doc2", "content": "Policy text 2..."},
        {"id": "doc3", "content": "Policy text 3..."}
    ]
})

result = response.json()
print(f"Alignment Score: {result['alignment_score']}/100")

See API_README.md for complete API documentation.
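
When the documents live on disk, the request body can be assembled from files. The sketch below is one possible helper, not part of the project: `build_payload` is a hypothetical name, using the file stem as the document `id` is an assumption, and the 3-5 check mirrors the range listed under Features (whether the server enforces that range itself is not confirmed here).

```python
from pathlib import Path

MIN_DOCS, MAX_DOCS = 3, 5  # document count range listed under Features


def build_payload(paths):
    """Read text files and return a body shaped like the request example above."""
    if not MIN_DOCS <= len(paths) <= MAX_DOCS:
        raise ValueError(
            f"expected {MIN_DOCS}-{MAX_DOCS} documents, got {len(paths)}"
        )
    documents = []
    for path in map(Path, paths):
        # Assumption: the file name without its extension serves as the id.
        documents.append(
            {"id": path.stem, "content": path.read_text(encoding="utf-8")}
        )
    return {"documents": documents}
```

The returned dictionary can be passed as the `json=` argument of `requests.post`. Calling `response.raise_for_status()` before reading `response.json()` surfaces HTTP errors early.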

CLI Interface

# Process documents from files
python -m src.doc_classifier process doc1.txt doc2.txt doc3.txt

# Get help
python -m src.doc_classifier --help

Python Library

from src.doc_classifier import DocumentClassifier

# Initialize the classifier
classifier = DocumentClassifier()

# Process documents
result = classifier.process_documents([
    "document1.txt",
    "document2.txt",
    "document3.txt"
])

print(f"Alignment Score: {result.alignment_result.score}")
print(f"Explanation: {result.explanation}")

Architecture

The system uses a multi-agent architecture with LangGraph orchestration:

Ingestion Agent - Document reading and normalization
Summarization Agent - Claim extraction
Comparison Agent - Cross-document analysis
Discrepancy Detection Agent - Issue classification
Alignment Scoring Agent - Consistency scoring
Explanation Generator - Human-readable output
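
The six stages above can be pictured as functions that each take a shared state and return an updated one. The sketch below is plain Python with toy placeholder logic. It does not use LangGraph and does not reproduce the project's agents: every function name, state key, and heuristic in it is hypothetical. In a LangGraph setup, each stage would presumably be registered as a graph node instead of being called in a loop.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Stage = Callable[[State], State]


def ingest(state: State) -> State:
    # Normalize whitespace in each raw document.
    docs = {i: " ".join(t.split()) for i, t in state["raw"].items()}
    return {**state, "documents": docs}


def summarize(state: State) -> State:
    # Toy claim extraction: treat every sentence as one claim.
    claims = {
        i: [s.strip() for s in t.split(".") if s.strip()]
        for i, t in state["documents"].items()
    }
    return {**state, "claims": claims}


def compare(state: State) -> State:
    # Collect claims that are missing from at least one document.
    per_doc = [set(c) for c in state["claims"].values()]
    all_claims = set().union(*per_doc)
    unshared = sorted(c for c in all_claims if any(c not in d for d in per_doc))
    return {**state, "unshared_claims": unshared}


def detect(state: State) -> State:
    # Toy classification: every unshared claim counts as an omission.
    found = [{"type": "omission", "claim": c} for c in state["unshared_claims"]]
    return {**state, "discrepancies": found}


def score(state: State) -> State:
    # Hypothetical scoring: 10 points off per discrepancy, floor at 0.
    return {**state, "score": max(0, 100 - 10 * len(state["discrepancies"]))}


def explain(state: State) -> State:
    text = (
        f"Discrepancies found: {len(state['discrepancies'])}; "
        f"alignment score: {state['score']}/100."
    )
    return {**state, "explanation": text}


PIPELINE: List[Stage] = [ingest, summarize, compare, detect, score, explain]


def run_pipeline(raw: Dict[str, str]) -> State:
    """Run all stages in order, threading the state through each one."""
    state: State = {"raw": raw}
    for stage in PIPELINE:
        state = stage(state)
    return state
```

Keeping every stage a pure function of the state makes each one easy to test in isolation. It would also make it straightforward to persist the state between stages, which is one way a checkpoint system for resumable processing could work.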

Development

# Install development dependencies
uv sync --dev

# Run tests
uv run pytest

# Run property-based tests
uv run pytest -k "property"

# Format code
uv run black .
uv run isort .

# Type checking
uv run mypy src/doc_classifier
