
dhvanitmonpara/doc-classifier


Document Classifier and Discrepancy Detector

A multi-agent system that analyzes multiple documents on the same topic to identify discrepancies, contradictions, and inconsistencies.

Features

  • Multi-document analysis (3-5 documents)
  • Cross-document reasoning and comparison
  • Discrepancy classification (contradictions, inconsistencies, omissions)
  • Alignment scoring (0-100 scale)
  • Human-readable explanations
  • REST API for web integration
  • Streamlit Web Interface for easy document analysis
  • CLI interface for command-line usage
  • Checkpoint system for resumable processing
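The README does not say how checkpoints are stored. As a rough illustration only, resumable processing can be done by saving each finished stage's output and skipping that stage on the next run. The `Checkpoint` class, the single-JSON-file format, and the stage names below are hypothetical and are not the project's actual API.

```python
import json
from pathlib import Path
from typing import Any, Callable


class Checkpoint:
    """Hypothetical checkpoint store: one JSON file mapping stage name -> output."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Load results from a previous (possibly interrupted) run, if any.
        self.state: dict[str, Any] = (
            json.loads(self.path.read_text(encoding="utf-8")) if self.path.exists() else {}
        )

    def run_stage(self, name: str, fn: Callable[[], Any]) -> Any:
        # Stages that already finished are not re-run; their saved output is reused.
        if name in self.state:
            return self.state[name]
        result = fn()  # must return something JSON-serializable
        self.state[name] = result
        # Persist after every stage so a crash loses at most the stage in progress.
        self.path.write_text(json.dumps(self.state), encoding="utf-8")
        return result
```

Creating a new `Checkpoint` on the same file after a crash returns the saved output for finished stages without calling their functions again.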

Installation

# Install core dependencies
uv sync

# For API functionality, also install:
pip install -r requirements-api.txt

Usage

Streamlit Web Interface (Easiest)

Start both the API server and Streamlit frontend:

# Start both services at once
./start_full_app.sh

# Or start them separately:
# Terminal 1 - Start API server
python -m src.doc_classifier.run_api

# Terminal 2 - Start Streamlit frontend
python src/doc_classifier/frontend/run_frontend.py

Then open your browser to http://localhost:8501 to use the web interface.

Frontend features:

  • Upload files or enter text directly
  • View results with formatted output
  • Track execution history
  • Download results in multiple formats
  • Monitor API status

See frontend/README.md for detailed frontend documentation.

REST API

Start the FastAPI server:

python -m src.doc_classifier.run_api

Then use the API at http://localhost:8000:

import requests

# Process documents via API
response = requests.post("http://localhost:8000/process/content", json={
    "documents": [
        {"id": "doc1", "content": "Policy text 1..."},
        {"id": "doc2", "content": "Policy text 2..."},
        {"id": "doc3", "content": "Policy text 3..."}
    ]
})
response.raise_for_status()  # fail fast on HTTP errors instead of parsing an error body

result = response.json()
print(f"Alignment Score: {result['alignment_score']}/100")

API documentation: see API_README.md for the complete API reference.
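The example above sends inline strings. To send documents stored on disk to the same `/process/content` endpoint, a small client can read the files and build the documented `{"documents": [{"id", "content"}]}` payload. This is a sketch under assumptions: it uses only the standard library (`urllib`) instead of `requests`, the helper names `build_payload` and `post_documents` are hypothetical, using the file name stem as the document `id` is an arbitrary choice, and the 3-5 document check simply mirrors the range stated in the feature list.

```python
import json
import urllib.request
from pathlib import Path

API_URL = "http://localhost:8000/process/content"


def build_payload(paths: list[Path]) -> dict:
    # The feature list states the system analyzes 3-5 documents.
    if not 3 <= len(paths) <= 5:
        raise ValueError(f"expected 3-5 documents, got {len(paths)}")
    return {
        "documents": [
            # Hypothetical choice: the file name without extension becomes the id.
            {"id": p.stem, "content": p.read_text(encoding="utf-8")}
            for p in paths
        ]
    }


def post_documents(paths: list[Path], url: str = API_URL, timeout: float = 300.0) -> dict:
    data = json.dumps(build_payload(paths)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the API server running, `post_documents([Path("doc1.txt"), Path("doc2.txt"), Path("doc3.txt")])["alignment_score"]` should return the same field the `requests` example prints.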

CLI Interface

# Process documents from files
python -m src.doc_classifier process doc1.txt doc2.txt doc3.txt

# Get help
python -m src.doc_classifier --help

Python Library

from src.doc_classifier import DocumentClassifier

# Initialize the classifier
classifier = DocumentClassifier()

# Process documents
result = classifier.process_documents([
    "document1.txt",
    "document2.txt", 
    "document3.txt"
])

print(f"Alignment Score: {result.alignment_result.score}")
print(f"Explanation: {result.explanation}")

Architecture

The system uses a multi-agent architecture with LangGraph orchestration:

  1. Ingestion Agent - Document reading and normalization
  2. Summarization Agent - Claim extraction
  3. Comparison Agent - Cross-document analysis
  4. Discrepancy Detection Agent - Issue classification
  5. Alignment Scoring Agent - Consistency scoring
  6. Explanation Generator - Human-readable output
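The six agents run in sequence, each reading from and adding to a shared state. The sketch below mirrors that shape in plain Python rather than LangGraph. Every stage body is a deliberately naive, hypothetical stand-in (sentences as "claims", a claim missing from some documents as an "omission", the shared-claim ratio as the 0-100 score) and is not the project's actual logic.

```python
import re
from typing import Callable

State = dict  # hypothetical shared state passed from agent to agent


def ingest(state: State) -> State:
    # Normalization: collapse runs of whitespace in each document.
    return {"docs": {k: " ".join(v.split()) for k, v in state["raw"].items()}}


def summarize(state: State) -> State:
    # Toy claim extraction: one lowercased claim per sentence.
    return {"claims": {
        k: {s.strip().lower() for s in re.split(r"[.!?]", v) if s.strip()}
        for k, v in state["docs"].items()
    }}


def compare(state: State) -> State:
    # Cross-document view: claims every document makes vs. claims any document makes.
    sets = list(state["claims"].values())
    return {"shared": set.intersection(*sets), "all": set.union(*sets)}


def detect(state: State) -> State:
    # Toy classification: only omissions (claims a document does not make).
    missing = {k: sorted(state["all"] - c) for k, c in state["claims"].items()}
    return {"discrepancies": {k: v for k, v in missing.items() if v}}


def score(state: State) -> State:
    # Toy 0-100 score: share of claims that all documents agree on.
    total = len(state["all"])
    return {"score": round(100 * len(state["shared"]) / total) if total else 100}


def explain(state: State) -> State:
    n = len(state["discrepancies"])
    return {"explanation": f"Alignment {state['score']}/100; {n} document(s) missing claims."}


PIPELINE: list[Callable[[State], State]] = [ingest, summarize, compare, detect, score, explain]


def run(raw: dict[str, str]) -> State:
    state: State = {"raw": raw}
    for agent in PIPELINE:
        # Each agent returns a partial update that is merged into the shared state.
        state = {**state, **agent(state)}
    return state
```

In LangGraph the same shape would roughly be a state graph with one node per agent connected in this order, where each node returns a partial state update much like the dictionary merge above.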

Development

# Install development dependencies
uv sync --dev

# Run tests
uv run pytest

# Run property-based tests
uv run pytest -k "property"

# Format code
uv run black .
uv run isort .

# Type checking
uv run mypy src/doc_classifier

About

A document classifier and discrepancy detection system built at AutonomousHacks.
