Extract contract terms in seconds, not minutes. Local-first, structured output, built for batch processing.
Zero setup required:
git clone https://github.com/Qleric-labs/contract-extraction-assistant
cd contract-extraction-assistant
docker compose up
Open http://localhost:5173 → Upload PDF → Extract
Demo mode active - Works immediately with shared API key.
- Shared quota (refreshes daily)
- Data may be used for training
- Don't upload confidential contracts
Want unlimited + private? Add your own Mistral key (free tier available) - see Production Setup below.
Extracts 4 key fields from contracts:
- Start date
- End date
- Renewal terms
- Termination notice period
Output formats: JSON, CSV, PDF, TXT
Key features:
- Concurrent batch processing
- Page references + audit trail
- Local PDF processing (only prompts sent to API)
- Structured data ready for analysis
Hardware: M1 Mac
| Test | Time | Details |
|---|---|---|
| Single contract (5 pages) | 3s | Includes page refs + snippets |
| Batch (10 contracts, 150 pages) | <10s | Concurrent processing |
| General LLM (same 10 contracts) | ~2-3 min | Sequential upload/processing |
Why it's faster:
- Parallel processing (not one-at-a-time)
- Purpose-built prompts (not general chat)
- Structured output (no manual formatting)
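As a rough illustration of the parallel processing point, the sketch below fans a batch of contracts out over a thread pool. It is not the project's actual code: `extract_terms` and `process_batch` are hypothetical names, and the stand-in function only returns a placeholder where the real PDF parsing and API call would go.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_terms(path: str) -> dict:
    """Hypothetical stand-in for the real per-contract extraction call."""
    return {"file": path, "status": "done"}


def process_batch(paths: list[str], max_workers: int = 8) -> list[dict]:
    """Run extraction for every contract concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `paths`
        return list(pool.map(extract_terms, paths))


results = process_batch(["nda.pdf", "msa.pdf", "lease.pdf"])
```

Threads suit this workload when most of the time is spent waiting on API responses rather than on CPU-bound work.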
Get your own Mistral API key:
- Sign up at console.mistral.ai
- Choose the free tier or a paid plan
- Create a .env file containing: MISTRAL_API_KEY=your_key_here
- Restart: docker compose restart
Your key = unlimited usage + data privacy control
Shipping next:
- Additional extraction fields (payment terms, liability, jurisdiction)
- Multi-provider support (OpenAI, Anthropic, Cohere)
- Accuracy benchmarks (testing across 500 contracts)
Under consideration:
- Visual PDF annotation (highlight extracted text)
- Multi-language support
- Custom extraction rules builder
Backend: Flask + PyMuPDF + Mistral SDK + spaCy
Frontend: React + Vite + Tailwind
Extraction: Hybrid LLM + regex patterns
See Technical Details for architecture breakdown.
All extractions export as:
- JSON - Machine-readable
- CSV - Spreadsheet-ready
- PDF - Formatted report
- TXT - Plain text
- PDFs processed locally (only prompts sent to API)
- BYOK model (you control the key)
- No vendor lock-in (multi-provider support planned)
- Self-hosted (run on your infrastructure)
Demo mode uses a shared key and is suitable for testing only.
⭐ Star this repo if you find it useful - helps others discover it.
Want to contribute?
- New extraction patterns
- UI/UX improvements
- Multi-language support
- Performance optimizations
See CONTRIBUTING.md for guidelines.
Technical Details:
- app.py - Flask API with /api/analyze-contract endpoint
- contract_extractor.py - LLM + regex hybrid logic
- patterns/ - YAML pattern definitions
- requirements.txt - Python dependencies
- UploadModal.tsx - Multi-file upload
- Dashboard.tsx - Results display + export controls
- Tailwind-styled responsive UI
- LLM-first - Mistral handles nuanced language
- Regex fallback - Catches what LLMs miss
- Structured output - Clean JSON every time
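One way to picture the LLM-first strategy with a regex fallback is the minimal sketch below. It rests on assumptions: `NOTICE_RE`, `regex_extract`, and `merge` are hypothetical names, the pattern is illustrative only (the project keeps its real patterns in YAML), and the LLM result is hard-coded rather than fetched from Mistral.

```python
import re

FIELDS = ("start_date", "end_date", "renewal_terms", "termination_notice_period")

# Illustrative fallback pattern (assumption, not taken from the project's YAML files)
NOTICE_RE = re.compile(r"\d+\s+days?\s+(?:prior\s+)?written\s+notice", re.IGNORECASE)


def regex_extract(text: str) -> dict:
    """Return the fields the regex layer can find on its own."""
    m = NOTICE_RE.search(text)
    return {"termination_notice_period": m.group(0)} if m else {}


def merge(llm_result: dict, regex_result: dict) -> dict:
    """Prefer the LLM value for each field; fall back to regex when it is missing."""
    merged = {}
    for field in FIELDS:
        if llm_result.get(field):
            merged[field] = {"value": llm_result[field], "source": "LLM"}
        elif regex_result.get(field):
            merged[field] = {"value": regex_result[field], "source": "Regex"}
        else:
            merged[field] = {"value": None, "source": None}
    return merged


text = "Either party may terminate this Agreement on 30 days written notice."
llm_result = {"start_date": "January 1, 2024"}  # pretend the LLM missed the notice period
result = merge(llm_result, regex_extract(text))
```

Recording the `source` per field keeps the output auditable, since a reviewer can see which layer produced each value.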
- Parallel processing for batch jobs
- Windowed context (not full document to LLM)
- Efficient PDF parsing (PyMuPDF)
- Regex pre-filters reduce API calls
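The windowed context and regex pre-filter ideas can be sketched together as follows. This is a guess at the general technique, not the project's implementation: the keyword list, the `radius` parameter, and the `context_windows` name are all assumptions.

```python
import re

# Hypothetical keyword pre-filter for clauses about dates, renewal, and termination
KEYWORDS = re.compile(r"terminat|renew|effective date|commenc|expir", re.IGNORECASE)


def context_windows(text: str, radius: int = 200) -> list[str]:
    """Return text windows around keyword hits, merging windows that overlap."""
    spans: list[tuple[int, int]] = []
    for m in KEYWORDS.finditer(text):
        start = max(0, m.start() - radius)
        end = min(len(text), m.end() + radius)
        if spans and start <= spans[-1][1]:
            # Overlaps the previous window: extend it instead of duplicating text
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            spans.append((start, end))
    return [text[s:e] for s, e in spans]


page = "x" * 1000 + " This Agreement renews automatically. " + "y" * 1000
windows = context_windows(page, radius=50)
```

Under this sketch, an empty list means no keyword matched, so a caller could skip the API call for that page entirely.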
{
  "extraction_timestamp": "2025-10-09T14:32:11Z",
  "contract_type": "Service Agreement",
  "contract_length": 12345,
  "pages_analysed": 7,
  "analysis": {
    "start_date": {
      "value": "January 1, 2024",
      "source": "LLM",
      "page": 2
    },
    "end_date": {
      "value": "December 31, 2024",
      "source": "Regex",
      "page": 2
    },
    "renewal_terms": {
      "value": "Auto-renews annually unless 60 days written notice given",
      "source": "LLM",
      "page": 5
    },
    "termination_notice_period": {
      "value": "30 days written notice",
      "source": "Regex",
      "page": 6
    }
  }
}
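For downstream analysis, a result shaped like the sample above can be flattened into rows with the standard library. The snippet is only a sketch: `analysis_to_csv` is a hypothetical helper, and the column layout is an assumption that may differ from the tool's own CSV export.

```python
import csv
import io


def analysis_to_csv(result: dict) -> str:
    """Flatten the `analysis` object into one CSV row per extracted field."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["field", "value", "source", "page"])
    for field, item in result["analysis"].items():
        writer.writerow([field, item["value"], item["source"], item["page"]])
    return buf.getvalue()


# Trimmed copy of the sample output above
result = {
    "contract_type": "Service Agreement",
    "analysis": {
        "start_date": {"value": "January 1, 2024", "source": "LLM", "page": 2},
        "termination_notice_period": {
            "value": "30 days written notice",
            "source": "Regex",
            "page": 6,
        },
    },
}
csv_text = analysis_to_csv(result)
```

Values that contain commas, such as the dates, are quoted automatically by `csv.writer`.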
MIT License - See LICENSE for details.
Use commercially, modify, distribute freely. Attribution appreciated but not required.
- Issues: GitHub Issues
- Questions: Open an issue with "question" label
Built for anyone processing contracts at scale.