Project developed for the Cyfuture AI Hackathon
Legal documents are notoriously lengthy, dense, and rich in specialized jargon, making comprehension a significant challenge. Our platform combines Gemini, LangChain, and DuckDuckGo search to analyze uploaded contracts, flag risky clauses, answer questions about a document, and find comparable agreements.
- 🔐 Login & Access: Secure Google/OTP authentication.
- 📄 Contract Upload: Upload contracts (PDF/images) for instant analysis.
- 📊 Analysis & Red Flag Detection: OCR (Tesseract/Azure Vision) extracts text; LangChain + Gemini identify clauses and compliance issues.
- 💬 Contract Chat Interface: Chat with your document using Gemini-powered semantic chat (RAG), with contextual Q&A and clause referencing.
- 🔍 Similar Contract Search: DuckDuckGo scraping retrieves similar contracts; Gemini ranks them by similarity and success probability.
- 📥 PDF Export: Export chat transcripts and insights as polished PDFs for compliance, audit, and collaboration.
- 🧠 Gemini Prompted Chatbot: Domain-aware Gemini chatbot for general legal queries.
- Contextual Contract Benchmarking: One-click search for similar contracts or past agreements for negotiation insights.
- Semantic Document Chat: RAG-powered Q&A over the full document context, with no keyword guessing.
- Downloadable Knowledge Artifacts: Export conversations with embedded context as polished PDFs.
- Frontend: Next.js, React, Tailwind CSS
- Backend: Node.js, Express, MongoDB, Passport.js
- AI: Google Gemini, LangChain, DuckDuckGo
- OCR: Tesseract, PyMuPDF, Python integration
- PDF: pdfkit
git clone <repo-url>
cd legal_ai
cd client
npm install
cd ../server
npm install
pip install -r requirements.txt
Install Tesseract OCR:
- Windows: Download from here and add to PATH.
- macOS: `brew install tesseract`
- Linux: `sudo apt-get install tesseract-ocr`
- Copy `.env.example` to `.env` in both the `client` and `server` folders and fill in the required keys (Google, Cloudinary, MongoDB, Gemini, etc.).
# In /server
npm start
# In /client
npm run dev
- `POST /api/auth/send-otp`: Send OTP to email
- `POST /api/auth/verify-otp`: Verify OTP, get JWT
- `GET /api/auth/me`: Get user info (JWT required)
- `GET /api/auth/google`: Google OAuth login
- `POST /api/ocr/upload-single`: Upload and OCR a single file
- `POST /api/ocr/upload-multiple`: Upload and OCR multiple files
- `GET /api/ocr/result/:fileId`: Get OCR result by fileId
- `GET /api/ocr/chunks/:fileId?page=1&limit=10`: Paginated text chunks
- `GET /api/ocr/history`: User's OCR upload history
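As a rough illustration of the paginated chunks endpoint, the sketch below builds the request URL for a given page. The base URL (`http://localhost:5000`) and the helper name `chunks_url` are assumptions made for this example, and it assumes pages are 1-indexed as the `page=1` sample suggests.

```python
from urllib.parse import quote, urlencode

# Assumption: the server runs locally on port 5000; adjust to your setup.
API_BASE = "http://localhost:5000"


def chunks_url(file_id: str, page: int = 1, limit: int = 10,
               base_url: str = API_BASE) -> str:
    """Build the URL for GET /api/ocr/chunks/:fileId?page=..&limit=.. (hypothetical helper)."""
    if page < 1 or limit < 1:
        raise ValueError("page and limit must be positive integers")
    query = urlencode({"page": page, "limit": limit})
    # Percent-encode the id so unusual characters cannot break the path.
    return f"{base_url.rstrip('/')}/api/ocr/chunks/{quote(file_id, safe='')}?{query}"


print(chunks_url("abc123", page=2))
```

A client would typically request successive pages until the server returns fewer than `limit` chunks; the exact response shape is not documented here, so check the server code before relying on it.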
- `POST /api/legal/analyze/:fileId`: Analyze contract for legal structure, risks, compliance, red flags
- `POST /api/legal/summary/:fileId`: Executive summary
- `POST /api/legal/entities/:fileId`: Extract key entities/terms
- `POST /api/legal/initialize`: Start chat session
- `POST /api/legal/ask`: Chat with legal AI (contextual)
- `GET /api/legal/history`: Get chat history
- `POST /api/legal/clear`: Clear chat history
- `POST /api/websearch/search-contracts`: Find and rank similar contracts using DuckDuckGo + Gemini
- `POST /api/document/generate`: Generate PDF summary of consultation
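To show how the OTP login might feed the JWT-protected endpoints above, here is a minimal sketch that only builds the requests without sending them. The base URL, the JSON field names (`email`, `otp`), and the `Bearer` authorization scheme are assumptions for illustration; the server code is the authority on the actual contract.

```python
import json
from typing import Optional
from urllib.request import Request

# Assumption: the server runs locally on port 5000; adjust to your setup.
API_BASE = "http://localhost:5000"


def build_request(method: str, path: str, payload: Optional[dict] = None,
                  token: Optional[str] = None, base_url: str = API_BASE) -> Request:
    """Build (but do not send) a JSON request for the API (hypothetical helper)."""
    headers = {"Accept": "application/json"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token:
        # Assumption: the JWT is passed as a Bearer token.
        headers["Authorization"] = f"Bearer {token}"
    return Request(base_url.rstrip("/") + path, data=data,
                   headers=headers, method=method)


# 1. Ask the server to email a one-time password (field name assumed).
send_otp = build_request("POST", "/api/auth/send-otp",
                         {"email": "user@example.com"})
# 2. Exchange the OTP for a JWT (field names assumed).
verify_otp = build_request("POST", "/api/auth/verify-otp",
                           {"email": "user@example.com", "otp": "123456"})
# 3. Call a protected endpoint with the token returned by step 2.
me = build_request("GET", "/api/auth/me", token="<jwt-from-verify-otp>")
```

Each `Request` can be sent with `urllib.request.urlopen(req)` once the server is running. Keeping request construction separate from sending makes the flow easy to inspect and test offline.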
- Script: `server/ocr.py`
- Install dependencies: `pip install -r requirements.txt`
- Runs as part of the backend Node.js service (auto-invoked)
- Libraries: `pytesseract`, `Pillow`, `PyMuPDF`, `langchain-text-splitters`, `langchain-core`
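The OCR script's text is served in paginated chunks, so a small sketch of chunking and paging may help. This is a plain-Python stand-in written for illustration, not the behavior of `langchain-text-splitters` and not the code in `server/ocr.py`; the function names, chunk size, and overlap are all assumptions.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into fixed-size chunks that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def paginate(items: List[str], page: int = 1, limit: int = 10) -> List[str]:
    """Return one 1-indexed page of items, mirroring ?page=1&limit=10."""
    start = (page - 1) * limit
    return items[start:start + limit]


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(chunks)
print(paginate(chunks, page=1, limit=2))
```

Overlap keeps a clause that straddles a chunk boundary visible in both neighbors, which tends to help retrieval-based Q&A. A structure-aware splitter would usually cut on paragraph or sentence boundaries instead of raw character offsets.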
- Legal teams & law firms (NDAs, vendor agreements, M&A)
- In-house corporate counsel (compliance, onboarding)
- 2024: US $31.6 B → 2032: US $63.6 B (CAGR 9.4%)
- North America ~50% share; APAC fastest-growing
- Legal AI/Contract AI: strong VC interest (Harvey, Ivo, etc.)
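As a quick sanity check on the market figures above, the compound annual growth rate implied by the two endpoint values can be computed directly. This sketch assumes eight compounding periods (2024 to 2032); it yields roughly 9.1%, slightly below the quoted 9.4%, so the quoted rate may rest on a different base year or on rounding in the original source.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the list above, in billions of US dollars.
implied = cagr(31.6, 63.6, 2032 - 2024)
print(f"Implied CAGR: {implied:.1%}")
```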
- Webinars, demos, case studies, legal tech conferences
- Freemium model: limited analysis, premium for PDF/benchmarking
- Built for the Cyfuture AI Hackathon
- Powered by Google Gemini, LangChain, DuckDuckGo, Tesseract, PyMuPDF
