DrLAW is an AI-powered legal assistance system designed to make legal information more accessible to people in India. It combines retrieval-augmented generation over a legal knowledge base with machine translation to provide context-aware legal advice, grounded in retrieved legal sources, in multiple Indian languages.
Retrieval Augmented Generation (RAG) System
- Uses a hybrid approach combining retrieval and generation
- Indexes legal documents for accurate context retrieval
- Employs semantic search for finding relevant legal information
- Components:
  - Document Processor (PDF extraction)
  - Text Chunker (semantic splitting)
  - Embedding Model (sentence-transformers)
  - Vector Database (FAISS)
  - LLM Integration (Google Gemini)
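The retrieval core of this pipeline is nearest-neighbour search over embeddings. The dependency-free sketch below mimics what a flat (brute-force) FAISS index does for cosine similarity. Vectors are L2-normalised on insertion, so the inner product equals cosine similarity, and every stored vector is scored against the query. `FlatIndex` and its methods are hypothetical stand-ins, not FAISS's API. The toy three-dimensional vectors replace real sentence-transformers embeddings.

```python
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so that inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


class FlatIndex:
    """Hypothetical exact (brute-force) index, similar in spirit to a flat FAISS index."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.vectors: List[List[float]] = []

    def add(self, vec: Sequence[float]) -> int:
        """Store a normalised copy of vec and return its integer id."""
        if len(vec) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vec)}")
        self.vectors.append(normalize(vec))
        return len(self.vectors) - 1

    def search(self, query: Sequence[float], k: int) -> List[Tuple[int, float]]:
        """Score the query against every stored vector; return the top k (id, score) pairs."""
        q = normalize(query)
        scored = [(i, sum(a * b for a, b in zip(q, v))) for i, v in enumerate(self.vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    index = FlatIndex(dim=3)
    index.add([1.0, 0.0, 0.0])  # id 0
    index.add([0.0, 1.0, 0.0])  # id 1
    index.add([0.7, 0.7, 0.0])  # id 2
    print(index.search([1.0, 0.1, 0.0], k=2))  # ids 0 and 2, best first
```

An exact scan like this costs time linear in the number of chunks. FAISS offers both exact flat indexes and approximate indexes for larger collections.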
Multilingual Support
- Supports 11 languages: English plus ten Indian languages
  - Hindi, Bengali, Telugu, Marathi, Tamil
  - Gujarati, Kannada, Malayalam, Punjabi, Odia
- Uses the Google Translate API for translation
- Preserves legal terminology during translation
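Translation APIs, including Google Translate, identify languages by ISO 639-1 codes. Below is a minimal sketch that maps the languages listed above to those codes and rejects anything unsupported. `SUPPORTED_LANGUAGES` and `resolve_language` are hypothetical names, not taken from the DrLAW code base.

```python
from typing import Dict

# Hypothetical lookup table: language name -> ISO 639-1 code.
# English is included because responses are generated in English first.
SUPPORTED_LANGUAGES: Dict[str, str] = {
    "english": "en",
    "hindi": "hi",
    "bengali": "bn",
    "telugu": "te",
    "marathi": "mr",
    "tamil": "ta",
    "gujarati": "gu",
    "kannada": "kn",
    "malayalam": "ml",
    "punjabi": "pa",
    "odia": "or",
}


def resolve_language(name_or_code: str) -> str:
    """Return the ISO 639-1 code for a language name or code, or raise ValueError."""
    key = name_or_code.strip().lower()
    if key in SUPPORTED_LANGUAGES:
        return SUPPORTED_LANGUAGES[key]
    if key in SUPPORTED_LANGUAGES.values():
        return key
    raise ValueError(f"Unsupported language: {name_or_code!r}")
```

Validating the code before calling the translation service turns an unsupported request into a clear error instead of a failed API call.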
Knowledge Base
- Processed legal documents including:
  - Indian Constitution
  - Major Acts and Laws
  - Legal Precedents
  - Legal Procedures
- Organized in semantic chunks for efficient retrieval
- Regularly updated with new legal information
Backend Architecture (Flask)
/backend
├── app.py            # Main application logic
├── static/           # Static assets
├── templates/        # HTML templates
├── storage/          # Vector DB and chunks
└── requirements.txt  # Dependencies
Frontend Structure
/frontend
├── index.html  # Main interface
├── login.html  # Authentication
├── front.html  # Landing page
└── config.js   # Configuration
Database Schema (Supabase)
-- Users table
CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    username VARCHAR(255) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255),
    google_id VARCHAR(255) UNIQUE
);

-- Chats table
CREATE TABLE chats (
    chat_id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    question TEXT NOT NULL,
    answer TEXT NOT NULL,
    timestamp TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
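The sketch below shows how the two tables relate by recreating them in an in-memory SQLite database (Python's built-in `sqlite3`) rather than Supabase's PostgreSQL. SQLite has no `SERIAL` type, so `INTEGER PRIMARY KEY` stands in for it, and the timezone-aware timestamp becomes a UTC text timestamp. The `connect`, `save_chat` and `chat_history` helpers are hypothetical.

```python
import sqlite3
from typing import List, Tuple

# SQLite adaptation of the schema: INTEGER PRIMARY KEY replaces SERIAL,
# and CURRENT_TIMESTAMP stores a UTC "YYYY-MM-DD HH:MM:SS" string.
SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    username TEXT NOT NULL,
    email TEXT UNIQUE NOT NULL,
    password_hash TEXT,
    google_id TEXT UNIQUE
);
CREATE TABLE chats (
    chat_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    question TEXT NOT NULL,
    answer TEXT NOT NULL,
    timestamp TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
    conn.executescript(SCHEMA)
    return conn


def save_chat(conn: sqlite3.Connection, user_id: int, question: str, answer: str) -> int:
    """Insert one question/answer pair and return its chat_id."""
    cur = conn.execute(
        "INSERT INTO chats (user_id, question, answer) VALUES (?, ?, ?)",
        (user_id, question, answer),
    )
    conn.commit()
    return cur.lastrowid


def chat_history(conn: sqlite3.Connection, user_id: int) -> List[Tuple[str, str]]:
    """Return a user's (question, answer) pairs in insertion order."""
    # Order by chat_id: CURRENT_TIMESTAMP has only one-second resolution.
    rows = conn.execute(
        "SELECT question, answer FROM chats WHERE user_id = ? ORDER BY chat_id",
        (user_id,),
    )
    return rows.fetchall()
```

Because `chats.user_id` references `users.user_id`, saving a chat for a non-existent user fails with an integrity error rather than leaving an orphaned row.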
Document Processing
- PDFs are processed and split into semantic chunks
- Each chunk is embedded using sentence-transformers
- Embeddings are stored in FAISS index
- Metadata maintained for source tracking
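The chunking step can be sketched as follows. This version splits text on blank-line paragraph boundaries and packs whole paragraphs into chunks up to a character budget, carrying the last paragraph forward for context. It is a simple stand-in for semantic splitting, not necessarily how DrLAW's chunker works. The `Chunk` record, `chunk_document`, and its `max_chars`/`overlap` parameters are hypothetical; `source` and `chunk_id` illustrate the metadata kept for source tracking.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    source: str    # originating document, e.g. a PDF file name (kept for source tracking)
    chunk_id: int  # position of the chunk within that document
    text: str


def chunk_document(text: str, source: str, max_chars: int = 1000,
                   overlap: int = 1) -> List[Chunk]:
    """Pack whole paragraphs into chunks of at most max_chars characters.

    Up to `overlap` trailing paragraphs of one chunk are repeated at the start of
    the next, as long as they fit. A single paragraph longer than max_chars
    becomes its own oversized chunk rather than being cut mid-sentence.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: List[Chunk] = []
    current: List[str] = []

    for para in paragraphs:
        if current and len("\n\n".join(current + [para])) > max_chars:
            chunks.append(Chunk(source, len(chunks), "\n\n".join(current)))
            # Carry the last `overlap` paragraphs forward, dropping them if they no longer fit.
            current = current[-overlap:] if overlap > 0 else []
            while current and len("\n\n".join(current + [para])) > max_chars:
                current.pop(0)
        current.append(para)

    if current:
        chunks.append(Chunk(source, len(chunks), "\n\n".join(current)))
    return chunks
```

Each resulting `Chunk.text` would then be embedded and added to the index, with its `source` and `chunk_id` stored alongside the vector.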
Query Processing
- User question is embedded
- Similar chunks retrieved from FAISS
- Context assembled from relevant chunks
- Prompt constructed with legal format
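Context assembly can be sketched as three steps: take the retrieved chunks in score order, label each with its source, and stop once a character budget is reached. The budget keeps the prompt within the model's input limit, using characters as a rough proxy for tokens. The `assemble_context` function, the `(source, text, score)` hit shape, and the default `budget` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical shape of a retrieval hit: (source document, chunk text, similarity score).
Hit = Tuple[str, str, float]


def assemble_context(hits: List[Hit], budget: int = 4000) -> str:
    """Join the highest-scoring chunks, each tagged with its source, within `budget` characters."""
    parts: List[str] = []
    used = 0
    for source, text, _score in sorted(hits, key=lambda h: h[2], reverse=True):
        block = f"[Source: {source}]\n{text}"
        sep = 2 if parts else 0  # length of the "\n\n" separator before this block
        if used + sep + len(block) > budget:
            break  # keep strict score order: stop at the first chunk that does not fit
        parts.append(block)
        used += sep + len(block)
    return "\n\n".join(parts)
```

Tagging every chunk with its source lets the model cite where a provision came from, which matches the metadata kept during document processing.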
Response Generation
- The Gemini API generates a detailed response
- The response is structured with:
  - Legal analysis
  - Applicable laws
  - Required documentation
  - Step-by-step guidance
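One way to get the four-part structure listed above is to name the sections explicitly in the prompt sent to Gemini. The template below is a hypothetical sketch, not DrLAW's actual prompt. It only builds the prompt string; the Gemini API call is omitted.

```python
from typing import Sequence

# Section headings taken from the response structure described in this README.
SECTIONS: Sequence[str] = (
    "Legal Analysis",
    "Applicable Laws",
    "Required Documentation",
    "Step-by-Step Guidance",
)


def build_prompt(question: str, context: str, sections: Sequence[str] = SECTIONS) -> str:
    """Build a prompt that grounds the answer in retrieved context and fixes its layout."""
    section_list = "\n".join(f"{i}. {name}" for i, name in enumerate(sections, start=1))
    return (
        "You are a legal information assistant for Indian law.\n"
        "Answer using only the context below; if the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\n"
        "Structure your answer under these headings, in order:\n"
        f"{section_list}\n"
    )
```

Fixing the headings in the prompt also gives the translation step a predictable layout to preserve.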
Translation Flow
- The response is first generated in English
- It is then translated into the requested language
- The response's structure and formatting are preserved in the translation
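One approach to format preservation is to translate only the text of each line and leave Markdown-style heading and list markers untouched. The sketch below assumes the response uses such markers. It takes the translation call as a parameter so it runs without network access. `translate_preserving_format` is a hypothetical helper; in DrLAW, the `translate` argument would wrap the Google Translate API.

```python
import re
from typing import Callable

# Leading markup kept as-is: indentation plus an optional heading (#), bullet (- or *)
# or numbered-list ("1.") marker and the whitespace that follows it.
PREFIX = re.compile(r"^(\s*(?:#{1,6}\s+|[-*]\s+|\d+\.\s+)?)(.*)$")


def translate_preserving_format(text: str, translate: Callable[[str], str]) -> str:
    """Translate line by line, leaving blank lines and list/heading markers intact."""
    out = []
    for line in text.split("\n"):
        match = PREFIX.match(line)  # always matches: every part of the pattern is optional
        prefix, body = match.group(1), match.group(2)
        out.append(prefix + translate(body) if body.strip() else line)
    return "\n".join(out)
```

Translating line by line keeps the markers exact but costs one API call per line. Batching lines into fewer requests would reduce that.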
Environment Variables
FLASK_SECRET_KEY=<secret>
GEMINI_API_KEY=<your-key>
SUPABASE_URL=<url>
SUPABASE_KEY=<key>
GOOGLE_CLIENT_ID=<id>
GOOGLE_CLIENT_SECRET=<secret>
Installation
- Clone and set up:
  git clone https://github.com/utknig123/DRLAW.git
  cd DRLAW
- Backend setup:
  cd backend
  python -m venv venv
  source venv/bin/activate  # Windows: venv\Scripts\activate
  pip install -r requirements.txt
- Start the server:
  flask run
System Requirements
- Python 3.8+
- 4GB+ RAM (for embeddings)
- Storage for vector database
- Internet connection for APIs
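The backend needs every environment variable listed above, from `FLASK_SECRET_KEY` through `GOOGLE_CLIENT_SECRET`. A fail-fast check at startup turns a vague runtime error into a clear message. The minimal sketch below uses only `os.environ`; `REQUIRED_VARS` and `load_config` are hypothetical names, not part of the DrLAW code base.

```python
import os
from typing import Dict, Mapping, Optional

# Variable names as listed in this README.
REQUIRED_VARS = (
    "FLASK_SECRET_KEY",
    "GEMINI_API_KEY",
    "SUPABASE_URL",
    "SUPABASE_KEY",
    "GOOGLE_CLIENT_ID",
    "GOOGLE_CLIENT_SECRET",
)


def load_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the required settings, raising RuntimeError that names every missing or empty one."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_config()` once, before the Flask app is created, reports all missing keys together instead of failing on the first API call that needs one.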
Authentication
- Session-based auth
- Google OAuth integration
- Password hashing
- CORS protection
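This README does not say which password hashing scheme the backend uses. As an illustration only, here is salted password hashing with PBKDF2-HMAC-SHA256 from Python's standard library, verified with a constant-time comparison. The `pbkdf2_sha256$iterations$salt$hash` storage format and the iteration count are assumptions. The resulting string would fit the `password_hash` column; presumably Google OAuth accounts leave that nullable column empty.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumption; tune to the server's hardware


def hash_password(password: str, iterations: int = ITERATIONS) -> str:
    """Return 'pbkdf2_sha256$<iterations>$<salt hex>$<hash hex>' with a fresh random salt."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and iterations; compare in constant time."""
    try:
        scheme, iterations, salt_hex, hash_hex = stored.split("$")
    except ValueError:
        return False  # not in the expected four-field format
    if scheme != "pbkdf2_sha256":
        return False
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(digest.hex(), hash_hex)
```

Storing the iteration count with each hash lets the work factor be raised later without invalidating existing passwords.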
Data Protection
- Encrypted storage
- Secure API calls
- Rate limiting
- Input sanitization
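A common way to implement rate limiting is a token bucket per client. Each request spends one token, and tokens refill at a fixed rate up to a burst capacity. The sketch below assumes requests are keyed by a client identifier such as a user ID or IP address. `TokenBucketLimiter` and its parameters are hypothetical. The clock is injectable so the behaviour can be tested without sleeping.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-client token bucket: refills `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        # client id -> (tokens available, time of last update)
        self.buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, client_id: str) -> bool:
        """Spend one token for client_id if available; return False if rate-limited."""
        now = self.clock()
        tokens, last = self.buckets.get(client_id, (self.capacity, now))
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[client_id] = (tokens - 1, now)
            return True
        self.buckets[client_id] = (tokens, now)
        return False
```

In a Flask app, this check could run in a `before_request` handler that returns HTTP 429 when `allow` is false. The state lives in one process, so deployments with several workers would need a shared store.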
Technical Improvements
- Real-time document updates
- Improved semantic search
- Caching system
- Load balancing
Feature Additions
- Document upload interface
- Expert verification system
- Legal form generation
- Citation system
Common issues and solutions:
- Missing API keys
- Database connection errors
- Memory issues with embeddings
- Session management problems
- CORS configuration issues
This project is licensed under the MIT License.