siruihan2024/LexMind

LexMind

Open-Source AI Legal Research Framework
Research smarter, cite with confidence.


Features · Architecture · Quick Start · Frontend · Docs · Contributing · Chinese Documentation


What is LexMind?

LexMind is an open-source, end-to-end AI legal research framework that automates the full lifecycle of cross-jurisdictional legal research, from data crawling to cited answer generation. It is built for junior lawyers, legal researchers, and compliance professionals who spend hours on desk research. The project reports research-time reductions of up to 95% while maintaining citation accuracy above 95%.

Unlike general-purpose RAG systems, LexMind is purpose-built for the legal domain with jurisdiction-aware retrieval, risk-level classification, and authoritative citation tracing across four major legal systems: China (CN), European Union (EU), United Kingdom (UK), and United States (US).

| Metric | Traditional Research | With LexMind |
| --- | --- | --- |
| Research Time | 4–8 hours | 10–15 minutes |
| Citation Accuracy | ~70% | >95% |
| Jurisdictions per Query | 1–2 | 4 simultaneously |
| Source Updates | Manual (weekly) | Automated |

Features

End-to-End Automated Pipeline

LexMind provides a complete, zero-manual-intervention pipeline that covers every stage of legal research:

Stage 0 — Crawl: Automated crawlers fetch statutes and case law from authoritative legal databases including EUR-Lex (EU), Find Case Law / National Archives (UK), CourtListener REST API v4 (US), and China's NPC National Law Database. Built-in rate limiting, retry logic, HTTP caching, and 429 auto-wait ensure respectful and reliable data acquisition.

Stage 1 — Clean & Index: Raw legal texts are parsed, deduplicated, and structured with rich metadata (jurisdiction, document type, risk level, legal topic). Documents are chunked using domain-specific strategies — atomic article-level splitting for statutes and three-part (facts/reasoning/holding) splitting for case law — then vectorized and stored in ChromaDB.

Stage 2 — Retrieve & Re-rank: A hybrid retrieval engine combines semantic vector search with metadata filtering (jurisdiction, document type, risk level). Results are re-ranked using LLM-powered relevance scoring to surface the most pertinent sources.
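
The combination of metadata filtering and semantic ranking in Stage 2 can be illustrated with a small, dependency-free sketch. The names `Chunk` and `filter_then_rank`, the metadata keys, and the toy three-dimensional vectors are assumptions made for illustration; the actual `retriever.py` queries ChromaDB and adds LLM re-ranking, neither of which is reproduced here.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    embedding: list[float]
    metadata: dict[str, str] = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_then_rank(query_vec, chunks, where, top_k=3):
    """Keep chunks whose metadata matches every key in `where`, then rank by similarity."""
    candidates = [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in where.items())
    ]
    candidates.sort(key=lambda c: cosine(query_vec, c.embedding), reverse=True)
    return candidates[:top_k]


# Toy corpus with made-up embeddings (real ones come from an embedding model).
chunks = [
    Chunk("GDPR Art. 22 ...", [0.9, 0.1, 0.0], {"jurisdiction": "EU", "doc_type": "statute"}),
    Chunk("Lloyd v. Google ...", [0.8, 0.2, 0.1], {"jurisdiction": "UK", "doc_type": "case"}),
    Chunk("DSA Art. 14 ...", [0.1, 0.9, 0.0], {"jurisdiction": "EU", "doc_type": "statute"}),
]
hits = filter_then_rank([1.0, 0.0, 0.0], chunks, where={"jurisdiction": "EU"})
print([c.text for c in hits])
```

Filtering before ranking keeps out-of-scope jurisdictions from crowding the top results, at the cost of missing relevant material when the filter is too narrow.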

Stage 3 — Generate & Cite: GPT-4.1-mini generates structured legal analysis with inline citation markers. Every claim is traceable to its authoritative source, enabling full citation verification.
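
One way to make the citation traceability in Stage 3 checkable is to confirm that every inline marker resolves to a retrieved source. The sketch below assumes markers of the form `[1]`, `[2]`, numbered from 1 in source order; the README does not specify the marker format, and `check_citations` is a hypothetical helper, not part of `generator.py`.

```python
import re

# Assumed marker format: a positive integer in square brackets, e.g. [3].
CITATION_RE = re.compile(r"\[(\d+)\]")


def check_citations(answer: str, sources: list[str]) -> dict:
    """Report which [n] markers resolve to a source and which do not."""
    cited = sorted({int(n) for n in CITATION_RE.findall(answer)})
    valid = [n for n in cited if 1 <= n <= len(sources)]
    dangling = [n for n in cited if n not in valid]
    uncited = [i for i in range(1, len(sources) + 1) if i not in cited]
    return {"valid": valid, "dangling": dangling, "uncited_sources": uncited}


sources = ["GDPR Art. 22", "EU AI Act Art. 6", "Lloyd v. Google [2021]"]
answer = "Automated decisions are restricted [1]. High-risk systems face duties [2][4]."
report = check_citations(answer, sources)
print(report)
```

A dangling marker (here `[4]`) points at a source that was never retrieved, so it is a useful signal for rejecting or regenerating an answer.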

Jurisdiction-Aware Intelligence

LexMind understands that legal research is inherently multi-jurisdictional. The system automatically detects jurisdiction entities in queries, expands searches across relevant legal systems, and tags every result with color-coded jurisdiction badges (CN/EU/UK/US) and risk-level classifications aligned with the EU AI Act framework (Unacceptable, High, Limited, Minimal).
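
As a rough illustration of jurisdiction detection, the sketch below matches a query against a small keyword table. The table, the function name `detect_jurisdictions`, and the fallback of searching all four jurisdictions when nothing matches are assumptions; the project's own detection logic may work quite differently.

```python
import re

# Hypothetical keyword table; a production system would likely need a much
# richer vocabulary or a named-entity model.
JURISDICTION_KEYWORDS = {
    "CN": ["china", "chinese", "pipl"],
    "EU": ["european union", "gdpr", "eu ai act"],
    "UK": ["united kingdom", "england", "britain"],
    "US": ["united states", "usa", "section 230"],
}


def detect_jurisdictions(query: str) -> list[str]:
    """Return jurisdiction codes mentioned in `query`, or all codes if none are."""
    found = []
    for code, phrases in JURISDICTION_KEYWORDS.items():
        # The bare code is matched case-sensitively so that the pronoun "us"
        # is not mistaken for "US".
        code_hit = re.search(rf"\b{code}\b", query)
        phrase_hit = any(
            re.search(rf"\b{re.escape(p)}\b", query, re.IGNORECASE) for p in phrases
        )
        if code_hit or phrase_hit:
            found.append(code)
    return found or list(JURISDICTION_KEYWORDS)


print(detect_jurisdictions("How do the GDPR and US law treat automated decisions?"))
print(detect_jurisdictions("Tell us about liability for AI systems"))
```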

Modern Web Interface

A polished React 19 + Tailwind CSS frontend provides an intuitive research experience designed for legal professionals. The interface includes an AI-powered research page with real-time streaming answers, a filterable source management dashboard, research history with timeline navigation, and an analytics dashboard for tracking research coverage and activity patterns.

[Screenshot: Research Interface. AI-powered research with structured analysis, risk classification, and inline citations]


Architecture

[Figure: LexMind RAG Architecture]

The architecture follows a modular design where each component can be independently extended or replaced. The backend is written in Python with clear separation between crawlers, data processing, vector storage, retrieval, and generation modules. The frontend communicates through a clean API layer, making it straightforward to integrate LexMind's engine into existing legal tech workflows.

lexmind/
├── backend/                    # Python RAG Engine
│   ├── main.py                 # CLI entry point & pipeline orchestrator
│   ├── requirements.txt        # Python dependencies
│   ├── data/
│   │   └── sample_data.json    # Pre-loaded legal data (4 jurisdictions)
│   └── modules/
│       ├── crawlers/           # Jurisdiction-specific web crawlers
│       │   ├── base.py         # Base crawler with rate limiting & caching
│       │   ├── eurlex_crawler.py
│       │   ├── uk_caselaw_crawler.py
│       │   ├── courtlistener_crawler.py
│       │   └── cn_law_crawler.py
│       ├── data_processor.py   # Document chunking & metadata extraction
│       ├── data_cleaner.py     # Text normalization & deduplication
│       ├── data_ingestion.py   # Ingestion pipeline orchestrator
│       ├── vector_store.py     # ChromaDB + SentenceTransformer embeddings
│       ├── retriever.py        # Hybrid search + LLM re-ranking
│       └── generator.py        # GPT-4.1-mini answer generation with citations
├── frontend/                   # React 19 Web Interface
│   ├── src/
│   │   ├── pages/              # Research, Sources, History, Analytics, Settings
│   │   ├── components/         # DashboardLayout, Badges, UI components
│   │   ├── lib/                # Mock data, utilities
│   │   └── contexts/           # Theme management
│   ├── index.html
│   └── package.json
├── docs/                       # Documentation & assets
│   ├── images/                 # Screenshots & architecture diagrams
│   └── api/                    # API reference documentation
├── .github/                    # GitHub templates & CI
│   ├── ISSUE_TEMPLATE/
│   ├── workflows/
│   └── PULL_REQUEST_TEMPLATE.md
├── CONTRIBUTING.md
├── CHANGELOG.md
├── LICENSE
└── README.md                   # You are here

Quick Start

Prerequisites

LexMind requires Python 3.10 or higher and an OpenAI-compatible API key for the generation module. The retrieval and indexing modules use local SentenceTransformer models, so no API key is needed for those stages.

Backend Setup

# Clone the repository into a directory named lexmind
git clone https://github.com/siruihan2024/LexMind.git lexmind
cd lexmind/backend

# Install dependencies
pip install -r requirements.txt

# Configure your environment (required for LLM features)
cp ENV_TEMPLATE .env
# Edit .env and replace 'your-api-key-here' with your actual OpenAI API key
# Note: Vector search and crawling work WITHOUT an API key

Run the Pipeline

# Interactive research mode (uses pre-loaded sample data)
python main.py

# Single query mode
python main.py --query "How does the GDPR regulate personal data processing?"

# Demo mode with pre-defined queries
python main.py --demo

# Crawl fresh data from official legal databases
python main.py --ingest

# Crawl specific jurisdictions only
python main.py --ingest -j UK EU

# Rebuild vector index from existing data
python main.py --index
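
For readers wiring these flags into their own tooling, the commands above could be parsed roughly as follows. This is a sketch built only from the flags shown in this README, not the contents of `main.py`; the long option name `--jurisdictions` and the precedence order in `select_mode` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="main.py", description="LexMind pipeline (sketch)")
    parser.add_argument("--query", help="run a single research query and exit")
    parser.add_argument("--demo", action="store_true", help="run pre-defined demo queries")
    parser.add_argument("--ingest", action="store_true", help="crawl fresh data")
    # Only the short form `-j` appears in the README; the long name is assumed.
    parser.add_argument(
        "-j", "--jurisdictions", nargs="+", choices=["CN", "EU", "UK", "US"],
        help="limit crawling to these jurisdictions",
    )
    parser.add_argument("--index", action="store_true", help="rebuild the vector index")
    return parser


def select_mode(args: argparse.Namespace) -> str:
    """Pick one mode; with no flags, fall back to interactive research."""
    if args.ingest:
        return "ingest"
    if args.index:
        return "index"
    if args.demo:
        return "demo"
    if args.query:
        return "query"
    return "interactive"


args = build_parser().parse_args(["--ingest", "-j", "UK", "EU"])
print(select_mode(args), args.jurisdictions)
```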

Frontend Setup

# Run from the directory that contains your lexmind checkout
cd lexmind/frontend

# Install dependencies
pnpm install

# Start development server
pnpm dev

The frontend will be available at http://localhost:3000. Currently the frontend uses mock data for demonstration purposes. To connect it to the live backend, configure the API endpoint in the Settings page.


Screenshots


[Screenshots: Source Management, Analytics Dashboard, Research History, Landing Page]

Data Sources

LexMind crawls from the following authoritative legal databases. Each crawler is designed to respect rate limits and terms of service.

| Jurisdiction | Database | Content Type | Crawler Module |
| --- | --- | --- | --- |
| CN | NPC National Law Database | Statutes & Regulations | cn_law_crawler.py |
| EU | EUR-Lex | Regulations & Directives | eurlex_crawler.py |
| UK | Find Case Law (National Archives) | Case Law & Judgments | uk_caselaw_crawler.py |
| US | CourtListener (REST API v4) | Opinions & Case Law | courtlistener_crawler.py |
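
Respecting rate limits usually means waiting when a server answers with HTTP 429. The following is a minimal sketch of that behavior, not the code in `base.py`: `fetch_with_retry` and its parameters are hypothetical, the HTTP call is injected as a `send` function so the example runs without network access, and only the seconds form of the `Retry-After` header is handled.

```python
import time
from typing import Callable


def fetch_with_retry(
    send: Callable[[], tuple[int, dict, str]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `send` until it returns HTTP 200, waiting on 429 and 5xx responses.

    `send` performs one request and returns (status_code, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status == 200:
            return body
        if status == 429 or 500 <= status < 600:
            # Prefer the server's Retry-After hint; otherwise back off exponentially.
            retry_after = headers.get("Retry-After", "")
            delay = float(retry_after) if retry_after.isdigit() else base_delay * 2 ** attempt
            if attempt < max_attempts - 1:
                sleep(delay)
            continue
        raise RuntimeError(f"unrecoverable HTTP status {status}")
    raise RuntimeError(f"gave up after {max_attempts} attempts")


# Simulated server: rate-limited once, briefly unavailable, then successful.
responses = iter([(429, {"Retry-After": "5"}, ""), (503, {}, ""), (200, {}, "<html>ok</html>")])
waits: list[float] = []
body = fetch_with_retry(lambda: next(responses), sleep=waits.append)
print(body, waits)
```

Injecting `sleep` keeps the example fast and makes the waiting behavior easy to assert on in tests.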

The pre-loaded sample_data.json includes 12 representative legal sources covering AI safety regulation across all four jurisdictions, organized around six core legal interests: right to life, personal freedom, privacy, property rights, fair trial, and freedom of expression.


Legal Topics Covered

LexMind's knowledge base is structured around six fundamental legal interests that are most impacted by AI technologies. These topics provide the analytical framework for cross-jurisdictional comparison and risk assessment.

| Legal Interest | CN Example | EU Example | UK Example | US Example |
| --- | --- | --- | --- | --- |
| Right to Life | Criminal Law Art. 232-233 | EU AI Act Art. 6 (High-risk) | R v. Meadow [2007] | State v. Loomis (2016) |
| Personal Freedom | Criminal Law Art. 238 | GDPR Art. 22 (Automated decisions) | Bridges v. South Wales Police [2020] | Carpenter v. US (2018) |
| Privacy | Personal Info Protection Law | GDPR Art. 5-9 | Lloyd v. Google [2021] | Clearview AI (FTC 2024) |
| Property Rights | Civil Code Art. 1032-1039 | EU AI Act Art. 52 | Thaler v. Comptroller [2023] | Thaler v. Vidal (2022) |
| Fair Trial | Criminal Procedure Law Art. 50 | EU AI Act Art. 6(2) | R (Bridges) EWCA [2020] | Loomis v. Wisconsin (2016) |
| Freedom of Expression | Cybersecurity Law Art. 12 | DSA Art. 14-17 | Online Safety Act 2023 | Section 230 CDA |

Extending LexMind

LexMind is designed to be modular and extensible. Here are the most common extension points:

Adding a new jurisdiction: Create a new crawler in backend/modules/crawlers/ that extends BaseCrawler. Implement the search() and fetch_detail() methods for your target legal database, then register the new jurisdiction code in data_ingestion.py.
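
A new crawler might look roughly like the sketch below. The `BaseCrawler` defined here is a stand-in written for this example: the real class lives in `backend/modules/crawlers/base.py`, and its method signatures, return types, and registration mechanism may differ. The `XX` jurisdiction, the `ExampleCrawler` class, the `CRAWLERS` registry, and the placeholder documents are all hypothetical.

```python
from abc import ABC, abstractmethod


class BaseCrawler(ABC):
    """Stand-in for LexMind's BaseCrawler; signatures here are assumptions."""

    jurisdiction: str = ""

    @abstractmethod
    def search(self, query: str, limit: int = 10) -> list[str]:
        """Return identifiers of documents matching `query`."""

    @abstractmethod
    def fetch_detail(self, doc_id: str) -> dict:
        """Return one document with title, content and metadata."""


class ExampleCrawler(BaseCrawler):
    """Toy crawler backed by an in-memory dict instead of real HTTP calls."""

    jurisdiction = "XX"
    _DOCS = {
        "act-1": {"title": "Example Act Art. 1", "content": "Placeholder statute text."},
        "act-2": {"title": "Example Act Art. 2", "content": "Placeholder text on data processing."},
    }

    def search(self, query: str, limit: int = 10) -> list[str]:
        needle = query.lower()
        hits = [doc_id for doc_id, doc in self._DOCS.items() if needle in doc["content"].lower()]
        return hits[:limit]

    def fetch_detail(self, doc_id: str) -> dict:
        doc = self._DOCS[doc_id]
        return {**doc, "metadata": {"jurisdiction": self.jurisdiction, "source_id": doc_id}}


# Hypothetical registry keyed by jurisdiction code.
CRAWLERS = {ExampleCrawler.jurisdiction: ExampleCrawler}

crawler = CRAWLERS["XX"]()
ids = crawler.search("data processing")
print([crawler.fetch_detail(doc_id)["title"] for doc_id in ids])
```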

Adding a new data source: Add entries to backend/data/sample_data.json following the existing schema (type, jurisdiction, title, content, metadata). Run python main.py --index to rebuild the vector index.
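
Before re-indexing, it can help to sanity-check new entries against the five schema fields named above. The validator below is a hypothetical helper, and the example values (`"statute"`, the `risk_level` and `legal_topic` metadata keys, the placeholder content) are assumptions; check an existing entry in `sample_data.json` for the exact conventions.

```python
REQUIRED_FIELDS = ("type", "jurisdiction", "title", "content", "metadata")
KNOWN_JURISDICTIONS = {"CN", "EU", "UK", "US"}


def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry looks usable."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in entry]
    if entry.get("jurisdiction") not in KNOWN_JURISDICTIONS:
        problems.append(f"unknown jurisdiction: {entry.get('jurisdiction')!r}")
    if not isinstance(entry.get("metadata", {}), dict):
        problems.append("metadata must be an object")
    return problems


entry = {
    "type": "statute",
    "jurisdiction": "EU",
    "title": "GDPR Art. 22",
    "content": "Placeholder text of the article ...",
    "metadata": {"risk_level": "High", "legal_topic": "Personal Freedom"},
}
print(validate_entry(entry))
print(validate_entry({"title": "Untitled"}))
```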

Customizing the chunking strategy: Modify backend/modules/data_processor.py to implement domain-specific chunking logic. The current implementation uses atomic article splitting for statutes and three-part splitting for case law, but you can add custom strategies for other document types.
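
To make atomic article splitting concrete, here is a minimal regex-based sketch. It is not the logic in `data_processor.py`: the heading pattern, the function name `split_statute`, and the sample text are assumptions, and a body line that happens to start with "Article 5 of ..." would be mistaken for a heading.

```python
import re

# Matches headings such as "Article 5" or "Art. 22" at the start of a line.
ARTICLE_HEADING = re.compile(r"^(?:Article|Art\.)\s+(\d+)\b.*$", re.MULTILINE)


def split_statute(text: str) -> list[dict]:
    """Return one chunk per article; text before the first heading is dropped."""
    matches = list(ARTICLE_HEADING.finditer(text))
    chunks = []
    for i, match in enumerate(matches):
        # Each chunk runs from its heading to the start of the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunks.append({"article": int(match.group(1)), "text": text[match.start():end].strip()})
    return chunks


# Made-up statute text for illustration only.
sample = """Preamble text.
Article 1
This Act applies to automated systems.
Article 2
Operators shall keep records.
"""
for chunk in split_statute(sample):
    print(chunk["article"], "->", chunk["text"].replace("\n", " | "))
```

Keeping each article as one chunk means a citation can point at a single, complete provision rather than an arbitrary window of text.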

Swapping the LLM provider: The generator module uses the OpenAI-compatible API format. Set OPENAI_BASE_URL to point to any compatible endpoint (Azure OpenAI, Anthropic via proxy, local models via Ollama/vLLM, etc.).
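
A provider swap of this kind might be wired up as sketched below. `OPENAI_BASE_URL` comes from the text above and `OPENAI_API_KEY` is the conventional variable for the `openai` package; `LEXMIND_MODEL`, `resolve_llm_settings`, and `make_client` are hypothetical names for this example, and the localhost URL assumes a local Ollama server exposing its OpenAI-compatible endpoint.

```python
import os
from typing import Mapping


def resolve_llm_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect provider settings from environment variables."""
    return {
        "api_key": env.get("OPENAI_API_KEY", ""),
        # None lets the client library fall back to its default endpoint.
        "base_url": env.get("OPENAI_BASE_URL") or None,
        # LEXMIND_MODEL is a hypothetical variable used only in this sketch.
        "model": env.get("LEXMIND_MODEL", "gpt-4.1-mini"),
    }


def make_client(settings: dict):
    """Build a client with the `openai` package, imported lazily so the
    settings logic above works without it installed."""
    from openai import OpenAI

    return OpenAI(api_key=settings["api_key"], base_url=settings["base_url"])


local = resolve_llm_settings(
    {"OPENAI_BASE_URL": "http://localhost:11434/v1", "OPENAI_API_KEY": "unused"}
)
print(local["base_url"], local["model"])
```

When pointing at a different provider, the model name usually has to change as well, since the default model may not exist on that endpoint.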

Swapping the embedding model: Modify backend/modules/vector_store.py to use a different SentenceTransformer model or switch to an API-based embedding service.


Documentation

Detailed documentation is available in the docs/ directory:

| Document | Description |
| --- | --- |
| Architecture Guide | Deep dive into the RAG pipeline design |
| API Reference | Backend module API documentation |
| Crawler Guide | How to build and configure legal data crawlers |
| Deployment Guide | Production deployment instructions |
| FAQ | Frequently asked questions |

Contributing

We welcome contributions from the legal tech and AI communities. Whether you're a lawyer who wants to add jurisdiction coverage, a developer who wants to improve the retrieval engine, or a designer who wants to enhance the frontend, there's a place for you.

Please read our Contributing Guide before submitting a pull request. Key areas where we especially welcome contributions include new jurisdiction crawlers, improved chunking strategies for legal documents, multilingual support, and accessibility improvements to the frontend.


Roadmap

| Timeline | Milestone | Status |
| --- | --- | --- |
| Q1 2026 | Core RAG pipeline + 4 jurisdiction crawlers | Done |
| Q1 2026 | React frontend with Research, Sources, Analytics | Done |
| Q2 2026 | PDF/Word export, cross-jurisdiction comparison view | Planned |
| Q3 2026 | Team collaboration, shared knowledge bases | Planned |
| Q4 2026 | Real-time regulatory alert system | Planned |
| 2027 | Multi-language support (JA, KO, DE) | Planned |

License

LexMind is released under the Apache License 2.0. You are free to use, modify, and distribute this software for both commercial and non-commercial purposes.


Acknowledgments

LexMind builds upon the following open-source projects and public legal databases:

Technologies: ChromaDB, SentenceTransformers, OpenAI API, React, Tailwind CSS, shadcn/ui, Recharts

Legal Databases: EUR-Lex (European Union), Find Case Law / National Archives (United Kingdom), CourtListener / Free Law Project (United States), NPC National Law Database (China)


LexMind — Research smarter, cite with confidence.
Built with care for the legal research community.
