LexMind

Open-Source AI Legal Research Framework
Research smarter, cite with confidence.

Features · Architecture · Quick Start · Frontend · Docs · Contributing · Chinese Documentation

What is LexMind?

LexMind is an open-source, end-to-end AI legal research framework that automates the entire lifecycle of cross-jurisdictional legal research — from data crawling to cited answer generation. Built for junior lawyers, legal researchers, and compliance professionals who spend hours on desktop research, LexMind reduces research time by up to 95% while maintaining >95% citation accuracy.

Unlike general-purpose RAG systems, LexMind is purpose-built for the legal domain with jurisdiction-aware retrieval, risk-level classification, and authoritative citation tracing across four major legal systems: China (CN), European Union (EU), United Kingdom (UK), and United States (US).

Metric	Traditional Research	With LexMind
Research Time	4–8 hours	10–15 minutes
Citation Accuracy	~70%	>95%
Jurisdictions per Query	1–2	4 simultaneously
Source Updates	Manual (weekly)	Automated

Features

End-to-End Automated Pipeline

LexMind provides a complete, zero-manual-intervention pipeline that covers every stage of legal research:

Stage 0 — Crawl: Automated crawlers fetch statutes and case law from authoritative legal databases, including EUR-Lex (EU), Find Case Law / National Archives (UK), CourtListener REST API v4 (US), and China's NPC National Law Database. Built-in rate limiting, retry logic, HTTP caching, and automatic waiting on HTTP 429 (Too Many Requests) responses keep data acquisition respectful and reliable.
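The retry and 429 handling described above can be sketched as follows. This is a minimal, self-contained illustration, not the code in base.py: `Response`, `fetch_with_retry`, and the backoff parameters are hypothetical, and the HTTP call is passed in as a plain function so the logic can be exercised offline.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Response:
    """Minimal stand-in for an HTTP response (hypothetical, for illustration only)."""
    status: int
    headers: Dict[str, str] = field(default_factory=dict)
    body: str = ""


def fetch_with_retry(
    fetch: Callable[[str], Response],
    url: str,
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Fetch `url`, waiting on HTTP 429 and backing off exponentially on 5xx errors."""
    for attempt in range(max_retries + 1):
        resp = fetch(url)
        if resp.status == 429:
            # Honor Retry-After when it is given in seconds; the HTTP-date form
            # is not handled here and falls back to exponential backoff.
            retry_after = resp.headers.get("Retry-After", "")
            delay = float(retry_after) if retry_after.isdigit() else base_delay * 2 ** attempt
        elif 500 <= resp.status < 600:
            delay = base_delay * 2 ** attempt  # transient server error
        else:
            return resp  # success, or a client error that retrying will not fix
        if attempt == max_retries:
            break
        sleep(delay)
    raise RuntimeError(
        f"giving up on {url} after {max_retries + 1} attempts (last status {resp.status})"
    )
```

Injecting `sleep` keeps the function testable; in real use it defaults to `time.sleep`. HTTP caching is left out of this sketch.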

Stage 1 — Clean & Index: Raw legal texts are parsed, deduplicated, and structured with rich metadata (jurisdiction, document type, risk level, legal topic). Documents are chunked using domain-specific strategies — atomic article-level splitting for statutes and three-part (facts/reasoning/holding) splitting for case law — then vectorized and stored in ChromaDB.
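A rough sketch of the two chunking strategies, assuming statutes mark each article with an "Article N" heading and judgments carry explicit Facts, Reasoning, and Holding headings. The function names and regexes are illustrative assumptions; the real data_processor.py may work differently, and many judgments have no such headings.

```python
import re
from typing import Dict, List

# Assumed heading formats (illustrative only): each statute article starts with a line
# like "Article 12", and judgments carry "Facts", "Reasoning", "Holding" headings on
# their own lines. Chinese statutes ("第十二条") or unheaded judgments need other rules.
ARTICLE_RE = re.compile(r"^(Article\s+\d+[a-z]?)\b", re.MULTILINE)
SECTION_RE = re.compile(r"^(facts|reasoning|holding)\s*:?\s*$", re.MULTILINE | re.IGNORECASE)


def split_statute(text: str) -> List[Dict[str, str]]:
    """Atomic article-level chunking: one chunk per article, heading included."""
    matches = list(ARTICLE_RE.finditer(text))
    chunks = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunks.append({"article": m.group(1), "text": text[m.start():end].strip()})
    return chunks


def split_case(text: str) -> Dict[str, str]:
    """Three-part chunking: map 'facts', 'reasoning', 'holding' to their section bodies."""
    matches = list(SECTION_RE.finditer(text))
    parts = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        parts[m.group(1).lower()] = text[m.end():end].strip()
    return parts
```

Keeping each article whole means a retrieved chunk can always be cited as a complete provision.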

Stage 2 — Retrieve & Re-rank: A hybrid retrieval engine combines semantic vector search with metadata filtering (jurisdiction, document type, risk level). Results are re-ranked using LLM-powered relevance scoring to surface the most pertinent sources.
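The filter, rank, and re-rank flow can be illustrated with an in-memory sketch. `hybrid_search` and the document dict layout are hypothetical; in LexMind the similarity search runs in ChromaDB, and the `rerank` callback here stands in for the LLM relevance scorer.

```python
import math
from typing import Any, Callable, Dict, List, Optional, Sequence, Set

Doc = Dict[str, Any]  # assumed keys: "id", "embedding", "metadata"


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_search(
    query_vec: Sequence[float],
    docs: List[Doc],
    filters: Optional[Dict[str, Set[str]]] = None,
    k: int = 5,
    rerank: Optional[Callable[[Doc], float]] = None,
) -> List[Doc]:
    filters = filters or {}
    # 1. Metadata filter: every filtered field must hold one of the allowed values.
    candidates = [
        d for d in docs
        if all(d["metadata"].get(name) in allowed for name, allowed in filters.items())
    ]
    # 2. Semantic ranking by cosine similarity to the query embedding; keep the top k.
    top = sorted(candidates, key=lambda d: cosine(query_vec, d["embedding"]), reverse=True)[:k]
    # 3. Optional re-rank, e.g. with LLM-assigned relevance scores.
    if rerank is not None:
        top = sorted(top, key=rerank, reverse=True)
    return top
```

Filtering before ranking means a strongly similar document from an excluded jurisdiction can never crowd out an in-scope one.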

Stage 3 — Generate & Cite: GPT-4.1-mini generates structured legal analysis with inline citation markers. Every claim is traceable to its authoritative source, enabling full citation verification.
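One way to make citations checkable is to number the retrieved sources in the prompt and then validate the markers in the answer. The `[n]` marker format, `build_context`, and `verify_citations` are assumptions for illustration, and the sentence splitter is deliberately naive (it breaks only before a capital letter, so abbreviations like "Art. 22" stay intact).

```python
import re
from typing import Dict, List

CITATION_RE = re.compile(r"\[(\d+)\]")
# Naive sentence splitter: break after . ! ? when the next sentence starts with a capital.
SENTENCE_RE = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def build_context(sources: List[Dict[str, str]]) -> str:
    """Number retrieved sources so the model can cite them as [1], [2], ..."""
    return "\n\n".join(
        f"[{i}] {s['title']} ({s['jurisdiction']})\n{s['text']}"
        for i, s in enumerate(sources, start=1)
    )


def verify_citations(answer: str, n_sources: int) -> Dict[str, list]:
    """Check that every [n] marker points at a retrieved source and flag uncited sentences."""
    cited = sorted({int(n) for n in CITATION_RE.findall(answer)})
    invalid = [n for n in cited if not 1 <= n <= n_sources]
    sentences = [s.strip() for s in SENTENCE_RE.split(answer) if s.strip()]
    uncited = [s for s in sentences if not CITATION_RE.search(s)]
    return {"cited": cited, "invalid": invalid, "uncited_sentences": uncited}
```

An answer with a non-empty `invalid` list cites a source the model was never given, which is a strong signal of a hallucinated citation.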

Jurisdiction-Aware Intelligence

LexMind understands that legal research is inherently multi-jurisdictional. The system automatically detects jurisdiction entities in queries, expands searches across relevant legal systems, and tags every result with color-coded jurisdiction badges (CN/EU/UK/US) and risk-level classifications aligned with the EU AI Act framework (Unacceptable, High, Limited, Minimal).
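A keyword-based approximation of jurisdiction detection, falling back to all four systems when the query names none. The keyword lists and `detect_jurisdictions` are hypothetical rather than LexMind's actual rules; a production system might use named-entity recognition or an LLM classifier instead.

```python
import re
from typing import Dict, List

ALL_JURISDICTIONS = ["CN", "EU", "UK", "US"]
# Hypothetical keyword lists for illustration. Bare "us" is omitted on purpose because
# it would match the English pronoun.
JURISDICTION_KEYWORDS: Dict[str, List[str]] = {
    "CN": ["china", "chinese", "prc", "pipl"],
    "EU": ["eu", "european union", "gdpr", "ai act", "dsa"],
    "UK": ["uk", "united kingdom", "england", "online safety act"],
    "US": ["u.s.", "usa", "united states", "section 230"],
}


def detect_jurisdictions(query: str) -> List[str]:
    """Return jurisdiction codes mentioned in the query, or all four if none are found."""
    q = query.lower()
    found = [
        code
        for code, keywords in JURISDICTION_KEYWORDS.items()
        # Whole-word match so "eu" does not fire inside words like "neutral".
        if any(re.search(rf"(?<!\w){re.escape(k)}(?!\w)", q) for k in keywords)
    ]
    return found or list(ALL_JURISDICTIONS)
```

Expanding to every jurisdiction on a miss reflects the cross-jurisdictional default: a query that names no legal system is searched against all of them.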

Modern Web Interface

A polished React 19 + Tailwind CSS frontend provides an intuitive research experience designed for legal professionals. The interface includes an AI-powered research page with real-time streaming answers, a filterable source management dashboard, research history with timeline navigation, and an analytics dashboard for tracking research coverage and activity patterns.

Screenshot: AI-powered research with structured analysis, risk classification, and inline citations

Architecture

The architecture is modular: each component can be extended or replaced independently. The Python backend keeps crawlers, data processing, vector storage, retrieval, and generation in separate modules, and the frontend is designed to talk to the backend through an API layer, which makes it straightforward to integrate LexMind's engine into existing legal tech workflows.

lexmind/
├── backend/                    # Python RAG Engine
│   ├── main.py                 # CLI entry point & pipeline orchestrator
│   ├── requirements.txt        # Python dependencies
│   ├── data/
│   │   └── sample_data.json    # Pre-loaded legal data (4 jurisdictions)
│   └── modules/
│       ├── crawlers/           # Jurisdiction-specific web crawlers
│       │   ├── base.py         # Base crawler with rate limiting & caching
│       │   ├── eurlex_crawler.py
│       │   ├── uk_caselaw_crawler.py
│       │   ├── courtlistener_crawler.py
│       │   └── cn_law_crawler.py
│       ├── data_processor.py   # Document chunking & metadata extraction
│       ├── data_cleaner.py     # Text normalization & deduplication
│       ├── data_ingestion.py   # Ingestion pipeline orchestrator
│       ├── vector_store.py     # ChromaDB + SentenceTransformer embeddings
│       ├── retriever.py        # Hybrid search + LLM re-ranking
│       └── generator.py        # GPT-4.1-mini answer generation with citations
├── frontend/                   # React 19 Web Interface
│   ├── src/
│   │   ├── pages/              # Research, Sources, History, Analytics, Settings
│   │   ├── components/         # DashboardLayout, Badges, UI components
│   │   ├── lib/                # Mock data, utilities
│   │   └── contexts/           # Theme management
│   ├── index.html
│   └── package.json
├── docs/                       # Documentation & assets
│   ├── images/                 # Screenshots & architecture diagrams
│   └── api/                    # API reference documentation
├── .github/                    # GitHub templates & CI
│   ├── ISSUE_TEMPLATE/
│   ├── workflows/
│   └── PULL_REQUEST_TEMPLATE.md
├── CONTRIBUTING.md
├── CHANGELOG.md
├── LICENSE
└── README.md                   # You are here

Quick Start

Prerequisites

LexMind requires Python 3.10 or higher. An OpenAI-compatible API key is needed only for the LLM-powered stages (re-ranking and answer generation). Indexing and vector search use local SentenceTransformer embedding models, so they, like crawling, run without an API key.

Backend Setup

# Clone the repository
git clone https://github.com/your-org/lexmind.git
cd lexmind/backend

# Install dependencies
pip install -r requirements.txt

# Configure your environment (required for LLM features)
# ENV_TEMPLATE is in the repository root, one level above backend/
cp ../ENV_TEMPLATE .env
# Edit .env and replace 'your-api-key-here' with your actual OpenAI API key
# Note: Vector search and crawling work WITHOUT an API key

Run the Pipeline

# Interactive research mode (uses pre-loaded sample data)
python main.py

# Single query mode
python main.py --query "How does the GDPR regulate personal data processing?"

# Demo mode with pre-defined queries
python main.py --demo

# Crawl fresh data from official legal databases
python main.py --ingest

# Crawl specific jurisdictions only
python main.py --ingest -j UK EU

# Rebuild vector index from existing data
python main.py --index

Frontend Setup

# From the directory you cloned into (or "cd ../frontend" if you are still in lexmind/backend)
cd lexmind/frontend

# Install dependencies
pnpm install

# Start development server
pnpm dev

The frontend will be available at http://localhost:3000. Currently the frontend uses mock data for demonstration purposes. To connect it to the live backend, configure the API endpoint in the Settings page.

Screenshots

Source Management · Analytics Dashboard · Research History · Landing Page

Data Sources

LexMind crawls from the following authoritative legal databases. Each crawler is designed to respect rate limits and terms of service.

Jurisdiction	Database	Content Type	Crawler Module
CN	NPC National Law Database	Statutes & Regulations	`cn_law_crawler.py`
EU	EUR-Lex	Regulations & Directives	`eurlex_crawler.py`
UK	Find Case Law (National Archives)	Case Law & Judgments	`uk_caselaw_crawler.py`
US	CourtListener (REST API v4)	Opinions & Case Law	`courtlistener_crawler.py`

The pre-loaded sample_data.json includes 12 representative legal sources covering AI safety regulation across all four jurisdictions, organized around six core legal interests: right to life, personal freedom, privacy, property rights, fair trial, and freedom of expression.

Legal Topics Covered

LexMind's knowledge base is structured around six fundamental legal interests that are most impacted by AI technologies. These topics provide the analytical framework for cross-jurisdictional comparison and risk assessment.

Legal Interest	CN Example	EU Example	UK Example	US Example
Right to Life	Criminal Law Art. 232-233	EU AI Act Art. 6 (High-risk)	R v. Meadow [2007]	State v. Loomis (2016)
Personal Freedom	Criminal Law Art. 238	GDPR Art. 22 (Automated decisions)	R (Bridges) v. CC South Wales Police [2020]	Carpenter v. US (2018)
Privacy	Personal Info Protection Law	GDPR Art. 5-9	Lloyd v. Google [2021]	Clearview AI (FTC 2024)
Property Rights	Civil Code Art. 1032-1039	EU AI Act Art. 52	Thaler v. Comptroller [2023]	Thaler v. Vidal (2022)
Fair Trial	Criminal Procedure Law Art. 50	EU AI Act Art. 6(2)	R (Bridges) EWCA [2020]	Loomis v. Wisconsin (2016)
Freedom of Expression	Cybersecurity Law Art. 12	DSA Art. 14-17	Online Safety Act 2023	Section 230 CDA

Extending LexMind

LexMind is designed to be modular and extensible. Here are the most common extension points:

Adding a new jurisdiction: Create a new crawler in backend/modules/crawlers/ that extends BaseCrawler. Implement the search() and fetch_detail() methods for your target legal database, then register the new jurisdiction code in data_ingestion.py.
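A minimal sketch of the shape a new crawler might take. `BaseCrawler` here is a hypothetical stand-in with a simple rate limiter, not the class in base.py, and `ExampleCrawler` reads from an in-memory fixture instead of a real legal database. Check the real base class for its constructor and helper methods before copying this pattern. The returned documents use the type, jurisdiction, title, content, and metadata fields of the sample data schema.

```python
import time
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseCrawler(ABC):
    """Hypothetical stand-in for the BaseCrawler in crawlers/base.py; the real class may differ."""

    jurisdiction = ""

    def __init__(self, min_interval: float = 1.0) -> None:
        self.min_interval = min_interval  # minimum seconds between requests
        self._last_request = 0.0

    def throttle(self) -> None:
        """Sleep as needed so consecutive requests are at least min_interval apart."""
        wait = self._last_request + self.min_interval - time.monotonic()
        if wait > 0:
            time.sleep(wait)
        self._last_request = time.monotonic()

    @abstractmethod
    def search(self, query: str, limit: int = 10) -> List[Dict[str, Any]]:
        """Return lightweight hits, each with at least an "id"."""

    @abstractmethod
    def fetch_detail(self, doc_id: str) -> Dict[str, Any]:
        """Return one full document in the sample data shape."""

    def crawl(self, query: str, limit: int = 10) -> List[Dict[str, Any]]:
        return [self.fetch_detail(hit["id"]) for hit in self.search(query, limit)]


class ExampleCrawler(BaseCrawler):
    """Hypothetical crawler for a new jurisdiction, backed by an in-memory fixture."""

    jurisdiction = "XX"  # placeholder jurisdiction code

    def __init__(self, fixture: Dict[str, Dict[str, str]], **kwargs: Any) -> None:
        super().__init__(**kwargs)
        self.fixture = fixture

    def search(self, query: str, limit: int = 10) -> List[Dict[str, Any]]:
        self.throttle()
        q = query.lower()
        hits = [{"id": doc_id} for doc_id, doc in self.fixture.items() if q in doc["title"].lower()]
        return hits[:limit]

    def fetch_detail(self, doc_id: str) -> Dict[str, Any]:
        self.throttle()
        doc = self.fixture[doc_id]
        return {
            "type": "statute",
            "jurisdiction": self.jurisdiction,
            "title": doc["title"],
            "content": doc["content"],
            "metadata": {"source_id": doc_id},
        }
```

In a real crawler, `search()` and `fetch_detail()` would call the target database's API; the split keeps cheap listing requests separate from expensive full-text downloads.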

Adding a new data source: Add entries to backend/data/sample_data.json following the existing schema (type, jurisdiction, title, content, metadata). Run python main.py --index to rebuild the vector index.
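A small helper that validates an entry before appending it, assuming sample_data.json holds a top-level JSON array of source objects. `add_source` and the allowed jurisdiction set are hypothetical; run python main.py --index afterwards so the new entry is embedded.

```python
import json
from pathlib import Path
from typing import Any, Dict

REQUIRED_FIELDS = ("type", "jurisdiction", "title", "content", "metadata")
KNOWN_JURISDICTIONS = {"CN", "EU", "UK", "US"}  # extend when you add a jurisdiction


def add_source(path: Path, entry: Dict[str, Any]) -> int:
    """Validate `entry` and append it to a JSON array file; return the new entry count."""
    missing = [f for f in REQUIRED_FIELDS if f not in entry]
    if missing:
        raise ValueError(f"entry is missing required fields: {missing}")
    if entry["jurisdiction"] not in KNOWN_JURISDICTIONS:
        raise ValueError(f"unknown jurisdiction: {entry['jurisdiction']!r}")
    data = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON list")
    data.append(entry)
    # ensure_ascii=False keeps Chinese statute text readable in the file.
    path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")
    return len(data)
```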

Customizing the chunking strategy: Modify backend/modules/data_processor.py to implement domain-specific chunking logic. The current implementation uses atomic article splitting for statutes and three-part splitting for case law, but you can add custom strategies for other document types.
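A custom strategy could be wired in through a registry keyed by document type, with overlapping fixed-size windows as the fallback. The registry, the `regulatory_guidance` type, and the window sizes are hypothetical; data_processor.py may dispatch on document type differently.

```python
from typing import Callable, Dict, List

Chunker = Callable[[str], List[str]]
CHUNKERS: Dict[str, Chunker] = {}


def register_chunker(doc_type: str) -> Callable[[Chunker], Chunker]:
    """Decorator that maps a document type to its chunking function."""
    def decorator(fn: Chunker) -> Chunker:
        CHUNKERS[doc_type] = fn
        return fn
    return decorator


def fixed_window(text: str, size: int = 800, overlap: int = 100) -> List[str]:
    """Fallback: overlapping character windows for document types with no custom rule."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


@register_chunker("regulatory_guidance")  # hypothetical document type
def split_paragraphs(text: str) -> List[str]:
    """One chunk per blank-line-separated paragraph."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def chunk(doc_type: str, text: str) -> List[str]:
    return CHUNKERS.get(doc_type, fixed_window)(text)
```

The overlap in the fallback keeps a sentence that straddles a window boundary retrievable from at least one chunk.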

Swapping the LLM provider: The generator module uses the OpenAI-compatible API format. Set OPENAI_BASE_URL to point to any compatible endpoint (Azure OpenAI, Anthropic via proxy, local models via Ollama/vLLM, etc.).
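A sketch of resolving the endpoint from the environment. The official openai Python client (v1 and later) also reads OPENAI_BASE_URL and OPENAI_API_KEY on its own. The LEXMIND_MODEL variable is a hypothetical name, and the local endpoints are the usual defaults for Ollama and vLLM, which may differ in your deployment.

```python
import os
from typing import Mapping, Optional

DEFAULT_BASE_URL = "https://api.openai.com/v1"

# Typical OpenAI-compatible endpoints for local servers (defaults; your deployment may differ).
EXAMPLE_ENDPOINTS = {
    "ollama": "http://localhost:11434/v1",
    "vllm": "http://localhost:8000/v1",
}


def resolve_llm_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve endpoint, key, and model name from environment variables."""
    env = os.environ if env is None else env
    base_url = env.get("OPENAI_BASE_URL") or DEFAULT_BASE_URL
    return {
        "base_url": base_url.rstrip("/"),
        "api_key": env.get("OPENAI_API_KEY", ""),
        # LEXMIND_MODEL is a hypothetical variable: other providers use other model names.
        "model": env.get("LEXMIND_MODEL", "gpt-4.1-mini"),
    }


# Usage with the official client (requires `pip install openai`):
#   from openai import OpenAI
#   cfg = resolve_llm_config()
#   client = OpenAI(base_url=cfg["base_url"], api_key=cfg["api_key"])
```

When switching providers, the model name usually has to change too, since "gpt-4.1-mini" exists only on OpenAI-hosted endpoints.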

Swapping the embedding model: Modify backend/modules/vector_store.py to use a different SentenceTransformer model or switch to an API-based embedding service.
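Hiding the embedding model behind a small interface keeps the swap local to vector_store.py. The `Embedder` protocol and both classes are hypothetical, and `all-MiniLM-L6-v2` is only an example model name. The hashing embedder exists for offline tests and carries no semantic meaning. A different model usually produces vectors of a different dimension, so rebuild the index with python main.py --index after switching.

```python
import hashlib
import math
from typing import List, Protocol, Sequence


class Embedder(Protocol):
    def embed(self, texts: Sequence[str]) -> List[List[float]]: ...


class SentenceTransformerEmbedder:
    """Wraps sentence-transformers; swap model_name to change the embedding model."""

    def __init__(self, model_name: str = "sentence-transformers/all-MiniLM-L6-v2") -> None:
        from sentence_transformers import SentenceTransformer  # imported lazily
        self.model = SentenceTransformer(model_name)

    def embed(self, texts: Sequence[str]) -> List[List[float]]:
        return self.model.encode(list(texts), normalize_embeddings=True).tolist()


class HashingEmbedder:
    """Deterministic, dependency-free embedder for tests (not semantically meaningful)."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def embed(self, texts: Sequence[str]) -> List[List[float]]:
        out = []
        for text in texts:
            vec = [0.0] * self.dim
            for token in text.lower().split():
                # Bucket each token by a stable hash so results are reproducible.
                h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)
                vec[h % self.dim] += 1.0
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0
            out.append([v / norm for v in vec])
        return out
```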

Documentation

Detailed documentation is available in the docs/ directory:

Document	Description
Architecture Guide	Deep dive into the RAG pipeline design
API Reference	Backend module API documentation
Crawler Guide	How to build and configure legal data crawlers
Deployment Guide	Production deployment instructions
FAQ	Frequently asked questions

Contributing

We welcome contributions from the legal tech and AI communities. Whether you're a lawyer who wants to add jurisdiction coverage, a developer who wants to improve the retrieval engine, or a designer who wants to enhance the frontend, there's a place for you.

Please read our Contributing Guide before submitting a pull request. Key areas where we especially welcome contributions include new jurisdiction crawlers, improved chunking strategies for legal documents, multilingual support, and accessibility improvements to the frontend.

Roadmap

Timeline	Milestone	Status
Q1 2026	Core RAG pipeline + 4 jurisdiction crawlers	Done
Q1 2026	React frontend with Research, Sources, Analytics	Done
Q2 2026	PDF/Word export, cross-jurisdiction comparison view	Planned
Q3 2026	Team collaboration, shared knowledge bases	Planned
Q4 2026	Real-time regulatory alert system	Planned
2027	Multi-language support (JA, KO, DE)	Planned

License

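LexMind is released under the Apache License 2.0. You are free to use, modify, and distribute this software for both commercial and non-commercial purposes, provided you follow the license terms, such as including a copy of the license and retaining copyright notices.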

Acknowledgments

LexMind builds upon the following open-source projects and public legal databases:

Technologies: ChromaDB, SentenceTransformers, OpenAI API, React, Tailwind CSS, shadcn/ui, Recharts

Legal Databases: EUR-Lex (European Union), Find Case Law / National Archives (United Kingdom), CourtListener / Free Law Project (United States), NPC National Law Database (China)

LexMind — Research smarter, cite with confidence.
Built with care for the legal research community.
