AI-powered legal document search and analysis platform. Search across 46,456+ legal cases using semantic similarity, read AI-generated summaries, ask follow-up questions, upload confidential documents for isolated analysis, and consult a legal domain chatbot — all backed by a FastAPI backend deployed on Azure.
- Features
- Architecture
- Tech Stack
- Quick Start
- Project Structure
- Environment Variables
- API Reference
- Documentation
- Deployment
- Contributing
- License
## Features

### Semantic Case Search

Natural language search over 46,456 indexed legal cases. Queries are embedded with sentence-transformers/all-mpnet-base-v2 and compared against a pre-built FAISS index using cosine similarity; results are returned ranked by relevance score. See docs/query_pipeline.md for the full search flow.
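The ranking step can be illustrated with a small NumPy sketch. This is illustrative only — the real service uses a prebuilt FAISS index over 768-dimensional all-mpnet-base-v2 embeddings; the toy 4-dimensional vectors below are made up:

```python
import numpy as np

def rank_by_cosine(query_vec: np.ndarray, index_vecs: np.ndarray, top_k: int = 5):
    """Return (case index, similarity) pairs sorted by cosine similarity, best first."""
    q = query_vec / np.linalg.norm(query_vec)
    idx = index_vecs / np.linalg.norm(index_vecs, axis=1, keepdims=True)
    sims = idx @ q                       # cosine similarity per indexed case
    top = np.argsort(-sims)[:top_k]      # highest similarity first
    return [(int(i), float(sims[i])) for i in top]

# Toy "embeddings" standing in for 768-dim mpnet vectors
cases = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.7, 0.7, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])
query = np.array([1.0, 0.1, 0.0, 0.0])
print(rank_by_cosine(query, cases, top_k=2))  # case 0 ranks highest
```

FAISS achieves the same ordering by storing L2-normalized vectors in an inner-product index, which is why cosine similarity and relevance score coincide here.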
### Document Analysis and Q&A

Clicking any search result opens a document analysis view. The API downloads the PDF from Azure Blob Storage (or a local fallback), extracts text with PyMuPDF, chunks it, builds a temporary per-session FAISS vector store, and passes it to a Groq LLM for summarization. Users can then ask follow-up questions against the same temporary store without re-processing the document. The temporary embeddings are cleaned up at the end of the session.
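The chunking step (size 1000, overlap 200, per the tech stack table) works roughly as below — a simplified stand-in for LangChain's RecursiveCharacterTextSplitter, which additionally prefers paragraph and sentence boundaries:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into chunks whose start offsets are (chunk_size - overlap) apart."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last chunk reached the end of the text
            break
    return chunks

print([len(c) for c in chunk_text("x" * 2500)])  # [1000, 1000, 900]
```

The 200-character overlap means each chunk repeats the tail of its predecessor, so answers that span a chunk boundary remain retrievable.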
### Confidential Document Upload

Users can upload their own PDF directly from the browser. The file is saved to an ephemeral Docker volume (confidential_tmp), processed identically to the PDF analysis flow, and the session is cleared the moment the user uploads a new file or requests deletion. The document never reaches Azure Blob Storage — it stays on the VM in the ephemeral volume.
### Legal Chatbot

A general-purpose AI assistant pre-prompted for legal domain queries. It accepts a message and conversation history, passes them through a LangChain agent backed by Groq, and returns a streamed response.
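The way a conversation history plus a new question is flattened into chat messages can be sketched like this. This is a hypothetical helper — the real message schema lives in api/routes/routes.py and the agent in api/agents/legal_chatbot.py:

```python
def build_messages(history: list[tuple[str, str]], question: str) -> list[dict]:
    """Flatten (user, assistant) turn pairs plus the new question into chat messages."""
    messages = [{"role": "system",
                 "content": "You are a legal assistant. Answer legal questions only."}]
    for user_turn, assistant_turn in history:
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": assistant_turn})
    messages.append({"role": "user", "content": question})
    return messages

msgs = build_messages([("What is tort law?", "Tort law covers civil wrongs...")],
                      "How does it differ from contract law?")
print(len(msgs))  # 4: system prompt + one full turn + the new question
```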
## Architecture

For a deeper breakdown of individual components, see docs/architecture.md and docs/TECHNICAL_DOCUMENTATION.md.
```mermaid
flowchart TD
    %% ---------------- DATA PREPARATION ----------------
    subgraph DATA_PREP [Document Preparation]
        direction TB
        DOCS[Legal Documents]
        EMBED[Extract Text and Generate Embeddings]
        FAISS[FAISS Vector Store]
        DOCS --> EMBED --> FAISS
    end

    %% ---------------- MAIN APPLICATION ----------------
    subgraph APP [Application]
        direction TB
        USER[User]

        subgraph UI [User Interfaces]
            SEARCH[Search Interface]
            UPLOAD[Confidential Upload Interface]
            CHATBOT[Legal Chatbot Interface]
        end

        subgraph SEARCH_FLOW [Search Flow]
            QUERY[User Query]
            RETRIEVE[Retrieve Relevant Documents]
            RESULTS[Display Results]
            ANALYZE_BTN[Analyze Document]
            DOWNLOAD[Download Document]
            QUERY --> RETRIEVE --> RESULTS
            RESULTS --> ANALYZE_BTN
            RESULTS --> DOWNLOAD
        end

        subgraph CONF_FLOW [Confidential Upload Flow]
            UPLOAD_DOC[Upload Document]
            SUMMARIZE_DOC[Summarize Uploaded Document]
            ASK_Q_DOC[Ask Questions on Uploaded Document]
            SIMILAR_CASES[Retrieve Similar Legal Cases]
            UPLOAD_DOC --> SUMMARIZE_DOC
            UPLOAD_DOC --> ASK_Q_DOC
            UPLOAD_DOC --> SIMILAR_CASES
        end

        subgraph ANALYSIS_PAGE [Document Analysis Page]
            DOC_VIEW[Selected Document]
            RAG_PIPELINE[RAG Processing]
            SUMMARY[Document Summary]
            QNA[Question and Answer]
            DOC_VIEW --> RAG_PIPELINE --> SUMMARY
            DOC_VIEW --> RAG_PIPELINE --> QNA
        end

        subgraph CHATBOT_FLOW [Legal Chatbot]
            CHAT_QUERY[User Question]
            LEGAL_FILTER[Check Legal Domain]
            LLM_RESPONSE[LLM Response]
            CHAT_QUERY --> LEGAL_FILTER --> LLM_RESPONSE
        end
    end

    %% ---------------- CONNECTIONS ----------------
    USER --> SEARCH
    USER --> UPLOAD
    USER --> CHATBOT
    SEARCH --> QUERY
    QUERY --> FAISS
    FAISS --> RETRIEVE
    ANALYZE_BTN --> DOC_VIEW
    SIMILAR_CASES --> DOC_VIEW
    UPLOAD --> UPLOAD_DOC
    CHATBOT --> CHAT_QUERY
```
## Tech Stack

| Layer | Technology | Notes |
|---|---|---|
| Frontend | React 18, Vite, TailwindCSS, react-markdown, lucide-react | Deployed on Azure Static Web Apps |
| Backend | FastAPI, uvicorn (factory mode), Python 3.11 | Dockerized, running on Azure VM |
| LLM | Groq llama-3.3-70b-versatile via LangChain | Used for summarization, Q&A, chatbot |
| Embeddings | sentence-transformers/all-mpnet-base-v2 | HuggingFace, runs inside the container |
| Search | FAISS (cosine similarity) | 46,456 cases indexed, loaded from Blob on startup |
| PDF Processing | PyMuPDF, LangChain RecursiveCharacterTextSplitter | Chunk size 1000, overlap 200 |
| Storage | Azure Blob Storage (jurisfindstore, container data) | Holds PDFs (5.3 GB) and FAISS index (136 MB) |
| Reverse Proxy | Nginx | Proxies port 80 to FastAPI port 8000, 20 MB upload limit |
| Hosting — Backend | Azure VM, Standard D2alds v7, Ubuntu 24.04, East US 2 | 20.186.113.106 |
| Hosting — Frontend | Azure Static Web Apps (free tier) | https://blue-cliff-0dfeb910f.2.azurestaticapps.net |
| CI/CD | GitHub Actions | Frontend auto-deploys on push to main |
## Quick Start

Prerequisites:

- Python 3.9+
- Node.js 18+
- A Groq API key from console.groq.com
- The FAISS index files (see Generating the Index below)
Backend:

```bash
cd api
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Open .env and set GROQ_API_KEY
uvicorn main:create_app --factory --host 0.0.0.0 --port 8000 --reload
```

Verify: `curl http://localhost:8000/api/health`
Frontend:

```bash
cd frontend
npm install
npm run dev
```

Open http://localhost:5173
### Generating the Index

If you have PDFs in api/data/pdfs/, generate the FAISS index locally:

```bash
cd api
python helpers/generate_embeddings.py
```

This produces api/data/faiss_store/legal_cases.index and api/data/faiss_store/id2name.json. See docs/ingestion_pipeline.md for details on the chunking and embedding strategy.
For Azure Blob Storage setup (production), see docs/azure_integration.md.
## Project Structure

```
JurisFind/
├── api/
│   ├── main.py                     # FastAPI app factory, CORS, router registration
│   ├── Dockerfile                  # Python 3.11-slim, uvicorn factory mode
│   ├── requirements.txt
│   ├── .env.example                # Template — copy to .env
│   ├── agents/
│   │   ├── legal_agent.py          # LangChain agent: PDF summarization + Q&A
│   │   └── legal_chatbot.py        # LangChain agent: general legal chatbot
│   ├── confidential/
│   │   └── confidential_pdf.py     # Confidential PDF processing
│   ├── helpers/
│   │   ├── azure_blob_helper.py    # Blob upload, download, FAISS sync
│   │   ├── azure_data_manager.py   # CLI tool for blob management
│   │   └── generate_embeddings.py  # One-time FAISS index builder
│   ├── routes/
│   │   └── routes.py               # All API endpoints
│   ├── services/
│   │   └── search_service.py       # FAISS search logic, downloads from Blob on startup
│   ├── upload_to_blob.py           # One-time script: upload FAISS files to Blob
│   └── data/
│       ├── faiss_store/            # legal_cases.index + id2name.json (gitignored)
│       └── pdfs/                   # 48K legal case PDFs (gitignored)
├── frontend/
│   ├── vite.config.ts
│   ├── tailwind.config.js
│   └── src/
│       ├── App.jsx                 # Routes definition
│       ├── config/api.js           # Base URL from VITE_API_BASE_URL env var
│       ├── components/
│       │   ├── Navigation.jsx
│       │   └── Footer.jsx
│       └── pages/
│           ├── LandingPage.jsx
│           ├── SearchPage.jsx          # Semantic search UI
│           ├── PdfAnalysis.jsx         # Document summary + chat UI
│           ├── LegalChatbot.jsx        # Legal AI chatbot UI
│           └── ConfidentialUpload.jsx  # Private PDF upload + analysis UI
├── nginx.conf                      # VM reverse proxy config
├── docker-compose.yml              # API container + confidential_tmp volume
└── .github/
    └── workflows/
        └── azure-static-web-apps-*.yml  # Auto-generated by Azure portal
```
## Environment Variables

Full reference in api/.env.example.

Backend:
| Variable | Required | Description |
|---|---|---|
| GROQ_API_KEY | Yes | Groq API key for LLM inference |
| GROQ_MODEL | No | Defaults to llama-3.3-70b-versatile |
| AZURE_STORAGE_CONNECTION_STRING | Production only | Full Azure Blob connection string |
| AZURE_DATA_CONTAINER | Production only | Blob container name, defaults to data |
| USE_LOCAL_FILES | No | true = local FAISS files, false = download from Blob on startup |
| API_HOST | No | Defaults to 0.0.0.0 in Docker, localhost locally |
| API_PORT | No | Defaults to 8000 |
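A minimal .env for local development might look like this (placeholder values — only GROQ_API_KEY is strictly required):

```ini
GROQ_API_KEY=gsk_your_key_here
GROQ_MODEL=llama-3.3-70b-versatile
USE_LOCAL_FILES=true
API_HOST=localhost
API_PORT=8000
```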
Frontend (Vite):
| Variable | Description |
|---|---|
| VITE_API_BASE_URL | Full base URL of the API, e.g. http://20.186.113.106. Defaults to http://localhost:8000 |
## API Reference

| Method | Endpoint | Body / Params | Description |
|---|---|---|---|
| GET | /api/health | — | Returns {status, message, total_cases} |
| POST | /api/search | {query, top_k} | Semantic search — returns ranked {filename, score, similarity_percentage} list |
| POST | /api/unified/analyze | {filename, source} | Analyze a PDF (source: "database" or "uploaded") — returns AI summary |
| POST | /api/unified/ask | {filename, question, source} | Q&A against an analyzed document's embeddings |
| GET | /api/pdf/(unknown) | — | Streams PDF binary from Blob Storage or local fallback |
| GET | /api/document-stats/(unknown) | — | Returns embedding and document statistics |
| POST | /api/upload-confidential-pdf | multipart/form-data (file) | Upload a confidential PDF to the ephemeral Docker volume |
| POST | /api/retrieve-similar-cases | ?filename=X&top_k=5 | Find similar cases from the main index matching an uploaded PDF |
| DELETE | /api/cleanup-confidential/(unknown) | — | Delete confidential session and temp files |
| POST | /api/legal-chat | {question} | General legal AI chatbot (domain-filtered) |
Interactive Swagger UI: http://20.186.113.106/docs
Full request/response schemas: docs/api_reference.md
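As a sketch, a search request can be built with Python's standard library alone. The endpoint and body shape follow the table above; the base URL assumes a locally running backend, and the commented-out call requires that server to be up:

```python
import json
from urllib import request

API_BASE = "http://localhost:8000"  # or your deployed VITE_API_BASE_URL value

def build_search_request(query: str, top_k: int = 5) -> request.Request:
    """Construct a POST /api/search request with a JSON body."""
    body = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return request.Request(
        f"{API_BASE}/api/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_search_request("breach of contract damages", top_k=3)
# Against a running backend:
# with request.urlopen(req) as resp:
#     results = json.load(resp)  # ranked {filename, score, similarity_percentage}
```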
## Documentation

| Document | Contents |
|---|---|
| docs/architecture.md | Component diagram, layer responsibilities, data flow overview |
| docs/ingestion_pipeline.md | PDF text extraction, chunking strategy, FAISS index construction |
| docs/query_pipeline.md | Query embedding, FAISS search, LangChain agent flow, prompt templates |
| docs/api_reference.md | Full endpoint reference with request/response schemas |
| docs/azure_integration.md | Azure Blob Storage setup, container structure, ingestion to Blob, env config |
| docs/deployment.md | VM setup, Docker, Nginx, Azure Static Web Apps, CI/CD, update workflow |
| docs/TECHNICAL_DOCUMENTATION.md | Comprehensive internal reference covering all subsystems |
## Deployment

See docs/deployment.md for the complete guide covering:
- Azure VM provisioning and Docker setup
- Nginx reverse proxy configuration
- Azure Static Web Apps setup and GitHub Actions workflow
- Updating the backend after a code change
- Environment variable management on the VM
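An illustrative nginx server block matching the setup described above (port 80 proxied to FastAPI on 8000, 20 MB upload limit); the actual configuration lives in nginx.conf:

```nginx
server {
    listen 80;
    client_max_body_size 20M;   # allow PDF uploads up to 20 MB

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```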
## Contributing

- Fork the repository
- Create a feature branch off main
- Follow PEP 8 for Python and the ESLint config for JavaScript
- Add or update tests in api/tests/ for backend changes
- Open a pull request with a clear description of the change
Backend tests:

```bash
cd api && python -m pytest tests/
```

## License

MIT