# 📒 Munshi — Private AI Assistant for Indian CA Firms
*Munshi (मुंशी)* — the traditional Hindi term for a record-keeper or clerk. This Munshi is digital, private, and never leaves your office.
## What Is This?
Munshi is a **fully offline, on-premise AI assistant** built for Indian Chartered Accountant firms. It runs entirely on a server inside the firm's office — no client data ever leaves the premises. No GPT, no Claude, no internet calls.
Staff access Munshi through a clean web browser interface. Munshi reads client documents (GST returns, ITRs, audit reports, scanned notices), answers questions about them with proper citations, and is built to scale into automated GST notice reply drafting and other CA-specific workflows.
## Why Build This?
Indian CA firms handle highly confidential client data:
- GSTRs, ITRs, audit reports, bank statements
- Tax demand notices, scrutiny notices
- Personal financial records of clients
ICAI rules and client confidentiality agreements prevent firms from putting any of this into ChatGPT or other cloud AI services. But CAs spend hours every day searching through PDFs, drafting routine documents, and reconciling data.
Munshi solves this — a private AI inside the firm, accessible to all staff, that never sends data outside.
## Features (Current Prototype)
- ✅ **Local LLM inference** — Qwen 2.5-3B running on consumer GPU (RTX 3050 4GB tested)
- ✅ **Document Q&A** — Ask questions in natural language, get cited answers
- ✅ **OCR for scanned PDFs** — Tesseract + Poppler pipeline auto-detects scanned vs typed
- ✅ **Multi-client support** — Documents organized by client folder with isolation
- ✅ **Live document upload** — Drag-and-drop new PDFs through the UI; auto-indexed
- ✅ **Source citations** — Every answer shows source files with relevance scores
- ✅ **Branded web UI** — Clean Streamlit interface, professional appearance
- ✅ **Zero internet calls** — Verified offline operation
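The scanned-vs-typed auto-detection boils down to a simple heuristic: if a PDF page yields almost no extractable text, treat it as a scan and route it through Tesseract. A minimal sketch of that decision (function name and threshold are illustrative, not the repo's actual code):

```python
# Heuristic for deciding whether a PDF page needs OCR.
# A page whose embedded text layer is (nearly) empty is almost
# certainly a scan and should be rendered and sent to Tesseract.

OCR_MIN_CHARS = 25  # below this, assume the text layer is absent or junk

def needs_ocr(extracted_text: str, min_chars: int = OCR_MIN_CHARS) -> bool:
    """Return True if a page's extracted text is too sparse to trust."""
    # Strip all whitespace so pages containing only layout artifacts
    # still count as empty.
    meaningful = "".join(extracted_text.split())
    return len(meaningful) < min_chars
```

In the real pipeline the input text would come from a PDF text extractor (e.g. `pypdf`'s `page.extract_text()`); pages flagged here are rendered to images via Poppler and OCR'd with Tesseract instead.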
## Architecture
```
┌─────────────────────────────────────────┐
│ Browser (any office desktop or laptop)  │
│ http://munshi.local                     │
└─────────────────┬───────────────────────┘
                  │
┌─────────────────▼───────────────────────┐
│ Streamlit Web UI (munshi_ui.py)         │
│  - Chat interface                       │
│  - Document upload                      │
│  - Source citations                     │
└─────────────────┬───────────────────────┘
                  │
┌─────────────────▼───────────────────────┐
│ RAG Pipeline (LlamaIndex)               │
│  - Embeddings (BGE-small-en-v1.5)       │
│  - OCR detection + Tesseract            │
└─────────┬───────────────────┬───────────┘
          │                   │
┌─────────▼─────────┐ ┌───────▼──────────┐
│ Qdrant Vector DB  │ │ llama.cpp        │
│ (Docker, port     │ │ serving Qwen     │
│  6333)            │ │ (port 8000)      │
└───────────────────┘ └──────────────────┘
```
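Conceptually, the app talks to the llama.cpp server over its OpenAI-compatible HTTP API, and the RAG pipeline's job is to pack retrieved chunks into the prompt. A hedged stdlib sketch of that hand-off (the prompt wording and helper names are illustrative; the actual wiring in the repo goes through LlamaIndex, and the port matches the diagram above):

```python
import json
import urllib.request

# llama.cpp's server exposes an OpenAI-compatible chat endpoint.
LLM_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(question: str, chunks: list[str]) -> dict:
    """Assemble an OpenAI-style chat payload with retrieved context."""
    context = "\n\n".join(f"[Source {i + 1}]\n{c}" for i, c in enumerate(chunks))
    return {
        "messages": [
            {"role": "system",
             "content": "Answer only from the provided sources and cite them."},
            {"role": "user",
             "content": f"Sources:\n{context}\n\nQuestion: {question}"},
        ],
        "temperature": 0.2,
        "max_tokens": 512,
    }

def ask(question: str, chunks: list[str]) -> str:
    """POST the payload to the local server (requires it to be running)."""
    req = urllib.request.Request(
        LLM_URL,
        data=json.dumps(build_chat_request(question, chunks)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the payload is plain OpenAI chat format, the same request works against any OpenAI-compatible backend, which keeps the UI decoupled from the specific model server.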
## Tech Stack
| Component | Choice | Why |
|---|---|---|
| LLM Engine | llama.cpp (native binary) | Fastest CUDA inference on consumer GPUs |
| Model | Qwen 2.5-3B-Instruct Q4_K_M | Best quality at 4GB VRAM |
| Embeddings | BAAI/bge-small-en-v1.5 | Local, fast, accurate for English |
| Vector DB | Qdrant (Docker) | Production-ready, simple ops |
| OCR | Tesseract 5.5 + Poppler | Industry standard, free |
| RAG Framework | LlamaIndex | Best document handling for our use case |
| UI | Streamlit | Fast iteration, good defaults |
## Setup
### Prerequisites
- Windows 10/11 or Linux (tested on Windows 11)
- NVIDIA GPU with CUDA support (RTX 3050 4GB minimum, RTX 4090 24GB recommended for production)
- Python 3.11
- Docker Desktop
- Tesseract OCR 5.5+
- Git
### Step 1 — Clone The Repo
```shell
git clone https://github.com/poojithdevan4D/Munshi.git
cd Munshi
```
### Step 2 — Set Up Python Environment
```powershell
python -m venv venv
.\venv\Scripts\activate
pip install -r requirements.txt
```
### Step 3 — Download The Model
Download [Qwen 2.5-3B-Instruct Q4_K_M GGUF](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct-GGUF/blob/main/qwen2.5-3b-instruct-q4_k_m.gguf) and place it in `models/`.
### Step 4 — Build llama.cpp With CUDA
Build llama.cpp with CUDA support following the [llama.cpp build instructions](https://github.com/ggerganov/llama.cpp), then place the resulting server binary in `llama-server/`.
### Step 5 — Start Qdrant
```shell
docker run -d --name munshi-qdrant -p 6333:6333 -p 6334:6334 qdrant/qdrant
```
### Step 6 — Install Tesseract + Poppler
- Tesseract: https://github.com/UB-Mannheim/tesseract/wiki
- Poppler: https://github.com/oschwartz10612/poppler-windows/releases
Update the paths in `app/munshi_ui.py` if either is installed elsewhere.
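On Windows, neither tool is on `PATH` by default, so the Python wrappers have to be pointed at the installed binaries. A config sketch with the standard installer locations (these exact paths are assumptions; check them against `app/munshi_ui.py`):

```python
import pytesseract

# Default Windows install path; adjust if Tesseract lives elsewhere.
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Poppler's bin folder is passed to pdf2image per call, e.g.:
#   convert_from_path("scan.pdf", dpi=300, poppler_path=r"C:\poppler\Library\bin")
```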
### Step 7 — Start Munshi
Two terminals:
**Terminal 1 — Start the LLM server:**
```powershell
cd llama-server
.\start_server.ps1
```
**Terminal 2 — Ingest documents and start UI:**
```powershell
cd code
python ingest_all_clients.py
cd ..\app
streamlit run munshi_ui.py
```
Open http://localhost:8501 in your browser.
## Project Structure
```
Munshi/
├── app/
│   ├── munshi_ui.py              # Streamlit web interface
│   └── .streamlit/
│       └── config.toml           # Theme + server config
├── code/
│   ├── generate_full_dataset.py  # Synthetic CA dataset generator
│   ├── ingest_all_clients.py     # PDF ingestion pipeline
│   ├── query_full_firm.py        # CLI query tool
│   ├── rag_first_query.py        # Quick RAG test
│   └── test_*.py                 # Various sanity tests
├── data/
│   └── sharma_associates/        # Synthetic dataset (fictional firm)
│       ├── acme_trading/
│       ├── krishna_restaurant/
│       ├── mehta_clinic/
│       ├── patel_textiles/
│       └── techflow_solutions/
├── start_server.ps1              # Launch llama-server with CUDA
├── requirements.txt              # Python dependencies
├── .gitignore
├── LICENSE
└── README.md
```
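Multi-client isolation follows directly from this layout: each client folder under `data/sharma_associates/` gets its own vector collection, so retrieval for one client can never surface another client's documents. A stdlib-only sketch of the folder walk (the collection naming scheme is an assumption, not necessarily what `ingest_all_clients.py` does):

```python
from pathlib import Path

def collection_name(client_dir: str) -> str:
    """Map a client folder name to a per-client Qdrant collection name."""
    return "munshi_" + client_dir.lower().replace(" ", "_")

def discover_clients(data_root: str) -> dict[str, list[Path]]:
    """Find each client folder under the data root and list its PDFs."""
    clients = {}
    for client in sorted(Path(data_root).iterdir()):
        if client.is_dir():
            clients[collection_name(client.name)] = sorted(client.rglob("*.pdf"))
    return clients

# Each client's PDFs would then be chunked, embedded with BGE-small,
# and upserted into that client's own collection.
```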
## Performance
Tested on Acer Nitro 5 — RTX 3050 4GB, i5-12500H, 16GB RAM:
- **Cold load:** ~30 seconds (model + embeddings)
- **Inference:** 49 tok/s (full GPU offload, flash attention)
- **Warm queries:** ~70 tok/s
- **VRAM usage:** 2.3 GB / 3.3 GB available
- **Cross-client RAG queries:** 4-15 seconds end-to-end
For production deployment with Qwen 14B on RTX 4090 24GB, expect ~50 tok/s and dramatically better answer quality.
## Roadmap
### Phase 1 — Local Prototype ✅ COMPLETE
- [x] Local LLM serving with CUDA
- [x] RAG pipeline with Qdrant
- [x] OCR for scanned documents
- [x] Multi-client document organization
- [x] Web UI with citations
- [x] Live document upload via UI
### Phase 2 — Production Architecture (Next)
- [ ] Hardware spec for CA firm server
- [ ] Network architecture (LAN + VPN for WFH)
- [ ] Multi-user authentication
- [ ] Role-based access control
- [ ] Audit logging
### Phase 3 — High-Value Workflows
- [ ] GST notice reply drafter (DRC-01, ASMT-10)
- [ ] GSTR-2B vs Purchase Register reconciliation
- [ ] Form 26AS vs TDS book reconciliation
- [ ] Client communication drafter
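At its core, the planned GSTR-2B vs purchase register reconciliation is a keyed match between two invoice lists. An illustrative stdlib sketch of that matching logic (field names and the rupee tolerance are assumptions about how the workflow might look, not implemented code):

```python
def _key(row: dict) -> tuple:
    """Match invoices on (supplier GSTIN, invoice number)."""
    return (row["gstin"], row["invoice_no"])

def reconcile(gstr2b: list[dict], purchase_register: list[dict],
              tolerance: float = 1.0) -> dict[str, list]:
    """Bucket invoices into matched / mismatched / missing on either side."""
    in_2b = {_key(r): r for r in gstr2b}
    in_pr = {_key(r): r for r in purchase_register}

    result = {"matched": [], "tax_mismatch": [],
              "missing_in_2b": [], "missing_in_pr": []}
    for k, pr_row in in_pr.items():
        if k not in in_2b:
            result["missing_in_2b"].append(pr_row)        # ITC at risk
        elif abs(in_2b[k]["tax"] - pr_row["tax"]) > tolerance:
            result["tax_mismatch"].append((in_2b[k], pr_row))
        else:
            result["matched"].append(pr_row)
    for k, row_2b in in_2b.items():
        if k not in in_pr:
            result["missing_in_pr"].append(row_2b)        # unbooked purchase
    return result
```

The real workflow would feed parsed GSTR-2B JSON and the firm's purchase register into this kind of matcher, then hand the mismatch buckets to the LLM for a drafted explanation.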
### Phase 4 — Deployment Kit
- [ ] One-click installer
- [ ] Hardware test suite
- [ ] Backup automation
- [ ] Update mechanism
## License
MIT — see [LICENSE](LICENSE) file.
## Author
**Poojith Devan**
MCA (Generative AI), SRM University
MSc (AI & Data Science), O.P. Jindal Global University
GitHub: [@poojithdevan4D](https://github.com/poojithdevan4D)
---
*Built with care for Indian CA firms who deserve modern AI without compromising client confidentiality.*