KnowledgeManager MCP is a highly efficient local server that implements the Model Context Protocol (MCP). It acts as a bridge, providing Large Language Models (LLMs) with direct, tool-based access to a curated knowledge base built from local PDF documents.
It automatically ingests PDF files from a specified data directory, processes them into semantic text chunks, and creates a fast, in-memory vector database. It then exposes two primary tools to any MCP-compatible LLM client (such as Claude Desktop):
- Query Knowledge Base: Allows the LLM to ask general questions and retrieve the most relevant information across the entire document corpus.
- Search Specific Documents: Allows the LLM to target its questions to specific PDF files by name, narrowing down the search scope for higher accuracy.
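A minimal sketch of how these two tools could be exposed through FastMCP, assuming a module-level `vector_store` FAISS index built at startup (see the ingestion sketch further down). The tool names, parameters, and `k` value here are illustrative, not the project's actual code:

```python
from pathlib import Path

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("KnowledgeManager")  # illustrative server name

# The in-memory FAISS index built at startup (see the ingestion sketch below).
vector_store = None


@mcp.tool()
async def query_knowledge_base(question: str) -> str:
    """Return the most relevant chunks from the whole PDF corpus."""
    docs = await vector_store.asimilarity_search(question, k=5)
    return "\n\n".join(doc.page_content for doc in docs)


@mcp.tool()
async def search_specific_documents(question: str, file_names: list[str]) -> str:
    """Return relevant chunks drawn only from the named PDF files."""
    docs = await vector_store.asimilarity_search(
        question,
        k=5,
        # Keep only chunks whose source file name matches one of the requested PDFs.
        filter=lambda metadata: Path(metadata.get("source", "")).name in file_names,
    )
    return "\n\n".join(doc.page_content for doc in docs)


if __name__ == "__main__":
    mcp.run(transport="stdio")
```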
Additionally, it maintains a local SQLite database (`logs.db`) that audits every tool execution, recording the query the AI issued and the output it received.
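A minimal sketch of what that audit logging could look like; the table name and columns are assumptions, not the project's actual schema:

```python
import sqlite3
from datetime import datetime, timezone


def init_log_db(path: str = "logs.db") -> sqlite3.Connection:
    """Create (or open) the audit database with a single tool_calls table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tool_calls (
               id        INTEGER PRIMARY KEY AUTOINCREMENT,
               timestamp TEXT NOT NULL,
               tool      TEXT NOT NULL,
               query     TEXT NOT NULL,
               output    TEXT NOT NULL
           )"""
    )
    conn.commit()
    return conn


def log_tool_call(conn: sqlite3.Connection, tool: str, query: str, output: str) -> None:
    """Append one audit record for a single tool execution."""
    conn.execute(
        "INSERT INTO tool_calls (timestamp, tool, query, output) VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), tool, query, output),
    )
    conn.commit()
```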
When the server (`src.py`) starts, it walks through the following steps (a rough sketch of the pipeline follows the list):
- Database Initialization: It creates or connects to a SQLite database (`logs.db`) for usage logging.
- Document Loading: It asynchronously scans the `../data` directory for `.pdf` files and loads them in parallel using LangChain's `PyPDFLoader`.
- Chunking: The extracted text is split with `RecursiveCharacterTextSplitter` (1000-character chunks with 200-character overlaps) to preserve context continuity.
- Remote Embedding Generation: It uses `HuggingFaceInferenceAPIEmbeddings` to generate text embeddings with the `BAAI/bge-large-en-v1.5` model via the HuggingFace Inference API.
- Vector Indexing: It indexes these embeddings in an in-memory `FAISS` (Facebook AI Similarity Search) store.
- Tool Exposure: Finally, the MCP server boots up via `FastMCP` over `stdio` transport.

When the LLM requests a search, the server performs asynchronous similarity searches against the FAISS index, filters the results based on query length and confidence scores, and constructs an answer for the LLM.
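Under those assumptions, the startup pipeline might look roughly like this. The data directory, chunk sizes, and model name come from the description above; the function structure is illustrative, and the LangChain import paths depend on the installed version:

```python
import asyncio
from pathlib import Path

from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter


async def build_index(data_dir: str = "../data", hf_token: str = "...") -> FAISS:
    # Load every PDF in the data directory concurrently.
    loaders = [PyPDFLoader(str(p)) for p in Path(data_dir).glob("*.pdf")]
    pages_per_file = await asyncio.gather(*(loader.aload() for loader in loaders))
    documents = [page for pages in pages_per_file for page in pages]

    # Split into overlapping chunks so context carries across chunk boundaries.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks = splitter.split_documents(documents)

    # Generate embeddings remotely via the HuggingFace Inference API.
    embeddings = HuggingFaceInferenceAPIEmbeddings(
        api_key=hf_token, model_name="BAAI/bge-large-en-v1.5"
    )

    # Index the chunks in an in-memory FAISS store.
    return await FAISS.afrom_documents(chunks, embeddings)
```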
- Model Context Protocol (FastMCP): Provides a standardized, zero-configuration way to expose Python code as LLM tools.
- LangChain: Simplifies the pipeline for loading, splitting, and managing complex text documents.
- FAISS: Provides blazingly fast in-memory similarity search without the overhead of spinning up a dedicated vector database service like Pinecone or Weaviate.
- HuggingFace Inference API: By offloading the heavy embedding generation to a cloud API, the host machine uses virtually zero CPU/GPU resources for inference, allowing it to run smoothly in the background on any hardware.
- SQLite: A self-contained, serverless database that provides out-of-the-box audit logs without complex setup requirements.
- Python Asyncio: Used extensively throughout document loading and vector querying to ensure the server remains highly responsive and non-blocking.
- Zero-Hallucination Answers: LLMs have no native access to your private files. This system connects them directly to your PDFs, grounding the AI's answers in your actual documents.
- Granular Auditing: In enterprise or professional environments, knowing exactly what the AI was searching for and what context it received is critical. The built-in logging provides complete transparency.
- Resource Efficiency: By avoiding heavy local embedding models and persistent, disk-heavy vector databases, this project is lightweight enough to run 24/7 on a laptop while still providing enterprise-grade search speed.
- Intelligent Filtering: The dynamic filtering logic actively adjusts confidence thresholds based on whether the AI is asking a simple or complex question, stripping out irrelevant noise and improving AI performance.
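A rough sketch of that filtering idea; the distance thresholds and word-count heuristic below are assumptions, not the project's actual values:

```python
async def filtered_search(vector_store, query: str, k: int = 8) -> list[str]:
    """Search FAISS and keep only chunks above a query-dependent confidence cutoff."""
    # FAISS returns L2 distances by default, so smaller scores mean closer matches.
    results = await vector_store.asimilarity_search_with_score(query, k=k)

    # Short queries tend to be vaguer, so allow a looser cutoff; longer, more
    # specific queries get a stricter one.  The numbers are purely illustrative.
    max_distance = 1.2 if len(query.split()) < 6 else 0.9

    return [doc.page_content for doc, score in results if score <= max_distance]
```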