KnowledgeManager MCP is a highly efficient local server that implements the Model Context Protocol (MCP). It acts as a bridge, providing Large Language Models (LLMs) with direct, tool-based access to a curated knowledge base built from local PDF documents.
It automatically ingests PDF files from a specified data directory, processes them into semantic text chunks, and creates a fast, in-memory vector database. It then exposes two primary tools to any MCP-compatible LLM client (such as Claude Desktop):
- Query Knowledge Base: Allows the LLM to ask general questions and retrieve the most relevant information across the entire document corpus.
- Search Specific Documents: Allows the LLM to target its questions to specific PDF files by name, narrowing down the search scope for higher accuracy.
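A minimal sketch of how these two tools could be exposed through FastMCP, assuming a module-level `vector_store` FAISS index built at startup (see the ingestion sketch further down). The tool names, parameters, and `k` value here are illustrative, not the project's actual code:

```python
from pathlib import Path

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("KnowledgeManager")  # illustrative server name

# The in-memory FAISS index built at startup (see the ingestion sketch below).
vector_store = None


@mcp.tool()
async def query_knowledge_base(question: str) -> str:
    """Return the most relevant chunks from the whole PDF corpus."""
    docs = await vector_store.asimilarity_search(question, k=5)
    return "\n\n".join(doc.page_content for doc in docs)


@mcp.tool()
async def search_specific_documents(question: str, file_names: list[str]) -> str:
    """Return relevant chunks drawn only from the named PDF files."""
    docs = await vector_store.asimilarity_search(
        question,
        k=5,
        # Keep only chunks whose source file name matches one of the requested PDFs.
        filter=lambda metadata: Path(metadata.get("source", "")).name in file_names,
    )
    return "\n\n".join(doc.page_content for doc in docs)


if __name__ == "__main__":
    mcp.run(transport="stdio")
```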
Additionally, it maintains a local SQLite database (`logs.db`) that audits every tool execution, recording the query the AI issued and the output it received.
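A minimal sketch of what that audit logging could look like; the table name and columns are assumptions, not the project's actual schema:

```python
import sqlite3
from datetime import datetime, timezone


def init_log_db(path: str = "logs.db") -> sqlite3.Connection:
    """Create (or open) the audit database with a single tool_calls table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tool_calls (
               id        INTEGER PRIMARY KEY AUTOINCREMENT,
               timestamp TEXT NOT NULL,
               tool      TEXT NOT NULL,
               query     TEXT NOT NULL,
               output    TEXT NOT NULL
           )"""
    )
    conn.commit()
    return conn


def log_tool_call(conn: sqlite3.Connection, tool: str, query: str, output: str) -> None:
    """Append one audit record for a single tool execution."""
    conn.execute(
        "INSERT INTO tool_calls (timestamp, tool, query, output) VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), tool, query, output),
    )
    conn.commit()
```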
When the server (`src.py`) starts, it walks through the following steps (a rough sketch of the pipeline follows the list):
- Database Initialization: It creates or connects to a SQLite database (`logs.db`) for usage logging.
- Document Loading: It asynchronously scans the `../data` directory for `.pdf` files and loads them in parallel using LangChain's `PyPDFLoader`.
- Chunking: The extracted text is split with `RecursiveCharacterTextSplitter` (1000-character chunks with 200-character overlaps) to preserve context continuity.
- Remote Embedding Generation: It uses `HuggingFaceInferenceAPIEmbeddings` to generate text embeddings with the `BAAI/bge-large-en-v1.5` model via the HuggingFace Inference API.
- Vector Indexing: It indexes these embeddings in an in-memory `FAISS` (Facebook AI Similarity Search) store.
- Tool Exposure: Finally, the MCP server boots up via `FastMCP` over `stdio` transport.

When the LLM requests a search, the server performs asynchronous similarity searches against the FAISS index, filters the results based on query length and confidence scores, and constructs an answer for the LLM.
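Under those assumptions, the startup pipeline might look roughly like this. The data directory, chunk sizes, and model name come from the description above; the function structure is illustrative, and the LangChain import paths depend on the installed version:

```python
import asyncio
from pathlib import Path

from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter


async def build_index(data_dir: str = "../data", hf_token: str = "...") -> FAISS:
    # Load every PDF in the data directory concurrently.
    loaders = [PyPDFLoader(str(p)) for p in Path(data_dir).glob("*.pdf")]
    pages_per_file = await asyncio.gather(*(loader.aload() for loader in loaders))
    documents = [page for pages in pages_per_file for page in pages]

    # Split into overlapping chunks so context carries across chunk boundaries.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks = splitter.split_documents(documents)

    # Generate embeddings remotely via the HuggingFace Inference API.
    embeddings = HuggingFaceInferenceAPIEmbeddings(
        api_key=hf_token, model_name="BAAI/bge-large-en-v1.5"
    )

    # Index the chunks in an in-memory FAISS store.
    return await FAISS.afrom_documents(chunks, embeddings)
```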
- Model Context Protocol (FastMCP): Provides a standardized, zero-configuration way to expose Python code as LLM tools.
- LangChain: Simplifies the pipeline for loading, splitting, and managing complex text documents.
- FAISS: Provides blazingly fast in-memory similarity search without the overhead of spinning up a dedicated vector database service like Pinecone or Weaviate.
- HuggingFace Inference API: By offloading the heavy embedding generation to a cloud API, the host machine uses virtually zero CPU/GPU resources for inference, allowing it to run smoothly in the background on any hardware.
- SQLite: A self-contained, serverless database that provides out-of-the-box audit logs without complex setup requirements.
- Python Asyncio: Used extensively throughout document loading and vector querying to ensure the server remains highly responsive and non-blocking.
- Zero-Hallucination Answers: LLMs have no native access to your private files. This system connects them directly to your PDFs, grounding the AI's answers in your actual documents.
- Granular Auditing: In enterprise or professional environments, knowing exactly what the AI was searching for and what context it received is critical. The built-in logging provides complete transparency.
- Resource Efficiency: By avoiding heavy local embedding models and persistent, disk-heavy vector databases, this project is lightweight enough to run 24/7 on a laptop while still providing enterprise-grade search speed.
- Intelligent Filtering: The dynamic filtering logic actively adjusts confidence thresholds based on whether the AI is asking a simple or complex question, stripping out irrelevant noise and improving AI performance.
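A rough sketch of that filtering idea; the distance thresholds and word-count heuristic below are assumptions, not the project's actual values:

```python
async def filtered_search(vector_store, query: str, k: int = 8) -> list[str]:
    """Search FAISS and keep only chunks above a query-dependent confidence cutoff."""
    # FAISS returns L2 distances by default, so smaller scores mean closer matches.
    results = await vector_store.asimilarity_search_with_score(query, k=k)

    # Short queries tend to be vaguer, so allow a looser cutoff; longer, more
    # specific queries get a stricter one.  The numbers are purely illustrative.
    max_distance = 1.2 if len(query.split()) < 6 else 0.9

    return [doc.page_content for doc, score in results if score <= max_distance]
```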