An OpenClaw plugin that gives your AI bot semantic search over your local files — research papers, experiment logs, code, and personal notes.
Built as a standalone plugin project, modeled after openclaw-claude-code. Does not modify OpenClaw source code.
User asks bot → bot calls kb_search → query embedded → cosine search in LanceDB → top-k chunks returned → bot answers
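The retrieval step above can be sketched in a few lines, as a simplified in-memory stand-in for the LanceDB cosine search (the real plugin queries LanceDB directly; this just shows the ranking logic):

```python
import numpy as np

def top_k(query_vec: np.ndarray, chunk_vecs: np.ndarray, k: int = 5) -> list[int]:
    """Return indices of the k chunks most similar to the query (cosine)."""
    # Normalize rows so a plain dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = c @ q
    return np.argsort(scores)[::-1][:k].tolist()
```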
Two components:
| Component | Language | Role |
|---|---|---|
| `ingest/kb_ingest.py` | Python | One-time ingestion: chunk files → embed → write to LanceDB |
| `index.ts` + `src/` | TypeScript | OpenClaw plugin: registers `kb_search` tool at runtime |
Both share the same LanceDB directory at ~/.openclaw/kb/lancedb, separate from conversation memory.
- OpenClaw installed and running
- Node.js ≥ 18
- Python ≥ 3.10
- OpenAI API key (for embeddings — `text-embedding-3-small`)

Planned: Ollama local embedding support (`nomic-embed-text`) to remove the OpenAI dependency entirely.
```bash
cd ~/Workplace/BotSpace/openclaw-local-kb
npm install
pip install -r ingest/requirements.txt
```

```bash
# Add to ~/.zshrc or ~/.bashrc
export OPENAI_API_KEY="sk-proj-..."
```

```bash
openclaw plugins install -l ~/Workplace/BotSpace/openclaw-local-kb
```

Optional config fields: `dbPath`, `embeddingModel`, `defaultLimit`.

```bash
openclaw gateway restart
```

Run ingestion before using the plugin. Three domains are supported:
| Domain | Contents |
|---|---|
| `research` | Research projects, papers, experiment logs |
| `projects` | Competition / Kaggle / engineering projects |
| `personal` | Blog posts, study notes, handbooks |
```bash
# Research
python ingest/kb_ingest.py --source ~/Workplace/SIR-SID --domain research
python ingest/kb_ingest.py --source ~/Workplace/WettingDynamic --domain research
python ingest/kb_ingest.py --source ~/Workplace/Efficient-SAV --domain research

# Projects
python ingest/kb_ingest.py --source ~/Workplace/MainPowerGrid --domain projects
python ingest/kb_ingest.py --source ~/Workplace/2024MathorCup --domain projects
python ingest/kb_ingest.py --source ~/Workplace/Kaggle --domain projects

# Personal (exclude sensitive folders with --exclude)
python ingest/kb_ingest.py --source ~/Documents/Blog --domain personal
python ingest/kb_ingest.py --source ~/Documents/Study\ Materials --domain personal
python ingest/kb_ingest.py --source ~/Documents/Personal\ Documents --domain personal --exclude "Security Codes"
```
```bash
# Check what's indexed
python ingest/kb_ingest.py --status

# Preview without writing
python ingest/kb_ingest.py --source ~/Workplace/SIR-SID --domain research --dry-run

# Re-embed from scratch
python ingest/kb_ingest.py --source ~/Workplace/SIR-SID --domain research --rebuild
```

Re-running is safe. Already-indexed chunks (matched by `source` + `chunk_id`) are skipped automatically.
Each batch of 50 chunks is written immediately after embedding. If interrupted, already-written batches are preserved and skipped on the next run.
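The skip-and-batch behavior can be sketched as follows, assuming each chunk carries the `source` and `chunk_id` fields mentioned above (an illustrative version, not the actual `kb_ingest.py` code):

```python
def plan_batches(chunks, indexed_keys, batch_size=50):
    """Split not-yet-indexed chunks into write batches.

    `indexed_keys` is the set of (source, chunk_id) pairs already in the
    table; matching chunks are skipped, which is why re-runs and resuming
    after an interruption are safe.
    """
    pending = [c for c in chunks
               if (c["source"], c["chunk_id"]) not in indexed_keys]
    return [pending[i:i + batch_size]
            for i in range(0, len(pending), batch_size)]
```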
| Extension | Chunking strategy |
|---|---|
| `.md`, `.rst` | Split by H1/H2/H3 heading |
| `.py` | Split by top-level `def` / `class` |
| `.pdf` | Extract text, split by paragraph |
| `.ipynb` | One chunk per cell |
| `.txt` | Split by paragraph |
Binary files (.hdf5, .h5, .pkl, images, etc.) and system directories (.git, node_modules, __pycache__, etc.) are skipped automatically.
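As an illustration of the `.md` strategy, a minimal heading-based splitter might look like this (a sketch, not the actual ingestion code):

```python
import re

def chunk_markdown(text: str) -> list[str]:
    """Split a markdown document at H1/H2/H3 headings.

    Each chunk starts with its heading; any preamble before the first
    heading becomes its own chunk. H4+ headings stay inside their chunk.
    """
    chunks, current = [], []
    for line in text.splitlines():
        if re.match(r"^#{1,3}\s", line) and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return [c for c in chunks if c.strip()]
```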
| Metric | Value |
|---|---|
| Storage per chunk | ~6 KB (1536-dim float32 vector + text) |
| ~2,500 chunks (one project) | ~20 MB |
| Full workspace | ~50–100 MB |
| Ingestion cost (2,500 chunks) | ~$0.01 |
| Per search query | ~$0.000001 |
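The per-chunk figure follows from the embedding dimension; a quick back-of-envelope check:

```python
# A 1536-dim float32 vector is 1536 * 4 bytes; chunk text and metadata
# ride on top of that, giving the ~6 KB/chunk figure in the table.
vector_bytes = 1536 * 4                      # 6144 bytes
total_mb = 2500 * vector_bytes / 1024 / 1024
print(round(total_mb, 1))                    # 14.6 MB of raw vectors;
                                             # text plus index overhead
                                             # brings one project to ~20 MB
```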
The bot calls `kb_search` automatically when relevant. Example queries:

- What hyperparameters did SIR-SID Exp 10 use?
- What is the stability condition for RelaxSAV?
- What is the physical meaning of `compute_gamma` in the prewetting solver?
- What LSTM architecture did I use in MathorCup?
```
openclaw-local-kb/
├── index.ts                 # Plugin entry — registers kb_search tool
├── src/
│   ├── embeddings.ts        # OpenAI embedding wrapper
│   ├── lancedb-client.ts    # LanceDB read/search client
│   └── kb-search.ts         # kb_search tool definition
├── ingest/
│   ├── kb_ingest.py         # Ingestion CLI script
│   └── requirements.txt
├── openclaw.plugin.json     # Plugin metadata
├── package.json
└── README.md
```
"Knowledge base table not found" → Run the ingestion script first.
"No OpenAI API key found" → Set OPENAI_API_KEY env var or add openaiApiKey to plugin config.
Plugin not loading
openclaw plugins doctor
openclaw gateway logsrm -rf ~/.openclaw/kb/
openclaw plugins uninstall local-kb
# Remove "local-kb" entry from ~/.openclaw/openclaw.json
openclaw gateway restartSource code is not affected.
- Ollama local embedding support (`nomic-embed-text`) — no OpenAI dependency
- BM25 keyword search fallback for offline scenarios
- File watcher for automatic re-ingestion on changes
- `--exclude-pattern` glob support
```json
{
  "plugins": {
    "entries": {
      "local-kb": {
        "enabled": true,
        "config": {
          "openaiApiKey": "${OPENAI_API_KEY}"
        }
      }
    }
  }
}
```