openclaw-local-kb

An OpenClaw plugin that gives your AI bot semantic search over your local files — research papers, experiment logs, code, and personal notes.

Built as a standalone plugin project modeled after openclaw-claude-code; it does not modify OpenClaw source code.


How It Works

User asks bot → bot calls kb_search → query embedded → cosine search in LanceDB → top-k chunks returned → bot answers

Two components:

| Component | Language | Role |
|---|---|---|
| `ingest/kb_ingest.py` | Python | One-time ingestion: chunk files → embed → write to LanceDB |
| `index.ts` + `src/` | TypeScript | OpenClaw plugin: registers the `kb_search` tool at runtime |

Both share the same LanceDB directory at ~/.openclaw/kb/lancedb, separate from conversation memory.
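The runtime search step can be sketched as follows — a minimal, dependency-free stand-in for the real embedding + LanceDB calls, with illustrative function names:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def kb_search(query_vec, chunks, k=5):
    """Rank (text, vector) chunks by cosine similarity to the query, best first."""
    scored = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return scored[:k]
```

In the actual plugin, `query_vec` comes from the embeddings wrapper and the ranking happens inside LanceDB; the top-k chunk texts are then handed back to the bot as tool output.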


Prerequisites

  • OpenClaw installed and running
  • Node.js ≥ 18
  • Python ≥ 3.10
  • OpenAI API key (for embeddings — text-embedding-3-small)

Planned: Ollama local embedding support (nomic-embed-text) to remove the OpenAI dependency entirely.


Installation

1. Install dependencies

cd ~/Workplace/BotSpace/openclaw-local-kb
npm install
pip install -r ingest/requirements.txt

2. Set OpenAI API key

# Add to ~/.zshrc or ~/.bashrc
export OPENAI_API_KEY="sk-proj-..."

3. Install the plugin

openclaw plugins install -l ~/Workplace/BotSpace/openclaw-local-kb

4. Configure ~/.openclaw/openclaw.json

{
  "plugins": {
    "entries": {
      "local-kb": {
        "enabled": true,
        "config": {
          "openaiApiKey": "${OPENAI_API_KEY}"
        }
      }
    }
  }
}

Optional config fields: dbPath, embeddingModel, defaultLimit.
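A fully populated entry might look like this (the extra field values are illustrative defaults, not verified against the plugin source):

```json
{
  "plugins": {
    "entries": {
      "local-kb": {
        "enabled": true,
        "config": {
          "openaiApiKey": "${OPENAI_API_KEY}",
          "dbPath": "~/.openclaw/kb/lancedb",
          "embeddingModel": "text-embedding-3-small",
          "defaultLimit": 5
        }
      }
    }
  }
}
```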

5. Restart gateway

openclaw gateway restart

Ingesting Files

Run ingestion before using the plugin. Three domains are supported:

| Domain | Contents |
|---|---|
| `research` | Research projects, papers, experiment logs |
| `projects` | Competition / Kaggle / engineering projects |
| `personal` | Blog posts, study notes, handbooks |

Full ingestion commands

# Research
python ingest/kb_ingest.py --source ~/Workplace/SIR-SID --domain research
python ingest/kb_ingest.py --source ~/Workplace/WettingDynamic --domain research
python ingest/kb_ingest.py --source ~/Workplace/Efficient-SAV --domain research

# Projects
python ingest/kb_ingest.py --source ~/Workplace/MainPowerGrid --domain projects
python ingest/kb_ingest.py --source ~/Workplace/2024MathorCup --domain projects
python ingest/kb_ingest.py --source ~/Workplace/Kaggle --domain projects

# Personal (exclude sensitive folders with --exclude)
python ingest/kb_ingest.py --source ~/Documents/Blog --domain personal
python ingest/kb_ingest.py --source ~/Documents/Study\ Materials --domain personal
python ingest/kb_ingest.py --source ~/Documents/Personal\ Documents --domain personal --exclude "Security Codes"

# Check what's indexed
python ingest/kb_ingest.py --status

# Preview without writing
python ingest/kb_ingest.py --source ~/Workplace/SIR-SID --domain research --dry-run

# Re-embed from scratch
python ingest/kb_ingest.py --source ~/Workplace/SIR-SID --domain research --rebuild

Incremental ingestion

Re-running is safe. Already-indexed chunks (matched by source + chunk_id) are skipped automatically.
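The skip logic amounts to a set-membership check on the identity pair — a sketch under the assumption that each chunk record carries `source` and `chunk_id` fields (the helper name is illustrative):

```python
def filter_new_chunks(chunks, existing_keys):
    """Keep only chunks whose (source, chunk_id) pair is not yet indexed.

    chunks: list of dicts with "source" and "chunk_id" keys
    existing_keys: set of (source, chunk_id) tuples already in the table
    """
    return [c for c in chunks
            if (c["source"], c["chunk_id"]) not in existing_keys]
```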

Checkpoint writes

Each batch of 50 chunks is written immediately after embedding. If interrupted, already-written batches are preserved and skipped on the next run.
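The batching itself is simple slicing — a sketch of the pattern, with the write call left abstract:

```python
def batched(items, batch_size=50):
    """Yield successive fixed-size batches; each is embedded and written at once,
    so an interruption loses at most one in-flight batch."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```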


Supported File Types

| Extension | Chunking strategy |
|---|---|
| `.md`, `.rst` | Split by H1/H2/H3 heading |
| `.py` | Split by top-level `def` / `class` |
| `.pdf` | Extract text, split by paragraph |
| `.ipynb` | One chunk per cell |
| `.txt` | Split by paragraph |

Binary files (.hdf5, .h5, .pkl, images, etc.) and system directories (.git, node_modules, __pycache__, etc.) are skipped automatically.
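The Markdown strategy, for instance, can be approximated with a regex split on H1–H3 headings — a simplified sketch, not the actual chunker:

```python
import re

def chunk_markdown(text):
    """Split Markdown into chunks, one per H1/H2/H3 section."""
    # Split at line starts that begin a heading, keeping the heading with its body.
    parts = re.split(r"(?m)^(?=#{1,3} )", text)
    return [p.strip() for p in parts if p.strip()]
```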


Disk Usage & Cost

| Metric | Value |
|---|---|
| Storage per chunk | ~6 KB (1536-dim float32 vector + text) |
| ~2,500 chunks (one project) | ~20 MB |
| Full workspace | ~50–100 MB |
| Ingestion cost (2,500 chunks) | ~$0.01 |
| Per search query | ~$0.000001 |
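The per-chunk figure follows mostly from the embedding dimension — a quick sanity check (variable names are illustrative):

```python
# text-embedding-3-small produces 1536-dim vectors stored as float32.
dim = 1536
vector_bytes = dim * 4          # 4 bytes per float32 dimension
vector_kb = vector_bytes / 1024 # 6.0 KB per vector
# The stored chunk text and metadata account for the small remainder of ~6 KB.
```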

Usage in Slack

The bot calls kb_search automatically when relevant. Example queries:

Which hyperparameters did SIR-SID Exp 10 use?
What is the stability condition for RelaxSAV?
What is the physical meaning of compute_gamma in the prewetting solver?
What LSTM architecture did I use in MathorCup?

Project Structure

openclaw-local-kb/
├── index.ts                  # Plugin entry — registers kb_search tool
├── src/
│   ├── embeddings.ts         # OpenAI embedding wrapper
│   ├── lancedb-client.ts     # LanceDB read/search client
│   └── kb-search.ts          # kb_search tool definition
├── ingest/
│   ├── kb_ingest.py          # Ingestion CLI script
│   └── requirements.txt
├── openclaw.plugin.json      # Plugin metadata
├── package.json
└── README.md

Troubleshooting

"Knowledge base table not found" → Run the ingestion script first.

"No OpenAI API key found" → Set OPENAI_API_KEY env var or add openaiApiKey to plugin config.

Plugin not loading

openclaw plugins doctor
openclaw gateway logs

Uninstalling

rm -rf ~/.openclaw/kb/
openclaw plugins uninstall local-kb
# Remove "local-kb" entry from ~/.openclaw/openclaw.json
openclaw gateway restart

Source code is not affected.


Roadmap

  • Ollama local embedding support (nomic-embed-text) — no OpenAI dependency
  • BM25 keyword search fallback for offline scenarios
  • File watcher for automatic re-ingestion on changes
  • --exclude-pattern glob support
