Merged
20 commits
696cf70
feat: add traul_meta table for version tracking
dandaka Mar 18, 2026
0c97078
feat: export CHUNKER_VERSION constant
dandaka Mar 18, 2026
4d35356
feat: add getMeta/setMeta for version tracking
dandaka Mar 18, 2026
aa84ee6
feat: add resetSyncCursors and resetChunks methods
dandaka Mar 18, 2026
d660d54
feat: add auto-migration for chunker/embed version changes
dandaka Mar 18, 2026
b5ec39c
feat: run auto-migration on startup
dandaka Mar 18, 2026
abf14d0
feat: add traul reset command for manual data layer resets
dandaka Mar 18, 2026
4271e9f
docs: document traul reset command and auto-migration
dandaka Mar 18, 2026
5593df4
chore: bump version to 0.2.0
dandaka Mar 18, 2026
f2ca8ea
chore: add node-llama-cpp dependency
dandaka Mar 18, 2026
fbe2cfb
feat: add llama.ts formatting helpers with tests
dandaka Mar 18, 2026
ad8d2e1
feat: add llama.ts model wrapper with singleton, embed methods
dandaka Mar 18, 2026
7f49611
feat: route embeddings through llama.ts with Ollama fallback
dandaka Mar 18, 2026
b38fdef
feat: use embedQuery() for search queries
dandaka Mar 18, 2026
22c064b
fix: resolve mock.module conflicts between test files
dandaka Mar 18, 2026
30eebcb
docs: document node-llama-cpp embedding backend
dandaka Mar 18, 2026
ed31a55
chore: bump CHUNKER_VERSION to 2 to trigger rechunking
dandaka Mar 18, 2026
f20262c
fix: avoid SQLITE_BUSY by skipping unchanged meta writes in migrations
dandaka Mar 19, 2026
8d88a88
fix: suppress noisy node-llama-cpp token type warning
dandaka Mar 19, 2026
15dd5c7
fix: add LlamaLogLevel to node-llama-cpp mock in tests
dandaka Mar 19, 2026
264 changes: 257 additions & 7 deletions bun.lock

Large diffs are not rendered by default.

10 changes: 7 additions & 3 deletions package.json
@@ -1,6 +1,6 @@
{
"name": "traul",
-"version": "0.1.0",
+"version": "0.2.0",
"description": "Personal Intelligence Engine — watches communication streams, identifies patterns, surfaces actionable insights",
"license": "AGPL-3.0-only",
"repository": {
@@ -18,13 +18,17 @@
},
"dependencies": {
"@slack/web-api": "^7.9.1",
-"commander": "^13.1.0",
+"commander": "^13.1.0",
"googleapis": "^171.4.0",
"html-to-text": "^9.0.5",
"node-llama-cpp": "^3.18.1",
"sqlite-vec": "^0.1.7-alpha.2"
},
"devDependencies": {
"@types/bun": "latest",
"@types/html-to-text": "^9.0.4"
-}
+},
"trustedDependencies": [
"node-llama-cpp"
]
}
21 changes: 17 additions & 4 deletions skill.md
@@ -14,7 +14,7 @@ allowed-tools:

CLI tool that watches communication streams (Slack, Telegram, Discord, Linear, Gmail, Claude Code sessions, Markdown files, WhatsApp), indexes messages, detects patterns via signals, and surfaces actionable insights.

-**Runtime:** Bun + TypeScript | **DB:** SQLite (WAL mode, FTS5, sqlite-vec) | **Embeddings:** Ollama + nomic-embed-text | **Version:** 0.1.0
+**Runtime:** Bun + TypeScript | **DB:** SQLite (WAL mode, FTS5, sqlite-vec) | **Embeddings:** node-llama-cpp (Qwen3-Embedding-0.6B), Ollama fallback | **Version:** 0.2.0

**Project:** `/Users/dandaka/projects/traul`

@@ -39,10 +39,10 @@ Sync messages from communication sources incrementally.

### `traul search <query>`

-Hybrid search combining vector similarity (semantic) and FTS5 keyword matching with Reciprocal Rank Fusion. Falls back to FTS-only if Ollama is unavailable.
+Hybrid search combining vector similarity (semantic) and FTS5 keyword matching with Reciprocal Rank Fusion. Falls back to FTS-only if embedding is unavailable.

**Search modes:**
-- **Hybrid (default)** — best for multi-word and exploratory queries. Finds semantically related messages even when exact keywords don't appear. Requires Ollama running with `snowflake-arctic-embed2`. Prints coverage ratio to stderr (e.g. `88% vector, 12% FTS`). Falls back to FTS-only with a warning if Ollama is unavailable.
+- **Hybrid (default)** — best for multi-word and exploratory queries. Finds semantically related messages even when exact keywords don't appear. Uses node-llama-cpp with Qwen3-Embedding-0.6B (auto-downloads ~639MB model on first use). Falls back to Ollama, then FTS-only. Prints coverage ratio to stderr (e.g. `88% vector, 12% FTS`).
- **FTS-only (`--fts`)** — keyword matching with BM25 ranking. Faster, but requires ALL terms to match (implicit AND). Brittle with multi-word queries, especially combined with source/channel filters.
- **OR mode (`--or`)** — joins search terms with OR instead of AND. Works with both `--fts` and hybrid. Use for broad exploratory queries where any term is relevant.
- **Substring (`--like`)** — bypasses FTS entirely, uses SQL LIKE. Useful for exact phrases that FTS tokenization breaks (e.g. "how do I").
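
The hybrid mode described above merges a vector ranking and an FTS5 ranking with Reciprocal Rank Fusion. The sketch below shows the general technique only: the function name `reciprocalRankFusion`, the constant `k = 60` (a commonly used default), and the numeric message ids are assumptions for illustration and are not taken from traul's implementation, which is not shown in this diff.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d)).
// ASSUMPTION: k = 60 and this function name are illustrative, not traul's code.
function reciprocalRankFusion(rankings: number[][], k: number = 60): number[] {
  const scores = new Map<number, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Hypothetical message ids returned by each search path, best match first.
const vectorHits = [3, 1, 2];
const ftsHits = [1, 4, 3];
const fused = reciprocalRankFusion([vectorHits, ftsHits]);
console.log(fused); // [1, 3, 4, 2]
```

Messages that appear in both rankings (ids 1 and 3 here) rise to the top, which is why hybrid search can surface results that either path alone would rank lower.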
@@ -163,6 +163,19 @@ Structured overview with three sections:
2. **Stats** — total messages, channels, contacts, active signals
3. **Volume** — last 7 days message bar chart

### `traul reset`

Reset a data layer to force regeneration. Useful when you need to re-sync, re-chunk, or re-embed data.

| Subcommand | Description |
|------------|-------------|
| `traul reset sync [--source <source>]` | Clear sync cursors; full refetch on next sync. Optional `--source` flag filters to a specific connector (e.g., `markdown`, `slack`). |
| `traul reset chunks` | Delete all chunks and embeddings; rechunk on next sync. |
| `traul reset embed` | Drop and recreate vector tables; re-embed with `traul embed`. |
| `traul reset all` | Reset everything: sync cursors + chunks + embeddings. |

**Auto-migration:** Traul detects version changes on startup. If the chunking algorithm or the embedding model or dimensions changed since the last run, the affected data layers are reset automatically. No manual reset is needed after upgrading; chunks are regenerated on the next `traul sync`, and embeddings with `traul embed`.

### Global Options

| Option | Description |
@@ -286,7 +299,7 @@ SQL-based pattern detection engine.
|-------|---------|
| `messages` | Primary message store (source, channel, author, content, sent_at, metadata JSON) |
| `messages_fts` | FTS5 virtual table (content, author_name, channel_name) with porter tokenizer |
-| `vec_messages` | sqlite-vec virtual table for vector embeddings (float[768]) |
+| `vec_messages` | sqlite-vec virtual table for vector embeddings (float[1024]) |
| `contacts` | Unified contact directory (display_name unique) |
| `contact_identities` | Multi-source user mapping (source + source_user_id unique) |
| `sync_cursors` | Incremental sync state per source+key |
36 changes: 36 additions & 0 deletions src/commands/reset.ts
@@ -0,0 +1,36 @@
import type { TraulDB } from "../db/database";
import { EMBED_DIMS } from "../lib/embeddings";

type Layer = "sync" | "chunks" | "embed" | "all";

const VALID_LAYERS: Layer[] = ["sync", "chunks", "embed", "all"];

export function runReset(
db: TraulDB,
layer: string,
options: { source?: string }
): void {
if (!VALID_LAYERS.includes(layer as Layer)) {
throw new Error(`Unknown layer: ${layer}. Valid layers: ${VALID_LAYERS.join(", ")}`);
}

const doSync = layer === "sync" || layer === "all";
const doChunks = layer === "chunks" || layer === "all";
// "chunks" also resets embeddings, matching the documented behavior of 'traul reset chunks'.
const doEmbed = layer === "embed" || layer === "all" || layer === "chunks";

if (doSync) {
db.resetSyncCursors(options.source);
const scope = options.source ? `${options.source} sync cursors` : "all sync cursors";
console.log(`Reset ${scope}. Run 'traul sync' to refetch.`);
}

if (doChunks) {
db.resetChunks();
console.log("Reset all chunks. They will be regenerated on next 'traul sync' or 'traul embed'.");
}

if (doEmbed) {
db.resetEmbeddings(EMBED_DIMS);
console.log("Reset all embeddings. Run 'traul embed' to regenerate.");
}
}
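
The layer flags in `runReset` can be read as a small truth table. The sketch below restates them as a pure function so the mapping is easy to check; `planReset` and `ResetPlan` are hypothetical names introduced for illustration and are not part of this PR.

```typescript
type Layer = "sync" | "chunks" | "embed" | "all";

// HYPOTHETICAL helper type: which resets a given layer argument triggers.
interface ResetPlan {
  sync: boolean;
  chunks: boolean;
  embed: boolean;
}

// Mirrors the doSync / doChunks / doEmbed expressions in runReset.
function planReset(layer: Layer): ResetPlan {
  return {
    sync: layer === "sync" || layer === "all",
    chunks: layer === "chunks" || layer === "all",
    embed: layer === "embed" || layer === "all" || layer === "chunks",
  };
}

const layers: Layer[] = ["sync", "chunks", "embed", "all"];
for (const layer of layers) {
  console.log(layer, planReset(layer));
}
```

Note that `chunks` implies `embed` but not `sync`, so `traul reset chunks` leaves sync cursors untouched.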
6 changes: 3 additions & 3 deletions src/commands/search.ts
@@ -1,6 +1,6 @@
import type { TraulDB } from "../db/database";
import { formatMessage, writeJSON } from "../lib/formatter";
-import { embed, vecToBytes } from "../lib/embeddings";
+import { embedQuery, vecToBytes } from "../lib/embeddings";

export async function runSearch(
db: TraulDB,
@@ -45,15 +45,15 @@ export async function runSearch(
results = db.ftsSearchAll(ftsQuery, searchOpts);
} else {
try {
-const vec = await embed(query);
+const vec = await embedQuery(query);
results = db.hybridSearchAll(vecToBytes(vec), ftsQuery, searchOpts);
const { total_messages, embedded_messages } = db.getEmbeddingStats();
const pct = total_messages > 0 ? Math.round((embedded_messages / total_messages) * 100) : 0;
if (pct < 100) {
console.warn(`search: hybrid mode — ${pct}% vector, ${100 - pct}% FTS`);
}
} catch {
-console.warn("search: Ollama unavailable, falling back to FTS-only");
+console.warn("search: embedding unavailable, falling back to FTS-only");
results = db.ftsSearchAll(ftsQuery, searchOpts);
}
}
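
The documentation in this PR describes a fallback order of node-llama-cpp, then Ollama, then FTS-only. How `embedQuery` implements that order is not visible in this diff, so the following is only a sketch of the general pattern under stated assumptions: `Embedder`, `embedWithFallback`, `failing`, and `working` are hypothetical names, and the stand-in backends return fixed values rather than calling a real model.

```typescript
// HYPOTHETICAL type: any async function that turns text into a vector.
type Embedder = (text: string) => Promise<number[]>;

// Try each backend in order; return null when all fail so the caller
// can fall back to FTS-only search.
async function embedWithFallback(
  text: string,
  backends: Embedder[]
): Promise<number[] | null> {
  for (const backend of backends) {
    try {
      return await backend(text);
    } catch {
      // This backend is unavailable; try the next one.
    }
  }
  return null;
}

// Stand-in backends for illustration only.
const failing: Embedder = async () => {
  throw new Error("backend unavailable");
};
const working: Embedder = async () => [0.1, 0.2, 0.3];

embedWithFallback("hello", [failing, working]).then((vec) => {
  console.log(vec === null ? "FTS-only" : `vector of length ${vec.length}`);
});
```

Returning `null` instead of throwing keeps the final FTS-only decision in the caller, which is where `runSearch` makes it in the hunk above.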
29 changes: 29 additions & 0 deletions src/db/database.ts
@@ -817,6 +817,35 @@ export class TraulDB {
this.db.run("DELETE FROM sync_cursors WHERE source = ? AND key = ?", [source, key]);
}

resetSyncCursors(source?: string): void {
if (source) {
this.db.run("DELETE FROM sync_cursors WHERE source = ?", [source]);
} else {
this.db.run("DELETE FROM sync_cursors");
}
}

resetChunks(): void {
this.db.run("DELETE FROM vec_chunks");
this.db.run("DELETE FROM chunks");
}

getMeta(key: string): string | null {
const row = this.db
.query<{ value: string }, [string]>(
"SELECT value FROM traul_meta WHERE key = ?"
)
.get(key);
return row?.value ?? null;
}

setMeta(key: string, value: string): void {
this.db.run(
"INSERT INTO traul_meta (key, value) VALUES (?, ?) ON CONFLICT(key) DO UPDATE SET value = excluded.value",
[key, value]
);
}

close(): void {
this.db.close();
}
63 changes: 63 additions & 0 deletions src/db/migrations.ts
@@ -0,0 +1,63 @@
import type { TraulDB } from "./database";
import { CHUNKER_VERSION } from "../lib/chunker";
import { EMBED_MODEL, EMBED_DIMS } from "../lib/embeddings";
import * as log from "../lib/logger";

export interface MigrationResult {
chunksReset: boolean;
embeddingsReset: boolean;
syncCursorsReset: boolean;
}

export function runMigrations(db: TraulDB): MigrationResult {
const result: MigrationResult = {
chunksReset: false,
embeddingsReset: false,
syncCursorsReset: false,
};

const storedChunkerVersion = db.getMeta("chunker_version");
const storedEmbedModel = db.getMeta("embed_model");
const storedEmbedDims = db.getMeta("embed_dims");

const currentDims = String(EMBED_DIMS);

// Chunker version change → reset chunks + embeddings + markdown cursors
if (storedChunkerVersion !== null && storedChunkerVersion !== CHUNKER_VERSION) {
log.info(`Chunker updated (v${storedChunkerVersion} → v${CHUNKER_VERSION}), rechunking on next sync...`);
db.resetChunks();
db.resetEmbeddings(EMBED_DIMS);
db.resetSyncCursors("markdown");
result.chunksReset = true;
result.embeddingsReset = true;
result.syncCursorsReset = true;
}

// Embed model or dims change → reset embeddings only
if (
!result.embeddingsReset &&
storedEmbedModel !== null &&
(storedEmbedModel !== EMBED_MODEL || storedEmbedDims !== currentDims)
) {
const reason =
storedEmbedModel !== EMBED_MODEL
? `model changed (${storedEmbedModel} → ${EMBED_MODEL})`
: `dimensions changed (${storedEmbedDims} → ${currentDims})`;
log.info(`Embedding ${reason}, re-embed with 'traul embed'...`);
db.resetEmbeddings(EMBED_DIMS);
result.embeddingsReset = true;
}

// Update stored values only if changed (avoid unnecessary writes that cause SQLITE_BUSY)
if (storedChunkerVersion !== CHUNKER_VERSION) {
db.setMeta("chunker_version", CHUNKER_VERSION);
}
if (storedEmbedModel !== EMBED_MODEL) {
db.setMeta("embed_model", EMBED_MODEL);
}
if (storedEmbedDims !== currentDims) {
db.setMeta("embed_dims", currentDims);
}

return result;
}
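
The decision logic in `runMigrations` can be separated from its side effects, which makes the rules easier to reason about and to test. The sketch below is a restatement under assumptions, not code from this PR: `planMigration`, `VersionState`, and `MigrationPlan` are hypothetical names, and the model strings `"model-a"` and `"model-b"` are placeholders because the value of `EMBED_MODEL` is not shown in this diff.

```typescript
// HYPOTHETICAL types for illustration; stored values are null on a fresh database.
interface VersionState {
  chunkerVersion: string | null;
  embedModel: string | null;
  embedDims: string | null;
}

interface MigrationPlan {
  resetChunks: boolean;
  resetEmbeddings: boolean;
  resetMarkdownCursors: boolean;
}

function planMigration(
  stored: VersionState,
  current: { chunkerVersion: string; embedModel: string; embedDims: string }
): MigrationPlan {
  // A null stored value means "never recorded", so nothing is reset on first run.
  const chunkerChanged =
    stored.chunkerVersion !== null &&
    stored.chunkerVersion !== current.chunkerVersion;
  const embedChanged =
    stored.embedModel !== null &&
    (stored.embedModel !== current.embedModel ||
      stored.embedDims !== current.embedDims);
  return {
    resetChunks: chunkerChanged,
    // A chunker change already resets embeddings, as in runMigrations.
    resetEmbeddings: chunkerChanged || embedChanged,
    resetMarkdownCursors: chunkerChanged,
  };
}

const current = { chunkerVersion: "2", embedModel: "model-b", embedDims: "1024" };

// Fresh install: nothing stored, nothing reset.
console.log(planMigration({ chunkerVersion: null, embedModel: null, embedDims: null }, current));

// Upgrade where only the embedding model and dimensions changed.
console.log(planMigration({ chunkerVersion: "2", embedModel: "model-a", embedDims: "768" }, current));
```

Keeping the plan pure also means the "skip unchanged meta writes" rule can stay in the caller, where the SQLITE_BUSY concern applies.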
5 changes: 5 additions & 0 deletions src/db/schema.ts
@@ -117,6 +117,11 @@ const SCHEMA_SQL = `
INSERT INTO chunks_fts(chunks_fts, rowid, content) VALUES ('delete', old.id, old.content);
INSERT INTO chunks_fts(rowid, content) VALUES (new.id, new.content);
END;

CREATE TABLE IF NOT EXISTS traul_meta (
key TEXT PRIMARY KEY,
value TEXT NOT NULL
);
`;

export function initializeDatabase(path: string): Database {
23 changes: 17 additions & 6 deletions src/index.ts
@@ -13,17 +13,20 @@ import { runWhatsAppAuth } from "./commands/whatsapp-auth";
import { runDaemonStart, runDaemonStop, runDaemonStatus } from "./commands/daemon";
import { runSql, runSchema } from "./commands/sql";
import { runGet } from "./commands/get";
import { runReset } from "./commands/reset";
import { runMigrations } from "./db/migrations";

const config = loadConfig();
ensureDbDir(config.database.path);
const db = new TraulDB(config.database.path);
runMigrations(db);

const program = new Command();

program
.name("traul")
.description("Traul — Personal Intelligence Engine")
-.version("0.1.0")
+.version("0.2.0")
.option("-v, --verbose", "enable verbose output")
.hook("preAction", () => {
if (program.opts().verbose) {
@@ -137,14 +140,22 @@ program
db.close();
});

program
.command("reset")
.description("Reset a data layer (sync, chunks, embed, all)")
.argument("<layer>", "layer to reset: sync, chunks, embed, all")
.option("-s, --source <source>", "filter by source (for sync layer)")
.action(async (layer: string, options) => {
runReset(db, layer, options);
db.close();
});

program
.command("reset-embed")
-.description("Drop all embeddings and recreate vec tables (run 'embed' after to regenerate)")
+.description("(deprecated: use 'traul reset embed') Drop all embeddings")
.action(async () => {
-const { EMBED_DIMS } = await import("./lib/embeddings");
-console.log(`Resetting vec tables to ${EMBED_DIMS} dimensions...`);
-db.resetEmbeddings(EMBED_DIMS);
-console.log("Done. Run 'traul embed' to regenerate embeddings.");
+console.log("Note: 'reset-embed' is deprecated, use 'traul reset embed' instead.");
+runReset(db, "embed", {});
db.close();
});

1 change: 1 addition & 0 deletions src/lib/chunker.ts
@@ -13,6 +13,7 @@ export interface Chunk {
const DEFAULT_CHUNK_SIZE = 1500;
const DEFAULT_OVERLAP = 200;
export const CHUNK_THRESHOLD = 2000;
export const CHUNKER_VERSION = "2";

export function shouldChunk(text: string, threshold: number = CHUNK_THRESHOLD): boolean {
return text.length > threshold;
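
The constants above suggest size-bounded chunks with overlap, but the chunker body is not part of this diff. The following is therefore only a sketch of a generic sliding-window chunker that reuses the same constant values; `slidingWindowChunks` is a hypothetical name, and the actual algorithm behind `CHUNKER_VERSION = "2"` may differ (for example, it may split on paragraph or sentence boundaries).

```typescript
const DEFAULT_CHUNK_SIZE = 1500;
const DEFAULT_OVERLAP = 200;
const CHUNK_THRESHOLD = 2000;

// ASSUMPTION: plain character windows; not the chunker shipped in this PR.
function slidingWindowChunks(
  text: string,
  size: number = DEFAULT_CHUNK_SIZE,
  overlap: number = DEFAULT_OVERLAP
): string[] {
  if (overlap >= size) {
    throw new Error("overlap must be smaller than size");
  }
  // Short texts are kept whole, mirroring shouldChunk().
  if (text.length <= CHUNK_THRESHOLD) {
    return [text];
  }
  const chunks: string[] = [];
  const step = size - overlap; // each window starts `step` characters after the last
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // the last window reached the end
  }
  return chunks;
}

// 3000 characters of repeating a..z so overlaps are easy to compare.
const sample = Array.from({ length: 3000 }, (_, i) =>
  String.fromCharCode(97 + (i % 26))
).join("");
const chunks = slidingWindowChunks(sample);
console.log(chunks.map((c) => c.length)); // [1500, 1500, 400]
```

Because any change to window size, overlap, or boundary rules changes every stored chunk, bumping `CHUNKER_VERSION` is what tells the migration step to discard and regenerate them.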