# @ashlr/core-efficiency

Token-efficiency primitives for AI coding agents — genome RAG, multi-tier context compression, provider-aware budgeting, and prompt-caching helpers. Extracted from ashlrcode and shared by the ashlrcode CLI and the ashlr-plugin for Claude Code.

Platform support: macOS / Linux / Windows on Bun >= 1.0 and Node >= 20.
## Package layout

| Subpath | LOC | Purpose |
|---|---|---|
| `/genome` | ~2,342 | Self-evolving project specs via RAG + scribe protocol. Manifest CRUD, TF-IDF/Ollama retrieval, fitness-based strategy evolution, mutation audit trail. |
| `/compression` | ~470 | 3-tier context compression: `autoCompact` (LLM-summarize old turns), `snipCompact` (truncate tool results > 2 KB), `contextCollapse` (drop short/duplicate messages). `PromptPriority` enum. |
| `/budget` | ~50 | Provider-aware prompt budgeting: `getProviderContextLimit`, `systemPromptBudget`. |
| `/tokens` | ~50 | Token estimation: `estimateTokensFromString`, `estimateTokensFromMessages`. |
| `/anthropic` | ~200 | Anthropic SDK helpers: `withGenome`, `cacheBreakpoints`, `ashlrMcpConfig`. |
| `/session-log` | ~150 | Structured session event log (tool calls, costs, savings). |
| `/local` | ~120 | Context-window manager for small-context local models. |
| `/types` | ~60 | Shared types: `Message`, `ContentBlock`, `LLMSummarizer`, `StreamEvent`. |
## Installation

```sh
# Bun (primary runtime)
bun add @ashlr/core-efficiency

# npm / pnpm / yarn
npm install @ashlr/core-efficiency
```

For local development against a checkout:

```sh
bun add file:../ashlr-core-efficiency
```

The package ships TypeScript source in `src/`. Bun runs it directly. For Node.js, compile first with `bun run build` (outputs to `dist/`).
## Usage

### Compression

```ts
import {
  autoCompact,
  snipCompact,
  contextCollapse,
  PromptPriority,
} from "@ashlr/core-efficiency/compression";

// Truncate any tool result that exceeds 2 KB (head + tail, elided middle).
const trimmed = snipCompact(messages, { maxBytes: 2048 });

// Drop short or duplicate messages to reduce prompt size.
const collapsed = contextCollapse(messages);

// LLM-summarize old turns when approaching the context limit.
const compacted = await autoCompact(messages, summarizer, {
  targetTokens: 50_000,
  priority: PromptPriority.High,
});
```

### Budget

```ts
import {
  getProviderContextLimit,
  systemPromptBudget,
} from "@ashlr/core-efficiency/budget";

const limit = getProviderContextLimit("anthropic"); // 200_000
const budget = systemPromptBudget("anthropic", 0.05, 50_000); // 5% floor, 50K cap
```

### Tokens

```ts
import {
  estimateTokensFromString,
  estimateTokensFromMessages,
} from "@ashlr/core-efficiency/tokens";

const n = estimateTokensFromString("Hello, world!");
const total = estimateTokensFromMessages(messages); // walks ContentBlock[] incl. tool results
```

### Genome

```ts
import {
  retrieveSectionsV2,
  injectGenomeContext,
  genomeExists,
} from "@ashlr/core-efficiency/genome";

if (await genomeExists(process.cwd())) {
  const sections = await retrieveSectionsV2("architecture overview", process.cwd(), {
    maxTokens: 2000,
  });
  const system = injectGenomeContext(baseSystem, sections);
}
```

### Anthropic SDK helpers

```ts
import Anthropic from "@anthropic-ai/sdk";
import { withGenome, cacheBreakpoints } from "@ashlr/core-efficiency/anthropic";

const client = new Anthropic();
const system = await withGenome("You are a senior engineer.", process.cwd());

const req = cacheBreakpoints({
  system,
  messages: [
    { role: "user", content: projectContext, cache: true },
    { role: "user", content: "What does login.ts do?" },
  ],
});

await client.messages.create({ ...req, model: "claude-sonnet-4-6", max_tokens: 1024 });
```

For stdio MCP tools via the Agent SDK:

```ts
import { query } from "@anthropic-ai/claude-agent-sdk";
import { ashlrMcpConfigRecord } from "@ashlr/core-efficiency/anthropic";

const mcpServers = ashlrMcpConfigRecord({ plugins: ["efficiency"] });
for await (const msg of query({ prompt: "...", options: { mcpServers } })) {
  // ...
}
```

See examples/anthropic-sdk/ for runnable scenarios.

### Session log

```ts
import { SessionLog } from "@ashlr/core-efficiency/session-log";

const log = new SessionLog();
log.record({ type: "tool_call", tool: "ashlr__read", inputTokens: 120 });
console.log(log.summary());
```

### Local models

```ts
import { LocalContextManager } from "@ashlr/core-efficiency/local";

const mgr = new LocalContextManager({ contextWindow: 4096 });
const messages = mgr.fit(allMessages); // drops oldest turns to stay within the window
```

### Root barrel export

All subpaths are re-exported from the package root:

```ts
import {
  autoCompact,
  getProviderContextLimit,
  retrieveSectionsV2,
  estimateTokensFromString,
} from "@ashlr/core-efficiency";
```

## Platform support

| Runtime | macOS | Linux | Windows |
|---|---|---|---|
| Bun >= 1.0 | Yes | Yes | Yes |
| Node >= 20 | Yes (compile first) | Yes (compile first) | Yes (compile first) |

Path separators are normalized internally; no Unix-only assumptions.
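For intuition, separator-agnostic handling usually comes down to a small normalization step like the following sketch (`toPosix` is a hypothetical illustration, not the package's internal helper):

```typescript
// Illustrative separator normalization: convert Windows backslashes to
// forward slashes so downstream string comparisons behave the same on
// every platform. Not the package's actual internal helper.
function toPosix(p: string): string {
  return p.split("\\").join("/");
}

console.log(toPosix("src\\genome\\manifest.json")); // "src/genome/manifest.json"
```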
## Development

```sh
bun install
bun test           # ~17 unit tests (budget + tokens)
bun run typecheck
bun run build      # emit dist/ for Node consumers
```

Integration tests live in the ashlrcode repo, where 700+ tests run against real-world consumers.
## Design notes

- `LLMSummarizer` interface: `autoCompact` and the genome `scribe` depend on a minimal `{ stream(ProviderRequest): AsyncGenerator<StreamEvent> }` contract, not a concrete router. Consumers inject their own provider.
- `PromptPriority` enum: 12 named slots (`Core = 0` through `Undercover = 95`). Numeric values are stable — raw-int callers continue to work across versions.
- `estimateTokens`: previously duplicated in three places. Now one implementation with two entry points, `FromString` and `FromMessages` (walks `ContentBlock[]`, including `tool_use`/`tool_result`).
- Source-first exports: `main` and `exports` point to `src/`. Bun resolves `.ts` imports directly. For Node.js, run `bun run build` and consume from `dist/`. A `module` field mirrors `main` for bundlers that inspect it.
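A minimal object satisfying the `{ stream(ProviderRequest): AsyncGenerator<StreamEvent> }` contract might look like the following sketch. The type shapes here are illustrative stand-ins for the package's real `/types` definitions, and the canned output stands in for a real LLM call:

```typescript
// Hypothetical local stand-ins for the package's ProviderRequest and
// StreamEvent types; see /types for the real definitions.
type ProviderRequest = {
  system?: string;
  messages: { role: string; content: string }[];
};
type StreamEvent = { type: "text"; text: string } | { type: "done" };

// Any object with a matching stream() method satisfies the contract;
// a real implementation would forward the request to an LLM provider.
const cannedSummarizer = {
  async *stream(req: ProviderRequest): AsyncGenerator<StreamEvent> {
    yield { type: "text", text: `Summary of ${req.messages.length} turns.` };
    yield { type: "done" };
  },
};
```

Consumers would pass such an object as the `summarizer` argument to `autoCompact(messages, summarizer, opts)`, which is why no concrete provider router ships with the package.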
## Versioning

Follows semver. Breaking changes (removed exports, changed interfaces) go in major versions. Additive exports and bug fixes are minor/patch releases.

## License

MIT — see LICENSE.