Merged
19 commits
81311d9
Initial plan
Copilot Mar 2, 2026
cf4f9ff
feat: initial scaffold and core implementation of agent-kernel
Copilot Mar 2, 2026
b8cfb22
fix: add explicit permissions to CI workflow (CodeQL alert)
Copilot Mar 2, 2026
68760bc
fix: replace bare ValueError with CapabilityAlreadyRegistered in regi…
dgenio Mar 4, 2026
aa690f1
fix: wire default_timeout as fallback in HTTPDriver.execute()
dgenio Mar 4, 2026
7fedd66
fix: prevent IndexError on empty rows in HandleStore.expand()
dgenio Mar 4, 2026
32da6bc
fix: include full token and audit_id in CapabilityGrant
dgenio Mar 4, 2026
b27b1a9
fix: validate max_rows constraint and raise PolicyDenied on invalid i…
dgenio Mar 4, 2026
5dc0b01
chore: remove unused _deep_copy_truncated and import copy
dgenio Mar 4, 2026
8b7fa79
fix: avoid json.loads on truncated JSON in raw mode transform
dgenio Mar 4, 2026
994260f
fix: record effective response_mode in ActionTrace instead of requested
dgenio Mar 4, 2026
1cc5929
docs: fix Kernel docstring example to include principal arg
dgenio Mar 4, 2026
610a18f
fix: require justification for DESTRUCTIVE operations
dgenio Mar 4, 2026
bc75e42
fix: remove duplicate Budgets class, consolidate to firewall.budgets
dgenio Mar 4, 2026
ccb5dd6
fix: tighten _PHONE_RE to require phone-like structure, add regex tests
dgenio Mar 4, 2026
c325985
fix: bound HandleStore with max_entries cap and periodic auto-eviction
dgenio Mar 4, 2026
e6f8c38
refactor: deduplicate get_token via grant_capability, bring kernel.py…
dgenio Mar 4, 2026
22f61a8
fix: add threading.Lock to _get_secret, fix test_dev_secret state lea…
dgenio Mar 4, 2026
692e06f
style: fix ruff format for get_token return statement
dgenio Mar 4, 2026
46 changes: 46 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,46 @@
name: CI

on:
  push:
    branches: ["main", "copilot/**"]
  pull_request:
    branches: ["main"]

jobs:
  test:
    name: "Python ${{ matrix.python-version }}"
    runs-on: ubuntu-latest
    permissions:
      contents: read
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12"]

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        run: pip install -e ".[dev]"

      - name: Lint (ruff check)
        run: ruff check src/ tests/ examples/

      - name: Format check (ruff format)
        run: ruff format --check src/ tests/ examples/

      - name: Type check (mypy)
        run: mypy src/

      - name: Test (pytest)
        run: python -m pytest -q --cov=agent_kernel --cov-report=term-missing

      - name: Examples
        run: |
          python examples/basic_cli.py
          python examples/billing_demo.py
          python examples/http_driver_demo.py
38 changes: 38 additions & 0 deletions AGENTS.md
@@ -0,0 +1,38 @@
# AGENTS.md — AI Agent Instructions

This file provides instructions for AI coding agents (Copilot, Cursor, etc.) working in this repository.

## Repo layout

```
src/agent_kernel/ — library source (one module per concern, ≤300 lines each)
tests/ — pytest test suite
examples/ — runnable demos (no internet required)
docs/ — architecture and security documentation
```

## Quality bar

- `make ci` must pass before every commit.
- All public interfaces need type hints and docstrings.
- Use custom exceptions from `errors.py` — never bare `ValueError` or `KeyError`.
- Keep modules ≤ 300 lines. Split if needed.
- No randomness in matching, routing, or summarization (deterministic outputs).

## Security rules

- Never log or print secret key material.
- HMAC secrets come from `AGENT_KERNEL_SECRET` env var; fall back to a random dev secret with a logged warning.
- Tokens are tamper-evident (HMAC-SHA256) but not encrypted — document this.
- Confused-deputy prevention: tokens bind to `principal_id + capability_id + constraints`.
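
The secret-handling rules above can be illustrated with a minimal sketch. It is not the library's implementation: `get_secret`, the logger name, and the 32-byte key length are assumptions chosen for illustration. Only the `AGENT_KERNEL_SECRET` variable name, the random dev fallback, and the logged warning come from the rules themselves.

```python
import logging
import os
import secrets
import threading
from typing import Optional

logger = logging.getLogger("agent_kernel.sketch")  # hypothetical logger name

_secret_lock = threading.Lock()
_dev_secret: Optional[bytes] = None


def get_secret() -> bytes:
    """Return the HMAC key from the environment, or a per-process dev secret."""
    global _dev_secret
    env_value = os.environ.get("AGENT_KERNEL_SECRET")
    if env_value:
        return env_value.encode("utf-8")
    # The lock ensures concurrent callers all see the same fallback secret.
    with _secret_lock:
        if _dev_secret is None:
            _dev_secret = secrets.token_bytes(32)  # assumed key length
            # Log only that a fallback is in use, never the key material.
            logger.warning("AGENT_KERNEL_SECRET is unset; using a random dev secret")
        return _dev_secret
```

Because the fallback is generated per process, tokens signed with it stop verifying after a restart, which is one reason to set the environment variable outside of local development.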

## Adding a new capability driver

1. Implement the `Driver` protocol in `src/agent_kernel/drivers/`.
2. Register it with `StaticRouter` or implement a custom `Router`.
3. Add integration tests in `tests/test_drivers.py`.

## Adding a new policy rule

1. Add the rule to `DefaultPolicyEngine.evaluate()` in `policy.py`.
2. Cover it with a test in `tests/test_policy.py`.
20 changes: 20 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,20 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [0.1.0] - 2024-01-01

### Added
- Initial scaffold: `CapabilityRegistry`, `PolicyEngine`, `HMACTokenProvider`, `Kernel`.
- `InMemoryDriver` and `HTTPDriver` (httpx-based).
- Context `Firewall` with `Budgets`, redaction, and summarization.
- `HandleStore` with TTL, pagination, field selection, and basic filtering.
- `TraceStore` and `explain()` for full audit trail.
- Examples: `basic_cli.py`, `billing_demo.py`, `http_driver_demo.py`.
- Documentation: architecture, security model, integrations, capabilities, context firewall.
- CI pipeline for Python 3.10, 3.11, 3.12 with ruff + mypy + pytest.
36 changes: 36 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,36 @@
# Contributing to agent-kernel

Thank you for your interest in contributing!

## Development setup

```bash
git clone https://github.com/dgenio/agent-kernel.git
cd agent-kernel
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
```

## Running checks

```bash
make fmt # auto-format with ruff
make lint # lint with ruff
make type # type-check with mypy
make test # run pytest with coverage
make ci # all of the above + examples
```

## Pull request guidelines

1. Keep PRs focused — one logical change per PR.
2. Add or update tests for every behaviour change.
3. All checks in `make ci` must pass.
4. Follow the existing code style (ruff-enforced).
5. Write docstrings on all public interfaces.

## Security

Please report security vulnerabilities privately via GitHub Security Advisories.
Do **not** open a public issue for a security bug.
20 changes: 20 additions & 0 deletions Makefile
@@ -0,0 +1,20 @@
.PHONY: fmt lint type test example ci

fmt:
	ruff format src/ tests/ examples/

lint:
	ruff check src/ tests/ examples/

type:
	mypy src/

test:
	python -m pytest -q --cov=agent_kernel

example:
	python examples/basic_cli.py
	python examples/billing_demo.py
	python examples/http_driver_demo.py

ci: fmt lint type test example
140 changes: 139 additions & 1 deletion README.md
@@ -1,2 +1,140 @@
# agent-kernel
Python library implementing a capability-based security kernel for AI agents operating in large tool ecosystems (MCP, A2A). Provides capability tokens, HMAC-signed authorization, policy engine, context firewall with budget enforcement, and pluggable drivers — so agents can safely use 1000+ tools without context blowup.

[![CI](https://github.com/dgenio/agent-kernel/actions/workflows/ci.yml/badge.svg)](https://github.com/dgenio/agent-kernel/actions/workflows/ci.yml)
[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/)
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)

A capability-based security kernel for AI agents operating in large tool ecosystems (MCP, A2A, 1000+ tools).

## 30-second pitch

Modern AI agents face three hard problems when given access to hundreds or thousands of tools:

1. **Context blowup** — raw tool output floods the LLM context window.
2. **Tool-space interference** — agents accidentally invoke the wrong tool or escalate privileges.
3. **No audit trail** — there's no record of what ran, when, and why.

`agent-kernel` solves all three with a thin, composable layer that sits above your tool execution layer:

- **Capability Tokens** — HMAC-signed, time-bounded, principal-scoped. No token → no execution.
- **Policy Engine** — READ/WRITE/DESTRUCTIVE safety classes + PII/PCI sensitivity handling.
- **Context Firewall** — raw driver output is *never* returned to the LLM; always a bounded `Frame`.
- **Audit Trail** — every invocation creates an `ActionTrace` retrievable via `kernel.explain()`.

## Architecture

```mermaid
graph LR
LLM["LLM / Agent"] -->|goal| K["Kernel"]
K -->|search| REG["Registry"]
K -->|evaluate| POL["Policy Engine"]
K -->|sign| TOK["HMAC Token"]
K -->|route| DRV["Driver (MCP/HTTP/Memory)"]
DRV -->|RawResult| FW["Context Firewall"]
FW -->|Frame| LLM
K -->|record| AUD["Audit Trace"]
```

## Quickstart

```bash
pip install agent-kernel
```

```python
import asyncio, os
os.environ["AGENT_KERNEL_SECRET"] = "my-secret"

from agent_kernel import (
    Capability, CapabilityRegistry, HMACTokenProvider,
    InMemoryDriver, Kernel, Principal, SafetyClass, StaticRouter,
)
from agent_kernel.drivers.base import ExecutionContext
from agent_kernel.models import CapabilityRequest

# 1. Register a capability
registry = CapabilityRegistry()
registry.register(Capability(
    capability_id="tasks.list",
    name="List Tasks",
    description="List all tasks",
    safety_class=SafetyClass.READ,
    tags=["tasks", "list"],
))

# 2. Wire up a driver
driver = InMemoryDriver()
driver.register_handler("tasks.list", lambda ctx: [{"id": 1, "title": "Buy milk"}])

# 3. Build the kernel
kernel = Kernel(registry=registry, router=StaticRouter(routes={"tasks.list": ["memory"]}))
kernel.register_driver(driver)

async def main():
    principal = Principal(principal_id="alice", roles=["reader"])

    # 4. Discover → grant → invoke → expand → explain
    token = kernel.get_token(
        CapabilityRequest(capability_id="tasks.list", goal="list tasks"),
        principal, justification="",
    )
    frame = await kernel.invoke(token, principal=principal, args={})
    print(frame.facts) # ['Total rows: 1', 'Top keys: id, title', ...]
    print(frame.handle) # Handle(handle_id='...', ...)

    expanded = kernel.expand(frame.handle, query={"limit": 1, "fields": ["title"]})
    print(expanded.table_preview) # [{'title': 'Buy milk'}]

    trace = kernel.explain(frame.action_id)
    print(trace.driver_id) # 'memory'

asyncio.run(main())
```

## Where it fits

```
┌─────────────────────────────────────────────┐
│ LLM / Agent loop │
├─────────────────────────────────────────────┤
│ agent-kernel ← you are here │
│ (registry · policy · tokens · firewall) │
├────────────────┬────────────────────────────┤
│ contextweaver │ tool execution layer │
│ (context │ (MCP · HTTP · A2A · │
│ compilation) │ internal APIs) │
└────────────────┴────────────────────────────┘
```

`agent-kernel` sits **above** `contextweaver` (context compilation) and **above** raw tool execution. It provides the authorization, execution, and audit layer.

## Security disclaimers

> **v0.1 is not production-hardened for real authentication.**

- HMAC tokens are tamper-evident (signed with HMAC-SHA256) but **not encrypted**: anyone holding a token can read its fields. Do not put sensitive data in token fields.
- Set `AGENT_KERNEL_SECRET` to a strong random value in production. If unset, a random dev secret is generated per-process with a warning.
- PII redaction is heuristic (regex). It is not a substitute for proper data governance.
- See [docs/security.md](docs/security.md) for the full threat model.

## Documentation

- [Architecture](docs/architecture.md)
- [Security model](docs/security.md)
- [Integrations (MCP, HTTPDriver)](docs/integrations.md)
- [Designing capabilities](docs/capabilities.md)
- [Context Firewall](docs/context_firewall.md)

## Development

```bash
git clone https://github.com/dgenio/agent-kernel
cd agent-kernel
pip install -e ".[dev]"
make ci # fmt + lint + type + test + examples
```

## License

Apache-2.0 — see [LICENSE](LICENSE).

70 changes: 70 additions & 0 deletions docs/architecture.md
@@ -0,0 +1,70 @@
# Architecture

## Overview

`agent-kernel` is a capability-based security kernel that sits **above** raw tool execution (MCP, HTTP APIs, internal services) and **below** the LLM context window.

```mermaid
graph TD
LLM["LLM / Agent"] -->|goal text| K["Kernel"]
K -->|search| REG["CapabilityRegistry"]
REG -->|CapabilityRequest| K
K -->|evaluate| POL["PolicyEngine"]
POL -->|PolicyDecision| K
K -->|issue| TOK["TokenProvider (HMAC)"]
TOK -->|CapabilityToken| K
K -->|route| ROU["Router"]
ROU -->|RoutePlan| K
K -->|execute| DRV["Driver (Memory / HTTP / MCP)"]
DRV -->|RawResult| K
K -->|transform| FW["Firewall"]
FW -->|Frame| K
K -->|store| HS["HandleStore"]
K -->|record| TS["TraceStore"]
K -->|Frame| LLM
```

## Components

### Kernel
The central orchestrator. Wires all components together and exposes five methods:
- `request_capabilities(goal)` — discover relevant capabilities
- `grant_capability(request, principal, justification)` — policy check + token issuance
- `invoke(token, principal, args, response_mode)` — execute + firewall + trace
- `expand(handle, query)` — paginate/filter stored results
- `explain(action_id)` — retrieve audit trace

### CapabilityRegistry
A flat dict of `Capability` objects indexed by `capability_id`. Provides keyword-based search (no LLM, no vector DB — purely token overlap scoring).
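
As a rough illustration of token-overlap scoring, the sketch below ranks capabilities by how many lowercase word tokens they share with the goal text. The function names, the tokenizer, and the tie-breaking rule are assumptions made for this example, not the registry's actual code.

```python
import re


def tokenize(text: str) -> set[str]:
    """Split text into a set of lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(goal: str, capabilities: dict[str, str], limit: int = 5) -> list[str]:
    """Rank capability ids by token overlap with the goal.

    `capabilities` maps capability_id -> searchable text (for example the
    name, description, and tags joined together; that choice is an assumption).
    """
    goal_tokens = tokenize(goal)
    scored = []
    for cap_id, text in capabilities.items():
        overlap = len(goal_tokens & tokenize(text))
        if overlap:
            scored.append((-overlap, cap_id))
    # Highest overlap first; ties are broken by capability id, so the
    # ordering is deterministic for a given input.
    scored.sort()
    return [cap_id for _, cap_id in scored[:limit]]
```

Sorting on `(-overlap, cap_id)` keeps results reproducible, in line with the repository's rule against randomness in matching.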

### PolicyEngine
The `DefaultPolicyEngine` implements role-based rules:
1. **READ** — always allowed
2. **WRITE** — requires `justification ≥ 15 chars` + role `writer|admin`
3. **DESTRUCTIVE** — requires role `admin`
4. **PII/PCI** — requires `tenant` attribute; enforces `allowed_fields` unless `pii_reader`
5. **max_rows** — 50 (user), 500 (service)
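
The five rules can be read as one decision function. The sketch below is an approximation under stated assumptions: `SketchPrincipal`, the `attributes` dict, the string-valued safety and sensitivity classes, the `enforce_allowed_fields` flag, and treating a `service` role as the marker for the higher row cap are all hypothetical. The real `DefaultPolicyEngine.evaluate()` may differ in signature and detail.

```python
from dataclasses import dataclass, field


class PolicyDenied(Exception):
    """Raised when a request violates a policy rule (stand-in for the library's error)."""


@dataclass
class SketchPrincipal:
    principal_id: str
    roles: list[str] = field(default_factory=list)
    attributes: dict[str, str] = field(default_factory=dict)  # assumed shape


def evaluate(principal, safety_class, sensitivity="NONE", justification=""):
    """Return the constraints to attach to a grant, or raise PolicyDenied."""
    roles = set(principal.roles)
    if safety_class == "WRITE":
        if len(justification) < 15:
            raise PolicyDenied("WRITE requires a justification of at least 15 chars")
        if not roles & {"writer", "admin"}:
            raise PolicyDenied("WRITE requires role writer or admin")
    elif safety_class == "DESTRUCTIVE":
        if "admin" not in roles:
            raise PolicyDenied("DESTRUCTIVE requires role admin")
    elif safety_class != "READ":
        raise PolicyDenied(f"unknown safety class: {safety_class}")

    constraints = {}
    if sensitivity in {"PII", "PCI"}:
        if "tenant" not in principal.attributes:
            raise PolicyDenied("PII/PCI access requires a tenant attribute")
        if "pii_reader" not in roles:
            constraints["enforce_allowed_fields"] = True
    # Assumption: a "service" role identifies service principals.
    constraints["max_rows"] = 500 if "service" in roles else 50
    return constraints
```

Returning constraints rather than a bare boolean lets the token carry the limits forward, so later stages can enforce them without re-running the policy.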

### TokenProvider (HMAC)
Issues HMAC-SHA256 signed tokens. Each token is bound to `principal_id + capability_id + constraints`. Verification checks: expiry → signature → principal → capability.
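
A minimal sketch of the binding and the verification order, using only the standard library. The token layout (a plain dict with a hex signature), the `issue`/`verify` names, the `TokenError` type, and the hard-coded demo secret are assumptions for illustration. The real provider's wire format is not shown in this document.

```python
import hashlib
import hmac
import json
import time

SECRET = b"demo-only-secret"  # illustration only; never hard-code a real key


class TokenError(Exception):
    """Hypothetical error raised when verification fails."""


def _sign(payload: dict) -> str:
    # Canonical JSON so the same payload always yields the same signature.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()


def issue(principal_id, capability_id, constraints, ttl_seconds=300, now=None):
    now = time.time() if now is None else now
    payload = {
        "principal_id": principal_id,
        "capability_id": capability_id,
        "constraints": constraints,
        "expires_at": int(now) + ttl_seconds,
    }
    return {**payload, "signature": _sign(payload)}


def verify(token, principal_id, capability_id, now=None):
    """Check expiry, then signature, then principal, then capability."""
    now = time.time() if now is None else now
    payload = {k: v for k, v in token.items() if k != "signature"}
    if now >= payload["expires_at"]:
        raise TokenError("expired")
    if not hmac.compare_digest(_sign(payload), token["signature"]):
        raise TokenError("bad signature")
    if payload["principal_id"] != principal_id:
        raise TokenError("principal mismatch")
    if payload["capability_id"] != capability_id:
        raise TokenError("capability mismatch")
```

Because the constraints are part of the signed payload, widening them after issuance (for example raising `max_rows`) invalidates the signature. `hmac.compare_digest` is used so the comparison does not leak timing information.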

### Router
`StaticRouter` maps `capability_id → [driver_id, ...]`. Drivers are tried in the listed order: the first one that succeeds wins, and the remaining entries serve as fallbacks if earlier ones fail.
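
The ordered-fallback behaviour might look like the following sketch. Here drivers are modelled as plain async callables and failures as a hypothetical `DriverError`; the real `Driver` protocol and error types are not shown in this document, so treat every name below as an assumption.

```python
import asyncio
from typing import Any, Awaitable, Callable


class DriverError(Exception):
    """Hypothetical error a driver raises when it cannot serve a request."""


DriverFn = Callable[[str, dict], Awaitable[Any]]


async def execute_with_fallback(
    route: list[str],
    drivers: dict[str, DriverFn],
    capability_id: str,
    args: dict,
) -> tuple[str, Any]:
    """Try each driver id in route order; return (driver_id, result) on first success."""
    errors = []
    for driver_id in route:
        driver = drivers.get(driver_id)
        if driver is None:
            errors.append(f"{driver_id}: not registered")
            continue
        try:
            return driver_id, await driver(capability_id, args)
        except DriverError as exc:
            errors.append(f"{driver_id}: {exc}")
    raise DriverError("all drivers failed: " + "; ".join(errors))
```

Returning the winning `driver_id` alongside the result lets the caller record which driver actually served the request in the audit trace.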

### Drivers
- **InMemoryDriver** — Python callables, used for tests and demos
- **HTTPDriver** — `httpx`-based async HTTP client
- (Future) **MCPDriver** — adapter for Model Context Protocol tool servers

### Firewall
Transforms `RawResult → Frame`. Never exposes raw output to the LLM.
- Four response modes: `summary`, `table`, `handle_only`, `raw`
- Enforces `Budgets` (max_rows, max_fields, max_chars, max_depth)
- Redacts sensitive fields and inline PII patterns
- Deterministic summarisation (no LLM)
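
To make the budget and redaction steps concrete, here is a small sketch of a `table`-style transform. It is illustrative only: the `SketchBudgets` defaults, the sensitive-field list, the email pattern, and applying `max_chars` per string value are assumptions, and `max_depth` handling is omitted.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchBudgets:
    max_rows: int = 50
    max_fields: int = 10
    max_chars: int = 200  # assumed here to apply per string value


SENSITIVE_FIELDS = {"password", "ssn", "card_number"}  # hypothetical list
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact_value(key, value):
    """Mask whole sensitive fields, and inline email addresses in strings."""
    if key.lower() in SENSITIVE_FIELDS:
        return "[REDACTED]"
    if isinstance(value, str):
        return EMAIL_RE.sub("[REDACTED]", value)
    return value


def to_table(rows, budgets):
    """Return a bounded, redacted preview of a list of dict rows."""
    preview = []
    for row in rows[: budgets.max_rows]:
        clean = {}
        for key, value in list(row.items())[: budgets.max_fields]:
            value = redact_value(key, value)
            if isinstance(value, str) and len(value) > budgets.max_chars:
                value = value[: budgets.max_chars]
            clean[key] = value
        preview.append(clean)
    return preview
```

Redaction runs before truncation so that a sensitive value cut off mid-pattern cannot slip past the regex.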

### HandleStore
Stores full results by opaque handle ID with TTL. `expand()` supports pagination, field selection, and basic equality filtering.

### TraceStore
Records every `ActionTrace`. `explain(action_id)` returns the full audit record.