Tarunjit45/ChainWatch

ChainWatch

ChainWatch is a flight data recorder for multi-step AI systems. It is a CLI tool that records every step in an AI decision chain, links the steps together in order, makes tampering detectable, and lets you verify the chain's integrity and replay the full decision flow.

Getting Started

Installation

  1. Clone the repository:

    git clone https://github.com/Tarunjit45/ChainWatch.git
    cd ChainWatch
  2. Install dependencies using Poetry (make sure Poetry is installed first):

    poetry install
  3. Activate the virtual environment (on Poetry 2.x, the shell command is provided by the separate poetry-plugin-shell plugin):

    poetry shell

Why are multi-step AI systems hard to debug?

Modern AI systems are often not a single call to a language model. They are a chain of steps: a prompt from a user, a call to an LLM, a call to a tool, another call to an LLM, and so on. When something goes wrong in this chain, it is difficult to pinpoint the exact step that caused the failure. Logs are often scattered across different systems, and the context of the decision is lost. This makes it hard to trust the system, debug it effectively, or demonstrate compliance.

What is an AI decision chain?

An AI decision chain is a sequence of events that represents the steps an AI system takes to reach a decision. Each event in the chain is a single step, such as a prompt, a tool call, or a final decision. Each event is linked to its parent, creating an ordered, tamper-evident record of the AI's reasoning process.

How ChainWatch records decisions

ChainWatch records each event in a decision chain to a .jsonl file in the chains/ directory. Each event is a JSON object with a specific structure, including a unique event ID, a chain ID, the event type, inputs, outputs, a timestamp, and a parent event ID.

To ensure the integrity of the chain, ChainWatch uses a hash chain. The hash of each event is calculated from its content and the hash of its parent event. This creates a cryptographic link between events: modifying a recorded event changes its hash, so the chain no longer verifies.
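
The linking rule can be illustrated with a short Python sketch. This is not ChainWatch's actual implementation: the use of SHA-256, the canonical JSON encoding, the empty-string parent hash for the first event, and the `hash` and `parent_event_id` field names are all assumptions made for illustration.

```python
import hashlib
import json


def event_hash(event: dict, parent_hash: str) -> str:
    """Hash an event's content together with its parent's hash."""
    # Leave out any stored hash so the digest covers only the event content.
    content = {k: v for k, v in event.items() if k != "hash"}
    # sort_keys and fixed separators give a stable byte representation.
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((parent_hash + canonical).encode("utf-8")).hexdigest()


GENESIS = ""  # assumed parent hash for the first event in a chain

step_1 = {
    "event_id": "step_1",
    "chain_id": "chain_001",
    "type": "prompt",
    "input": {"user_query": "What is the credit score for user 123?"},
}
step_2 = {
    "event_id": "step_2",
    "chain_id": "chain_001",
    "type": "tool_call",
    "parent_event_id": "step_1",  # hypothetical field name
    "input": {"user_id": "123"},
    "output": {"score": 782},
}

h1 = event_hash(step_1, GENESIS)
h2 = event_hash(step_2, h1)

# Editing step_1 after the fact changes h1, and therefore h2 as well.
tampered = {**step_1, "input": {"user_query": "something else"}}
tampered_h1 = event_hash(tampered, GENESIS)
assert tampered_h1 != h1
assert event_hash(step_2, tampered_h1) != h2
```

Because each hash folds in the parent's hash, a change to any event propagates to every later hash that is recomputed from it.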

How replay and verification work

Verification

The verify command checks the integrity of a chain. It looks for:

  • Missing events: Are there any gaps in the chain?
  • Broken parent links: Is the parent-child relationship between events intact?
  • Hash mismatches: Has any event been tampered with?
  • Out-of-order timestamps: Were events recorded in the correct chronological order?

If any of these checks fail, ChainWatch will report a verification failure and explain the cause.
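
The four checks above could look roughly like the following Python sketch. It assumes a strictly linear chain, SHA-256 over canonical JSON, and hypothetical `hash` and `parent_event_id` fields; the helper names (`verify_chain`, `seal`) and the message wording are illustrative, not ChainWatch's own.

```python
import hashlib
import json
from datetime import datetime


def _digest(event: dict, parent_hash: str) -> str:
    content = {k: v for k, v in event.items() if k != "hash"}
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((parent_hash + canonical).encode("utf-8")).hexdigest()


def _parse_ts(value: str) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat().
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def verify_chain(events: list) -> list:
    """Return a list of problems; an empty list means the chain passed."""
    problems = []
    known_ids = {e["event_id"] for e in events}
    prev = None
    for event in events:
        eid = event["event_id"]
        parent = event.get("parent_event_id")
        if parent is not None and parent not in known_ids:
            problems.append(f"Missing event '{parent}' referenced by '{eid}'")
        elif parent != (prev["event_id"] if prev else None):
            problems.append(f"Broken parent link at event '{eid}'")
        parent_hash = prev.get("hash", "") if prev else ""
        if _digest(event, parent_hash) != event.get("hash"):
            problems.append(f"Hash mismatch at event '{eid}'")
        if prev and _parse_ts(event["timestamp"]) < _parse_ts(prev["timestamp"]):
            problems.append(f"Out-of-order timestamp at event '{eid}'")
        prev = event
    return problems


def seal(raw_events: list) -> list:
    """Attach parent links and hashes, as a recorder might (hypothetical)."""
    sealed, prev = [], None
    for raw in raw_events:
        event = dict(raw)
        if prev:
            event["parent_event_id"] = prev["event_id"]
        event["hash"] = _digest(event, prev["hash"] if prev else "")
        sealed.append(event)
        prev = event
    return sealed


# Illustrative events; the second and third timestamps are made up.
chain = seal([
    {"event_id": "step_1", "timestamp": "2026-01-22T10:30:10Z"},
    {"event_id": "step_2", "timestamp": "2026-01-22T10:30:11Z",
     "output": {"score": 782}},
    {"event_id": "step_3", "timestamp": "2026-01-22T10:30:12Z"},
])
assert verify_chain(chain) == []

# Change a recorded output without recomputing its hash.
tampered = [dict(e) for e in chain]
tampered[1]["output"] = {"score": 300}
problems = verify_chain(tampered)
print(problems)  # ["Hash mismatch at event 'step_2'"]
```

In this sketch each event is checked against the hash stored on its predecessor, so a single edited event is reported once instead of cascading through the rest of the chain.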

Replay

The replay command allows you to see the step-by-step execution of a decision chain. It prints out each event in the chain in the order it occurred, giving you a clear view of the AI's reasoning process.
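
As a rough illustration of what a replay involves, the sketch below reads JSONL text and formats each event as a numbered step. The function name `replay_lines` is hypothetical, and the `type`, `name`, `input`, and `output` keys are taken from the example events in this README; the real command may read and format events differently.

```python
import json


def replay_lines(jsonl_text: str) -> list:
    """Format each recorded event as a numbered step, in file order."""
    events = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    lines = []
    for step, event in enumerate(events, start=1):
        lines.append(f"Step {step}: {event['type']} - {event['name']}")
        if "input" in event:
            lines.append(f"  Input: {event['input']}")
        if "output" in event:
            lines.append(f"  Output: {event['output']}")
    return lines


sample = "\n".join([
    json.dumps({"type": "prompt", "name": "user_request",
                "input": {"user_query": "What is the credit score for user 123?"}}),
    json.dumps({"type": "tool_call", "name": "credit_checker",
                "input": {"user_id": "123"}, "output": {"score": 782}}),
])

replayed = replay_lines(sample)
print("\n".join(replayed))
```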

Real-world failure scenario

Imagine a customer service bot that uses a tool to check a user's order status. The bot takes the user's request, calls the order status tool, and then generates a response.

One day, a user complains that the bot gave them the wrong order status. Without a tool like ChainWatch, it would be difficult to determine what went wrong. Was the user's initial request misunderstood? Did the order status tool return the wrong information? Did the bot misinterpret the tool's output?

With ChainWatch, you could replay the decision chain for that interaction. You would see the user's initial prompt, the exact data passed to the order status tool, the tool's raw output, and the final response generated by the bot, which shows at which step the wrong status first appeared. And if anyone had altered a recorded event after the fact, its hash would no longer match and the verify command would flag it.

Usage

Recording an event

To record an event, create a JSON file that describes it, for example examples/valid_chain_event_1.json:

{
  "event_id": "step_1",
  "chain_id": "chain_001",
  "type": "prompt",
  "name": "user_request",
  "input": {
    "user_query": "What is the credit score for user 123?"
  },
  "timestamp": "2026-01-22T10:30:10Z"
}

Then, use the record command to add it to the chain:

chainwatch record examples/valid_chain_event_1.json

The event will be appended to chains/chain_001.jsonl.
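
Conceptually, recording is an append to a per-chain JSON Lines file. The sketch below shows only that file-handling step, under the assumption that the file is named after the event's `chain_id`; the `append_event` helper is hypothetical and leaves out the hashing and parent linking that ChainWatch performs.

```python
import json
import tempfile
from pathlib import Path


def append_event(event: dict, chains_dir: Path) -> Path:
    """Append one event as a single JSON line to <chains_dir>/<chain_id>.jsonl."""
    chains_dir.mkdir(parents=True, exist_ok=True)
    path = chains_dir / f"{event['chain_id']}.jsonl"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, sort_keys=True) + "\n")
    return path


event = {
    "event_id": "step_1",
    "chain_id": "chain_001",
    "type": "prompt",
    "name": "user_request",
    "input": {"user_query": "What is the credit score for user 123?"},
    "timestamp": "2026-01-22T10:30:10Z",
}

# Use a temporary directory so the demo does not touch a real chains/ folder.
with tempfile.TemporaryDirectory() as tmp:
    path = append_event(event, Path(tmp) / "chains")
    path_name = path.name
    recorded = path.read_text(encoding="utf-8").splitlines()

print(path_name)      # chain_001.jsonl
print(len(recorded))  # 1
```

One JSON object per line keeps appends cheap and lets a reader process the chain event by event.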

Verifying a chain

To verify the integrity of a chain, use the verify command with the chain ID:

chainwatch verify chain_001

Success Output:

Chain Verification: ✅ PASSED

Failure Output:

Chain Verification: ❌ FAILED
Reason: Hash mismatch at event 'step_2_tampered'. Expected '...', but found '...'.

Failure Cause:
The error occurred at event: step_2_tampered
...

Replaying a chain

To see a step-by-step replay of a chain, use the replay command:

chainwatch replay chain_001

Output:

Step 1: prompt - user_request
  Input: {'user_query': 'What is the credit score for user 123?'}
Step 2: tool_call - credit_checker
  Input: {'user_id': '123'}
  Output: {'score': 782}
Step 3: decision - final_answer
  Input: {'prompt': 'The user with id 123 has a credit score of 782. What is the final answer?'}
  Output: {'answer': 'The credit score for user 123 is 782.'}
