ChainWatch is a flight data recorder for multi-step AI systems. It is a command-line tool that records every step in an AI decision chain, links the steps together in order, makes tampering detectable, and lets you verify the chain's integrity and replay the full decision flow.
- Clone the repository:
  git clone https://github.com/your-username/chainwatch.git
  cd chainwatch
- Install dependencies using Poetry (make sure you have Poetry installed):
  poetry install
- Activate the virtual environment:
  poetry shell
Modern AI systems are often not a single call to a language model. They are a chain of steps: a prompt from a user, a call to an LLM, a call to a tool, another call to an LLM, and so on. When something goes wrong in this chain, it's difficult to pinpoint the exact step that caused the failure. Logs are often scattered across different systems, and the context of the decision is lost. This makes it nearly impossible to trust the system, debug it effectively, or ensure compliance.
An AI decision chain is a sequence of events that represents the steps an AI system takes to reach a decision. Each event in the chain is a single step, such as a prompt, a tool call, or a final decision. Each event is linked to its parent, creating an ordered, tamper-evident record of the AI's reasoning process.
ChainWatch records each event in a decision chain to a .jsonl file in the chains/ directory. Each event is a JSON object with a specific structure, including a unique event ID, a chain ID, the event type, inputs, outputs, a timestamp, and a parent event ID.
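As a rough illustration of this storage layout, the sketch below appends events to a per-chain .jsonl file and reads them back. It is not ChainWatch's actual code: the helper names `append_event` and `load_chain` are hypothetical, and only the `chains/` directory and the one-JSON-object-per-line format come from the description above.

```python
import json
from pathlib import Path


def append_event(event: dict, chains_dir: str = "chains") -> Path:
    """Append one event as a single JSON line to <chains_dir>/<chain_id>.jsonl.

    Hypothetical helper: it only mirrors the file layout described above.
    """
    path = Path(chains_dir) / f"{event['chain_id']}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" preserves existing lines, so the file keeps recording order.
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event, sort_keys=True) + "\n")
    return path


def load_chain(path) -> list:
    """Read a .jsonl chain file back into a list of event dicts, in file order."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Because each event occupies exactly one line, a chain can be inspected with ordinary line-oriented tools, and appending never rewrites earlier events.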
To make tampering detectable, ChainWatch uses a hash chain. The hash of each event is calculated from its content and the hash of its parent event. This creates a cryptographic link between events: tampering with an event causes verification to fail at that event, and the change cannot be hidden without rewriting every event recorded after it.
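The idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ChainWatch's implementation: SHA-256, the canonical JSON serialization, and the `hash` field name are all choices made here for the example.

```python
import hashlib
import json


def event_hash(event: dict, parent_hash: str) -> str:
    """Hash an event's content together with its parent's hash.

    Assumptions for this sketch: SHA-256, and a stored field named "hash"
    that is excluded from the content being hashed.
    """
    body = {k: v for k, v in event.items() if k != "hash"}
    # Sorted keys and fixed separators give a stable byte representation.
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((parent_hash + payload).encode("utf-8")).hexdigest()


def seal(events: list) -> list:
    """Return copies of the events with a "hash" field chained to the previous one."""
    sealed, parent_hash = [], ""  # the first event has no parent hash
    for event in events:
        h = event_hash(event, parent_hash)
        sealed.append({**event, "hash": h})
        parent_hash = h
    return sealed
```

Because every hash feeds into the next one, changing an early event and resealing the chain produces different hashes for that event and for all events after it.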
The verify command checks the integrity of a chain. It looks for:
- Missing events: Are there any gaps in the chain?
- Broken parent links: Is the parent-child relationship between events intact?
- Hash mismatches: Has any event been tampered with?
- Out-of-order timestamps: Were events recorded in the correct chronological order?
If any of these checks fail, ChainWatch will report a verification failure and explain the cause.
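The checks listed above could look roughly like the following sketch. It is illustrative only: the `parent_id` and `hash` field names, the SHA-256 hashing scheme, and the message wording are assumptions, and a missing event surfaces here as a broken parent link rather than as a separate check.

```python
import hashlib
import json
from datetime import datetime


def event_hash(event: dict, parent_hash: str) -> str:
    # Assumed scheme: SHA-256 over the parent hash plus canonical JSON content.
    body = {k: v for k, v in event.items() if k != "hash"}
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((parent_hash + payload).encode("utf-8")).hexdigest()


def parse_ts(value: str) -> datetime:
    # Normalize a trailing "Z", which older Python versions do not parse.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def verify_chain(events: list) -> list:
    """Return a list of problems found; an empty list means the chain passed."""
    problems = []
    prev = None
    for event in events:
        eid = event["event_id"]
        # Parent link: each event must point at the event recorded before it.
        expected_parent = prev["event_id"] if prev else None
        if event.get("parent_id") != expected_parent:
            problems.append(f"broken parent link at '{eid}'")
        # Hash: recompute from the content and the parent's stored hash.
        parent_hash = prev.get("hash", "") if prev else ""
        if event.get("hash") != event_hash(event, parent_hash):
            problems.append(f"hash mismatch at '{eid}'")
        # Timestamps must not go backwards.
        if prev and parse_ts(event["timestamp"]) < parse_ts(prev["timestamp"]):
            problems.append(f"out-of-order timestamp at '{eid}'")
        prev = event
    return problems
```

Returning a list of problems instead of stopping at the first one makes it possible to report every failure along with the event where it occurred.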
The replay command allows you to see the step-by-step execution of a decision chain. It prints out each event in the chain in the order it occurred, giving you a clear view of the AI's reasoning process.
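A replay of this kind amounts to walking the events in recorded order and printing a short summary of each. The sketch below is a hypothetical formatter, not the tool's own code; the `output` field name is assumed by analogy with `input`.

```python
def format_replay(events: list) -> str:
    """Render events as numbered steps, in the order they were recorded."""
    lines = []
    for step, event in enumerate(events, start=1):
        lines.append(f"Step {step}: {event['type']} - {event['name']}")
        # Not every event has both fields (a prompt has no output, for example).
        if "input" in event:
            lines.append(f"Input: {event['input']}")
        if "output" in event:  # assumed field name
            lines.append(f"Output: {event['output']}")
    return "\n".join(lines)
```

Calling `print(format_replay(events))` on a loaded chain gives one block of lines per step.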
Imagine a customer service bot that uses a tool to check a user's order status. The bot takes the user's request, calls the order status tool, and then generates a response.
One day, a user complains that the bot gave them the wrong order status. Without a tool like ChainWatch, it would be difficult to determine what went wrong. Was the user's initial request misunderstood? Did the order status tool return the wrong information? Did the bot misinterpret the tool's output?
With ChainWatch, you could simply replay the decision chain for that interaction. You would see the user's initial prompt, the exact data passed to the order status tool, the tool's raw output, and the final response generated by the bot. If the tool returned the wrong information, the replay would show it in that event's recorded output. If someone had altered a recorded event after the fact, its hash would no longer match, and the verify command would flag the mismatch.
To record an event, you need to create a JSON file that represents the event. For example, examples/valid_chain_event_1.json:
{
  "event_id": "step_1",
  "chain_id": "chain_001",
  "type": "prompt",
  "name": "user_request",
  "input": {
    "user_query": "What is the credit score for user 123?"
  },
  "timestamp": "2026-01-22T10:30:10Z"
}
Then, use the record command to add it to the chain:
chainwatch record examples/valid_chain_event_1.json
The event will be appended to chains/chain_001.jsonl.
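If events are produced by a program rather than written by hand, the event file can be generated before calling the record command. The helper below is a hypothetical sketch: `write_event_file` is not part of ChainWatch, and it only reproduces the fields and timestamp format shown in the example event.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_event_file(path, event_id, chain_id, event_type, name, input_data) -> Path:
    """Write an event JSON file with the same fields as the example event."""
    event = {
        "event_id": event_id,
        "chain_id": chain_id,
        "type": event_type,
        "name": name,
        "input": input_data,
        # UTC timestamp in the same format as the example: 2026-01-22T10:30:10Z
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(event, indent=2) + "\n", encoding="utf-8")
    return path
```

The resulting file can then be passed to the record command exactly as in the example, with its path in place of examples/valid_chain_event_1.json.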
To verify the integrity of a chain, use the verify command with the chain ID:
chainwatch verify chain_001
Success Output:
Chain Verification: ✅ PASSED
Failure Output:
Chain Verification: ❌ FAILED
Reason: Hash mismatch at event 'step_2_tampered'. Expected '...', but found '...'.
Failure Cause:
The error occurred at event: step_2_tampered
...
To see a step-by-step replay of a chain, use the replay command:
chainwatch replay chain_001
Output:
Step 1: prompt - user_request
Input: {'user_query': 'What is the credit score for user 123?'}
Step 2: tool_call - credit_checker
Input: {'user_id': '123'}
Output: {'score': 782}
Step 3: decision - final_answer
Input: {'prompt': 'The user with id 123 has a credit score of 782. What is the final answer?'}
Output: {'answer': 'The credit score for user 123 is 782.'}