Replay any run.
Prove what happened.
Agents are non-deterministic — the same input rarely gives the same output twice. Kraterion records every run as a tamper-evident trail you can replay from a receipt, then trace backward through every input that shaped the answer.
What is our refund policy?
- recallquery: "user prefs"→2 notes29 ms
- searchquery: "refund policy"→4 hits62 ms
- readkey: "pricing-faq.md"→12 KB38 ms
Refunds are processed within 7 business days from the original payment method.
The whole chain, not just the answer.
A run record is everything it took to produce an output. When something goes wrong, you can see exactly which step caused it.
Model and prompt
The model used and a fingerprint of the system prompt, so you know exactly what ran.
Inputs
The user inputs that started the run, recorded verbatim.
Retrievals
Every chunk the agent retrieved, with the fingerprint of the exact source it came from.
Tool calls
Each tool the agent called, with its arguments and its result.
Memory
Every remember and recall, tied to the agent that made it.
Outputs
Intermediate steps and the final response — the whole chain, not just the answer.
The record itself is stored on Walrus and anchored on Sui — so it can't be altered after the fact, and anyone can verify it.
Run it once.
Run it again, exactly.
From a receipt, in seconds.
Every finished run prints a short receipt.
kraterion replay <receipt> — or call the API.
The run reruns against the same inputs and the same retrieved data.
See the original and the replay side by side.
Click any output. See every input.
The same record that powers replay is a graph. Every artifact an agent reads or writes is a node; every operation is an edge. Start from an output and walk back through the chunks, tool calls, and memory that shaped it — following the OpenLineage mental model, with a verify button on every node.
Built from
- pricing-faq.md · §3chunk · score 0.92Retrievalverify
- billing-policy.md · §1.4chunk · score 0.88Retrievalverify
- searchquery: "refund policy" → 4 hitsTool call
- recall · user prefs2 notes · markdown outputMemoryverify
Every node traces back to a record you can verify.
What we prove — and what we don't.
We record what the run did and make that record tamper-evident: the inputs, the retrievals, the tool calls, the memory, and the outputs. We don't claim to prove the model's internal reasoning was correct — only that this is the run that happened, and you can check it.
Stop guessing.
Replay the run.
Every run recorded. Every record yours. Every step verifiable.