Replay & audit

Replay any run.
Prove what happened.

Agents are non-deterministic: the same input can produce a different output on every run. Kraterion records each run as a tamper-evident trail that you can replay from a receipt and then trace backward through every input that shaped the answer.


Reproduce · Audit · Trace

Agent · support · kr_share_test_3f4d…

scoped · support-docs

USER

What is our refund policy?

TOOL CALLS · 3 of 6 available

recall · query: "user prefs" → 2 notes · 29 ms
search · query: "refund policy" → 4 hits · 62 ms
read · key: "pricing-faq.md" → 12 KB · 38 ms

ASSISTANT

Refunds are processed within 7 business days from the original payment method.

pricing-faq.md · §3 · score 0.92 · verified

run · 3f4d…ae · replayable

Total latency: 213 ms

Tools called: 3 of 6

Record: recorded

receipt anchors every run

Replay from it anytime

100%

of steps captured

Retrieval, tools, memory

1:1

replay against the same inputs

Same retrieval, same data

yours

records you keep

Not vendor-held

01 What a run captures

The whole chain, not just the answer.

A run record is everything it took to produce an output. When something goes wrong, you can see exactly which step caused it.

Model and prompt

The model used and a fingerprint of the system prompt, so you know exactly what ran.

Inputs

The user inputs that started the run, recorded verbatim.

Retrievals

Every chunk the agent retrieved, with the fingerprint of the exact source it came from.

Tool calls

Each tool the agent called, with its arguments and its result.

Memory

Every remember and recall, tied to the agent that made it.

Outputs

Intermediate steps and the final response — the whole chain, not just the answer.

The record itself is stored on Walrus and anchored on Sui — so it can't be altered after the fact, and anyone can verify it.
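
To make the six parts above concrete, here is a minimal sketch of what such a record could look like and how a fingerprint of it can detect later edits. The field names, the `fingerprint` helper, the sample system prompt, and the choice of SHA-256 over canonical JSON are all assumptions for illustration, not Kraterion's actual schema or receipt format. Storing the record on Walrus and anchoring it on Sui are outside the scope of this sketch.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict


def fingerprint(data) -> str:
    """SHA-256 over canonical JSON (sorted keys, fixed separators)."""
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class RunRecord:
    # Hypothetical field names, one per part of the run listed above.
    model: str
    prompt_fingerprint: str
    inputs: list
    retrievals: list = field(default_factory=list)
    tool_calls: list = field(default_factory=list)
    memory_ops: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

    def digest(self) -> str:
        """Fingerprint of the whole record."""
        return fingerprint(asdict(self))


record = RunRecord(
    model="example-model",  # placeholder name
    prompt_fingerprint=fingerprint("You are a support agent."),  # placeholder prompt
    inputs=["What is our refund policy?"],
    retrievals=[{"source": "pricing-faq.md", "section": "3", "score": 0.92}],
    tool_calls=[{"tool": "search", "args": {"query": "refund policy"}, "result": "4 hits"}],
    memory_ops=[{"op": "recall", "query": "user prefs", "result": "2 notes"}],
    outputs=["Refunds are processed within 7 business days from the original payment method."],
)

anchored = record.digest()          # the value you would anchor somewhere immutable
assert record.digest() == anchored  # an unchanged record still verifies

# Any edit to the record changes its digest, so tampering is detectable.
record.outputs[0] = "Refunds are processed within 30 business days."
tampered = record.digest() != anchored
```

The design point is that verification only needs the record and the anchored digest: anyone who holds both can recompute the hash and compare, without trusting whoever stored the record.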


Run it once.
Run it again, exactly.

02 How replay works

From a receipt, in seconds.

Copy the receipt

Every finished run prints a short receipt.

Replay it

kraterion replay <receipt> — or call the API.

Same inputs

The run reruns against the same inputs and the same retrieved data.

Compare

See the original and the replay side by side.

Original · receipt 3f4d…ae

Replay · just now

recall · 2 notes

search · 4 hits

read · pricing-faq.md · §3

output · Refunds in 7 business days.

4 of 4 steps match · verified
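
The side-by-side comparison can be sketched as a position-by-position check of the recorded steps against the replayed ones. The `Step` type and `compare_runs` function below are hypothetical names for illustration; this is one plausible way to produce a "steps match" summary, not a description of how Kraterion implements it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    kind: str    # e.g. "recall", "search", "read", "output"
    result: str  # short summary of what the step produced


def compare_runs(original, replay):
    """Pair steps by position and report whether each pair matches."""
    report = []
    for i in range(max(len(original), len(replay))):
        a = original[i] if i < len(original) else None
        b = replay[i] if i < len(replay) else None
        # A missing step on either side counts as a mismatch.
        report.append((a, b, a is not None and a == b))
    return report


original = [
    Step("recall", "2 notes"),
    Step("search", "4 hits"),
    Step("read", "pricing-faq.md · §3"),
    Step("output", "Refunds in 7 business days."),
]
replay = list(original)  # a faithful replay reproduces every step

report = compare_runs(original, replay)
matched = sum(1 for _, _, ok in report if ok)
summary = f"{matched} of {len(report)} steps match"
print(summary)  # 4 of 4 steps match
```

If the replay diverged at any step, that pair would be flagged as a mismatch, which points directly at the step to inspect.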

03 Audit trail & lineage

Click any output. See every input.

The same record that powers replay is a graph. Every artifact an agent reads or writes is a node; every operation is an edge. Start from an output and walk back through the chunks, tool calls, and memory that shaped it — following the OpenLineage mental model, with a verify button on every node.

report.md · Output

Built from

pricing-faq.md · §3 · chunk · score 0.92 · Retrieval · verify
billing-policy.md · §1.4 · chunk · score 0.88 · Retrieval · verify
search · query: "refund policy" → 4 hits · Tool call
recall · user prefs · 2 notes · markdown output · Memory · verify

Every node traces back to a record you can verify.
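
Walking back from an output to its inputs is a graph traversal. The sketch below models the example above as a plain dictionary and does a breadth-first walk upstream. The node labels, the `BUILT_FROM` mapping, and the `upstream` function are hypothetical; the edges from each chunk back to the search call are an assumption made for illustration, and no OpenLineage client library is used here.

```python
from collections import deque

# Hypothetical lineage graph: each node maps to the nodes it was built from.
BUILT_FROM = {
    "report.md": [
        "pricing-faq.md §3",
        "billing-policy.md §1.4",
        "search: refund policy",
        "recall: user prefs",
    ],
    # Assumed for illustration: both chunks were returned by the search call.
    "pricing-faq.md §3": ["search: refund policy"],
    "billing-policy.md §1.4": ["search: refund policy"],
    "search: refund policy": [],
    "recall: user prefs": [],
}


def upstream(node, graph):
    """Breadth-first walk from a node back through everything it was built from."""
    seen = {node}
    order = []
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in graph.get(current, []):
            if parent not in seen:  # visit each input once, even if shared
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


inputs = upstream("report.md", BUILT_FROM)
for name in inputs:
    print(name)
```

Because every visited node corresponds to a recorded artifact or operation, each one could then be checked against its own record independently.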

04 The boundary

What we prove — and what we don't.

We record what the run did and make that record tamper-evident: the inputs, the retrievals, the tool calls, the memory, and the outputs. We don't claim to prove the model's internal reasoning was correct — only that this is the run that happened, and you can check it.
