Chat API

Agents speak the OpenAI Chat Completions protocol. Any client that already talks to OpenAI can talk to a Kraterion agent, and gets retrieval stats and citations back in an extra kraterion field.

Endpoint

POST https://api.kraterion.com/v1/agents/<agent_id>/chat/completions
Authorization: Bearer kr_live_...

Authenticate with a bearer token or a share token. The agent's model, tools, and knowledge come from its saved configuration.
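For clients that do not use an SDK, the call above can be sketched with the Python standard library. This is a minimal sketch, not an official client: build_chat_request and send are hypothetical helper names, the JSON Content-Type header is assumed from the Chat Completions protocol, and only the bearer-token form is shown because this page does not say how a share token is passed.

```python
import json
import urllib.request

# Base URL taken from the endpoint shown above.
BASE_URL = "https://api.kraterion.com/v1"


def build_chat_request(agent_id: str, token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the agent's chat completions endpoint."""
    url = f"{BASE_URL}/agents/{agent_id}/chat/completions"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            # Assumption: the endpoint expects a JSON body, as OpenAI's does.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping construction separate from sending makes it easy to inspect the URL and headers before anything goes over the network.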

Request

A standard Chat Completions body. The system prompt belongs to the agent, so a system message in the request is rejected, and the last message must be from the user. Two Kraterion-specific flags, include_retrieval_info and include_citations, control the extension payload.

{
  "messages": [
    { "role": "user", "content": "What's our refund window?" }
  ],
  "model": "gpt-4o-mini",
  "temperature": 0.2,
  "max_tokens": 512,
  "stream": false,
  "include_retrieval_info": true,
  "include_citations": true
}
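The two message rules above are easy to check on the client before sending. The sketch below mirrors them locally; check_messages is a hypothetical helper, and the server's actual error format is not described on this page.

```python
def check_messages(messages: list[dict]) -> None:
    """Raise ValueError if the messages would break the documented rules."""
    if not messages:
        # With no messages there is no final user message to answer.
        raise ValueError("messages must not be empty")
    for position, message in enumerate(messages):
        if message.get("role") == "system":
            raise ValueError(
                f"message {position} has role 'system'; "
                "the system prompt belongs to the agent"
            )
    if messages[-1].get("role") != "user":
        raise ValueError("the last message must be from the user")
```

Calling it right before the request turns a rejected round trip into an immediate local error.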

Response

The familiar Chat Completions object, plus a kraterion block with retrieval stats, citations, and any tool calls.

{
  "id": "chatcmpl_kr_...",
  "object": "chat.completion",
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Refunds are accepted within 30 days..." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 812, "completion_tokens": 48, "total_tokens": 860 },
  "kraterion": {
    "agent_id": "...",
    "retrieval": {
      "bucket_ids": ["..."],
      "hit_count": 3,
      "retrieval_latency_ms": 41,
      "llm_latency_ms": 520,
      "total_latency_ms": 564
    },
    "citations": [
      {
        "index": 0,
        "chunk_hash": "sha256-...",
        "s3_key": "handbook.md",
        "ordinal": 7,
        "bucket_id": "...",
        "source_walrus_blob_id": "...",
        "cited": true
      }
    ]
  }
}
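Reading the extension out of a decoded response is plain dictionary access. The sketch below lists the sources behind an answer using the field names from the example above. cited_sources is a hypothetical helper, and it assumes that cited: true marks a chunk the answer actually referenced, which this page does not spell out.

```python
def cited_sources(response: dict) -> list[tuple[str, int]]:
    """Return (s3_key, ordinal) for each citation flagged as cited."""
    # The kraterion block may be absent, e.g. when talking to a plain
    # OpenAI endpoint or when the extension flags are off.
    extension = response.get("kraterion") or {}
    return [
        (citation["s3_key"], citation["ordinal"])
        for citation in extension.get("citations", [])
        if citation.get("cited")
    ]
```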

Streaming

Set stream: true for Server-Sent Events. You get standard chat.completion.chunk frames for the text, interleaved kraterion.tool_call frames as tools run, a final kraterion.extension frame with the retrieval and citation summary, and then data: [DONE].
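A stream consumer has to sort the frame types described above. The sketch below does that over an iterable of already-decoded text lines. It rests on an assumption not confirmed on this page: that every data: payload is a JSON object whose "object" field carries the frame type (chat.completion.chunk, kraterion.tool_call, kraterion.extension). If the service tags frames with SSE event: lines instead, the dispatch would need to change. The names iter_sse_data and collect_stream are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the payload of each `data:` line, stopping at [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield payload


def collect_stream(lines: Iterable[str]) -> dict:
    """Accumulate answer text, tool-call frames, and the final extension."""
    text_parts: list[str] = []
    tool_calls: list[dict] = []
    extension = None
    for payload in iter_sse_data(lines):
        frame = json.loads(payload)
        kind = frame.get("object")  # assumed discriminator, see lead-in
        if kind == "chat.completion.chunk":
            for choice in frame.get("choices", []):
                delta = choice.get("delta") or {}
                text_parts.append(delta.get("content") or "")
        elif kind == "kraterion.tool_call":
            tool_calls.append(frame)
        elif kind == "kraterion.extension":
            extension = frame
    return {"text": "".join(text_parts), "tool_calls": tool_calls, "extension": extension}
```

The parser handles one data: line per frame, which is how Chat Completions streams are normally framed. Multi-line data: payloads are not joined.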

Citations extension

Each citation points at the exact chunk the answer drew on: its chunk_hash, the source object (s3_key) and ordinal, and the Walrus blob it came from (source_walrus_blob_id). Because the hash is content-addressed, a reader who has the chunk text can recompute it and confirm that a quote was really in your data and hasn't been altered. The Search & citations page covers what each field means.
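The verification step can be sketched as recomputing the hash and comparing. The exact encoding is not given on this page, so the sketch assumes chunk_hash is the prefix "sha256-" followed by the lowercase hex SHA-256 digest of the chunk's UTF-8 text. Check the Search & citations page before relying on that. matches_chunk_hash is a hypothetical helper.

```python
import hashlib


def matches_chunk_hash(chunk_text: str, chunk_hash: str) -> bool:
    """Check chunk text against a citation's chunk_hash.

    Assumption: chunk_hash == "sha256-" + hex SHA-256 of the UTF-8 text.
    """
    prefix = "sha256-"
    if not chunk_hash.startswith(prefix):
        return False  # unknown scheme, cannot verify
    expected = chunk_hash[len(prefix):].lower()
    actual = hashlib.sha256(chunk_text.encode("utf-8")).hexdigest()
    return actual == expected
```

Any change to the text, even a single trailing space, produces a different digest, which is what makes the citation tamper-evident.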

Using the OpenAI SDK

Point the base URL at the agent and pass your bearer token as the API key.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.kraterion.com/v1/agents/<agent_id>",
    api_key="kr_live_...",
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's our refund window?"}],
)
print(resp.choices[0].message.content)
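The SDK's create() method has no named parameters for the Kraterion flags, but openai-python accepts an extra_body argument that is merged into the request JSON. The sketch below uses it and then reads the kraterion block back from the response. It assumes openai-python v1 or later on Pydantic v2, where unknown response fields are kept in model_extra. ask is a hypothetical helper that takes the client built above.

```python
def ask(client, question: str, model: str = "gpt-4o-mini"):
    """Send one user question; return (answer_text, kraterion_block_or_None)."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
        # Kraterion-specific flags travel as extra JSON body fields.
        extra_body={"include_retrieval_info": True, "include_citations": True},
    )
    # Assumption: fields the SDK does not model (here "kraterion")
    # are preserved in model_extra rather than dropped.
    extension = (getattr(resp, "model_extra", None) or {}).get("kraterion")
    return resp.choices[0].message.content, extension


# Usage, with the client from the example above:
# answer, extension = ask(client, "What's our refund window?")
```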