Knowledge

Search & citations

Search a knowledge bucket and get back ranked passages — each one tied to a specific object, position, and set of bytes you can verify.

Hybrid retrieval

Search runs two legs in parallel: BM25 keyword matching (great for exact terms, names, and codes) and vector similarity (great for paraphrases and concepts). The two ranked lists are combined with reciprocal rank fusion, so a chunk that scores well on either leg surfaces — you don't have to choose between keyword and semantic search.
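The fusion step described above can be sketched in a few lines. This is a minimal illustration of reciprocal rank fusion in general, not the service's implementation: the function name, the chunk ids, and the constant `k=60` (a commonly used default) are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several best-first rankings of chunk ids into one ranking.

    A chunk's fused score is the sum of 1 / (k + rank) over every list it
    appears in, with rank starting at 1. k=60 is an assumed default; the
    value the service uses is not documented here.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical chunk ids, ordered best-first within each leg.
bm25_leg = ["refund-policy#3", "faq#1", "terms#7"]
vector_leg = ["faq#1", "returns-guide#2", "refund-policy#3"]

fused = reciprocal_rank_fusion([bm25_leg, vector_leg])
for chunk_id, score in fused:
    print(f"{chunk_id}\t{score:.5f}")
```

In this sketch a chunk that appears in only one leg still receives a score, which is why a strong keyword-only or vector-only match can surface in the fused list.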

Search endpoint

curl -X POST https://api.kraterion.com/v1/buckets/<bucket_id>/knowledge/search \
  -H "Authorization: Bearer kr_live_..." \
  -H "Content-Type: application/json" \
  -d '{ "query": "what is the refund window?", "top_k": 8 }'

`query` is required (up to 4096 characters); `top_k` is optional (1–32).
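The same request can be built from Python with the limits above checked on the client before sending. This is a sketch using only the standard library; the helper names are hypothetical, the response is assumed to be JSON, and counting the 4096-character limit with `len()` (code points rather than bytes) is an assumption.

```python
import json
import urllib.request

API_BASE = "https://api.kraterion.com/v1"
MAX_QUERY_CHARS = 4096   # documented upper bound for `query`
TOP_K_MIN, TOP_K_MAX = 1, 32  # documented range for `top_k`


def build_search_request(bucket_id, api_key, query, top_k=None):
    """Validate the parameters and return an unsent POST request."""
    if not query:
        raise ValueError("query is required")
    if len(query) > MAX_QUERY_CHARS:
        raise ValueError(f"query exceeds {MAX_QUERY_CHARS} characters")
    body = {"query": query}
    if top_k is not None:
        if not TOP_K_MIN <= top_k <= TOP_K_MAX:
            raise ValueError(f"top_k must be between {TOP_K_MIN} and {TOP_K_MAX}")
        body["top_k"] = top_k
    return urllib.request.Request(
        f"{API_BASE}/buckets/{bucket_id}/knowledge/search",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def search(bucket_id, api_key, query, top_k=None):
    """Send the search request and return the decoded JSON response."""
    request = build_search_request(bucket_id, api_key, query, top_k)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Usage (performs a network call, so it is left commented out):
# results = search("<bucket_id>", "kr_live_...", "what is the refund window?", top_k=8)
```

Splitting request construction from sending keeps the validation testable without network access.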

Chunk hits

Each hit describes one chunk and where it came from.

Field	Meaning
`s3_key`	The object the chunk came from.
`ordinal`	The chunk's position within that object.
`content`	The chunk text.
`content_hash`	SHA-256 hash of the chunk text, used as the verification anchor.
`source_walrus_blob_id`	The Walrus blob the content was read from.
`vector_distance`	Semantic distance for the vector leg (lower is closer).
`bm25_score`	Keyword relevance for the BM25 leg (higher is better).
`rrf_score`	The reciprocal rank fusion score used to order results.
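On the client side, the fields in the table map naturally onto a small typed record. The sketch below is one possible shape, not an official SDK type: the class name, the Python types, the sample values, and the choice to treat the two per-leg scores as optional (in case a chunk was found by only one leg) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChunkHit:
    """One search hit, using the field names from the table above."""
    s3_key: str
    ordinal: int
    content: str
    content_hash: str
    source_walrus_blob_id: str
    rrf_score: float
    # Assumed optional: a chunk may have been matched by only one leg.
    vector_distance: Optional[float] = None
    bm25_score: Optional[float] = None

    @classmethod
    def from_dict(cls, raw):
        """Build a hit from one decoded JSON object."""
        return cls(
            s3_key=raw["s3_key"],
            ordinal=raw["ordinal"],
            content=raw["content"],
            content_hash=raw["content_hash"],
            source_walrus_blob_id=raw["source_walrus_blob_id"],
            rrf_score=raw["rrf_score"],
            vector_distance=raw.get("vector_distance"),
            bm25_score=raw.get("bm25_score"),
        )

    def citation(self):
        """Short human-readable pointer to the chunk's origin."""
        return f"{self.s3_key} (chunk {self.ordinal}, sha256 {self.content_hash[:12]})"


# Hypothetical hit; every value here is made up for illustration.
hit = ChunkHit.from_dict({
    "s3_key": "policies/refunds.md",
    "ordinal": 3,
    "content": "Refunds are accepted within 30 days of purchase.",
    "content_hash": "9f2c1a7b4e0d" + "0" * 52,
    "source_walrus_blob_id": "example-blob-id",
    "rrf_score": 0.0325,
    "bm25_score": 7.1,
})
print(hit.citation())
```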

Verifiable citations

The `content_hash` is what makes a citation trustworthy. Because it is the SHA-256 hash of the chunk text and the source blob is content-addressed, anyone can re-hash a quoted passage and compare the result to confirm that it genuinely came from your data and wasn't fabricated or edited after the fact. Agent answers carry these same citations in their response extension.
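A verification check along these lines can be sketched as follows. It assumes the hash is computed over the UTF-8 bytes of `content` with no normalization and is returned as a hex string; neither detail is stated above, so treat both as assumptions to confirm against real responses. The function names and the sample hit are hypothetical.

```python
import hashlib


def chunk_digest(content):
    """SHA-256 of the chunk text as lowercase hex (UTF-8 encoding assumed)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def verify_hit(hit):
    """Return True if the hit's content matches its content_hash."""
    return chunk_digest(hit["content"]) == hit["content_hash"].lower()


# Hypothetical hit; the hash is derived locally so the example is self-consistent.
text = "Refunds are accepted within 30 days of purchase."
hit = {"content": text, "content_hash": chunk_digest(text)}

# The same hit with one character of the quoted passage changed.
tampered = dict(hit, content=text.replace("30", "90"))

print(verify_hit(hit))       # True
print(verify_hit(tampered))  # False
```

Any edit to the quoted text, however small, changes the digest, which is what lets a reader detect a passage that was altered after retrieval.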

Asking questions

Search returns passages, not answers. To ask a question in natural language and get a written, cited answer, point an agent at the bucket and call its chat endpoint; the agent uses this same retrieval under the hood. (There is no standalone `/ask` endpoint; that role belongs to agents.)