Knowledge bases
A knowledge base is a searchable collection of documents. When an agent receives a message, it queries attached knowledge bases using semantic (vector) search and includes the most relevant passages in its context window — keeping answers accurate and grounded without bloating the system prompt.
Supported file types
| Format | Extension | Notes |
|---|---|---|
| PDF | .pdf | Text is extracted from all pages, including multi-column layouts |
| Markdown | .md | Headings are preserved as chunk boundaries |
| HTML | .html | Script and style tags are stripped; main body text is indexed |
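To illustrate what "headings are preserved as chunk boundaries" can mean for Markdown files, here is a minimal sketch that starts a new chunk at every heading. The function name `split_markdown_by_headings` and the sample text are hypothetical, and the platform's actual chunker is not documented here; it may also split long sections further.

```python
import re

def split_markdown_by_headings(text: str) -> list[dict]:
    """Split Markdown into chunks, starting a new chunk at each '#' heading."""
    chunks = []
    current = {"heading": None, "lines": []}

    def has_content(chunk: dict) -> bool:
        return chunk["heading"] is not None or any(l.strip() for l in chunk["lines"])

    for line in text.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            # A heading closes the previous chunk, if that chunk holds anything.
            if has_content(current):
                chunks.append(current)
            current = {"heading": match.group(2).strip(), "lines": []}
        else:
            current["lines"].append(line)
    if has_content(current):
        chunks.append(current)

    return [
        {"heading": c["heading"], "body": "\n".join(c["lines"]).strip()}
        for c in chunks
    ]

# Sample text, invented for illustration only.
doc = """# Refunds
Refunds are issued within 14 days.

## Exceptions
Gift cards are not refundable.
"""
chunks = split_markdown_by_headings(doc)
```

This simple version does not skip `#` lines inside fenced code blocks, so a production splitter would need extra handling there.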
Creating a knowledge base
Go to Knowledge Bases → New knowledge base, give it a name and optional description, then upload files.
Each uploaded file is processed asynchronously and moves through three statuses in order:
- Uploaded — file received and stored.
- Processing — text is extracted, split into chunks, and embeddings are generated.
- Indexed — chunks are written to the vector store and ready to search.
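The status sequence above can be modelled as a small one-way state machine. This is only a sketch for reasoning about the lifecycle: the names `FileStatus`, `advance`, and `is_searchable` are hypothetical and do not correspond to a documented API.

```python
from enum import Enum

class FileStatus(Enum):
    UPLOADED = "uploaded"      # file received and stored
    PROCESSING = "processing"  # extraction, chunking, embedding
    INDEXED = "indexed"        # chunks are in the vector store

# Each status may only advance to the next one in the pipeline.
NEXT_STATUS = {
    FileStatus.UPLOADED: FileStatus.PROCESSING,
    FileStatus.PROCESSING: FileStatus.INDEXED,
}

def advance(status: FileStatus) -> FileStatus:
    """Return the next pipeline status, or raise if there is none."""
    if status not in NEXT_STATUS:
        raise ValueError(f"{status.value} is a terminal status")
    return NEXT_STATUS[status]

def is_searchable(status: FileStatus) -> bool:
    """Assumption: only indexed files contribute chunks to search."""
    return status is FileStatus.INDEXED
```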
You can upload additional files at any time. Already-indexed files are not re-processed unless you re-upload them.
Processing status
The knowledge base detail page shows the status of each file. If a file stays in the Processing status for more than a few minutes, try re-uploading it; very large or heavily formatted PDFs occasionally need a second pass.
Attaching to agents
Open an agent → Capabilities → add knowledge bases. The agent will search all attached knowledge bases on every turn and prepend the top-matching passages to its context.
You can attach multiple knowledge bases to one agent. They are searched in parallel and results are merged by relevance score.
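As a rough illustration of merging by relevance score, the sketch below pools the passages returned by each knowledge base and keeps the highest-scoring ones. `Passage`, `merge_results`, and the scores are hypothetical, and the sketch assumes scores from different bases are directly comparable (for example, produced by the same embedding model and similarity metric).

```python
import heapq
from typing import NamedTuple

class Passage(NamedTuple):
    score: float  # relevance score, higher is better
    text: str

def merge_results(per_base_results: list[list[Passage]], top_k: int) -> list[Passage]:
    """Pool results from every knowledge base and keep the top_k by score."""
    pooled = [p for results in per_base_results for p in results]
    return heapq.nlargest(top_k, pooled, key=lambda p: p.score)

# Invented example results from two attached knowledge bases.
billing = [Passage(0.91, "Refund policy"), Passage(0.40, "Invoice format")]
product = [Passage(0.75, "Resetting a password")]

merged = merge_results([billing, product], top_k=2)
```

Note that a single base can fill every slot if its passages score highest; nothing in this sketch reserves space per base.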
Attaching to workflow steps
Individual workflow steps have their own knowledge base attachments. A step only searches the bases attached to it — not everything attached to the underlying agent. This is useful for pipelines where different steps need domain-specific context:
- Step 1 (Classify) — no knowledge base needed, the model decides the category from the message
- Step 2 (Answer billing question) — attaches the “Billing policies” knowledge base
- Step 3 (Answer tech question) — attaches the “Product documentation” knowledge base
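The scoping rule in this example can be written down as a small data model. This is a sketch with hypothetical class and field names (`Agent`, `WorkflowStep`, `bases_to_search`), not the product's configuration format; the point is that a step's search set comes from the step alone and is not inherited from the agent.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    knowledge_bases: list[str] = field(default_factory=list)

@dataclass
class WorkflowStep:
    name: str
    agent: Agent
    knowledge_bases: list[str] = field(default_factory=list)

def bases_to_search(step: WorkflowStep) -> list[str]:
    """Only the step's own attachments are searched, never the agent's."""
    return list(step.knowledge_bases)

support_agent = Agent("Support", ["Billing policies", "Product documentation"])
steps = [
    WorkflowStep("Classify", support_agent),
    WorkflowStep("Answer billing question", support_agent, ["Billing policies"]),
    WorkflowStep("Answer tech question", support_agent, ["Product documentation"]),
]

plan = {step.name: bases_to_search(step) for step in steps}
```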
Retrieval behaviour
At runtime, the query used for retrieval is the user’s latest message (after conversation context is considered). The top-k most semantically similar chunks are returned and injected into the system context before the model generates a response.
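To make top-k retrieval concrete, here is a minimal sketch that ranks chunks by cosine similarity to the query embedding. Cosine similarity is an assumption (the similarity metric is not stated above), the three-dimensional vectors are toy values rather than real embeddings, and `top_k_chunks` is a hypothetical helper.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k_chunks(query_vec, chunks, k):
    """Return the k (score, text) pairs most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy embeddings, invented for illustration.
chunks = [
    ("Refund policy", [1.0, 0.0, 0.0]),
    ("API rate limits", [0.0, 1.0, 0.0]),
    ("Refund exceptions", [0.8, 0.2, 0.0]),
]
query_vec = [1.0, 0.1, 0.0]

results = top_k_chunks(query_vec, chunks, k=2)
```

A real vector store would use an approximate nearest-neighbour index instead of scoring every chunk, but the ranking idea is the same.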
The agent is not told which knowledge base a result came from — only the content. If provenance matters, include document titles or source references inside the documents themselves.
Tips for better retrieval
- Break up large PDFs — a 200-page PDF indexed as one unit retrieves less precisely than the same content split into logical chapters.
- Use descriptive file names — the file name is included in the chunk metadata the agent sees.
- Keep bases focused — a knowledge base with 20 tightly-scoped documents outperforms one with 200 loosely-related ones. Create separate bases for separate domains.
- Avoid duplicate content — if the same text appears in multiple documents, retrieval results will be redundant and waste context budget.
- Plain text retrieves better — tables, footnotes, and heavily formatted layouts are harder to chunk cleanly. Converting key tables to plain prose or Markdown improves accuracy.
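One way to act on the duplicate-content tip before uploading is to check your files for repeated paragraphs. The sketch below is a local pre-upload check, not a platform feature; `find_duplicate_passages` and the sample file contents are hypothetical, and it catches only passages that match exactly after whitespace and case normalization.

```python
import re

def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivial differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()

def find_duplicate_passages(documents: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each passage that appears in more than one file to those files."""
    seen: dict[str, list[str]] = {}
    for filename, paragraphs in documents.items():
        for paragraph in paragraphs:
            key = normalize(paragraph)
            if not key:
                continue  # skip empty paragraphs
            files = seen.setdefault(key, [])
            if filename not in files:
                files.append(filename)
    return {passage: files for passage, files in seen.items() if len(files) > 1}

# Invented sample content: file name -> list of paragraphs.
documents = {
    "refunds.md": ["Refunds are issued within 14 days.", "Gift cards are not refundable."],
    "faq.md": ["Refunds are  issued within 14 days.", "Contact support by email."],
}

duplicates = find_duplicate_passages(documents)
```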