The AI agent doesn’t read your documents end-to-end the way a person does. When a customer asks a question, the platform searches your library for the passages most likely to contain the answer and hands those passages, not the whole document, to the model. The way you write and structure documentation has a direct impact on whether the right passage gets retrieved and whether the model can answer from it. This page lays out the properties that make documentation work well with our retrieval system. Treat it as a working baseline; we will keep iterating as we learn what helps most.
How retrieval works (in one paragraph)
Each document is split into large chunks and embedded into a vector index. At conversation time, the customer’s message (rewritten as a search query; see Flow Documentation) is embedded against that index, and the closest chunks are returned. Before indexing, each chunk is contextualized: the platform prepends a short, AI-generated summary that situates the chunk inside the larger document, so that a chunk about “rebate eligibility” still retrieves well for a query about “the Energy Rebate Program” even when the program name isn’t repeated locally. This technique is described in Anthropic’s Contextual Retrieval write-up. The practical implication: the model only sees the chunks that come back. If a chunk is ambiguous on its own, retrieved out of context, or contradicts another chunk, the answer suffers.
Properties of good documentation
Self-contained sections
A reader who lands on a single section, with no surrounding pages, should still understand what it’s about. Don’t rely on “as described above” or “see the previous section”. Restate the subject when a new section starts.
Less good: “It must be submitted within 30 days.”
Better: “A rebate application must be submitted within 30 days of installation.”
Contextualization helps, but the closer a chunk is to standing on its own, the more reliably retrieval and answering both work.
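As a rough illustration of why the self-contained wording retrieves better, the sketch below scores the two example sentences against a customer-style query. It is a toy, not the platform’s retrieval code: the word-count “embedding”, the query text, and the prepended summary sentence are all assumptions made up for this example, and a real system uses a learned embedding model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical customer query.
query = "When must a rebate application be submitted?"

# The "Less good" and "Better" sentences from this section.
vague = "It must be submitted within 30 days."
self_contained = (
    "A rebate application must be submitted within 30 days of installation."
)

# Hypothetical AI-generated summary prepended during contextualization.
summary = (
    "This chunk is from the Energy Rebate Program guide, "
    "covering rebate application deadlines."
)
contextualized = f"{summary} {vague}"

q = embed(query)
score_vague = cosine(q, embed(vague))
score_contextualized = cosine(q, embed(contextualized))
score_self_contained = cosine(q, embed(self_contained))

for label, score in [
    ("vague", score_vague),
    ("contextualized", score_contextualized),
    ("self-contained", score_self_contained),
]:
    print(f"{label:>15}: {score:.3f}")
```

With this toy scoring, the contextualized chunk ranks above the bare vague sentence, and the self-contained sentence ranks above both, which mirrors the point that contextualization helps but does not replace clear writing.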
Clear, descriptive headings
Headings are one of the strongest signals the contextualization step uses to summarize a chunk. Write them like search results, not table-of-contents entries.
Less good: “Overview”, “Details”, “Other notes”
Better: “Rebate eligibility for residential customers”, “How to submit a rebate application”, “What to do if your application is rejected”
One topic per document, when practical
Smaller, focused documents retrieve more accurately than monolithic handbooks. A 200-page policy manual is a worst case; ten 20-page documents on specific topics are much better. If a single PDF mixes billing, outages, and rebate programs, retrieval will sometimes return the wrong section for a given question. If you can’t split a source document, at least give it a strong, descriptive internal structure.
Consistent terminology
Use the same name for the same thing throughout your library. If one document calls it the “Energy Rebate Program” and another calls it “EnerSave”, retrieval has to bridge that gap on every query. Pick one canonical term per concept and stick to it. Where customers genuinely use a different vocabulary than your documents do, use the Query Rewriter Context field on Flow Documentation to list the synonyms, but reduce vocabulary drift inside the documents themselves first.
Define acronyms and jargon inline
Spell out an acronym the first time it appears in a section, not just the first time it appears in the document. Remember that the agent may only see one chunk, not the introduction.
“PV (photovoltaic) systems qualify for the federal credit when…”
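If you want to spot-check sections for unexpanded acronyms before uploading, a small heuristic like the one below can help. It is only a sketch: the function name and the two patterns it accepts as a definition, “PV (photovoltaic)” and “photovoltaic (PV)”, are assumptions for illustration, and it will also flag ordinary all-caps words.

```python
import re


def undefined_acronyms(section: str) -> list[str]:
    """Return all-caps tokens that have no inline expansion in this section.

    Heuristic only: an acronym counts as defined when it is immediately
    followed by a parenthesis, as in "PV (photovoltaic)", or when it
    appears alone inside parentheses, as in "photovoltaic (PV)".
    """
    acronyms = set(re.findall(r"\b[A-Z]{2,}\b", section))
    defined = set(re.findall(r"\b([A-Z]{2,})\s*\(", section))
    defined |= set(re.findall(r"\(([A-Z]{2,})\)", section))
    return sorted(acronyms - defined)


print(undefined_acronyms("PV (photovoltaic) systems qualify for the federal credit."))
print(undefined_acronyms("PV systems qualify; see the FAQ."))
```

Run it per section rather than per document, since the agent may see only one chunk at a time.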
Concrete over abstract
Specific values, names, thresholds, dates, dollar amounts, and procedures retrieve much better than abstract descriptions. If the answer to a likely question is a number or a step list, write the number or the step list; don’t paraphrase it.
Avoid duplication and contradiction
When two chunks cover the same ground with different details, retrieval may return either one, and the agent’s answer becomes inconsistent. Prefer a single authoritative source per topic. If you do need duplication (e.g., a quick summary plus a full policy), make sure the two versions agree, and consider whether the summary belongs in an FAQ instead.
Keep documents current
Stale documentation is worse than missing documentation: the agent will confidently quote outdated policy. Set a review cadence for every document, and prefer connector sources (S3, Confluence) when the source of truth is already maintained somewhere your team updates regularly.
Images and tables
Images aren’t invisible to the agent: the platform converts each image into an AI-generated text description before indexing, but quality varies. A labeled diagram or a screenshot with obvious content describes well; a dense chart, a small-text infographic, or a low-resolution scan describes poorly. If a fact is critical, don’t rely on it living only inside an image; include it in the surrounding text too.
Tables are handled whether they’re expressed as text (Markdown, HTML, text-layer PDF) or rendered as images. Text-form tables retrieve more reliably; image tables go through the same image-to-text path and inherit its limitations. Supported direct-upload formats are listed under Manage Documents.
Document, FAQ, or thought?
Documentation is one of three knowledge mechanisms. Choosing the right one matters as much as writing it well.
If you find yourself adding the same answer to many FAQs, it probably belongs in a document. If you find yourself writing a document that’s really one Q&A, it probably belongs in an FAQ.
A quick checklist
Before attaching a document to a flow, run through:
- Does each section stand on its own without the rest of the document?
- Are headings specific enough to describe what’s in the section?
- Is terminology consistent with the rest of the library?
- Are acronyms defined where they appear?
- Are there concrete values, steps, and names — not just abstractions?
- Is anything duplicated or contradicted by another document?
- Is the source text-based (not scanned images)?
- Is it current, and is there an owner who will keep it current?
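Two of the checklist items, section independence and heading specificity, lend themselves to a quick automated pre-check. The sketch below is a hypothetical helper, not a platform feature: the function name is made up, the generic headings come from the examples on this page, and the phrase “as mentioned earlier” is an extra assumption added alongside the two phrases quoted above.

```python
import re

# Phrases that suggest a section leans on text the agent may never see.
# The first two are quoted on this page; the third is an assumption.
DANGLING_REFERENCES = [
    r"as described above",
    r"see the previous section",
    r"as mentioned earlier",
]

# Headings that say nothing about the section's content.
GENERIC_HEADINGS = {"overview", "details", "other notes"}


def lint_section(heading: str, body: str) -> list[str]:
    """Return human-readable warnings for one section of a document."""
    warnings = []
    if heading.strip().lower() in GENERIC_HEADINGS:
        warnings.append(f"generic heading: {heading!r}")
    for pattern in DANGLING_REFERENCES:
        if re.search(pattern, body, flags=re.IGNORECASE):
            warnings.append(f"dangling reference: {pattern!r}")
    return warnings


for warning in lint_section(
    "Overview",
    "As described above, it must be submitted within 30 days.",
):
    print(warning)
```

A check like this only catches surface patterns; the remaining checklist items (consistency, currency, ownership) still need a human review.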
Further reading
- Anthropic — Contextual Retrieval — the technique used to enrich chunks before embedding.
Related
Manage documents
Upload, sync, and organize source material.
Flow documentation
Attach documents and tune retrieval per flow.
FAQ
Curate short Q&A pairs.
Document search
Query the library directly.