Explainer · 9 min read

What is an llms.txt file? A complete guide for 2026

Published April 30, 2026 · By Ceres

llms.txt is a Markdown file you put at the root of your domain (https://yoursite.com/llms.txt) to give large language models a curated map of your most important content. It was proposed by Jeremy Howard (co-founder of fast.ai and Answer.AI) in September 2024 and is documented at llmstxt.org. Think of it as a README written for AI agents — concise, hand-curated, pointing at the pages that matter most.

This post answers the questions we're asked most often: what llms.txt is, how it differs from robots.txt and sitemap.xml, whether AI engines actually read it, what to put in it, and how to ship one in 30 minutes. There's also a section on the optional llms-full.txt companion file and a real-world example you can copy.

The one-paragraph summary

llms.txt is a single Markdown file at /llms.txt on your domain that lists your most important pages with a one-line description for each. It's designed to help AI agents and LLMs understand your site quickly without crawling everything. The format is simple: a top-level # heading with your name, an optional > blockquote summary, then ## sections containing Markdown-style links with descriptions. Adoption is voluntary — there is no enforcement mechanism — but the file is cheap to ship and zero-downside, so most teams optimising for AI-search visibility now include one.

Why llms.txt exists

The story starts with the problem AI agents face when they hit a website. Your site probably has dozens or hundreds of pages, each rendered with JavaScript, surrounded by navigation chrome, marketing widgets, cookie banners, and footer links. An agent that wants to answer a question about your product has three bad options:

  • Crawl every page. Slow, expensive, and most pages aren't relevant to the agent's question. Tokens spent on cookie-banner copy are tokens not spent on the actual answer.
  • Read sitemap.xml. Gives a list of URLs but no signal about which matter, no descriptions, and no human-curated narrative. A sitemap with 1,200 URLs helps a search engine; it doesn't help an agent decide what's worth fetching.
  • Render the homepage with a browser. Catches the marketing copy but misses your docs, pricing details, and policy pages. Also expensive — full-page JS renders cost real money at agent scale.

llms.txt cuts this Gordian knot. The site owner — who knows which pages matter — writes a short Markdown index pointing the agent directly at the relevant content. The agent fetches one file (a few KB), gets a curated map, and decides what to read next based on clear hints. Bandwidth, latency, and token costs all drop.

llms.txt vs robots.txt vs sitemap.xml

These three files are often confused. They're actually complementary, each answering a different question:

  • robots.txt answers "what may you access?" — a permission file telling crawlers which paths are allowed and which are off-limits. It's an instruction to bots, not a description of content.
  • sitemap.xml answers "what URLs exist?" — a machine-readable list of every indexable URL with metadata (last modified, change frequency, priority). Built for search-engine discovery; long, exhaustive, no editorial judgment.
  • llms.txt answers "where should I look first?" — a human-curated, narrative-friendly Markdown index pointing at the most important pages with descriptions. Built for AI agents; short, opinionated, hand-edited.

You don't pick one. A serious site has all three. They serve different consumers: robots.txt for traditional crawlers, sitemap.xml for search engines, llms.txt for AI agents.

The format, in detail

llms.txt is plain Markdown. Per the spec at llmstxt.org, the structure is:

  1. An H1 heading with the site or product name. Required, and there must be exactly one.
  2. An optional blockquote (>) with a one-paragraph summary of what the site is about. Strongly recommended — this is the agent's first read.
  3. Optional plain-text paragraphs giving more context. Keep it short.
  4. H2 sections grouping links by purpose — common section names are ## Documentation, ## Product, ## Trust & policies, and ## Optional.
  5. Markdown links within each section, optionally followed by a colon and a description. Format: - [link text](url): description.
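
To make the structure concrete, here is a minimal parsing sketch in Python (3.9+). It is an illustration under stated assumptions, not a reference implementation: the names parse_llms_txt, LlmsTxt, and Link are hypothetical, and the regular expression only handles the simple "- [text](url): description" shape described above, plus indented continuation lines.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Matches "- [text](url)" with an optional ": description" tail.
LINK_RE = re.compile(
    r"^-\s*\[(?P<text>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


@dataclass
class Link:
    text: str
    url: str
    description: str = ""


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    sections: dict[str, list[Link]] = field(default_factory=dict)


def parse_llms_txt(source: str) -> LlmsTxt:
    """Parse the H1, blockquote summary, and H2 link sections of an llms.txt."""
    doc = LlmsTxt()
    summary_lines: list[str] = []
    current: Optional[list[Link]] = None  # links of the section being read
    for raw in source.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith(">") and current is None:
            summary_lines.append(line.lstrip(">").strip())
        elif line.startswith("## "):
            current = doc.sections.setdefault(line[3:].strip(), [])
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                current.append(
                    Link(match["text"], match["url"], match["desc"] or "")
                )
            elif line and current:
                # Wrapped line: continue the previous link's description.
                last = current[-1]
                last.description = f"{last.description} {line}".strip()
    doc.summary = " ".join(summary_lines)
    return doc
```

Calling parse_llms_txt(text) on a file shaped like the list above returns the title, the joined summary, and a dictionary mapping each section name to its links. Free-text paragraphs between the summary and the first section are ignored in this sketch.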

One special section name matters: ## Optional is reserved for pages an agent can skip if context is tight. Per the spec, agents should treat the Optional section as deprioritised reading — a way to mark transactional or low-information pages (signup forms, contact pages) without dropping them from the file entirely.
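
One way an agent could honour the Optional section is sketched below in Python. The policy is an assumption for illustration (the function name pick_urls and the max_pages budget are hypothetical): links outside ## Optional are taken first, and Optional links are read only if the page budget has room left.

```python
import re

# Captures the link text and URL of a "- [text](url)" list item.
LINK_RE = re.compile(r"^-\s*\[([^\]]+)\]\(([^)\s]+)\)")


def pick_urls(llms_txt: str, max_pages: int) -> list[str]:
    """Return up to max_pages URLs, deprioritising the Optional section."""
    primary: list[str] = []
    optional: list[str] = []
    section = ""
    for line in llms_txt.splitlines():
        line = line.strip()
        if line.startswith("## "):
            section = line[3:].strip().lower()
            continue
        match = LINK_RE.match(line)
        if match:
            bucket = optional if section == "optional" else primary
            bucket.append(match.group(2))
    # Optional links are appended last, so they are the first to be cut.
    return (primary + optional)[:max_pages]
```

With a budget of two pages and a file that lists two documentation links and one Optional contact link, this returns only the documentation URLs, regardless of where the Optional section appears in the file.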

A real example

Here's the llms.txt we ship at agentceres.com (lightly trimmed):

# Ceres

> Ceres is your AI Growth Officer — the first AI agent that
> runs marketing 24/7 for indie founders and small SaaS teams.
> Specialist agents deliver evidence-cited briefings in
> Slack; outbound content ships as drafts for your review.

## Product

- [Landing page](https://agentceres.com/): Meet Ceres — the
  AI Growth Officer for indie SaaS
- [How it works](https://agentceres.com/how-it-works): The
  evidence chain, memory system, and human-review posture
- [Pricing](https://agentceres.com/pricing): Four flat-price
  plans from $39/mo

## Documentation

- [Documentation](https://agentceres.com/docs): Operator
  and customer-facing setup docs

## Trust & policies

- [Security](https://agentceres.com/security): Tenant-isolation,
  evidence requirements, approval boundary
- [Privacy](https://agentceres.com/privacy): Privacy policy
- [Terms](https://agentceres.com/terms): Terms of service

## Optional

- [Sign up](https://agentceres.com/signup): Free trial signup —
  transactional page, not informational reading
- [Contact](https://agentceres.com/contact): Contact form —
  transactional page, not informational reading

You can read the full file at agentceres.com/llms.txt. Note the structure: H1 for the brand, blockquote summary, sections grouped by purpose, descriptive line per link, transactional pages segregated under ## Optional.
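
If you would rather generate the file from data than hand-edit it, the same structure can be rendered with a few lines of Python. This is a sketch, not how the file above was produced: render_llms_txt is a hypothetical helper, and the example.com entries in the usage note are placeholders.

```python
def render_llms_txt(title, summary, sections):
    """Render an llms.txt string.

    sections maps a section name to a list of
    (link text, url, description) tuples; description may be empty.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for name, links in sections.items():
        lines.append(f"## {name}")
        lines.append("")
        for text, url, description in links:
            entry = f"- [{text}]({url})"
            if description:
                entry += f": {description}"
            lines.append(entry)
        lines.append("")
    # End the file with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"
```

For example, render_llms_txt("Example", "A demo site.", {"Docs": [("Guide", "https://example.com/guide", "Setup steps")]}) yields an H1, a blockquote summary, and one ## Docs section with a single described link. Sections are written in dictionary insertion order, so put ## Optional last.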

llms.txt vs llms-full.txt

Many sites also publish an optional companion file: llms-full.txt. It is a widely used convention rather than part of the core llmstxt.org proposal. The two have different jobs:

  • llms.txt — the index. Short (typically 500–2000 words), curated, links out to your important pages, fetched first.
  • llms-full.txt — the full corpus, flattened. The actual content of your indexed pages concatenated into one Markdown document, optionally minified for token efficiency. Lets an agent fetch your entire knowledge base in one request instead of crawling.

Small sites usually only need llms.txt. If your site has substantial documentation (a docs portal, a help centre, a long blog archive), shipping llms-full.txt as well lets agents grab the whole thing without N round trips. Keep it under your hosting provider's response-size limit (Cloudflare and Vercel are typically generous; some CDNs cap at 10–25 MB).
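
A build step for llms-full.txt can be as small as the Python sketch below. Everything about the layout is an assumption, since there is no single agreed format: build_llms_full is a hypothetical helper, each page is assumed to be already converted to Markdown, and the 10 MB default cap is only a placeholder for your host's real limit.

```python
def build_llms_full(
    title: str,
    pages: list[tuple[str, str, str]],
    max_bytes: int = 10 * 1024 * 1024,
) -> str:
    """Concatenate pages into one Markdown document.

    pages holds (page title, url, markdown body) tuples.
    max_bytes is an assumed cap; check your host's actual limit.
    """
    parts = [f"# {title}"]
    for page_title, url, body in pages:
        # One H2 per page, with the source URL kept for attribution.
        parts.append(f"## {page_title}\n\nSource: {url}\n\n{body.strip()}")
    document = "\n\n".join(parts) + "\n"
    size = len(document.encode("utf-8"))
    if size > max_bytes:
        raise ValueError(
            f"llms-full.txt is {size} bytes, over the {max_bytes}-byte limit"
        )
    return document
```

Failing the build when the file is too large is a design choice: it is easier to notice than a silently truncated response at the CDN. Page bodies that contain their own H1 or H2 headings would need demoting first, which this sketch does not do.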

How to create your own llms.txt — 30-minute version

The fast path:

  1. Open a blank file. Name it llms.txt. Put it in your site's public/static directory (Next.js: public/llms.txt; Vite: public/llms.txt; static-site generators: the build output root).
  2. Write the H1 and blockquote. Your product or site name as # Heading; one paragraph in a > blockquote answering "what is this?" in 2–3 sentences.
  3. Add the sections. Start with three: ## Product, ## Documentation, ## Trust & policies. Add ## Optional for transactional pages (signup, contact, login).
  4. Hand-pick the links. Resist the urge to dump every URL. The whole point is curation. 8–15 links is plenty for a typical SaaS landing site; a docs-heavy site might justify 25–40.
  5. Write a description per link. One concrete sentence about what the page covers. Skip marketing fluff ("the best X" is noise); write what an agent would learn by reading the page.
  6. Verify it's served at the root. Deploy, then curl https://yoursite.com/llms.txt and confirm you get the file back, not a 404 or your SPA shell. Common gotcha: SPA fallbacks rewrite all paths to index.html; you may need a config tweak so /llms.txt serves the static file.
  7. Add it to your sitemap (optional). Some teams reference llms.txt from sitemap.xml for discovery; not strictly required, but harmless.
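
The check in step 6 can also be scripted, for example as a post-deploy test. The sketch below uses only Python's standard library; the function names and the specific heuristics (HTML content type, body starting with an H1) are assumptions chosen to catch the SPA-fallback gotcha, not rules from the spec.

```python
import urllib.error
import urllib.request


def looks_like_llms_txt(status: int, content_type: str, body: str) -> list[str]:
    """Return a list of problems; an empty list means the response looks right."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    if "html" in content_type.lower():
        problems.append(f"served as {content_type}; likely an SPA fallback")
    stripped = body.lstrip()
    if stripped.lower().startswith(("<!doctype", "<html")):
        problems.append("body is HTML, not Markdown")
    elif not stripped.startswith("# "):
        problems.append("body does not start with an H1 heading")
    return problems


def check_site(base_url: str) -> list[str]:
    """Fetch <base_url>/llms.txt and run the checks above (needs network access)."""
    url = base_url.rstrip("/") + "/llms.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return looks_like_llms_txt(
                response.status,
                response.headers.get("Content-Type", ""),
                response.read().decode("utf-8", errors="replace"),
            )
    except urllib.error.HTTPError as error:
        # urlopen raises on 4xx/5xx instead of returning a response.
        return [f"expected HTTP 200, got {error.code}"]
```

Run check_site("https://yoursite.com") after each deploy and fail the pipeline if the returned list is not empty. Keeping the checks in a separate pure function makes them testable without network access.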

Common mistakes

  • Treating it like a sitemap. 1,200 links with no descriptions defeats the purpose. llms.txt is a curated index, not a URL dump.
  • Marketing copy in descriptions. "The world's best AI-powered customer success platform" tells an agent nothing useful. Write what the page contains, not how you'd like it to be perceived.
  • Forgetting ## Optional. Without segregating transactional pages, an agent burns tokens trying to extract information from your signup form or contact page.
  • Stale content. If you ship llms.txt and never update it, the file drifts away from your real site. Treat it like docs — re-review it quarterly when you change navigation or add major features.
  • Linking to login-protected pages. Agents can't auth into your dashboard. Skip authenticated routes; only link public pages.
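
Several of these mistakes can be caught mechanically. The Python sketch below is a rough linter under loud assumptions: the thresholds, the fluff phrases, and the URL fragments treated as transactional or private are all hypothetical heuristics to tune for your own site, and only the first line of a wrapped description is inspected. Staleness cannot be detected this way.

```python
import re

LINK_RE = re.compile(r"^-\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")

# Hypothetical heuristics; adjust for your own site.
MAX_LINKS = 40
FLUFF = ("world's best", "best-in-class", "revolutionary", "cutting-edge")
TRANSACTIONAL = ("signup", "sign-up", "login", "contact")
PRIVATE = ("/dashboard", "/app/", "/account", "/admin")


def lint_llms_txt(source: str) -> list[str]:
    """Return human-readable warnings for common llms.txt mistakes."""
    warnings = []
    section = ""
    link_count = 0
    for line in source.splitlines():
        line = line.strip()
        if line.startswith("## "):
            section = line[3:].strip().lower()
            continue
        match = LINK_RE.match(line)
        if not match:
            continue
        link_count += 1
        url, desc = match.group(2), match.group(3) or ""
        if not desc:
            warnings.append(f"{url}: no description")
        if any(phrase in desc.lower() for phrase in FLUFF):
            warnings.append(f"{url}: description reads like marketing copy")
        if section != "optional" and any(t in url.lower() for t in TRANSACTIONAL):
            warnings.append(f"{url}: transactional page outside ## Optional")
        if any(p in url.lower() for p in PRIVATE):
            warnings.append(f"{url}: looks like a login-protected route")
    if link_count > MAX_LINKS:
        warnings.append(f"{link_count} links; this looks like a sitemap")
    return warnings
```

Running lint_llms_txt on the file in CI turns the list above into a repeatable check, with the quarterly review left for the judgment calls a script cannot make.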

Does any of this actually move the needle?

The honest answer in 2026 is: probably yes, with caveats. llms.txt adoption is real but partial. Anthropic, Mistral, and several agent runtimes have signaled support. Perplexity and Google's AI Overviews haven't made formal commitments either way. We've seen anecdotal traffic-attribution data suggesting AI agents fetch llms.txt when present and use it to ground their answers, but the public research is still thin.

That said, the cost-benefit math is one-sided. Shipping llms.txt takes 30 minutes and ~2 KB on your CDN. The downside is zero: agents that don't read it ignore it silently, there is no SEO penalty, and the only upkeep is a quarterly review. The upside, if llms.txt becomes a stronger ranking signal in AI engines, is meaningful AI-citation visibility for one of the cheapest interventions in the GEO playbook. We ship it. Most teams optimising for AI-search visibility ship it.

Where llms.txt fits in the GEO toolkit

Generative Engine Optimization (GEO) is the discipline of optimising your content so AI engines (Perplexity, ChatGPT, Claude, Google AI Overviews) cite you when prompted in your category. llms.txt is one of several artifacts in the GEO toolkit:

  • llms.txt — gives agents a hand-curated map of your site (this post).
  • Structured data (Schema.org JSON-LD) — embeds machine-readable facts in your HTML so engines can extract Article, FAQPage, HowTo, Organization markup directly.
  • FAQPage / HowTo / Article schema — the specific schema types most useful for AI-engine extraction. FAQ in particular maps cleanly to the question/answer shape engines synthesise from.
  • Citation-friendly content — concrete numbers, primary sources, comparison tables, named entities. Engines prefer pages that cite to pages that gesture.
  • Citation auditing — measuring which queries cite you vs. competitors across engines, on a recurring cadence, so rewrites are evidence-driven not guess-driven.

The last item is what our Generative Engine Optimization role does weekly — auditing citations across Perplexity, ChatGPT, Claude, and AI Overviews on your tracked query list, diagnosing why engines cite competitors when they do, and handing rewrite briefs to the SEO Content role. llms.txt is a one-time setup; citation auditing is the ongoing measurement loop.

Ship one today

llms.txt is the rare optimisation where the cost is trivial and the downside is zero. If you don't have one, write it now. Use the example above as a template. Keep it short, hand-curated, descriptive. Re-review it quarterly when your site changes.

If you want to see citation outcomes — actually measure whether AI engines start citing you after the change — that's where Ceres's GEO Strategist role picks up: weekly citation audits, rewrite briefs, before/after comparisons. You can start the free trial if you want the full team, or just keep this page bookmarked as your llms.txt reference.

FAQ

What is an llms.txt file?
An llms.txt file is a Markdown document at the root of your domain (https://yoursite.com/llms.txt) that gives large language models a curated map of your most important content. It's a proposal from Jeremy Howard (fast.ai) introduced in September 2024 — a single-file index that AI agents can read to understand your site without crawling every page or paying for full-page JavaScript renders.

Is llms.txt the same as robots.txt or sitemap.xml?
No. robots.txt tells crawlers what they may and may not access (a permission file). sitemap.xml lists every URL on your site with metadata for search engine indexing (a discovery file). llms.txt is a curated, human-written narrative pointing AI agents at the pages that matter most — closer to a README than to either of the two existing files. The three are complementary, not substitutes.

Do AI engines actually read llms.txt today?
Adoption is partial as of 2026. Anthropic, Mistral, and several agent runtimes have signaled support. Perplexity and Google's AI Overviews have not committed to it either way, and public research on its effect is still thin. The pragmatic answer: llms.txt is cheap to ship (one Markdown file, no infrastructure) and the downside risk is zero, so most teams optimising for AI-search visibility add one regardless of which engines have officially endorsed it.

What's the difference between llms.txt and llms-full.txt?
llms.txt is the index: concise, ~1–2 pages, human-readable, with curated links. llms-full.txt is the optional companion: the actual content of the indexed pages flattened into a single Markdown document, so an agent that needs your full corpus can fetch it in one request instead of crawling. llms-full.txt is a widely used convention rather than part of the core llmstxt.org proposal. Small sites usually only need llms.txt; content-heavy sites benefit from both.

Where do I put the llms.txt file?
At the root of your domain: https://yoursite.com/llms.txt. The llmstxt.org proposal places the file at the root path /llms.txt, the same place crawlers look for /robots.txt, and optionally allows a copy in a subpath. Agents are most likely to check the root first, so treat subdirectory placement (e.g. /docs/llms.txt) as a supplement, not a substitute.

How long should an llms.txt file be?
Concise. The spec sets no hard length limit, but a practical target is a single page of Markdown, typically 500–2000 words. Anything longer should live in llms-full.txt or in the linked pages themselves. Think of llms.txt as a README, not a knowledge base. If you need to give the agent more context, link to it; don't paste it inline.