Update llms.adoc for proper markdown filenames and AI crawler permissions#161

Open
JakeSCahill wants to merge 12 commits into main from update-llms-markdown-documentation

JakeSCahill (Contributor) commented Mar 21, 2026

Summary

Updates documentation and configuration to reflect the new AI-optimized markdown export structure:

  1. Updates llms.adoc to document proper .md filenames (not index.md)
  2. Adds explicit AI crawler permissions to production robots.txt
  3. Fixes outdated netlify.toml redirects for markdown files

Changes

home/modules/ROOT/pages/llms.adoc

  • Removed references to /page/index.md structure
  • Updated to proper .md filenames (/page.md)
  • Added AI-Optimized Formats section documenting:
    • llms.txt (curated overview)
    • llms-full.txt (complete export)
    • Component-specific exports (ROOT-full.txt, redpanda-cloud-full.txt, etc.)

antora-playbook.yml

  • Changed the production robots.txt configuration from robots: allow to explicit directives
  • Added permissions for 14 AI platforms including:
    • GPTBot, ChatGPT-User
    • Claude-Web, anthropic-ai
    • Perplexity, PerplexityBot
    • Google-Extended, GoogleOther
    • CCBot, cohere-ai, and more
  • Added crawl-delay directive

netlify.toml

  • Updated redirect: /current.md → /current/home.md (was /current/home/index.md)
  • Removed outdated catch-all index.md redirect

Testing

Verified that llms.txt generation works correctly with updated documentation.

Related PRs

## Changes

- Remove references to "indexify convention" and index.md files
- Update markdown access instructions: replace .html with .md instead of appending /index.md
- Add AI-Optimized Formats section with:
  - llms.txt (curated overview)
  - llms-full.txt (complete export ~20MB)
  - Component-specific exports (ROOT-full.txt, redpanda-cloud-full.txt, etc.)
- Document YAML frontmatter in individual markdown pages
- Update versioning section to use proper markdown paths
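The new access rule above (replace .html with .md rather than appending /index.md) can be sketched as a small URL helper. This is only an illustration: the function name `markdown_url` and the example URL are assumptions, not code from this PR.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url(page_url: str) -> str:
    """Return the Markdown URL for a docs page by swapping a trailing .html for .md.

    URLs whose path does not end in .html are returned unchanged.
    """
    parts = urlsplit(page_url)
    path = parts.path
    if path.endswith(".html"):
        path = path[: -len(".html")] + ".md"
    # Query string and fragment are preserved as-is.
    return urlunsplit(parts._replace(path=path))


if __name__ == "__main__":
    # Illustrative URL only; the real page paths may differ.
    print(markdown_url("https://docs.redpanda.com/current/home.html"))
```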

## Benefits

- Accurate documentation of new markdown structure
- Clear guidance for AI agents on available formats
- Better discoverability with component-specific exports
- Matches actual implementation from docs-extensions-and-macros

Enhanced the production playbook (antora-playbook.yml) with explicit
robots.txt directives for AI crawlers including GPTBot, Claude-Web,
Perplexity, Google-Extended, and other platforms.

This makes Redpanda's intent to welcome AI crawlers explicit and clear,
following best practices for AI discoverability.

JakeSCahill requested a review from a team as a code owner March 21, 2026 19:50

netlify bot commented Mar 21, 2026

Deploy Preview for redpanda-documentation ready!

Name Link
🔨 Latest commit 7c11ef0
🔍 Latest deploy log https://app.netlify.com/projects/redpanda-documentation/deploys/69c8e3971dfd9400087c6af7
😎 Deploy Preview https://deploy-preview-161--redpanda-documentation.netlify.app

Lighthouse
1 paths audited
Performance: 91 (🟢 up 7 from production)
Accessibility: 96 (no change from production)
Best Practices: 100 (no change from production)
SEO: 83 (🔴 down 9 from production)
PWA: -

JakeSCahill and others added 9 commits March 21, 2026 20:10
Move Crawl-delay inside wildcard User-agent block for proper
robots.txt syntax. Crawl-delay should be within a User-agent
block, not standalone after all blocks.
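To illustrate why the placement matters, the sketch below feeds two robots.txt variants to Python's standard-library parser. The directive values (including a delay of 1) and the `ExampleBot` name are assumptions for illustration; the actual playbook contents are not shown in this PR, and other crawlers may parse a standalone directive differently.

```python
from urllib.robotparser import RobotFileParser

# Crawl-delay left on its own after all blocks (the layout this commit fixes).
STANDALONE = """\
User-agent: GPTBot
Allow: /

User-agent: *
Allow: /

Crawl-delay: 1
"""

# Crawl-delay placed inside the wildcard User-agent block.
INSIDE_BLOCK = """\
User-agent: GPTBot
Allow: /

User-agent: *
Allow: /
Crawl-delay: 1
"""


def wildcard_crawl_delay(robots_txt: str):
    """Return the crawl delay that applies to a crawler with no dedicated block."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay("ExampleBot")  # hypothetical crawler name


if __name__ == "__main__":
    print("standalone:", wildcard_crawl_delay(STANDALONE))
    print("inside block:", wildcard_crawl_delay(INSIDE_BLOCK))
```

With this parser, the standalone directive is ignored because it belongs to no User-agent block, while the in-block version is picked up for crawlers matched by the wildcard.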

Replace all /index.md paths with proper .md filenames:
- /ai-agents/index.md → /ai-agents.md
- /console/index.md → /console.md
- /get-started/quickstarts/index.md → /get-started/quickstarts.md
- /api/doc/*.md → /api/*.md

This fixes afdocs check failures for broken links and ensures
all URLs in llms.txt point to actual markdown files that exist.

Add catch-all redirect: /*/index.md → /:splat.md

This ensures old bookmarks and links to /page/index.md are
redirected to /page.md in the new markdown structure.
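The intended mapping of the catch-all redirect can be sketched as a plain path rewrite. This models only the outcome described in the commit message; it is not Netlify's matching engine, and the function name `redirect_target` is hypothetical.

```python
import re

# Matches legacy paths of the form /<anything>/index.md.
_LEGACY_INDEX = re.compile(r"^/(?P<splat>.+)/index\.md$")


def redirect_target(path: str):
    """Return the new .md path for a legacy /page/index.md path, or None if no redirect applies."""
    match = _LEGACY_INDEX.match(path)
    if match is None:
        return None
    return f"/{match.group('splat')}.md"


if __name__ == "__main__":
    for legacy in ("/ai-agents/index.md", "/get-started/quickstarts/index.md", "/console.md"):
        print(legacy, "->", redirect_target(legacy))
```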

Rewrite AI-Optimized Formats section to use flowing prose instead
of bullet lists for better readability. Remove ~20MB size reference
as it's dynamic and will change over time. Maintain all essential
information while improving narrative flow.

Change 'markdown' to 'Markdown' (proper noun) throughout the
Access Markdown content section. Revert to bullet list format
as the prose conversion was unintended.

Removed deprecated and questionable user agents:
- Claude-Web, anthropic-ai (deprecated by Anthropic in 2026)
- Perplexity (duplicate/incorrect - PerplexityBot is correct)
- cohere-ai (undocumented)
- Omgilibot (commercial scraper, not AI development)

Added Anthropic's new three-bot framework (2026):
- ClaudeBot (model training)
- Claude-User (user requests)
- Claude-SearchBot (search optimization)

Verified remaining agents with official documentation:
- GPTBot, ChatGPT-User (OpenAI)
- PerplexityBot (Perplexity)
- Google-Extended, GoogleOther (Google)
- CCBot (Common Crawl)
- FacebookBot (Meta)

Added AI-CRAWLER-USER-AGENTS.md documentation with:
- Verification evidence for each user agent
- Official documentation links
- Maintenance procedures
- Change log
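As a rough sketch, the user agents kept or added in this commit could be turned into explicit robots.txt blocks like this. The one-block-per-agent layout with `Allow: /` is an assumption; the playbook's actual directives are not shown in this PR, and `build_robots_txt` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# User agents named in the commit message above.
AI_USER_AGENTS = [
    "GPTBot", "ChatGPT-User",                       # OpenAI
    "ClaudeBot", "Claude-User", "Claude-SearchBot",  # Anthropic
    "PerplexityBot",                                # Perplexity
    "Google-Extended", "GoogleOther",               # Google
    "CCBot",                                        # Common Crawl
    "FacebookBot",                                  # Meta
]


def build_robots_txt(agents) -> str:
    """Build one explicit allow block per agent, followed by a wildcard block."""
    blocks = [f"User-agent: {agent}\nAllow: /\n" for agent in agents]
    blocks.append("User-agent: *\nAllow: /\n")
    return "\n".join(blocks)  # blank line between blocks


if __name__ == "__main__":
    robots_txt = build_robots_txt(AI_USER_AGENTS)
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Sanity check with the standard-library parser; the path is illustrative.
    print(all(parser.can_fetch(agent, "/current/home.md") for agent in AI_USER_AGENTS))
```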

Updates llms.adoc to include comprehensive information about the
Redpanda Documentation MCP (Model Context Protocol) server:

- MCP server URL: https://docs.redpanda.com/mcp
- Setup instructions for Claude Code (npx doc-tools setup-mcp)
- Complete list of available MCP tools:
  * generate_property_docs
  * generate_metrics_docs
  * generate_rpk_docs
  * generate_rpcn_connector_docs
  * generate_helm_docs
  * generate_crd_docs
  * generate_bundle_openapi
  * get_redpanda_version
  * get_console_version
  * get_antora_structure

Reorganized AI-Optimized Formats section:
- Interactive MCP Server (new subsection)
- Static Exports (existing content reorganized)

This makes the MCP server discoverable via llms.txt for AI agents
and tools that follow the llms.txt standard.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
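For readers unfamiliar with MCP, the tools listed above are invoked through JSON-RPC 2.0 messages such as `tools/list` and `tools/call`. The sketch below only builds those payloads; it does not contact the server. It assumes `get_redpanda_version` takes no arguments, which this PR does not confirm, and it omits the initialization handshake and transport details a real client needs.

```python
import json
from itertools import count

MCP_SERVER_URL = "https://docs.redpanda.com/mcp"  # URL taken from the commit message

_request_ids = count(1)


def jsonrpc_request(method: str, params=None) -> dict:
    """Build a JSON-RPC 2.0 request object with a fresh numeric id."""
    request = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


def list_tools_request() -> dict:
    """Payload asking the server which tools it exposes."""
    return jsonrpc_request("tools/list")


def call_tool_request(name: str, arguments=None) -> dict:
    """Payload invoking one tool by name; arguments default to an empty object."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments or {}})


if __name__ == "__main__":
    print(json.dumps(call_tool_request("get_redpanda_version"), indent=2))
```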

Adds MCPcat analytics tracking to the Redpanda docs MCP server.
MCPcat is an open-source analytics platform for monitoring MCP usage.
The integration is optional and only activates if the MCPCAT_PROJECT
environment variable is set.

Changes:
- Added mcpcat as a dependency in package.json
- Integrated MCPcat tracking in netlify/functions/mcp.mjs
- Includes error handling to prevent server crashes if analytics fail

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
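The opt-in behavior described in this commit (activate only when MCPCAT_PROJECT is set, and never let analytics failures crash the server) follows a common pattern, sketched here in Python. The actual integration lives in netlify/functions/mcp.mjs and is not shown in this PR; `enable_analytics` and the `track` callable are hypothetical stand-ins, not the mcpcat API.

```python
import logging
import os

logger = logging.getLogger("mcp")


def enable_analytics(server, track, env=None) -> bool:
    """Attach analytics only when MCPCAT_PROJECT is set; never raise.

    `track` is a hypothetical callable standing in for the analytics
    library's setup function. Returns True if tracking was enabled.
    """
    env = os.environ if env is None else env
    project_id = env.get("MCPCAT_PROJECT")
    if not project_id:
        return False  # analytics is optional; do nothing when unset
    try:
        track(server, project_id)
    except Exception:
        # A failure in analytics must not take the server down.
        logger.exception("Analytics setup failed; continuing without tracking")
        return False
    return True
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.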