
Add AI-friendly meta tags and enhanced structured data #371

Open

JakeSCahill wants to merge 22 commits into main from add-ai-crawler-optimizations


@JakeSCahill
Contributor

Summary

Enhances the Antora UI bundle with AI-friendly optimizations:

  1. Adds explicit robots meta tags for AI crawlers
  2. Enhances Schema.org structured data with TechArticle schema
  3. Updates application category to "Agentic Data Plane"

Changes

src/partials/head-meta.hbs

  • Added AI-friendly robots meta tags for production pages:
    • max-snippet:-1 (no snippet length limit)
    • max-image-preview:large (allow large image previews)
    • max-video-preview:-1 (no video preview limit)
    • ai-content-declaration: documentation (explicit content type)
  • Preserves noindex for prerelease and preview pages
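The robots branching described above can be modeled as a small JavaScript function. This is a hypothetical sketch rather than the partial itself: the `prerelease`/`preview` flag names and the plain `noindex` value are assumptions, and `ai-content-declaration` is left out because it is a separate meta tag rather than a robots directive (a later commit on this PR removes it).

```javascript
// Hypothetical model of the head-meta.hbs robots logic; flag names are assumptions.
const AI_FRIENDLY_DIRECTIVES = [
  'index',
  'follow',
  'max-snippet:-1', // no snippet length limit
  'max-image-preview:large', // allow large image previews
  'max-video-preview:-1' // no video preview limit
]

// Returns the content value for <meta name="robots" content="...">
function robotsContent ({ prerelease = false, preview = false } = {}) {
  // Prerelease and preview pages stay out of search indexes
  if (prerelease || preview) return 'noindex'
  return AI_FRIENDLY_DIRECTIVES.join(', ')
}
```

For example, `robotsContent({ preview: true })` yields `noindex`, while a production page with no flags gets the full comma-separated directive list.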

src/partials/head-structured-data.hbs

  • Added TechArticle schema for individual documentation pages
  • Includes headline, description, dates, author, publisher
  • Links to parent website via isPartOf
  • Updated applicationCategory from "Streaming Data Platform" to "Agentic Data Plane"
  • Enhanced JSON-LD structured data for better search engine understanding
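For readers unfamiliar with the shape of the node, here is a hypothetical sketch of a TechArticle object built in JavaScript and placed in an `@graph` next to a WebSite node. The input fields, the `'en'` language value, the example page URL, and the `siteUrl + '/#website'` identifier are illustrative assumptions; the point is that optional fields appear only when a value exists and that `isPartOf` links to the WebSite node by `@id`.

```javascript
// Hypothetical sketch of the TechArticle node; input fields are assumptions.
function buildTechArticle ({ siteUrl, orgName, page }) {
  const org = { '@type': 'Organization', name: orgName }
  const node = {
    '@type': 'TechArticle',
    headline: page.title,
    url: page.url,
    inLanguage: 'en', // assumption: single-language docs
    author: org,
    publisher: org,
    // Reference to the WebSite node elsewhere in the same @graph
    isPartOf: { '@id': `${siteUrl}/#website` }
  }
  // Optional fields only appear when the page has a value
  if (page.description) node.description = page.description
  if (page.datePublished) node.datePublished = page.datePublished
  if (page.dateModified) node.dateModified = page.dateModified
  return node
}

const graph = [
  { '@type': 'WebSite', '@id': 'https://docs.redpanda.com/#website', url: 'https://docs.redpanda.com' },
  buildTechArticle({
    siteUrl: 'https://docs.redpanda.com',
    orgName: 'Redpanda',
    page: { title: 'Get Started', url: 'https://docs.redpanda.com/current/get-started/' }
  })
]
// Serialized JSON-LD as it would appear inside <script type="application/ld+json">
const jsonLd = JSON.stringify({ '@context': 'https://schema.org', '@graph': graph }, null, 2)
```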

Benefits

  • Search Engine Discovery: TechArticle schema helps search engines understand documentation structure
  • AI Agent Optimization: Explicit meta tags signal AI-friendly content with no snippet limits
  • Brand Alignment: "Agentic Data Plane" reflects current positioning
  • SEO Best Practices: Follows Google's structured data guidelines

Testing

Verified template syntax is correct (no bundle build required for review).

Related PRs

## Changes

### head-meta.hbs
- Add AI-friendly robots meta tag for production pages
- Include max-snippet:-1, max-image-preview:large directives
- Add ai-content-declaration meta tag
- Keep noindex for prereleases and previews

### head-structured-data.hbs
- Add TechArticle schema.org type for individual pages
- Include page title, description, and URL
- Link to organization and website schemas
- Add software application context for component pages

## Benefits
- Better AI crawler understanding of page content
- Improved discoverability by AI search engines
- Richer metadata for AI-powered tools
- Standards-compliant structured data
@netlify

netlify bot commented Mar 21, 2026

Deploy Preview for docs-ui ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | 5389822 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/docs-ui/deploys/69c8df0f4ec7190008ba0ec1 |
| 😎 Deploy Preview | https://deploy-preview-371--docs-ui.netlify.app |
Lighthouse
1 paths audited
Performance: 34 (🟢 up 5 from production)
Accessibility: 93 (no change from production)
Best Practices: 92 (🔴 down 8 from production)
SEO: 89 (no change from production)
PWA: -

@coderabbitai
Contributor

coderabbitai bot commented Mar 21, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

📝 Walkthrough

The pull request modifies two Handlebars template partials to enhance SEO and structured data markup. The head-meta template adds conditional search engine indexing directives and AI content declaration meta tags for production pages. The head-structured-data template extends the JSON-LD graph by conditionally adding a TechArticle node with properties including headline, description, publication dates, and software component details when page information is available.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested reviewers

  • paulohtb6
  • Feediver1
🚥 Pre-merge checks: ✅ 3 of 3 passed

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main changes: adding AI-friendly meta tags and enhanced structured data for improved SEO and AI crawler optimization. |
| Description check | ✅ Passed | The description is well-related to the changeset, providing clear details about the modifications to both template files, their benefits, and the rationale behind the changes. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/partials/head-structured-data.hbs (1)

56-57: Use content-based timestamps instead of build date for article dates.

Lines 56–57 use {{iso-date}}, which returns today's date at build time, causing every page to appear newly published/modified on each build. This weakens structured-data accuracy for search engines and readers.

To fix this, consider:

  • Adding date metadata fields to your content (e.g., docdatetime, revdate, or a custom published-date/modified-date attribute)
  • Using git commit timestamps if content is version-controlled
  • Mapping datePublished and dateModified to these fields with iso-date only as a fallback for pages without explicit dates
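A minimal sketch of the suggested priority chain, assuming the candidate values (for example `docdatetime` or `revdate`) have already been read from the page. The function name and the `buildDate` option are hypothetical; the build-time fallback is opt-in, so a caller can omit the property entirely when no content date exists.

```javascript
// Hypothetical helper: returns the first content-level date that is set.
// candidates: values in priority order, e.g. [docdatetime, revdate]
function resolveArticleDate (candidates, { buildDate } = {}) {
  const found = candidates.find((value) => typeof value === 'string' && value.trim() !== '')
  if (found) return found.trim()
  // No content date: use the build date only if the caller opts in; otherwise omit
  return buildDate ? buildDate.toISOString() : undefined
}
```

When the result is `undefined`, the template would skip writing `datePublished`/`dateModified` instead of emitting an empty or misleading value.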
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/partials/head-structured-data.hbs` around lines 56 - 57, The
structured-data partial currently sets "datePublished" and "dateModified" to the
build-time helper {{iso-date}} which makes all pages appear newly
published/modified; update the head-structured-data.hbs partial to read
content-level date metadata first (e.g., check fields like docdatetime, revdate,
published-date, modified-date or git-derived timestamps) and only fall back to
{{iso-date}} when those fields are absent so "datePublished" and "dateModified"
use content-based timestamps; locate the "datePublished" and "dateModified"
lines in head-structured-data.hbs and map them to those metadata properties with
{{iso-date}} as the fallback.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/partials/head-structured-data.hbs`:
- Around line 52-54: The JSON-LD description currently uses unescaped raw output
({{{page.attributes.description}}}) which can break JSON or allow
script-breakout; replace the triple-stash with a safe, JSON-stringified/escaped
value (e.g. use a Handlebars helper that returns JSON.stringify(value) or
Handlebars.escapeExpression on page.attributes.description) and output that
helper with normal {{...}} so the "description" field contains a properly
escaped JSON string; update the partial head-structured-data.hbs to call that
helper instead of {{{page.attributes.description}}} (reference:
page.attributes.description in this partial).

📥 Commits

Reviewing files that changed from the base of the PR and between b488876 and 7d094be.

📒 Files selected for processing (2)
  • src/partials/head-meta.hbs
  • src/partials/head-structured-data.hbs

After research, ai-content-declaration is not a recognized standard.
The industry uses C2PA metadata instead of HTML meta tags for
AI content declarations. Keeping only the official Google robots
meta tags (max-snippet, max-image-preview, max-video-preview).

- Escape description field properly: use {{...}} instead of {{{...}}}
  to prevent JSON breakage from quotes/newlines in descriptions
- Use content-level dates for datePublished/dateModified: check
  docdatetime and revdate attributes before falling back to build-time
  iso-date so dates reflect actual content timestamps

Change indexify URLs from /path/index.md to /path.md to match
the new markdown export structure. This fixes "Copy as Markdown"
and "View as plain text" buttons to use correct URLs.

Add richer structured data for better search engine understanding:
- alternativeHeadline: navtitle if different from title
- keywords: from page keywords or categories attributes
- genre: from page-role (tutorial, how-to, concept, reference)
- dependencies: from page-prerequisites attribute
- datePublished: use page-release-date first (when page created),
  then fall back to docdatetime, revdate, iso-date
- dateModified: use docdatetime (last updated), then revdate,
  iso-date

This provides search engines with more context about documentation
pages while only using data already in page attributes.

- Use git-created-date for datePublished with page-release-date fallback
- Use git-modified-date for dateModified with page-release-date fallback
- Replaces build-time date fallbacks with accurate Git commit history
- Works with add-git-dates extension in docs-extensions-and-macros

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
JakeSCahill requested a review from a team on March 22, 2026 20:03
JakeSCahill and others added 2 commits March 22, 2026 20:58
The generate:bloblang-grammar task was hanging indefinitely in GitHub Actions
when fetching from GitHub/docs.redpanda.com due to network issues.

Added timeout option and timeout event handler to fail fast instead of hanging
for hours. This allows fallback versions to be tried and builds to complete.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
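A minimal sketch of the fail-fast pattern this commit describes, using Node's built-in `http`/`https` modules: the `timeout` request option arms an inactivity timer, and the `'timeout'` handler destroys the request so the promise rejects instead of hanging. The function name, default timeout, and error messages are assumptions, not the actual task code.

```javascript
// Hypothetical fail-fast fetch; name, default timeout, and messages are assumptions.
const http = require('http')
const https = require('https')

function fetchText (url, { timeoutMs = 10000 } = {}) {
  const client = url.startsWith('https:') ? https : http
  return new Promise((resolve, reject) => {
    const req = client.get(url, { timeout: timeoutMs }, (res) => {
      if (res.statusCode !== 200) {
        res.resume() // drain the body so the socket is released
        return reject(new Error(`HTTP ${res.statusCode} for ${url}`))
      }
      let body = ''
      res.setEncoding('utf8')
      res.on('data', (chunk) => { body += chunk })
      res.on('end', () => resolve(body))
      res.on('error', reject)
    })
    // 'timeout' only signals inactivity; destroying the request turns it into an error
    req.on('timeout', () => req.destroy(new Error(`Timed out after ${timeoutMs} ms: ${url}`)))
    req.on('error', reject)
  })
}
```

A caller could loop over fallback versions and move to the next one whenever this promise rejects, which is the behavior the commit message aims for.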
Created git-created-date and git-modified-date helpers that query
contentCatalog to access page.asciidoc.attributes where the
add-git-dates extension stores Git commit dates.

Updated structured data template to use these helpers instead of
directly accessing page.attributes.git-created-date, which don't
exist at template render time.

This fixes the issue where datePublished and dateModified were
showing today's date instead of actual Git commit dates.

Also added page-has-markdown attribute to preview page for testing
the "Page options" dropdown.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
JakeSCahill requested a review from micheleRP on March 23, 2026 19:25
@micheleRP left a comment

1. JSON string safety in head-structured-data.hbs (medium risk)

All dynamic values use {{...}} (HTML escaping), but this is inside a <script type="application/ld+json"> block where HTML escaping is wrong:

  • A \ in a page title produces invalid JSON (e.g., "title": "C:\path" → parse error)
  • A </script> substring in any attribute would break out of the script element
  • & in titles/descriptions becomes &amp; (valid JSON but wrong decoded value)

These values need JSON-safe encoding — either a dedicated json-stringify Handlebars helper or explicit escaping of \ and " before output.
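A minimal sketch of such a json-safe function in JavaScript, assuming the template writes the surrounding quotes itself (`"headline": "{{json-safe page.title}}"`). It relies on `JSON.stringify` for backslashes, quotes, and newlines, then rewrites `<` as `\u003c` so a `</script>` substring cannot close the element. The name is illustrative; if registered as a Handlebars helper, the result would need to be wrapped in `Handlebars.SafeString` (or emitted with `{{{...}}}`), because `{{...}}` would HTML-escape the `\"` sequences again.

```javascript
// Hypothetical json-safe helper body; the PR's json-safe.js may differ.
function jsonSafe (value) {
  if (value === undefined || value === null) return ''
  // JSON.stringify escapes backslashes, double quotes, and control characters like \n
  const quoted = JSON.stringify(String(value))
  // Strip the outer quotes; the template supplies its own
  const inner = quoted.slice(1, -1)
  // \u003c is a valid JSON escape for "<", so "</script>" cannot end the script element
  return inner.replace(/</g, '\\u003c')
}
```

Because the output is raw JSON rather than HTML, a title containing `&` stays `&` instead of becoming `&amp;`.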


2. applicationCategory is hardcoded in head-structured-data.hbs

"applicationCategory": "Agentic Data Plane"

This is applied to every page across all components (Redpanda, Console, Connect, etc.). For a connector docs page, "Agentic Data Plane" would be inaccurate. Consider pulling this from a site/component attribute, or omitting applicationCategory if there's no per-component value.


3. isPartOf @id reference may not resolve in head-structured-data.hbs

"isPartOf": { "@id": ".../#website" }

For this to work as linked data, the WebSite node in the same graph needs "@id": "https://docs.redpanda.com/#website" set explicitly. Please verify that the WebSite node added in PR #367 sets that @id, otherwise this is a dangling reference.
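One way to check this against built output is to scan the JSON-LD for reference-only objects and confirm that each `@id` is declared by a node in the same `@graph`. A hypothetical sketch (the function name is illustrative); it treats any nested object whose only key is `@id` as a reference.

```javascript
// Hypothetical check: list @id references that no node in @graph declares.
function findDanglingRefs (jsonLd) {
  const graph = jsonLd['@graph'] || []
  const declared = new Set(graph.map((node) => node['@id']).filter(Boolean))
  const dangling = []
  const visit = (value, isTopLevelNode) => {
    if (Array.isArray(value)) return value.forEach((item) => visit(item, false))
    if (value === null || typeof value !== 'object') return
    const keys = Object.keys(value)
    // A nested object like { "@id": "..." } is a reference to another node
    if (!isTopLevelNode && keys.length === 1 && keys[0] === '@id' && !declared.has(value['@id'])) {
      dangling.push(value['@id'])
    }
    for (const key of keys) visit(value[key], false)
  }
  graph.forEach((node) => visit(node, true))
  return dangling
}
```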


4. Build-time date fallback in head-structured-data.hbs

Using {{iso-date}} as the fallback for datePublished/dateModified means every page appears as published/modified at build time on every build. The git-created-date and page-release-date checks are good, but the {{iso-date}} fallback is worse than omitting the field entirely for pages without explicit dates.

JakeSCahill and others added 2 commits March 24, 2026 07:48
Fixed all issues raised by Michele in PR #371:

1. **Security: JSON string safety** (head-structured-data.hbs)
   - Created json-safe helper to escape dynamic values in JSON-LD
   - Escapes backslashes, quotes, and </script> tags
   - Prevents invalid JSON and script breakout vulnerabilities
   - Applied to all dynamic fields (title, description, keywords, etc.)

2. **Code quality: Hardcoded applicationCategory** (head-structured-data.hbs)
   - Removed hardcoded "Agentic Data Plane" value
   - Now conditional on page.component.asciidoc.attributes.application-category
   - Allows per-component customization instead of global value

3. **Code quality: Build-time date fallback** (head-structured-data.hbs)
   - Removed {{iso-date}} fallback for datePublished/dateModified
   - Pages without explicit dates now omit the field entirely
   - Prevents misleading "published at build time" metadata

4. **Bug fix: Root path edge case** (markdown-url.js)
   - Added guard for url === '/' to return '/index.md'
   - Prevents invalid '.md' output from url.slice(0, -1)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
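The URL mapping from the earlier indexify commit plus this root-path guard can be sketched as follows. The function name and the handling of non-indexified `.html` URLs are assumptions; only the `/path/` to `/path.md` mapping and the `/` to `/index.md` guard come from the commit messages.

```javascript
// Hypothetical sketch of the markdown-url behavior described in the commits.
function markdownUrl (url) {
  // Root page: url.slice(0, -1) would be '' and produce the invalid '.md'
  if (url === '/') return '/index.md'
  // Indexified URL such as /path/ maps to /path.md (not /path/index.md)
  if (url.endsWith('/')) return url.slice(0, -1) + '.md'
  // Assumption: non-indexified pages swap .html for .md
  if (url.endsWith('.html')) return url.slice(0, -'.html'.length) + '.md'
  return url + '.md'
}
```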
Business decision: Keep applicationCategory as 'Agentic Data Plane' across
all documentation to increase visibility for this new category.

This reverts the conditional logic added in the previous commit while
keeping the json-safe helper for proper escaping.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
JakeSCahill requested a review from micheleRP on March 24, 2026 08:10
JakeSCahill and others added 2 commits March 24, 2026 08:13
Changes:
- Create page-attribute helper to properly access AsciiDoc document attributes
  via contentCatalog query (page.attributes.XXX doesn't work for doc attributes)
- Update head-structured-data.hbs to use page-attribute helper for all
  AsciiDoc document attributes (description, keywords, page-categories,
  page-release-date, page-prerequisites, page-topic-type)
- Change from page-role to page-topic-type (correct attribute name)
- Use consistent {{#with}} pattern for optional fields

This fixes the issue where AsciiDoc document attributes weren't rendering
in structured data JSON-LD because they need to be accessed through
contentCatalog, not directly from the page object.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
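A minimal sketch of a helper along these lines. It assumes the Antora UI model exposes `contentCatalog` on the root context and that a page can be looked up with `getById` using its component, version, module, and `relativeSrcPath`; both are assumptions to check against the Antora version in use. The fallback to `page.description` and `page.keywords` follows a later fix in this PR. The catalog is passed in as a plain object so the logic can be exercised without Antora.

```javascript
// Hypothetical page-attribute logic; the catalog lookup shape is an assumption.
const INTRINSIC = new Set(['description', 'keywords'])

function pageAttribute (name, { page, contentCatalog }) {
  // Intrinsic attributes are already on the page model (page.description, page.keywords)
  if (INTRINSIC.has(name) && page[name]) return page[name]
  // Other document attributes are read from the catalog file's AsciiDoc attributes
  const file = contentCatalog.getById({
    component: page.component.name,
    version: page.version,
    module: page.module,
    family: 'page',
    relative: page.relativeSrcPath
  })
  const attrs = file && file.asciidoc && file.asciidoc.attributes
  return attrs ? attrs[name] : undefined
}

// As a Handlebars helper, the UI model would come from the options argument (assumed):
// module.exports = (name, { data: { root } }) => pageAttribute(name, root)
```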
@micheleRP

Integration Test Review

Built the UI bundle from this branch, integrated with docs-extensions-and-macros PR #178 (ai-optimization-frontmatter-exports), and ran a full Antora build against the docs repo. 495 pages rendered.

What Works

| Check | Result |
|---|---|
| JSON-LD validity | ✅ 495/495 pages produce valid, parseable JSON |
| TechArticle present | ✅ 495/495 pages with titles get a TechArticle entry |
| Keywords | ✅ 181/495 pages populate keywords (from page-categories) |
| Robots meta (preview) | ✅ Correctly shows noindex when preview: true |
| Robots meta (production) | else branch adds index, follow, max-snippet:-1, ...; logic correct |
| Mandatory fields | ✅ headline, url, author, publisher, inLanguage, isPartOf always present |
| about block | ✅ Correctly populated with component title + "Agentic Data Plane" |

Bugs Found

BUG 1 (High): Git dates never reach the template

The add-git-dates extension sets page.asciidoc.attributes['git-created-date'], but the template reads page.attributes.git-created-date. In Antora's UI model, page.attributes only contains attributes prefixed with page-. Since the git date attributes don't have that prefix, they're invisible to the template.

Result: All 495 pages show today's date (the {{iso-date}} fallback), not actual git dates.

Fix options:

  • Change the extension to set page-git-created-date / page-git-modified-date (so Antora exposes them in page.attributes)
  • Or restore the git-created-date / git-modified-date Handlebars helpers that query contentCatalog directly (the approach from an earlier commit)
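To illustrate why the prefix matters, here is a simplified model of how Antora derives `page.attributes` for the UI: only document attributes whose names start with `page-` are copied, with the prefix removed. This is a sketch of the documented behavior, not Antora's source.

```javascript
// Simplified model of Antora's page.attributes derivation (not Antora's code).
function uiPageAttributes (asciidocAttributes) {
  const result = {}
  for (const [name, value] of Object.entries(asciidocAttributes)) {
    // Only page-* attributes are exposed, with the "page-" prefix stripped
    if (name.startsWith('page-')) result[name.slice('page-'.length)] = value
  }
  return result
}
```

Under this model, an extension that sets `git-created-date` is invisible to the template, while `page-git-created-date` surfaces as `page.attributes['git-created-date']`.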

BUG 2 (Medium): Description never populates

The template uses page.attributes.description, but :description: is an AsciiDoc intrinsic attribute — it's available as page.description in Antora's UI model, NOT as page.attributes.description.

Result: 0/495 pages have a description in their TechArticle, even though many pages set :description:.

Fix: Change {{#if page.attributes.description}} to {{#if page.description}} and update the value reference similarly.

BUG 3 (Low): HTML entities in headlines

Some page titles contain AsciiDoc markup that renders as HTML entities in JSON-LD:

  • What&amp;#8217;s New (should be What's New)
  • Generate a Debug Bundle with &lt;code&gt;rpk&lt;/code&gt; (should be plain text)

Affected pages: 7/495. Produces semantically incorrect Schema.org data but doesn't break JSON validity.

Minor Notes

  • page.attributes.keywords won't work for the same intrinsic-attribute reason as description. The page.attributes.categories fallback (from :page-categories:) is what's actually populating keywords for 181 pages.
  • page.attributes.navtitle for alternativeHeadline may also need checking; it depends on whether Antora exposes it in page.attributes.
  • PR description mentions ai-content-declaration: documentation but this isn't in the actual code.

@micheleRP left a comment

@JakeSCahill approving to not hold this up, but please see Claude's results from testing it all together!

JakeSCahill and others added 3 commits March 25, 2026 18:02
…ities

Bug fixes:
- git-created-date.js: Query page-git-created-date (with page- prefix)
- git-modified-date.js: Query page-git-modified-date (with page- prefix)
- page-attribute.js: Fall back to page.description/page.keywords for intrinsic attributes
- json-safe.js: Decode HTML entities before JSON escaping

The page- prefix is required because Antora only exposes attributes with
this prefix to page.attributes in the UI model. The extension now sets
page-git-created-date which becomes accessible as page.attributes['git-created-date'].

HTML entity decoding handles:
- Named entities (&amp;, &rsquo;, &mdash;, etc.)
- Numeric decimal entities (&#8217;)
- Numeric hex entities (&#x2019;)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
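A minimal sketch of decoding the three entity forms listed above. The named-entity table is a small illustrative subset (an assumption; a complete decoder would use the full HTML entity list), and decoding happens in a single pass, so a double-escaped `&amp;#8217;` becomes `&#8217;` rather than a right single quote.

```javascript
// Hypothetical decoder; NAMED is an illustrative subset, not the full HTML list.
const NAMED = {
  amp: '&', lt: '<', gt: '>', quot: '"', apos: "'", nbsp: '\u00a0',
  rsquo: '\u2019', lsquo: '\u2018', ldquo: '\u201c', rdquo: '\u201d',
  mdash: '\u2014', ndash: '\u2013', hellip: '\u2026'
}

function decodeEntities (text) {
  return String(text).replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi, (match, body) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X'
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10)
      // Leave out-of-range code points untouched instead of throwing
      return code <= 0x10ffff ? String.fromCodePoint(code) : match
    }
    // Unknown names are left as written
    return Object.prototype.hasOwnProperty.call(NAMED, body) ? NAMED[body] : match
  })
}
```

In the order the commit describes, decoding runs first and the result is then JSON-escaped. Decoding `&lt;code&gt;` yields literal `<code>` markup; producing the plain text that BUG 3 asks for would take a separate tag-stripping step.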
Git dates are available as page.attributes.git-created-date and
page.attributes.git-modified-date directly since the extension
uses the page- prefix. No need for contentCatalog.getById() lookup.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Changed is-beta-feature, is-limited-availability-feature, and
is-enterprise helpers from O(n) per-lookup to O(1) by building
a URL->attribute Map once per component.

Before: Each nav item triggered a linear scan of all pages
After: Single scan per component, then constant-time lookups

Benchmark: ~45% faster builds (3:00 -> 1:45)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
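A hypothetical sketch of the O(1) approach: build a URL-to-attributes `Map` once per component (cached in a `WeakMap` keyed by the component object) and answer each nav-item lookup from it. The page record shape and the `page-beta` attribute name are assumptions, not the helpers' real inputs.

```javascript
// Hypothetical per-component index; page shape and 'page-beta' are assumptions.
const indexCache = new WeakMap()

function attributeIndex (component, pages) {
  let index = indexCache.get(component)
  if (!index) {
    // One linear scan per component builds URL -> attributes
    index = new Map(pages.map((page) => [page.url, page.attributes || {}]))
    indexCache.set(component, index)
  }
  return index
}

// Constant-time lookup for each nav item once the index exists
function isBetaFeature (component, pages, url) {
  const attributes = attributeIndex(component, pages).get(url)
  return Boolean(attributes && attributes['page-beta'])
}
```

The cache is never invalidated, so this assumes page attributes do not change during a single UI render pass; that is worth confirming before reusing the pattern elsewhere.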
JakeSCahill and others added 5 commits March 26, 2026 14:34
Added schema.org/TechArticle properties:
- learningResourceType: Maps from page-topic-type
- audience: Maps from personas attribute
- teaches: Maps from learning-objective-* attributes
- version: Component version
- articleSection: Module name
- isAccessibleForFree: true

These properties improve SEO and AI discoverability by providing
richer semantic metadata about documentation content.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
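A hypothetical sketch of the attribute-to-property mapping above. The attribute names follow the commit message, but the value formats are assumptions: `personas` is treated as a comma-separated list mapped to schema.org `Audience` objects, and `learning-objective-N` keys are collected in numeric order.

```javascript
// Hypothetical mapping from page attributes to TechArticle properties.
function learningProperties (attrs) {
  const props = { isAccessibleForFree: true }
  if (attrs['page-topic-type']) props.learningResourceType = attrs['page-topic-type']
  if (attrs.personas) {
    // Assumption: personas is a comma-separated list
    props.audience = attrs.personas
      .split(',')
      .map((persona) => persona.trim())
      .filter(Boolean)
      .map((audienceType) => ({ '@type': 'Audience', audienceType }))
  }
  // Collect learning-objective-1, learning-objective-2, ... in numeric order
  const objectives = Object.keys(attrs)
    .filter((key) => key.startsWith('learning-objective-'))
    .sort((a, b) => a.localeCompare(b, 'en', { numeric: true }))
    .map((key) => attrs[key])
  if (objectives.length) props.teaches = objectives
  return props
}
```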
genre is meant for creative works (comedy, drama, etc.)
learningResourceType is the correct property for documentation types

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Use page.description and page.keywords directly (intrinsic)
- Use page.attributes.* for page-prefixed attributes
- Only use page-attribute helper for non-prefixed attrs:
  - personas
  - learning-objective-*

Reduces contentCatalog.getById() calls from 8 to 4 per page.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Includes FAQPage in @graph when page-faq-json-ld attribute exists.

Works with docs-extensions-and-macros FAQ extension that:
- Auto-extracts Q&A from page sections
- Supports manual overrides
- Generates schema.org compliant FAQPage

Example output:
{
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How do I install?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Install using..."
      },
      "url": "https://docs.redpanda.com/page#section"
    }
  ]
}

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
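On the UI side, the change amounts to appending the extension-generated node to `@graph` when the attribute is set. A hypothetical sketch, assuming the attribute value is a JSON string in the shape shown in the example output; malformed or unexpected values are skipped so the rest of the JSON-LD stays valid.

```javascript
// Hypothetical sketch: append the FAQPage node only when present and well formed.
function withFaqPage (graph, faqJsonLd) {
  if (!faqJsonLd) return graph
  let faq
  try {
    faq = typeof faqJsonLd === 'string' ? JSON.parse(faqJsonLd) : faqJsonLd
  } catch {
    // Malformed attribute: keep the existing graph rather than emit broken JSON-LD
    return graph
  }
  if (!faq || faq['@type'] !== 'FAQPage' || !Array.isArray(faq.mainEntity)) return graph
  return [...graph, faq]
}
```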