Add AI-friendly meta tags and enhanced structured data #371
JakeSCahill wants to merge 22 commits into main
## Changes

### head-meta.hbs
- Add AI-friendly robots meta tag for production pages
- Include max-snippet:-1, max-image-preview:large directives
- Add ai-content-declaration meta tag
- Keep noindex for prereleases and previews

### head-structured-data.hbs
- Add TechArticle schema.org type for individual pages
- Include page title, description, and URL
- Link to organization and website schemas
- Add software application context for component pages

## Benefits
- Better AI crawler understanding of page content
- Improved discoverability by AI search engines
- Richer metadata for AI-powered tools
- Standards-compliant structured data
✅ Deploy Preview for docs-ui ready!
📝 Walkthrough
The pull request modifies two Handlebars template partials to enhance SEO and structured data markup. The head-meta template adds conditional search engine indexing directives and AI content declaration meta tags for production pages. The head-structured-data template extends the JSON-LD graph by conditionally adding a TechArticle node with properties including headline, description, publication dates, and software component details when page information is available.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/partials/head-structured-data.hbs (1)
56-57: Use content-based timestamps instead of build date for article dates.
Lines 56–57 use `{{iso-date}}`, which returns today's date at build time, causing every page to appear newly published/modified on each build. This weakens structured-data accuracy for search engines and readers.
To fix this, consider:
- Adding date metadata fields to your content (e.g., `docdatetime`, `revdate`, or a custom `published-date`/`modified-date` attribute)
- Using git commit timestamps if content is version-controlled
- Mapping `datePublished` and `dateModified` to these fields with `iso-date` only as a fallback for pages without explicit dates
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/partials/head-structured-data.hbs` around lines 56 - 57, The structured-data partial currently sets "datePublished" and "dateModified" to the build-time helper {{iso-date}} which makes all pages appear newly published/modified; update the head-structured-data.hbs partial to read content-level date metadata first (e.g., check fields like docdatetime, revdate, published-date, modified-date or git-derived timestamps) and only fall back to {{iso-date}} when those fields are absent so "datePublished" and "dateModified" use content-based timestamps; locate the "datePublished" and "dateModified" lines in head-structured-data.hbs and map them to those metadata properties with {{iso-date}} as the fallback.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/partials/head-structured-data.hbs`:
- Around line 52-54: The JSON-LD description currently uses unescaped raw output
({{{page.attributes.description}}}) which can break JSON or allow
script-breakout; replace the triple-stash with a safe, JSON-stringified/escaped
value (e.g. use a Handlebars helper that returns JSON.stringify(value) or
Handlebars.escapeExpression on page.attributes.description) and output that
helper with normal {{...}} so the "description" field contains a properly
escaped JSON string; update the partial head-structured-data.hbs to call that
helper instead of {{{page.attributes.description}}} (reference:
page.attributes.description in this partial).
📒 Files selected for processing (2)
src/partials/head-meta.hbs
src/partials/head-structured-data.hbs
After research, ai-content-declaration is not a recognized standard. The industry uses C2PA metadata instead of HTML meta tags for AI content declarations. Keeping only the official Google robots meta tags (max-snippet, max-image-preview, max-video-preview).
- Escape description field properly: use {{...}} instead of {{{...}}}
to prevent JSON breakage from quotes/newlines in descriptions
- Use content-level dates for datePublished/dateModified: check
docdatetime and revdate attributes before falling back to build-time
iso-date so dates reflect actual content timestamps
Change indexify URLs from /path/index.md to /path.md to match the new markdown export structure. This fixes "Copy as Markdown" and "View as plain text" buttons to use correct URLs.
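A minimal sketch of the URL mapping this commit describes, not the actual markdown-url.js. The function name `toMarkdownUrl` and the `.html` branch are assumptions for illustration; the root guard mirrors the `url === '/'` edge case fixed in a later commit of this PR.

```javascript
// Hypothetical sketch: map a page URL to its Markdown export URL.
function toMarkdownUrl (url) {
  // The root URL has no path segment to attach ".md" to.
  if (url === '/') return '/index.md'
  // Indexified URLs end in "/": /path/ -> /path.md (previously /path/index.md)
  if (url.endsWith('/')) return url.slice(0, -1) + '.md'
  // Assumption: non-indexified URLs end in ".html": /path.html -> /path.md
  if (url.endsWith('.html')) return url.slice(0, -'.html'.length) + '.md'
  return url + '.md'
}
```

Without the root guard, `'/'.slice(0, -1)` yields an empty string and the result would be the invalid `.md`.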
Add richer structured data for better search engine understanding:
- alternativeHeadline: navtitle if different from title
- keywords: from page keywords or categories attributes
- genre: from page-role (tutorial, how-to, concept, reference)
- dependencies: from page-prerequisites attribute
- datePublished: use page-release-date first (when page created), then fall back to docdatetime, revdate, iso-date
- dateModified: use docdatetime (last updated), then revdate, iso-date

This provides search engines with more context about documentation pages while only using data already in page attributes.
- Use git-created-date for datePublished with page-release-date fallback
- Use git-modified-date for dateModified with page-release-date fallback
- Replaces build-time date fallbacks with accurate Git commit history
- Works with add-git-dates extension in docs-extensions-and-macros

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
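The fallback order in this commit could be sketched roughly as follows. This is an illustration only: `resolveArticleDates` is a hypothetical name, `attrs` is assumed to be a plain object of page attributes, and the real logic lives in the Handlebars template rather than in a function like this. Omitting a field when no date is available follows the later review feedback in this PR.

```javascript
// Hypothetical sketch: pick datePublished/dateModified from page attributes.
// Git dates win; page-release-date is the fallback; otherwise the field is omitted.
function resolveArticleDates (attrs) {
  const dates = {}
  const published = attrs['git-created-date'] || attrs['page-release-date']
  const modified = attrs['git-modified-date'] || attrs['page-release-date']
  if (published) dates.datePublished = published
  if (modified) dates.dateModified = modified
  return dates
}
```

Returning an object with only the resolved keys means a page without any date metadata contributes no date properties to the TechArticle node.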
The generate:bloblang-grammar task was hanging indefinitely in GitHub Actions when fetching from GitHub/docs.redpanda.com due to network issues. Added timeout option and timeout event handler to fail fast instead of hanging for hours. This allows fallback versions to be tried and builds to complete. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Created git-created-date and git-modified-date helpers that query contentCatalog to access page.asciidoc.attributes where the add-git-dates extension stores Git commit dates. Updated structured data template to use these helpers instead of directly accessing page.attributes.git-created-date, which don't exist at template render time. This fixes the issue where datePublished and dateModified were showing today's date instead of actual Git commit dates. Also added page-has-markdown attribute to preview page for testing the "Page options" dropdown. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
micheleRP left a comment
1. JSON string safety in head-structured-data.hbs (medium risk)
All dynamic values use {{...}} (HTML escaping), but this is inside a <script type="application/ld+json"> block where HTML escaping is wrong:
- A `\` in a page title produces invalid JSON (e.g., `"title": "C:\path"` → parse error)
- A `</script>` substring in any attribute would break out of the script element
- `&` in titles/descriptions becomes `&amp;` (valid JSON but wrong decoded value)
These values need JSON-safe encoding — either a dedicated json-stringify Handlebars helper or explicit escaping of \ and " before output.
2. applicationCategory is hardcoded in head-structured-data.hbs
This is applied to every page across all components (Redpanda, Console, Connect, etc.). For a connector docs page, "Agentic Data Plane" would be inaccurate. Consider pulling this from a site/component attribute, or omitting applicationCategory if there's no per-component value.
3. isPartOf @id reference may not resolve in head-structured-data.hbs
For this to work as linked data, the WebSite node in the same graph needs "@id": "https://docs.redpanda.com/#website" set explicitly. Please verify that the WebSite node added in PR #367 sets that @id, otherwise this is a dangling reference.
4. Build-time date fallback in head-structured-data.hbs
Using {{iso-date}} as the fallback for datePublished/dateModified means every page appears as published/modified at build time on every build. The git-created-date and page-release-date checks are good, but the {{iso-date}} fallback is worse than omitting the field entirely for pages without explicit dates.
Fixed all issues raised by Michele in PR #371:

1. **Security: JSON string safety** (head-structured-data.hbs)
   - Created json-safe helper to escape dynamic values in JSON-LD
   - Escapes backslashes, quotes, and </script> tags
   - Prevents invalid JSON and script breakout vulnerabilities
   - Applied to all dynamic fields (title, description, keywords, etc.)
2. **Code quality: Hardcoded applicationCategory** (head-structured-data.hbs)
   - Removed hardcoded "Agentic Data Plane" value
   - Now conditional on page.component.asciidoc.attributes.application-category
   - Allows per-component customization instead of global value
3. **Code quality: Build-time date fallback** (head-structured-data.hbs)
   - Removed {{iso-date}} fallback for datePublished/dateModified
   - Pages without explicit dates now omit the field entirely
   - Prevents misleading "published at build time" metadata
4. **Bug fix: Root path edge case** (markdown-url.js)
   - Added guard for url === '/' to return '/index.md'
   - Prevents invalid '.md' output from url.slice(0, -1)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
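For readers who want to see the escaping idea in isolation, here is one way a JSON-safe escaper could look. It is a sketch, not the json-safe helper from this PR: the name `jsonSafe` is hypothetical, and it assumes the template supplies the surrounding double quotes and emits the result without additional HTML escaping.

```javascript
// Hypothetical sketch: escape a value for use inside a JSON string literal
// that is embedded in a <script type="application/ld+json"> block.
function jsonSafe (value) {
  if (value === undefined || value === null) return ''
  // JSON.stringify escapes backslashes, double quotes, and control characters.
  // Strip the surrounding quotes because the template is assumed to add them.
  const escaped = JSON.stringify(String(value)).slice(1, -1)
  // Encode "<" as \u003c so a "</script>" substring cannot close the element.
  return escaped.replace(/</g, '\\u003c')
}
```

Because `\u003c` is a valid JSON escape, a parser decodes it back to `<`, so the stored value is unchanged while the raw markup never contains `</script>`.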
Business decision: Keep applicationCategory as 'Agentic Data Plane' across all documentation to increase visibility for this new category. This reverts the conditional logic added in the previous commit while keeping the json-safe helper for proper escaping. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Changes:
- Create page-attribute helper to properly access AsciiDoc document attributes
via contentCatalog query (page.attributes.XXX doesn't work for doc attributes)
- Update head-structured-data.hbs to use page-attribute helper for all
AsciiDoc document attributes (description, keywords, page-categories,
page-release-date, page-prerequisites, page-topic-type)
- Change from page-role to page-topic-type (correct attribute name)
- Use consistent {{#with}} pattern for optional fields
This fixes the issue where AsciiDoc document attributes weren't rendering
in structured data JSON-LD because they need to be accessed through
contentCatalog, not directly from the page object.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Integration Test Review
Built the UI bundle from this branch, integrated with docs-extensions-and-macros PR #178 (
What Works
Bugs Found
BUG 1 (High): Git dates never reach the template
The
Result: All 495 pages show today's date (the
Fix options:
BUG 2 (Medium): Description never populates
The template uses
Result: 0/495 pages have a description in their TechArticle, even though many pages set
Fix: Change
BUG 3 (Low): HTML entities in headlines
Some page titles contain AsciiDoc markup that renders as HTML entities in JSON-LD:
Affected pages: 7/495. Produces semantically incorrect Schema.org data but doesn't break JSON validity.
Minor Notes
micheleRP left a comment
@JakeSCahill approving to not hold this up, but please see Claude's results from testing it all together!
…ities

Bug fixes:
- git-created-date.js: Query page-git-created-date (with page- prefix)
- git-modified-date.js: Query page-git-modified-date (with page- prefix)
- page-attribute.js: Fall back to page.description/page.keywords for intrinsic attributes
- json-safe.js: Decode HTML entities before JSON escaping

The page- prefix is required because Antora only exposes attributes with this prefix to page.attributes in the UI model. The extension now sets page-git-created-date which becomes accessible as page.attributes['git-created-date'].

HTML entity decoding handles:
- Named entities (`&amp;`, `&rsquo;`, `&mdash;`, etc.)
- Numeric decimal entities (`&#8217;`)
- Numeric hex entities (`&#x2019;`)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
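The three entity forms listed above could be decoded along these lines. This is a sketch under assumptions, not the code in json-safe.js: `decodeEntities` and `NAMED_ENTITIES` are hypothetical names, and the named-entity table is deliberately tiny.

```javascript
// Hypothetical sketch: decode named, decimal, and hex HTML entities.
// Only a small, illustrative subset of named entities is included.
const NAMED_ENTITIES = {
  amp: '&', lt: '<', gt: '>', quot: '"', apos: "'", rsquo: '\u2019', mdash: '\u2014'
}

function decodeEntities (text) {
  return String(text).replace(/&(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, body) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x'
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10)
      // Leave out-of-range references untouched instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match
    }
    // Unknown names are left as-is.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body) ? NAMED_ENTITIES[body] : match
  })
}
```

The single-pass replace means `&amp;lt;` decodes to the literal text `&lt;` rather than being decoded twice, which is usually the desired behavior before JSON escaping.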
Git dates are available as page.attributes.git-created-date and page.attributes.git-modified-date directly since the extension uses the page- prefix. No need for contentCatalog.getById() lookup. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Changed is-beta-feature, is-limited-availability-feature, and is-enterprise helpers from O(n) per-lookup to O(1) by building a URL->attribute Map once per component.

Before: Each nav item triggered a linear scan of all pages
After: Single scan per component, then constant-time lookups

Benchmark: ~45% faster builds (3:00 -> 1:45)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
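The caching pattern described here might look something like the sketch below. All names (`lookupAttribute`, `indexCache`, `getPages`) are hypothetical, and the page objects are assumed to expose `url` and `attributes`; the actual helpers read pages from Antora's content catalog, which is not modeled here.

```javascript
// Hypothetical sketch: build a URL -> attribute Map once per component,
// then answer every later lookup in constant time.
const indexCache = new Map()

function lookupAttribute (component, attrName, url, getPages) {
  const key = component + '\u0000' + attrName
  let index = indexCache.get(key)
  if (!index) {
    // Single linear scan; getPages is only called on the first lookup.
    index = new Map()
    for (const page of getPages()) index.set(page.url, page.attributes[attrName])
    indexCache.set(key, index)
  }
  return index.get(url)
}
```

Passing `getPages` as a callback keeps the expensive page query lazy, so navigation items after the first one never trigger another scan for the same component and attribute.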
Added schema.org/TechArticle properties:
- learningResourceType: Maps from page-topic-type
- audience: Maps from personas attribute
- teaches: Maps from learning-objective-* attributes
- version: Component version
- articleSection: Module name
- isAccessibleForFree: true

These properties improve SEO and AI discoverability by providing richer semantic metadata about documentation content.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
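As an illustration of the mapping above, the properties could be assembled like this. The sketch makes several assumptions that are not confirmed by the PR: the function name `techArticleExtras`, a `page` object with `version`, `module`, and `attributes`, a comma-separated `personas` value, and the `Audience` shape used for `audience`. The real template builds these fields in Handlebars.

```javascript
// Hypothetical sketch: derive optional TechArticle properties from page data,
// omitting any property whose source attribute is missing.
function techArticleExtras (page) {
  const attrs = page.attributes || {}
  const node = { isAccessibleForFree: true }
  if (attrs['page-topic-type']) node.learningResourceType = attrs['page-topic-type']
  if (attrs.personas) {
    // Assumption: personas is a comma-separated list.
    node.audience = attrs.personas.split(',').map((persona) => ({
      '@type': 'Audience',
      audienceType: persona.trim()
    }))
  }
  // Collect learning-objective-* attributes, sorted by attribute name.
  const teaches = Object.keys(attrs)
    .filter((name) => name.startsWith('learning-objective-'))
    .sort()
    .map((name) => attrs[name])
  if (teaches.length) node.teaches = teaches
  if (page.version) node.version = page.version
  if (page.module) node.articleSection = page.module
  return node
}
```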
genre is meant for creative works (comedy, drama, etc.)
learningResourceType is the correct property for documentation types

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use page.description and page.keywords directly (intrinsic)
- Use page.attributes.* for page-prefixed attributes
- Only use page-attribute helper for non-prefixed attrs:
  - personas
  - learning-objective-*

Reduces contentCatalog.getById() calls from 8 to 4 per page.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Includes FAQPage in @graph when page-faq-json-ld attribute exists.

Works with docs-extensions-and-macros FAQ extension that:
- Auto-extracts Q&A from page sections
- Supports manual overrides
- Generates schema.org compliant FAQPage

Example output:

    {
      "@type": "FAQPage",
      "mainEntity": [
        {
          "@type": "Question",
          "name": "How do I install?",
          "acceptedAnswer": {
            "@type": "Answer",
            "text": "Install using..."
          },
          "url": "https://docs.redpanda.com/page#section"
        }
      ]
    }

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Summary
Enhances the Antora UI bundle with AI-friendly optimizations:
Changes
src/partials/head-meta.hbs
- `max-snippet:-1` (no snippet length limit)
- `max-image-preview:large` (allow large image previews)
- `max-video-preview:-1` (no video preview limit)
- `ai-content-declaration: documentation` (explicit content type)
- `noindex` for prerelease and preview pages
src/partials/head-structured-data.hbs
- `isPartOf`
- `applicationCategory` from "Streaming Data Platform" to "Agentic Data Plane"
Benefits
Testing
Verified template syntax is correct (no bundle build required for review).
Related PRs