fix(utils): escape HTML entities in inline code before removing HTML tags #11877

Closed
perashanid wants to merge 1 commit into facebook:main from perashanid:fix/excerpt-xml-tags-in-inline-code-11818
@perashanid

Description

Fixes #11818

This PR fixes an issue where XML tags inside inline code (e.g., `<metadata>`) were being incorrectly removed when creating page excerpts, resulting in empty backticks in the metadata description tags.

Problem

When Docusaurus generates page metadata (description, og:description), it uses the createExcerpt function to extract the first meaningful sentence. However, when that sentence contains inline code with XML-like tags, the tags were being removed, leaving only empty backticks.

Example:

Removes the [`<metadata>`](https://developer.mozilla.org/en-US/docs/Web/SVG/Element/metadata) element from the document.

Before this fix:

<meta name="description" content="Removes the `` element from the document.">

After this fix:

<meta name="description" content="Removes the &lt;metadata&gt; element from the document.">

Solution

The issue was in the order of regex replacements in createExcerpt:

  1. HTML tags were removed first: .replace(/<[^>]*>/g, '')
  2. Then inline code was processed: .replace(/`(?<text>.+?)`/g, '$1')

This meant <metadata> inside backticks was treated as an HTML tag and removed before the inline code processing could preserve it.
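The ordering problem can be reproduced with a minimal sketch that applies only the two replacements listed above, in their original order. The helper name `stripTagsThenCode` is hypothetical, the rest of the createExcerpt cleaning chain is omitted, and an unnamed capture group is used for portability.

```typescript
// Hypothetical helper: mirrors only the two steps named above, in the original order.
// It is not the actual createExcerpt implementation.
function stripTagsThenCode(line: string): string {
  return line
    // Step 1: remove anything that looks like an HTML tag, including
    // tag-like text that sits inside backticks.
    .replace(/<[^>]*>/g, '')
    // Step 2: unwrap inline code, keeping its content. The pattern needs at
    // least one character between the backticks, so an emptied span is left as-is.
    .replace(/`(.+?)`/g, '$1');
}

const before = stripTagsThenCode(
  'Removes the `<metadata>` element from the document.',
);
console.log(before); // Removes the `` element from the document.
```

Because the tag is already gone by the time the inline-code step runs, that step finds nothing to unwrap and the empty backticks survive into the excerpt.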

The fix:

  1. First, escape HTML entities (< → &lt;, > → &gt;) inside inline code
  2. Then, remove HTML tags (which now won't match the escaped entities)
  3. Finally, remove the backticks (content is already escaped)
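The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's actual diff: the helper name `escapeCodeThenStripTags` and the exact escaping regex are hypothetical, and the other cleaning steps in createExcerpt (such as link removal) are left out.

```typescript
// Hypothetical helper: a sketch of the three-step order described above.
function escapeCodeThenStripTags(line: string): string {
  return line
    // Step 1: escape < and > inside inline code spans, keeping the backticks for now.
    .replace(
      /`([^`]+)`/g,
      (_match: string, code: string) =>
        '`' + code.replace(/</g, '&lt;').replace(/>/g, '&gt;') + '`',
    )
    // Step 2: remove HTML tags; the escaped entities no longer match this pattern.
    .replace(/<[^>]*>/g, '')
    // Step 3: drop the backticks, keeping the already-escaped content.
    .replace(/`(.+?)`/g, '$1');
}

const after = escapeCodeThenStripTags(
  'Removes the `<metadata>` element from the document.',
);
console.log(after); // Removes the &lt;metadata&gt; element from the document.
```

Real HTML tags outside backticks are still stripped in step 2, so only tag-like text inside inline code is preserved.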

Changes Made

  • Modified packages/docusaurus-utils/src/markdownUtils.ts:

    • Added a new regex replacement at the beginning of the cleaning chain to escape HTML entities inside inline code
    • Updated comment for the inline code removal step to clarify it now just removes backticks
  • Added test cases in packages/docusaurus-utils/src/__tests__/markdownUtils.test.ts:

    • Test for XML tag inside inline code
    • Test for XML tag inside inline code with hyperlink

Testing

  • Code follows project style guidelines
  • No TypeScript errors or linting issues
  • Added test cases that verify the fix
  • Tested manually with the provided test cases from the issue
  • Existing functionality preserved (regression tests pass)

Test Results

All new tests pass:

it('creates excerpt with XML tag inside inline code', () => {
  expect(
    createExcerpt(dedent`
        # Markdown Regular Title

        This paragraph includes a link to the \`<metadata>\` documentation.
      `),
  ).toBe('This paragraph includes a link to the &lt;metadata&gt; documentation.');
});

it('creates excerpt with XML tag inside inline code with hyperlink', () => {
  expect(
    createExcerpt(dedent`
        # Markdown Regular Title

        This paragraph includes a link to the [\`<metadata>\`](https://developer.mozilla.org/en-US/docs/Web/SVG/Element/metadata) documentation.
      `),
  ).toBe('This paragraph includes a link to the &lt;metadata&gt; documentation.');
});

Related Issue

Closes #11818

…tags

Fixes facebook#11818

When creating excerpts, XML tags inside inline code (e.g., `<metadata>`) were being removed by the HTML tag removal regex before the inline code was processed. This resulted in empty backticks in the excerpt metadata.

The fix escapes HTML entities (&lt; and &gt;) inside inline code BEFORE removing HTML tags, ensuring that XML-like content within backticks is preserved in the final excerpt.

Added test cases to verify the fix works for both standalone inline code and inline code within hyperlinks.

@meta-cla meta-cla bot added the CLA Signed label Apr 2, 2026

netlify bot commented Apr 2, 2026

[V2]

Built without sensitive environment variables

Name Link
🔨 Latest commit 7c60825
🔍 Latest deploy log https://app.netlify.com/projects/docusaurus-2/deploys/69ce70dac5ef72000868c37e
😎 Deploy Preview https://deploy-preview-11877--docusaurus-2.netlify.app

@slorber
Collaborator

slorber commented Apr 2, 2026

Please:

@slorber slorber closed this Apr 2, 2026
Development

Successfully merging this pull request may close these issues.

Incorrect page metadata when using markdown inline-code with XML tags inside
