From b97d58ab1d9fdab5b1064c91a6feaca77908b2f0 Mon Sep 17 00:00:00 2001 From: Waylan Limberg Date: Tue, 8 Jul 2025 15:15:45 -0400 Subject: [PATCH 01/11] Document HTML sanitation policy --- docs/reference.md | 82 ++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 78 insertions(+), 4 deletions(-) diff --git a/docs/reference.md b/docs/reference.md index de7e26f4b..46b74409a 100644 --- a/docs/reference.md +++ b/docs/reference.md @@ -25,7 +25,33 @@ instance of the `markdown.Markdown` class and pass multiple documents through it. If you do use a single instance though, make sure to call the `reset` method appropriately ([see below](#convert)). -### markdown.markdown(text [, **kwargs]) {: #markdown data-toc-label='markdown.markdown' } +### `markdown.markdown(text [, **kwargs])` {: #markdown data-toc-label='markdown.markdown' } + +!!! warning + + The Python-Markdown library does ***not*** sanitize its HTML output. If + you are processing Markdown input from an untrusted source, it is your + responsibility to ensure that it is properly sanitized. See [Markdown and + XSS] for an overview of some of the dangers and [Improper markup + sanitization in popular software] for notes on best practices to ensure + HTML is properly sanitized. + + The developers of Python-Markdown recommend using [nh3] or [bleach][][^1] + as a sanitizer on the output of `markdown.markdown`. However, be + aware that those libraries may not be sufficient in themselves and will + likely require customization. Some useful lists of allowed tags and + attributes can be found in the [bleach-allowlist] library, which should + work with either sanitizer. 
+ + +[Markdown and XSS]: https://michelf.ca/blog/2010/markdown-and-xss/ +[Improper markup sanitization in popular software]: https://github.com/ChALkeR/notes/blob/master/Improper-markup-sanitization.md +[nh3]: https://nh3.readthedocs.io/en/latest/ +[bleach]: http://bleach.readthedocs.org/en/latest/ +[bleach-allowlist]: https://github.com/yourcelf/bleach-allowlist +[^1]: We are aware that the [bleach] project has been [deprecated](https://github.com/mozilla/bleach/issues/698). +However, it is the only pure-Python HTML sanitation library we are aware of and may be the only option for +those who cannot use [nh3] (Python bindings to a Rust library). The following options are available on the `markdown.markdown` function: @@ -216,7 +242,23 @@ __encoding__{: #encoding } meet your specific needs, it is suggested that you write your own code to handle your encoding/decoding needs. -### markdown.Markdown([**kwargs]) {: #Markdown data-toc-label='markdown.Markdown' } +!!! warning + + The Python-Markdown library does ***not*** sanitize its HTML output. If + you are processing Markdown input from an untrusted source, it is your + responsibility to ensure that it is properly sanitized. See [Markdown and + XSS] for an overview of some of the dangers and [Improper markup + sanitization in popular software] for notes on best practices to ensure + HTML is properly sanitized. + + The developers of Python-Markdown recommend using [nh3] or [bleach][] + [^1] as a sanitizer on the output of `markdown.markdownFromFile`. + However, be aware that those libraries may not be sufficient in + themselves and will likely require customization. Some useful lists of + allowed tags and attributes can be found in the + [bleach-allowlist] library, which should work with either sanitizer. 
+ +### `markdown.Markdown([**kwargs])` {: #Markdown data-toc-label='markdown.Markdown' } The same options are available when initializing the `markdown.Markdown` class as on the [`markdown.markdown`](#markdown) function, except that the class does @@ -229,7 +271,7 @@ string must be passed to one of two instance methods. the thread they were created in. A single instance should not be accessed from multiple threads. -#### Markdown.convert(source) {: #convert data-toc-label='Markdown.convert' } +#### `Markdown.convert(source)` {: #convert data-toc-label='Markdown.convert' } The `source` text must meet the same requirements as the [`text`](#text) argument of the [`markdown.markdown`](#markdown) function. @@ -258,7 +300,23 @@ To make this easier, you can also chain calls to `reset` together: html3 = md.reset().convert(text3) ``` -#### Markdown.convertFile(**kwargs) {: #convertFile data-toc-label='Markdown.convertFile' } +!!! warning + + The Python-Markdown library does ***not*** sanitize its HTML output. If + you are processing Markdown input from an untrusted source, it is your + responsibility to ensure that it is properly sanitized. See [Markdown and + XSS] for an overview of some of the dangers and [Improper markup + sanitization in popular software] for notes on best practices to ensure + HTML is properly sanitized. + + The developers of Python-Markdown recommend using [nh3] or [bleach][] + [^1] as a sanitizer on the output of `Markdown.convert`. However, be + aware that those libraries may not be sufficient in themselves and will + likely require customization. Some useful lists of allowed tags and + attributes can be found in the [bleach-allowlist] library, which should + work with either sanitizer. 
+ +#### `Markdown.convertFile(**kwargs)` {: #convertFile data-toc-label='Markdown.convertFile' } The arguments of this method are identical to the arguments of the same name on the `markdown.markdownFromFile` function ([`input`](#input), @@ -267,3 +325,19 @@ name on the `markdown.markdownFromFile` function ([`input`](#input), process multiple files without creating a new instance of the class for each document. State may need to be `reset` between each call to `convertFile` as is the case with `convert`. + +!!! warning + + The Python-Markdown library does ***not*** sanitize its HTML output. If + you are processing Markdown input from an untrusted source, it is your + responsibility to ensure that it is properly sanitized. See [Markdown and + XSS] for an overview of some of the dangers and [Improper markup + sanitization in popular software] for notes on best practices to ensure + HTML is properly sanitized. + + The developers of Python-Markdown recommend using [nh3] or [bleach][] + [^1] as a sanitizer on the output of `Markdown.convertFile`. However, be + aware that those libraries may not be sufficient in themselves and will + likely require customization. Some useful lists of allowed tags and + attributes can be found in the [bleach-allowlist] library, which should + work with either sanitizer. From 3cbcebc599ef0fcbf1ea7eed8c8b628c5e03927e Mon Sep 17 00:00:00 2001 From: Waylan Limberg Date: Tue, 8 Jul 2025 15:29:43 -0400 Subject: [PATCH 02/11] cleanup spelling --- docs/reference.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/docs/reference.md b/docs/reference.md index 46b74409a..0960a1deb 100644 --- a/docs/reference.md +++ b/docs/reference.md @@ -36,11 +36,11 @@ method appropriately ([see below](#convert)). sanitization in popular software] for notes on best practices to ensure HTML is properly sanitized. 
- The developers of Python-Markdown recommend using [nh3] or [bleach][][^1] + The developers of Python-Markdown recommend using [`nh3`][nh3] or [`bleach`][bleach][^1] as a sanitizer on the output of `markdown.markdown`. However, be aware that those libraries may not be sufficient in themselves and will likely require customization. Some useful lists of allowed tags and - attributes can be found in the [bleach-allowlist] library, which should + attributes can be found in the [`bleach-allowlist`][bleach-allowlist] library, which should work with either sanitizer. @@ -251,12 +251,12 @@ __encoding__{: #encoding } sanitization in popular software] for notes on best practices to ensure HTML is properly sanitized. - The developers of Python-Markdown recommend using [nh3] or [bleach][] - [^1] as a sanitizer on the output of `markdown.markdownFromFile`. + The developers of Python-Markdown recommend using [`nh3`][nh3] or [`bleach`][bleach][^1] + as a sanitizer on the output of `markdown.markdownFromFile`. However, be aware that those libraries may not be sufficient in themselves and will likely require customization. Some useful lists of allowed tags and attributes can be found in the - [bleach-allowlist] library, which should work with either sanitizer. + [`bleach-allowlist`][bleach-allowlist] library, which should work with either sanitizer. ### `markdown.Markdown([**kwargs])` {: #Markdown data-toc-label='markdown.Markdown' } @@ -309,11 +309,11 @@ html3 = md.reset().convert(text3) sanitization in popular software] for notes on best practices to ensure HTML is properly sanitized. - The developers of Python-Markdown recommend using [nh3] or [bleach][] - [^1] as a sanitizer on the output of `Markdown.convert`. However, be + The developers of Python-Markdown recommend using [`nh3`][nh3] or [`bleach`][bleach][^1] + as a sanitizer on the output of `Markdown.convert`. However, be aware that those libraries may not be sufficient in themselves and will likely require customization. 
Some useful lists of allowed tags and - attributes can be found in the [bleach-allowlist] library, which should + attributes can be found in the [`bleach-allowlist`][bleach-allowlist] library, which should work with either sanitizer. #### `Markdown.convertFile(**kwargs)` {: #convertFile data-toc-label='Markdown.convertFile' } @@ -335,9 +335,9 @@ each document. State may need to be `reset` between each call to sanitization in popular software] for notes on best practices to ensure HTML is properly sanitized. - The developers of Python-Markdown recommend using [nh3] or [bleach][] - [^1] as a sanitizer on the output of `Markdown.convertFile`. However, be + The developers of Python-Markdown recommend using [`nh3`][nh3] or [`bleach`][bleach][^1] + as a sanitizer on the output of `Markdown.convertFile`. However, be aware that those libraries may not be sufficient in themselves and will likely require customization. Some useful lists of allowed tags and - attributes can be found in the [bleach-allowlist] library, which should + attributes can be found in the [`bleach-allowlist`][bleach-allowlist] library, which should work with either sanitizer. From 41b057388d97a4e90c1fdf05af89d534fce743a1 Mon Sep 17 00:00:00 2001 From: Waylan Limberg Date: Tue, 8 Jul 2025 15:34:27 -0400 Subject: [PATCH 03/11] more spelling cleanup --- .spell-dict | 2 ++ docs/reference.md | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/.spell-dict b/.spell-dict index 51a6a3270..5f8099aa6 100644 --- a/.spell-dict +++ b/.spell-dict @@ -111,6 +111,7 @@ rST ryneeverett sanitizer sanitizers +sanitization Sauder schemeless setuptools @@ -168,6 +169,7 @@ workflow Xanthakis XHTML xhtml +XSS YAML Yunusov inline diff --git a/docs/reference.md b/docs/reference.md index 0960a1deb..5e920e056 100644 --- a/docs/reference.md +++ b/docs/reference.md @@ -51,7 +51,7 @@ method appropriately ([see below](#convert)). 
[bleach-allowlist]: https://github.com/yourcelf/bleach-allowlist [^1]: We are aware that the [bleach] project has been [deprecated](https://github.com/mozilla/bleach/issues/698). However, it is the only pure-Python HTML sanitation library we are aware of and may be the only option for -those who cannot use [nh3] (Python bindings to a Rust library). +those who cannot use [`nh3`][nh3] (Python bindings to a Rust library). The following options are available on the `markdown.markdown` function: From 320ef39eed3381c9c4c706b3bfc5d58e205ef3fa Mon Sep 17 00:00:00 2001 From: Waylan Limberg Date: Thu, 5 Feb 2026 11:03:44 -0500 Subject: [PATCH 04/11] Recommend JustHTML and customize comments for each function/method. --- docs/reference.md | 108 +++++++++++++++++++++++++--------------------- 1 file changed, 60 insertions(+), 48 deletions(-) diff --git a/docs/reference.md b/docs/reference.md index 5e920e056..9eeb5d8d5 100644 --- a/docs/reference.md +++ b/docs/reference.md @@ -36,22 +36,31 @@ method appropriately ([see below](#convert)). sanitization in popular software] for notes on best practices to ensure HTML is properly sanitized. - The developers of Python-Markdown recommend using [`nh3`][nh3] or [`bleach`][bleach][^1] - as a sanitizer on the output of `markdown.markdown`. However, be - aware that those libraries may not be sufficient in themselves and will - likely require customization. Some useful lists of allowed tags and - attributes can be found in the [`bleach-allowlist`][bleach-allowlist] library, which should + The developers of Python-Markdown recommend using [JustHTML] as a + sanitizer on the output of `markdown.markdown`. JustHTML includes a + built-in HTML sanitizer. When you pass the HTML output through JustHTML + (`JustHTML(markdown.markdown(text), fragment=True).to_html())`), it + is sanitized by default according to a strict [allow list policy]. The + policy can be [customized] if necessary. 
+ + If you cannot use JustHTML for some reason, some alternatives include + [`nh3`][nh3] or [`bleach`][bleach][^1]. However, be aware that those + libraries will not be sufficient in themselves and will require + customization. Some useful lists of allowed tags and attributes can be + found in the [`bleach-allowlist`][bleach-allowlist] library, which should work with either sanitizer. [Markdown and XSS]: https://michelf.ca/blog/2010/markdown-and-xss/ [Improper markup sanitization in popular software]: https://github.com/ChALkeR/notes/blob/master/Improper-markup-sanitization.md +[JustHTML]: https://emilstenstrom.github.io/justhtml/ +[allow list policy]: https://emilstenstrom.github.io/justhtml/html-cleaning.html#default-sanitization-policy +[customized]: https://emilstenstrom.github.io/justhtml/html-cleaning.html#use-a-custom-sanitization-policy [nh3]: https://nh3.readthedocs.io/en/latest/ [bleach]: http://bleach.readthedocs.org/en/latest/ [bleach-allowlist]: https://github.com/yourcelf/bleach-allowlist -[^1]: We are aware that the [bleach] project has been [deprecated](https://github.com/mozilla/bleach/issues/698). -However, it is the only pure-Python HTML sanitation library we are aware of and may be the only option for -those who cannot use [`nh3`][nh3] (Python bindings to a Rust library). +[^1]: Note that the [bleach] project has been [deprecated](https://github.com/mozilla/bleach/issues/698). +However, it may be the only option for some users. The following options are available on the `markdown.markdown` function: @@ -205,6 +214,20 @@ __tab_length__{: #tab_length }: ### `markdown.markdownFromFile (**kwargs)` {: #markdownFromFile data-toc-label='markdown.markdownFromFile' } +!!! warning + + The Python-Markdown library does ***not*** sanitize its HTML output. If + you are processing Markdown input from an untrusted source, it is your + responsibility to ensure that it is properly sanitized. 
See [Markdown and + XSS] for an overview of some of the dangers and [Improper markup + sanitization in popular software] for notes on best practices to ensure + HTML is properly sanitized. + + As `markdown.markdownFromFile` writes directly to the file system, there + is no easy way to sanitize the output from Python code. Therefore, it is + recommended that the `markdown.markdownFromFile` function not be used on + input from an untrusted source. + With a few exceptions, `markdown.markdownFromFile` accepts the same options as `markdown.markdown`. It does **not** accept a `text` (or Unicode) string. Instead, it accepts the following required options: @@ -242,22 +265,6 @@ __encoding__{: #encoding } meet your specific needs, it is suggested that you write your own code to handle your encoding/decoding needs. -!!! warning - - The Python-Markdown library does ***not*** sanitize its HTML output. If - you are processing Markdown input from an untrusted source, it is your - responsibility to ensure that it is properly sanitized. See [Markdown and - XSS] for an overview of some of the dangers and [Improper markup - sanitization in popular software] for notes on best practices to ensure - HTML is properly sanitized. - - The developers of Python-Markdown recommend using [`nh3`][nh3] or [`bleach`][bleach][^1] - as a sanitizer on the output of `markdown.markdownFromFile`. - However, be aware that those libraries may not be sufficient in - themselves and will likely require customization. Some useful lists of - allowed tags and attributes can be found in the - [`bleach-allowlist`][bleach-allowlist] library, which should work with either sanitizer. - ### `markdown.Markdown([**kwargs])` {: #Markdown data-toc-label='markdown.Markdown' } The same options are available when initializing the `markdown.Markdown` class @@ -273,6 +280,29 @@ string must be passed to one of two instance methods. #### `Markdown.convert(source)` {: #convert data-toc-label='Markdown.convert' } +!!! 
warning + + The Python-Markdown library does ***not*** sanitize its HTML output. If + you are processing Markdown input from an untrusted source, it is your + responsibility to ensure that it is properly sanitized. See [Markdown and + XSS] for an overview of some of the dangers and [Improper markup + sanitization in popular software] for notes on best practices to ensure + HTML is properly sanitized. + + The developers of Python-Markdown recommend using [JustHTML] as a + sanitizer on the output of `Markdown.convert`. JustHTML includes a + built-in HTML sanitizer. When you pass the HTML output through JustHTML + (`JustHTML(md.convert(text), fragment=True).to_html())`), it + is sanitized by default according to a strict [allow list policy]. The + policy can be [customized] if necessary. + + If you cannot use JustHTML for some reason, some alternatives include + [`nh3`][nh3] or [`bleach`][bleach][^1]. However, be aware that those + libraries will not be sufficient in themselves and will require + customization. Some useful lists of allowed tags and attributes can be + found in the [`bleach-allowlist`][bleach-allowlist] library, which should + work with either sanitizer. + The `source` text must meet the same requirements as the [`text`](#text) argument of the [`markdown.markdown`](#markdown) function. @@ -300,6 +330,8 @@ To make this easier, you can also chain calls to `reset` together: html3 = md.reset().convert(text3) ``` +#### `Markdown.convertFile(**kwargs)` {: #convertFile data-toc-label='Markdown.convertFile' } + !!! warning The Python-Markdown library does ***not*** sanitize its HTML output. If @@ -309,14 +341,10 @@ html3 = md.reset().convert(text3) sanitization in popular software] for notes on best practices to ensure HTML is properly sanitized. - The developers of Python-Markdown recommend using [`nh3`][nh3] or [`bleach`][bleach][^1] - as a sanitizer on the output of `Markdown.convert`. 
However, be - aware that those libraries may not be sufficient in themselves and will - likely require customization. Some useful lists of allowed tags and - attributes can be found in the [`bleach-allowlist`][bleach-allowlist] library, which should - work with either sanitizer. - -#### `Markdown.convertFile(**kwargs)` {: #convertFile data-toc-label='Markdown.convertFile' } + As `Markdown.convertFile` writes directly to the file system, there + is no easy way to sanitize the output from Python code. Therefore, it is + recommended that the `Markdown.convertFile` method not be used on + input from an untrusted source. The arguments of this method are identical to the arguments of the same name on the `markdown.markdownFromFile` function ([`input`](#input), @@ -325,19 +353,3 @@ name on the `markdown.markdownFromFile` function ([`input`](#input), process multiple files without creating a new instance of the class for each document. State may need to be `reset` between each call to `convertFile` as is the case with `convert`. - -!!! warning - - The Python-Markdown library does ***not*** sanitize its HTML output. If - you are processing Markdown input from an untrusted source, it is your - responsibility to ensure that it is properly sanitized. See [Markdown and - XSS] for an overview of some of the dangers and [Improper markup - sanitization in popular software] for notes on best practices to ensure - HTML is properly sanitized. - - The developers of Python-Markdown recommend using [`nh3`][nh3] or [`bleach`][bleach][^1] - as a sanitizer on the output of `Markdown.convertFile`. However, be - aware that those libraries may not be sufficient in themselves and will - likely require customization. Some useful lists of allowed tags and - attributes can be found in the [`bleach-allowlist`][bleach-allowlist] library, which should - work with either sanitizer. 
From cddb247c756f093e78516af55f02862681048baa Mon Sep 17 00:00:00 2001 From: Waylan Limberg Date: Thu, 5 Feb 2026 11:16:00 -0500 Subject: [PATCH 05/11] spell dict update --- .spell-dict | 1 + 1 file changed, 1 insertion(+) diff --git a/.spell-dict b/.spell-dict index 5f8099aa6..bba4338fe 100644 --- a/.spell-dict +++ b/.spell-dict @@ -54,6 +54,7 @@ implementers InlineProcessor Jiryu JSON +JustHTML keepachangelog Kjell Krech From 160dcffafb1946c5eb180afd55adc13e19776e98 Mon Sep 17 00:00:00 2001 From: Waylan Limberg Date: Thu, 5 Feb 2026 14:22:19 -0500 Subject: [PATCH 06/11] move to sanitization.md and document for CLI --- .spell-dict | 1 + docs/cli.md | 9 ++++- docs/reference.md | 86 +++++++++----------------------------------- docs/sanitization.md | 76 +++++++++++++++++++++++++++++++++++++++ markdown/__main__.py | 8 +++-- mkdocs.yml | 1 + 6 files changed, 109 insertions(+), 72 deletions(-) create mode 100644 docs/sanitization.md diff --git a/.spell-dict b/.spell-dict index bba4338fe..f8278dc8b 100644 --- a/.spell-dict +++ b/.spell-dict @@ -78,6 +78,7 @@ munge namespace NanoDOM Neale +nh3 nosetests OrderedDict OrderedDicts diff --git a/docs/cli.md b/docs/cli.md index 50e9ec2d1..fe415e58b 100644 --- a/docs/cli.md +++ b/docs/cli.md @@ -35,12 +35,19 @@ For example: echo "Some **Markdown** text." | python -m markdown > output.html ``` -Use the `--help` option for a list all available options and arguments: +Use the `--help` option for a list of all available options and arguments: ```bash python -m markdown --help ``` +!!! warning + + The Python-Markdown library does ***not*** sanitize its HTML output. If + you are processing Markdown input from an untrusted source, it is your + responsibility to ensure that it is properly sanitized. For more + information see [Sanitizing HTML Output](sanitization.md). 
+ If you don't want to call the python executable directly (using the `-m` flag), follow the instructions below to use a wrapper script: diff --git a/docs/reference.md b/docs/reference.md index 9eeb5d8d5..7edb54dd1 100644 --- a/docs/reference.md +++ b/docs/reference.md @@ -31,36 +31,10 @@ method appropriately ([see below](#convert)). The Python-Markdown library does ***not*** sanitize its HTML output. If you are processing Markdown input from an untrusted source, it is your - responsibility to ensure that it is properly sanitized. See [Markdown and - XSS] for an overview of some of the dangers and [Improper markup - sanitization in popular software] for notes on best practices to ensure - HTML is properly sanitized. - - The developers of Python-Markdown recommend using [JustHTML] as a - sanitizer on the output of `markdown.markdown`. JustHTML includes a - built-in HTML sanitizer. When you pass the HTML output through JustHTML - (`JustHTML(markdown.markdown(text), fragment=True).to_html())`), it - is sanitized by default according to a strict [allow list policy]. The - policy can be [customized] if necessary. - - If you cannot use JustHTML for some reason, some alternatives include - [`nh3`][nh3] or [`bleach`][bleach][^1]. However, be aware that those - libraries will not be sufficient in themselves and will require - customization. Some useful lists of allowed tags and attributes can be - found in the [`bleach-allowlist`][bleach-allowlist] library, which should - work with either sanitizer. 
- - -[Markdown and XSS]: https://michelf.ca/blog/2010/markdown-and-xss/ -[Improper markup sanitization in popular software]: https://github.com/ChALkeR/notes/blob/master/Improper-markup-sanitization.md -[JustHTML]: https://emilstenstrom.github.io/justhtml/ -[allow list policy]: https://emilstenstrom.github.io/justhtml/html-cleaning.html#default-sanitization-policy -[customized]: https://emilstenstrom.github.io/justhtml/html-cleaning.html#use-a-custom-sanitization-policy -[nh3]: https://nh3.readthedocs.io/en/latest/ -[bleach]: http://bleach.readthedocs.org/en/latest/ -[bleach-allowlist]: https://github.com/yourcelf/bleach-allowlist -[^1]: Note that the [bleach] project has been [deprecated](https://github.com/mozilla/bleach/issues/698). -However, it may be the only option for some users. + responsibility to ensure that it is properly sanitized. For more + information see [Sanitizing HTML Output]. + +[Sanitizing HTML Output]: sanitization.md The following options are available on the `markdown.markdown` function: @@ -216,17 +190,12 @@ __tab_length__{: #tab_length }: !!! warning - The Python-Markdown library does ***not*** sanitize its HTML output. If - you are processing Markdown input from an untrusted source, it is your - responsibility to ensure that it is properly sanitized. See [Markdown and - XSS] for an overview of some of the dangers and [Improper markup - sanitization in popular software] for notes on best practices to ensure - HTML is properly sanitized. - - As `markdown.markdownFromFile` writes directly to the file system, there - is no easy way to sanitize the output from Python code. Therefore, it is + The Python-Markdown library does ***not*** sanitize its HTML output. As + `markdown.markdownFromFile` writes directly to the file system, there is + no easy way to sanitize the output from Python code. Therefore, it is recommended that the `markdown.markdownFromFile` function not be used on - input from an untrusted source. 
+ input from an untrusted source. For more information see [Sanitizing HTML + Output]. With a few exceptions, `markdown.markdownFromFile` accepts the same options as `markdown.markdown`. It does **not** accept a `text` (or Unicode) string. @@ -284,24 +253,8 @@ string must be passed to one of two instance methods. The Python-Markdown library does ***not*** sanitize its HTML output. If you are processing Markdown input from an untrusted source, it is your - responsibility to ensure that it is properly sanitized. See [Markdown and - XSS] for an overview of some of the dangers and [Improper markup - sanitization in popular software] for notes on best practices to ensure - HTML is properly sanitized. - - The developers of Python-Markdown recommend using [JustHTML] as a - sanitizer on the output of `Markdown.convert`. JustHTML includes a - built-in HTML sanitizer. When you pass the HTML output through JustHTML - (`JustHTML(md.convert(text), fragment=True).to_html())`), it - is sanitized by default according to a strict [allow list policy]. The - policy can be [customized] if necessary. - - If you cannot use JustHTML for some reason, some alternatives include - [`nh3`][nh3] or [`bleach`][bleach][^1]. However, be aware that those - libraries will not be sufficient in themselves and will require - customization. Some useful lists of allowed tags and attributes can be - found in the [`bleach-allowlist`][bleach-allowlist] library, which should - work with either sanitizer. + responsibility to ensure that it is properly sanitized. For more + information see [Sanitizing HTML Output]. The `source` text must meet the same requirements as the [`text`](#text) argument of the [`markdown.markdown`](#markdown) function. @@ -334,17 +287,12 @@ html3 = md.reset().convert(text3) !!! warning - The Python-Markdown library does ***not*** sanitize its HTML output. 
If - you are processing Markdown input from an untrusted source, it is your - responsibility to ensure that it is properly sanitized. See [Markdown and - XSS] for an overview of some of the dangers and [Improper markup - sanitization in popular software] for notes on best practices to ensure - HTML is properly sanitized. - - As `Markdown.convertFile` writes directly to the file system, there - is no easy way to sanitize the output from Python code. Therefore, it is - recommended that the `Markdown.convertFile` method not be used on - input from an untrusted source. + The Python-Markdown library does ***not*** sanitize its HTML output. As + `Markdown.convertFile` writes directly to the file system, there is no + easy way to sanitize the output from Python code. Therefore, it is + recommended that the `Markdown.convertFile` method not be used on input + from an untrusted source. For more information see [Sanitizing HTML + Output]. The arguments of this method are identical to the arguments of the same name on the `markdown.markdownFromFile` function ([`input`](#input), diff --git a/docs/sanitization.md b/docs/sanitization.md new file mode 100644 index 000000000..bfa99e1a4 --- /dev/null +++ b/docs/sanitization.md @@ -0,0 +1,76 @@ +title: Sanitization and Security + +# Sanitizing HTML Output + +The Python-Markdown library does ***not*** sanitize its HTML output. If you +are processing Markdown input from an untrusted source, it is your +responsibility to ensure that it is properly sanitized. See _[Markdown and +XSS]_ for an overview of some of the dangers and _[Improper markup sanitization +in popular software]_ for notes on best practices to ensure HTML is properly +sanitized. With those concerns in mind, some recommendations are provided +below to ensure that any input from an untrusted source is properly +sanitized. + +That said, if you fully trust the source of your input, you may choose to do +nothing. 
Conversely, you may find solutions other than those suggested here. +However, you do so at your own risk. + +## Using JustHTML + +[JustHTML] is recommended as a sanitizer on the output of `markdown.markdown` +or `Markdown.convert`. When you pass HTML output through JustHTML, it is +sanitized by default according to a strict [allow list policy]. The policy +can be [customized] if necessary. + +``` python +import markdown +from justhtml import JustHTML + +html = markdown.markdown(text) +safe_html = JustHTML(html, fragment=True).to_html() +``` + +## Using nh3 or bleach + +If you cannot use JustHTML for some reason, some alternatives include [nh3] or +[bleach][^1]. However, be aware that these libraries will not be sufficient +in themselves and will require customization. Some useful lists of allowed +tags and attributes can be found in the [`bleach-allowlist`] +[bleach-allowlist] library, which should work with both nh3 and bleach as nh3 +mirrors bleach's API. + +``` python +import markdown +import bleach +from bleach_allowlist import markdown_tags, markdown_attrs + +html = markdown.markdown(text) +safe_html = bleach.clean(html, markdown_tags, markdown_attrs) +``` + +[^1]: The [bleach] project has been [deprecated](https://github.com/mozilla/bleach/issues/698). +However, it may be the only option for some users as [nh3] is a set of Python bindings to a Rust library. + +## Sanitizing on the Command Line + +Both Python-Markdown and JustHTML provide command line interfaces which read +from STDIN and write to STDOUT. Therefore, they can be used together to +ensure that the output from untrusted input is properly sanitized. + +```sh +echo "Some **Markdown** text." | python -m markdown | justhtml - --fragment > safe_output.html +``` + +For more information on JustHTML's Command Line Interface, see the +[documentation][JustHTML_CLI]. Use the `--help` option for a list of all available +options and arguments to the `markdown` command. 
+ +[Markdown and XSS]: https://michelf.ca/blog/2010/markdown-and-xss/ +[Improper markup sanitization in popular software]: https://github.com/ChALkeR/notes/blob/master/Improper-markup-sanitization.md +[JustHTML]: https://emilstenstrom.github.io/justhtml/ +[allow list policy]: https://emilstenstrom.github.io/justhtml/html-cleaning.html#default-sanitization-policy +[customized]: https://emilstenstrom.github.io/justhtml/html-cleaning.html#use-a-custom-sanitization-policy +[nh3]: https://nh3.readthedocs.io/en/latest/ +[bleach]: http://bleach.readthedocs.org/en/latest/ +[bleach-allowlist]: https://github.com/yourcelf/bleach-allowlist +[JustHTML_CLI]: https://emilstenstrom.github.io/justhtml/cli.html diff --git a/markdown/__main__.py b/markdown/__main__.py index 259df6336..60f9a5e85 100644 --- a/markdown/__main__.py +++ b/markdown/__main__.py @@ -49,10 +49,14 @@ def parse_options(args=None, values=None): usage = """%prog [options] [INPUTFILE] (STDIN is assumed if no INPUTFILE is given)""" desc = "A Python implementation of John Gruber's Markdown. " \ - "https://Python-Markdown.github.io/" + "https://python-markdown.github.io/" ver = "%%prog %s" % markdown.__version__ + epilog = "WARNING: The Python-Markdown library does NOT sanitize its HTML output. If " \ + "you are processing Markdown input from an untrusted source, it is your " \ + "responsibility to ensure that it is properly sanitized. For more " \ + "information see ." - parser = optparse.OptionParser(usage=usage, description=desc, version=ver) + parser = optparse.OptionParser(usage=usage, description=desc, version=ver, epilog=epilog) parser.add_option("-f", "--file", dest="filename", default=None, help="Write output to OUTPUT_FILE. 
Defaults to STDOUT.", metavar="OUTPUT_FILE") diff --git a/mkdocs.yml b/mkdocs.yml index 92f6ccc80..0458dd4e1 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -22,6 +22,7 @@ nav: - Installation: install.md - Library Reference: reference.md - Command Line: cli.md + - Sanitization and Security: sanitization.md - Extensions: extensions/index.md - Officially Supported Extensions: - Abbreviations: extensions/abbreviations.md From 64917dcc6f328e61ad898062bf4febb6b96a3bf5 Mon Sep 17 00:00:00 2001 From: Waylan Limberg Date: Thu, 5 Feb 2026 14:39:38 -0500 Subject: [PATCH 07/11] Add warnings to API docs --- markdown/core.py | 34 +++++++++++++++++++++++++++++++--- 1 file changed, 31 insertions(+), 3 deletions(-) diff --git a/markdown/core.py b/markdown/core.py index 11cb5adc9..ce6edf1ee 100644 --- a/markdown/core.py +++ b/markdown/core.py @@ -335,6 +335,12 @@ def convert(self, source: str) -> str: [`ElementTree`][xml.etree.ElementTree.ElementTree] object has been serialized into text. 5. The output is returned as a string. + !!! warning + The Python-Markdown library does ***not*** sanitize its HTML output. + If you are processing Markdown input from an untrusted source, it is your + responsibility to ensure that it is properly sanitized. For more + information see [Sanitizing HTML Output](../../sanitization.md). + """ # Fix up the source text @@ -392,9 +398,9 @@ def convertFile( encoding: str | None = None, ) -> Markdown: """ - Converts a Markdown file and returns the HTML as a Unicode string. + Read Markdown text from a file or stream and write HTML output to a file or stream. - Decodes the file using the provided encoding (defaults to `utf-8`), + Decodes the input file using the provided encoding (defaults to `utf-8`), passes the file content to markdown, and outputs the HTML to either the provided stream or the file with provided name, using the same encoding as the source file. The @@ -410,6 +416,14 @@ def convertFile( output: File object or path. Writes to `stdout` if `None`. 
encoding: Encoding of input and output files. Defaults to `utf-8`. + !!! warning + The Python-Markdown library does ***not*** sanitize its HTML output. + As `Markdown.convertFile` writes directly to the file system, there is no + easy way to sanitize the output from Python code. Therefore, it is + recommended that the `Markdown.convertFile` method not be used on input + from an untrusted source. For more information see [Sanitizing HTML + Output](sanitization.md). + """ encoding = encoding or "utf-8" @@ -477,6 +491,12 @@ def markdown(text: str, **kwargs: Any) -> str: Returns: A string in the specified output format. + !!! warning + The Python-Markdown library does ***not*** sanitize its HTML output. + If you are processing Markdown input from an untrusted source, it is your + responsibility to ensure that it is properly sanitized. For more + information see [Sanitizing HTML Output](sanitization.md). + """ md = Markdown(**kwargs) return md.convert(text) @@ -484,7 +504,7 @@ def markdown(text: str, **kwargs: Any) -> str: def markdownFromFile(**kwargs: Any): """ - Read Markdown text from a file and write output to a file or a stream. + Read Markdown text from a file or stream and write HTML output to a file or stream. This is a shortcut function which initializes an instance of [`Markdown`][markdown.Markdown], and calls the [`convertFile`][markdown.Markdown.convertFile] method rather than @@ -496,6 +516,14 @@ def markdownFromFile(**kwargs: Any): encoding (str): Encoding of input and output. **kwargs: Any arguments accepted by the `Markdown` class. + !!! warning + The Python-Markdown library does ***not*** sanitize its HTML output. + As `markdown.markdownFromFile` writes directly to the file system, there is no + easy way to sanitize the output from Python code. Therefore, it is + recommended that the `markdown.markdownFromFile` function not be used on input + from an untrusted source. For more information see [Sanitizing HTML + Output](sanitization.md). 
+ """ md = Markdown(**kwargs) md.convertFile(kwargs.get('input', None), From 9804ddf364b9b6b464b73b2c3bd73f16aa213774 Mon Sep 17 00:00:00 2001 From: Waylan Limberg Date: Thu, 5 Feb 2026 15:15:48 -0500 Subject: [PATCH 08/11] cleanup --- docs/sanitization.md | 2 +- markdown/core.py | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/sanitization.md b/docs/sanitization.md index bfa99e1a4..65ad80c98 100644 --- a/docs/sanitization.md +++ b/docs/sanitization.md @@ -54,7 +54,7 @@ However, it may be the only option for some users as [nh3] is a set of Python bi ## Sanitizing on the Command Line Both Python-Markdown and JustHTML provide command line interfaces which read -from STDIN and write to STDOUT. Therefore, they can be used togeather to +from STDIN and write to STDOUT. Therefore, they can be used together to ensure that the output from untrusted input is properly sanitized. ```sh diff --git a/markdown/core.py b/markdown/core.py index ce6edf1ee..370cb7ec5 100644 --- a/markdown/core.py +++ b/markdown/core.py @@ -422,7 +422,7 @@ def convertFile( easy way to sanitize the output from Python code. Therefore, it is recommended that the `Markdown.convertFile` method not be used on input from an untrusted source. For more information see [Sanitizing HTML - Output](sanitization.md). + Output](../../sanitization.md). """ @@ -495,7 +495,7 @@ def markdown(text: str, **kwargs: Any) -> str: The Python-Markdown library does ***not*** sanitize its HTML output. If you are processing Markdown input from an untrusted source, it is your responsibility to ensure that it is properly sanitized. For more - information see [Sanitizing HTML Output](sanitization.md). + information see [Sanitizing HTML Output](../../sanitization.md). """ md = Markdown(**kwargs) @@ -522,7 +522,7 @@ def markdownFromFile(**kwargs: Any): easy way to sanitize the output from Python code. 
Therefore, it is recommended that the `markdown.markdownFromFile` function not be used on input from an untrusted source. For more information see [Sanitizing HTML - Output](sanitization.md). + Output](../../sanitization.md). """ md = Markdown(**kwargs) From 46096dd28027e3be040257a308516704cb281893 Mon Sep 17 00:00:00 2001 From: Waylan Limberg Date: Thu, 5 Feb 2026 15:39:56 -0500 Subject: [PATCH 09/11] more cleanup --- docs/sanitization.md | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/docs/sanitization.md b/docs/sanitization.md index 65ad80c98..fe5af6a6f 100644 --- a/docs/sanitization.md +++ b/docs/sanitization.md @@ -15,10 +15,10 @@ That said, if you fully trust the source of your input, you may choose to do nothing. Conversely, you may find solutions other than those suggested here. However, you do so at your own risk. -## Using JustHTML +## Using `JustHTML` -[JustHTML] is recommended as a sanitizer on the output of `markdown.markdown` -or `Markdown.convert`. When you pass HTML output through JustHTML, it is +[`JustHTML`][JustHTML] is recommended as a sanitizer on the output of `markdown.markdown` +or `Markdown.convert`. When you pass HTML output through `JustHTML`, it is sanitized by default according to a strict [allow list policy]. The policy can be [customized] if necessary. @@ -30,14 +30,14 @@ html = markdown.markdown(text) safe_html = JustHTML(html, fragment=True).to_html() ``` -## Using nh3 or bleach +## Using `nh3` or `bleach` -If you cannot use JustHTML for some reason, some alternatives include [nh3] or -[bleach][^1]. However, be aware that these libraries will not be sufficient +If you cannot use `JustHTML` for some reason, some alternatives include [`nh3`][nh3] or +[`bleach`][bleach][^1]. However, be aware that these libraries will not be sufficient in themselves and will require customization. 
Some useful lists of allowed tags and attributes can be found in the [`bleach-allowlist`] -[bleach-allowlist] library, which should work with both nh3 and bleach as nh3 -mirrors bleach's API. +[bleach-allowlist] library, which should work with both `nh3` and `bleach` as `nh3` +mirrors `bleach`'s API. ``` python import markdown @@ -48,20 +48,20 @@ html = markdown.markdown(text) safe_html = bleach.clean(html, markdown_tags, markdown_attrs) ``` -[^1]: The [bleach] project has been [deprecated](https://github.com/mozilla/bleach/issues/698). -However, it may be the only option for some users as [nh3] is a set of Python bindings to a Rust library. +[^1]: The [`bleach`][bleach] project has been [deprecated](https://github.com/mozilla/bleach/issues/698). +However, it may be the only option for some users as `nh3` is a set of Python bindings to a Rust library. ## Sanitizing on the Command Line -Both Python-Markdown and JustHTML provide command line interfaces which read -from STDIN and write to STDOUT. Therefore, they can be used together to +Both Python-Markdown and `JustHTML` provide command line interfaces which read +from `STDIN` and write to `STDOUT`. Therefore, they can be used together to ensure that the output from untrusted input is properly sanitized. ```sh echo "Some **Markdown** text." | python -m markdown | justhtml - --fragment > safe_output.html ``` -For more information on JustHTML's Command Line Interface, see the +For more information on `JustHTML`'s Command Line Interface, see the [documentation][JustHTML_CLI]. Use the `--help` option for a list of all available options and arguments to the `markdown` command. 
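The `--help` output referred to above is also where the `epilog` warning added to `markdown/__main__.py` earlier in this series surfaces, since `optparse` prints an epilog after the option list. A minimal standalone sketch of that behaviour follows. `build_parser` and the shortened `EPILOG` text are illustrative stand-ins, not Python-Markdown's actual parser or wording.

```python
import optparse

# Shortened stand-in for the warning text; the real epilog is longer.
EPILOG = "WARNING: The Python-Markdown library does NOT sanitize its HTML output."


def build_parser():
    """Build a small demo parser that carries an epilog (hypothetical helper)."""
    parser = optparse.OptionParser(
        usage="%prog [options] [INPUTFILE]",
        description="Demo parser, not the real Python-Markdown CLI.",
        epilog=EPILOG,
    )
    parser.add_option("-f", "--file", dest="filename", default=None,
                      help="Write output to OUTPUT_FILE.",
                      metavar="OUTPUT_FILE")
    return parser


# format_help() returns the same text that --help would print.
help_text = build_parser().format_help()
print(help_text)
```

Because the epilog is rendered last, the warning is the final thing a user sees after the option list.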
From aca9a939d5f6fc0898d2a361affa58726d6ab613 Mon Sep 17 00:00:00 2001 From: Waylan Limberg Date: Thu, 5 Feb 2026 16:20:24 -0500 Subject: [PATCH 10/11] Reorganize cli.md for clarity --- .spell-dict | 1 - docs/cli.md | 172 +++++++++++++++++++++++++++++++--------------------- 2 files changed, 104 insertions(+), 69 deletions(-) diff --git a/.spell-dict b/.spell-dict index f8278dc8b..bba4338fe 100644 --- a/.spell-dict +++ b/.spell-dict @@ -78,7 +78,6 @@ munge namespace NanoDOM Neale -nh3 nosetests OrderedDict OrderedDicts diff --git a/docs/cli.md b/docs/cli.md index fe415e58b..5bbe5b4af 100644 --- a/docs/cli.md +++ b/docs/cli.md @@ -12,6 +12,8 @@ Generally, you will want to have the Markdown library fully installed on your system to run the command line script. See the [Installation instructions](install.md) for details. +## Basic Usage + Python-Markdown's command line script takes advantage of Python's `-m` flag. Therefore, assuming the python executable is on your system path, use the following format: @@ -28,13 +30,6 @@ At its most basic usage, one would simply pass in a file name as the only argume python -m markdown input_file.txt ``` -Piping input and output (on `STDIN` and `STDOUT`) is fully supported as well. -For example: - -```bash -echo "Some **Markdown** text." | python -m markdown > output.html -``` - Use the `--help` option for a list of all available options and arguments: ```bash @@ -48,79 +43,49 @@ python -m markdown --help responsibility to ensure that it is properly sanitized. For more information see [Sanitizing HTML Output](sanitization.md). -If you don't want to call the python executable directly (using the `-m` flag), -follow the instructions below to use a wrapper script: - -Setup ------ - -Upon installation, the `markdown_py` script will have been copied to -your Python "Scripts" directory. Different systems require different methods to -ensure that any files in the Python "Scripts" directory are on your system -path. 
- -* **Windows**: - - Assuming a default install of Python on Windows, your "Scripts" directory - is most likely something like `C:\\Python37\Scripts`. Verify the location - of your "Scripts" directory and add it to you system path. - - Calling `markdown_py` from the command line will call the wrapper batch - file `markdown_py.bat` in the `"Scripts"` directory created during install. - -* __*nix__ (Linux, OSX, BSD, Unix, etc.): - - As each \*nix distribution is different and we can't possibly document all - of them here, we'll provide a few helpful pointers: - - * Some systems will automatically install the script on your path. Try it - and see if it works. Just run `markdown_py` from the command line. - - * Other systems may maintain a separate "Scripts" ("bin") directory which - you need to add to your path. Find it (check with your distribution) and - either add it to your path or make a symbolic link to it from your path. - - * If you are sure `markdown_py` is on your path, but it still is not being - found, check the permissions of the file and make sure it is executable. - - As an alternative, you could just `cd` into the directory which contains - the source distribution, and run it from there. However, remember that your - markdown text files will not likely be in that directory, so it is much - more convenient to have `markdown_py` on your path. - -!!!Note - Python-Markdown uses `"markdown_py"` as a script name because the Perl - implementation has already taken the more obvious name "markdown". - Additionally, the default Python configuration on some systems would cause a - script named `"markdown.py"` to fail by importing itself rather than the - markdown library. Therefore, the script has been named `"markdown_py"` as a - compromise. If you prefer a different name for the script on your system, it - is suggested that you create a symbolic link to `markdown_py` with your - preferred name. 
- -Usage ------ +## Piping Input and Output -To use `markdown_py` from the command line, run it as +Piping input and output (on `STDIN` and `STDOUT`) is fully supported. +For example: ```bash -markdown_py input_file.txt +echo "Some **Markdown** text." | python -m markdown > output.html ``` -or +The above command would generate a file named `output.html` with the following content: +```html +

<p>Some <strong>Markdown</strong> text.</p>

+``` + +As Python-Markdown only ever outputs HTML fragments (no `<html>`, `<head>`, +and `<body>` tags), it is generally expected that the command line interface +will always be used to pipe output to a templating engine. In the event that +no additional content is needed and the output only needs to be wrapped in +otherwise empty `<html>`, `<head>`, and `<body>` tags, +[JustHTML](https://emilstenstrom.github.io/justhtml/) can do that with +a single command: ```bash -markdown_py input_file.txt > output_file.html +echo "Some **Markdown** text." | python -m markdown | justhtml - --fragment > output.html ``` -For a complete list of options, run +The above command would generate a file named `output.html` with the following content: -```bash -markdown_py --help +```html + + + +

<p>Some <strong>Markdown</strong> text.</p>

+ + ``` -Using Extensions ----------------- +If you don't need or want JustHTML's HTML sanitation, you can disable it with the +`--unsafe` flag, although that is not recommended. See JustHTML's +[Command Line Interface](https://emilstenstrom.github.io/justhtml/cli.html) +documentation for details. + +## Using Extensions To load a Python-Markdown extension from the command line use the `-x` (or `--extension`) option. The extension module must be on your `PYTHONPATH` @@ -194,3 +159,74 @@ dependencies. The format of your configuration file is automatically detected. [JSON]: https://json.org/ [PyYAML]: https://pyyaml.org/ [2.5 release notes]: change_log/release-2.5.md + +## Using the `markdown_py` Command + +If you don't want to call the python executable directly (using the `-m` flag), +follow the instructions below to use a wrapper script: + +### Setup `markdown_py` + +Upon installation, the `markdown_py` script will have been copied to +your Python "Scripts" directory. Different systems require different methods to +ensure that any files in the Python "Scripts" directory are on your system +path. + +* **Windows**: + + Assuming a default install of Python on Windows, your "Scripts" directory + is most likely something like `C:\\Python37\Scripts`. Verify the location + of your "Scripts" directory and add it to you system path. + + Calling `markdown_py` from the command line will call the wrapper batch + file `markdown_py.bat` in the `"Scripts"` directory created during install. + +* __*nix__ (Linux, OSX, BSD, Unix, etc.): + + As each \*nix distribution is different and we can't possibly document all + of them here, we'll provide a few helpful pointers: + + * Some systems will automatically install the script on your path. Try it + and see if it works. Just run `markdown_py` from the command line. + + * Other systems may maintain a separate "Scripts" ("bin") directory which + you need to add to your path. 
Find it (check with your distribution) and + either add it to your path or make a symbolic link to it from your path. + + * If you are sure `markdown_py` is on your path, but it still is not being + found, check the permissions of the file and make sure it is executable. + + As an alternative, you could just `cd` into the directory which contains + the source distribution, and run it from there. However, remember that your + markdown text files will not likely be in that directory, so it is much + more convenient to have `markdown_py` on your path. + +!!!Note + Python-Markdown uses `"markdown_py"` as a script name because the Perl + implementation has already taken the more obvious name "markdown". + Additionally, the default Python configuration on some systems would cause a + script named `"markdown.py"` to fail by importing itself rather than the + markdown library. Therefore, the script has been named `"markdown_py"` as a + compromise. If you prefer a different name for the script on your system, it + is suggested that you create a symbolic link to `markdown_py` with your + preferred name. + +### Using `markdown_py` + +To use `markdown_py` from the command line, run it as + +```bash +markdown_py input_file.txt +``` + +or + +```bash +markdown_py input_file.txt > output_file.html +``` + +For a complete list of options, run + +```bash +markdown_py --help +``` From b47ccd7c5ceea41a529081da79a60339fa038171 Mon Sep 17 00:00:00 2001 From: Waylan Limberg Date: Thu, 5 Feb 2026 16:23:30 -0500 Subject: [PATCH 11/11] spelling; --- .spell-dict | 1 + 1 file changed, 1 insertion(+) diff --git a/.spell-dict b/.spell-dict index bba4338fe..7e5171f54 100644 --- a/.spell-dict +++ b/.spell-dict @@ -137,6 +137,7 @@ svn Swartz Szakmeister Takhteyev +templating Tiago toc tokenized