diff --git a/.spell-dict b/.spell-dict index 51a6a3270..7e5171f54 100644 --- a/.spell-dict +++ b/.spell-dict @@ -54,6 +54,7 @@ implementers InlineProcessor Jiryu JSON +JustHTML keepachangelog Kjell Krech @@ -111,6 +112,7 @@ rST ryneeverett sanitizer sanitizers +sanitization Sauder schemeless setuptools @@ -135,6 +137,7 @@ svn Swartz Szakmeister Takhteyev +templating Tiago toc tokenized @@ -168,6 +171,7 @@ workflow Xanthakis XHTML xhtml +XSS YAML Yunusov inline diff --git a/docs/cli.md b/docs/cli.md index 50e9ec2d1..5bbe5b4af 100644 --- a/docs/cli.md +++ b/docs/cli.md @@ -12,6 +12,8 @@ Generally, you will want to have the Markdown library fully installed on your system to run the command line script. See the [Installation instructions](install.md) for details. +## Basic Usage + Python-Markdown's command line script takes advantage of Python's `-m` flag. Therefore, assuming the python executable is on your system path, use the following format: @@ -28,92 +30,62 @@ At its most basic usage, one would simply pass in a file name as the only argume python -m markdown input_file.txt ``` -Piping input and output (on `STDIN` and `STDOUT`) is fully supported as well. -For example: - -```bash -echo "Some **Markdown** text." | python -m markdown > output.html -``` - -Use the `--help` option for a list all available options and arguments: +Use the `--help` option for a list of all available options and arguments: ```bash python -m markdown --help ``` -If you don't want to call the python executable directly (using the `-m` flag), -follow the instructions below to use a wrapper script: - -Setup ------ - -Upon installation, the `markdown_py` script will have been copied to -your Python "Scripts" directory. Different systems require different methods to -ensure that any files in the Python "Scripts" directory are on your system -path. 
- -* **Windows**: - - Assuming a default install of Python on Windows, your "Scripts" directory - is most likely something like `C:\\Python37\Scripts`. Verify the location - of your "Scripts" directory and add it to you system path. +!!! warning - Calling `markdown_py` from the command line will call the wrapper batch - file `markdown_py.bat` in the `"Scripts"` directory created during install. - -* __*nix__ (Linux, OSX, BSD, Unix, etc.): + The Python-Markdown library does ***not*** sanitize its HTML output. If + you are processing Markdown input from an untrusted source, it is your + responsibility to ensure that it is properly sanitized. For more + information see [Sanitizing HTML Output](sanitization.md). - As each \*nix distribution is different and we can't possibly document all - of them here, we'll provide a few helpful pointers: - - * Some systems will automatically install the script on your path. Try it - and see if it works. Just run `markdown_py` from the command line. - - * Other systems may maintain a separate "Scripts" ("bin") directory which - you need to add to your path. Find it (check with your distribution) and - either add it to your path or make a symbolic link to it from your path. +## Piping Input and Output - * If you are sure `markdown_py` is on your path, but it still is not being - found, check the permissions of the file and make sure it is executable. - - As an alternative, you could just `cd` into the directory which contains - the source distribution, and run it from there. However, remember that your - markdown text files will not likely be in that directory, so it is much - more convenient to have `markdown_py` on your path. - -!!!Note - Python-Markdown uses `"markdown_py"` as a script name because the Perl - implementation has already taken the more obvious name "markdown". 
- Additionally, the default Python configuration on some systems would cause a - script named `"markdown.py"` to fail by importing itself rather than the - markdown library. Therefore, the script has been named `"markdown_py"` as a - compromise. If you prefer a different name for the script on your system, it - is suggested that you create a symbolic link to `markdown_py` with your - preferred name. - -Usage ------ - -To use `markdown_py` from the command line, run it as +Piping input and output (on `STDIN` and `STDOUT`) is fully supported. +For example: ```bash -markdown_py input_file.txt +echo "Some **Markdown** text." | python -m markdown > output.html ``` -or +The above command would generate a file named `output.html` with the following content: +```html +
<p>Some <strong>Markdown</strong> text.</p>
+``` + +As Python-Markdown only ever outputs HTML fragments (no `<html>`, `<head>`, +and `<body>` tags), it is generally expected that the command line interface +will be used to pipe its output to a templating engine. In the event that +no additional content is needed and the output only needs to be wrapped in +otherwise empty `<html>`, `<head>`, and `<body>` tags, +[JustHTML](https://emilstenstrom.github.io/justhtml/) can do that with +a single command: ```bash -markdown_py input_file.txt > output_file.html +echo "Some **Markdown** text." | python -m markdown | justhtml - --fragment > output.html ``` -For a complete list of options, run +The above command would generate a file named `output.html` with the following content: -```bash -markdown_py --help +```html +<html> +<head></head> +<body> +<p>Some <strong>Markdown</strong> text.</p> +</body> +</html> ``` -Using Extensions ---------------- +If you don't need or want JustHTML's HTML sanitization, you can disable it with the +`--unsafe` flag, although that is not recommended. See JustHTML's +[Command Line Interface](https://emilstenstrom.github.io/justhtml/cli.html) +documentation for details. + +## Using Extensions To load a Python-Markdown extension from the command line use the `-x` (or `--extension`) option. The extension module must be on your `PYTHONPATH` @@ -187,3 +159,74 @@ dependencies. The format of your configuration file is automatically detected. [JSON]: https://json.org/ [PyYAML]: https://pyyaml.org/ [2.5 release notes]: change_log/release-2.5.md + +## Using the `markdown_py` Command + +If you don't want to call the python executable directly (using the `-m` flag), +follow the instructions below to use a wrapper script: + +### Setup `markdown_py` + +Upon installation, the `markdown_py` script will have been copied to +your Python "Scripts" directory. Different systems require different methods to +ensure that any files in the Python "Scripts" directory are on your system +path. + +* **Windows**: + + Assuming a default install of Python on Windows, your "Scripts" directory + is most likely something like `C:\Python37\Scripts`. Verify the location + of your "Scripts" directory and add it to your system path. 
+ + Calling `markdown_py` from the command line will call the wrapper batch + file `markdown_py.bat` in the `"Scripts"` directory created during install. + +* __*nix__ (Linux, OSX, BSD, Unix, etc.): + + As each \*nix distribution is different and we can't possibly document all + of them here, we'll provide a few helpful pointers: + + * Some systems will automatically install the script on your path. Try it + and see if it works. Just run `markdown_py` from the command line. + + * Other systems may maintain a separate "Scripts" ("bin") directory which + you need to add to your path.
Find it (check with your distribution) and + either add it to your path or make a symbolic link to it from your path. + + * If you are sure `markdown_py` is on your path, but it still is not being + found, check the permissions of the file and make sure it is executable. + + As an alternative, you could just `cd` into the directory which contains + the source distribution, and run it from there. However, remember that your + markdown text files will not likely be in that directory, so it is much + more convenient to have `markdown_py` on your path. + +!!!Note + Python-Markdown uses `"markdown_py"` as a script name because the Perl + implementation has already taken the more obvious name "markdown". + Additionally, the default Python configuration on some systems would cause a + script named `"markdown.py"` to fail by importing itself rather than the + markdown library. Therefore, the script has been named `"markdown_py"` as a + compromise. If you prefer a different name for the script on your system, it + is suggested that you create a symbolic link to `markdown_py` with your + preferred name. + +### Using `markdown_py` + +To use `markdown_py` from the command line, run it as + +```bash +markdown_py input_file.txt +``` + +or + +```bash +markdown_py input_file.txt > output_file.html +``` + +For a complete list of options, run + +```bash +markdown_py --help +``` diff --git a/docs/reference.md b/docs/reference.md index de7e26f4b..7edb54dd1 100644 --- a/docs/reference.md +++ b/docs/reference.md @@ -25,7 +25,16 @@ instance of the `markdown.Markdown` class and pass multiple documents through it. If you do use a single instance though, make sure to call the `reset` method appropriately ([see below](#convert)). -### markdown.markdown(text [, **kwargs]) {: #markdown data-toc-label='markdown.markdown' } +### `markdown.markdown(text [, **kwargs])` {: #markdown data-toc-label='markdown.markdown' } + +!!! 
warning + + The Python-Markdown library does ***not*** sanitize its HTML output. If + you are processing Markdown input from an untrusted source, it is your + responsibility to ensure that it is properly sanitized. For more + information see [Sanitizing HTML Output]. + +[Sanitizing HTML Output]: sanitization.md The following options are available on the `markdown.markdown` function: @@ -179,6 +188,15 @@ __tab_length__{: #tab_length }: ### `markdown.markdownFromFile (**kwargs)` {: #markdownFromFile data-toc-label='markdown.markdownFromFile' } +!!! warning + + The Python-Markdown library does ***not*** sanitize its HTML output. As + `markdown.markdownFromFile` writes directly to the file system, there is + no easy way to sanitize the output from Python code. Therefore, it is + recommended that the `markdown.markdownFromFile` function not be used on + input from an untrusted source. For more information see [Sanitizing HTML + Output]. + With a few exceptions, `markdown.markdownFromFile` accepts the same options as `markdown.markdown`. It does **not** accept a `text` (or Unicode) string. Instead, it accepts the following required options: @@ -216,7 +234,7 @@ __encoding__{: #encoding } meet your specific needs, it is suggested that you write your own code to handle your encoding/decoding needs. -### markdown.Markdown([**kwargs]) {: #Markdown data-toc-label='markdown.Markdown' } +### `markdown.Markdown([**kwargs])` {: #Markdown data-toc-label='markdown.Markdown' } The same options are available when initializing the `markdown.Markdown` class as on the [`markdown.markdown`](#markdown) function, except that the class does @@ -229,7 +247,14 @@ string must be passed to one of two instance methods. the thread they were created in. A single instance should not be accessed from multiple threads. -#### Markdown.convert(source) {: #convert data-toc-label='Markdown.convert' } +#### `Markdown.convert(source)` {: #convert data-toc-label='Markdown.convert' } + +!!! 
warning + + The Python-Markdown library does ***not*** sanitize its HTML output. If + you are processing Markdown input from an untrusted source, it is your + responsibility to ensure that it is properly sanitized. For more + information see [Sanitizing HTML Output]. The `source` text must meet the same requirements as the [`text`](#text) argument of the [`markdown.markdown`](#markdown) function. @@ -258,7 +283,16 @@ To make this easier, you can also chain calls to `reset` together: html3 = md.reset().convert(text3) ``` -#### Markdown.convertFile(**kwargs) {: #convertFile data-toc-label='Markdown.convertFile' } +#### `Markdown.convertFile(**kwargs)` {: #convertFile data-toc-label='Markdown.convertFile' } + +!!! warning + + The Python-Markdown library does ***not*** sanitize its HTML output. As + `Markdown.convertFile` writes directly to the file system, there is no + easy way to sanitize the output from Python code. Therefore, it is + recommended that the `Markdown.convertFile` method not be used on input + from an untrusted source. For more information see [Sanitizing HTML + Output]. The arguments of this method are identical to the arguments of the same name on the `markdown.markdownFromFile` function ([`input`](#input), diff --git a/docs/sanitization.md b/docs/sanitization.md new file mode 100644 index 000000000..fe5af6a6f --- /dev/null +++ b/docs/sanitization.md @@ -0,0 +1,76 @@ +title: Sanitization and Security + +# Sanitizing HTML Output + +The Python-Markdown library does ***not*** sanitize its HTML output. If you +are processing Markdown input from an untrusted source, it is your +responsibility to ensure that it is properly sanitized. See _[Markdown and +XSS]_ for an overview of some of the dangers and _[Improper markup sanitization +in popular software]_ for notes on best practices to ensure HTML is properly +sanitized. 
+With those concerns in mind, some recommendations are provided +below to ensure that the HTML generated from untrusted input is properly +sanitized. + +That said, if you fully trust the source of your input, you may choose to do +nothing. Conversely, you may find solutions other than those suggested here. +However, you do so at your own risk. + +## Using `JustHTML` + +[`JustHTML`][JustHTML] is recommended as a sanitizer on the output of `markdown.markdown` +or `Markdown.convert`. When you pass HTML output through `JustHTML`, it is +sanitized by default according to a strict [allow list policy]. The policy +can be [customized] if necessary. + +``` python +import markdown +from justhtml import JustHTML + +html = markdown.markdown(text)  # `text` holds the untrusted Markdown input +safe_html = JustHTML(html, fragment=True).to_html() +``` + +## Using `nh3` or `bleach` + +If you cannot use `JustHTML` for some reason, some alternatives include [`nh3`][nh3] or +[`bleach`][bleach][^1]. However, be aware that these libraries will not be sufficient +in themselves and will require customization. Some useful lists of allowed +tags and attributes can be found in the +[`bleach-allowlist`][bleach-allowlist] library, which should work with both `nh3` and `bleach`, as `nh3` +mirrors `bleach`'s API. + +``` python +import markdown +import bleach +from bleach_allowlist import markdown_tags, markdown_attrs + +html = markdown.markdown(text)  # `text` holds the untrusted Markdown input +safe_html = bleach.clean(html, markdown_tags, markdown_attrs) +``` + +[^1]: The [`bleach`][bleach] project has been [deprecated](https://github.com/mozilla/bleach/issues/698). +However, it may be the only option for some users, as `nh3` is a set of Python bindings to a Rust library, which may not be installable on every platform. + +## Sanitizing on the Command Line + +Both Python-Markdown and `JustHTML` provide command line interfaces which read +from `STDIN` and write to `STDOUT`. Therefore, they can be used together to +ensure that the output from untrusted input is properly sanitized. + +```sh +echo "Some **Markdown** text."
| python -m markdown | justhtml - --fragment > safe_output.html +``` + +For more information on `JustHTML`'s Command Line Interface, see the +[documentation][JustHTML_CLI]. Use the `--help` option for a list of all available +options and arguments to the `markdown` command. + +[Markdown and XSS]: https://michelf.ca/blog/2010/markdown-and-xss/ +[Improper markup sanitization in popular software]: https://github.com/ChALkeR/notes/blob/master/Improper-markup-sanitization.md +[JustHTML]: https://emilstenstrom.github.io/justhtml/ +[allow list policy]: https://emilstenstrom.github.io/justhtml/html-cleaning.html#default-sanitization-policy +[customized]: https://emilstenstrom.github.io/justhtml/html-cleaning.html#use-a-custom-sanitization-policy +[nh3]: https://nh3.readthedocs.io/en/latest/ +[bleach]: http://bleach.readthedocs.org/en/latest/ +[bleach-allowlist]: https://github.com/yourcelf/bleach-allowlist +[JustHTML_CLI]: https://emilstenstrom.github.io/justhtml/cli.html diff --git a/markdown/__main__.py b/markdown/__main__.py index 259df6336..60f9a5e85 100644 --- a/markdown/__main__.py +++ b/markdown/__main__.py @@ -49,10 +49,14 @@ def parse_options(args=None, values=None): usage = """%prog [options] [INPUTFILE] (STDIN is assumed if no INPUTFILE is given)""" desc = "A Python implementation of John Gruber's Markdown. " \ - "https://Python-Markdown.github.io/" + "https://python-markdown.github.io/" ver = "%%prog %s" % markdown.__version__ + epilog = "WARNING: The Python-Markdown library does NOT sanitize its HTML output. If " \ + "you are processing Markdown input from an untrusted source, it is your " \ + "responsibility to ensure that it is properly sanitized. For more " \ + "information see ." - parser = optparse.OptionParser(usage=usage, description=desc, version=ver) + parser = optparse.OptionParser(usage=usage, description=desc, version=ver, epilog=epilog) parser.add_option("-f", "--file", dest="filename", default=None, help="Write output to OUTPUT_FILE. 
Defaults to STDOUT.", metavar="OUTPUT_FILE") diff --git a/markdown/core.py b/markdown/core.py index 11cb5adc9..370cb7ec5 100644 --- a/markdown/core.py +++ b/markdown/core.py @@ -335,6 +335,12 @@ def convert(self, source: str) -> str: [`ElementTree`][xml.etree.ElementTree.ElementTree] object has been serialized into text. 5. The output is returned as a string. + !!! warning + The Python-Markdown library does ***not*** sanitize its HTML output. + If you are processing Markdown input from an untrusted source, it is your + responsibility to ensure that it is properly sanitized. For more + information see [Sanitizing HTML Output](../../sanitization.md). + """ # Fix up the source text @@ -392,9 +398,9 @@ def convertFile( encoding: str | None = None, ) -> Markdown: """ - Converts a Markdown file and returns the HTML as a Unicode string. + Read Markdown text from a file or stream and write HTML output to a file or stream. - Decodes the file using the provided encoding (defaults to `utf-8`), + Decodes the input file using the provided encoding (defaults to `utf-8`), passes the file content to markdown, and outputs the HTML to either the provided stream or the file with provided name, using the same encoding as the source file. The @@ -410,6 +416,14 @@ def convertFile( output: File object or path. Writes to `stdout` if `None`. encoding: Encoding of input and output files. Defaults to `utf-8`. + !!! warning + The Python-Markdown library does ***not*** sanitize its HTML output. + As `Markdown.convertFile` writes directly to the file system, there is no + easy way to sanitize the output from Python code. Therefore, it is + recommended that the `Markdown.convertFile` method not be used on input + from an untrusted source. For more information see [Sanitizing HTML + Output](../../sanitization.md). + """ encoding = encoding or "utf-8" @@ -477,6 +491,12 @@ def markdown(text: str, **kwargs: Any) -> str: Returns: A string in the specified output format. + !!! 
warning + The Python-Markdown library does ***not*** sanitize its HTML output. + If you are processing Markdown input from an untrusted source, it is your + responsibility to ensure that it is properly sanitized. For more + information see [Sanitizing HTML Output](../../sanitization.md). + """ md = Markdown(**kwargs) return md.convert(text) @@ -484,7 +504,7 @@ def markdown(text: str, **kwargs: Any) -> str: def markdownFromFile(**kwargs: Any): """ - Read Markdown text from a file and write output to a file or a stream. + Read Markdown text from a file or stream and write HTML output to a file or stream. This is a shortcut function which initializes an instance of [`Markdown`][markdown.Markdown], and calls the [`convertFile`][markdown.Markdown.convertFile] method rather than @@ -496,6 +516,14 @@ def markdownFromFile(**kwargs: Any): encoding (str): Encoding of input and output. **kwargs: Any arguments accepted by the `Markdown` class. + !!! warning + The Python-Markdown library does ***not*** sanitize its HTML output. + As `markdown.markdownFromFile` writes directly to the file system, there is no + easy way to sanitize the output from Python code. Therefore, it is + recommended that the `markdown.markdownFromFile` function not be used on input + from an untrusted source. For more information see [Sanitizing HTML + Output](../../sanitization.md). + """ md = Markdown(**kwargs) md.convertFile(kwargs.get('input', None), diff --git a/mkdocs.yml b/mkdocs.yml index 92f6ccc80..0458dd4e1 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -22,6 +22,7 @@ nav: - Installation: install.md - Library Reference: reference.md - Command Line: cli.md + - Sanitization and Security: sanitization.md - Extensions: extensions/index.md - Officially Supported Extensions: - Abbreviations: extensions/abbreviations.md
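The `markdown/__main__.py` hunk above surfaces the sanitization warning by passing an `epilog` string to `optparse.OptionParser`, which renders it after the option list in `--help` output. The standalone sketch below shows that behavior using only the standard library. The program name, version, description, and warning text are placeholders chosen for illustration, not Python-Markdown's actual values.

```python
import optparse

# Placeholder text for illustration only; it is not the exact epilog
# used in markdown/__main__.py.
EPILOG = (
    "WARNING: This tool does NOT sanitize its HTML output. If you are "
    "processing input from an untrusted source, it is your responsibility "
    "to ensure that it is properly sanitized."
)


def build_parser() -> optparse.OptionParser:
    """Return a small parser with the same shape as the one in __main__.py."""
    parser = optparse.OptionParser(
        prog="markdown",  # fixed name so %prog does not depend on sys.argv[0]
        usage="%prog [options] [INPUTFILE]",
        description="Placeholder description.",
        version="%prog 0.0",  # placeholder version; also adds --version
        epilog=EPILOG,  # rendered after the option list in the help text
    )
    parser.add_option("-f", "--file", dest="filename", default=None,
                      help="Write output to OUTPUT_FILE. Defaults to STDOUT.",
                      metavar="OUTPUT_FILE")
    return parser


if __name__ == "__main__":
    # format_help() returns the same text that --help prints before exiting.
    print(build_parser().format_help())
```

Because `optparse` wraps the epilog to the terminal width and appends it after all options, the warning lands at the very end of the help text, which is where users are likely to look last after scanning the options.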