4 changes: 4 additions & 0 deletions .spell-dict
@@ -54,6 +54,7 @@ implementers
InlineProcessor
Jiryu
JSON
JustHTML
keepachangelog
Kjell
Krech
@@ -111,6 +112,7 @@ rST
ryneeverett
sanitizer
sanitizers
sanitization
Sauder
schemeless
setuptools
@@ -135,6 +137,7 @@ svn
Swartz
Szakmeister
Takhteyev
templating
Tiago
toc
tokenized
@@ -168,6 +171,7 @@ workflow
Xanthakis
XHTML
xhtml
XSS
YAML
Yunusov
inline
177 changes: 110 additions & 67 deletions docs/cli.md
@@ -12,6 +12,8 @@ Generally, you will want to have the Markdown library fully installed on your
system to run the command line script. See the
[Installation instructions](install.md) for details.

## Basic Usage

Python-Markdown's command line script takes advantage of Python's `-m` flag.
Therefore, assuming the python executable is on your system path, use the
following format:
@@ -28,92 +30,62 @@ At its most basic usage, one would simply pass in a file name as the only argume
python -m markdown input_file.txt
```

Piping input and output (on `STDIN` and `STDOUT`) is fully supported as well.
For example:

```bash
echo "Some **Markdown** text." | python -m markdown > output.html
```

Use the `--help` option for a list all available options and arguments:
Use the `--help` option for a list of all available options and arguments:

```bash
python -m markdown --help
```

If you don't want to call the python executable directly (using the `-m` flag),
follow the instructions below to use a wrapper script:

Setup
-----

Upon installation, the `markdown_py` script will have been copied to
your Python "Scripts" directory. Different systems require different methods to
ensure that any files in the Python "Scripts" directory are on your system
path.

* **Windows**:

Assuming a default install of Python on Windows, your "Scripts" directory
is most likely something like `C:\Python37\Scripts`. Verify the location
of your "Scripts" directory and add it to your system path.
!!! warning

Calling `markdown_py` from the command line will call the wrapper batch
file `markdown_py.bat` in the `"Scripts"` directory created during install.

* __*nix__ (Linux, OSX, BSD, Unix, etc.):
The Python-Markdown library does ***not*** sanitize its HTML output. If
you are processing Markdown input from an untrusted source, it is your
responsibility to ensure that it is properly sanitized. For more
information see [Sanitizing HTML Output](sanitization.md).

As each \*nix distribution is different and we can't possibly document all
of them here, we'll provide a few helpful pointers:

* Some systems will automatically install the script on your path. Try it
and see if it works. Just run `markdown_py` from the command line.

* Other systems may maintain a separate "Scripts" ("bin") directory which
you need to add to your path. Find it (check with your distribution) and
either add it to your path or make a symbolic link to it from your path.
## Piping Input and Output

* If you are sure `markdown_py` is on your path, but it still is not being
found, check the permissions of the file and make sure it is executable.

As an alternative, you could just `cd` into the directory which contains
the source distribution, and run it from there. However, remember that your
markdown text files will not likely be in that directory, so it is much
more convenient to have `markdown_py` on your path.

!!!Note
Python-Markdown uses `"markdown_py"` as a script name because the Perl
implementation has already taken the more obvious name "markdown".
Additionally, the default Python configuration on some systems would cause a
script named `"markdown.py"` to fail by importing itself rather than the
markdown library. Therefore, the script has been named `"markdown_py"` as a
compromise. If you prefer a different name for the script on your system, it
is suggested that you create a symbolic link to `markdown_py` with your
preferred name.

Usage
-----

To use `markdown_py` from the command line, run it as
Piping input and output (on `STDIN` and `STDOUT`) is fully supported.
For example:

```bash
markdown_py input_file.txt
echo "Some **Markdown** text." | python -m markdown > output.html
```

or
The above command would generate a file named `output.html` with the following content:
```html
<p>Some <strong>Markdown</strong> text.</p>
```

As Python-Markdown only ever outputs HTML fragments (no `<html>`, `<head>`,
and `<body>` tags), it is generally expected that the command line interface
will always be used to pipe output to a templating engine. In the event that
no additional content is needed and the output only needs to be wrapped in
otherwise empty `<html>`, `<head>`, and `<body>` tags,
[JustHTML](https://emilstenstrom.github.io/justhtml/) can do that with
a single command:

```bash
markdown_py input_file.txt > output_file.html
echo "Some **Markdown** text." | python -m markdown | justhtml - --fragment > output.html
```

For a complete list of options, run
The above command would generate a file named `output.html` with the following content:

```bash
markdown_py --help
```html
<html>
<head></head>
<body>
<p>Some <strong>Markdown</strong> text.</p>
</body>
</html>
```

Using Extensions
----------------
If you don't need or want JustHTML's HTML sanitization, you can disable it with the
`--unsafe` flag, although that is not recommended. See JustHTML's
[Command Line Interface](https://emilstenstrom.github.io/justhtml/cli.html)
documentation for details.
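
Where a full templating engine is more than is needed, the wrapping step can also be done in a few lines of Python. The following is a minimal sketch that uses only the standard library's `string.Template`. The `wrap_fragment` helper, the `PAGE` template, and the `title` parameter are hypothetical names chosen for illustration and are not part of Python-Markdown. Note that the sketch does not sanitize the fragment.

```python
import html
from string import Template

# Hypothetical page skeleton; adjust it to suit your own site.
PAGE = Template("""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>$title</title>
</head>
<body>
$body
</body>
</html>
""")


def wrap_fragment(fragment: str, title: str = "Untitled") -> str:
    """Wrap an HTML fragment in a complete HTML document."""
    # Escape the title because it is plain text, but insert the body
    # unchanged because it is already HTML.
    return PAGE.substitute(title=html.escape(title), body=fragment.strip())
```

For example, `wrap_fragment(markdown.markdown(text), title="My Page")` would return a complete document that contains the converted fragment in its `<body>`.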

## Using Extensions

To load a Python-Markdown extension from the command line use the `-x`
(or `--extension`) option. The extension module must be on your `PYTHONPATH`
@@ -187,3 +159,74 @@ dependencies. The format of your configuration file is automatically detected.
[JSON]: https://json.org/
[PyYAML]: https://pyyaml.org/
[2.5 release notes]: change_log/release-2.5.md

## Using the `markdown_py` Command

If you don't want to call the python executable directly (using the `-m` flag),
follow the instructions below to use a wrapper script:

### Setup `markdown_py`

Upon installation, the `markdown_py` script will have been copied to
your Python "Scripts" directory. Different systems require different methods to
ensure that any files in the Python "Scripts" directory are on your system
path.

* **Windows**:

Assuming a default install of Python on Windows, your "Scripts" directory
is most likely something like `C:\Python37\Scripts`. Verify the location
of your "Scripts" directory and add it to your system path.

Calling `markdown_py` from the command line will call the wrapper batch
file `markdown_py.bat` in the `"Scripts"` directory created during install.

* __*nix__ (Linux, OSX, BSD, Unix, etc.):

As each \*nix distribution is different and we can't possibly document all
of them here, we'll provide a few helpful pointers:

* Some systems will automatically install the script on your path. Try it
and see if it works. Just run `markdown_py` from the command line.

* Other systems may maintain a separate "Scripts" ("bin") directory which
you need to add to your path. Find it (check with your distribution) and
either add it to your path or make a symbolic link to it from your path.

* If you are sure `markdown_py` is on your path, but it still is not being
found, check the permissions of the file and make sure it is executable.

As an alternative, you could just `cd` into the directory which contains
the source distribution, and run it from there. However, remember that your
markdown text files will not likely be in that directory, so it is much
more convenient to have `markdown_py` on your path.

!!!Note
Python-Markdown uses `"markdown_py"` as a script name because the Perl
implementation has already taken the more obvious name "markdown".
Additionally, the default Python configuration on some systems would cause a
script named `"markdown.py"` to fail by importing itself rather than the
markdown library. Therefore, the script has been named `"markdown_py"` as a
compromise. If you prefer a different name for the script on your system, it
is suggested that you create a symbolic link to `markdown_py` with your
preferred name.

### Using `markdown_py`

To use `markdown_py` from the command line, run it as

```bash
markdown_py input_file.txt
```

or

```bash
markdown_py input_file.txt > output_file.html
```

For a complete list of options, run

```bash
markdown_py --help
```
42 changes: 38 additions & 4 deletions docs/reference.md
@@ -25,7 +25,16 @@ instance of the `markdown.Markdown` class and pass multiple documents through
it. If you do use a single instance though, make sure to call the `reset`
method appropriately ([see below](#convert)).

### markdown.markdown(text [, **kwargs]) {: #markdown data-toc-label='markdown.markdown' }
### `markdown.markdown(text [, **kwargs])` {: #markdown data-toc-label='markdown.markdown' }

!!! warning

The Python-Markdown library does ***not*** sanitize its HTML output. If
you are processing Markdown input from an untrusted source, it is your
responsibility to ensure that it is properly sanitized. For more
information see [Sanitizing HTML Output].

[Sanitizing HTML Output]: sanitization.md

The following options are available on the `markdown.markdown` function:

@@ -179,6 +188,15 @@ __tab_length__{: #tab_length }:

### `markdown.markdownFromFile (**kwargs)` {: #markdownFromFile data-toc-label='markdown.markdownFromFile' }

!!! warning

The Python-Markdown library does ***not*** sanitize its HTML output. As
`markdown.markdownFromFile` writes directly to the file system, there is
no easy way to sanitize the output from Python code. Therefore, it is
recommended that the `markdown.markdownFromFile` function not be used on
input from an untrusted source. For more information see [Sanitizing HTML
Output].
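
One way to work around this, sketched below under stated assumptions, is to skip the file-based helper: read the file yourself, convert the text in memory, sanitize the resulting string, and only then write it out. `convert_file_safely` is a hypothetical helper and is not part of Python-Markdown. The converter and the sanitizer are passed in as callables so that any sanitizer can be used.

```python
from pathlib import Path
from typing import Callable


def convert_file_safely(
    src: str,
    dest: str,
    convert: Callable[[str], str],
    sanitize: Callable[[str], str],
    encoding: str = "utf-8",
) -> str:
    """Convert the file at `src`, sanitize the HTML, then write it to `dest`."""
    text = Path(src).read_text(encoding=encoding)
    # Sanitize the in-memory string *before* anything reaches the file system.
    safe_html = sanitize(convert(text))
    Path(dest).write_text(safe_html, encoding=encoding)
    return safe_html
```

With Python-Markdown and `JustHTML` installed, this might be called as `convert_file_safely("in.md", "out.html", markdown.markdown, lambda h: JustHTML(h, fragment=True).to_html())`.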

With a few exceptions, `markdown.markdownFromFile` accepts the same options as
`markdown.markdown`. It does **not** accept a `text` (or Unicode) string.
Instead, it accepts the following required options:
@@ -216,7 +234,7 @@ __encoding__{: #encoding }
meet your specific needs, it is suggested that you write your own code
to handle your encoding/decoding needs.

### markdown.Markdown([**kwargs]) {: #Markdown data-toc-label='markdown.Markdown' }
### `markdown.Markdown([**kwargs])` {: #Markdown data-toc-label='markdown.Markdown' }

The same options are available when initializing the `markdown.Markdown` class
as on the [`markdown.markdown`](#markdown) function, except that the class does
@@ -229,7 +247,14 @@ string must be passed to one of two instance methods.
the thread they were created in. A single instance should not be accessed
from multiple threads.
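
One way to honor this constraint, sketched here as an assumption rather than a documented pattern, is to keep one instance per thread in `threading.local` storage. `get_converter` is a hypothetical helper. It takes the class (or any zero-argument factory) as a parameter, so `markdown.Markdown` would be passed in real use.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")

# Each thread sees its own independent attributes on this object.
_thread_state = threading.local()


def get_converter(factory: Callable[[], T]) -> T:
    """Return the calling thread's converter, creating it on first use."""
    converter = getattr(_thread_state, "converter", None)
    if converter is None:
        # First call in this thread: build and remember a private instance.
        converter = _thread_state.converter = factory()
    return converter
```

A worker thread could then call `get_converter(markdown.Markdown).reset().convert(text)` without sharing an instance with any other thread. The sketch remembers only the first factory used in each thread, so it assumes the same factory is passed on every call.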

#### Markdown.convert(source) {: #convert data-toc-label='Markdown.convert' }
#### `Markdown.convert(source)` {: #convert data-toc-label='Markdown.convert' }

!!! warning

The Python-Markdown library does ***not*** sanitize its HTML output. If
you are processing Markdown input from an untrusted source, it is your
responsibility to ensure that it is properly sanitized. For more
information see [Sanitizing HTML Output].

The `source` text must meet the same requirements as the [`text`](#text)
argument of the [`markdown.markdown`](#markdown) function.
@@ -258,7 +283,16 @@ To make this easier, you can also chain calls to `reset` together:
html3 = md.reset().convert(text3)
```

#### Markdown.convertFile(**kwargs) {: #convertFile data-toc-label='Markdown.convertFile' }
#### `Markdown.convertFile(**kwargs)` {: #convertFile data-toc-label='Markdown.convertFile' }

!!! warning

The Python-Markdown library does ***not*** sanitize its HTML output. As
`Markdown.convertFile` writes directly to the file system, there is no
easy way to sanitize the output from Python code. Therefore, it is
recommended that the `Markdown.convertFile` method not be used on input
from an untrusted source. For more information see [Sanitizing HTML
Output].

The arguments of this method are identical to the arguments of the same
name on the `markdown.markdownFromFile` function ([`input`](#input),
76 changes: 76 additions & 0 deletions docs/sanitization.md
@@ -0,0 +1,76 @@
title: Sanitization and Security

# Sanitizing HTML Output

The Python-Markdown library does ***not*** sanitize its HTML output. If you
are processing Markdown input from an untrusted source, it is your
responsibility to ensure that it is properly sanitized. See _[Markdown and
XSS]_ for an overview of some of the dangers and _[Improper markup sanitization
in popular software]_ for notes on best practices to ensure HTML is properly
sanitized. With those concerns in mind, some recommendations are provided
below to ensure that any input from an untrusted source is properly
sanitized.

That said, if you fully trust the source of your input, you may choose to do
nothing. Conversely, you may find solutions other than those suggested here.
However, you do so at your own risk.

## Using `JustHTML`

[`JustHTML`][JustHTML] is recommended as a sanitizer on the output of `markdown.markdown`
or `Markdown.convert`. When you pass HTML output through `JustHTML`, it is
sanitized by default according to a strict [allow list policy]. The policy
can be [customized] if necessary.

``` python
import markdown
from justhtml import JustHTML

html = markdown.markdown(text)
safe_html = JustHTML(html, fragment=True).to_html()
```

## Using `nh3` or `bleach`

If you cannot use `JustHTML` for some reason, alternatives include [`nh3`][nh3] and
[`bleach`][bleach][^1]. However, be aware that these libraries are not sufficient
on their own and require customization. Some useful lists of allowed
tags and attributes can be found in the [`bleach-allowlist`][bleach-allowlist]
library, which should work with both `nh3` and `bleach` as `nh3` largely
mirrors `bleach`'s API (note that `nh3.clean` expects sets rather than lists,
so the values may need converting).

``` python
import markdown
import bleach
from bleach_allowlist import markdown_tags, markdown_attrs

html = markdown.markdown(text)
safe_html = bleach.clean(html, markdown_tags, markdown_attrs)
```
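
Because a customized allow list is easy to get wrong, it can help to check the sanitizer's output in your test suite. The following is a minimal sketch of such a check using only the standard library's `html.parser`. It is a test aid, not a sanitizer, and it is far from exhaustive. The names `DisallowedMarkupFinder` and `find_problems` are hypothetical, and the allowed tags are whatever you pass in.

```python
from html.parser import HTMLParser


class DisallowedMarkupFinder(HTMLParser):
    """Record tags and attributes that a sanitizer should have removed."""

    def __init__(self, allowed_tags):
        super().__init__()
        self.allowed_tags = set(allowed_tags)
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this.
        if tag not in self.allowed_tags:
            self.problems.append(f"tag <{tag}>")
        for name, value in attrs:
            url = (value or "").strip().lower()
            if name.startswith("on"):
                # Event handler attributes such as onclick or onerror.
                self.problems.append(f"attribute {name} on <{tag}>")
            elif name in ("href", "src") and url.startswith("javascript:"):
                self.problems.append(f"javascript: URL in {name} on <{tag}>")


def find_problems(html, allowed_tags):
    """Return a list of descriptions of suspicious markup found in `html`."""
    finder = DisallowedMarkupFinder(allowed_tags)
    finder.feed(html)
    finder.close()
    return finder.problems
```

In a test, one might assert that `find_problems(safe_html, allowed_tags)` returns an empty list for a set of known-hostile inputs. An empty list only means that these particular checks found nothing. It does not prove that the output is safe.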

[^1]: The [`bleach`][bleach] project has been [deprecated](https://github.com/mozilla/bleach/issues/698).
However, it may be the only option for some users as `nh3` is a set of Python bindings to a Rust library.

## Sanitizing on the Command Line

Both Python-Markdown and `JustHTML` provide command line interfaces which read
from `STDIN` and write to `STDOUT`. Therefore, they can be used together to
ensure that the output from untrusted input is properly sanitized.

```sh
echo "Some **Markdown** text." | python -m markdown | justhtml - --fragment > safe_output.html
```

For more information on `JustHTML`'s Command Line Interface, see the
[documentation][JustHTML_CLI]. Use the `--help` option for a list of all available
options and arguments to the `markdown` command.

[Markdown and XSS]: https://michelf.ca/blog/2010/markdown-and-xss/
[Improper markup sanitization in popular software]: https://github.com/ChALkeR/notes/blob/master/Improper-markup-sanitization.md
[JustHTML]: https://emilstenstrom.github.io/justhtml/
[allow list policy]: https://emilstenstrom.github.io/justhtml/html-cleaning.html#default-sanitization-policy
[customized]: https://emilstenstrom.github.io/justhtml/html-cleaning.html#use-a-custom-sanitization-policy
[nh3]: https://nh3.readthedocs.io/en/latest/
[bleach]: http://bleach.readthedocs.org/en/latest/
[bleach-allowlist]: https://github.com/yourcelf/bleach-allowlist
[JustHTML_CLI]: https://emilstenstrom.github.io/justhtml/cli.html