GitHub - mutating/metacode: A standard language for machine-readable code comments

Many source code analysis tools use specially formatted comments to annotate code. This is an important part of the Python ecosystem, but there is still no single standard for it. This library proposes one.

Why?

The Python ecosystem includes many source code tools, such as linters, coverage tools, and formatters. Many of them use special comments, and their syntax is often very similar. Here are some examples:

Ruff, Vulture —> # noqa, # noqa: E741, F841.
Black and Ruff —> # fmt: on, # fmt: off.
Mypy —> # type: ignore, # type: ignore[error-code].
Coverage —> # pragma: no cover, # pragma: no branch.
Isort —> # isort: skip, # isort: off.
Bandit —> # nosec.

But you know what? There is no single standard for such comments.

Tools also parse these comments differently. Some use regular expressions, others rely on simple string processing, and still others use full-fledged parsers, either the Python parser itself or one written from scratch.

As a result, as a user, you need to remember the comment rules of each specific tool. You also can't be sure that things like double comments (two comments for different tools on the same line of code) will work at all. And as the creator of such a tool, you face a seemingly simple task, just reading a comment, and discover that it is surprisingly tricky and easy to get wrong.

This is exactly the problem that this library solves. It describes a simple and intuitive standard for action comments, and also offers a ready-made parser that creators of other tools can use. The standard offered by this library is based entirely on a subset of Python syntax and can be easily reimplemented even if you do not want to use this library directly.

The language

So, this library offers a language for action comments. Its syntax is a subset of Python syntax, but without Python semantics, because no code is executed. The goal of the language is to expose the contents of a comment in a convenient form when the comment is written in a compatible format. If the comment format is not compatible with the parser, it is ignored.

From the point of view of the language, any meaningful comment can consist of three elements:

Key. This is usually the name of the tool the comment is intended for, but in some cases it may be something else. This can be any valid Python identifier.
Action. A short name for the action associated with this line. Again, this must be a valid Python identifier.
List of arguments. These are often some kind of identifiers of specific linting rules or other arguments associated with this action. The possible data types are described below.

Consider a comment designed to ignore a specific mypy rule:

# type: ignore[error-code]
└-key-┘└action┴-arguments┘

↑ The key here is the word type, that is, what you see before the colon. The action is the word ignore, that is, what comes after the colon but before the square brackets. Finally, the list of arguments is what appears in square brackets. In this case, it contains only one argument: error-code.

Simplified writing is also possible, without a list of arguments:

# type: ignore
└-key-┘└action┘

↑ In this case, the parser treats this as an empty argument list.
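To make the three parts concrete, here is a minimal sketch that splits a single comment into key, action, and the raw argument text with a regular expression. It is only an illustration, not the library's implementation: the names STATEMENT, Parts, and split_statement are hypothetical, and the pattern assumes ASCII-initial identifiers.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for illustration: key ":" action, optionally "[arguments]".
STATEMENT = re.compile(
    r"^\s*(?P<key>[A-Za-z_]\w*)\s*:\s*(?P<action>[A-Za-z_]\w*)\s*"
    r"(?:\[(?P<arguments>.*)\])?\s*$"
)


class Parts(NamedTuple):
    key: str
    action: str
    raw_arguments: str  # unparsed text between the brackets, "" if there are none


def split_statement(text: str) -> Optional[Parts]:
    """Return the parts of one comment, or None if the format does not match."""
    match = STATEMENT.match(text.lstrip("#"))
    if match is None:
        return None
    # A missing bracket group is treated like an empty argument list.
    return Parts(match["key"], match["action"], match["arguments"] or "")


print(split_statement("# type: ignore[error-code]"))
print(split_statement("# type: ignore"))
print(split_statement("# just a note"))
```

The last call returns None, which mirrors the rule that comments in an incompatible format are simply ignored.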

There can be any number of arguments, separated by commas. Here are the valid data types for arguments:

Valid Python identifiers. They are interpreted as strings.
Two valid Python identifiers, separated by the - symbol, like this: error-code. There can be any number of spaces between them; they will be ignored. Interpreted as a single string.
String literals.
Numeric literals (int, float, complex).
Boolean literals (True and False).
None.
... (ellipsis).
Any other Python expressions. This is disabled by default, but you can enable parsing of such code and receive these arguments as AST nodes, which you can then process yourself.

The syntax of these data types matches Python syntax (except that multi-line forms are not allowed). The metacode syntax may be extended over time, but this core syntax will always be supported.
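Because the argument syntax is a subset of Python, one possible way to read it is to hand the text to Python's own ast module and map each node to a plain value. The sketch below assumes Python 3.8 or newer (where all literals are ast.Constant nodes); parse_arguments is a hypothetical helper, not part of the library, and it rejects arbitrary expressions instead of returning them as AST nodes.

```python
import ast
from typing import Any, List


def parse_arguments(raw: str) -> List[Any]:
    """Convert the text between the square brackets into plain Python values."""
    # Wrapping the text in brackets lets Python's parser handle commas and literals.
    tree = ast.parse(f"[{raw}]", mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("not a comma-separated argument list")
    values: List[Any] = []
    for node in tree.body.elts:
        if isinstance(node, ast.Name):
            values.append(node.id)  # bare identifier -> string
        elif (
            isinstance(node, ast.BinOp)
            and isinstance(node.op, ast.Sub)
            and isinstance(node.left, ast.Name)
            and isinstance(node.right, ast.Name)
        ):
            # `error-code` parses as a subtraction of two names; join them back.
            values.append(f"{node.left.id}-{node.right.id}")
        elif isinstance(node, ast.Constant):
            values.append(node.value)  # string, number, True/False, None, ...
        else:
            raise ValueError(f"unsupported argument: {ast.dump(node)}")
    return values


print(parse_arguments("error-code, name, 'text', 1, 1.5, 2j, True, None, ..."))
```

Spaces around the hyphen disappear naturally here, because the parser sees `error - code` and `error-code` as the same expression.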

A single line can contain multiple metacode comments. In this case, they should be separated by the # symbol, effectively chaining comments on the same line. You can also add regular text comments; the parser simply ignores them if they are not in metacode format:

# type: ignore # <- This is a comment for mypy! # fmt: off # <- And this is a comment for Ruff!
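The chaining rule can be sketched as a split on the # symbol followed by a format check on every chunk. This is a simplified illustration with hypothetical names (STATEMENT, find_statements); a naive split like this would break on a # inside a string-literal argument, which a real parser has to handle.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: key ":" action, optionally followed by "[...]".
STATEMENT = re.compile(r"^\s*([A-Za-z_]\w*)\s*:\s*([A-Za-z_]\w*)\s*(\[.*\])?\s*$")


def find_statements(comment: str) -> List[Tuple[str, str]]:
    """Return (key, action) pairs for every chunk that looks like metacode."""
    found = []
    for chunk in comment.split("#"):
        match = STATEMENT.match(chunk)
        if match:  # free-text chunks do not match and are skipped
            found.append((match.group(1), match.group(2)))
    return found


line = "# type: ignore # <- This is a comment for mypy! # fmt: off # <- And this is a comment for Ruff!"
print(find_statements(line))
#> [('type', 'ignore'), ('fmt', 'off')]
```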

If you look back at the examples of action comments from various tools above, you may notice that the syntax of most of them (but not all) can be described using metacode, and the rest can usually be adapted with minor changes. Read on to learn how to use the provided parser in practice.

Installation

You can install metacode with pip:

pip install metacode

You can also use instld to quickly try out this package and others without installing them.

Usage

The library exposes a single parser function:

from metacode import parse

To use it, you need to extract the comment text however you like (ideally without the leading #, though this is not required) and pass it to the function, along with the expected key as the second argument. The function returns a list of all parsed comments:

print(parse('type: ignore[error-code]', 'type'))
#> [ParsedComment(key='type', command='ignore', arguments=['error-code'])]
print(parse('type: ignore[error-code] # type: not_ignore[another-error]', 'type'))
#> [ParsedComment(key='type', command='ignore', arguments=['error-code']), ParsedComment(key='type', command='not_ignore', arguments=['another-error'])]

As you can see, the parse() function returns a list of ParsedComment objects. They all have the following fields:

key: str
command: str
arguments: List[Optional[Union[str, int, float, complex, bool, EllipsisType, AST]]]

↑ Please note that you are passing a key, which means that the result is filtered by that key. This way you can read only those comments that relate to your tool, ignoring the rest.
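How you extract the comment text is up to you. One possible approach, sketched here with the standard library tokenize module, collects every comment in a piece of source code together with its line number. The helper name extract_comments is hypothetical; each returned text could then be passed to parse() as shown above.

```python
import io
import tokenize
from typing import List, Tuple


def extract_comments(source: str) -> List[Tuple[int, str]]:
    """Return (line number, comment text without the leading #) pairs."""
    comments = []
    for token in tokenize.generate_tokens(io.StringIO(source).readline):
        if token.type == tokenize.COMMENT:
            # Only the leading # is dropped; chained comments stay in one string.
            comments.append((token.start[0], token.string.lstrip("#").strip()))
    return comments


source = "x = 1  # type: ignore[error-code]\ny = '# not a comment'\n"
print(extract_comments(source))
#> [(1, 'type: ignore[error-code]')]
```

Using a tokenizer rather than searching for # avoids mistaking a # inside a string literal for the start of a comment.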

By default, an argument in a comment must be of one of the strictly allowed types. However, you can enable parsing of arbitrary expressions, in which case they will be returned as AST nodes. To do this, pass allow_ast=True:

print(parse('key: action[a + b]', 'key', allow_ast=True))
#> [ParsedComment(key='key', command='action', arguments=[<ast.BinOp object at 0x102e44eb0>])]

↑ If you do not pass allow_ast=True, a metacode.errors.UnknownArgumentTypeError exception will be raised. When processing an argument, you can also raise this exception for an AST node in a form your tool does not support.

⚠️ Be careful when writing code that analyzes the AST. Different versions of the Python interpreter can generate different ASTs for the same code, so be sure to test your code thoroughly (for example, using a CI matrix or tox). Otherwise, it is better to use standard metacode argument types.
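As an illustration of processing such an argument, the sketch below accepts only expressions of the form `name + name + ...` and rejects everything else. The function names_in_sum is hypothetical, the node is produced with ast.parse as a stand-in for what allow_ast=True would deliver, and ValueError stands in for the library's exception so that the example needs only the standard library.

```python
import ast
from typing import List


def names_in_sum(node: ast.AST) -> List[str]:
    """Collect the names in `a + b + ...`; reject any other expression shape."""
    if isinstance(node, ast.Name):
        return [node.id]
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        return names_in_sum(node.left) + names_in_sum(node.right)
    # A real tool might raise metacode.errors.UnknownArgumentTypeError here;
    # ValueError is only a stand-in for this self-contained sketch.
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


# Stand-in for the node that would arrive as an argument with allow_ast=True.
node = ast.parse("a + b", mode="eval").body
print(names_in_sum(node))
#> ['a', 'b']
```

Checking node classes with isinstance and refusing unknown shapes keeps the tool's behavior explicit when an interpreter version produces a tree you did not anticipate.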

You can allow your users to write keys in any letter case. To do this, pass ignore_case=True:

print(parse('KEY: action', 'key', ignore_case=True))
#> [ParsedComment(key='KEY', command='action', arguments=[])]

You can also easily add support for several different keys. To do this, pass a list of keys instead of one key:

print(parse('key: action # other_key: other_action', ['key', 'other_key']))
#> [ParsedComment(key='key', command='action', arguments=[]), ParsedComment(key='other_key', command='other_action', arguments=[])]

Well, now we can read the comments. But what if we want to write them? There is another function for this: insert():

from metacode import insert, ParsedComment

Pass the comment you want to insert, as well as the current comment (empty if there is no comment, or starting with # if there is), and get the resulting comment text:

print(insert(ParsedComment(key='key', command='command', arguments=['lol', 'lol-kek']), ''))
#> # key: command[lol, 'lol-kek']
print(insert(ParsedComment(key='key', command='command', arguments=['lol', 'lol-kek']), '# some existing text'))
#> # key: command[lol, 'lol-kek'] # some existing text

As you can see, our comment is inserted before the existing comment. However, you can do the opposite:

print(insert(ParsedComment(key='key', command='command', arguments=['lol', 'lol-kek']), '# some existing text', at_end=True))
#> # some existing text # key: command[lol, 'lol-kek']

⚠️ Be careful: AST nodes can be read, but cannot be written.
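For readers reimplementing the writing side, here is a rough sketch that reproduces the outputs shown above. It is not the library's code: render_argument, render, and insert_sketch are hypothetical names, and details such as omitting the brackets for an empty argument list are assumptions.

```python
import keyword
from typing import Any, List


def render_argument(value: Any) -> str:
    # Plain identifiers are written bare; everything else uses its literal form.
    if isinstance(value, str) and value.isidentifier() and not keyword.iskeyword(value):
        return value
    if value is Ellipsis:
        return "..."
    return repr(value)


def render(key: str, command: str, arguments: List[Any]) -> str:
    text = f"# {key}: {command}"
    if arguments:  # assumption: no brackets when there are no arguments
        text += "[" + ", ".join(render_argument(a) for a in arguments) + "]"
    return text


def insert_sketch(new: str, existing: str, at_end: bool = False) -> str:
    """Place the new comment before (default) or after the existing one."""
    if not existing:
        return new
    return f"{existing} {new}" if at_end else f"{new} {existing}"


comment = render("key", "command", ["lol", "lol-kek"])
print(insert_sketch(comment, "# some existing text"))
#> # key: command[lol, 'lol-kek'] # some existing text
```

The keyword check matters for strings such as 'True' or 'None': written bare, they would be read back as a boolean or None rather than as a string.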

What about other languages?

If you are writing your Python-related tool in some other language, such as Rust, you may want to adhere to the metacode standard for machine-readable comments; however, you cannot directly use the ready-made parser described above. What can you do in that case?

The proposed metacode language is a syntactic subset of Python. The original metacode parser allows you to read arbitrary arguments written in Python as AST nodes. Such parsing depends on the Python version under which metacode runs, and it cannot be strictly standardized, since Python syntax is gradually evolving in an unpredictable direction. However, you can use a "safe" subset of the valid syntax by implementing your parser based on this EBNF grammar:

line ::= element { "#" element }
element ::= statement | ignored_content
statement ::= key ":" action [ "[" arguments "]" ]
ignored_content ::= ? any sequence of characters excluding "#" ?

key ::= identifier
action ::= identifier { "-" identifier }
arguments ::= argument { "," argument }

argument ::= hyphenated_identifier 
           | identifier 
           | string_literal 
           | complex_literal 
           | number_literal 
           | "True" | "False" | "None" | "..."

hyphenated_identifier ::= identifier "-" identifier
identifier ::= ? python-style identifier ?
string_literal ::= ? python-style string ?
number_literal ::= ? python-style number ?
complex_literal ::= ? python-style complex number ?
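To show that the argument rules do not require Python's own parser, here is a sketch of the `arguments` production driven only by regular expressions, an approach that should carry over to other languages. Python is used here just to stay consistent with the rest of this README. The names are hypothetical, and for brevity the sketch covers only identifiers, hyphenated identifiers, simple quoted strings without escapes, integers, True, False, None, and the ellipsis.

```python
import re
from typing import Any, List

IDENT = r"[A-Za-z_][A-Za-z0-9_]*"
# One alternative per supported `argument` rule; order matters, as in the grammar.
ARGUMENT = re.compile(
    rf"""
    \s*(?:
        (?P<hyphenated>{IDENT}\s*-\s*{IDENT})
      | (?P<identifier>{IDENT})
      | (?P<string>'[^'\\]*'|"[^"\\]*")
      | (?P<integer>[0-9]+)
      | (?P<ellipsis>\.\.\.)
    )\s*
    """,
    re.VERBOSE,
)
CONSTANTS = {"True": True, "False": False, "None": None}


def parse_arguments(raw: str) -> List[Any]:
    """Parse `argument { "," argument }` into a list of values."""
    values: List[Any] = []
    if not raw.strip():
        return values
    position = 0
    while True:
        match = ARGUMENT.match(raw, position)
        if match is None:
            raise ValueError(f"cannot parse argument at offset {position}")
        kind = match.lastgroup
        text = match.group(kind)
        if kind == "hyphenated":
            values.append(re.sub(r"\s+", "", text))  # spaces around "-" are ignored
        elif kind == "identifier":
            values.append(CONSTANTS.get(text, text))
        elif kind == "string":
            values.append(text[1:-1])  # drop the surrounding quotes
        elif kind == "integer":
            values.append(int(text))
        else:
            values.append(Ellipsis)
        position = match.end()
        if position == len(raw):
            return values
        if raw[position] != ",":
            raise ValueError(f"expected ',' at offset {position}")
        position += 1


print(parse_arguments("error-code, name, 'text', 42, True, None, ..."))
```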

If you implement an open-source parser of this grammar in a language other than Python, please let me know. This information can be added to this README.
