Decoupling the html2text rendering pipeline #37

I’ve spent some time using `html2text`, reading its source code and even writing small patches.  Still, I haven’t really grasped the complete rendering process that `html2text` performs.  At the same time, I have some specific requirements like #27 or #36 that cannot be realized with `html2text` and maybe don’t even belong in a generic HTML rendering library.

Therefore, I am wondering:  Would it be possible, and would it make sense, to split the html2text rendering pipeline into separate steps that the user can customize?  This would make the rendering process easier to understand, and it might make it possible to implement some of the requirements I mentioned earlier without re-implementing the entire rendering stack.

From my point of view, these are the steps of the rendering pipeline (I’m quite confident that steps 1–3 are correct, but I’m not really sure about 4 and 5):
1. Parsing the HTML document (`src/lib.rs`).
2. Transforming the HTML document into a render tree (`src/lib.rs`).
3. Estimating the size of the elements of the render tree (`src/lib.rs`).
4. Laying out the elements of the render tree into lines (`src/text_renderer.rs`?).
5. Rendering the elements into text (`src/text_renderer.rs`?).
6. Annotating the lines using a `TextDecorator` (`src/text_renderer.rs`).

It would be especially nice if the user could customize step 5 without having to re-implement everything else.
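
To make the idea more concrete, here is a minimal sketch of what a user-replaceable step 5 could look like.  Everything in it is hypothetical: `RenderNode`, `NodeRenderer`, `render_tree`, `StarRenderer` and `PlainRenderer` are names I made up for illustration and are not part of the `html2text` API, and the tree is far simpler than the real render tree.  It assumes that steps 1–3 have already produced the tree and ignores layout (step 4) and decoration (step 6) entirely.

```rust
// Hypothetical sketch: none of these types exist in html2text.

/// A heavily simplified stand-in for the render tree produced by steps 1-3.
#[derive(Debug, Clone, PartialEq)]
pub enum RenderNode {
    Text(String),
    Emphasis(Vec<RenderNode>),
    Block(Vec<RenderNode>),
}

/// Step 5 as a trait. The default methods give plain output, so a user
/// only overrides the callbacks they want to change.
pub trait NodeRenderer {
    fn text(&mut self, s: &str, out: &mut String) {
        out.push_str(s);
    }
    fn start_emphasis(&mut self, _out: &mut String) {}
    fn end_emphasis(&mut self, _out: &mut String) {}
    fn end_block(&mut self, out: &mut String) {
        out.push('\n');
    }
}

/// The tree walk stays in the library; only the callbacks are user code.
pub fn render_tree<R: NodeRenderer>(node: &RenderNode, r: &mut R, out: &mut String) {
    match node {
        RenderNode::Text(s) => r.text(s, out),
        RenderNode::Emphasis(children) => {
            r.start_emphasis(out);
            for child in children {
                render_tree(child, r, out);
            }
            r.end_emphasis(out);
        }
        RenderNode::Block(children) => {
            for child in children {
                render_tree(child, r, out);
            }
            r.end_block(out);
        }
    }
}

/// Uses every default: emphasis is dropped silently.
pub struct PlainRenderer;
impl NodeRenderer for PlainRenderer {}

/// Overrides only the emphasis callbacks to wrap the content in `*`.
pub struct StarRenderer;
impl NodeRenderer for StarRenderer {
    fn start_emphasis(&mut self, out: &mut String) {
        out.push('*');
    }
    fn end_emphasis(&mut self, out: &mut String) {
        out.push('*');
    }
}

/// Builds a tiny tree, roughly what `<p>Hello <em>world</em></p>` might become.
pub fn sample_tree() -> RenderNode {
    RenderNode::Block(vec![
        RenderNode::Text("Hello ".to_string()),
        RenderNode::Emphasis(vec![RenderNode::Text("world".to_string())]),
    ])
}

/// Renders the sample tree with whichever renderer the caller supplies.
pub fn render_with<R: NodeRenderer>(renderer: &mut R) -> String {
    let mut out = String::new();
    render_tree(&sample_tree(), renderer, &mut out);
    out
}
```

With this shape, `render_with(&mut StarRenderer)` yields `"Hello *world*\n"` and `render_with(&mut PlainRenderer)` yields `"Hello world\n"`, while the tree walk itself is shared.  Whether the real render tree and layout code can be separated this cleanly is exactly what I am unsure about.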

Is my understanding of the rendering process roughly correct?  What do you think?
