Decoupling the html2text rendering pipeline #37

@robinkrahl

I’ve spent some time using html2text, reading its source code and even writing small patches. Still, I haven’t really grasped the complete rendering process that html2text performs. At the same time, I have some specific requirements like #27 or #36 that cannot be realized with html2text and maybe don’t even belong in a generic HTML rendering library.

Therefore, I am wondering: Would it be possible and would it make sense to decouple the html2text rendering pipeline into steps that can be customized by the user? This would make it easier to understand the rendering process, and it might make it possible to implement some of the requirements I mentioned earlier without having to re-implement the entire rendering stack.

From my point of view, these are the steps of the rendering pipeline. I’m quite confident that steps 1–3 are correct, but I’m not really sure about steps 4 and 5:

  1. Parsing the HTML document (src/lib.rs).
  2. Transforming the HTML document into a render tree (src/lib.rs).
  3. Estimating the size of the elements of the render tree (src/lib.rs).
  4. Laying out the elements of the render tree into lines (src/text_renderer.rs?).
  5. Rendering the elements into text (src/text_renderer.rs?).
  6. Annotating the lines using a TextDecorator (src/text_renderer.rs).
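
To make the idea more concrete, here is a minimal sketch of what a decoupled pipeline with a replaceable step 5 could look like. Everything in it is hypothetical: `Node`, `Word`, `WordRenderer`, `Stars`, `collect_words`, `layout` and `render_tree` are names made up for illustration and are not part of html2text. The sketch starts from an already built render tree (steps 1 and 2 are left out), measures words by character count, treats nested blocks as inline content, and skips the decorator step entirely.

```rust
// Hypothetical sketch: none of these types or functions exist in html2text.

/// Stand-in for the render tree that steps 1 and 2 would produce.
enum Node {
    Text(String),
    Emphasis(Vec<Node>),
    Block(Vec<Node>),
}

/// A single word plus the styling it inherited from the tree.
#[derive(Debug, Clone, PartialEq)]
struct Word {
    text: String,
    emphasized: bool,
}

/// Step 5 as a stage the user can replace.
trait WordRenderer {
    fn render(&self, word: &Word) -> String;
}

/// One possible renderer: mark emphasized words with asterisks.
struct Stars;

impl WordRenderer for Stars {
    fn render(&self, word: &Word) -> String {
        if word.emphasized {
            format!("*{}*", word.text)
        } else {
            word.text.clone()
        }
    }
}

/// Flatten a subtree into styled words (nested blocks are treated as inline).
fn collect_words(node: &Node, emphasized: bool, out: &mut Vec<Word>) {
    match node {
        Node::Text(s) => out.extend(s.split_whitespace().map(|w| Word {
            text: w.to_string(),
            emphasized,
        })),
        Node::Emphasis(children) => {
            children.iter().for_each(|c| collect_words(c, true, out))
        }
        Node::Block(children) => {
            children.iter().for_each(|c| collect_words(c, emphasized, out))
        }
    }
}

/// Steps 3 and 4: measure each word and wrap greedily at `width` columns.
/// Widths are measured before rendering, so markers added in step 5 are not counted.
fn layout(words: Vec<Word>, width: usize) -> Vec<Vec<Word>> {
    let mut lines: Vec<Vec<Word>> = Vec::new();
    let mut current: Vec<Word> = Vec::new();
    let mut used = 0;
    for word in words {
        let len = word.text.chars().count();
        // Start a new line if the word plus a separating space does not fit.
        if !current.is_empty() && used + 1 + len > width {
            lines.push(std::mem::take(&mut current));
            used = 0;
        }
        used += if current.is_empty() { len } else { len + 1 };
        current.push(word);
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}

/// Chain the stages; only the renderer for step 5 is supplied by the caller.
fn render_tree(root: &Node, width: usize, renderer: &dyn WordRenderer) -> Vec<String> {
    let mut words = Vec::new();
    collect_words(root, false, &mut words);
    layout(words, width)
        .iter()
        .map(|line| {
            line.iter()
                .map(|w| renderer.render(w))
                .collect::<Vec<_>>()
                .join(" ")
        })
        .collect()
}
```

With this split, a caller who wants different output for emphasized text would only implement `WordRenderer` and pass it to `render_tree`, for example `render_tree(&tree, 80, &Stars)`, while reusing the flattening and layout stages unchanged.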

It would be especially nice if the user could customize step 5 without having to re-implement everything else.

Is my understanding of the rendering process roughly correct? What do you think?
