I’ve spent some time using html2text, reading its source code and even writing small patches. Still, I haven’t really grasped the complete rendering process that html2text performs. At the same time, I have some specific requirements like #27 or #36 that cannot be realized with html2text and maybe don’t even belong in a generic HTML rendering library.
Therefore, I am wondering: Would it be possible and would it make sense to decouple the html2text rendering pipeline into steps that can be customized by the user? This would make it easier to understand the rendering process, and it might make it possible to implement some of the requirements I mentioned earlier without having to re-implement the entire rendering stack.
From my point of view, these are the steps of the rendering pipeline (I’m quite confident that steps 1–3 are correct, but I’m not really sure about 4 and 5):
1. Parsing the HTML document (`src/lib.rs`).
2. Transforming the HTML document into a render tree (`src/lib.rs`).
3. Estimating the size of the elements of the render tree (`src/lib.rs`).
4. Laying out the elements of the render tree into lines (`src/text_renderer.rs`?).
5. Rendering the elements into text (`src/text_renderer.rs`?).
6. Annotating the lines using a `TextDecorator` (`src/text_renderer.rs`).
It would be especially nice if the user could customize step 5 without having to re-implement everything else.
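To make the idea more concrete, here is a rough sketch of what a swappable text-rendering step could look like. Everything in it is hypothetical: `Node`, `TextStage`, `StarEmphasis`, `UpperEmphasis`, `wrap` and `render` are names I made up for illustration, not anything that exists in html2text. The toy tree stands in for the output of the earlier steps, and the ordering of layout and text rendering is simplified.

```rust
// Hypothetical sketch: none of these types or functions exist in html2text.

/// A toy render tree node, standing in for the output of the earlier steps.
#[derive(Debug, Clone)]
pub enum Node {
    Text(String),
    Emphasis(Vec<Node>),
}

/// The user-replaceable stage: decides how emphasised content becomes text.
pub trait TextStage {
    fn emphasis(&self, inner: &str) -> String;
}

/// One possible default: wrap emphasised text in asterisks.
pub struct StarEmphasis;

impl TextStage for StarEmphasis {
    fn emphasis(&self, inner: &str) -> String {
        format!("*{}*", inner)
    }
}

/// A user-supplied alternative: upper-case emphasised text instead.
pub struct UpperEmphasis;

impl TextStage for UpperEmphasis {
    fn emphasis(&self, inner: &str) -> String {
        inner.to_uppercase()
    }
}

/// Turn a single node into a flat string using the given stage.
fn render_node(node: &Node, stage: &dyn TextStage) -> String {
    match node {
        Node::Text(s) => s.clone(),
        Node::Emphasis(children) => {
            let inner: String = children.iter().map(|c| render_node(c, stage)).collect();
            stage.emphasis(&inner)
        }
    }
}

/// A fixed layout stage: greedy word wrap.
/// Widths are counted in bytes here, which is only adequate for ASCII input.
pub fn wrap(text: &str, width: usize) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = String::new();
    for word in text.split_whitespace() {
        // Start a new line if appending this word would exceed the width.
        if !current.is_empty() && current.len() + 1 + word.len() > width {
            lines.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(word);
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}

/// The pipeline: only the text stage is pluggable, the rest is shared.
pub fn render(nodes: &[Node], stage: &dyn TextStage, width: usize) -> Vec<String> {
    let text: String = nodes.iter().map(|n| render_node(n, stage)).collect();
    wrap(&text, width)
}
```

With something like this, calling `render(&nodes, &UpperEmphasis, 11)` instead of `render(&nodes, &StarEmphasis, 11)` would change only how emphasis is written out, while the wrapping code stays untouched. Whether the real pipeline can be split along a boundary like this is exactly what I am unsure about.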
Is my understanding of the rendering process roughly correct? What do you think?