BUG: Fixed merge page bug issue with _markup_annotations objects#3577
BUG: Fixed merge page bug issue with _markup_annotations objects#3577HSY-999 wants to merge 7 commits intopy-pdf:mainfrom
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3577 +/- ##
=======================================
Coverage 97.30% 97.30%
=======================================
Files 56 56
Lines 9838 9844 +6
Branches 1790 1790
=======================================
+ Hits 9573 9579 +6
Misses 157 157
Partials 108 108 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
stefan6419846
left a comment
There was a problem hiding this comment.
Thanks for the PR. I have left some comments.
Additionally, could you please clarify whether this has further side effects? Does exposing the new method have the potential to break existing code? Are there any other aspects to keep in mind?
| "DictionaryObject", | ||
| self._reference_clone(self.__class__(), pdf_dest, force_duplicate), | ||
| self._reference_clone(self.__new__(self.__class__), pdf_dest, force_duplicate), | ||
| # self.__new__(self.__class__) because we want instance of type __class__, |
There was a problem hiding this comment.
Comments should go before, not after the code.
| original_page: "PageObject" # very local use in writer when appending | ||
|
|
||
| def __new__(cls, *args: tuple[Any], **kwargs: dict[Any, Any]) -> "PageObject": | ||
| """__new__ used here to make sure instance.pdf attribute |
There was a problem hiding this comment.
This is an explaining comment, not a docstring, thus please make it a "simple" comment. Are we able to re-use some of the values used here in the __init__ method?
There was a problem hiding this comment.
what values do you mean? i would want __init__ to handle initialization of attribute because that is a python standard.
There was a problem hiding this comment.
The values initialized to None here.
|
|
||
|
|
||
| def test_merge_page_with_annotation(): | ||
| # added and adapted from issue #3467 |
There was a problem hiding this comment.
This comment is superfluous.
i do not think it is possible to break code. a potential side effect of the change would be that the values of for other |
|
Thanks for the heads-up. Given your comment about |
i review python documentation for i will rewrite code so |
this cause side effect of double init when create object like let me reexamine |
|
@stefan6419846 i have done more reading the codebase for maybe side effects. it seems that there are more class that do not initialized instance variable also. i want to ask about |
|
Sorry, I am having some trouble to follow your comment. Could you please elaborate and reference some code and/or provide some example code showing the behavior in question? |
for key, value in vars(src).items():
if key != "indirect_reference":
setattr(self, key, value)i want to place at the end of |
|
Thanks for the explanation. Given that we iterate over all values with your proposed |
|
Good work @HSY-999. @stefan6419846 is this not being merged because you feel there is a more elegant way? It passes the tests and fixes the issue. I think merging it and open an issue on refactoring is a way forward, but your call obviously. |
My questions remained unanswered and I do not think that the current approach and its advantages and disadvantages have been made clear enough.
Passing the tests does not help if there might be side effects or proper future maintenance cannot be ensured. This issue still seems to only have come up due to your own PR and not in real-world scenarios - at least no other users showed up in the issue.
I made an initial proposal for the preferred way forward, but the PR author decided to take another approach without further justification if I remember correctly. We have enough technical debts to resolve and I do not think that merging incomplete changes is worth it here. |
|
The technical debt causes a longer time to understand the code, with associated readability and maintainability issues. I think we (pypdf) should do more breaking changes to get to better foundations. For the PR I am doing could it avoid the tests, and then it is a name change, which helps in readability. You could think of adding another person who can merge PRs to help you. You do a lot of work, which is appreciated. |
We already started to do more breaking changes lately again after the initial cleanup, but we always have to consider our deprecation process. pypdf is used in many places and we cannot do whatever we want.
Just omitting tests is not how we want to maintain pypdf. If we discover missing coverage during refactoring, we should aim at fixing this - this is how we make pypdf better and more tested.
At the moment, my work on pypdf is (hopefully) not the bottleneck, but IMHO the general lack of interested contributors. Nevertheless, contributors can reduce unnecessary overhead from my side by following the development guidelines like discussing solutions first, adhering to the code style and only pushing changes which are mostly good (as verified locally with running all the tests and linting beforehand). |
I do not understand why not. I would encourage more and faster deprecations (through quicker major releases). This is a usual difference of opinions and is expected and a good thing. Is there anything I can do to get the other PR through? It seems this PR is stopping it progressing. Could you have a look at this PR and see if the changes you need could be done by you and then you do a joint PR so @HSY-999 gets deserved credit. |
It is our current policy and I currently do not see any benefit of actually changing this - we implicitly do a major release once a year to drop support for old Python versions. Having a mostly stable and rarely breaking public API interface is something I consider rather valuable for software, as this avoids unnecessary adaption overhead for library consumers (including my use cases as well).
I have answered on your PR to clarify the situation there. If this really is a bug and you cannot avoid it in your PR, I am sure we can find a solution for your PR.
This would mean discussing the advantages and disadvantages of the different approaches with myself after already having indicated my preferred approach in the issue. Opening a new PR yourself which implements a proper fix where I do not see any direct blockers is something which is always available for such actually stale PRs. |
|
@stefan6419846 what you said is all understandable. Library authors moving with us rather than staying with an old version due to incompatibilty is definitely desirable. |
related to #3467
i notice that
_data_structures.DictionaryObject.clone()was callingself.__class__(), which looked incorrect to me. callingself.__class__()runs both__new__(cls, *args, **kwargs)and__init__(self). we are handling cloning the values ind__._clone(self)later, so we do not need to run__init__(self)constructor as we set the values of the instance using the original object.this had a side effects by breaking
PageObjectcloning, but creating the__new__(cls, *args, **kwargs)function and adding the attribute names to the instance seem to resolve it. this should not change behavior ofPageObjectinstantiation, because__init__(self)is called after__new__(cls, *args, **kwargs)