[SPARK-56066][PYTHON] Lazy import numpy to improve import speed #54896
Open
gaogaotiantian wants to merge 1 commit into apache:master
Comment on lines +27 to +36
    try:
        from numpy import maximum, minimum, sqrt
    except ImportError:
        maximum = max  # type: ignore[assignment]
        minimum = min  # type: ignore[assignment]
        sqrt = math.sqrt  # type: ignore[assignment]

    self.maximum = maximum
    self.minimum = minimum
    self.sqrt = sqrt
Contributor
Those functions are pretty standard. Do we necessarily need to use the numpy versions of them?
We fall back to the builtins and math anyway when numpy is not detected.
Contributor
Author
The original PR (a long, long time ago) was meant to fix the issue that max, min, and math.sqrt do not handle numpy values. It might not be an issue now. We might not have to keep the numpy versions, but removing them would be a separate fix.
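For readers following the thread, the pattern under review can be sketched roughly as below. This is a minimal illustration and not the actual StatCounter code: the names load_elementwise_ops and LazyStats are hypothetical, and the class tracks only a count, a maximum, and a minimum. The point is that the numpy import sits inside a function, so importing the module itself never imports numpy.

```python
import math


def load_elementwise_ops():
    """Return (maximum, minimum, sqrt), preferring numpy when it is installed.

    The import runs only when this function is called, so importing the
    enclosing module does not import numpy.
    """
    try:
        from numpy import maximum, minimum, sqrt
    except ImportError:
        # Fallback for environments without numpy: scalar-only equivalents.
        maximum, minimum, sqrt = max, min, math.sqrt
    return maximum, minimum, sqrt


class LazyStats:  # hypothetical stand-in, not the real StatCounter
    def __init__(self, values=()):
        # numpy (if present) is imported here, on first construction.
        self.maximum, self.minimum, self.sqrt = load_elementwise_ops()
        self.n = 0
        self.max_value = float("-inf")
        self.min_value = float("inf")
        for v in values:
            self.merge(v)

    def merge(self, value):
        # Fold one more value into the running count, maximum, and minimum.
        self.n += 1
        self.max_value = self.maximum(self.max_value, value)
        self.min_value = self.minimum(self.min_value, value)
        return self
```

Constructing LazyStats([3.0, 1.0, 2.0]) behaves the same with or without numpy installed; only the type of the stored values differs (numpy scalars versus Python floats).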
What changes were proposed in this pull request?
In StatCounter, only import numpy when needed.
Why are the changes needed?
This is the only place that triggers import numpy when we import pyspark. Importing numpy takes about 30 ms. We can reduce the bootstrap time significantly.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
CI
Was this patch authored or co-authored using generative AI tooling?
No.
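The import cost quoted in the description (about 30 ms for numpy) can be checked locally with a small script along these lines. This is only a sketch: time_import is a hypothetical helper, and the number it reports depends on the machine, the Python version, and filesystem caches. It times the import in a fresh interpreter so that already-loaded modules do not skew the result.

```python
import subprocess
import sys


def time_import(module_name):
    """Return the seconds spent importing module_name in a fresh interpreter."""
    code = (
        "import time\n"
        "start = time.perf_counter()\n"
        f"import {module_name}\n"
        "print(time.perf_counter() - start)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if the import fails
    )
    # Use the last line in case the module prints something while importing.
    return float(result.stdout.strip().splitlines()[-1])


if __name__ == "__main__":
    # Pass any installed module name on the command line; defaults to json.
    name = sys.argv[1] if len(sys.argv) > 1 else "json"
    print(f"import {name}: {time_import(name) * 1000:.1f} ms")
```

Comparing the result for pyspark before and after the change would show the saving. Running python -X importtime -c "import pyspark" gives a per-module breakdown as an alternative.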