
[SPARK-56066][PYTHON] Lazy import numpy to improve import speed #54896

Open
gaogaotiantian wants to merge 1 commit into apache:master from gaogaotiantian:lazy-load-numpy


@gaogaotiantian
Contributor

What changes were proposed in this pull request?

In StatCounter, import numpy only when it is needed.

Why are the changes needed?

This is the only place that triggers a numpy import when pyspark is imported. Importing numpy takes about 30 ms, so deferring it reduces the bootstrap time significantly.
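A rough way to check an import-time figure like the one above is to time the first import of a module. This is a minimal sketch, not part of the patch: the helper name time_first_import is hypothetical, and an in-process measurement is only meaningful the first time a module is loaded.

```python
import importlib
import sys
import time


def time_first_import(module_name):
    """Return seconds spent importing module_name, or None if already loaded."""
    if module_name in sys.modules:
        # Already cached: importing again would only hit sys.modules.
        return None
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start
```

Calling time_first_import("numpy") in a fresh interpreter gives a per-machine number. For a per-module breakdown, CPython's -X importtime option (for example, python -X importtime -c "import pyspark") prints the cost of every import in the chain.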

Does this PR introduce any user-facing change?

No.

How was this patch tested?

CI

Was this patch authored or co-authored using generative AI tooling?

No.

Comment on lines +27 to +36
try:
    from numpy import maximum, minimum, sqrt
except ImportError:
    maximum = max  # type: ignore[assignment]
    minimum = min  # type: ignore[assignment]
    sqrt = math.sqrt  # type: ignore[assignment]

self.maximum = maximum
self.minimum = minimum
self.sqrt = sqrt
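For context, here is a self-contained sketch of the deferred-import pattern in the snippet above. MiniCounter is a hypothetical, simplified stand-in and not the real StatCounter; its fields and merge method are assumptions made only so the example runs on its own.

```python
import math


class MiniCounter:
    """Hypothetical, simplified stand-in for StatCounter (not the real class)."""

    def __init__(self, values=()):
        # Deferred import: numpy is loaded when an instance is created,
        # not when the enclosing module is imported.
        try:
            from numpy import maximum, minimum, sqrt
        except ImportError:
            maximum, minimum, sqrt = max, min, math.sqrt

        self.maximum = maximum
        self.minimum = minimum
        self.sqrt = sqrt

        self.n = 0
        self.total = 0.0
        self.max_value = float("-inf")
        self.min_value = float("inf")
        for v in values:
            self.merge(v)

    def merge(self, value):
        # Use whichever implementation was resolved in __init__.
        self.n += 1
        self.total += value
        self.max_value = self.maximum(self.max_value, value)
        self.min_value = self.minimum(self.min_value, value)
        return self
```

The trade-off is that the import cost moves from module import time to the first instantiation. Later instantiations find numpy in sys.modules, so the repeated import statement is cheap.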
Contributor

@Yicong-Huang Mar 18, 2026


Those functions are pretty standard. Do we necessarily need to use the numpy versions of them?
We are falling back to max, min and math.sqrt anyway when numpy is not detected.

Contributor Author


The original PR (from a long time ago) fixed the issue that max, min and math.sqrt do not handle numpy values. That might not be an issue anymore, so we might not have to keep the numpy versions, but that would be a separate fix.
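As an illustration of how the builtins can differ from their numpy counterparts, the sketch below compares max with numpy.maximum on array operands. Whether this is the exact case the original PR addressed is an assumption. The helper name builtin_max_on_arrays is hypothetical.

```python
def builtin_max_on_arrays():
    """Return True if builtin max() rejects array operands, None without numpy."""
    try:
        import numpy as np
    except ImportError:
        return None

    a = np.array([1.0, 5.0])
    b = np.array([3.0, 2.0])
    # numpy.maximum compares element by element.
    assert np.maximum(a, b).tolist() == [3.0, 5.0]
    try:
        # max() needs bool(b > a), which is ambiguous for multi-element arrays.
        max(a, b)
    except ValueError:
        return True
    return False
```

For plain Python floats the builtins and the numpy functions agree, so the difference only matters if array-valued inputs reach these code paths.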
