perf(python): optimize pystr deserialize perf #2007
chaokunyang wants to merge 24 commits into apache:main
Conversation
pandalee99 left a comment
This code is very efficient, very nice!
Maybe we can optimize the repetitive code:
// Handle remaining elements
for (; i < length; i++) {
  if (arr[i] > max_sse) {
    max_sse = arr[i];
  }
}

It's just the way it's written. It's nothing serious.
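For context, a remainder loop like this is typically the scalar tail that follows a SIMD pass over the buffer. A minimal sketch of the overall pattern, assuming SSE2 intrinsics and a hypothetical max_ucs1 helper (this is not the PR's actual code), could look like:

#include <emmintrin.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical sketch: find the maximum byte in a UCS1 buffer.
// A result below 0x80 means the buffer is pure ASCII.
static uint8_t max_ucs1(const uint8_t *arr, size_t length) {
  size_t i = 0;
  __m128i vmax = _mm_setzero_si128();
  // Vector pass over 16-byte chunks.
  for (; i + 16 <= length; i += 16) {
    __m128i v = _mm_loadu_si128((const __m128i *)(arr + i));
    vmax = _mm_max_epu8(vmax, v);
  }
  // Reduce the vector maximum to a scalar.
  uint8_t tmp[16];
  uint8_t max_sse = 0;
  _mm_storeu_si128((__m128i *)tmp, vmax);
  for (int k = 0; k < 16; k++) {
    if (tmp[k] > max_sse) {
      max_sse = tmp[k];
    }
  }
  // Handle remaining elements (the loop quoted above).
  for (; i < length; i++) {
    if (arr[i] > max_sse) {
      max_sse = arr[i];
    }
  }
  return max_sse;
}

Written this way, the vector reduction and the tail loop repeat the same scalar comparison, which is presumably the repetition the comment refers to; both could share a small helper.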
  # PyUnicode_FromASCII
- return PyUnicode_DecodeLatin1(buf, size, "strict")
+ return <unicode>Fury_PyUnicode_FromUCS1(buf, size)
+ # return PyUnicode_DecodeLatin1(buf, size, "strict")
If I use PyUnicode_DecodeLatin1 directly here, it's faster on macOS, which is unexpected since my implementation uses SIMD; and if I invoke PyUnicode_DecodeLatin1 directly in PyUnicode_FromUCS1, it's slower too. @penguin-wwy do you have any ideas?
Could you describe the testing method? The tests I wrote myself do not have this issue.
# integration_tests/cpython_benchmark/fury_benchmark.py
STRING = "sjuveaibngurbzsivbrubiasb3r93284r92r1209130r0fa;2''j93r2nfln''[]\=-_+/,./!@$#%^&*()i9124u0hpq[jnzj0r9h034-2iu1058]"

def micro_benchmark():
    runner.bench_func(
        "fury_string", fury_object, language, not args.no_ref, STRING
    )
    runner.bench_func(
        "fury_large_string", fury_object, language, not args.no_ref, STRING * 10000
    )

Using PyUnicode_FromUCS1:
fury_string: Mean +- std dev: 54.7 us +- 2.5 us
fury_large_string: Mean +- std dev: 255 us +- 24 us
Using Fury_PyUnicode_FromUCS1:
fury_string: Mean +- std dev: 53.8 us +- 2.0 us
fury_large_string: Mean +- std dev: 236 us +- 6 us
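To put the comparison above in context: a UCS1 constructor of this kind generally computes the maximum code point in the buffer, allocates a unicode object of the matching kind, and copies the bytes in. Here is a rough sketch against the public CPython C API, reusing the hypothetical max_ucs1 helper from the earlier sketch; it illustrates the general approach and is not the PR's actual Fury_PyUnicode_FromUCS1:

#include <Python.h>
#include <stdint.h>
#include <string.h>

// Rough sketch only: build a str object from a Latin-1 (UCS1) buffer.
static PyObject *from_ucs1_sketch(const char *buf, Py_ssize_t size) {
  // max_ucs1() is the hypothetical SIMD helper sketched earlier.
  Py_UCS4 maxchar = max_ucs1((const uint8_t *)buf, (size_t)size);
  // maxchar < 0x80 yields a compact ASCII object, otherwise a 1-byte kind;
  // in both cases the payload is a plain byte copy.
  PyObject *u = PyUnicode_New(size, maxchar);
  if (u == NULL) {
    return NULL;
  }
  memcpy(PyUnicode_1BYTE_DATA(u), buf, (size_t)size);
  return u;
}

CPython's PyUnicode_DecodeLatin1 follows essentially the same path internally, so the measured difference largely comes down to how the maximum-byte scan and the copy are implemented.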
What does this PR do?
This PR implements an optimized version of PyUnicode_FromUCS1/Fury_PyUnicode_FromUCS2 for faster performance by:
Related issues
Does this PR introduce any user-facing change?
Benchmark