Skip to content

Optimize splitPart scalar function to reduce allocation#17660

Merged
Jackie-Jiang merged 5 commits intoapache:masterfrom
justahuman1:split-part-malloc-fix
Mar 26, 2026
Merged

Optimize splitPart scalar function to reduce allocation#17660
Jackie-Jiang merged 5 commits intoapache:masterfrom
justahuman1:split-part-malloc-fix

Conversation

@justahuman1
Copy link
Copy Markdown
Contributor

@justahuman1 justahuman1 commented Feb 7, 2026

Optimized the splitPart function to use direct string traversal instead of allocating a full String[] array, significantly reducing memory pressure in high-throughput query scenarios.

Performance improvements:

  • Space complexity: O(n) → O(1)
  • Time complexity: O(n) for both implementations

Added JMH benchmarks to demonstrate improvements and regressions. The primary regression is in the backward index case so I added splitPartNegativeIdxSingleCharDelim (b4272e2) which increases complexity but makes the kernel much faster

Addresses #17362

JMH Results

Yeah, the common cases are more than 10x faster 🚀 . The original results showed a regression with negative_index, hence b4272e2. However, this code may be harder to manage so that is the trade-off. The below results are from all commits in the PR

Operation Type New (ns/op) Old (ns/op) Speedup (Old/New)
small_index 777.8 38,722.2 ≈ 49.8× faster
large_index 295,474.5 3,867,775.5 ≈ 13.1× faster
negative_index 13.2 38,635.4 ≈ 2,929× faster
large_negative_index 47,797.4 3,824,985.5 ≈ 80× faster
adjacent_delimiters 779.9 3,864.4 ≈ 5× faster
Benchmark                                       (_scenario)  Mode  Cnt           Score          Error  Units
BenchmarkSplitPart.testSplitPartNew             small_index  avgt   10         777.797 ±       22.025  ns/op
BenchmarkSplitPart.testSplitPartNew             large_index  avgt   10      295474.514 ±     2876.945  ns/op
BenchmarkSplitPart.testSplitPartNew          negative_index  avgt   10          13.198 ±        2.553  ns/op
BenchmarkSplitPart.testSplitPartNew    large_negative_index  avgt   10       47797.384 ±      465.119  ns/op
BenchmarkSplitPart.testSplitPartNew     adjacent_delimiters  avgt   10         779.944 ±       14.194  ns/op
BenchmarkSplitPart.testSplitPartNew             many_fields  avgt   10        1935.942 ±       59.342  ns/op
BenchmarkSplitPart.testSplitPartNew        multi_char_delim  avgt   10        4281.167 ±      102.780  ns/op
BenchmarkSplitPart.testSplitPartNew  large_multi_char_delim  avgt   10     1160607.923 ±    20648.116  ns/op
BenchmarkSplitPart.testSplitPartOld             small_index  avgt   10       38722.193 ±     1463.000  ns/op
BenchmarkSplitPart.testSplitPartOld             large_index  avgt   10     3867775.470 ±   107083.008  ns/op
BenchmarkSplitPart.testSplitPartOld          negative_index  avgt   10       38635.403 ±     1325.020  ns/op
BenchmarkSplitPart.testSplitPartOld    large_negative_index  avgt   10     3824985.465 ±    89641.937  ns/op
BenchmarkSplitPart.testSplitPartOld     adjacent_delimiters  avgt   10        3864.391 ±      101.146  ns/op
BenchmarkSplitPart.testSplitPartOld             many_fields  avgt   10     3814644.277 ±   132924.021  ns/op
BenchmarkSplitPart.testSplitPartOld        multi_char_delim  avgt   10      182955.787 ±     3254.768  ns/op
BenchmarkSplitPart.testSplitPartOld  large_multi_char_delim  avgt   10  1043425954.300 ± 29614093.332  ns/op

Allocation via -prof gc

     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  48.000 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  48.001 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  48.002 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  48.007 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  56.036 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  56.043 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  6616.001 B/op

     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  304.000 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  3560.001 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  6616.001 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  760984.043 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  760984.044 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  760984.045 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  760984.046 B/op

* TODO: Revisit if index should be one-based (both Presto and Postgres use one-based index, which starts with 1)
* @param input
* @param delimiter
* @param input the input String to be split into parts.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copied verbatim from the the overloaded splitPart function below to keep in sync.

@justahuman1 justahuman1 force-pushed the split-part-malloc-fix branch from 33336f1 to 5a2b4f1 Compare February 7, 2026 20:53
@@ -0,0 +1,160 @@
/**
* Licensed to the Apache Software Foundation (ASF) under one
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove this if required. Added it for my own validation of the perf

@justahuman1 justahuman1 force-pushed the split-part-malloc-fix branch from 5a2b4f1 to 24644cf Compare February 7, 2026 21:38
@justahuman1 justahuman1 changed the title Update splitPart scalar function to reduce allocation Optimize splitPart scalar function to reduce allocation Feb 10, 2026
@Jackie-Jiang Jackie-Jiang added enhancement Improvement to existing functionality performance Related to performance optimization labels Feb 13, 2026
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Feb 13, 2026

Codecov Report

❌ Patch coverage is 93.84615% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.22%. Comparing base (7eec9a7) to head (84c507d).
⚠️ Report is 100 commits behind head on master.

Files with missing lines Patch % Lines
.../pinot/common/function/scalar/StringFunctions.java 93.84% 1 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17660      +/-   ##
============================================
- Coverage     63.29%   63.22%   -0.08%     
- Complexity     1466     1525      +59     
============================================
  Files          3190     3194       +4     
  Lines        191904   193705    +1801     
  Branches      29390    29809     +419     
============================================
+ Hits         121474   122469     +995     
- Misses        60940    61622     +682     
- Partials       9490     9614     +124     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.20% <93.84%> (-0.06%) ⬇️
java-21 63.19% <93.84%> (-0.07%) ⬇️
temurin 63.22% <93.84%> (-0.08%) ⬇️
unittests 63.22% <93.84%> (-0.08%) ⬇️
unittests1 55.52% <93.84%> (-0.08%) ⬇️
unittests2 34.17% <0.00%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@justahuman1
Copy link
Copy Markdown
Contributor Author

I will debug these tests and update the PR. I will also try to get the existing coverage to 100% so we can be confident before migrating

@Jackie-Jiang
Copy link
Copy Markdown
Contributor

Try rebasing. The test failures might not be related

Updated the splitPart function to use direct string traversal instead of
allocating a full String[] array, significantly reducing memory pressure
in high-throughput query scenarios.

Performance improvements:
  - Space complexity: O(n) → O(1)
  - Time complexity: O(n) for both implementations

Added JMH benchmarks to demonstrate improvements and regressions. The primary regression is in the backward index case with a large index value. This is the uncommon case and maybe worth the tradeoff since the memory allocation in the common case is now significantly reduced.
@justahuman1 justahuman1 force-pushed the split-part-malloc-fix branch from b4272e2 to cb95a56 Compare March 7, 2026 04:16
@justahuman1
Copy link
Copy Markdown
Contributor Author

Done. I also added full test coverage which uncovered a behavior change with this new implementation, let me know your thoughts @Jackie-Jiang

Behavioral difference from old implementation

The new implementation differs from StringUtils.splitByWholeSeparator for empty input strings only. The old approach returns an empty array for "", treating it as having 0 fields.
The new implementation treats "" as having 1 field (the empty string itself), which is consistent with how non-empty strings without a matching delimiter are handled:

input delimiter index old impl new impl
"" "." 0 "null" ""
"" "." -1 "null" ""
"" "::" -1 "null" ""
"a" "." 0 "a" "a" (same)

I feel that the new implementation is more correct, rather than returning null. wdyt?

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes the splitPart scalar function in pinot-common to avoid allocating a full String[] split result, reducing memory pressure for high-throughput query workloads. It also adds a JMH benchmark and expands unit test coverage for additional edge cases.

Changes:

  • Reimplemented StringFunctions.splitPart(input, delimiter, index) using direct string traversal (including a specialized negative-index fast path for single-char delimiters).
  • Added JMH benchmarks to compare the new vs old splitPart implementations.
  • Added many new splitPart unit test cases covering edge scenarios (empty delimiter, consecutive delimiters, leading/trailing delimiters, multi-char delimiters, empty input).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/StringFunctions.java Replaces splitPart implementation with allocation-reduced traversal logic and adds a negative-index optimized helper.
pinot-common/src/test/java/org/apache/pinot/common/function/scalar/StringFunctionsTest.java Expands splitPart test matrix for more delimiter/index boundary cases.
pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkSplitPart.java Introduces a JMH benchmark to measure old vs new splitPart performance across scenarios.

Comment on lines +101 to +102
_delimiter = ":";
_input = buildString(50, repeatStr(_delimiter, 10));
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the multi_char_delim scenario, _delimiter is set to ":" but _input is built using repeatStr(_delimiter, 10) as the delimiter between fields. This means the benchmark isn't actually exercising a multi-character delimiter for splitPart (it passes a single-char delimiter that happens to appear consecutively), so the results can be misleading. Consider setting _delimiter to the repeated string (or use something like "::") and building _input with that same _delimiter.

Suggested change
_delimiter = ":";
_input = buildString(50, repeatStr(_delimiter, 10));
_delimiter = repeatStr(":", 10);
_input = buildString(50, _delimiter);

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is semantically the same. its more readable this way so I will address it but the logic is the same

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +69 to +97
case "large_index":
// Extract 80th element from 100-field string
_delimiter = ",";
_input = buildString(10000, _delimiter);
_index = 790;
break;
case "negative_index":
// Extract last element from 100-field string
_delimiter = ",";
_input = buildString(100, _delimiter);
_index = -1;
break;
case "large_negative_index":
// Extract last element from 100-field string
_delimiter = ",";
_input = buildString(10000, _delimiter);
_index = -7242;
break;
case "adjacent_delimiters":
// String with many consecutive delimiters
_input = "field0+++field1+++field2+++field3";
_delimiter = "+";
_index = 1;
break;
case "many_fields":
// Extract 5th element from 1000-field string
_delimiter = ",";
_input = buildString(10000, _delimiter);
_index = 4;
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several scenario comments don't match the actual parameters (e.g., large_index/many_fields mention 100/1000 fields but build 10,000; large_index says 80th but uses index 790). Align the comments (or the parameters) so the benchmark scenarios remain self-explanatory.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

@justahuman1 justahuman1 Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for this. updated the comments.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start += delimLen;
}
// optimization for negative index with single-char delimiter since common case
// multi-char delimiter with negative index can be handled in future since less common and more complex
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-index will overflow when index == Integer.MIN_VALUE, causing incorrect behavior for that edge case (and potentially passing a negative value into the helper). Add an explicit guard for index == Integer.MIN_VALUE (e.g., return "null" early) or handle the absolute-value conversion without overflow.

Suggested change
// multi-char delimiter with negative index can be handled in future since less common and more complex
// multi-char delimiter with negative index can be handled in future since less common and more complex
if (index == Integer.MIN_VALUE) {
return "null";
}

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh TIL. Done in 710c916

@justahuman1
Copy link
Copy Markdown
Contributor Author

Addressed all comment from copilot cc: @xiangfu0 @deemoliu

} else {
// compare with {@link BenchmarkSplitPart} for future changes
int delimLen = delimiter.length();
if (delimLen == 0) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a behavior change. When delimiter is empty, the current behavior splits on whitespace.

Can you also check the behavior on PostgreSQL?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! I checked the behavior for postgres;

When the delimiter is an empty string:

  • PostgreSQL: returns the entire input for field 1/-1, empty string for other fields
  • Pinot (old): splits on whitespace — this was a side effect of StringUtils.splitByWholeSeparator treating empty ("") the same as null

I've updated the implementation to preserve the existing whitespace-splitting behavior for the empty delimiter case by falling back to StringUtils.splitByWholeSeparator. The changes in this PR now only applies to non-empty delimiters, so there's no behavioral change.

If we want to align with PostgreSQL's empty-delimiter semantics, that can be a separate PR since it's a behavioral change independent of this performance optimization.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Could you please also add a test using random string generator and some simple delimiters to ensure the correctness of the new impl?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: 84c507d

@xiangfu0 xiangfu0 added the index Related to indexing (general) label Mar 20, 2026
@Jackie-Jiang Jackie-Jiang removed the index Related to indexing (general) label Mar 24, 2026
@xiangfu0 xiangfu0 requested a review from Copilot March 24, 2026 19:15
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.


// skip leading delimiters
int start = 0;
int len = input.length();
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For empty input (""), this implementation returns an empty string for index 0 and -1 (because start == len and substring(len) is ""). That differs from the existing splitPart(input, delimiter, limit, index) behavior (which returns "null" for the same cases with a large limit) and is likely an unintended semantic change for splitPart without limit. Consider adding an explicit empty-input guard (return "null") to keep the 3-arg and 4-arg splitPart variants consistent.

Suggested change
int len = input.length();
int len = input.length();
if (len == 0) {
// Align behavior with splitPart(input, delimiter, limit, index) for empty input
return "null";
}

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also addressed as part of Jackie's comment regarding behavior change. Thanks 👍

Comment on lines +143 to +148
{"", ".", 0, 100, "", "null"},
{"", ".", -1, 100, "", "null"},
{"", ".", -2, 100, "null", "null"},

// Empty input with multi-char delimiter and negative index
{"", "::", -1, 100, "", "null"},
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These new test cases assert that StringFunctions.splitPart("", ".", 0/-1) returns "" (empty string). The 4-arg splitPart(input, delimiter, limit, index) currently returns "null" for the same inputs with limit=100, so this bakes in a behavioral divergence between the overloads. If the intent is to preserve existing semantics, update these expectations to match the overload (and the prior implementation) rather than introducing a silent behavior change for empty input.

Suggested change
{"", ".", 0, 100, "", "null"},
{"", ".", -1, 100, "", "null"},
{"", ".", -2, 100, "null", "null"},
// Empty input with multi-char delimiter and negative index
{"", "::", -1, 100, "", "null"},
{"", ".", 0, 100, "null", "null"},
{"", ".", -1, 100, "null", "null"},
{"", ".", -2, 100, "null", "null"},
// Empty input with multi-char delimiter and negative index
{"", "::", -1, 100, "null", "null"},

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call! I was also doing this per #17660 (comment)

} else if (index < 0 && index >= -splitString.length) {
return splitString[splitString.length + index];
} else {
// compare with {@link BenchmarkSplitPart} for future changes
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The // compare with {@link BenchmarkSplitPart} note uses Javadoc {@link ...} syntax in a non-Javadoc comment and references a class in the perf module. Consider switching to plain text (or a URL) to avoid implying a resolvable Javadoc symbol and reduce cross-module coupling in production code comments.

Suggested change
// compare with {@link BenchmarkSplitPart} for future changes
// compare with BenchmarkSplitPart (perf module) for future changes

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

String input, char delimiter, int index, int len, int start) {
// input is empty or contains only delimiters
if (start == len) {
return index == 1 ? "" : "null";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? Why is it returning empty string?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question—

Even in the old implementation, splitByWholeSeparator("+++++", "+") returns [""] — a single-element array with one empty string (i.e. the trailing content after all delimiters are consumed, which is an empty char). So index=-1 maps to parts[0] = "". This is also why I found the StringUtils behavior confusing..

Reproducible in jshell:

❯   jshell --class-path pinot-server/target/pinot-server-pkg/lib/commons-lang3-3.20.0.jar

|  Welcome to JShell -- Version 17.0.5
|  For an introduction type: /help intro

jshell> import org.apache.commons.lang3.StringUtils;

jshell> StringUtils.splitByWholeSeparator("+++++", "+")
$2 ==> String[1] { "" }

Also see intelliJ test coverage where it shows the randomized test (which uses the older StringUtils version for the expected value) hitting this branch as well:

Image

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also do have test cases that hit both these branches:

{"+++++", "+", -1, 100, "", ""},

{"+++++", "+", -2, 100, "null", "null"},

Copy link
Copy Markdown
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job!

@Jackie-Jiang Jackie-Jiang merged commit 61009f6 into apache:master Mar 26, 2026
30 of 32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Improvement to existing functionality performance Related to performance optimization

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants