gh-144569: Scalar replacement of BINARY_SLICE for list, tuple, and unicode
#144590
Conversation
I think the test failure that occurred is unrelated to this PR.
Fidget-Spinner
left a comment
Pretty close, thanks!
You forgot to add the new recorder to the BINARY_SLICE macro op.
Fidget-Spinner
left a comment
LGTM; just one comment, and please add a NEWS entry.
Python/bytecodes.c

```c
    PySlice_AdjustIndices(PyUnicode_GET_LENGTH(container_o), &istart, &istop, 1);
    res_o = PyUnicode_Substring(container_o, istart, istop);
}
DECREF_INPUTS();
```
We should split this into the POP_TOP POP_TOP form; can you please do that in a follow-up PR?

Actually, I've changed my mind. Can you please open a PR to convert BINARY_SLICE to the POP_TOP form? Then, after we merge that PR, we can update this one.
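For context, the "POP_TOP form" means the uop no longer frees its own inputs via DECREF_INPUTS(); separate pop uops consume them instead, so the tier 2 optimizer can reason about, and often eliminate, each pop on its own. A loose before/after sketch (the stack effects and exact spelling here are assumptions, not the merged code):

```c
/* Before: the uop consumes all three of its inputs itself. */
op(_BINARY_SLICE, (container, start, stop -- res)) {
    ...               // compute res_o from container, start, and stop
    DECREF_INPUTS();  // closes container, start, and stop in one opaque step
    res = PyStackRef_FromPyObjectSteal(res_o);
}

/* After (sketch): the uop only produces the result, and the enclosing
 * macro appends explicit pop uops to consume the leftover inputs, which
 * the optimizer can then remove when it proves them redundant. */
```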
Fidget-Spinner
left a comment
Let's leave the DECREF_INPUTS alone for now and try to remove it in a future PR.
The code has repeated blocks that should be factored out into a micro-op.
Python/bytecodes.c

```c
if (_PyEval_SliceIndex(PyStackRef_AsPyObjectBorrow(start), &istart) &&
    _PyEval_SliceIndex(PyStackRef_AsPyObjectBorrow(stop), &istop)) {
    PySlice_AdjustIndices(PyList_GET_SIZE(container_o), &istart, &istop, 1);
    res_o = PyList_GetSlice(container_o, istart, istop);
}
```
This is repeated code; it should be factored out into a _UNPACK_INDICES micro-op that looks like this:
Suggested change, from:

```c
if (_PyEval_SliceIndex(PyStackRef_AsPyObjectBorrow(start), &istart) &&
    _PyEval_SliceIndex(PyStackRef_AsPyObjectBorrow(stop), &istop)) {
    PySlice_AdjustIndices(PyList_GET_SIZE(container_o), &istart, &istop, 1);
    res_o = PyList_GetSlice(container_o, istart, istop);
}
```

to:

```c
int err = _PyEval_SliceIndex(PyStackRef_AsPyObjectBorrow(start), &istart);
ERROR_IF(err == 0);
err = _PyEval_SliceIndex(PyStackRef_AsPyObjectBorrow(stop), &istop);
ERROR_IF(err == 0);
...
istart = PyStackRef_Wrap(...)
...
```
Misc/NEWS.d/next/Core_and_Builtins/2026-02-08-13-14-00.gh-issue-144569.pjlJVe.rst (resolved)
This is pretty close. I'm going to push some changes in, then wait a day to see if Mark has any objections.
Thanks for your guidance!
1. We cannot exit after unpacking the indices, as the stack now contains tagged ints, which may lead to a crash. We must insert a type guard before unpacking the indices.
2. An index may not fit in a tagged int, in which case we must deopt.
3. A recorded value does not guarantee the type we will actually see at runtime, so we must guard on that as well.
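Points 1 and 2 constrain the shape of the emitted trace. A rough outline for the list case; the guard step below is illustrative, while the DEOPT_IF line matches the check quoted later in this review:

```c
// Required uop order for a specialized `lst[a:b]` (point 1):
//
//   <type guard on the container>   // must run before unpacking: once the
//                                   // indices are tagged ints, a bare side
//                                   // exit could crash on the unusual stack
//   _UNPACK_INDICES                 // converts start/stop to tagged ints
//   _BINARY_SLICE_LIST              // the scalar-replaced slice
//
// Inside _UNPACK_INDICES, point 2 becomes a deopt check:
DEOPT_IF(!PyStackRef_CanTagInt(istart));  // index does not fit in a tagged int
```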
@cocolato, please check out the latest commit and the extended commit message for the bugs fixed. Thanks!
Results using hyperfine show we made it >10% faster. Nice!
LGTM! Thanks again for the fix; I learned a lot from it.
This is my first time learning about this tool, and it looks great. In the past, when I needed to run benchmarks locally, ...

At the same time, I've noticed that when ...
markshannon
left a comment
Thanks for doing this.
This looks sound, but there are some inefficiencies.
Beware the size of uops. They will be converted to machine code, so keeping code size down is important for performance.
```python
        self.assertIn("_UNPACK_SEQUENCE_TWO_TUPLE", uops)
        self.assertNotIn("_GUARD_TOS_TUPLE", uops)

    def test_binary_slice_list(self):
```
Can you add tests that ensure that it deopts correctly?
Something like:

```python
def f(seq):
    return seq[0:1]

for _ in range(TIER2_THRESHOLD):
    f([1, 2])
f((1, 2))  # a different container type: should deopt and still return (1,)
```
```c
        type == &PyList_Type ||
        type == &PyTuple_Type)
    {
    PyTypeObject *type = sym_get_probable_type(container);
```
Try getting the known type as well, and only add the type guard if the type is not known.
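A minimal sketch of that suggestion; sym_get_type is the optimizer's existing helper for an exactly known type, sym_get_probable_type is the recorder-based helper from this PR, and the surrounding control flow is an assumption:

```c
PyTypeObject *type = sym_get_type(container);  // exactly known at trace time?
bool guard_needed = (type == NULL);
if (guard_needed) {
    // Fall back to the recorded type. A recording is only a hint, so a
    // type guard must be emitted before specializing on it.
    type = sym_get_probable_type(container);
}
```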
```c
        ERROR_NO_POP();
    }
    PySlice_AdjustIndices(len, &istart, &istop, 1);
    DEOPT_IF(!PyStackRef_CanTagInt(istart));
```
You can replace these with asserts.
Once the indices have been adjusted, they cannot be outside the range of tagged integers. We cannot allocate objects that large.
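A minimal illustration of that change: PySlice_AdjustIndices clamps both indices into [0, len], and len is bounded by addressable memory, so after adjustment the checks can never fail:

```c
PySlice_AdjustIndices(len, &istart, &istop, 1);
assert(PyStackRef_CanTagInt(istart));  // adjusted indices are <= len, which
assert(PyStackRef_CanTagInt(istop));   // always fits in a tagged int
```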
```c
}

macro(BINARY_SLICE) = _SPECIALIZE_BINARY_SLICE + _BINARY_SLICE;

tier2 op(_UNPACK_INDICES, (container, start, stop -- container, sta, sto)) {
```
This is quite bulky for jitted code. Could you factor out as much as possible into a helper function, so that this uop would look like:

```c
tier2 op(_UNPACK_INDICES, (container, start, stop -- container, sta, sto)) {
    Py_ssize_t istart, istop;
    int err = _PyEval_UnpackIndices(container, &istart, &istop);
    if (err == 0) {
        ERROR_NO_POP();
    }
    PyStackRef_CLOSE(stop);
    PyStackRef_CLOSE(start);
    sta = PyStackRef_TagInt(istart);
    sto = PyStackRef_TagInt(istop);
}
```
```c
    }
}

op(_UNPACK_INDICES, (container, start, stop -- container, sta, sto)) {
```
start and stop may well be constants. Slices like [:n] and [n:] are quite common, so this can be optimized by popping the None and replacing it with tagged(0) for the start, or tagged(MAX) for the end.
If both values are constant, we can eliminate the _UNPACK_INDICES entirely. If only one is constant, we can replace _UNPACK_INDICES with a non-negative compact int guard and tagging.
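A rough sketch of that idea in the abstract interpreter, assuming the optimizer's existing sym_is_const helper; the actual rewrite mechanics are left as comments because they are not shown in this thread:

```c
// In the optimizer's case for _UNPACK_INDICES:
bool start_known = sym_is_const(ctx, start);
bool stop_known = sym_is_const(ctx, stop);
if (start_known && stop_known) {
    // Both bounds known at trace time (including None, meaning 0 or MAX):
    // eliminate _UNPACK_INDICES and push two tagged-int constants instead.
}
else if (start_known || stop_known) {
    // One bound known (the common s[:n] and s[n:] shapes): replace
    // _UNPACK_INDICES with a non-negative compact-int guard plus tagging
    // for the dynamic bound, and a tagged constant for the known one.
}
```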
```c
op(_UNPACK_INDICES, (container, start, stop -- container, sta, sto)) {
    (void)container;
    sta = sym_new_compact_int(ctx);
```
Suggested change:

```diff
-    sta = sym_new_compact_int(ctx);
+    sta = sym_new_type(ctx, &PyLong_Type);
```

Compact ints may not be as large as indices.
```c
op(_UNPACK_INDICES, (container, start, stop -- container, sta, sto)) {
    (void)container;
    sta = sym_new_compact_int(ctx);
    sto = sym_new_compact_int(ctx);
```
ditto
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers, that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase `I have made the requested changes; please review again`.
When the tier 2 optimizer can determine the type of the container being sliced, _BINARY_SLICE can be replaced with a specialized micro-op that bypasses the PySliceObject allocation entirely. This PR implements scalar replacement of BINARY_SLICE by adding the new tier 2 ops _BINARY_SLICE_LIST, _BINARY_SLICE_TUPLE, and _BINARY_SLICE_UNICODE.
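Pieced together from the snippets quoted in the review, the list specialization plausibly ends up looking something like this. It is a sketch, not the merged code: the exact uop declaration, reference-count plumbing, and the PyStackRef_UntagInt usage are assumptions.

```c
tier2 op(_BINARY_SLICE_LIST, (container, sta, sto -- res)) {
    PyObject *container_o = PyStackRef_AsPyObjectBorrow(container);
    assert(PyList_CheckExact(container_o));        // ensured by the type guard
    Py_ssize_t istart = PyStackRef_UntagInt(sta);  // indices were adjusted and
    Py_ssize_t istop = PyStackRef_UntagInt(sto);   // tagged by _UNPACK_INDICES
    PyObject *res_o = PyList_GetSlice(container_o, istart, istop);
    DECREF_INPUTS();
    ERROR_IF(res_o == NULL);
    res = PyStackRef_FromPyObjectSteal(res_o);
}
```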