
Conversation

Contributor

@cocolato cocolato commented Feb 8, 2026

When the tier 2 optimizer can determine the type of the container being sliced, _BINARY_SLICE can be replaced with a specialized micro-op that bypasses the PySliceObject allocation entirely.

This PR implements scalar replacement of BINARY_SLICE by adding the new tier 2 ops _BINARY_SLICE_LIST, _BINARY_SLICE_TUPLE, and _BINARY_SLICE_UNICODE.
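The transformation relies on `x[a:b]` on these concrete types being equivalent to indexing with an explicit slice object; a minimal Python sketch of that equivalence (illustrative helper names, not the actual uop code):

```python
def generic_slice(container, start, stop):
    # Tier 1 path: build a slice object, then index with it.
    return container[slice(start, stop)]

def direct_slice(container, start, stop):
    # What the specialized uops do conceptually: slice directly,
    # with no intermediate slice object at this level.
    return container[start:stop]

for c in ([1, 2, 3, 4], (1, 2, 3, 4), "abcd"):
    for a, b in [(1, 3), (-3, -1), (0, 100)]:
        assert generic_slice(c, a, b) == direct_slice(c, a, b)
```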

Contributor Author

cocolato commented Feb 8, 2026

I think the test failure that occurred is unrelated to this PR.

Member

@Fidget-Spinner Fidget-Spinner left a comment

Pretty close, thanks!

@Fidget-Spinner
Member

You forgot to add the new recorder to the BINARY_SLICE macro op.

Member

@Fidget-Spinner Fidget-Spinner left a comment

LGTM! Just one comment, and please add a NEWS entry.

PySlice_AdjustIndices(PyUnicode_GET_LENGTH(container_o), &istart, &istop, 1);
res_o = PyUnicode_Substring(container_o, istart, istop);
}
DECREF_INPUTS();
Member

We should split this into the POP_TOP POP_TOP form; can you please do that in a follow-up PR?

@Fidget-Spinner
Member

Actually, I've changed my mind. Can you please open a PR to convert BINARY_SLICE to the POP_TOP form? Then, after we merge that PR, we can update this one.

Member

@Fidget-Spinner Fidget-Spinner left a comment

Let's leave the DECREF_INPUTS alone for now and try to remove it in a future PR.

The code has repeated instances that should be factored out into a micro-op.

Comment on lines 888 to 892
if (_PyEval_SliceIndex(PyStackRef_AsPyObjectBorrow(start), &istart) &&
_PyEval_SliceIndex(PyStackRef_AsPyObjectBorrow(stop), &istop)) {
PySlice_AdjustIndices(PyList_GET_SIZE(container_o), &istart, &istop, 1);
res_o = PyList_GetSlice(container_o, istart, istop);
}
Member

@Fidget-Spinner Fidget-Spinner Feb 8, 2026

This is repeated code, it should be factored out to a _UNPACK_INDICES micro-op that looks like this:

Suggested change
if (_PyEval_SliceIndex(PyStackRef_AsPyObjectBorrow(start), &istart) &&
_PyEval_SliceIndex(PyStackRef_AsPyObjectBorrow(stop), &istop)) {
PySlice_AdjustIndices(PyList_GET_SIZE(container_o), &istart, &istop, 1);
res_o = PyList_GetSlice(container_o, istart, istop);
}
int err = _PyEval_SliceIndex(PyStackRef_AsPyObjectBorrow(start), &istart);
ERROR_IF(err == 0);
err = _PyEval_SliceIndex(PyStackRef_AsPyObjectBorrow(stop), &istop);
ERROR_IF(err == 0);
...
istart = PyStackRef_Wrap(...)
...

@Fidget-Spinner
Member

This is pretty close. I'm going to push some changes in, then wait a day to see if Mark has any objections.

@cocolato
Contributor Author

cocolato commented Feb 8, 2026

Thanks for your guidance!

1. We cannot exit after unpacking indices, as the stack contains tagged ints at that point, which may lead to a crash. We must insert a guard for the type before unpacking the indices.
2. It is possible for an index not to fit in a tagged int, in which case we must deopt.
3. Recorded values do not always match the actual type we will see at runtime, so we must guard on that as well.

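Point 2 above can be sketched numerically; the exact tag layout here is an assumption (CPython's real check is PyStackRef_CanTagInt), but the idea is that one bit of the word is reserved for the tag:

```python
# Hypothetical 64-bit layout: one low bit reserved as the tag,
# leaving 63 signed bits for the payload.
TAG_BITS = 1
WORD_BITS = 64
TAG_INT_MAX = 2 ** (WORD_BITS - 1 - TAG_BITS) - 1

def can_tag_int(i):
    # Mirrors the spirit of PyStackRef_CanTagInt: the value must
    # fit in the reduced signed range.
    return -TAG_INT_MAX - 1 <= i <= TAG_INT_MAX

assert can_tag_int(10**9)       # ordinary indices fit
assert not can_tag_int(2**63)   # an arbitrary Python int need not
```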
@Fidget-Spinner
Member

@cocolato please check out the latest commit and the extended commit message for the bugs fixed. Thanks!

@Fidget-Spinner
Member

add_int.py

def testfunc(n):
    data = [1, 2, 3, 4, 5]
    a, b = 1, 3
    for _ in range(n):
        x = data[a:b]
    return x

testfunc(50000000)

Results using hyperfine:

Summary
  PYTHON_JIT=1 ./python ../add_int.py ran
    1.12 ± 0.02 times faster than PYTHON_JIT=0 ./python ../add_int.py

So we made it >10% faster. Nice!

@cocolato
Contributor Author

cocolato commented Feb 9, 2026

@cocolato please check out the latest commit and the extended commit message for the bugs fixed. Thanks!

LGTM! Thanks again for the fix, I learned a lot from it.

Results using hyperfine:

This is my first time learning about this tool, and it looks great. In the past, when I needed to run benchmarks locally, timeit was somewhat inaccurate, and pyperformance was a bit too heavy.
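For reference, a typical hyperfine invocation for this kind of A/B comparison (paths mirror the benchmark snippet above; flags are from hyperfine's CLI):

```shell
# Compare JIT on vs. off; --warmup discards initial runs so
# filesystem caches and CPU frequency scaling settle first.
hyperfine --warmup 3 \
  'PYTHON_JIT=1 ./python ../add_int.py' \
  'PYTHON_JIT=0 ./python ../add_int.py'
```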

@cocolato
Contributor Author

cocolato commented Feb 9, 2026

I've also noticed that when PYTHON_OPT_DEBUG=5 is set, DUMP_UOP currently only outputs the first ADDED uop. I'll fix this in another PR.

Member

@markshannon markshannon left a comment

Thanks for doing this.
This looks sound, but there are some inefficiencies.

Beware the size of uops. They will be converted to machine code, so keeping code size down is important for performance.

self.assertIn("_UNPACK_SEQUENCE_TWO_TUPLE", uops)
self.assertNotIn("_GUARD_TOS_TUPLE", uops)

def test_binary_slice_list(self):
Member

Can you add tests that ensure it deopts correctly? Something like:

def f(seq):
    return seq[0:1]
for _ in range(TIER2_THRESHOLD):
    f([1, 2])
f((1, 2))

type == &PyList_Type ||
type == &PyTuple_Type)
{
PyTypeObject *type = sym_get_probable_type(container);
Member

@markshannon markshannon Feb 9, 2026

Try getting the known type as well, and only add the type guard if the type is not known.

ERROR_NO_POP();
}
PySlice_AdjustIndices(len, &istart, &istop, 1);
DEOPT_IF(!PyStackRef_CanTagInt(istart));
Member

You can replace these with asserts.
Once the indices have been adjusted, they cannot be outside the range of tagged integers. We cannot allocate objects that large.
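The claim can be checked with a small Python model of PySlice_AdjustIndices for step == 1 (a sketch, not the C implementation): after adjustment, both indices always land in [0, len], so they are trivially taggable:

```python
def adjust_indices(length, start, stop):
    # Negative indices are offset by the length, then both values
    # are clamped into [0, length], as CPython does for step == 1.
    if start < 0:
        start = max(start + length, 0)
    else:
        start = min(start, length)
    if stop < 0:
        stop = max(stop + length, 0)
    else:
        stop = min(stop, length)
    return start, stop

assert adjust_indices(5, -100, 100) == (0, 5)
assert adjust_indices(5, 2**70, -1) == (5, 4)  # huge index clamps to len
```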

}

macro(BINARY_SLICE) = _SPECIALIZE_BINARY_SLICE + _BINARY_SLICE;
tier2 op(_UNPACK_INDICES, (container, start, stop -- container, sta, sto)) {
Member

This is quite bulky for jitted code. Could you factor out as much as possible into a helper function, so that this uop would look like:

tier2 op(_UNPACK_INDICES, (container, start, stop -- container, sta, sto)) {
    Py_ssize_t istart, istop;
    int err =  _PyEval_UnpackIndices(container, &istart, &istop);
    if (err == 0) {
        ERROR_NO_POP();
    }
    PyStackRef_CLOSE(stop);
    PyStackRef_CLOSE(start);
    sta = PyStackRef_TagInt(istart);
    sto = PyStackRef_TagInt(istop);

}
}

op(_UNPACK_INDICES, (container, start, stop -- container, sta, sto)) {
Member

start and stop may well be constants. Slices like [:n] and [n:] are quite common, so this can be optimized, popping the None and replacing it with tagged(0) for start, or tagged(MAX) for the end.

If both values are constant we can eliminate the _UNPACK_INDICES. If only one is constant, then we can replace _UNPACK_INDICES with a non-negative compact int guard and tagging.
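The None cases described above can be checked with slice.indices (a sketch of the intended constant folding, not the optimizer code):

```python
data = list(range(10))

# x[:n] leaves start as None, which behaves as 0 (tagged(0));
# x[n:] leaves stop as None, which behaves as the length
# (or any sufficiently large sentinel such as tagged(MAX)).
assert data[:4] == data[0:4]
assert data[4:] == data[4:len(data)]
assert slice(None, 4).indices(10) == (0, 4, 1)
assert slice(4, None).indices(10) == (4, 10, 1)
```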


op(_UNPACK_INDICES, (container, start, stop -- container, sta, sto)) {
(void)container;
sta = sym_new_compact_int(ctx);
Member

Suggested change
sta = sym_new_compact_int(ctx);
sta = sym_new_type(ctx, &PyLong_Type);

Compact ints may not be as large as indices.
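The distinction can be illustrated in Python (the 30-bit digit size is a CPython implementation detail and an assumption here):

```python
import sys

# In CPython, a "compact" int fits in a single 30-bit digit,
# while a valid sequence index may be as large as sys.maxsize.
COMPACT_MAX = 2**30 - 1
big_index = sys.maxsize - 1
assert big_index > COMPACT_MAX  # a legal index that is not compact
```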

op(_UNPACK_INDICES, (container, start, stop -- container, sta, sto)) {
(void)container;
sta = sym_new_compact_int(ctx);
sto = sym_new_compact_int(ctx);
Member

ditto

@bedevere-app

bedevere-app bot commented Feb 9, 2026

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.
