- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 33.3k
          gh-139871: Add bytearray.take_bytes([n]) to efficiently extract bytes
          #140128
        
          New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This sets up so the bytes can be "taken" as a byes object without requiring a copy. I ran pyperformance (results below) and don't see any major speedups or slowdowns with this; all seems to be in the noise of my machine. ------ pyperformance compare main.json bytearray_bytes.json -O table main.json ========= Performance version: 1.11.0 Report on Linux-6.17.1-arch1-1-x86_64-with-glibc2.42 Number of logical CPUs: 32 Start date: 2025-10-14 00:55:52.519236 End date: 2025-10-14 02:23:01.308400 bytearray_bytes.json ==================== Performance version: 1.11.0 Report on Linux-6.17.1-arch1-1-x86_64-with-glibc2.42 Number of logical CPUs: 32 Start date: 2025-10-13 23:22:29.928152 End date: 2025-10-14 00:49:34.467284 +----------------------------------+-----------+----------------------+--------------+------------------------+ | Benchmark | main.json | bytearray_bytes.json | Change | Significance | +==================================+===========+======================+==============+========================+ | 2to3 | 137 ms | 136 ms | 1.00x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_generators | 193 ms | 195 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_cpu_io_mixed | 285 ms | 286 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_cpu_io_mixed_tg | 289 ms | 290 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_eager | 50.4 ms | 51.5 ms | 1.02x slower | Significant (t=-10.40) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_eager_cpu_io_mixed | 223 ms | 225 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_eager_cpu_io_mixed_tg | 263 ms | 264 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_eager_io | 370 ms | 372 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_eager_io_tg | 380 ms | 384 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_eager_memoization | 125 ms | 126 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_eager_memoization_tg | 161 ms | 162 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_eager_tg | 125 ms | 125 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_io | 366 ms | 360 ms | 1.02x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_io_tg | 359 ms | 361 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_memoization | 177 ms | 181 ms | 1.02x slower | Significant (t=-9.20) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_memoization_tg | 188 ms | 189 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_none | 151 ms | 151 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_none_tg | 150 ms | 151 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | asyncio_tcp | 182 ms | 161 ms | 1.13x faster | Significant (t=32.85) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | asyncio_tcp_ssl | 548 ms | 553 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | asyncio_websockets | 342 ms | 339 ms | 1.01x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | bench_mp_pool | 7.12 ms | 7.08 ms | 1.01x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | bench_thread_pool | 818 us | 819 us | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | bpe_tokeniser | 2.10 sec | 2.09 sec | 1.00x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | chaos | 27.9 ms | 28.0 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | comprehensions | 7.45 us | 7.24 us | 1.03x faster | Significant (t=3.27) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | connected_components | 308 ms | 309 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | coroutines | 11.1 ms | 11.2 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | coverage | 33.6 ms | 34.1 ms | 1.02x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | create_gc_cycles | 1.16 ms | 1.16 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | crypto_pyaes | 37.1 ms | 35.6 ms | 1.04x faster | Significant (t=10.63) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | dask | 347 ms | 351 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | deepcopy | 118 us | 117 us | 1.00x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | deepcopy_memo | 12.8 us | 12.7 us | 1.00x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | deepcopy_reduce | 1.32 us | 1.34 us | 1.02x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | deltablue | 1.65 ms | 1.64 ms | 1.01x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | django_template | 17.9 ms | 17.8 ms | 1.00x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | docutils | 1.19 sec | 1.20 sec | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | dulwich_log | 19.5 ms | 19.7 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | fannkuch | 184 ms | 181 ms | 1.02x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | float | 37.1 ms | 36.7 ms | 1.01x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | gc_traversal | 3.04 ms | 2.84 ms | 1.07x faster | Significant (t=19.48) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | generators | 15.9 ms | 15.3 ms | 1.04x faster | Significant (t=7.03) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | genshi_text | 11.3 ms | 11.2 ms | 1.01x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | genshi_xml | 25.5 ms | 25.5 ms | 1.00x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | go | 57.6 ms | 56.7 ms | 1.02x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | hexiom | 2.92 ms | 2.88 ms | 1.02x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | html5lib | 26.0 ms | 26.5 ms | 1.02x slower | Significant (t=-9.20) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | json_dumps | 4.48 ms | 4.44 ms | 1.01x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | json_loads | 11.7 us | 11.7 us | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | k_core | 1.41 sec | 1.42 sec | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | logging_format | 3.27 us | 3.30 us | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | logging_silent | 45.5 ns | 45.8 ns | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | logging_simple | 3.02 us | 3.01 us | 1.00x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | mako | 6.02 ms | 6.03 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | many_optionals | 473 us | 478 us | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | mdp | 587 ms | 578 ms | 1.02x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | meteor_contest | 50.2 ms | 50.5 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | nbody | 54.6 ms | 52.4 ms | 1.04x faster | Significant (t=10.72) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | nqueens | 41.7 ms | 40.4 ms | 1.03x faster | Significant (t=6.79) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | pathlib | 9.77 ms | 9.73 ms | 1.00x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | pickle | 5.99 us | 6.01 us | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | pickle_dict | 12.5 us | 12.8 us | 1.02x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | pickle_list | 1.98 us | 1.96 us | 1.01x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | pickle_pure_python | 149 us | 150 us | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | pidigits | 111 ms | 115 ms | 1.03x slower | Significant (t=-18.53) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | pprint_pformat | 737 ms | 748 ms | 1.02x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | pprint_safe_repr | 362 ms | 369 ms | 1.02x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | pyflate | 211 ms | 205 ms | 1.03x faster | Significant (t=7.43) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | python_startup | 7.88 ms | 7.88 ms | 1.00x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | python_startup_no_site | 4.72 ms | 4.76 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | raytrace | 130 ms | 128 ms | 1.02x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | regex_compile | 50.0 ms | 50.2 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | regex_dna | 101 ms | 103 ms | 1.02x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | regex_effbot | 1.72 ms | 1.77 ms | 1.03x slower | Significant (t=-26.42) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | regex_v8 | 12.5 ms | 12.3 ms | 1.02x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | richards | 20.4 ms | 20.0 ms | 1.02x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | richards_super | 23.4 ms | 22.8 ms | 1.03x faster | Significant (t=11.36) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | scimark_fft | 154 ms | 153 ms | 1.00x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | scimark_lu | 55.4 ms | 57.0 ms | 1.03x slower | Significant (t=-5.67) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | scimark_monte_carlo | 32.8 ms | 32.8 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | scimark_sor | 57.8 ms | 56.9 ms | 1.02x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | scimark_sparse_mat_mult | 2.75 ms | 2.76 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | shortest_path | 316 ms | 318 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | spectral_norm | 47.7 ms | 51.6 ms | 1.08x slower | Significant (t=-2.01) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | sphinx | 465 ms | 467 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | sqlglot_v2_normalize | 50.3 ms | 50.2 ms | 1.00x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | sqlglot_v2_optimize | 24.2 ms | 24.4 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | sqlglot_v2_parse | 576 us | 572 us | 1.01x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | sqlglot_v2_transpile | 724 us | 722 us | 1.00x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | sqlite_synth | 1.14 us | 1.15 us | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | subparsers | 20.6 ms | 20.7 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | sympy_expand | 181 ms | 184 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | sympy_integrate | 8.54 ms | 8.55 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | sympy_str | 103 ms | 105 ms | 1.02x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | sympy_sum | 55.9 ms | 56.0 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | telco | 3.39 ms | 3.34 ms | 1.01x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | tomli_loads | 971 ms | 982 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | typing_runtime_protocols | 73.2 us | 73.6 us | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | unpack_sequence | 25.2 ns | 23.0 ns | 1.10x faster | Significant (t=7.03) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | unpickle | 6.99 us | 7.05 us | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | unpickle_list | 2.07 us | 2.10 us | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | unpickle_pure_python | 105 us | 104 us | 1.01x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | xml_etree_generate | 40.5 ms | 40.7 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | xml_etree_iterparse | 49.7 ms | 50.4 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | xml_etree_parse | 77.2 ms | 79.1 ms | 1.02x slower | Significant (t=-16.14) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | xml_etree_process | 29.5 ms | 29.8 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+
bytearray.take_bytes([n]) to efficiently extract bytes
      bytearray.take_bytes([n]) to efficiently extract bytesbytearray.take_bytes([n]) to efficiently extract bytes
              
          
                Misc/NEWS.d/next/Core_and_Builtins/2025-10-14-18-24-16.gh-issue-139871.SWtuUz.rst
              
                Outdated
          
            Show resolved
            Hide resolved
        
      Co-authored-by: Victor Stinner <[email protected]>
| Threading tests found a non-threading issue that after this change  | 
Co-authored-by: Maurycy Pawłowski-Wieroński <[email protected]>
… which is resulting in threading failure
Co-authored-by: Maurycy Pawłowski-Wieroński <[email protected]>
        
          
                Objects/bytearrayobject.c
              
                Outdated
          
        
      | return PyLong_FromSsize_t(FT_ATOMIC_LOAD_SSIZE_RELAXED(self->ob_alloc)); | ||
| Py_ssize_t alloc = FT_ATOMIC_LOAD_SSIZE_RELAXED(self->ob_alloc); | ||
| if (alloc > 0) { | ||
| alloc += sizeof(PyBytesObject); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding in the size of PyBytesObject here (and in sizeof) because ob_alloc is expected by code to be the number of bytes of space available (vs ob_size, the number of bytes in use). Felt more straightforward to me to leave ob_alloc and ob_size definitions as they were and rather add in to the size reporting here.
| @vstinner I think this is ready for another pass; I left github comments around some places I am unsure the CPython standard way to do as well as ones where I'm not sure what the right decision is | 
| With a little more tweaking can rely on  | 
| Microbenchmark: import pyperf
CHUNK_A = b'a' * 1_000
CHUNK_B = b'b' * 100
CHUNK_C = b'c' * 10_000
def build_ba():
    ba = bytearray()
    ba += CHUNK_A
    ba += CHUNK_B
    ba += CHUNK_C
if hasattr(bytearray, 'take_bytes'):
    def take_bytes():
        ba = bytearray()
        ba += CHUNK_A
        ba += CHUNK_B
        ba += CHUNK_C
        return ba.take_bytes()
else:
    def take_bytes():
        ba = bytearray()
        ba += CHUNK_A
        ba += CHUNK_B
        ba += CHUNK_C
        return bytes(ba)
runner = pyperf.Runner()
runner.bench_func('build', build_ba)
runner.bench_func('take_bytes', take_bytes)Results: 
 
 | 
Co-authored-by: Victor Stinner <[email protected]>
De-duplicate the code to set `ob_bytes`, `ob_start`, `ob_alloc` and `ob_size`; rely in resize on ob_start being always set.
| For the micro case there are some optimizations can do in future PRs: For a more macro benchmark, updating  
 | 
| I wrote #140770 to add  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, congrats! ✨️
I would prefer to have a second core dev reviewing a change which changes a core structure of CPython: PyByteArrayObject.
@encukou @serhiy-storchaka: Would you be interested to review this change?
|  | ||
| Taking all bytes is a zero-copy operation. | ||
|  | ||
| .. list-table:: Suggested Replacements | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to put porting notes better in What's New, rather than in general documentation? In a few years, take_bytes won't be “new”.
The versionadded note could link to the 3.15 What's New.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
End state to me is these (hopefully) get incorporated into flake8 / ruff as code improvement suggestions rather than more things to memorize. To that end maybe just bytes(bytearray()) -> .take_bytes() in what's new would be enough and rely on external pieces to socialize more (ex. I'm planning to try and teach this a bit through blogs/talks if it ships).
I slightly prefer on bytearray so someone hand-optimizing bytearray code and browsing the page for alternatives will find them. The less efficient patterns are stable Python APIs to me so it's similar to https://docs.python.org/3/library/pathlib.html#corresponding-tools and applies both in 3.15 specifically and forseeable future versions.
        
          
                Lib/test/test_bytes.py
              
                Outdated
          
        
      | c = a.zfill(0x400000) | ||
| assert not c or c[-1] not in (0xdd, 0xcd) | ||
|  | ||
| def take_bytes(b, a): | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other modifying functions have a # MODIFIES! comment.
| check([clear] + [splitlines] * 10, bytearray(b'\n' * 0x400)) | ||
| check([clear] + [startswith] * 10) | ||
| check([clear] + [strip] * 10) | ||
| check([clear] + [take_bytes] * 10) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be interesting to call a bunch of take_bytes with an argument, without clear, and check the end result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a take_bytes_n that starts with a repeating 10-byte pattern and checks either get a. IndexError / out of bounds, b. the expected byte pattern exactly
| /* Object layout */ | ||
| typedef struct { | ||
| PyObject_VAR_HEAD | ||
| Py_ssize_t ob_alloc; /* How many bytes allocated in ob_bytes */ | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the same as Py_SIZE(ob_bytes_object)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes; currently reading it outside of locks always uses FT_ATOMIC_LOAD_SSIZE_RELAXED while Py_SIZE doesn't so not sure if can swap the two. In earlier iteration on this idea there were also ABI concerns about adding/removing members anywhere other than end of struct (can potentially deprecate the member likely but not remove it)
| memmove(obj->ob_bytes, obj->ob_start, | ||
| Py_MIN(requested_size, Py_SIZE(self))); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you consider creating a new bytes object here, copying, and decrefing the old -- similar to what the old code does?
It seems that an extra memmove would defeat any benefits of _PyBytes_Resize's optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change from a new bytes (malloc + memcpy) to memmove + realloc I think resulted in this pyperformance shift (asyncio streams utilize bytearray fast front deletion):
| asyncio_tcp                      | 182 ms    | 161 ms               | 1.13x faster | Significant (t=32.85)  |
That 20ms in delta is similar to the speedup from changing asyncio.streams to use .take_bytes() removing one more copy (see: #140128 (comment))
If the offset is really large realloc internally may decide to do a alloc + memcpy meaning this path would be memmove + alloc + memcpy which will be slower than just doing a new bytes up front.
Can add a special case around pymalloc's rules on this if wanted:
Lines 2669 to 2681 in abd19ed
| /* The block is staying the same or shrinking. | |
| If it's shrinking, there's a tradeoff: it costs cycles to copy the | |
| block to a smaller size class, but it wastes memory not to copy it. | |
| The compromise here is to copy on shrink only if at least 25% of | |
| size can be shaved off. */ | |
| if (4 * nbytes > 3 * size) { | |
| /* It's the same, or shrinking and new/old > 3/4. */ | |
| *newptr_p = p; | |
| return 1; | |
| } | |
| size = nbytes; | 
| PyObject *it; | ||
| PyObject *(*iternext)(PyObject *); | ||
|  | ||
| /* First __init__; set ob_bytes_object so ob_bytes is always non-null. */ | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__init__ can be called several times from Python code, but also skipped:
>>> ba = bytearray.__new__(bytearray)
>>> ba.append(123)
Invariants like this need to be introduced in tp_init.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the tp_init I think, in PyByteArray_Type:
    bytearray___init__,                 /* tp_init */
I keep debating if it would be better in a tp_new (currently PyType_GenericNew)
Co-authored-by: Petr Viktorin <[email protected]>
Validates take_byes(10) always either is past the end of the input or gets exactly one run of 10 bytes at a 10 byte offset.
|  | ||
| // Copy remaining bytes to a new bytes. | ||
| Py_ssize_t remaining_length = size - to_take; | ||
| PyObject *remaining = PyBytes_FromStringAndSize(self->ob_start + to_take, | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential optimization: Given bytearray is often used as a mutbale "holding" place while gathering bytes it's possible code will take_bytes() a very small portion of the buffer (ex. 10 bytes out of a 4096 byte buffer); if that's the case it would be more efficient to copy out the beginning of the buffer into a new bytes and advance ob_start by the 10 bytes.  Exchanges a 4086 byte new alloc + 4086 byte memcpy + 10 byte memmove + resize shrinking 4086 bytes (probably is a new allocation) for a 10 byte new alloc + 10 byte memcpy + modifying a pointer
Update
bytearrayto contain abytesand provide a zero-copy path to "extract" thebytes. This allows making several code paths more efficient.This does not move any codepaths to make use of this new API. The documentation changes include common code patterns which can be made more efficient with this API.
When just changing
bytearrayto containbytesI ran pyperformance on a--with-lto --enable-optimizations --with-static-libpythonbuild (results below) and don't see any major speedups or slowdowns with this; all seems to be in the noise of my machine (Generally changes under 5% or benchmarks that don't touch bytes/bytearray).pyperformance compare main.json bytearray_bytes.json
main.json
Performance version: 1.11.0
Report on Linux-6.17.1-arch1-1-x86_64-with-glibc2.42
Number of logical CPUs: 32
Start date: 2025-10-14 00:55:52.519236
End date: 2025-10-14 02:23:01.308400
bytearray_bytes.json
Performance version: 1.11.0
Report on Linux-6.17.1-arch1-1-x86_64-with-glibc2.42
Number of logical CPUs: 32
Start date: 2025-10-13 23:22:29.928152
End date: 2025-10-14 00:49:34.467284
.take_bytes([n])a zero-copy path frombytearraytobytes#139871