@cmaloney commented Oct 14, 2025

Update bytearray to store its data in a bytes object and provide a zero-copy path to "extract" that bytes object. This makes it possible to make several code paths more efficient.

This PR does not move any existing code paths over to the new API. The documentation changes include common code patterns that can be made more efficient with it.
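As a rough illustration of the kind of pattern this is aimed at, here is a minimal sketch of "build up data in a bytearray, then hand it off as bytes". The method name take_bytes is an assumption on my part (the description above does not name the new API), and drain is a hypothetical helper, so the sketch looks the method up dynamically and falls back to the copying approach when it is not available.

```python
def drain(buf: bytearray) -> bytes:
    """Return the contents of buf as bytes and leave buf empty.

    `drain` is a hypothetical helper for illustration only.
    """
    # Assumed name for the zero-copy API; not present on older interpreters.
    take = getattr(buf, "take_bytes", None)
    if take is not None:
        # With no argument this is expected to return all bytes and clear buf.
        return take()
    # Fallback: the common pattern today, which copies the whole buffer.
    data = bytes(buf)
    buf.clear()
    return data


buf = bytearray()
for chunk in (b"GET / ", b"HTTP/1.1", b"\r\n"):
    buf += chunk  # accumulate pieces in the mutable buffer

result = drain(buf)
print(result, len(buf))
```

The fallback branch is where the extra copy happens today: bytes(buf) duplicates the data and buf.clear() then discards the original.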


With only the change that makes bytearray contain a bytes, I ran pyperformance on a --with-lto --enable-optimizations --with-static-libpython build (results below) and saw no major speedups or slowdowns; everything appears to be within the noise of my machine (changes are generally under 5%, or occur in benchmarks that don't touch bytes/bytearray).

pyperformance compare main.json bytearray_bytes.json

main.json

Performance version: 1.11.0
Report on Linux-6.17.1-arch1-1-x86_64-with-glibc2.42
Number of logical CPUs: 32
Start date: 2025-10-14 00:55:52.519236
End date: 2025-10-14 02:23:01.308400

bytearray_bytes.json

Performance version: 1.11.0
Report on Linux-6.17.1-arch1-1-x86_64-with-glibc2.42
Number of logical CPUs: 32
Start date: 2025-10-13 23:22:29.928152
End date: 2025-10-14 00:49:34.467284

+----------------------------------+-----------+----------------------+--------------+------------------------+
| Benchmark                        | main.json | bytearray_bytes.json | Change       | Significance           |
+==================================+===========+======================+==============+========================+
| 2to3                             | 137 ms    | 136 ms               | 1.00x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| async_generators                 | 193 ms    | 195 ms               | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| async_tree_cpu_io_mixed          | 285 ms    | 286 ms               | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| async_tree_cpu_io_mixed_tg       | 289 ms    | 290 ms               | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| async_tree_eager                 | 50.4 ms   | 51.5 ms              | 1.02x slower | Significant (t=-10.40) |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| async_tree_eager_cpu_io_mixed    | 223 ms    | 225 ms               | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| async_tree_eager_cpu_io_mixed_tg | 263 ms    | 264 ms               | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| async_tree_eager_io              | 370 ms    | 372 ms               | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| async_tree_eager_io_tg           | 380 ms    | 384 ms               | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| async_tree_eager_memoization     | 125 ms    | 126 ms               | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| async_tree_eager_memoization_tg  | 161 ms    | 162 ms               | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| async_tree_eager_tg              | 125 ms    | 125 ms               | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| async_tree_io                    | 366 ms    | 360 ms               | 1.02x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| async_tree_io_tg                 | 359 ms    | 361 ms               | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| async_tree_memoization           | 177 ms    | 181 ms               | 1.02x slower | Significant (t=-9.20)  |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| async_tree_memoization_tg        | 188 ms    | 189 ms               | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| async_tree_none                  | 151 ms    | 151 ms               | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| async_tree_none_tg               | 150 ms    | 151 ms               | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| asyncio_tcp                      | 182 ms    | 161 ms               | 1.13x faster | Significant (t=32.85)  |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| asyncio_tcp_ssl                  | 548 ms    | 553 ms               | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| asyncio_websockets               | 342 ms    | 339 ms               | 1.01x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| bench_mp_pool                    | 7.12 ms   | 7.08 ms              | 1.01x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| bench_thread_pool                | 818 us    | 819 us               | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| bpe_tokeniser                    | 2.10 sec  | 2.09 sec             | 1.00x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| chaos                            | 27.9 ms   | 28.0 ms              | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| comprehensions                   | 7.45 us   | 7.24 us              | 1.03x faster | Significant (t=3.27)   |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| connected_components             | 308 ms    | 309 ms               | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| coroutines                       | 11.1 ms   | 11.2 ms              | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| coverage                         | 33.6 ms   | 34.1 ms              | 1.02x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| create_gc_cycles                 | 1.16 ms   | 1.16 ms              | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| crypto_pyaes                     | 37.1 ms   | 35.6 ms              | 1.04x faster | Significant (t=10.63)  |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| dask                             | 347 ms    | 351 ms               | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| deepcopy                         | 118 us    | 117 us               | 1.00x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| deepcopy_memo                    | 12.8 us   | 12.7 us              | 1.00x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| deepcopy_reduce                  | 1.32 us   | 1.34 us              | 1.02x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| deltablue                        | 1.65 ms   | 1.64 ms              | 1.01x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| django_template                  | 17.9 ms   | 17.8 ms              | 1.00x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| docutils                         | 1.19 sec  | 1.20 sec             | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| dulwich_log                      | 19.5 ms   | 19.7 ms              | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| fannkuch                         | 184 ms    | 181 ms               | 1.02x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| float                            | 37.1 ms   | 36.7 ms              | 1.01x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| gc_traversal                     | 3.04 ms   | 2.84 ms              | 1.07x faster | Significant (t=19.48)  |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| generators                       | 15.9 ms   | 15.3 ms              | 1.04x faster | Significant (t=7.03)   |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| genshi_text                      | 11.3 ms   | 11.2 ms              | 1.01x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| genshi_xml                       | 25.5 ms   | 25.5 ms              | 1.00x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| go                               | 57.6 ms   | 56.7 ms              | 1.02x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| hexiom                           | 2.92 ms   | 2.88 ms              | 1.02x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| html5lib                         | 26.0 ms   | 26.5 ms              | 1.02x slower | Significant (t=-9.20)  |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| json_dumps                       | 4.48 ms   | 4.44 ms              | 1.01x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| json_loads                       | 11.7 us   | 11.7 us              | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| k_core                           | 1.41 sec  | 1.42 sec             | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| logging_format                   | 3.27 us   | 3.30 us              | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| logging_silent                   | 45.5 ns   | 45.8 ns              | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| logging_simple                   | 3.02 us   | 3.01 us              | 1.00x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| mako                             | 6.02 ms   | 6.03 ms              | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| many_optionals                   | 473 us    | 478 us               | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| mdp                              | 587 ms    | 578 ms               | 1.02x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| meteor_contest                   | 50.2 ms   | 50.5 ms              | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| nbody                            | 54.6 ms   | 52.4 ms              | 1.04x faster | Significant (t=10.72)  |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| nqueens                          | 41.7 ms   | 40.4 ms              | 1.03x faster | Significant (t=6.79)   |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| pathlib                          | 9.77 ms   | 9.73 ms              | 1.00x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| pickle                           | 5.99 us   | 6.01 us              | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| pickle_dict                      | 12.5 us   | 12.8 us              | 1.02x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| pickle_list                      | 1.98 us   | 1.96 us              | 1.01x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| pickle_pure_python               | 149 us    | 150 us               | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| pidigits                         | 111 ms    | 115 ms               | 1.03x slower | Significant (t=-18.53) |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| pprint_pformat                   | 737 ms    | 748 ms               | 1.02x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| pprint_safe_repr                 | 362 ms    | 369 ms               | 1.02x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| pyflate                          | 211 ms    | 205 ms               | 1.03x faster | Significant (t=7.43)   |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| python_startup                   | 7.88 ms   | 7.88 ms              | 1.00x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| python_startup_no_site           | 4.72 ms   | 4.76 ms              | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| raytrace                         | 130 ms    | 128 ms               | 1.02x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| regex_compile                    | 50.0 ms   | 50.2 ms              | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| regex_dna                        | 101 ms    | 103 ms               | 1.02x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| regex_effbot                     | 1.72 ms   | 1.77 ms              | 1.03x slower | Significant (t=-26.42) |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| regex_v8                         | 12.5 ms   | 12.3 ms              | 1.02x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| richards                         | 20.4 ms   | 20.0 ms              | 1.02x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| richards_super                   | 23.4 ms   | 22.8 ms              | 1.03x faster | Significant (t=11.36)  |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| scimark_fft                      | 154 ms    | 153 ms               | 1.00x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| scimark_lu                       | 55.4 ms   | 57.0 ms              | 1.03x slower | Significant (t=-5.67)  |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| scimark_monte_carlo              | 32.8 ms   | 32.8 ms              | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| scimark_sor                      | 57.8 ms   | 56.9 ms              | 1.02x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| scimark_sparse_mat_mult          | 2.75 ms   | 2.76 ms              | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| shortest_path                    | 316 ms    | 318 ms               | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| spectral_norm                    | 47.7 ms   | 51.6 ms              | 1.08x slower | Significant (t=-2.01)  |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| sphinx                           | 465 ms    | 467 ms               | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| sqlglot_v2_normalize             | 50.3 ms   | 50.2 ms              | 1.00x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| sqlglot_v2_optimize              | 24.2 ms   | 24.4 ms              | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| sqlglot_v2_parse                 | 576 us    | 572 us               | 1.01x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| sqlglot_v2_transpile             | 724 us    | 722 us               | 1.00x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| sqlite_synth                     | 1.14 us   | 1.15 us              | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| subparsers                       | 20.6 ms   | 20.7 ms              | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| sympy_expand                     | 181 ms    | 184 ms               | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| sympy_integrate                  | 8.54 ms   | 8.55 ms              | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| sympy_str                        | 103 ms    | 105 ms               | 1.02x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| sympy_sum                        | 55.9 ms   | 56.0 ms              | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| telco                            | 3.39 ms   | 3.34 ms              | 1.01x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| tomli_loads                      | 971 ms    | 982 ms               | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| typing_runtime_protocols         | 73.2 us   | 73.6 us              | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| unpack_sequence                  | 25.2 ns   | 23.0 ns              | 1.10x faster | Significant (t=7.03)   |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| unpickle                         | 6.99 us   | 7.05 us              | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| unpickle_list                    | 2.07 us   | 2.10 us              | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| unpickle_pure_python             | 105 us    | 104 us               | 1.01x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| xml_etree_generate               | 40.5 ms   | 40.7 ms              | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| xml_etree_iterparse              | 49.7 ms   | 50.4 ms              | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| xml_etree_parse                  | 77.2 ms   | 79.1 ms              | 1.02x slower | Significant (t=-16.14) |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| xml_etree_process                | 29.5 ms   | 29.8 ms              | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
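To make the "generally under 5%" reading easier to check, the sketch below recomputes the Change column for a handful of rows copied from the table above and flags those that moved by more than 5%. The ratio convention (slower time divided by faster time) is my assumption about how the column is derived, not pyperformance's own code, and because the table prints rounded times the recomputed ratio can differ slightly from the printed one (for 2to3 it gives 1.01x rather than 1.00x).

```python
# (benchmark, main, bytearray_bytes), times in the units printed in the table.
rows = [
    ("2to3", 137.0, 136.0),            # ms
    ("asyncio_tcp", 182.0, 161.0),     # ms
    ("gc_traversal", 3.04, 2.84),      # ms
    ("spectral_norm", 47.7, 51.6),     # ms
    ("unpack_sequence", 25.2, 23.0),   # ns
]


def change(base: float, new: float) -> tuple[float, str]:
    """Return (ratio >= 1, direction) for a baseline time and a new time."""
    if new <= base:
        return base / new, "faster"
    return new / base, "slower"


def describe(base: float, new: float) -> str:
    """Format the change the way the table's Change column reads."""
    ratio, direction = change(base, new)
    return f"{ratio:.2f}x {direction}"


# Rows whose recomputed change exceeds the 5% threshold mentioned above.
outliers = [name for name, base, new in rows if change(base, new)[0] > 1.05]

for name, base, new in rows:
    print(f"{name:16} {describe(base, new)}")
print("over 5%:", outliers)
```

Among the rows sampled here, the ones over the threshold are asyncio_tcp, gc_traversal, spectral_norm and unpack_sequence.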

This sets things up so the bytes can be "taken" as a bytes object without
requiring a copy.

+----------------------------------+-----------+----------------------+--------------+------------------------+
| deepcopy_reduce                  | 1.32 us   | 1.34 us              | 1.02x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| deltablue                        | 1.65 ms   | 1.64 ms              | 1.01x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| django_template                  | 17.9 ms   | 17.8 ms              | 1.00x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| docutils                         | 1.19 sec  | 1.20 sec             | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| dulwich_log                      | 19.5 ms   | 19.7 ms              | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| fannkuch                         | 184 ms    | 181 ms               | 1.02x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| float                            | 37.1 ms   | 36.7 ms              | 1.01x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| gc_traversal                     | 3.04 ms   | 2.84 ms              | 1.07x faster | Significant (t=19.48)  |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| generators                       | 15.9 ms   | 15.3 ms              | 1.04x faster | Significant (t=7.03)   |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| genshi_text                      | 11.3 ms   | 11.2 ms              | 1.01x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| genshi_xml                       | 25.5 ms   | 25.5 ms              | 1.00x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| go                               | 57.6 ms   | 56.7 ms              | 1.02x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| hexiom                           | 2.92 ms   | 2.88 ms              | 1.02x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| html5lib                         | 26.0 ms   | 26.5 ms              | 1.02x slower | Significant (t=-9.20)  |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| json_dumps                       | 4.48 ms   | 4.44 ms              | 1.01x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| json_loads                       | 11.7 us   | 11.7 us              | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| k_core                           | 1.41 sec  | 1.42 sec             | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| logging_format                   | 3.27 us   | 3.30 us              | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| logging_silent                   | 45.5 ns   | 45.8 ns              | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| logging_simple                   | 3.02 us   | 3.01 us              | 1.00x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| mako                             | 6.02 ms   | 6.03 ms              | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| many_optionals                   | 473 us    | 478 us               | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| mdp                              | 587 ms    | 578 ms               | 1.02x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| meteor_contest                   | 50.2 ms   | 50.5 ms              | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| nbody                            | 54.6 ms   | 52.4 ms              | 1.04x faster | Significant (t=10.72)  |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| nqueens                          | 41.7 ms   | 40.4 ms              | 1.03x faster | Significant (t=6.79)   |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| pathlib                          | 9.77 ms   | 9.73 ms              | 1.00x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| pickle                           | 5.99 us   | 6.01 us              | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| pickle_dict                      | 12.5 us   | 12.8 us              | 1.02x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| pickle_list                      | 1.98 us   | 1.96 us              | 1.01x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| pickle_pure_python               | 149 us    | 150 us               | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| pidigits                         | 111 ms    | 115 ms               | 1.03x slower | Significant (t=-18.53) |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| pprint_pformat                   | 737 ms    | 748 ms               | 1.02x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| pprint_safe_repr                 | 362 ms    | 369 ms               | 1.02x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| pyflate                          | 211 ms    | 205 ms               | 1.03x faster | Significant (t=7.43)   |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| python_startup                   | 7.88 ms   | 7.88 ms              | 1.00x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| python_startup_no_site           | 4.72 ms   | 4.76 ms              | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| raytrace                         | 130 ms    | 128 ms               | 1.02x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| regex_compile                    | 50.0 ms   | 50.2 ms              | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| regex_dna                        | 101 ms    | 103 ms               | 1.02x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| regex_effbot                     | 1.72 ms   | 1.77 ms              | 1.03x slower | Significant (t=-26.42) |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| regex_v8                         | 12.5 ms   | 12.3 ms              | 1.02x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| richards                         | 20.4 ms   | 20.0 ms              | 1.02x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| richards_super                   | 23.4 ms   | 22.8 ms              | 1.03x faster | Significant (t=11.36)  |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| scimark_fft                      | 154 ms    | 153 ms               | 1.00x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| scimark_lu                       | 55.4 ms   | 57.0 ms              | 1.03x slower | Significant (t=-5.67)  |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| scimark_monte_carlo              | 32.8 ms   | 32.8 ms              | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| scimark_sor                      | 57.8 ms   | 56.9 ms              | 1.02x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| scimark_sparse_mat_mult          | 2.75 ms   | 2.76 ms              | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| shortest_path                    | 316 ms    | 318 ms               | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| spectral_norm                    | 47.7 ms   | 51.6 ms              | 1.08x slower | Significant (t=-2.01)  |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| sphinx                           | 465 ms    | 467 ms               | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| sqlglot_v2_normalize             | 50.3 ms   | 50.2 ms              | 1.00x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| sqlglot_v2_optimize              | 24.2 ms   | 24.4 ms              | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| sqlglot_v2_parse                 | 576 us    | 572 us               | 1.01x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| sqlglot_v2_transpile             | 724 us    | 722 us               | 1.00x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| sqlite_synth                     | 1.14 us   | 1.15 us              | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| subparsers                       | 20.6 ms   | 20.7 ms              | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| sympy_expand                     | 181 ms    | 184 ms               | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| sympy_integrate                  | 8.54 ms   | 8.55 ms              | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| sympy_str                        | 103 ms    | 105 ms               | 1.02x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| sympy_sum                        | 55.9 ms   | 56.0 ms              | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| telco                            | 3.39 ms   | 3.34 ms              | 1.01x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| tomli_loads                      | 971 ms    | 982 ms               | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| typing_runtime_protocols         | 73.2 us   | 73.6 us              | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| unpack_sequence                  | 25.2 ns   | 23.0 ns              | 1.10x faster | Significant (t=7.03)   |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| unpickle                         | 6.99 us   | 7.05 us              | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| unpickle_list                    | 2.07 us   | 2.10 us              | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| unpickle_pure_python             | 105 us    | 104 us               | 1.01x faster | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| xml_etree_generate               | 40.5 ms   | 40.7 ms              | 1.00x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| xml_etree_iterparse              | 49.7 ms   | 50.4 ms              | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| xml_etree_parse                  | 77.2 ms   | 79.1 ms              | 1.02x slower | Significant (t=-16.14) |
+----------------------------------+-----------+----------------------+--------------+------------------------+
| xml_etree_process                | 29.5 ms   | 29.8 ms              | 1.01x slower | Not significant        |
+----------------------------------+-----------+----------------------+--------------+------------------------+
@cmaloney cmaloney changed the title gh-139871: Update bytearray to contain PyBytesObject gh-139871: Implement bytearray.take_bytes([n]) to efficiently extract bytes Oct 15, 2025
@cmaloney cmaloney changed the title gh-139871: Implement bytearray.take_bytes([n]) to efficiently extract bytes gh-139871: Add bytearray.take_bytes([n]) to efficiently extract bytes Oct 15, 2025
@cmaloney
Contributor Author

The threading tests turned up a non-threading issue: after this change, ba = bytearray(b'123'); ba.clear(); ba.copy() has slightly different internals (sizeof, alloc) than before. Exploring options.

Co-authored-by: Maurycy Pawłowski-Wieroński <[email protected]>
return PyLong_FromSsize_t(FT_ATOMIC_LOAD_SSIZE_RELAXED(self->ob_alloc));
Py_ssize_t alloc = FT_ATOMIC_LOAD_SSIZE_RELAXED(self->ob_alloc);
if (alloc > 0) {
alloc += sizeof(PyBytesObject);
Contributor Author

I add the size of PyBytesObject here (and in sizeof) because code expects ob_alloc to be the number of bytes of space available (versus ob_size, the number of bytes in use). It felt more straightforward to leave the definitions of ob_alloc and ob_size as they were and adjust the size reporting here instead.
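
From Python, the distinction between bytes in use and bytes allocated can be observed with `len()` and `bytearray.__alloc__()`. A minimal sketch, assuming CPython; the exact numbers depend on the interpreter version and build, so no expected values are shown:

```python
import sys

ba = bytearray()
for _ in range(100):
    ba += b"x" * 10  # repeated appends let the buffer over-allocate

in_use = len(ba)            # corresponds to ob_size: bytes in use
allocated = ba.__alloc__()  # reported allocation: space available
total = sys.getsizeof(ba)   # object header plus the allocated buffer

print(f"in use: {in_use}, allocated: {allocated}, sizeof: {total}")
```

Comparing these three values before and after a change is one way to notice the kind of size-reporting difference discussed in this thread.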

@cmaloney
Contributor Author

@vstinner I think this is ready for another pass. I left GitHub comments in places where I am unsure of the standard CPython way of doing something, as well as in places where I'm not sure what the right decision is.

@cmaloney
Contributor Author

cmaloney commented Oct 29, 2025

With a little more tweaking, bytearray can rely on bytes for _PyByteArray_empty_string (changes relative to this PR: https://github.com/cmaloney/cpython/pull/1/files). That makes __init__ slightly more complicated but simplifies bytearray_resize_lock_held; I'm not sure it is worth incorporating here.

@vstinner
Member

Microbenchmark:

import pyperf

CHUNK_A = b'a' * 1_000
CHUNK_B = b'b' * 100
CHUNK_C = b'c' * 10_000

def build_ba():
    ba = bytearray()
    ba += CHUNK_A
    ba += CHUNK_B
    ba += CHUNK_C

if hasattr(bytearray, 'take_bytes'):
    def take_bytes():
        ba = bytearray()
        ba += CHUNK_A
        ba += CHUNK_B
        ba += CHUNK_C
        return ba.take_bytes()
else:
    def take_bytes():
        ba = bytearray()
        ba += CHUNK_A
        ba += CHUNK_B
        ba += CHUNK_C
        return bytes(ba)

runner = pyperf.Runner()
runner.bench_func('build', build_ba)
runner.bench_func('take_bytes', take_bytes)

Results:

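+----------------+--------+----------------------+
| Benchmark      | ref    | change               |
+================+========+======================+
| build          | 441 ns | 459 ns: 1.04x slower |
+----------------+--------+----------------------+
| take_bytes     | 739 ns | 488 ns: 1.51x faster |
+----------------+--------+----------------------+
| Geometric mean | (ref)  | 1.20x faster         |
+----------------+--------+----------------------+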

take_bytes is 1.5x faster which is quite nice! Sadly, build is a little bit slower (1.04x) but IMO it's acceptable on a microbenchmark (it's less than 10%).

cmaloney and others added 2 commits October 29, 2025 10:27
Co-authored-by: Victor Stinner <[email protected]>
De-duplicate the code to set `ob_bytes`, `ob_start`, `ob_alloc` and `ob_size`; in resize, rely on ob_start always being set.
@cmaloney
Contributor Author

cmaloney commented Oct 29, 2025

For the micro case there are some optimizations that can be done in future PRs:
1. Make construction from str with encoding= move its result bytes into the bytearray (it currently copies unnecessarily).
2. Add a classmethod .from_bytes. __init__ takes *args and **kwargs which, as far as I've found so far, makes it impossible to construct from a uniquely referenced bytes without copying. With the classmethod, patterns like ba = bytearray.from_bytes(b'\01' * 4096) would no longer copy into the bytearray, which should be faster.
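
The copy that a `from_bytes`-style constructor would avoid is observable from Python today: constructing a bytearray from a bytes object always produces an independent buffer. A small illustration (the `from_bytes` classmethod is only proposed above and is not used here):

```python
source = b"\x01" * 4096
ba = bytearray(source)   # today this copies the 4096 bytes into a new buffer

ba[0] = 0x02             # mutating the bytearray...
print(source[0], ba[0])  # ...leaves the original bytes object untouched
```

Because bytes is immutable, a no-copy constructor would presumably only be safe when the bytes object is uniquely referenced, which matches the restriction mentioned in the list above.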

For a more macro benchmark, updating asyncio streams to use take_bytes (cmaloney@18176b9) and running pyperformance run -f -b asyncio_tcp,asyncio_tcp_ssl gives:

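+-----------------+-----------+-------------------------+--------------+-----------------------+
| Benchmark       | main.json | take_bytes_asyncio.json | Change       | Significance          |
+=================+===========+=========================+==============+=======================+
| asyncio_tcp     | 169 ms    | 140 ms                  | 1.21x faster | Significant (t=27.99) |
+-----------------+-----------+-------------------------+--------------+-----------------------+
| asyncio_tcp_ssl | 558 ms    | 536 ms                  | 1.04x faster | Significant (t=6.14)  |
+-----------------+-----------+-------------------------+--------------+-----------------------+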
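
The pattern in stream code is roughly: append incoming chunks to a bytearray, then remove a prefix and hand it out as bytes. A minimal sketch of that pattern with a fallback for interpreters that lack `take_bytes`; the helper name `take_prefix` is made up for this illustration and this is not the asyncio code from the linked commit:

```python
def take_prefix(buffer: bytearray, n: int) -> bytes:
    """Remove and return the first n bytes of buffer as bytes."""
    if n < 0 or n > len(buffer):
        raise IndexError("n is out of bounds for the buffer")
    if hasattr(buffer, "take_bytes"):
        # New API from this PR, intended to avoid the extra copy made below.
        return buffer.take_bytes(n)
    data = bytes(buffer[:n])  # the slice copies, then bytes() copies again
    del buffer[:n]            # bytearray supports fast deletion at the front
    return data


buffer = bytearray()
buffer += b"HEADER"
buffer += b"payload"
header = take_prefix(buffer, 6)
```

After the call, `header` holds the six header bytes and `buffer` keeps only the payload, whichever branch ran.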

@vstinner
Member

I wrote #140770 to add _PyByteArray_empty_string to the stable ABI.

Member

@vstinner vstinner left a comment

LGTM, congrats! ✨️

I would prefer to have a second core dev reviewing a change which changes a core structure of CPython: PyByteArrayObject.

@encukou @serhiy-storchaka: Would you be interested to review this change?


Taking all bytes is a zero-copy operation.

.. list-table:: Suggested Replacements
Member

Would it make sense to put porting notes better in What's New, rather than in general documentation? In a few years, take_bytes won't be “new”.
The versionadded note could link to the 3.15 What's New.

Contributor Author

The end state I'd like is for these to (hopefully) be incorporated into flake8 / ruff as code improvement suggestions rather than being more things to memorize. To that end, maybe just bytes(bytearray()) -> .take_bytes() in What's New would be enough, relying on external pieces to socialize the rest (e.g. I'm planning to try to teach this a bit through blogs/talks if it ships).

I slightly prefer keeping them on the bytearray page so that someone hand-optimizing bytearray code and browsing the page for alternatives will find them. The less efficient patterns are stable Python APIs to me, so it's similar to https://docs.python.org/3/library/pathlib.html#corresponding-tools and applies both to 3.15 specifically and to foreseeable future versions.
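
The simplest replacement mentioned here, `bytes(ba)` becoming `ba.take_bytes()`, can be written so that it also runs on versions without the method. A sketch, where `finish` is a hypothetical helper name; since taking all bytes empties the bytearray, the fallback clears it as well to keep the behaviour the same:

```python
def finish(ba: bytearray) -> bytes:
    """Return the accumulated contents as bytes and leave ba empty."""
    if hasattr(ba, "take_bytes"):
        return ba.take_bytes()  # takes everything; zero-copy per the docs text
    data = bytes(ba)            # older versions: one copy of the whole buffer
    ba.clear()
    return data


ba = bytearray()
for chunk in (b"alpha,", b"beta,", b"gamma"):
    ba += chunk
result = finish(ba)
```

The substitution is only equivalent when the bytearray is not needed afterwards, which is the usual case for a build-then-convert buffer.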

c = a.zfill(0x400000)
assert not c or c[-1] not in (0xdd, 0xcd)

def take_bytes(b, a):
Member

Other modifying functions have a # MODIFIES! comment.

check([clear] + [splitlines] * 10, bytearray(b'\n' * 0x400))
check([clear] + [startswith] * 10)
check([clear] + [strip] * 10)
check([clear] + [take_bytes] * 10)
Member

It might be interesting to call a bunch of take_bytes with an argument, without clear, and check the end result.

Contributor Author

Added a take_bytes_n check that starts with a repeating 10-byte pattern and verifies that each call either (a) raises IndexError because the request is out of bounds or (b) returns exactly the expected byte pattern.
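
The check described above can be sketched in plain Python. This is an illustration of the idea, not the test from the PR; `take_n` is a hypothetical stand-in that uses `take_bytes` when available and otherwise emulates it, assuming an out-of-bounds request raises IndexError as described:

```python
PATTERN = b"0123456789"  # repeating 10-byte pattern


def take_n(ba: bytearray, n: int) -> bytes:
    if hasattr(ba, "take_bytes"):
        return ba.take_bytes(n)
    if n > len(ba):
        raise IndexError("take past the end of the bytearray")
    data = bytes(ba[:n])
    del ba[:n]
    return data


ba = bytearray(PATTERN * 5)
taken = []
out_of_bounds = False
for _ in range(10):
    try:
        chunk = take_n(ba, 10)
    except IndexError:
        out_of_bounds = True  # outcome (a): nothing left to take
        continue
    assert chunk == PATTERN   # outcome (b): exactly one run of the pattern
    taken.append(chunk)
```

In this single-threaded form the first five calls return the pattern and the rest fail; in the threaded test the interleaving is not predictable, which is why both outcomes are accepted.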

/* Object layout */
typedef struct {
PyObject_VAR_HEAD
Py_ssize_t ob_alloc; /* How many bytes allocated in ob_bytes */
Member

Is this the same as Py_SIZE(ob_bytes_object)?

Contributor Author

@cmaloney cmaloney Oct 30, 2025

Yes. Currently, reads outside of locks always use FT_ATOMIC_LOAD_SSIZE_RELAXED while Py_SIZE doesn't, so I'm not sure the two can be swapped. In an earlier iteration of this idea there were also ABI concerns about adding or removing members anywhere other than the end of the struct (the member could likely be deprecated but not removed).

Comment on lines +252 to +253
memmove(obj->ob_bytes, obj->ob_start,
Py_MIN(requested_size, Py_SIZE(self)));
Member

Did you consider creating a new bytes object here, copying, and decrefing the old -- similar to what the old code does?
It seems that an extra memmove would defeat any benefits of _PyBytes_Resize's optimization.

Contributor Author

I think the change from a new bytes (malloc + memcpy) to memmove + realloc resulted in this pyperformance shift (asyncio streams use bytearray's fast front deletion):
| asyncio_tcp | 182 ms | 161 ms | 1.13x faster | Significant (t=32.85) |

That roughly 20 ms delta is similar to the speedup from changing asyncio.streams to use .take_bytes(), which removes one more copy (see: #140128 (comment))

If the offset is really large, realloc may internally decide to do an alloc + memcpy, meaning this path would be memmove + alloc + memcpy, which will be slower than just creating a new bytes up front.

I can add a special case around pymalloc's rules on this if wanted:

cpython/Objects/obmalloc.c

Lines 2669 to 2681 in abd19ed

/* The block is staying the same or shrinking.
If it's shrinking, there's a tradeoff: it costs cycles to copy the
block to a smaller size class, but it wastes memory not to copy it.
The compromise here is to copy on shrink only if at least 25% of
size can be shaved off. */
if (4 * nbytes > 3 * size) {
/* It's the same, or shrinking and new/old > 3/4. */
*newptr_p = p;
return 1;
}
size = nbytes;
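
The quoted condition can be restated as a small predicate to see where the 25% threshold falls. A sketch in Python, not CPython code; `realloc_keeps_block` is a name invented here, and the sizes are illustrative values for the "same or shrinking" case the comment describes:

```python
def realloc_keeps_block(nbytes: int, size: int) -> bool:
    """Mirror of `4 * nbytes > 3 * size`: keep the block in place unless
    at least 25% of the current size would be shaved off."""
    return 4 * nbytes > 3 * size


size = 512
for nbytes in (512, 400, 384, 10):
    kept = realloc_keeps_block(nbytes, size)
    print(f"{size} -> {nbytes}: {'kept in place' if kept else 'copied to a smaller block'}")
```

A request that shrinks to exactly three quarters of the old size already falls on the copying side, since the comparison is strict.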

PyObject *it;
PyObject *(*iternext)(PyObject *);

/* First __init__; set ob_bytes_object so ob_bytes is always non-null. */
Member

__init__ can be called several times from Python code, but also skipped:

>>> ba = bytearray.__new__(bytearray)
>>> ba.append(123)

Invariants like this need to be introduced in tp_init.
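
Both situations can be reproduced from Python. A short sketch, relying on the current CPython behaviour that a repeated `__init__` call replaces the previous contents:

```python
# __init__ skipped entirely: the object must still be usable.
ba = bytearray.__new__(bytearray)
ba.append(123)
skipped_init = bytes(ba)

# __init__ called more than once: each call replaces the contents.
ba2 = bytearray(b"first")
ba2.__init__(b"second")
reinitialized = bytes(ba2)

print(skipped_init, reinitialized)
```

Any invariant on the internal buffer therefore has to hold on both paths, not only after a single, ordinary constructor call.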

Contributor Author

This is the tp_init I think, in PyByteArray_Type:

    bytearray___init__,                 /* tp_init */

I keep debating if it would be better in a tp_new (currently PyType_GenericNew)

cmaloney and others added 4 commits October 30, 2025 13:21
Co-authored-by: Petr Viktorin <[email protected]>
Validates that take_bytes(10) always either is past the end of the input or
gets exactly one run of 10 bytes at a 10 byte offset.

// Copy remaining bytes to a new bytes.
Py_ssize_t remaining_length = size - to_take;
PyObject *remaining = PyBytes_FromStringAndSize(self->ob_start + to_take,
Contributor Author

@cmaloney cmaloney Oct 31, 2025

Potential optimization: Given that bytearray is often used as a mutable "holding" place while gathering bytes, it's possible that code will take_bytes() a very small portion of the buffer (e.g. 10 bytes out of a 4096 byte buffer). In that case it would be more efficient to copy the beginning of the buffer out into a new bytes and advance ob_start by the 10 bytes. That exchanges a 4086 byte new alloc + 4086 byte memcpy + 10 byte memmove + resize shrinking 4086 bytes (which is probably a new allocation) for a 10 byte new alloc + 10 byte memcpy + modifying a pointer.
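
The trade-off can be made concrete with a rough cost model that only counts bytes copied into a newly allocated bytes object. This is a simplification for illustration (it ignores the memmove, the resize, and allocator behaviour), and both function names are invented here:

```python
def copied_when_remainder_is_copied(size: int, to_take: int) -> int:
    """Approach in the diff above: the remaining bytes go to a new bytes."""
    return size - to_take


def copied_when_prefix_is_copied(size: int, to_take: int) -> int:
    """Proposed alternative: copy out only the prefix, advance ob_start."""
    return to_take


size, to_take = 4096, 10
remainder_cost = copied_when_remainder_is_copied(size, to_take)
prefix_cost = copied_when_prefix_is_copied(size, to_take)
print(f"copy remainder: {remainder_cost} bytes, copy prefix: {prefix_cost} bytes")
```

Under this model the prefix copy wins whenever fewer than half of the bytes are taken, though a real threshold would need measuring.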
