Changes from 22 commits
39 commits
daf8e02
gh-139871: Update bytearray to contain PyBytesObject
cmaloney Oct 3, 2025
39b2d15
📜🤖 Added by blurb_it.
blurb-it[bot] Oct 14, 2025
86faf1d
Add bytearray.take_bytes
cmaloney Oct 3, 2025
a9328f4
Merge branch 'main' into bytearray_bytes
cmaloney Oct 15, 2025
e9f5ca9
Update Objects/bytearrayobject.c
cmaloney Oct 15, 2025
4784957
Review fixes
cmaloney Oct 15, 2025
db19def
Merge branch 'main' into bytearray_bytes
cmaloney Oct 15, 2025
451c302
Update Objects/bytearrayobject.c
cmaloney Oct 17, 2025
bab7151
Add tests around alloc and getsizeof that show clearing isn't working…
cmaloney Oct 15, 2025
cb2377c
Fix resizing to 0 length / clearing leaving one byte alloc
cmaloney Oct 15, 2025
20175f8
review fix: handle NULL return from from PyBytes_FromStringAndSize
cmaloney Oct 18, 2025
e485595
Add take_bytes to test_free_threading
cmaloney Oct 18, 2025
b5535d0
Missed line...
cmaloney Oct 18, 2025
7c6e8a8
Simplify getting out ob_bytes
cmaloney Oct 18, 2025
4e27d13
Include PyBytesObject in __alloc__ of bytearray.
cmaloney Oct 18, 2025
9887dad
Apply suggestion from @vstinner
cmaloney Oct 27, 2025
6e4b910
Don't multiply by sizeof(char) as it's always 1
cmaloney Oct 27, 2025
b6f8403
Rely on bytes for end of buffer NULL
cmaloney Oct 27, 2025
28cb8c5
Personal review fixes
cmaloney Oct 27, 2025
f03b895
Simplify resize error handling
cmaloney Oct 27, 2025
a45f3c2
Use right PyLong constructor
cmaloney Oct 27, 2025
c8943e3
Add a define for max bytearray size, comment size=0
cmaloney Oct 29, 2025
5bffb7e
Remove oold comment
cmaloney Oct 29, 2025
583ea4b
Update test_capi.test_bytearray for MemoryError vs OverflowError
cmaloney Oct 29, 2025
8ee14e6
More accurate size and alloc calculation
cmaloney Oct 29, 2025
d70e369
Comment and minor doc tweaks
cmaloney Oct 29, 2025
97be818
Apply suggestion from @vstinner
cmaloney Oct 29, 2025
99e49ef
Remove _PyByteArray_empty_string, add bytearray_reinit_from_bytes
cmaloney Oct 29, 2025
f4b62d9
Update Stable API concerns: restore _empty_string an dmove _PyBytesOb…
cmaloney Oct 29, 2025
48afb62
Restore _PyByteArray_empty_string in .c file
cmaloney Oct 29, 2025
8c81e03
remove line that shouldn't have been added
cmaloney Oct 29, 2025
313e78c
Apply suggestion from @encukou
cmaloney Oct 30, 2025
c028e2b
Remove original variable, no longer used
cmaloney Oct 30, 2025
a69b338
remove _PyByteArray_empty_string
cmaloney Oct 30, 2025
2a95118
Add take_bytes_n free-threading test
cmaloney Oct 30, 2025
9680e8a
Expand comment for ob_alloc
cmaloney Oct 31, 2025
02882af
Add note on memmove tradeoff
cmaloney Oct 31, 2025
b67d10c
Move suggested optimizing refactors to whatsnew
cmaloney Oct 31, 2025
6db8822
Remove unintended change
cmaloney Oct 31, 2025
86 changes: 86 additions & 0 deletions Doc/library/stdtypes.rst
@@ -3173,6 +3173,92 @@ objects.

.. versionadded:: 3.14

.. method:: take_bytes(n=None, /)

Take the first *n* bytes as an immutable :class:`bytes`. Defaults to all
bytes.

If *n* is negative, indexes from the end and takes the first :func:`len`
plus *n* bytes. If *n* is out of bounds, raises :exc:`IndexError`.

Taking less than the full length will leave remaining bytes in the
:class:`bytearray` which requires a copy. If the remaining bytes should be
discarded use :func:`~bytearray.resize` or :keyword:`del` to truncate
then :func:`~bytearray.take_bytes` without a size.

.. impl-detail::

Taking all bytes is a zero-copy operation.

.. list-table:: Suggested Replacements
Member

Would it make sense to put porting notes better in What's New, rather than in general documentation? In a few years, take_bytes won't be “new”.
The versionadded note could link to the 3.15 What's New.

Contributor Author

End state to me is these (hopefully) get incorporated into flake8 / ruff as code improvement suggestions rather than more things to memorize. To that end maybe just bytes(bytearray()) -> .take_bytes() in what's new would be enough and rely on external pieces to socialize more (ex. I'm planning to try and teach this a bit through blogs/talks if it ships).

I slightly prefer keeping these on bytearray so someone hand-optimizing bytearray code and browsing the page for alternatives will find them. The less efficient patterns are stable Python APIs to me so it's similar to https://docs.python.org/3/library/pathlib.html#corresponding-tools and applies both in 3.15 specifically and foreseeable future versions.

Member

Maybe the table can be improved here, but as it is it looks out of place to me.

> End state to me is these (hopefully) get incorporated into flake8 / ruff as code improvement suggestions rather than more things to memorize.

Exactly -- and the more ephemeral-looking What's New seems better for that.

> someone hand-optimizing bytearray code and browsing the page for alternatives will find them.

That's why I suggested the link to What's New :)

> to me so it's similar to https://docs.python.org/3/library/pathlib.html#corresponding-tools

IMO, that's a very different thing: those docs say “pathlib is not a drop-in replacement for os.path”. It's a two-way mapping, not “old→new” porting suggestions.

Contributor Author

(Still looking at / experimenting with this one, changes for others comments pushed)

Contributor Author

Moved to whatsnew; not fully satisfied with my wording but also assuming will rework over time (ex. reference stdlib improvements as they get added)

:header-rows: 1

* - Description
- Old
- New

* - Return :class:`bytes` after working with :class:`bytearray`
- .. code:: python


def read() -> bytes:
    buffer = bytearray(1024)
    ...
    return bytes(buffer)
- .. code:: python

def read() -> bytes:
    buffer = bytearray(1024)
    ...
    return buffer.take_bytes()

* - Empty a buffer getting the bytes
- .. code:: python

buffer = bytearray(1024)
...
data = bytes(buffer)
buffer.clear()
- .. code:: python

buffer = bytearray(1024)
...
data = buffer.take_bytes()

* - Split a buffer at a specific separator
- .. code:: python

buffer = bytearray(b'abc\ndef')
n = buffer.find(b'\n')
data = bytes(buffer[:n + 1])
del buffer[:n + 1]
assert buffer == bytearray(b'def')

- .. code:: python

buffer = bytearray(b'abc\ndef')
n = buffer.find(b'\n')
data = buffer.take_bytes(n + 1)

* - Split a buffer at a specific separator; discard after the separator
- .. code:: python

buffer = bytearray(b'abc\ndef')
n = buffer.find(b'\n')
data = bytes(buffer[:n])
buffer.clear()
assert data == b'abc'
assert len(buffer) == 0

- .. code:: python

buffer = bytearray(b'abc\ndef')
n = buffer.find(b'\n')
buffer.resize(n)
data = buffer.take_bytes()

.. versionadded:: next
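
The "split a buffer at a separator" row above can be sketched as one helper that runs both on interpreters that have `bytearray.take_bytes` and on ones that do not. The name `pop_line` and the feature check are illustrative assumptions, not part of this change.

```python
from typing import Optional


def pop_line(buffer: bytearray) -> Optional[bytes]:
    """Remove and return the first line (including b'\\n'), or None if incomplete.

    Hypothetical helper: prefers bytearray.take_bytes when the running
    interpreter provides it, otherwise falls back to the older
    copy-then-delete pattern from the "Old" column.
    """
    n = buffer.find(b'\n')
    if n < 0:
        return None  # no complete line buffered yet
    if hasattr(buffer, 'take_bytes'):
        return buffer.take_bytes(n + 1)
    data = bytes(buffer[:n + 1])  # copy out the line
    del buffer[:n + 1]            # drop it from the front of the buffer
    return data


buffer = bytearray(b'abc\ndef')
first = pop_line(buffer)
second = pop_line(buffer)
```

After these calls `first` holds the line with its newline, `buffer` keeps the unterminated tail, and `second` is `None` because no further separator is buffered.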

Since bytearray objects are sequences of integers (akin to a list), for a
bytearray object *b*, ``b[0]`` will be an integer, while ``b[0:1]`` will be
a bytearray object of length 1. (This contrasts with text strings, where
1 change: 1 addition & 0 deletions Include/cpython/bytearrayobject.h
@@ -9,6 +9,7 @@ typedef struct {
char *ob_bytes; /* Physical backing buffer */
char *ob_start; /* Logical start inside ob_bytes */
Py_ssize_t ob_exports; /* How many buffer exports */
PyObject *ob_bytes_object; /* PyBytes for zero-copy bytes conversion */
} PyByteArrayObject;

PyAPI_DATA(char) _PyByteArray_empty_string[];
7 changes: 7 additions & 0 deletions Include/cpython/bytesobject.h
@@ -41,6 +41,13 @@ _PyBytes_Join(PyObject *sep, PyObject *iterable)
return PyBytes_Join(sep, iterable);
}

/* _PyBytesObject_SIZE gives the basic size of a bytes object; any memory allocation
for a bytes object of length n should request _PyBytesObject_SIZE + n bytes.

Using _PyBytesObject_SIZE instead of sizeof(PyBytesObject) saves
3 or 7 bytes per bytes object allocation on a typical system.
*/
#define _PyBytesObject_SIZE (offsetof(PyBytesObject, ob_sval) + 1)

// --- PyBytesWriter API -----------------------------------------------------
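
The "3 or 7 bytes" figure in the `_PyBytesObject_SIZE` comment comes from struct tail padding. A rough way to see it from Python is to model the header with `ctypes`. The `BytesHeaderModel` class below is a hypothetical stand-in that assumes the field layout of a default (non-free-threaded) build; it is not the real `PyBytesObject`.

```python
import ctypes


class BytesHeaderModel(ctypes.Structure):
    """Assumed layout of PyBytesObject on a default build (illustration only)."""
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
        ("ob_size", ctypes.c_ssize_t),
        ("ob_shash", ctypes.c_ssize_t),  # Py_hash_t has the width of Py_ssize_t
        ("ob_sval", ctypes.c_char * 1),  # first byte of the inline data
    ]


# sizeof() rounds the struct up to its alignment, padding after ob_sval.
padded_size = ctypes.sizeof(BytesHeaderModel)
# offsetof(ob_sval) + 1 keeps only the trailing NUL byte, as the macro does.
basic_size = BytesHeaderModel.ob_sval.offset + 1
saved = padded_size - basic_size
```

Under this model `saved` is the tail padding that `sizeof` would otherwise include in every allocation: 3 bytes with 4-byte pointers, 7 bytes with 8-byte pointers.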

71 changes: 71 additions & 0 deletions Lib/test/test_bytes.py
@@ -1390,6 +1390,16 @@ def test_clear(self):
b.append(ord('p'))
self.assertEqual(b, b'p')

# Cleared object should be empty.
b = bytearray(b'abc')
b.clear()
self.assertEqual(b.__alloc__(), 0)
base_size = sys.getsizeof(bytearray())
self.assertEqual(sys.getsizeof(b), base_size)
c = b.copy()
self.assertEqual(c.__alloc__(), 0)
self.assertEqual(sys.getsizeof(c), base_size)

def test_copy(self):
    b = bytearray(b'abc')
    bb = b.copy()
@@ -1451,6 +1461,61 @@ def test_resize(self):
self.assertRaises(MemoryError, bytearray().resize, sys.maxsize)
self.assertRaises(MemoryError, bytearray(1000).resize, sys.maxsize)

def test_take_bytes(self):
    ba = bytearray(b'ab')
    self.assertEqual(ba.take_bytes(), b'ab')
    self.assertEqual(len(ba), 0)
    self.assertEqual(ba, bytearray(b''))
    self.assertEqual(ba.__alloc__(), 0)
    base_size = sys.getsizeof(bytearray())
    self.assertEqual(sys.getsizeof(ba), base_size)

    # Positive and negative slicing.
    ba = bytearray(b'abcdef')
    self.assertEqual(ba.take_bytes(1), b'a')
    self.assertEqual(ba, bytearray(b'bcdef'))
    self.assertEqual(len(ba), 5)
    self.assertEqual(ba.take_bytes(-5), b'')
    self.assertEqual(ba, bytearray(b'bcdef'))
    self.assertEqual(len(ba), 5)
    self.assertEqual(ba.take_bytes(-3), b'bc')
    self.assertEqual(ba, bytearray(b'def'))
    self.assertEqual(len(ba), 3)
    self.assertEqual(ba.take_bytes(3), b'def')
    self.assertEqual(ba, bytearray(b''))
    self.assertEqual(len(ba), 0)

    # Take nothing from emptiness.
    self.assertEqual(ba.take_bytes(0), b'')
    self.assertEqual(ba.take_bytes(), b'')
    self.assertEqual(ba.take_bytes(None), b'')

    # Out of bounds, bad take value.
    self.assertRaises(IndexError, ba.take_bytes, -1)
    self.assertRaises(TypeError, ba.take_bytes, 3.14)
    ba = bytearray(b'abcdef')
    self.assertRaises(IndexError, ba.take_bytes, 7)

    # Offset between physical and logical start (ob_bytes != ob_start).
    ba = bytearray(b'abcde')
    del ba[:2]
    self.assertEqual(ba, bytearray(b'cde'))
    self.assertEqual(ba.take_bytes(), b'cde')

    # Overallocation at end.
    ba = bytearray(b'abcde')
    del ba[-2:]
    self.assertEqual(ba, bytearray(b'abc'))
    self.assertEqual(ba.take_bytes(), b'abc')
    ba = bytearray(b'abcde')
    ba.resize(4)
    self.assertEqual(ba.take_bytes(), b'abcd')

    # Take of a bytearray with references should fail.
    ba = bytearray(b'abc')
    with memoryview(ba) as mv:
        self.assertRaises(BufferError, ba.take_bytes)
    self.assertEqual(ba.take_bytes(), b'abc')
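
The index rules this test exercises can be restated as a pure-Python emulation. This is only a sketch: `take_bytes_sketch` is a hypothetical name, it always copies (so it does not reproduce the zero-copy path), and its `IndexError` message is made up. It assumes that deleting a slice from a `bytearray` with an active `memoryview` raises `BufferError`, which is existing behaviour.

```python
from typing import Optional


def take_bytes_sketch(buf: bytearray, n: Optional[int] = None) -> bytes:
    """Emulate the documented take_bytes(n) semantics with a copy.

    n=None takes everything; a negative n counts from the end, so the
    first len(buf) + n bytes are taken; out-of-range values raise IndexError.
    """
    size = len(buf)
    if n is None:
        n = size
    elif n < 0:
        n += size
    if n < 0 or n > size:
        raise IndexError("take size out of range")  # message is illustrative
    data = bytes(buf[:n])
    del buf[:n]  # raises BufferError while a memoryview export is alive
    return data


ba = bytearray(b'abcdef')
taken = [take_bytes_sketch(ba, 1), take_bytes_sketch(ba, -5),
         take_bytes_sketch(ba, -3), take_bytes_sketch(ba, 3)]
```

The four calls mirror the "positive and negative slicing" block of the test: one byte, nothing, two bytes, then the remaining three.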

def test_setitem(self):
    def setitem_as_mapping(b, i, val):
@@ -2557,6 +2622,11 @@ def zfill(b, a):
c = a.zfill(0x400000)
assert not c or c[-1] not in (0xdd, 0xcd)

def take_bytes(b, a):
    b.wait()
    c = a.take_bytes()
    assert not c or c[0] == 48  # '0'

def check(funcs, a=None, *args):
    if a is None:
        a = bytearray(b'0' * 0x400000)
@@ -2617,6 +2687,7 @@ def check(funcs, a=None, *args):
check([clear] + [splitlines] * 10, bytearray(b'\n' * 0x400))
check([clear] + [startswith] * 10)
check([clear] + [strip] * 10)
check([clear] + [take_bytes] * 10)

check([clear] + [contains] * 10)
check([clear] + [subscript] * 10)
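
The body of `check` is not shown in this hunk, so the following is only a guess at the pattern: one thread clears the shared `bytearray` while several readers start at the same barrier and verify that whatever they observed is either empty or intact. `race_clear_against_readers` is a hypothetical name, and `bytes(a)` stands in for `a.take_bytes()` so the sketch runs on interpreters without the new method.

```python
import threading


def race_clear_against_readers(readers: int = 4, size: int = 0x4000) -> list:
    """Race one clear() against several readers released by a shared barrier."""
    a = bytearray(b'0' * size)
    barrier = threading.Barrier(readers + 1)
    results = []
    lock = threading.Lock()

    def clear():
        barrier.wait()
        try:
            a.clear()
        except BufferError:
            pass  # a reader held a buffer export at that instant

    def read():
        barrier.wait()
        c = bytes(a)  # stand-in for a.take_bytes()
        with lock:
            results.append(c)

    threads = [threading.Thread(target=clear)]
    threads += [threading.Thread(target=read) for _ in range(readers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


observed = race_clear_against_readers()
```

Each entry in `observed` should be either empty (the clear won the race) or start with `b'0'` (the reader won), which is the same invariant the `take_bytes` helper above asserts.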
2 changes: 1 addition & 1 deletion Lib/test/test_sys.py
@@ -1583,7 +1583,7 @@ def test_objecttypes(self):
samples = [b'', b'u'*100000]
for sample in samples:
    x = bytearray(sample)
-   check(x, vsize('n2Pi') + x.__alloc__())
+   check(x, vsize('n2PiP') + x.__alloc__())
# bytearray_iterator
check(iter(bytearray()), size('nP'))
# bytes
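
The extra `P` in the format string accounts for the new `ob_bytes_object` pointer. The relation the check relies on, reported size equals fixed struct size plus `__alloc__()`, can be tried from Python. Using `bytearray.__basicsize__` in place of the test suite's `vsize(...)` helper is an assumption made for this sketch.

```python
import sys


def bytearray_size_matches(sample: bytes) -> bool:
    """Return True if getsizeof equals the fixed struct size plus the buffer allocation."""
    x = bytearray(sample)
    return sys.getsizeof(x) == bytearray.__basicsize__ + x.__alloc__()


size_checks = [bytearray_size_matches(s) for s in (b'', b'u' * 100000)]
```

The two samples are the same ones the test uses: an empty buffer and a large one.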
@@ -0,0 +1,2 @@
Update :class:`bytearray` to use a :class:`bytes` under the hood as its buffer
and add :func:`bytearray.take_bytes` to take it out.