Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std.mem: Add packedArrayByteLen function (and use it) #23406

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

squeek502
Copy link
Collaborator

@squeek502 squeek502 commented Mar 30, 2025

Context: #21682

This is a helper function intended to make using writePackedInt/readPackedInt to write to/read from packed arrays a bit less error prone. The math involved is easy, but a naive implementation will overflow when calculating a valid length. For example:

std.math.divCeil(usize, @bitSizeOf(u4) * some_slice.len, byte_size_in_bits)

This will overflow during the multiplication if some_slice.len is >= maxInt(usize) / 4 + 1, even though the final calculated byte count would be able to fit in a usize.

When using this newly added function, error.Overflow is returned only when the final byte count can't fit in a usize.

Comment on lines 392 to 393
const if_kind_buf_size = comptime mem.packedArrayByteLen(u2, 256) catch unreachable;
var if_kind: [if_kind_buf_size]u8 = .{0} ** if_kind_buf_size;
Copy link
Collaborator Author

@squeek502 squeek502 Mar 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm aware that this code is out-of-sync with upstream Aro (Vexu/arocc#787). Will submit an upstream PR if this is merged.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would var if_kind: [if_kind_buf_size]u8 = @splat(0); be better here?

This is a helper function intended to make using writePackedInt/readPackedInt to write to/read from packed arrays a bit less error prone. The math involved is easy, but a naive implementation will overflow when calculating a valid length. For example:

    std.math.divCeil(usize, @bitSizeOf(u4) * some_slice.len, byte_size_in_bits)

This will overflow during the multiplication if `some_slice.len` is >= `maxInt(usize) / 4 + 1`, even though the final calculated byte count would be able to fit in a `usize`.

When using this newly added function, `error.Overflow` is returned only when the final byte count can't fit in a `usize`.
@aqrit
Copy link

aqrit commented Apr 5, 2025

This PR optimizes for maintainability since packedArrayByteLen is likely on a cold path
(which is good and correct).

However since I'm thinking about it...,
A widening multiply is probably only good for sizes u3, or u5.
Finding the array size of u4, u6, u7 types can utilize a shortcut that requires just two instructions.

pub fn packedArrayByteLen(comptime T: type, num_elements: usize) error{Overflow}!usize {
    if (@bitSizeOf(T) == 0) return 0;
    const max_num_elements: comptime_int = (std.math.maxInt(usize) * 8) / @bitSizeOf(T);
    if (num_elements > max_num_elements) return error.Overflow;

    const whole_bytes: usize = num_elements *% (@bitSizeOf(T) / 8);
    const part_bytes: usize = switch (@as(u3, @truncate(@bitSizeOf(T)))) {
        0 => 0,
        // 1 => (num_elements >> 3) +% @intFromBool(((num_elements & 7) != 0)),
        1 => ((num_elements -% (num_elements >> 1)) +% 0x3) >> 2,
        2 => ((num_elements -% (num_elements >> 1)) +% 0x1) >> 1,
        3 => ((num_elements >> 3) *% 3) +%
            ((((num_elements & 0x07) *% 3) +% 0x07) >> 3),
        4 => num_elements -% (num_elements >> 1),
        5 => ((num_elements >> 3) *% 5) +%
            ((((num_elements & 0x07) *% 5) +% 0x07) >> 3),
        6 => num_elements -% (num_elements >> 2),
        7 => num_elements -% (num_elements >> 3),
    };
    return whole_bytes +% part_bytes;
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants