Skip to content

Conversation

@FabHof
Copy link
Contributor

@FabHof FabHof commented Oct 21, 2025

The DecompressingHashReader now actually uses the reader.buffer and let's the interface handle the rest.

@FabHof FabHof marked this pull request as ready for review October 21, 2025 22:19
bhansconnect
bhansconnect previously approved these changes Oct 22, 2025
Copy link
Member

@bhansconnect bhansconnect left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A number of comments about the interface as a whole. Maybe worth addressing, but I think they also apply to the original code. So I don't think they necessarily should block this PR. Would be curious to here answers though around the design.

var actual_hash: [32]u8 = undefined;
self.hasher.final(&actual_hash);
if (std.mem.eql(u8, &actual_hash, &self.expected_hash)) {
self.hash_verified = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm kinda surprised this is just a bool on the struct and not an error state if unverified at the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we could return a read error here.

pub fn verifyComplete(self: *Self) !void {
// Read any remaining data to ensure we process the entire stream
var discard_buffer: [1024]u8 = undefined;
while (true) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this still need a while(true)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because discard does not have to discard everything up to the limit. But i found ther is an discardRemaining that already does the loop, so I will use that one.

while (true) {
const n = try self.read(&discard_buffer);
if (n == 0) break;
_ = self.interface.discard(std.Io.Limit.unlimited) catch |err| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be more robust to error if any data is left and ensure the caller reads all data before calling this function? This feels potentially error prone. Better to make it explicit for the caller?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I will change it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to keep it this way for now. The current bundler depends on it. I think the tar iter does not read unit the end of the stream, so it's simplest to do that in the verifyComplete

self.allocator_ptr.free(self.interface.buffer);
}

fn stream(r: *std.Io.Reader, w: *std.Io.Writer, limit: std.Io.Limit) std.Io.Reader.StreamError!usize {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure I understand this correctly. This means it never returns any data. So this only ever calculates the hash? This can never return decompressed data to a caller. So this really is like a form of final consumer that thats a reader and creates a hash. If so, what does it use this api? Why not just a function that takes a reader, and then computes the hash. This feels like misuse of this api, but maybe I don't understand some part of how it is used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the point is that this stream function is just called by the reader interface and not directly. The reader interface will first look at the buffer, and then calls this stream function. So if this function only fills the internal buffer and but not the writer, it will still work, because the interface function will use the data in the buffer.

More details: https://github.com/ziglang/zig/blob/eef8deb918210a9d9d0732ef523b20e2cb963f01/lib/std/Io/Reader.zig#L42

@FabHof FabHof marked this pull request as draft October 22, 2025 14:03
@FabHof
Copy link
Contributor Author

FabHof commented Oct 22, 2025

I think I have a bug somewhere, I need to investigate again.

@FabHof FabHof marked this pull request as ready for review October 22, 2025 14:32
@FabHof
Copy link
Contributor Author

FabHof commented Oct 22, 2025

@bhansconnect I found a bug in the CompressingHashWriter that I also fixed now.

And we are now actually writing to the writer. It's not actually more complex than writing to the internal buffer and probably aligns more with how it is supposed to be used.

Sorry if I spammed you with my changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants