-
-
Notifications
You must be signed in to change notification settings - Fork 349
Simplify DecompressingHashReader #8314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
66bfeca to
6159ea3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A number of comments about the interface as a whole. Maybe worth addressing, but I think they also apply to the original code. So I don't think they necessarily should block this PR. Would be curious to here answers though around the design.
src/bundle/streaming_reader.zig
Outdated
| var actual_hash: [32]u8 = undefined; | ||
| self.hasher.final(&actual_hash); | ||
| if (std.mem.eql(u8, &actual_hash, &self.expected_hash)) { | ||
| self.hash_verified = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm kinda surprised this is just a bool on the struct and not an error state if unverified at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually we could return a read error here.
src/bundle/streaming_reader.zig
Outdated
| pub fn verifyComplete(self: *Self) !void { | ||
| // Read any remaining data to ensure we process the entire stream | ||
| var discard_buffer: [1024]u8 = undefined; | ||
| while (true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this still need a while(true)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, because discard does not have to discard everything up to the limit. But i found ther is an discardRemaining that already does the loop, so I will use that one.
src/bundle/streaming_reader.zig
Outdated
| while (true) { | ||
| const n = try self.read(&discard_buffer); | ||
| if (n == 0) break; | ||
| _ = self.interface.discard(std.Io.Limit.unlimited) catch |err| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be more robust to error if any data is left and ensure the caller reads all data before calling this function? This feels potentially error prone. Better to make it explicit for the caller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, I will change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to keep it this way for now. The current bundler depends on it. I think the tar iter does not read unit the end of the stream, so it's simplest to do that in the verifyComplete
| self.allocator_ptr.free(self.interface.buffer); | ||
| } | ||
|
|
||
| fn stream(r: *std.Io.Reader, w: *std.Io.Writer, limit: std.Io.Limit) std.Io.Reader.StreamError!usize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure I understand this correctly. This means it never returns any data. So this only ever calculates the hash? This can never return decompressed data to a caller. So this really is like a form of final consumer that thats a reader and creates a hash. If so, what does it use this api? Why not just a function that takes a reader, and then computes the hash. This feels like misuse of this api, but maybe I don't understand some part of how it is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the point is that this stream function is just called by the reader interface and not directly. The reader interface will first look at the buffer, and then calls this stream function. So if this function only fills the internal buffer and but not the writer, it will still work, because the interface function will use the data in the buffer.
More details: https://github.com/ziglang/zig/blob/eef8deb918210a9d9d0732ef523b20e2cb963f01/lib/std/Io/Reader.zig#L42
|
I think I have a bug somewhere, I need to investigate again. |
|
@bhansconnect I found a bug in the CompressingHashWriter that I also fixed now. And we are now actually writing to the writer. It's not actually more complex than writing to the internal buffer and probably aligns more with how it is supposed to be used. Sorry if I spammed you with my changes. |
The DecompressingHashReader now actually uses the reader.buffer and let's the interface handle the rest.