-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix handling contents added after header creation #283
base: main
Are you sure you want to change the base?
Fix handling contents added after header creation #283
Conversation
Thanks for the PR, but I think this is working as intended. As the documentation mentions if the I/O stream is not the same size as the header size then the archive will be corrupted. I don't think it's correct for this crate to use |
@alexcrichton I think the challenge with handling this (at least easily) at the application layer is that it makes it challenging to use convenience methods like Do you see there being another layer in this crate that would be appropriate to handle this concern or is it the case that the functionality from |
Oh sorry I can generalize what I'm thinking a bit from "application layer" to "callers of this function". This is almost surely a bug in |
Addresses alexcrichton#282. Signed-off-by: Grant Elbert <[email protected]>
6942937
to
9edb4c2
Compare
@alexcrichton your point about With the new approach I'm not seeing a good way to reliably get test coverage on this edge case unfortunately. But I'm of course open to any guidance on this as well. |
Yeah this is where I roughly expected a fix to go, if any. One thing this makes me wonder about though is not only file extensions but also file truncations, as presumably that would also cause issues with concurrently modified files and archiving? I have little-to-no experience with working with the filesystem while it's being concurrently modified, so I don't know what to do in the face of situations like this. One possible fix would be to open the file and then from the open file get the metadata about the length, but I don't know if the file 'snapshots' like that and if the file is appended to whether or not the changes are reflected live on the file descriptor just opened. |
ab642af
to
9edb4c2
Compare
I don't think there is going to be a great way to handle cases like truncation or writes at varying offsets in a file. For typical log rotation I don't believe that move and remove operations will present a problem as I believe the kernel allows access to continue and only does the remove once open fds are closed. Processes which know they may be working on a shared file concurrently might use file locking but that isn't the situation we are in. Even I think the proposed change is probably still worthwhile to avoid corruption in a large number of cases, even if it is not able to eliminate the (possibly intractable) problem entirely. |
Personally I'm hesitant to give a semblance that everything in this crate works with concurrent filesystem interactions because basically nothing has been written with that in mind. While things could be restructured in the case to work for the precise situation where files are only appended to that's hard to explain in the code where there's theoretically a comment saying "we handle the file extension case but the file truncation case continues to produce a corrupt tarball". Sorry to change minds but now I'm sort of thinking that this belongs externally. The goal of this crate is that all the helper functions are layered on top of one another so it's ok to call whichever layer you need, so if you find it difficult to build |
Addresses #282.
To solve the issue we limit the amount of data that can be read from the file to whatever was available at the time of creating the
Header
.