Fix handling contents added after header creation #283

elbe0046 · 2022-03-03T18:06:34Z

Addresses #282.

To solve the issue we limit the amount of data that can be read from the file to whatever was available at the time of creating the Header.
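
As a minimal sketch of that idea (not the code in this PR), the standard library's `Read::take` can cap a reader at a length recorded earlier, so bytes appended afterwards are never read. The helper name `read_bounded` and the in-memory `Cursor` standing in for a growing file are assumptions made for illustration.

```rust
use std::io::{self, Cursor, Read};

/// Reads at most `recorded_len` bytes from `source`, ignoring anything
/// that was appended after the length was recorded.
/// (Hypothetical helper for illustration; not part of the crate.)
fn read_bounded<R: Read>(source: R, recorded_len: u64) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    source.take(recorded_len).read_to_end(&mut out)?;
    Ok(out)
}

/// Pretends a file held 5 bytes when its header was created and grew
/// to 11 bytes before its contents were read.
fn example() -> io::Result<Vec<u8>> {
    let grown = Cursor::new(b"hello world".to_vec());
    read_bounded(grown, 5)
}
```

Here `example()` yields only the first 5 bytes, matching the size that would have been written into the header.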

alexcrichton · 2022-03-03T20:53:02Z

Thanks for the PR, but I think this is working as intended. As the documentation mentions, if the I/O stream is not the same size as the header size, the archive will be corrupted. I don't think it's correct for this crate to use .take(); that seems better handled at the application layer if the input stream changes size after you originally observe it.
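
For illustration, the arithmetic behind that corruption can be sketched as follows. The 512-byte block size comes from the tar format; the function names are hypothetical, and the writer behavior modelled here (copy the whole stream, then pad to a block boundary) is an assumption rather than a statement about this crate's internals.

```rust
/// Size of one tar block; headers and entry data are laid out in 512-byte blocks.
const BLOCK: u64 = 512;

/// Rounds `len` up to the next multiple of the block size.
fn padded(len: u64) -> u64 {
    (len + BLOCK - 1) / BLOCK * BLOCK
}

/// Offset at which a reader expects the next header, given the size
/// recorded in the header of an entry that starts at `entry_start`.
fn expected_next_header(entry_start: u64, declared_size: u64) -> u64 {
    entry_start + BLOCK + padded(declared_size)
}

/// Offset at which the next header actually lands, ASSUMING the writer
/// copies the whole input stream and then pads to a block boundary.
fn actual_next_header(entry_start: u64, bytes_written: u64) -> u64 {
    entry_start + BLOCK + padded(bytes_written)
}
```

With a declared size of 1000 and 1600 bytes actually written, a reader would look for the next header at offset 1536 while the writer placed it at 2560. When both sizes round up to the same block count (say 100 declared and 200 written), the offsets agree but the extra bytes sit where zero padding is expected.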

posborne · 2022-03-03T21:10:00Z

@alexcrichton I think the difficulty with handling this (at least easily) at the application layer is that it rules out convenience methods like append_dir_all. The use case where we hit this is one I expect is not uncommon: providing an endpoint that streams an archive of log files from a directory that may be receiving frequent writes.

Do you see another layer in this crate where it would be appropriate to handle this, or does the functionality of append_dir_all need to be recreated elsewhere? We are OK going that route, but I expect a number of users of this crate may intermittently end up with corrupted archives without some change.

alexcrichton · 2022-03-03T21:12:00Z

Oh sorry, I can generalize what I'm thinking a bit from "application layer" to "callers of this function". This is almost surely a bug in append_dir_all and it's fine to fix there; I just don't think it should be fixed in append, which takes a generic input stream.

Addresses alexcrichton#282. Signed-off-by: Grant Elbert <[email protected]>

elbe0046 · 2022-03-04T00:13:26Z

@alexcrichton your point about append being the wrong layer for the fix definitely makes sense to me, thanks for that explanation! You preferred the fix to be applied in append_dir_all, but these changes are still a layer lower than that: the header's size is now enforced within append_path_with_name and append_file. Does this seem reasonable to you? I may be missing a more obvious solution that you had in mind.

With the new approach I'm not seeing a good way to reliably get test coverage on this edge case unfortunately. But I'm of course open to any guidance on this as well.

alexcrichton · 2022-03-04T15:43:19Z

Yeah this is where I roughly expected a fix to go, if any. One thing this makes me wonder about though is not only file extensions but also file truncations, as presumably that would also cause issues with concurrently modified files and archiving?

I have little-to-no experience with working with the filesystem while it's being concurrently modified, so I don't know what to do in situations like this. One possible fix would be to open the file and then get the length from the open file's metadata, but I don't know whether the file is 'snapshotted' like that, or whether later appends are reflected live on the file descriptor just opened.

posborne · 2022-03-04T17:56:49Z

> Yeah this is where I roughly expected a fix to go, if any. One thing this makes me wonder about though is not only file extensions but also file truncations, as presumably that would also cause issues with concurrently modified files and archiving?

I don't think there is going to be a great way to handle cases like truncation or writes at varying offsets in a file. For typical log rotation, I don't believe move and remove operations will present a problem, since I believe the kernel allows access to continue and only performs the remove once open fds are closed. Processes that know they may be working on a shared file concurrently might use file locking, but that isn't the situation we are in.

Even cp has no feature to copy a snapshot. In the case of a truncation, you'll just end up with the contents up until the point of truncation. Of course cp isn't writing the size of the dst file ahead of time in the same way as we are here.

I think the proposed change is probably still worthwhile to avoid corruption in a large number of cases, even if it is not able to eliminate the (possibly intractable) problem entirely.
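
To make the two cases concrete, here is a hedged sketch of a caller-side copy that tolerates growth but reports truncation. `copy_exact` is a hypothetical helper, not something in this crate or this PR, and it uses only the standard library.

```rust
use std::io::{self, Read, Write};

/// Copies exactly `declared_size` bytes from `src` to `dst`.
/// Extra bytes in `src` are ignored; a source that ends early is
/// reported as an `UnexpectedEof` error.
fn copy_exact<R: Read, W: Write>(src: R, dst: &mut W, declared_size: u64) -> io::Result<()> {
    let copied = io::copy(&mut src.take(declared_size), dst)?;
    if copied < declared_size {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            format!(
                "source shrank: expected {} bytes, got {}",
                declared_size, copied
            ),
        ));
    }
    Ok(())
}
```

Note that by the time the shortfall is detected, the bytes copied so far are already in `dst`, so this only reports the problem; it does not undo it.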

alexcrichton · 2022-03-07T16:08:16Z

Personally I'm hesitant to give a semblance that everything in this crate works with concurrent filesystem interactions, because basically nothing has been written with that in mind. While things could be restructured to work for the precise situation where files are only appended to, that's hard to explain in the code, where there would theoretically be a comment saying "we handle the file extension case but the file truncation case continues to produce a corrupt tarball".

Sorry to change my mind, but now I'm sort of thinking that this belongs externally. The goal of this crate is that all the helper functions are layered on top of one another, so it's OK to call whichever layer you need. If you find it difficult to build append_dir_all or similar externally, then this crate could perhaps grow configuration options or similar to make that easier.
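
As a rough sketch of what building this externally might look like, the walk below uses only the standard library and hands each file to a caller-supplied callback together with a reader capped at the length recorded from the open handle. The function name and the callback shape are assumptions for illustration. In real use the callback would build a header with that size and forward it to the crate's append.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::Path;

/// Walks `dir` recursively and calls `append` for each regular file with
/// the file's path, the length recorded from the open handle, and a reader
/// capped at that length. `append` is a hypothetical callback.
/// Symlinks are followed by `is_dir`/`is_file` here; a real
/// implementation would need to decide how to treat them.
fn for_each_file_bounded<F>(dir: &Path, append: &mut F) -> io::Result<()>
where
    F: FnMut(&Path, u64, &mut dyn Read) -> io::Result<()>,
{
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            for_each_file_bounded(&path, &mut *append)?;
        } else if path.is_file() {
            let file = File::open(&path)?;
            let len = file.metadata()?.len();
            // Bytes appended after this point are never handed to `append`.
            append(&path, len, &mut file.take(len))?;
        }
    }
    Ok(())
}
```

This covers only the append-only case discussed above. A file truncated between the metadata call and the read would still deliver fewer bytes than the recorded length.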

elbe0046 mentioned this pull request Mar 3, 2022

Fix handling contents added after header creation PhysicalGraph/tar-rs#1

Merged

elbe0046 closed this Mar 3, 2022

Fix handling contents added after header creation

9edb4c2

Addresses alexcrichton#282. Signed-off-by: Grant Elbert <[email protected]>

elbe0046 mentioned this pull request Mar 3, 2022

Fix handling contents added after header creation PhysicalGraph/tar-rs#2

Merged

elbe0046 reopened this Mar 3, 2022

elbe0046 force-pushed the fix-contents-added-after-creation branch from 6942937 to 9edb4c2 · March 3, 2022 23:43

elbe0046 force-pushed the fix-contents-added-after-creation branch from ab642af to 9edb4c2 · March 4, 2022 17:06
