fix data-loss issue with replays w/o checksums #689
Conversation
(force-pushed from 5967cbf to ee27d64)
I would suggest moving this to its own issue as a future improvement, so it doesn't get forgotten?
(force-pushed from ee27d64 to e99b001)
I have tried several extended cases. All were fine 👍
I did bump into an issue with symbolic/soft links, but I can't find tests that explain what the expected behavior is.
If you take the error case and use a soft link instead of a hard link, the duplicate file is removed but the soft link is not. If you have many of those, you are left with a lot of dead links. That doesn't seem expected.
I'm noting it here, but I assume it needs its own issue? Unless this is intended?
I will test it a bit later; in the meantime I won't merge this fix.
Can you detail the commands that you used to generate this case? I'm not sure I understand.
Create a number of duplicate files and define a soft link to one of them (the error case uses hard links).
Run rmlint.
Run the generated removal script.
Final result: the duplicate files are deleted, but the soft links that pointed to them are left dangling.
I would expect the soft links of the deleted files to be removed as well.
Would a subsequent pass detect the broken links?
It does... but that was partly the basis of my question. Should you expect to run it again to do some additional cleaning, or would it be more logical for the first clean to also remove the soft links? I couldn't find conclusive information on how this is supposed to be handled. It is not a big issue, but I can imagine that some users assume everything has been cleaned up after running the generated script.
(force-pushed from e99b001 to 36341a9)
That's an interesting point, but it's unrelated to this bug, which concerns replays exclusively. We could do two passes and check for dangling links, but I'm not sure whether that would be a bug fix or a feature.
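For reference, spotting a dangling link after the fact is cheap; a minimal sketch with plain POSIX lstat()/stat() (illustration only, not rmlint code):

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>

/* A path is a dangling symlink if lstat() sees a link
 * but stat() cannot resolve its target. */
static bool is_dangling_symlink(const char *path) {
    struct stat lst;
    if (lstat(path, &lst) != 0 || !S_ISLNK(lst.st_mode)) {
        return false;                  /* not a symlink (or not accessible) */
    }
    struct stat st;
    return stat(path, &st) != 0 && errno == ENOENT;  /* link target is gone */
}
```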
Me neither :grin:
Closes: #672
See bug details in #672 (comment).
This is just a bugfix, but I feel we could fix a bit more, although I'm not sure that it wouldn't introduce a regression. I haven't checked all the consumers of `file->digest`.

In `replay.c` we do:

I would do instead:

And reintroduce checks for `file->digest == NULL`; that would avoid pointless allocations. Another way to be safe would be to check `file->inode` to confirm that the files belong to the same group.

Are hardlinks the only situation in which rmlint does not generate checksums?
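To make the intent concrete, here is a minimal sketch of the guard I have in mind, using hypothetical stand-in types (`HypoFile`/`HypoDigest` are placeholders, not rmlint's real `RmFile`/`RmDigest`): consumers would tolerate a `NULL` digest and fall back to a device/inode comparison.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

typedef struct HypoDigest HypoDigest;  /* hypothetical opaque digest handle */

typedef struct {
    HypoDigest *digest;  /* NULL when no checksum was computed (e.g. hardlink twins) */
    dev_t dev;
    ino_t inode;
} HypoFile;

/* Treat two files as members of the same duplicate group only when we can
 * prove it: either both digests exist and compare equal, or the files are
 * hardlinks of each other (same device + inode). */
static bool hypo_same_group(const HypoFile *a, const HypoFile *b,
                            bool (*digest_equal)(const HypoDigest *,
                                                 const HypoDigest *)) {
    if (a->digest != NULL && b->digest != NULL) {
        return digest_equal(a->digest, b->digest);
    }
    /* No checksum available: fall back to the inode check suggested above. */
    return a->dev == b->dev && a->inode == b->inode;
}
```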
PS: @sahib, if you still follow this repository, the bird theme is nice. It's a change from the overly formal naming in the code of other projects :)