TAR: Implement extraction and archiving of hardlinks. #286
base: master
My intention with this is to be able to create TAR archives with hardlinks with maven-assembly-plugin. However, I am not sure if this is the best possible way to do this. I am not sure if it fits right into the PlexusIoResource framework.
plamentotev
left a comment
Thanks for contributing. Hard links are something we are still missing to better support tar files.
This change does not cover some edge cases. Let's say file and link are a file and a hard link to it in a tar file. In the most straightforward case we extract both entries and everything works fine. But there are other cases.
We can extract only link. If file is not present, I'm not sure the proper exception is caught so that the file is simply created. More importantly, if file does exist on the file system, a hard link to it would be created instead of a new file.
Plexus Archiver has a feature named file mappers, so after extraction file may be called someFile. What should we do in such cases?
Also, does the order of the entries matter? Is it possible that link is extracted before file, and in that case would file be a hard link to link?
| if (entry.getType() == ArchiveEntry.SYMLINK) { | ||
| final SymlinkDestinationSupplier plexusIoSymlinkResource = | ||
| (SymlinkDestinationSupplier) entry.getResource(); | ||
| final SymlinkDestinationSupplier plexusIoSymlinkResource = (SymlinkDestinationSupplier) ioResource; |
Is this change related to hard links? It seems to be just a refactoring to avoid calling entry.getResource() multiple times. If that is the case, I would rather have it in a separate PR, or at least a separate commit, as the change is already complex enough to understand on its own.
| final Path file = fileResource.getFile().toPath(); | ||
| if (Files.exists(file)) { | ||
| final BasicFileAttributeView fileAttributeView = | ||
| Files.getFileAttributeView(file, BasicFileAttributeView.class); |
I would rather use LinkOption.NOFOLLOW_LINKS to be on the safe side, although it seems that symbolic links are already handled (haven't tested it).
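A minimal sketch of what this suggestion could look like, and not the code in the PR: the class and method names (FileKeys, fileKeyOf) are hypothetical, and it uses Files.readAttributes rather than the BasicFileAttributeView shown in the diff. Passing LinkOption.NOFOLLOW_LINKS means the attributes describe the path itself rather than the target of a symbolic link.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper, not part of Plexus Archiver.
public final class FileKeys {
    private FileKeys() {}

    /**
     * Returns the file key of the path itself (symbolic links are not followed),
     * or null when the path does not exist or the platform provides no file key.
     */
    public static Object fileKeyOf(Path path) throws IOException {
        if (!Files.exists(path, LinkOption.NOFOLLOW_LINKS)) {
            return null;
        }
        BasicFileAttributes attrs =
                Files.readAttributes(path, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
        // fileKey() may legitimately be null, for example on file systems without inode-like identifiers.
        return attrs.fileKey();
    }
}
```

Keeping the lookup in its own method also keeps the calling method shorter. Callers need to treat a null key as "cannot tell whether this is a hard link".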
| } | ||
| } | ||
| boolean doCopy = true; |
I found this name confusing. isLink, or something similar, would probably be more appropriate, as the flag seems to indicate whether the entry is a link (symbolic or hard) to some other entry.
| if (Files.exists(file)) { | ||
| final BasicFileAttributeView fileAttributeView = | ||
| Files.getFileAttributeView(file, BasicFileAttributeView.class); | ||
| if (fileAttributeView != null) { |
I think it would be best to extract this check into a separate method to make this method easier to read.
| } | ||
| } | ||
| } | ||
| if (te == null) { |
To be honest I don't follow why this is added.
te is assigned in either the symlink branch or the hardlink branch. But the hardlink branch may also leave it unassigned. This check handles that case, plus all other entries that are neither symlinks nor hardlinks.
| String entryName, | ||
| final Date entryDate, | ||
| final boolean isDirectory, | ||
| final boolean isSymlink, |
This is a breaking change. I wonder if we can avoid it.
I don't see how.
Also, we need to be extra careful when extracting multiple copies instead of creating hard links. If a tar file contains a single 1 MB file and thousands of hard links to it, then the extracted content (if the files are copied) could take 1 GB or more, possibly resulting in a DoS attack.
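One way to bound that risk, sketched under assumptions: a byte budget that the copy fallback charges before duplicating a hard link target. CopyBudget and its methods are hypothetical names for illustration and are not an existing Plexus Archiver API.

```java
import java.io.IOException;

// Hypothetical guard for the "copy instead of hard link" fallback.
public final class CopyBudget {
    private final long maxBytes;
    private long usedBytes;

    public CopyBudget(long maxBytes) {
        if (maxBytes < 0) {
            throw new IllegalArgumentException("maxBytes must not be negative");
        }
        this.maxBytes = maxBytes;
    }

    /** Reserves size bytes for one fallback copy, or fails if the budget would be exceeded. */
    public void charge(long size) throws IOException {
        if (size < 0) {
            throw new IllegalArgumentException("size must not be negative");
        }
        // Compare against the remaining budget to avoid overflow in usedBytes + size.
        if (size > maxBytes - usedBytes) {
            throw new IOException("Copying hard link targets would exceed " + maxBytes + " bytes");
        }
        usedBytes += size;
    }

    public long usedBytes() {
        return usedBytes;
    }
}
```

With a 1 MB target and a 100 MB budget, the hundred-and-first copied link would be rejected instead of silently expanding the extraction.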
doCopy -> isLink
Use NOFOLLOW_LINKS to get file attributes for hardlink.
FYI, I will not be pursuing this pull request further. I was curious if it could be done. But it is unclear to me how to do this so that it matches all the requirements and design criteria of the library. Feel free to pick it up or close it.
Pull request overview
This PR implements hardlink support for TAR archives, enabling both extraction and creation of hardlinks in TAR files. The implementation detects hardlinks during archiving by tracking file keys, creates appropriate TAR entries, and properly extracts them by creating actual hardlinks on the filesystem (with fallback to file copying when hardlink creation is unsupported).
- Added hardlink detection during TAR archiving using file key tracking
- Modified extraction logic to create hardlinks when encountered in TAR archives
- Added comprehensive test coverage for the hardlink round-trip scenario
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/main/java/org/codehaus/plexus/archiver/tar/TarArchiver.java | Implements hardlink detection using file keys and creates LF_LINK entries; adds preserveHardLinks option to TarOptions |
| src/main/java/org/codehaus/plexus/archiver/tar/TarUnArchiver.java | Updates extraction to detect both symlinks and hardlinks, passing link destination to extractFile method |
| src/main/java/org/codehaus/plexus/archiver/AbstractUnArchiver.java | Adds hardlink creation logic with UnsupportedOperationException handling and copy fallback; adds isSymlink parameter to distinguish symlinks from hardlinks |
| src/main/java/org/codehaus/plexus/archiver/zip/AbstractZipUnArchiver.java | Updates extractFile call to include new isSymlink parameter |
| src/test/java/org/codehaus/plexus/archiver/HardlinkTest.java | Adds test verifying hardlink preservation through extraction and re-archiving |
| src/test/java/org/codehaus/plexus/archiver/tar/TarFileTest.java | Updates to skip hardlinks when iterating entries and improves assertion method |
| src/test/java/org/codehaus/plexus/archiver/AbstractUnArchiverTest.java | Updates extractFile method call to include new isSymlink parameter |
| src/test/resources/symlinks/regen.sh | Adds commands to generate hardlinks test archive |
| src/test/resources/hardlinks/hardlinks.tar | Binary TAR archive containing hardlink test data |
| private TarArchiveOutputStream tOut; | ||
| private final Map<Object, String> seenFiles = new HashMap<>(10); |
Copilot
AI
Jan 1, 2026
The seenFiles map is never cleared between archive creation operations. If a TarArchiver instance is reused to create multiple archives, the map will contain stale file keys from previous archives, potentially causing incorrect hardlink references across different archive creation operations. Consider clearing the map at the beginning of the execute() method or in the cleanUp() method.
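A rough sketch of the tracking idea with an explicit reset, assuming the map is keyed by file key and stores the archive name of the first entry seen for that key. HardLinkTracker, firstNameFor, and reset are hypothetical names; where reset would be called from (execute() or cleanUp()) is left open here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper illustrating per-archive hard link tracking.
public final class HardLinkTracker {
    private final Map<Object, String> seenFiles = new HashMap<>();

    /**
     * Returns the archive name of the first entry recorded for this file key,
     * or null if this is the first entry (which is then recorded).
     * A null key means the platform gave no identity, so nothing is tracked.
     */
    public String firstNameFor(Object fileKey, String entryName) {
        if (fileKey == null) {
            return null;
        }
        return seenFiles.putIfAbsent(fileKey, entryName);
    }

    /** Forgets all recorded keys; intended to run once per archive creation. */
    public void reset() {
        seenFiles.clear();
    }
}
```

A non-null return value would mean "write a link entry pointing at that name"; after reset(), the same file starts a fresh archive as a regular entry again.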
| try { | ||
| Files.createLink( | ||
| targetFileName.toPath(), | ||
| FileUtils.resolveFile(dir, linkDestination).toPath()); | ||
| doCopy = false; | ||
| } catch (final UnsupportedOperationException ex) { | ||
| if (warnCannotHardlink) { | ||
| getLogger().warn("Creating hardlinks is not supported"); | ||
| warnCannotHardlink = false; | ||
| } | ||
| // We will do a copy instead. | ||
| } |
Copilot
AI
Jan 1, 2026
The hardlink creation code only catches UnsupportedOperationException but Files.createLink() can throw IOException for various error conditions (e.g., file already exists, insufficient permissions, I/O errors). These IOExceptions will propagate up and may not be handled appropriately, potentially causing the extraction to fail when it could fall back to copying the file content instead. Consider also catching IOException and falling back to copying in that case.
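The fallback described here might look roughly like the following. This is a sketch, not the PR's code: LinkOrCopy and linkOrCopy are hypothetical names, and whether a silent copy is the right reaction to every IOException is a design decision the thread leaves open.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: try to hard link, otherwise copy the content.
public final class LinkOrCopy {
    private LinkOrCopy() {}

    /**
     * Creates link as a hard link to existing. If linking is unsupported or
     * fails with an IOException, copies the content of existing to link instead.
     *
     * @return true if a hard link was created, false if the content was copied
     */
    public static boolean linkOrCopy(Path link, Path existing) throws IOException {
        try {
            Files.createLink(link, existing);
            return true;
        } catch (UnsupportedOperationException | IOException ex) {
            // Still throws if existing is missing or link is not writable.
            Files.copy(existing, link, StandardCopyOption.REPLACE_EXISTING);
            return false;
        }
    }
}
```

Returning whether a link was created lets the caller decide separately whether timestamps and permissions should be applied to the result afterwards.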
| boolean doCopy = true; | ||
| if (!StringUtils.isEmpty(linkDestination)) { | ||
| if (isSymlink) { | ||
| SymlinkUtils.createSymbolicLink(targetFileName, new File(linkDestination)); | ||
| doCopy = false; | ||
| } else { | ||
| try { | ||
| Files.createLink( | ||
| targetFileName.toPath(), | ||
| FileUtils.resolveFile(dir, linkDestination).toPath()); | ||
| doCopy = false; | ||
| } catch (final UnsupportedOperationException ex) { | ||
| if (warnCannotHardlink) { | ||
| getLogger().warn("Creating hardlinks is not supported"); | ||
| warnCannotHardlink = false; | ||
| } | ||
| // We will do a copy instead. | ||
| } | ||
| } | ||
| } else if (isDirectory) { | ||
| targetFileName.mkdirs(); | ||
| } else { | ||
| doCopy = false; | ||
| } |
Copilot
AI
Jan 1, 2026
The hardlink creation logic has several issues:
- Only UnsupportedOperationException is caught, but Files.createLink() can throw IOException for various error conditions (e.g., file already exists, insufficient permissions, I/O errors), which should also be caught to fall back to copying.
- After creating links (hardlinks or symlinks), the code continues to execute outside this block and will call setLastModified() and chmod() on the link at lines 347 and 350. On most platforms, these operations on links affect the target file, not the link itself, which is likely not the intended behavior.
Consider catching IOException in addition to UnsupportedOperationException, and tracking whether a link was created to avoid calling setLastModified() and chmod() on links.
This implements hardlinks in TAR. There is also a test to prove it.