@wilx wilx commented May 25, 2023

This implements hardlinks in TAR. There is also a test to prove it.

Author

wilx commented May 25, 2023

My intention with this is to be able to create TAR archives with hardlinks with maven-assembly-plugin.

However, I am not sure if this is the best possible way to do this. I am not sure if it fits right into the PlexusIoResource framework.

wilx force-pushed the master-hardlinks branch from 1ffb0f5 to 913783b on May 25, 2023 19:28

@plamentotev plamentotev left a comment

Thanks for contributing. Hard links are something we are still missing to better support tar files.

This change does not cover some edge cases. Let's say "file" and "link" are a file and a hard link to it in a tar file. In the most straightforward case we extract both entries and everything works fine. But there are other cases.

We can extract only "link". If "file" is not present, I'm not sure the proper exception is caught so that the file is simply created. More importantly, if "file" does exist on the file system, a hard link to it would be created instead of a new file.

Plexus Archiver has a feature named file mappers. So after extraction, "file" may be called "someFile". What should we do in such cases?

Also, does the order of the entries matter? Is it possible that "link" is extracted before "file", and in that case would "file" be a hard link to "link"?

  if (entry.getType() == ArchiveEntry.SYMLINK) {
-     final SymlinkDestinationSupplier plexusIoSymlinkResource =
-             (SymlinkDestinationSupplier) entry.getResource();
+     final SymlinkDestinationSupplier plexusIoSymlinkResource = (SymlinkDestinationSupplier) ioResource;

Is this change related to hard links? It seems to be just a refactoring to avoid calling entry.getResource() multiple times. If so, I would rather have it in a separate PR, or at least a separate commit, as the hard link change is already complex enough to understand on its own.

final Path file = fileResource.getFile().toPath();
if (Files.exists(file)) {
final BasicFileAttributeView fileAttributeView =
Files.getFileAttributeView(file, BasicFileAttributeView.class);

I would rather use LinkOption.NOFOLLOW_LINKS to be on the safe side, although it seems that symbolic links are already handled (haven't tested it).

}
}

boolean doCopy = true;

I found this name confusing. isLink, or something similar, would probably be more appropriate, as the flag seems to indicate whether the entry is a link (symbolic or hard) to some other entry.

if (Files.exists(file)) {
final BasicFileAttributeView fileAttributeView =
Files.getFileAttributeView(file, BasicFileAttributeView.class);
if (fileAttributeView != null) {

I think it would be best to extract this check into a separate method to make this method easier to read.
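A minimal sketch of what such an extracted helper might look like. The class and method names (FileKeys, fileKeyOrNull) are hypothetical and do not come from the PR, and the use of LinkOption.NOFOLLOW_LINKS is an assumption that follows the suggestion in the earlier review comment rather than the code as submitted.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributeView;

// Hypothetical helper, not part of Plexus Archiver.
public final class FileKeys {
    private FileKeys() {}

    /**
     * Returns the file key of the given path itself (symbolic links are not
     * followed), or null if the path does not exist, the attribute view is
     * unavailable, or the platform does not provide file keys.
     */
    public static Object fileKeyOrNull(Path file) throws IOException {
        if (!Files.exists(file, LinkOption.NOFOLLOW_LINKS)) {
            return null;
        }
        BasicFileAttributeView view =
                Files.getFileAttributeView(file, BasicFileAttributeView.class, LinkOption.NOFOLLOW_LINKS);
        if (view == null) {
            return null;
        }
        // fileKey() may itself return null on file systems without such a key.
        return view.readAttributes().fileKey();
    }
}
```

The caller would then only need a single null check before consulting the map of already seen files.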

}
}
}
if (te == null) {

To be honest I don't follow why this is added.

Author

te is assigned in either the symlink branch or the hardlink branch. But the hardlink branch may also leave it unassigned. This check handles that case, plus all entries that are neither symlinks nor hardlinks.

String entryName,
final Date entryDate,
final boolean isDirectory,
final boolean isSymlink,

This is a breaking change. I wonder if we can avoid it.

Author

I don't see how.


plamentotev commented May 27, 2023

Also, we need to be extra careful when extracting multiple copies instead of creating hard links. If a tar file contains a single 1 MB file and thousands of hard links to it, the extracted content (if files are copied) could be 1 GB or more, possibly enabling a DoS attack.

Author

wilx commented Aug 21, 2023

FYI, I will not be pursuing this pull request further. I was curious if it could be done. But it is unclear to me how to do this so that it matches all the requirements and design criteria of the library. Feel free to pick it up or close it.


Copilot AI left a comment

Pull request overview

This PR implements hardlink support for TAR archives, enabling both extraction and creation of hardlinks in TAR files. The implementation detects hardlinks during archiving by tracking file keys, creates appropriate TAR entries, and properly extracts them by creating actual hardlinks on the filesystem (with fallback to file copying when hardlink creation is unsupported).

  • Added hardlink detection during TAR archiving using file key tracking
  • Modified extraction logic to create hardlinks when encountered in TAR archives
  • Added comprehensive test coverage for the hardlink round-trip scenario

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| src/main/java/org/codehaus/plexus/archiver/tar/TarArchiver.java | Implements hardlink detection using file keys and creates LF_LINK entries; adds preserveHardLinks option to TarOptions |
| src/main/java/org/codehaus/plexus/archiver/tar/TarUnArchiver.java | Updates extraction to detect both symlinks and hardlinks, passing link destination to extractFile method |
| src/main/java/org/codehaus/plexus/archiver/AbstractUnArchiver.java | Adds hardlink creation logic with UnsupportedOperationException handling and copy fallback; adds isSymlink parameter to distinguish symlinks from hardlinks |
| src/main/java/org/codehaus/plexus/archiver/zip/AbstractZipUnArchiver.java | Updates extractFile call to include new isSymlink parameter |
| src/test/java/org/codehaus/plexus/archiver/HardlinkTest.java | Adds test verifying hardlink preservation through extraction and re-archiving |
| src/test/java/org/codehaus/plexus/archiver/tar/TarFileTest.java | Updates to skip hardlinks when iterating entries and improves assertion method |
| src/test/java/org/codehaus/plexus/archiver/AbstractUnArchiverTest.java | Updates extractFile method call to include new isSymlink parameter |
| src/test/resources/symlinks/regen.sh | Adds commands to generate hardlinks test archive |
| src/test/resources/hardlinks/hardlinks.tar | Binary TAR archive containing hardlink test data |


private TarArchiveOutputStream tOut;

private final Map<Object, String> seenFiles = new HashMap<>(10);

Copilot AI Jan 1, 2026

The seenFiles map is never cleared between archive creation operations. If a TarArchiver instance is reused to create multiple archives, the map will contain stale file keys from previous archives, potentially causing incorrect hardlink references across different archive creation operations. Consider clearing the map at the beginning of the execute() method or in the cleanUp() method.
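As an illustration of the suggested fix, the following sketch isolates the bookkeeping in a small class with an explicit reset. SeenFilesTracker and its methods are hypothetical names used only for this example; where exactly TarArchiver would call reset() (for instance from execute() or cleanUp(), as proposed above) is left open.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class for illustration; not part of Plexus Archiver.
public final class SeenFilesTracker {
    // Maps a file key to the name of the first archive entry written for it.
    private final Map<Object, String> seenFiles = new HashMap<>();

    /**
     * Records the entry name for a file key on first sight.
     *
     * @return the entry name recorded earlier for the same key (the hard link
     *         target), or null if this is the first sighting or the key is null
     */
    public String recordOrGetFirst(Object fileKey, String entryName) {
        if (fileKey == null) {
            return null; // no key available, treat as a regular file
        }
        return seenFiles.putIfAbsent(fileKey, entryName);
    }

    /** Forgets all keys; intended to be called once per archive. */
    public void reset() {
        seenFiles.clear();
    }
}
```

Calling reset() at the start of each archive run keeps keys from a previous archive from producing link entries that point at files absent from the current one.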

Comment on lines +324 to +335
try {
    Files.createLink(
            targetFileName.toPath(),
            FileUtils.resolveFile(dir, linkDestination).toPath());
    doCopy = false;
} catch (final UnsupportedOperationException ex) {
    if (warnCannotHardlink) {
        getLogger().warn("Creating hardlinks is not supported");
        warnCannotHardlink = false;
    }
    // We will do a copy instead.
}

Copilot AI Jan 1, 2026

The hardlink creation code only catches UnsupportedOperationException but Files.createLink() can throw IOException for various error conditions (e.g., file already exists, insufficient permissions, I/O errors). These IOExceptions will propagate up and may not be handled appropriately, potentially causing the extraction to fail when it could fall back to copying the file content instead. Consider also catching IOException and falling back to copying in that case.
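A rough sketch of the fallback described here, under simplifying assumptions: LinkOrCopy and linkOrCopy are hypothetical names, and the copy is taken from the already extracted target file, whereas the PR code falls back to copying the entry content from the archive. Whether copying is acceptable for every IOException (for example when the link name already exists) is a design decision this sketch does not settle.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration; not part of Plexus Archiver.
public final class LinkOrCopy {
    private LinkOrCopy() {}

    /**
     * Tries to create a hard link named {@code link} pointing at {@code existing}.
     * If the file system does not support hard links, or link creation fails
     * with an I/O error, the content is copied instead.
     *
     * @return true if a hard link was created, false if a copy was made
     */
    public static boolean linkOrCopy(Path link, Path existing) throws IOException {
        try {
            Files.createLink(link, existing);
            return true;
        } catch (UnsupportedOperationException | IOException ex) {
            // Fallback: copy the content; an IOException here still propagates.
            Files.copy(existing, link, StandardCopyOption.REPLACE_EXISTING);
            return false;
        }
    }
}
```

The boolean result lets the caller tell a real link from a copy, which is useful if later steps such as setting timestamps or permissions should differ between the two.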

Comment on lines +318 to +340
boolean doCopy = true;
if (!StringUtils.isEmpty(linkDestination)) {
    if (isSymlink) {
        SymlinkUtils.createSymbolicLink(targetFileName, new File(linkDestination));
        doCopy = false;
    } else {
        try {
            Files.createLink(
                    targetFileName.toPath(),
                    FileUtils.resolveFile(dir, linkDestination).toPath());
            doCopy = false;
        } catch (final UnsupportedOperationException ex) {
            if (warnCannotHardlink) {
                getLogger().warn("Creating hardlinks is not supported");
                warnCannotHardlink = false;
            }
            // We will do a copy instead.
        }
    }
} else if (isDirectory) {
    targetFileName.mkdirs();
} else {
    doCopy = false;
}

Copilot AI Jan 1, 2026

The hardlink creation logic has several issues:

  1. Only UnsupportedOperationException is caught, but Files.createLink() can throw IOException for various error conditions (e.g., file already exists, insufficient permissions, I/O errors), which should also be caught to fall back to copying.
  2. After creating links (hardlinks or symlinks), the code continues to execute outside this block and will call setLastModified() and chmod() on the link at lines 347 and 350. On most platforms, these operations on links affect the target file, not the link itself, which is likely not the intended behavior.

Consider catching IOException in addition to UnsupportedOperationException, and tracking whether a link was created to avoid calling setLastModified() and chmod() on links.
