Support .gitattributes export-ignore in distribution archives #515
jberdine wants to merge 2 commits into tarides:main
Parse .gitattributes files and exclude paths marked with the export-ignore
attribute from distribution tarballs. This is the mechanism used by
git-archive to exclude files, allowing projects to exclude dev-only files
like dune-workspace from releases.
Supported patterns:
- Exact matches: dune-workspace
- Directory patterns: .github/**
- Glob patterns: *.log, test_*, file?.txt
- Double star: **/build, src/**/test.ml
Paths are normalized before matching (handles ./ and ../).
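As an illustration of what normalizing `./` and `../` could look like, here is a minimal sketch using only the OCaml standard library. The function name `normalize` is hypothetical and the PR's actual code may work differently (for example on Fpath values); the sketch assumes relative, `/`-separated paths.

```ocaml
(* Hypothetical sketch, not the PR's code: normalize a relative,
   '/'-separated path by dropping "." and empty segments and resolving
   ".." against the preceding segment. *)
let normalize path =
  let step acc seg =
    match (seg, acc) with
    | ("" | "."), _ -> acc
    (* ".." cancels the previous segment, unless that segment is itself ".." *)
    | "..", top :: rest when top <> ".." -> rest
    (* ordinary segment, or a ".." that cannot be resolved: keep it *)
    | _ -> seg :: acc
  in
  String.split_on_char '/' path
  |> List.fold_left step []
  |> List.rev
  |> String.concat "/"
```

Because empty segments are dropped, this sketch also discards a leading `/`, which is acceptable only under the assumption that archive paths are relative.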
Not supported:
- Escaped patterns (\! for literal !)
- Quoted patterns ("a b" for patterns with spaces)
- Case insensitivity (core.ignorecase)
- Negation patterns (!pattern)
- Subdirectory .gitattributes files
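To make the supported pattern semantics concrete, the following is a minimal sketch of a matcher where `*` and `?` never cross a `/` and `**` stands for whole path segments. It is not the PR's implementation (which translates globs to regular expressions); the names `match_seg`, `match_segs` and `glob_matches` are hypothetical. As a simplification, it matches against the full path only and lets `**` match zero segments, whereas git also matches slash-free patterns against basenames at any depth.

```ocaml
(* Hypothetical sketch, not the PR's implementation. *)

(* Match one path segment (containing no '/') against a segment pattern
   where '*' matches any run of characters and '?' exactly one. *)
let match_seg pat seg =
  let plen = String.length pat and slen = String.length seg in
  let rec go p s =
    if p = plen then s = slen
    else
      match pat.[p] with
      | '*' -> go (p + 1) s || (s < slen && go p (s + 1))
      | '?' -> s < slen && go (p + 1) (s + 1)
      | c -> s < slen && seg.[s] = c && go (p + 1) (s + 1)
  in
  go 0 0

(* Match lists of segments; a "**" segment matches zero or more segments. *)
let rec match_segs pats segs =
  match (pats, segs) with
  | [], [] -> true
  | "**" :: rest, _ ->
      match_segs rest segs
      || (match segs with [] -> false | _ :: tl -> match_segs pats tl)
  | p :: ps, s :: ss -> match_seg p s && match_segs ps ss
  | _ -> false

let glob_matches pattern path =
  match_segs
    (String.split_on_char '/' pattern)
    (String.split_on_char '/' path)
```

For example, `glob_matches "src/**/test.ml" "src/a/b/test.ml"` holds, while `glob_matches "file?.txt" "file12.txt"` does not because `?` consumes exactly one character.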
Testing:
- Unit tests comparing behavior against git check-attr
- Tests cover pattern matching and parsing edge cases
- .gitattributes content is generated from test cases to ensure sync
- Archive integration test verifies end-to-end exclusion without requiring git
Signed-off-by: Josh Berdine <josh@berdine.net>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Leonidas-from-XIV left a comment
Very nice PR. The extensive test coverage is definitely a highlight and good to see, as the glob-to-regex code is somewhat hairy.
I have some suggestions for improvements here.
| let path_set_of_dir dir ~exclude_paths = | ||
| let not_excluded p = Ok (not (Fpath.Set.mem (Fpath.base p) exclude_paths)) in | ||
| let path_set_of_dir dir ~exclude_paths ~export_ignore = | ||
| let not_excluded p = |
To avoid the negation, maybe rename it to included. It's kind of hard to wrap one's head around the double negated name and then the negation in the code.
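A minimal sketch of the suggested rename, assuming plain string paths and a standard-library string set in place of the PR's Fpath-based types; `String_set` and `keep` are hypothetical names introduced only for illustration.

```ocaml
module String_set = Set.Make (String)

(* Stand-in for the PR's predicate: a path is included when its basename
   is not listed in [exclude_paths]. The positive name leaves a single
   negation, in the body. *)
let included ~exclude_paths p =
  not (String_set.mem (Filename.basename p) exclude_paths)

(* Hypothetical caller: keep only the included paths. *)
let keep ~exclude_paths paths = List.filter (included ~exclude_paths) paths
```

With the positive name, call sites such as `List.filter (included ~exclude_paths)` read directly as "keep what is included".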
| (* Skip directories - they will be created implicitly when their | ||
| contents are added. This ensures that directories whose contents | ||
| are excluded via export-ignore patterns don't appear as empty | ||
| directories in the archive. *) |
But isn't this what one would expect to do? If I exclude a directory, I would expect the directory to show up, but not its contents.
Git does this, but it's more to do with the fact that Git doesn't track directories at all, so an empty directory is unrepresentable. But dune-release doesn't have this issue.
| is preserved. | ||
| (** [tar dir ~exclude_paths ~export_ignore ~root ~mtime] is a (us)tar archive | ||
| that contains the file hierarchy [dir] except: | ||
| - relative hierarchies present in [exclude_paths] (basename matching) |
I wonder if it wouldn't make more sense to just translate exclude_paths to a Gitattributes.pattern and use the more generic mechanism. That way there's only one way to ignore paths.
| (any chars except /), [?] (single char except /), and [**] (any path | ||
| segments, but only when adjacent to /). *) | ||
| let glob_to_re pattern = | ||
| let buf = Buffer.create (String.length pattern * 2) in |
Why is the initial size pattern * 2?
| let glob_to_re pattern = | ||
| let buf = Buffer.create (String.length pattern * 2) in | ||
| Buffer.add_char buf '^'; | ||
| let len = String.length pattern in |
Nitpick, but the length of the pattern is determined twice, so it could be pulled up in the function and reused.
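A sketch of what hoisting the length might look like. This is a deliberately simplified translation, not the PR's `glob_to_re`: it handles only `*` and `?`, ignores `**`, and emits a generic regex string rather than using a regex library. The name `glob_to_re_simplified` is hypothetical.

```ocaml
(* Hypothetical, simplified variant: [len] is computed once and reused
   for both the buffer size hint and the loop bound. *)
let glob_to_re_simplified pattern =
  let len = String.length pattern in
  let buf = Buffer.create (len * 2) in
  Buffer.add_char buf '^';
  for i = 0 to len - 1 do
    match pattern.[i] with
    | '*' -> Buffer.add_string buf "[^/]*" (* any chars except '/' *)
    | '?' -> Buffer.add_string buf "[^/]" (* one char except '/' *)
    | ('.' | '+' | '(' | ')' | '[' | ']' | '{' | '}' | '^' | '$' | '|' | '\\')
      as c ->
        (* escape regex metacharacters so they match literally *)
        Buffer.add_char buf '\\';
        Buffer.add_char buf c
    | c -> Buffer.add_char buf c
  done;
  Buffer.add_char buf '$';
  Buffer.contents buf
```

The argument to `Buffer.create` is only an initial capacity hint; the buffer grows as needed, so the factor of 2 affects reallocation, not correctness.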
| (* Strip UTF-8 BOM if present at start of file *) | ||
| let content = | ||
| if String.is_prefix ~affix:"\xef\xbb\xbf" content then | ||
| String.Sub.to_string (String.sub ~start:3 content) |
And use String.length utf8_bom here.
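A minimal sketch of the idea, assuming a named `utf8_bom` constant as the comment implies. It uses the standard library (`String.starts_with`, available since OCaml 4.13) rather than the Astring functions seen in the PR snippet; `strip_bom` is a hypothetical name.

```ocaml
(* The UTF-8 byte order mark, named once so its length is never hard-coded. *)
let utf8_bom = "\xef\xbb\xbf"

(* Hypothetical helper: drop a leading BOM if present. *)
let strip_bom content =
  if String.starts_with ~prefix:utf8_bom content then
    let n = String.length utf8_bom in
    String.sub content n (String.length content - n)
  else content
```

Deriving the offset from `String.length utf8_bom` keeps the prefix test and the slice in agreement if the constant ever changes.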
| (** {1 Patterns} *) | ||
| type pattern |
Suggested change:
| - type pattern |
| + type t |
| let files = | ||
| [ | ||
| ("CHANGES.md", "changes"); | ||
| ("foo.opam", "opam"); |
Suggested change:
| - ("foo.opam", "opam"); |
| + ("foo.opam", {|opam-version: "2.0"|}); |
| in | ||
| List.fold_left | ||
| (fun acc file -> acc >>= fun () -> create_file file) | ||
| (Ok ()) files |
I think List.iter followed by some kind of Result.List.all_unit (or the like) would be a bit easier to understand but it's of minor importance.
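Two possible shapes for this, sketched with the standard library only. Both `all_unit` and `iter_result` are hypothetical helpers, not functions known to exist in the project's dependencies.

```ocaml
(* Hypothetical helper: collapse a list of unit results into one,
   returning the first error if there is any. *)
let all_unit results =
  List.fold_left
    (fun acc r -> match acc with Error _ -> acc | Ok () -> r)
    (Ok ()) results

(* Hypothetical helper: apply [f] to each element in order and stop at
   the first error, like the original fold with [>>=]. *)
let rec iter_result f = function
  | [] -> Ok ()
  | x :: xs -> (
      match f x with Ok () -> iter_result f xs | Error _ as e -> e)
```

One design difference is worth noting: `all_unit (List.map create_file files)` would attempt every file even after a failure, whereas `iter_result create_file files` stops at the first error, which matches the behavior of the fold in the PR.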
| - Double star in path: `**/build`, `src/**/test.ml` | ||
| - Path normalization: handles `./` and `../` in paths | ||
| **Not supported:** |
I think it would be good to state what "not supported" means in practice. Will it error? Will it ignore the pattern?