Skip to content

Support .gitattributes export-ignore in distribution archives#515

Open
jberdine wants to merge 2 commits intotarides:mainfrom
jberdine:export-ignore
Open

Support .gitattributes export-ignore in distribution archives#515
jberdine wants to merge 2 commits intotarides:mainfrom
jberdine:export-ignore

Conversation

@jberdine
Copy link
Copy Markdown

Parse .gitattributes files and exclude paths marked with the export-ignore
attribute from distribution tarballs. This is the mechanism used by
git-archive to exclude files, allowing projects to exclude dev-only files
like dune-workspace from releases.

Supported patterns:

  • Exact matches: dune-workspace
  • Directory patterns: .github/**
  • Glob patterns: .log, test_, file?.txt
  • Double star: /build, src//test.ml

Paths are normalized before matching (handles ./ and ../).

Not supported:

  • Escaped patterns (\! for literal !)
  • Quoted patterns ("a b" for patterns with spaces)
  • Case insensitivity (core.ignorecase)
  • Negation patterns (!pattern)
  • Subdirectory .gitattributes files

Testing:

  • Unit tests comparing behavior against git check-attr
  • Tests cover pattern matching and parsing edge cases
  • .gitattributes content is generated from test cases to ensure sync
  • Archive integration test verifies end-to-end exclusion without requiring git

Signed-off-by: Josh Berdine josh@berdine.net

Parse .gitattributes files and exclude paths marked with the export-ignore
attribute from distribution tarballs. This is the mechanism used by
git-archive to exclude files, allowing projects to exclude dev-only files
like dune-workspace from releases.

Supported patterns:
- Exact matches: dune-workspace
- Directory patterns: .github/**
- Glob patterns: *.log, test_*, file?.txt
- Double star: **/build, src/**/test.ml

Paths are normalized before matching (handles ./ and ../).

Not supported:
- Escaped patterns (\\! for literal \!)
- Quoted patterns ("a b" for patterns with spaces)
- Case insensitivity (core.ignorecase)
- Negation patterns (\!pattern)
- Subdirectory .gitattributes files

Testing:
- Unit tests comparing behavior against git check-attr
- Tests cover pattern matching and parsing edge cases
- .gitattributes content is generated from test cases to ensure sync
- Archive integration test verifies end-to-end exclusion without requiring git

Signed-off-by: Josh Berdine <josh@berdine.net>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Copy link
Copy Markdown
Member

@Leonidas-from-XIV Leonidas-from-XIV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice PR, the extensive test coverage is definitely a highlight and good to see as the glob to regex code is somewhat hairy.

I have some suggestions for improvements here.

let path_set_of_dir dir ~exclude_paths =
let not_excluded p = Ok (not (Fpath.Set.mem (Fpath.base p) exclude_paths)) in
let path_set_of_dir dir ~exclude_paths ~export_ignore =
let not_excluded p =
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid the negation, maybe rename it to included. It's kind of hard to wrap one's head around the double negated name and then the negation in the code.

(* Skip directories - they will be created implicitly when their
contents are added. This ensures that directories whose contents
are excluded via export-ignore patterns don't appear as empty
directories in the archive. *)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But isn't this what one would expect to do? If I exclude a directory, I would expect the directory to show up, but not its contents.

Git does this but its more to do with the fact that Git doesn't track directories at all, so an empty directory is unrepresentable. But dune-release doesn't have this issue.

is preserved.
(** [tar dir ~exclude_paths ~export_ignore ~root ~mtime] is a (us)tar archive
that contains the file hierarchy [dir] except:
- relative hierarchies present in [exclude_paths] (basename matching)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it wouldn't make more sense to just translate exclude_paths to a Gitattributes.pattern and use the more generic mechanism. That way there's only one way to ignore paths.

(any chars except /), [?] (single char except /), and [**] (any path
segments, but only when adjacent to /). *)
let glob_to_re pattern =
let buf = Buffer.create (String.length pattern * 2) in
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the initial size pattern * 2?

let glob_to_re pattern =
let buf = Buffer.create (String.length pattern * 2) in
Buffer.add_char buf '^';
let len = String.length pattern in
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick but the length of the pattern is determined twice, so could be pulled up in the function an reused.

(* Strip UTF-8 BOM if present at start of file *)
let content =
if String.is_prefix ~affix:"\xef\xbb\xbf" content then
String.Sub.to_string (String.sub ~start:3 content)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And use String.length utf8_bom here.


(** {1 Patterns} *)

type pattern
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
type pattern
type t

let files =
[
("CHANGES.md", "changes");
("foo.opam", "opam");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
("foo.opam", "opam");
("foo.opam", {|opam-version: "2.0"|});

in
List.fold_left
(fun acc file -> acc >>= fun () -> create_file file)
(Ok ()) files
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think List.iter followed by some kind of Result.List.all_unit (or the like) would be a bit easier to understand but it's of minor importance.

- Double star in path: `**/build`, `src/**/test.ml`
- Path normalization: handles `./` and `../` in paths

**Not supported:**
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to state what "not supported" means in practice. Will it error? Will it ignore the pattern?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants