@edolstra
Collaborator

@edolstra edolstra commented Nov 20, 2025

Motivation

Before Nix 2.20, we used the git CLI, which applies Git filters (in particular doing end-of-line conversion based on .gitattributes) and applies export-ignore. In 2.20, we switched to libgit2 and stopped doing those things, which is probably better for reproducibility. However, this breaks existing lock files / fetchTree calls for Git inputs that rely on those features, since it invalidates the NAR hash.

So as a backward compatibility hack, we now check the NAR hash computed over the Git tree without filters and export-ignore applied. If there is a hash mismatch, we try again with filters and export-ignore. If that succeeds, we print a warning and return the latter tree.
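
The fallback described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `NarCheck`, `checkNarHash` and the `computeNarHash` callback are hypothetical names, and the callback stands in for whatever actually hashes the Git tree.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Outcome of checking a locked NAR hash against a Git tree (hypothetical type).
enum class NarCheck { Match, LegacyMatch, Mismatch };

// `computeNarHash(legacy)` is an assumed callback: it returns the NAR hash of the
// tree, with Git filters and export-ignore applied when `legacy` is true.
inline NarCheck checkNarHash(
    const std::string & expected,
    const std::function<std::string(bool)> & computeNarHash)
{
    // First attempt: Nix >= 2.20 behavior (no filters, no export-ignore).
    if (computeNarHash(false) == expected)
        return NarCheck::Match;

    // Fallback: emulate what the git CLI did before Nix 2.20.
    if (computeNarHash(true) == expected)
        return NarCheck::LegacyMatch; // caller warns and returns the filtered tree

    // Neither variant matches: a genuine NAR hash mismatch.
    return NarCheck::Mismatch;
}
```

A caller would print the warning only for `LegacyMatch` and fail as before for `Mismatch`.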

Context

Summary by CodeRabbit

  • New Features

    • Optional content filters for Git fetches (supports attribute-based transforms like CRLF/LFS).
  • Bug Fixes

    • More reliable upfront fingerprint caching for Git sources to stabilize fetch/upsert behavior.
    • Backward-compatibility handling to detect and adjust when filtered vs. unfiltered content differ.
  • Tests

    • Added compatibility tests covering Git filter/CRLF behavior and NAR-hash scenarios across Nix versions.

@coderabbitai

coderabbitai bot commented Nov 20, 2025

Caution

Review failed

The head commit changed during the review from 90bb354 to ec9252d.

Walkthrough

Propagates computed Input fingerprints earlier, introduces GitAccessorOptions (including applyFilters) through Git accessor APIs and state, applies optional git blob filtering during reads, updates fingerprint construction via a helper, and adds backward-compatibility tests for NAR hash behavior.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Fingerprint caching | src/libfetchers/fetchers.cc | Assigns computed fingerprint into Input::cachedFingerprint immediately after creating SubstitutedSourceAccessor, ensuring upstream code sees the cached fingerprint during accessor construction/lookup. |
| Git accessor options & state | src/libfetchers/git-utils.cc, src/libfetchers/include/nix/fetchers/git-utils.hh | Introduces GitAccessorOptions (fields: exportIgnore, smudgeLfs, applyFilters) and threads it through getRawAccessor / getAccessor signatures and GitSourceAccessor state; adds git_oid oid and stores options in state. |
| Blob/file filtering | src/libfetchers/git-utils.cc | Read paths (blob/file and smudgeLfs) respect state->options.applyFilters and, when true, run git_blob_filter (apply .gitattributes/CRLF/LFS filters) before returning content; preserves prior behavior when false. |
| Fingerprint helper & compatibility flow | src/libfetchers/git.cc | Adds GitInputScheme::makeFingerprint(const Input&, const Hash&), replaces scattered fingerprint construction with it, and adds logic to re-fetch/recompute (including appending ;e) for backward compatibility when NAR hashes differ due to filters. |
| Call-site updates for options | src/libfetchers/github.cc, src/libfetchers/tarball.cc, src/libfetchers-tests/git-utils.cc | Updates multiple callers to pass {} or constructed GitAccessorOptions instead of raw boolean flags (e.g., tarball/cache accessor and test helpers) to match new signatures. |
| Tests: backward-compat & fetchGit | tests/functional/fetchGit.sh | Adds a test block creating a repo with CRLF and .gitattributes, exercising fetchTree/fetchGit with narHash checks across filtered/unfiltered scenarios and asserting expected NAR values, mismatch behavior, and compatibility flow. |

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Fetch caller
    participant Scheme as GitInputScheme
    participant Repo as GitRepo / GitSourceAccessor
    participant Libgit as libgit2

    Caller->>Scheme: request accessor/read (input, rev, options)
    Scheme->>Repo: getAccessor(rev, options, displayPrefix)
    Repo->>Repo: construct GitSourceAccessor(state.oid <- rev, state.options <- options)
    Scheme->>Scheme: compute fingerprint via makeFingerprint(input, rev)
    Scheme->>Scheme: set Input.cachedFingerprint <- fingerprint

    Caller->>Repo: readBlob(oid)
    Repo->>Libgit: git_blob_lookup(oid)
    Libgit-->>Repo: raw blob
    alt state.options.applyFilters == true
        Repo->>Libgit: git_blob_filter(raw, commit/attrs)
        Libgit-->>Repo: filtered blob
        Repo-->>Caller: filtered content
    else
        Repo-->>Caller: raw content
    end
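
The `alt` branch in the diagram can be illustrated without libgit2. In the sketch below, `ReadOptions`, `toCrlf` and `readBlob` are hypothetical names, and `toCrlf` is only a stand-in for what `git_blob_filter` would do for a file marked `eol=crlf`; the real filter is driven by `.gitattributes` and may do other conversions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, reduced version of the accessor options.
struct ReadOptions {
    bool applyFilters = false;
};

// Stand-in for an end-of-line checkout filter: turn bare LF into CRLF,
// leaving existing CRLF sequences untouched.
inline std::string toCrlf(const std::string & raw)
{
    std::string out;
    out.reserve(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] == '\n' && (i == 0 || raw[i - 1] != '\r'))
            out += '\r';
        out += raw[i];
    }
    return out;
}

// Returns the raw blob unless filtering was explicitly requested.
inline std::string readBlob(const std::string & raw, const ReadOptions & options)
{
    return options.applyFilters ? toCrlf(raw) : raw;
}
```

Because the two branches return different bytes for the same blob, the NAR hash of the tree differs as well, which is what the compatibility path has to account for.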

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Pay extra attention to:
    • Updated public signatures for GitRepo::getAccessor / getRawAccessor and all caller updates.
    • GitSourceAccessor::State initialization (setting oid, storing options) and correctness of fingerprint initialization.
    • Blob filtering integration (attribute lookup, CRLF/LFS/smudge handling) to ensure prior semantics when applyFilters is false.
    • Backward-compatibility flow in git.cc (re-fetch, fingerprint recompute, ;e suffix, and accessor substitution).
    • Tests that assert exact narHash values and mismatch message behavior.

Suggested reviewers

  • cole-h

Poem

🐰
I hopped through blobs and tiny quirks,
I set the fingerprints before the works.
Filters hum and hashes stay true,
Old nar checks wink — I chewed them through.
A small patch dance — the rabbit bows. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 4.76% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change (adding backward compatibility for Nix < 2.20 Git inputs regarding Git filter handling), which is the core purpose of this PR. |

@github-actions

github-actions bot commented Nov 20, 2025

@github-actions github-actions bot temporarily deployed to pull request November 20, 2025 11:41 Inactive

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
src/libfetchers/fetchers.cc (1)

327-333: Redundant cachedFingerprint assignment

Input::getFingerprint(store) already sets cachedFingerprint internally before returning, so assigning cachedFingerprint = accessor->fingerprint; is redundant and can be dropped for clarity.

src/libfetchers/include/nix/fetchers/git-utils.hh (1)

91-96: New applyFilters flag is reasonable; consider documenting intent

Extending GitRepo::getAccessor with bool applyFilters = false keeps existing callers unchanged and matches the implementation in GitRepoImpl. Since this flag reintroduces Git blob filters (for the compat path), it would help future readers to document here that:

  • applyFilters = false is the default, reproducible behavior, and
  • applyFilters = true is only for legacy NAR‑hash compatibility.
src/libfetchers/git-utils.cc (1)

748-769: GitSourceAccessor filtering path looks correct; consider tightening attr scope with verified flag

The new applyFilters plumbing in GitSourceAccessor:

  • Stores the commit/tree OID and the applyFilters flag in State.
  • Uses git_blob_filter with attr_commit_id = state->oid and GIT_BLOB_FILTER_ATTRIBUTES_FROM_COMMIT when applyFilters is true.
  • Leaves behavior unchanged when applyFilters is false, and continues to short‑circuit via git‑LFS smudging when lfsFetch is active.

That matches the intended "only when explicitly asked" behavior and is safe for the legacy EOL path.

To avoid host‑specific/global attributes affecting this special‑case compat behavior, consider adding the GIT_BLOB_FILTER_NO_SYSTEM_ATTRIBUTES flag (the git_blob_filter counterpart of GIT_ATTR_CHECK_NO_SYSTEM), which is available in libgit2:

opts.flags = GIT_BLOB_FILTER_ATTRIBUTES_FROM_COMMIT | GIT_BLOB_FILTER_NO_SYSTEM_ATTRIBUTES;

This prevents loading /etc/gitattributes during blob filtering, keeping the scope tight to commit‑provided attributes only.

Also applies to: 795-817

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e681209 and 428f45a.

📒 Files selected for processing (5)
  • src/libfetchers/fetchers.cc (1 hunks)
  • src/libfetchers/git-utils.cc (4 hunks)
  • src/libfetchers/git.cc (5 hunks)
  • src/libfetchers/include/nix/fetchers/git-utils.hh (1 hunks)
  • tests/functional/fetchGit.sh (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
src/libfetchers/include/nix/fetchers/git-utils.hh (2)
src/libfetchers/git-utils.cc (9)
  • rev (350-390)
  • rev (350-350)
  • rev (392-397)
  • rev (392-392)
  • rev (524-524)
  • rev (555-555)
  • rev (557-562)
  • rev (632-703)
  • rev (632-632)
src/libutil/include/nix/util/source-accessor.hh (1)
  • displayPrefix (164-164)
src/libfetchers/git.cc (2)
src/libfetchers/include/nix/fetchers/fetchers.hh (11)
  • input (215-215)
  • input (217-217)
  • input (222-222)
  • input (224-228)
  • input (238-241)
  • input (238-238)
  • input (253-256)
  • store (158-158)
  • store (176-176)
  • store (243-246)
  • store (243-243)
src/libfetchers/include/nix/fetchers/git-utils.hh (5)
  • rev (31-31)
  • rev (33-33)
  • rev (85-85)
  • rev (91-96)
  • rev (111-111)
src/libfetchers/git-utils.cc (1)
src/libfetchers/include/nix/fetchers/git-utils.hh (6)
  • rev (31-31)
  • rev (33-33)
  • rev (85-85)
  • rev (91-96)
  • rev (111-111)
  • ref (38-38)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build_x86_64-linux / build
  • GitHub Check: build_aarch64-darwin / build
🔇 Additional comments (3)
src/libfetchers/git.cc (1)

641-645: Centralizing fingerprint construction looks good

Factoring fingerprint construction into makeFingerprint and reusing it in getFingerprint (including the ;d= dirty-suffix case) keeps the fingerprint semantics consistent and easier to maintain. No correctness issues spotted.

Also applies to: 977-999
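
For illustration, a centralized helper of this shape might look like the sketch below. It is not the code from git.cc: `FingerprintInput`, the string-typed `rev` and the `dirtyMarker` argument are assumptions, and only the `;e` and `;d=` suffixes are taken from this review.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the input attributes that affect the fingerprint.
struct FingerprintInput {
    bool exportIgnore = false;
};

// Single place where the base fingerprint string is assembled.
inline std::string makeFingerprint(const FingerprintInput & input, const std::string & rev)
{
    return rev + (input.exportIgnore ? ";e" : "");
}

// Reuses the helper and appends the dirty suffix when a marker is present.
// What follows ";d=" in the real code is not shown here; `dirtyMarker` is assumed.
inline std::string getFingerprint(
    const FingerprintInput & input,
    const std::string & rev,
    const std::optional<std::string> & dirtyMarker)
{
    auto fp = makeFingerprint(input, rev);
    if (dirtyMarker)
        fp += ";d=" + *dirtyMarker;
    return fp;
}
```

Keeping the suffix logic in one function means the clean and dirty cases cannot drift apart when a new flag is added.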

tests/functional/fetchGit.sh (1)

314-338: CRLF / narHash compatibility tests are well‑scoped

The new tests correctly exercise:

  • Legacy narHash matching only with filters applied (plus warning mentioning the new hash),
  • The updated hash matching the unfiltered content,
  • A deliberately wrong narHash yielding the 102 “NAR hash mismatch” failure.

Quoting and expected contents ("\r\n" vs "\n") look correct.

src/libfetchers/git-utils.cc (1)

555-563: applyFilters propagation through GitRepoImpl accessors

The extended signatures for:

  • getRawAccessor(const Hash & rev, bool smudgeLfs = false, bool applyFilters = false), and
  • getAccessor(const Hash & rev, bool exportIgnore, std::string displayPrefix, bool smudgeLfs = false, bool applyFilters = false)

correctly pass applyFilters down into GitSourceAccessor. Existing callers (e.g. treeHashToNarHash, getSubmodules) continue to use the default false, while the new compat logic in git.cc explicitly sets true only where needed.

Also applies to: 1339-1355

Before Nix 2.20, we used git, which applies Git filters (in particular
doing end-of-line conversion based on .gitattributes). In 2.20, we
switched to libgit2 and stopped applying filters, which is probably
better for reproducibility. However, that breaks existing lock files /
fetchTree calls for Git inputs that use those filters, since it
invalidates the NAR hash.

So as a backward compatibility hack, we now check the NAR hash
computed over the Git tree without filtering applied. If there is a
hash mismatch, we try again *with* filtering. If that succeeds, we
print a warning and return the filtered tree.
@edolstra edolstra force-pushed the eelcodolstra/fh-1150-legacy-eol-handling branch from 428f45a to 04b1b1e Compare November 20, 2025 11:49

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/libfetchers/git.cc (1)

793-818: Path format issue: Use CanonPath(".gitattributes") without leading slash

Line 796 uses CanonPath("/.gitattributes") with a leading slash, which is inconsistent with the established pattern in the codebase (e.g., CanonPath(".gitmodules") in git-utils.cc:1381). This inconsistency may cause pathExists to return false even when .gitattributes exists, preventing the backward compatibility logic from triggering.

Apply this diff:

-            if (accessor->pathExists(CanonPath("/.gitattributes"))) {
+            if (accessor->pathExists(CanonPath(".gitattributes"))) {
🧹 Nitpick comments (1)
tests/functional/fetchGit.sh (1)

329-329: Clarify the purpose of _NIX_TEST_BARF_ON_UNCACHEABLE

The test sets _NIX_TEST_BARF_ON_UNCACHEABLE=1 without explanation. Please add a comment explaining why this environment variable is needed for this test.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 428f45a and 04b1b1e.

📒 Files selected for processing (2)
  • src/libfetchers/git.cc (5 hunks)
  • tests/functional/fetchGit.sh (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build_aarch64-darwin / build
  • GitHub Check: build_x86_64-linux / build
🔇 Additional comments (6)
src/libfetchers/git.cc (4)

19-19: LGTM!

The include is necessary for the fetchToStore2 function used in the backward compatibility logic below.


641-645: LGTM!

Good refactoring to extract fingerprint construction into a reusable helper method, eliminating duplication.


980-980: LGTM!

Correctly updated to use the new makeFingerprint helper method.


996-997: LGTM!

Correctly updated to use the new makeFingerprint helper method for the dirty workdir case.

tests/functional/fetchGit.sh (2)

314-328: LGTM!

The test setup correctly creates a scenario where Git filters affect content (LF vs CRLF), which is exactly what the backward compatibility logic handles.


331-340: Test scenarios effectively cover the backward compatibility logic

The three test cases properly validate:

  1. Pre-Nix 2.20 behavior (filters applied, CRLF content)
  2. Post-Nix 2.20 behavior (no filters, LF content)
  3. Error handling for mismatched hashes

Note: If the CanonPath issue flagged in git.cc is not fixed, Test 1 would fail because the backward compatibility logic wouldn't trigger.

@github-actions github-actions bot temporarily deployed to pull request November 20, 2025 11:53 Inactive
@edolstra edolstra changed the title Backward compatibility hack for Git inputs that depend on Git filters Backward compatibility hack for Nix < 2.20 Git inputs Nov 20, 2025
@github-actions github-actions bot temporarily deployed to pull request November 20, 2025 13:57 Inactive
@github-actions github-actions bot temporarily deployed to pull request November 20, 2025 16:00 Inactive

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/libfetchers/fetchers.cc (2)

327-347: Fingerprint caching in makeStoreAccessor is slightly redundant but harmless

getFingerprint(store) already populates cachedFingerprint inside Input, so the explicit cachedFingerprint = accessor->fingerprint; write is effectively a no‑op refresh. It’s fine as is; if you want to trim dead writes, you could safely drop that line without changing behavior.


358-365: New fingerprint propagation between accessor and Input result is reasonable; minor enhancement possible

The new logic that:

  • prefers accessor->getFingerprint(CanonPath::root).second to fill result.cachedFingerprint, and
  • otherwise computes via result.getFingerprint(store) and assigns it to accessor->fingerprint

is a good way to keep accessors and Input’s cached fingerprint in sync.

If getFingerprint(CanonPath::root) may compute the fingerprint on demand rather than reading a cached field, you might also set accessor->fingerprint from that value to avoid recomputation later:

-        if (auto fp = accessor->getFingerprint(CanonPath::root).second)
-            result.cachedFingerprint = *fp;
-        else
-            accessor->fingerprint = result.getFingerprint(store);
+        if (auto fp = accessor->getFingerprint(CanonPath::root).second) {
+            accessor->fingerprint = *fp;
+            result.cachedFingerprint = *fp;
+        } else {
+            accessor->fingerprint = result.getFingerprint(store);
+        }

Not required for correctness, but could reduce duplicate work depending on SourceAccessor implementations.
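
A self-contained model of the two-way sync may make the suggestion easier to reason about. The `Accessor` and `Input` types below are simplified, hypothetical stand-ins (the real `getFingerprint` takes a store and a path), so treat this as a sketch of the idea only.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Simplified stand-in for a source accessor that may carry a fingerprint.
struct Accessor {
    std::optional<std::string> fingerprint;
};

// Simplified stand-in for Input: computes lazily and caches the result.
struct Input {
    std::optional<std::string> cachedFingerprint;
    std::string computed; // assumed value the scheme would compute

    std::string getFingerprint()
    {
        if (!cachedFingerprint)
            cachedFingerprint = computed;
        return *cachedFingerprint;
    }
};

// Prefer the accessor's fingerprint; otherwise compute it via the input
// and store it on the accessor so both sides agree afterwards.
inline void syncFingerprints(Accessor & accessor, Input & result)
{
    if (accessor.fingerprint)
        result.cachedFingerprint = accessor.fingerprint;
    else
        accessor.fingerprint = result.getFingerprint();
}
```

After `syncFingerprints` returns, both objects hold the same value regardless of which side had it first.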

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between c1ab3bb and ec9252d.

📒 Files selected for processing (7)
  • src/libfetchers-tests/git-utils.cc (1 hunks)
  • src/libfetchers/fetchers.cc (2 hunks)
  • src/libfetchers/git-utils.cc (8 hunks)
  • src/libfetchers/git.cc (7 hunks)
  • src/libfetchers/github.cc (1 hunks)
  • src/libfetchers/include/nix/fetchers/git-utils.hh (2 hunks)
  • src/libfetchers/tarball.cc (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
src/libfetchers/git.cc (2)
src/libfetchers/include/nix/fetchers/fetchers.hh (11)
  • input (215-215)
  • input (217-217)
  • input (222-222)
  • input (224-228)
  • input (238-241)
  • input (238-238)
  • input (253-256)
  • store (158-158)
  • store (176-176)
  • store (243-246)
  • store (243-243)
src/libfetchers/git-utils.cc (13)
  • rev (350-390)
  • rev (350-350)
  • rev (392-397)
  • rev (392-392)
  • rev (524-524)
  • rev (555-555)
  • rev (558-558)
  • rev (629-700)
  • rev (629-629)
  • settings (702-716)
  • settings (702-702)
  • makeFingerprint (741-744)
  • makeFingerprint (741-741)
src/libfetchers/tarball.cc (1)
src/libfetchers/github.cc (16)
  • settings (35-111)
  • settings (36-36)
  • settings (127-135)
  • settings (127-127)
  • settings (182-204)
  • settings (182-183)
  • settings (206-213)
  • settings (207-207)
  • settings (215-228)
  • settings (215-216)
  • settings (236-236)
  • settings (238-238)
  • settings (246-316)
  • settings (246-246)
  • settings (318-339)
  • settings (319-319)
src/libfetchers/include/nix/fetchers/git-utils.hh (2)
src/libfetchers/git-utils.cc (10)
  • rev (350-390)
  • rev (350-350)
  • rev (392-397)
  • rev (392-392)
  • rev (524-524)
  • rev (555-555)
  • rev (558-558)
  • rev (629-700)
  • rev (629-629)
  • wd (561-561)
src/libutil/include/nix/util/source-accessor.hh (1)
  • displayPrefix (164-164)
src/libfetchers/fetchers.cc (2)
src/libfetchers/github.cc (2)
  • store (350-356)
  • store (350-350)
src/libfetchers/include/nix/fetchers/fetchers.hh (4)
  • store (158-158)
  • store (176-176)
  • store (243-246)
  • store (243-243)
src/libfetchers/git-utils.cc (3)
src/libfetchers/include/nix/fetchers/git-utils.hh (10)
  • rev (31-31)
  • rev (40-40)
  • rev (42-42)
  • rev (94-94)
  • rev (101-101)
  • rev (116-116)
  • wd (103-104)
  • path (38-38)
  • path (85-85)
  • ref (47-47)
src/libutil/include/nix/util/source-accessor.hh (1)
  • displayPrefix (164-164)
src/libfetchers/git.cc (5)
  • getAccessor (959-980)
  • getAccessor (960-960)
  • makeFingerprint (641-648)
  • makeFingerprint (641-641)
  • path (399-405)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build_x86_64-linux / build
  • GitHub Check: build_aarch64-darwin / build
🔇 Additional comments (5)
src/libfetchers/include/nix/fetchers/git-utils.hh (2)

25-32: GitAccessorOptions struct looks coherent and backwards‑compatible by default

Centralizing exportIgnore, smudgeLfs, and applyFilters into GitAccessorOptions with all‑false defaults preserves prior behavior at call sites that now pass {}. The makeFingerprint declaration here also makes sense as the natural place to incorporate these flags into the fingerprint.
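
As a rough picture of why `{}` preserves the old behavior, consider the sketch below. The three field names come from the walkthrough above; the in-class defaults and the `legacyCompatOptions` helper are assumptions made for illustration, not the declarations in git-utils.hh.

```cpp
#include <cassert>

// Sketch of the options struct: every flag defaults to false, so a
// value-initialized `GitAccessorOptions{}` means "no export-ignore, no LFS
// smudging, no filters".
struct GitAccessorOptions {
    bool exportIgnore = false;
    bool smudgeLfs = false;
    bool applyFilters = false;
};

// Hypothetical helper: derive the options for the pre-2.20 compatibility
// retry, which applies both export-ignore and Git filters.
inline GitAccessorOptions legacyCompatOptions(GitAccessorOptions base)
{
    base.exportIgnore = true;
    base.applyFilters = true;
    return base;
}
```

Call sites that never cared about these flags can pass `{}`, while the compat path builds its options explicitly, which keeps the legacy behavior opt-in.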


100-104: Updated GitRepo::getAccessor signatures align with the new options type

Switching both getAccessor overloads to take const GitAccessorOptions & is consistent with the new struct and keeps the API surface clean. As long as all GitRepo implementors are updated accordingly (which compilation should catch), this change looks good.

src/libfetchers-tests/git-utils.cc (1)

94-96: Test call site correctly adapted to GitAccessorOptions

Using {} here to construct GitAccessorOptions preserves the old behavior (exportIgnore == false and no filters) while matching the new API. No changes to test semantics.

src/libfetchers/tarball.cc (1)

133-141: Tarball cache accessor call correctly migrated to options object

Passing {} as the GitAccessorOptions when opening the tarball cache keeps the previous “no export‑ignore / no filters” behavior while conforming to the new signature. This is an appropriate default for tarball‑derived trees.

src/libfetchers/github.cc (1)

323-331: Git forge tarball accessor updated consistently with tarball.cc

The switch to getAccessor(tarballInfo.treeHash, {}, …) mirrors the tarball path and preserves prior behavior (no filters / export‑ignore) for cached archive trees. This keeps the GitArchive input scheme aligned with the tarball cache API.

warn(
"Git input '%s' specifies a NAR hash '%s' that was created by Nix < 2.20.\n"
"Nix >= 2.20 does not apply Git filters and `export-ignore` by default, which changes the NAR hash.\n"
"Please update the NAR hash to '%s'.",
Member

todo: update this to recommend re-locking with a newer Nix version, and include a link to docs.determinate.systems on other approaches for extending backwards compatibility

@grahamc
Member

grahamc commented Nov 21, 2025

Should we change the fetcher-cache version to avoid issues when using mixed Nix versions?

@edolstra edolstra force-pushed the eelcodolstra/fh-1150-legacy-eol-handling branch from 90bb354 to ec9252d Compare November 25, 2025 14:42
@github-actions github-actions bot temporarily deployed to pull request November 25, 2025 14:44 Inactive
@edolstra
Collaborator Author

Closing this for #278.

@edolstra edolstra closed this Nov 25, 2025

3 participants