gix-filter hangs with clean=cat/smudge=cat on specific files #2080

@MggMuggins

Current behavior 😯

Been following gix for a long time; this is an awesome tool :D

I'm using gix via helix and seeing hangs on a few large files when they're opened in the context of a git repo with my gitconfig (see reproducer steps). Backtrace.

The workaround is to remove the no-op filter from my gitconfig, but gix should handle this situation the same way git does.

I'm happy to work on a fix for this (feel free to assign the issue to me) but I wanted to get the issue reported for any future visitors.

Expected behavior 🤔

No response

Git behavior

Git is able to check in/out this file, otherwise I suspect I wouldn't have been able to commit changes to it 😅

Steps to reproduce 🕹

# This has been hanging out in my .gitconfig with some commented out lines for a while
[filter "whitespace"]
	clean = cat
	smudge = cat

Source file/repo: https://github.com/canonical/lxd/blob/3bb6b5f7323d7aec585d21de5b99cb77b78415dd/lxd/instance/drivers/driver_lxc.go

Reproducer (just hacked out the relevant bits of helix):

use std::error::Error;
use std::io::Read;
use std::path::Path;

use gix::filter::plumbing::driver::apply::Delay;
use gix::objs::tree::EntryKind;
use gix::sec::trust::DefaultForLevel;
use gix::{Commit, ObjectId, Repository, ThreadSafeRepository};

fn open_repo(path: &Path) -> Result<ThreadSafeRepository, Box<dyn Error>> {
    // custom open options
    let mut git_open_opts_map = gix::sec::trust::Mapping::<gix::open::Options>::default();

    // On Windows, various configuration options are bundled with the git installation.
    // Their path depends on where git is installed, so looking it up has some overhead.
    // It is basically only needed on Windows, hence it is disabled on other platforms.
    // `gitoxide` doesn't enable this by default.
    let config = gix::open::permissions::Config {
        system: true,
        git: true,
        user: true,
        env: true,
        includes: true,
        git_binary: cfg!(windows),
    };
    // change options for config permissions without touching anything else
    git_open_opts_map.reduced = git_open_opts_map
        .reduced
        .permissions(gix::open::Permissions {
            config,
            ..gix::open::Permissions::default_for_level(gix::sec::Trust::Reduced)
        });
    git_open_opts_map.full = git_open_opts_map.full.permissions(gix::open::Permissions {
        config,
        ..gix::open::Permissions::default_for_level(gix::sec::Trust::Full)
    });

    let open_options = gix::discover::upwards::Options {
        dot_git_only: true,
        ..Default::default()
    };

    let res = ThreadSafeRepository::discover_with_environment_overrides_opts(
        path,
        open_options,
        git_open_opts_map,
    )?;

    Ok(res)
}

/// Finds the object that contains the contents of a file at a specific commit.
fn find_file_in_commit(repo: &Repository, commit: &Commit, file: &Path) -> Result<ObjectId, Box<dyn Error>> {
    let repo_dir = repo.workdir().unwrap();
    let rel_path = file.strip_prefix(repo_dir)?;
    let tree = commit.tree()?;
    let tree_entry = tree
        .lookup_entry_by_path(rel_path)
        .unwrap().unwrap();
    match tree_entry.mode().kind() {
        // not a file, everything is new, do not show diff
        mode @ (EntryKind::Tree | EntryKind::Commit | EntryKind::Link) => {
            panic!("entry at {} is not a file but a {mode:?}", file.display())
        }
        // found a file
        EntryKind::Blob | EntryKind::BlobExecutable => Ok(tree_entry.object_id()),
    }
}

pub fn get_diff_base(file: &Path) -> Result<Vec<u8>, Box<dyn Error>> {
    debug_assert!(!file.exists() || file.is_file());
    debug_assert!(file.is_absolute());
    let file = gix::path::realpath(file).unwrap();

    // TODO cache repository lookup

    let repo_dir = &file.parent().unwrap();
    let repo = open_repo(repo_dir)?
        .to_thread_local();
    let head = repo.head_commit()?;
    let file_oid = find_file_in_commit(&repo, &head, &file)?;

    let file_object = repo.find_object(file_oid)?;
    let data = file_object.detach().data;
    // Get the actual data that git would make out of the git object.
    // This will apply the user's git config or attributes like crlf conversions.
    if let Some(work_dir) = repo.workdir() {
        let rela_path = file.strip_prefix(work_dir)?;
        let rela_path = gix::path::try_into_bstr(rela_path)?;
        let (mut pipeline, _) = repo.filter_pipeline(None)?;
        let mut worktree_outcome =
            pipeline.convert_to_worktree(&data, rela_path.as_ref(), Delay::Forbid)?;
        let mut buf = Vec::with_capacity(data.len());
        worktree_outcome.read_to_end(&mut buf)?;
        Ok(buf)
    } else {
        Ok(data)
    }
}

fn main() {
    let path = Path::new("/home/wesley/Workspace/upstream/lxd/lxd/instance/drivers/driver_lxc.go");
    let _ = get_diff_base(path);
}
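
For context, a `cat` filter simply echoes stdin to stdout, so a caller that pushes a large blob through it has to keep draining the child's stdout while it is still writing to its stdin. The sketch below is only an illustration using the Rust standard library, not gix's actual implementation. It is an assumption, not something confirmed in this report, that pipe back-pressure is related to the hang. `run_noop_filter` is a hypothetical helper name, and the example expects a `cat` binary on the `PATH`.

```rust
use std::io::{Read, Write};
use std::process::{Command, Stdio};
use std::thread;

/// Hypothetical helper: pipe `input` through `program` and return its stdout.
fn run_noop_filter(program: &str, input: Vec<u8>) -> std::io::Result<Vec<u8>> {
    let mut child = Command::new(program)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // Feed stdin from a separate thread so the read below can drain stdout
    // concurrently. Writing everything first could block once the pipe
    // buffers in both directions are full.
    let mut stdin = child.stdin.take().expect("stdin was requested as piped");
    // `stdin` is dropped when the closure returns, which signals EOF to the child.
    let writer = thread::spawn(move || stdin.write_all(&input));

    let mut output = Vec::new();
    child
        .stdout
        .take()
        .expect("stdout was requested as piped")
        .read_to_end(&mut output)?;

    writer.join().expect("writer thread panicked")?;
    child.wait()?;
    Ok(output)
}

fn main() -> std::io::Result<()> {
    // 1 MiB is well above typical pipe buffer sizes.
    let input = vec![b'x'; 1 << 20];
    let output = run_noop_filter("cat", input.clone())?;
    assert_eq!(output, input);
    Ok(())
}
```

A writer thread is just one way to avoid blocking; non-blocking I/O or interleaved chunked reads and writes would serve the same purpose.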

Labels

acknowledged: an issue is accepted as shortcoming to be fixed
