Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

error: "numeric field was not a number: t Regard when getting cksum for ...." #313

Open
npajkovsky opened this issue Mar 28, 2023 · 2 comments · May be fixed by #314
Open

error: "numeric field was not a number: t Regard when getting cksum for ...." #313

npajkovsky opened this issue Mar 28, 2023 · 2 comments · May be fixed by #314

Comments

@npajkovsky
Copy link

npajkovsky commented Mar 28, 2023

Hello,

I have a tarball that I want to read

-> % file backup.tar
backup.tar: POSIX tar archive

But I've got an error

error: "numeric field was not a number: t Regard when getting cksum for a, <=\nm>\n> wrote:\n>\n> Dear \n>\n>\n>\n> Correct. We will have more detai"

It looks like that it starts reading a header from a wrong position.

My code looks like this

use std::fs::File;

use eyre::Result;
use tar::Archive;

fn main() -> Result<()> {
    let file: File = File::open(path: "backup.tar")?;
    let mut archive: Archive<File> = Archive::new(obj: file);

    archive.entries().unwrap().for_each(|x: Result<Entry<File>, Error>| {
        let x: Entry<File> = x.unwrap();
        println!("{}", x.path().unwrap().display())
    });

    Ok(())
}

And with a couple of dbg!() I've got this.

[/home/nikola/src/tar-rs/src/archive.rs:268] self.next = 1081462272
[/home/nikola/src/tar-rs/src/archive.rs:271] self.next - self.archive.inner.pos.get() = 7680
[/home/nikola/src/tar-rs/src/archive.rs:294] &header = UstarHeader {
    entry_size: 138,
    size: 138,
    path: "././@LongLink",
    link_name: None,
    username: Some(
        "",
    ),
    groupname: Some(
        "",
    ),
    cksum: 2567,
    cksum_valid: true,
}
[/home/nikola/src/tar-rs/src/archive.rs:268] self.next = 1081463296
[/home/nikola/src/tar-rs/src/archive.rs:271] self.next - self.archive.inner.pos.get() = 374
[/home/nikola/src/tar-rs/src/archive.rs:294] &header = UstarHeader {
    entry_size: 64700,
    size: 64700,
    path: "backup-3.22.2023_17-45-34_acaegypt/homedir/mail/....t/management/cur/1673424346.M756912P292403.demosthenes.datapack.net,S=64700,W=65570:2,",
    link_name: Some(
        "backup-3.22.2023_17-45-34_acaegypt/homedir/mail/...t/acc/cur/1673424346.M756912P292403.demos",
    ),
    mode: 0o640,
    uid: 1058,
    gid: 1061,
    mtime: 1673424346,
    username: Some(
        "user",
    ),
    groupname: Some(
        "group",
    ),
    device_major: Some(
        9,
    ),
    device_minor: Some(
        2,
    ),
    cksum: 24700,
    cksum_valid: true,
}
backup-3.22.2023_17-45-34_example/homedir/mail/example.net/management/cur/1673424346.M756912P292403.....net,S=64700,W=65570:2,
[/home/nikola/src/tar-rs/src/archive.rs:268] self.next = 1081528832
[/home/nikola/src/tar-rs/src/archive.rs:271] self.next - self.archive.inner.pos.get() = 65024
[/home/nikola/src/tar-rs/src/archive.rs:294] &header = OldHeader {
    path: "a, <=\nm>\n> wrote:\n>\n> Dear Eslam,\n>\n>\n>\n> Correct. We will have more detai",
    link_name: Some(
        ",\n>\n> *\n>\n>\n>\n> Tel: +380 (48) 7 340 34",
    ),
    username: None,
    groupname: None,
    device_major: None,
    device_minor: None,
}

Any ideas what could go wrong?

@npajkovsky
Copy link
Author

Found the problem, I'll submit MR tomorrow.

@npajkovsky npajkovsky changed the title error: "numeric field was not a number: t Regard when getting cksum for a, <[email protected]=\nm>\n> wrote:\n>\n> Dear Eslam,\n>\n>\n>\n> Correct. We will have more detai" error: "numeric field was not a number: t Regard when getting cksum for a, <=\nm>\n> wrote:\n>\n> Dear,\n>\n>\n>\n> Correct. We will have more detai" Mar 28, 2023
@npajkovsky npajkovsky changed the title error: "numeric field was not a number: t Regard when getting cksum for a, <=\nm>\n> wrote:\n>\n> Dear,\n>\n>\n>\n> Correct. We will have more detai" error: "numeric field was not a number: t Regard when getting cksum for ...." Mar 28, 2023
npajkovsky pushed a commit to npajkovsky/tar-rs that referenced this issue Mar 29, 2023
Fix the error: "numeric field was not a number: t 'Regard' when getting cksum for
'a text ...'. The error is caused by wrongly setting the `self.next`
position due to accounting harlinks size.

According to man 5 tar, the size of the hard-links should be set to 0.

size    Size of file, as octal number in ASCII.  For regular files	only,
        this indicates the	amount of data that follows the	header.	 In
        particular, this field was	ignored	by early tar implementations
        when extracting hardlinks.	 Modern	writers	should always store a
        zero length for hardlink entries.

But since the writer wasn't *modern*, the entry_size is 64700 which
causes miscalculation of the `self.next`.

[tar-rs/src/archive.rs:372] &entry.header() = UstarHeader {
    entry_size: 64700,
    size: 64700,
    path: "some/path",
    link_name: Some(
        "some/link",
    ),
    mode: 0o640,
    uid: 1058,
    gid: 1061,
    mtime: 1673424346,
    username: Some(
        "example",
    ),
    groupname: Some(
        "example",
    ),
    device_major: Some(
        9,
    ),
    device_minor: Some(
        2,
    ),
    cksum: 24700,
    cksum_valid: true,
}
[tar-rs/src/archive.rs:373] entry.header().entry_type() = Link

Closes: alexcrichton#313

Signed-off-by: Nikola Pajkovsky <[email protected]>
@npajkovsky npajkovsky linked a pull request Mar 29, 2023 that will close this issue
@jonathanstrong
Copy link

jonathanstrong commented May 2, 2023

I am running into a similar error while trying to read tar.gz archives generated by Cargo (.crate file), however, it seems to occur randomly, and I haven't been able to pin down what conditions trigger it.

here is a simplified code example to explain:

#[test]
fn check_extract_cargo_dot_toml_from_krate_file() {
    const DATA: &[u8] = include_bytes!("path/to/raw-cargo-publish-http-request-body");
    let krate_file: &[u8] = // code to split .crate file from json metadata in request body
    let cargo_dot_toml = extract_cargo_dot_toml(krate_file).unwrap();
    // ..
}

// simplified version to demonstrate usage of tar crate api
fn extract_cargo_dot_toml(krate_file: &[u8]) -> Result<_, _> {
    let decoder = flate2::read::GzDecoder::new(krate_file);
    let mut archive = tar::Archive::new(decoder);

    for entry in archive.entries()? {
        let mut entry = entry?;
        let path = entry.path()?;

        if path.ends_with("Cargo.toml.orig") {
            // store .read_to_string(..) output in `manifest`
        } else if path.ends_with(".cargo_vcs_info.json") {
            // store .read_to_string(..) output in `vcs_info`
        }

        if manifest.is_some() && vcs_info.is_some() {
            break
        }
    }
    // ..
}

The vast majority of the time, this code works fine. Once in a while, I get

Err(Custom {
    kind: Other,
    error: "numeric field was not a number:  when getting cksum for <path/to>/Cargo.toml.orig",
})

Strangely, it seems to correlate with running the tests with a slightly different set of command line flags, i.e. cargo test vs cargo test --lib. It also seems more likely in the several invocations following a full rebuild.

I have tried using a fuzzer to produce a self-contained example, but all that produced so far is an example that always fails, including using the tar binary and the code in the PR discussed here (it does fail with a similar error).

I would love to be able to identify the cause of this. Can you think of any theory for why the error would show up sporadically this way?

Update: following the code in src/header.rs, I logged the raw byte slice which ended up (randomly) triggering the error:

Err(
    Custom {
        kind: Other,
        error: "numeric field was not a number: , (input slice = [0, 0, 49, 51, 49, 51, 51, 0]) when getting cksum for <path/to>/Cargo.toml.orig",
    },
)

relevant code in src/header.rs:

impl Header {
    pub fn cksum(&self) -> io::Result<u32> {
        octal_from(&self.as_old().cksum)
            .map(|u| u as u32)
            .map_err(|err| {
                io::Error::new(
                    err.kind(),
                    format!("{} when getting cksum for {}", err, self.path_lossy()),
                )
            })
    }

    pub fn as_old(&self) -> &OldHeader {
        unsafe { cast(self) }
    }

    // ..
}

fn octal_from(slice: &[u8]) -> io::Result<u64> {
    let trun = truncate(slice);
    let num = match str::from_utf8(trun) {
        Ok(n) => n,
        Err(_) => {
            return Err(other(&format!(
                "numeric field did not have utf-8 text: {}",
                String::from_utf8_lossy(trun)
            )));
        }
    };
    match u64::from_str_radix(num.trim(), 8) {
        Ok(n) => Ok(n),
        Err(_) => Err(other(&format!("numeric field was not a number: {}, (input slice = {:?})", num, slice))),
    }
}

I know nothing about the tar format, so not sure what to make of this...

$ evcxr
>> let input: &[u8] = &[0, 0, 49, 51, 49, 51, 51, 0][..];
>> std::str::from_utf8(input)
Ok("\0\013133\0")
>> std::str::from_utf8(input).unwrap().trim()
"\0\013133\0"
>> u64::from_str_radix(std::str::from_utf8(input).unwrap().trim(), 8)
Err(ParseIntError { kind: InvalidDigit })
>> u64::from_str_radix("13133", 8)
Ok(5723)

Update 2: I only get the error when zlib-ng (or zlib-ng-compat) feature is enabled in flate2 crate. It seems to be a flate2 with zlib-ng enabled issue rather than tar.

obmarg added a commit to grafbase/grafbase that referenced this issue Feb 29, 2024
A user was running into issues with `grafbase deploy` yesterday, where
deployments failed with a bizzare error very similar to [this one][1].

We eventually tracked this down to files in the users `.direnv` or `.devenv`
folders - based on the tar-rs issue _probably_ hard links of some description.
While it would be good to not choke on hardlinks, I'd argue we shouldn't have
been archiving these files in the first place: they're clearly meant to stay
local to the users machine.

So this PR switches `walkdir` for `ignore`, which will respect `.gitignore`
files etc, providing users with a sensible (and in most cases zero effort) way
to exclude files from the archive we build on `grafbase deploy`.  I think this
is a sensible default.

This _could_ have ramifications for anyone doing a `grafbase deploy` after some
kind of local build step, though I don't know how likely that is really.  I
suggest we try it and see - if anyone complains we can revisit, maybe adding a
CLI flag to change the behaviour.

[1]: alexcrichton/tar-rs#313
obmarg added a commit to grafbase/grafbase that referenced this issue Mar 1, 2024
A user was running into issues with `grafbase deploy` yesterday, where
deployments failed with a bizzare error very similar to [this one][1].

We eventually tracked this down to files in the users `.direnv` or
`.devenv` folders - based on the tar-rs issue _probably_ hard links of
some description. While it would be good to not choke on hardlinks, I'd
argue we shouldn't have been archiving these files in the first place:
they're clearly meant to stay local to the users machine.

So this PR switches `walkdir` for `ignore`, which will respect
`.gitignore` files etc, providing users with a sensible (and in most
cases zero effort) way to exclude files from the archive we build on
`grafbase deploy`. I think this is a sensible default.

This _could_ have ramifications for anyone doing a `grafbase deploy`
after some kind of local build step, though I don't know how likely that
is really. I suggest we try it and see - if anyone complains we can
revisit, maybe adding a CLI flag to change the behaviour.

Fixes GB-6107

[1]: alexcrichton/tar-rs#313
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants