Parallel diff and cmp on binary files? #121

@pauschuu

I just had a revelation:

$ time b3sum dreamshaper_8\ \(1\).safetensors dreamshaper_8.safetensors
771c807db56dbfc33feda5638d920f6c507db971da44772ee44a08dc38c3b437  dreamshaper_8 (1).safetensors
771c807db56dbfc33feda5638d920f6c507db971da44772ee44a08dc38c3b437  dreamshaper_8.safetensors

real    0m0.172s
user    0m2.193s
sys     0m0.423s


$ time cmp dreamshaper_8\ \(1\).safetensors dreamshaper_8.safetensors

real    0m0.596s
user    0m0.183s
sys     0m0.411s

$ time diff dreamshaper_8\ \(1\).safetensors dreamshaper_8.safetensors

real    0m0.509s
user    0m0.079s
sys     0m0.428s

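As you can see, even though the b3sum method has an additional cost (calculating a hash), it is much faster in wall-clock time (0.172s versus 0.596s for cmp and 0.509s for diff) because it leverages parallelism: it burns about 2.2s of user CPU time, but spread across multiple cores.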

Wouldn't it be a good improvement to bring parallelism to some of the tools like diff and cmp?
Maybe with a new (not-standardized) option?
Maybe by default because why not?

I guess diff has a special code path once it is sure that it's just a binary file, right? So in that code path it wouldn't be much of a problem to parallelize it.
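
To make the idea concrete, here is a minimal sketch of a chunked, multi-threaded equality check for two files. It is written in Python only for illustration and is not how diff or cmp are implemented. The names `files_equal_parallel`, `_read_exact`, and `CHUNK_SIZE` are hypothetical, the chunk size is an untuned guess, and `os.pread` is only available on Unix-like systems. In CPython the threads mostly overlap the reads (the read syscall releases the interpreter lock); a native implementation could also run the byte comparison itself in parallel.

```python
import os
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 8 * 1024 * 1024  # hypothetical chunk size, not tuned


def _read_exact(fd, length, offset):
    """Read up to `length` bytes starting at `offset`, retrying on short reads."""
    parts = []
    while length > 0:
        data = os.pread(fd, length, offset)  # positional read, no shared file cursor
        if not data:  # end of file
            break
        parts.append(data)
        offset += len(data)
        length -= len(data)
    return b"".join(parts)


def files_equal_parallel(path_a, path_b, workers=None, chunk_size=CHUNK_SIZE):
    """Return True if both files have identical contents.

    The files are split into fixed-size byte ranges and each range is
    compared by a worker thread.
    """
    fd_a = os.open(path_a, os.O_RDONLY)
    try:
        fd_b = os.open(path_b, os.O_RDONLY)
        try:
            size = os.fstat(fd_a).st_size
            if size != os.fstat(fd_b).st_size:
                return False  # different lengths can never be equal

            def same(offset):
                n = min(chunk_size, size - offset)
                return _read_exact(fd_a, n, offset) == _read_exact(fd_b, n, offset)

            with ThreadPoolExecutor(max_workers=workers) as pool:
                # all() stops consuming results at the first mismatching chunk
                return all(pool.map(same, range(0, size, chunk_size)))
        finally:
            os.close(fd_b)
    finally:
        os.close(fd_a)
```

This only answers "same or different", which is what diff reports for binary files anyway; reporting the first differing byte like cmp does would additionally need the lowest mismatching offset across chunks.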

This could be pushed even further when comparing directories: the individual files could be diffed in parallel.
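
As a rough illustration of the directory case, the sketch below compares the files that exist in both trees using a thread pool, one file pair per task. It is an assumption about what such a feature could look like, not a description of any existing diff option. `compare_dirs_parallel` and `relative_files` are hypothetical names, and the per-file comparison is simply Python's standard `filecmp.cmp` with `shallow=False`, which compares contents.

```python
import filecmp
import os
from concurrent.futures import ThreadPoolExecutor


def relative_files(root):
    """Return the set of non-directory paths under `root`, relative to it."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def compare_dirs_parallel(dir_a, dir_b, workers=None):
    """Compare two directory trees, checking common files concurrently."""
    files_a = relative_files(dir_a)
    files_b = relative_files(dir_b)
    common = sorted(files_a & files_b)

    def differs(rel):
        # shallow=False forces a byte-by-byte content comparison
        return not filecmp.cmp(
            os.path.join(dir_a, rel), os.path.join(dir_b, rel), shallow=False
        )

    with ThreadPoolExecutor(max_workers=workers) as pool:
        flags = list(pool.map(differs, common))  # results keep the order of `common`

    return {
        "only_in_a": sorted(files_a - files_b),
        "only_in_b": sorted(files_b - files_a),
        "differing": [rel for rel, bad in zip(common, flags) if bad],
    }
```

Parallelizing per file is the simpler design because each task is independent; it helps most with many medium-sized files, while a single huge file would still need chunk-level parallelism.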

Come on, it's 2025! :)
