hf-mem

A (simple) command-line tool to estimate inference memory requirements on Hugging Face

Usage

cargo install hf-mem

And then:

hf-mem --model-id meta-llama/Llama-3.1-8B-Instruct --token ...
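The reported estimate boils down to multiplying each dtype's parameter count by its per-element byte size. The sketch below illustrates that arithmetic (it is not hf-mem's actual implementation; the dtype names follow the safetensors convention, and the ~8.03B parameter count is an illustrative figure for an 8B model in BF16):

```rust
// Illustrative sketch of the estimation: total bytes is the sum over dtypes
// of (parameter count * bytes per element).
fn bytes_per_element(dtype: &str) -> u64 {
    match dtype {
        "F64" | "I64" | "U64" => 8,
        "F32" | "I32" | "U32" => 4,
        "F16" | "BF16" | "I16" | "U16" => 2,
        "I8" | "U8" | "BOOL" => 1,
        _ => panic!("unknown dtype: {dtype}"),
    }
}

fn estimate_bytes(params_per_dtype: &[(&str, u64)]) -> u64 {
    params_per_dtype
        .iter()
        .map(|&(dtype, count)| count * bytes_per_element(dtype))
        .sum()
}

fn main() {
    // Hypothetical parameter count, roughly matching an 8B model in BF16.
    let params = [("BF16", 8_030_000_000u64)];
    let bytes = estimate_bytes(&params);
    println!("~{:.2} GiB", bytes as f64 / (1024.0 * 1024.0 * 1024.0));
}
```

Note this only covers the weights themselves; KV cache and runtime overhead come on top, which is what the planned extra flags are about.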

Features

  • Fast and lightweight command-line tool, shipped as a single installable binary
  • Fetches just the bytes of the safetensors files on the Hugging Face Hub that contain the metadata
  • Provides an estimation based on the parameter counts for each dtype
  • Supports both sharded (i.e. model-00000-of-00000.safetensors) and non-sharded (i.e. model.safetensors) files
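Fetching only the metadata bytes is possible because of the safetensors layout: the file starts with an 8-byte little-endian length N, followed by N bytes of JSON metadata (tensor names, dtypes, shapes, data offsets), and only then the weight data. A minimal sketch of reading that prefix, over an in-memory buffer rather than the ranged HTTP requests the tool would issue against the Hub:

```rust
// A safetensors file begins with an 8-byte little-endian header length,
// followed by that many bytes of JSON metadata; the weights come after.
// Reading just this prefix is enough to know every tensor's dtype and shape.
fn read_header(buf: &[u8]) -> &str {
    let len = u64::from_le_bytes(buf[..8].try_into().unwrap()) as usize;
    std::str::from_utf8(&buf[8..8 + len]).unwrap()
}

fn main() {
    // Hypothetical minimal file: length prefix + JSON header, no tensor data.
    let header = br#"{"w":{"dtype":"F32","shape":[2,2],"data_offsets":[0,16]}}"#;
    let mut file = (header.len() as u64).to_le_bytes().to_vec();
    file.extend_from_slice(header);
    println!("{}", read_header(&file));
}
```

In practice this means a couple of small ranged downloads per shard instead of pulling gigabytes of weights.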

What's next?

  • Add tracing and progress bars when fetching from the Hub
  • Support other file types, e.g. gguf
  • Read metadata from local files when available, instead of fetching from the Hub every time
  • Add more flags to support estimations that account for quantization, extended context lengths, additional memory overhead, etc.

License

This project is licensed under either of the following licenses, at your option:

  • Apache License, Version 2.0 (LICENSE-APACHE)
  • MIT license (LICENSE-MIT)

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this project by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
