DFloat11: Lossless LLM Compression for Efficient GPU Inference #11808
nitinmukesh started this conversation in General
DFloat11 is a lossless compression framework that reduces the size of Large Language Models (LLMs) by approximately 30% while preserving bit-for-bit identical outputs to the original model. It enables efficient GPU inference on resource-constrained hardware without sacrificing accuracy.
Wan and FLUX are supported, if anyone wants to try it:
https://github.com/LeanModels/DFloat11
Looks interesting, and diffusers is supported. I can't try it myself due to limited VRAM, but I'm sharing it in case anyone is interested.
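For intuition on where the ~30% lossless savings come from, here is a rough sketch (my own illustration, not DFloat11's actual code): BF16 weights use 1 sign + 8 exponent + 7 mantissa bits, and in trained models the exponent field has low entropy, so entropy-coding the exponents while storing sign and mantissa raw shrinks the average width from 16 bits to roughly 11 without losing any information.

```python
import numpy as np

def bf16_exponent_entropy_bits(weights: np.ndarray) -> float:
    """Shannon entropy (in bits) of the 8-bit exponent field of BF16 weights.

    BF16 is the top 16 bits of float32, so we reinterpret the float32 bits
    and truncate (a simplification; real BF16 conversion rounds).
    """
    bf16_bits = weights.astype(np.float32).view(np.uint32) >> 16
    exponents = (bf16_bits >> 7) & 0xFF  # 8 exponent bits
    counts = np.bincount(exponents.ravel(), minlength=256)
    probs = counts[counts > 0] / exponents.size
    return float(-(probs * np.log2(probs)).sum())

# Gaussian-ish weights (typical of trained layers) cluster in a few exponent
# bins, so the exponent entropy is far below its 8-bit raw width.
rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=100_000).astype(np.float32)

h = bf16_exponent_entropy_bits(w)
avg_bits = 1 + 7 + h  # sign + mantissa stored raw, exponent entropy-coded
print(f"exponent entropy: {h:.2f} bits -> ~{avg_bits:.1f} bits/weight "
      f"({100 * (1 - avg_bits / 16):.0f}% smaller than BF16)")
```

Because the coding is entropy-based and reversible, decompression reproduces the original bits exactly, which is why the model's outputs stay bit-for-bit identical.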