DFloat11: Lossless LLM Compression for Efficient GPU Inference #11808
nitinmukesh started this conversation in General
DFloat11 is a lossless compression framework that reduces the size of Large Language Models (LLMs) by approximately 30% while preserving bit-for-bit identical outputs to the original model. It enables efficient GPU inference on resource-constrained hardware without sacrificing accuracy.
Wan and FLUX are supported, if anyone wants to try it:
https://github.com/LeanModels/DFloat11
Looks interesting, and diffusers is supported. I can't try it myself due to limited VRAM, but I'm sharing it in case anyone is interested.
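For intuition on where the ~30% lossless savings come from, here is a rough sketch (my own illustration, not DFloat11's actual code): BF16 weights use 1 sign + 8 exponent + 7 mantissa bits, and in trained models the exponent field has low entropy, so entropy-coding the exponents while storing sign and mantissa raw shrinks the average width from 16 bits to roughly 11 without losing any information.

```python
import numpy as np

def bf16_exponent_entropy_bits(weights: np.ndarray) -> float:
    """Shannon entropy (in bits) of the 8-bit exponent field of BF16 weights.

    BF16 is the top 16 bits of float32, so we reinterpret the float32 bits
    and truncate (a simplification; real BF16 conversion rounds).
    """
    bf16_bits = weights.astype(np.float32).view(np.uint32) >> 16
    exponents = (bf16_bits >> 7) & 0xFF  # 8 exponent bits
    counts = np.bincount(exponents.ravel(), minlength=256)
    probs = counts[counts > 0] / exponents.size
    return float(-(probs * np.log2(probs)).sum())

# Gaussian-ish weights (typical of trained layers) cluster in a few exponent
# bins, so the exponent entropy is far below its 8-bit raw width.
rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=100_000).astype(np.float32)

h = bf16_exponent_entropy_bits(w)
avg_bits = 1 + 7 + h  # sign + mantissa stored raw, exponent entropy-coded
print(f"exponent entropy: {h:.2f} bits -> ~{avg_bits:.1f} bits/weight "
      f"({100 * (1 - avg_bits / 16):.0f}% smaller than BF16)")
```

Because the coding is entropy-based and reversible, decompression reproduces the original bits exactly, which is why the model's outputs stay bit-for-bit identical.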