Contributor

@jiqing-feng jiqing-feng commented Oct 29, 2025

The C++ kernels.

cmake -DCOMPUTE_BACKEND=cpu -S . && make

Hi @matthewdouglas. I've implemented the CPU dequantize op for nf4/fp4. It brings a 10x+ speed-up on the end-to-end text-generation task compared with the original Python kernel on the llama3-8B model. Could you please review this PR? Thanks!

Signed-off-by: jiqing-feng <[email protected]>
@jiqing-feng jiqing-feng marked this pull request as draft October 29, 2025 02:28
@jiqing-feng jiqing-feng marked this pull request as ready for review November 4, 2025 07:35
github-actions bot commented Nov 4, 2025

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines 272 to 286
if (BUILD_CPU)
    target_link_libraries(bitsandbytes PRIVATE OpenMP::OpenMP_CXX)
    include(CheckCXXCompilerFlag)

    check_cxx_compiler_flag(-mavx512f HAS_AVX512F)
    check_cxx_compiler_flag(-mavx512bf16 HAS_AVX512BF16)

    if(HAS_AVX512F)
        target_compile_options(bitsandbytes PRIVATE -mavx512f)
    endif()

    if(HAS_AVX512BF16)
        target_compile_options(bitsandbytes PRIVATE -mavx512bf16)
    endif()
endif()
Member

Hi @jiqing-feng, we still have some build issues with this.

A few things need consideration here:

  • We build for Linux aarch64
  • We also build for macOS arm64. I'm not sure how to use OpenMP on that platform - maybe we can skip for now?
  • On Windows x86-64 we build with MSVC. Apart from /arch:AVX512 I don't really know if there is a flag for AVX512-BF16.
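
One way the three platform concerns above could be handled is sketched below. This is only an illustration under stated assumptions, not the change that was actually made in this PR: it assumes OpenMP can be treated as optional, that x86-64 hosts report `x86_64` or `AMD64` in `CMAKE_SYSTEM_PROCESSOR`, and that no AVX512-BF16 flag is passed to MSVC. The variable `BNB_PROC` is a hypothetical name introduced for the example; `BUILD_CPU`, the `bitsandbytes` target, and the `HAS_*` variables come from the snippet under review.

```cmake
if(BUILD_CPU)
    # Assumption: OpenMP is optional, so platforms where it is not found
    # (for example macOS arm64 without an OpenMP runtime) still configure.
    find_package(OpenMP)
    if(OpenMP_CXX_FOUND)
        target_link_libraries(bitsandbytes PRIVATE OpenMP::OpenMP_CXX)
    endif()

    # Only probe x86-64 SIMD flags when targeting x86-64; Linux aarch64 and
    # macOS arm64 skip this branch entirely.
    # BNB_PROC is a hypothetical helper variable, not part of the project.
    string(TOLOWER "${CMAKE_SYSTEM_PROCESSOR}" BNB_PROC)
    if(BNB_PROC MATCHES "^(x86_64|amd64)$")
        include(CheckCXXCompilerFlag)

        if(MSVC)
            # MSVC uses /arch:AVX512; no AVX512-BF16 flag is assumed here.
            check_cxx_compiler_flag(/arch:AVX512 HAS_AVX512F)
            if(HAS_AVX512F)
                target_compile_options(bitsandbytes PRIVATE /arch:AVX512)
            endif()
        else()
            # GCC/Clang-style flags, as in the snippet under review.
            check_cxx_compiler_flag(-mavx512f HAS_AVX512F)
            check_cxx_compiler_flag(-mavx512bf16 HAS_AVX512BF16)
            if(HAS_AVX512F)
                target_compile_options(bitsandbytes PRIVATE -mavx512f)
            endif()
            if(HAS_AVX512BF16)
                target_compile_options(bitsandbytes PRIVATE -mavx512bf16)
            endif()
        endif()
    endif()
endif()
```

Note that `check_cxx_compiler_flag` only tests whether the compiler accepts a flag, not whether the machine that eventually runs the binary supports the instructions, so a sketch like this would still need a runtime check or a scalar fallback in the kernels. Universal macOS builds that set `CMAKE_OSX_ARCHITECTURES` would also need separate handling.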

Contributor Author

I have updated the CMake file; please check it. Thanks.
