vulkan int8 packing quantize dequantize requantize #3731
nihui merged 25 commits into Tencent:master from
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@           Coverage Diff            @@
##           master     #3731     +/-  ##
=========================================
- Coverage   95.81%    95.81%   -0.01%
=========================================
  Files         831       834       +3
  Lines      264948    265433     +485
=========================================
+ Hits       253851    254315     +464
- Misses      11097     11118      +21

View full report in Codecov by Sentry.
The binary size change of libncnn.so (bytes)
Pull Request Overview
This pull request adds comprehensive int8 support to Vulkan operations across ncnn, including quantization, dequantization, requantization, and packing. Key changes include new int8 pipelines and shader modules, modifications to GPU utility functions to handle int8, and corresponding test updates for the quantize/dequantize and packing operations.
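For readers new to these three operations, they follow the usual affine int8 scheme. The scalar sketch below is illustrative only: ncnn's real kernels operate on packed Mat data and, in this PR, GLSL compute shaders; the function names here are hypothetical, though the saturation to [-127, 127] matches ncnn's `float2int8` convention.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// quantize: fp32 -> int8, round to nearest and saturate to [-127, 127]
static inline int8_t quantize(float v, float scale)
{
    long q = std::lround(v * scale);
    if (q > 127) q = 127;
    if (q < -127) q = -127;
    return (int8_t)q;
}

// dequantize: int8 -> fp32, with an optional fused bias
static inline float dequantize(int8_t q, float scale, float bias = 0.f)
{
    return q * scale + bias;
}

// requantize: int32 accumulator -> int8 for the next int8 layer,
// i.e. dequantize with the input scale, then quantize with the output scale
static inline int8_t requantize(int32_t sum, float scale_in, float scale_out, float bias = 0.f)
{
    float v = sum * scale_in + bias;
    return quantize(v, scale_out);
}
```

Requantize exists so two consecutive int8 layers can pass data without a round trip through fp32 memory; on the GPU side that saves a full dequantize/quantize dispatch pair.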
Reviewed Changes
Copilot reviewed 34 out of 34 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/test_quantize.cpp & oom variant | Replace existing scale_data creation for 1D tests (use constant 1) |
| tests/test_packing.cpp | Renaming and restructuring of packing functions to separate FP32 and int8 tests |
| src/net.cpp | Added handling for int8 arithmetic adjustments |
| Multiple shader files (e.g., requantize_*.comp, quantize_*.comp) | New shader code supporting int8 operations |
| src/gpu.cpp, src/allocator.cpp | Updates to utility operators, image format selection, and int8 support |
| src/command.cpp | Adjustments to record int8 transfers |
| Other Vulkan layer files (requantize_vulkan.*, quantize_vulkan.*, dequantize_vulkan.*, packing_vulkan.cpp) | Pipeline creation and operator updates to incorporate int8 support |
Comments suppressed due to low confidence (3)
tests/test_quantize.cpp:27
- The change from using a.w to a constant '1' for creating scale_data in 1D cases is significant; please add an inline comment clarifying the reasoning for this modification to aid future maintainers.
if (a.dims == 1) scale_data.create(1);
src/allocator.cpp:900
- For elempack == 8, using VK_FORMAT_R8G8B8A8_SINT may not correctly represent 8 individual int8 channels since this format typically supports 4 channels; please verify that the chosen image format meets the int8 packing requirements or update the format selection accordingly.
if (elempack == 8) format = VK_FORMAT_R8G8B8A8_SINT;
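The size arithmetic behind this reviewer concern can be made explicit. The following is a sketch of the constraint only, not ncnn code; whether the shaders compensate (e.g. by addressing two texels per element, or bit-packing pairs of int8 lanes into a wider format such as VK_FORMAT_R16G16B16A16_SINT, which is 4 x 16-bit = 8 bytes per texel) would need to be verified against the actual packing shaders.

```cpp
// VK_FORMAT_R8G8B8A8_SINT has 4 signed 8-bit channels: 4 bytes per texel.
// An elempack=8 int8 element occupies 8 bytes, so one texel of this
// format can carry at most half of one packed element.
constexpr int r8g8b8a8_bytes_per_texel = 4 * 1; // 4 channels x 1 byte
constexpr int elempack8_int8_bytes     = 8 * 1; // 8 int8 lanes
static_assert(r8g8b8a8_bytes_per_texel < elempack8_int8_bytes,
              "a 4-channel 8-bit texel cannot hold 8 int8 lanes");
```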
tests/test_packing.cpp:217
- [nitpick] The renaming from 'test_packing_gpu_buffer' to 'test_packing_gpu_fp32' and the addition of an int8 variant are nontrivial changes; please ensure that all test calls and related documentation are updated to reflect this new naming and functionality split.
}
No description provided.