
Conversation

@nihui (Member) commented Dec 4, 2025

  • move the int468 dequant code into x86/arm/... create_pipeline
  • how to encode block_size and storage type?
  • try an fp4 e2m1/e3 type?
  • comp table?
  • expand to more general gemm?
  • port the union hack to a platform-independent style
  • gemm test++
  • doc++
./ncnnllm2int468 qwen3_decoder.ncnn.param qwen3_decoder.ncnn.bin qwen3_decoder-int6.ncnn.param qwen3_decoder-int6.ncnn.bin

@tencent-adm (Member)
CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@codecov-commenter commented Dec 4, 2025

Codecov Report

❌ Patch coverage is 7.52688% with 86 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.88%. Comparing base (37336e7) to head (58cc1f3).
⚠️ Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/layer/gemm.cpp | 7.52% | 86 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6439      +/-   ##
==========================================
+ Coverage   95.62%   95.88%   +0.26%     
==========================================
  Files         844      844              
  Lines      266761   266834      +73     
==========================================
+ Hits       255080   255859     +779     
+ Misses      11681    10975     -706     

☔ View full report in Codecov by Sentry.

@github-actions bot commented Dec 4, 2025

The binary size change of libncnn.so (bytes)

| architecture | base size | pr size | difference |
|---|---|---|---|
| x86_64 | 15316400 | 15324592 | +8192 ⚠️ |
| armhf | 6229892 | 6234020 | +4128 ⚠️ |
| aarch64 | 9527616 | 9527536 | -80 😘 |

Copilot AI (Contributor) left a comment

Pull request overview

This WIP pull request implements block quantization for GEMM layers to support 4-bit, 6-bit, and 8-bit quantization for LLM decoder-style models. The changes introduce a new quantization tool and corresponding dequantization logic in the GEMM layer implementation.

Key changes:

  • New ncnnllm2int468 tool for quantizing GEMM weight matrices with configurable block sizes and bit widths
  • Block-based quantization scheme using per-block scaling factors stored in B_data_quantize_scales
  • Dequantization logic in gemm.cpp that converts quantized weights back to fp32 during model loading

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 15 comments.

| File | Description |
|---|---|
| tools/quantize/ncnnllm2int468.cpp | New quantization tool implementing 4/6/8-bit block quantization with custom bit-packed storage formats |
| tools/quantize/CMakeLists.txt | Build configuration adding the new ncnnllm2int468 executable |
| tools/modelwriter.h | Extended serialization to save block quantization scales for int8_scale_term values 4/5/6 |
| src/layer/gemm.h | Added B_data_quantize_scales member to store per-block scaling factors |
| src/layer/gemm.cpp | Implemented loading and dequantization logic for 4/6/8-bit block-quantized weights |


Comment on lines +40 to +41
add_executable(ncnnllm2int468 ncnnllm2int468.cpp)
target_link_libraries(ncnnllm2int468 PRIVATE ncnn)
Copilot AI Dec 4, 2025

The new ncnnllm2int468 executable is not added to the virtual project group or installed via ncnn_install_tool(), unlike ncnn2int8 above. This creates inconsistency in how tools are organized and installed.

Comment on lines +168 to +178
union i6x4_t
{
    signed char i6[3];
    struct
    {
        signed char i6_a : 6;
        signed char i6_b : 6;
        signed char i6_c : 6;
        signed char i6_d : 6;
    } __attribute__((packed));
};
Copilot AI Dec 4, 2025

The i6x4_t union definition is duplicated in both the quantization tool and the dequantization code in gemm.cpp (lines 193-203). Consider moving this to a shared header to maintain consistency and avoid duplication.

{
    signed char i4_low : 4;
    signed char i4_high : 4;
} __attribute__((packed));
Copilot AI Dec 4, 2025

The __attribute__((packed)) attribute is GCC-specific and not portable; it will fail to compile on MSVC. Consider using #pragma pack for cross-platform compatibility, or compiling the attribute conditionally based on the compiler.

Comment on lines +239 to +247
union i4x2_t
{
    signed char i4;
    struct
    {
        signed char i4_low : 4;
        signed char i4_high : 4;
    } __attribute__((packed));
};
Copilot AI Dec 4, 2025

The i4x2_t union definition is duplicated in both the quantization tool and the dequantization code in gemm.cpp (lines 264-272). Consider moving this to a shared header to maintain consistency and avoid duplication.

Comment on lines +308 to +310
const int block_size = 64; // FIXME hardcode
// const int nbits = 8; // FIXME hardcode
const int nbits = 6; // FIXME hardcode
Copilot AI Dec 4, 2025

The block_size value of 64 is hardcoded here and also duplicated in gemm.cpp (lines 143, 183, 254). Consider making it a named constant or passing it as a configurable parameter to avoid inconsistencies if this value needs to change.
