v0.6.0
Highlights
- provide experimental support for the GGUF q*_k formats and customized mixed-bit settings (see the sketches below)
- support xpu in triton backend by @wenhuach21 in #563
- add torch backend by @WeiweiZhang1 in #555
- provide initial support for the llmcompressor format; only INT8 W8A8 dynamic quantization is supported, by @xin3he in #646
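
For readers who want to try the experimental GGUF export, it plugs into the usual AutoRound Python flow. The sketch below is illustrative only: the model name, output directory, and the `gguf:q4_k_m` format string are assumptions rather than values taken from these notes, and the supported GGUF schemes are still experimental in this release, so check the README for the exact identifiers.

```python
# Minimal sketch (assumptions: model name, output dir, and the
# "gguf:q4_k_m" format string are illustrative, not from these notes).
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/Qwen3-8B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Tune with the default recipe, then export to the experimental GGUF format.
autoround = AutoRound(model, tokenizer)
autoround.quantize()
autoround.save_quantized("./Qwen3-8B-gguf", format="gguf:q4_k_m")
```
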
What's Changed
- bump version to v0.5.1 by @XuehaoSun in #540
- Freeze pytorch & ipex version in CI by @XuehaoSun in #541
- fix quantization config for inference by @WeiweiZhang1 in #542
- [critical bug] remove redundant rounding in dq simulation by @wenhuach21 in #543
- update readme by @wenhuach21 in #550
- add recipes for qwen3 8b and 14b by @n1ck-guo in #552
- itrex requires torch<2.7 by @XuehaoSun in #548
- [GGUF STEP4] fix search bug and improve packing & eval speed by @n1ck-guo in #545
- refine xpu requirement/config json and fix several issues by @wenhuach21 in #558
- add UE5M3 simulation by @wenhuach21 in #562
- support xpu in triton backend by @wenhuach21 in #563
- fix typo in backend by @wenhuach21 in #564
- update habana docker to 1.21.0 by @XuehaoSun in #566
- Support for more gguf formats and float zero point for Q*_1 by @n1ck-guo in #560
- update readme by @wenhuach21 in #569
- update readme by @wenhuach21 in #571
- support for llava-based hf model by @n1ck-guo in #568
- add gguf accuracy data by @wenhuach21 in #574
- add sym & asym gguf quant for gguf baseline (iter==0) by @n1ck-guo in #573
- modify default asym 4-bit auto-round format to AWQ, fix save folder typo for mllm by @WeiweiZhang1 in #575
- improve the robustness of parsing vlm config by @wenhuach21 in #577
- switch to transformers API in cpu ut by @wenhuach21 in #580
- add torch backend by @WeiweiZhang1 in #555
- fix awq exporting at group_size=-1 by @wenhuach21 in #579
- refactor cuda ut to facilitate automation by @n1ck-guo in #559
- fix tensor shape mismatch error for API usage by @WeiweiZhang1 in #582
- fix device bug at calibration by @wenhuach21 in #587
- Update gguf_accuracy (q3_ks) by @SinpackKonmakan in #590
- add recipes for deepseek-r1-0528 by @n1ck-guo in #588
- correct errors of deepseek-r1-0528 recipes by @n1ck-guo in #591
- fix cuda ut by @wenhuach21 in #592
- Bump protobuf from 3.20.1 to 3.20.2 in /test/test_cuda by @dependabot[bot] in #585
- remove unnecessary forward pass to improve speed by @wenhuach21 in #593
- update readme by @wenhuach21 in #597
- fix q2k bug by @n1ck-guo in #599
- support for q4_k_m by @n1ck-guo in #596
- fix vlm unit test path error by @WeiweiZhang1 in #601
- fix several critical gguf bugs and support imatrix in rtn mode by @wenhuach21 in #595
- fix gguf bug by @wenhuach21 in #610
- move some checkers by @wenhuach21 in #611
- fix gguf packing bug and moe regression by @wenhuach21 in #614
- support customized mixed bits for gguf by @wenhuach21 in #615 (see the sketch after this list)
- fix double quant sym bug by @wenhuach21 in #616
- FP8 WOQ export by @wenhuach21 in #617
- fix bug of q5_k_s w/ imatrix by @n1ck-guo in #620
- add auto-round related vllm and transformers UT by @WeiweiZhang1 in #613
- refine doc 0624 by @WeiweiZhang1 in #619
- fix not using imatrix for gguf at rtn mode by @wenhuach21 in #623
- fix vlm hf config loading issue by @WeiweiZhang1 in #624
- refine gguf rtn algorithm and fix bugs by @wenhuach21 in #630
- fix gguf bug of moe models and lmhead/embedding bits setting regression by @n1ck-guo in #628
- [BUG FIX] fix bug of deepseek gguf:q*k by @n1ck-guo in #637
- support packing immediately for gguf to reduce ram usage by @wenhuach21 in #638
- support llmcompressor format by @xin3he in #646
- fix norm_bias_tuning by @wenhuach21 in #639
- [W4A8]Fix Packing by @yiliu30 in #648
- Integrate RTN quantization into GGUF packing to enhance robustness by @n1ck-guo in #644
- Remove vlm cuda UT dependencies version restrictions by @XuehaoSun in #651
- speedup mxfp tuning and fix nvfp bug by @wenhuach21 in #647
- support two more calib datasets and fix embedding layer bug by @wenhuach21 in #653
- fix some issues by @wenhuach21 in #655
- fix bug of q4_0 and q5_0 at iters==0 by @n1ck-guo in #658
- support vlm models for gguf format by @n1ck-guo in #654
- fix bug of block-wise quant imatrix by @n1ck-guo in #663
- fix gguf block-wise issue by @wenhuach21 in #664
- fix bugs in deepseek gguf export when iters=0 and in q3k accuracy by @n1ck-guo in #665
- handle zeros in imatrix by @wenhuach21 in #667
- fix ut issue by @WeiweiZhang1 in #668
- fix cuda hanging issue during packing by @WeiweiZhang1 in #669
- support using lm_eval for vlm by @n1ck-guo in #670
- add trust_remote_code when loading the tokenizer for gguf format by @n1ck-guo in #675
- fix 3bits asym accuracy and calib dataset issues by @WeiweiZhang1 in #674
- restrict accelerate version to reduce ram usage by @wenhuach21 in #673
- remove low_cpu when loading the model by @wenhuach21 in #676
- remove old vlm cuda ut by @WeiweiZhang1 in #678
- update gguf convert file and fix permute bug by @n1ck-guo in #679
- fix gguf regression for large models by @wenhuach21 in #680
- fix gemma vlm gguf regression by @wenhuach21 in #685
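
Related to the customized mixed-bit support in #615 above, per-layer overrides are passed when constructing the quantizer. A minimal sketch follows, assuming a `layer_config` dict keyed by layer name; the parameter name, the layer names, and the output format shown are assumptions here, so verify them against the current API documentation before use.

```python
# Sketch only: "layer_config", the layer names, and the output format are
# illustrative assumptions rather than values from these release notes.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/Qwen3-8B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Keep most layers at 4 bits, but give a few sensitive layers more precision
# (hypothetical layer names for a typical decoder-only model).
layer_config = {
    "lm_head": {"bits": 8},
    "model.layers.0.self_attn.q_proj": {"bits": 8, "group_size": 32},
}

autoround = AutoRound(model, tokenizer, bits=4, group_size=128,
                      layer_config=layer_config)
autoround.quantize()
autoround.save_quantized("./Qwen3-8B-mixed", format="auto_round")
```

The GGUF mixed-bit support presumably rides on the same per-layer override path; for a GGUF target, swap the format string accordingly.
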
New Contributors
- @SinpackKonmakan made their first contribution in #590
- @xin3he made their first contribution in #646
Full Changelog: v0.5.1...v0.6.0