Releases · intel/auto-round

14 Nov 12:32

wenhuach21

v0.9.0

8d8a1cd

v0.9.0 Latest

Highlights

support automatic mixed bits assignment by @wenhuach21 in #851
optimize rtn for int woq by @wenhuach21 in #924
support for model scope by @n1ck-guo in #957
enhance auto device map and support XPU by @xin3he in #961
support for immediate saving to reduce ram usage by @Kaihui-intel in #965
update gguf alg ext by @n1ck-guo in #1026

What's Changed

Fix rtn tuning_device issue by @Kaihui-intel in #893
fix vlm gguf ut by @n1ck-guo in #895
update alg_ext.abi3.so with python compatible version by @chensuyue in #894
move ste from quant to round for nvfp4 by @xin3he in #889
Add GPT-OSS quant support by @yiliu30 in #887
better help printing information by @n1ck-guo in #883
speedup quant and evaluation, fix recompile issue by @xin3he in #897
fix nvfp act quantization bug by @WeiweiZhang1 in #891
support automatic mixed bits assignment by @wenhuach21 in #851
try to fix gguf vram issue on windows by @wenhuach21 in #886
remove numba from requirements by @yiliu30 in #905
Extend mxfp loading dtypes by @yiliu30 in #907
block dataset logger info by @n1ck-guo in #908
fix torch compile issue in AutoScheme by @wenhuach21 in #909
Revert "Extend mxfp loading dtypes" by @wenhuach21 in #915
support disable_opt_rtn in auto-scheme by @wenhuach21 in #913
fix llama 4 ut by @n1ck-guo in #896
Add numba for cpu lib by @yiliu30 in #919
Loosen the packing restrictions for mxfp&nvfp by @WeiweiZhang1 in #911
Extend mxfp loading dtypes by @yiliu30 in #916
Fix act config exporting for mixed schemes by @WeiweiZhang1 in #903
optimize rtn for int woq by @wenhuach21 in #924
fix bug of gguf and support for LiquidAI/LFM2-1.2B by @n1ck-guo in #927
remove numpy<2.0 limitation by @xin3he in #921
enable regex quantization config saving for mixed bits by @WeiweiZhang1 in #825
Fix Flux tuning issue by @mengniwang95 in #936
gguf support for inclusionAI/Ling-flash-2.0 by @n1ck-guo in #940
remove low_cpu_mem by @n1ck-guo in #934
Add compatibility test by @XuehaoSun in #918
Add commit hash to version by @XuehaoSun in #941
gguf weight type align with original, output.weight, token_embed by @n1ck-guo in #900
support attention mask in user's dataset by @wenhuach21 in #930
Add diffusion README by @mengniwang95 in #923
update readme by @wenhuach21 in #949
refactor utils file by @n1ck-guo in #943
update readme for sglang support by @WeiweiZhang1 in #953
update gguf and support for CompressedLinear by @n1ck-guo in #950
Reduce AutoScheme VRAM usage by up to 10X by @wenhuach21 in #944
add self attribution and fix avg_bits error by @xin3he in #956
add logo by @wenhuach21 in #960
refine AutoScheme readme/code by @wenhuach21 in #958
update readme by @wenhuach21 in #962
fix critical disable_opt_rtn regression by @wenhuach21 in #963
[1/N] Initial vllm-ext evaluation support (MXFP4 MOE) by @yiliu30 in #935
fix bug of imatrix contains 0 by @n1ck-guo in #955
fix rtn bug by @mengniwang95 in #966
enhance flux doc by @mengniwang95 in #967
clean code by @wenhuach21 in #968
support for model scope by @n1ck-guo in #957
merge main branch to alg_ext by @wenhuach21 in #970
fix cuda CI backend issue, fix typo by @WeiweiZhang1 in #974
disable compile packing by default by @yiliu30 in #975
enhance auto device map and support XPU by @xin3he in #961
refine readme by @wenhuach21 in #978
cli support for positional arguments model by @n1ck-guo in #979
update bits in UT by @xin3he in #986
fix gguf scheme and device_map bug by @n1ck-guo in #969
add support for Magistral-Small by @n1ck-guo in #980
support model_dtype and fix bug of scheme contains quotes, mllm eval by @n1ck-guo in #985
fix bug of cannot create adam compressor by @n1ck-guo in #992
[CI] Update python to 3.12 and torch to 2.8.0 by @XuehaoSun in #741
fix lm head bug and rm clear_mem_reach_threhold by @wenhuach21 in #997
Reduce peak gpu memory usage and support moe estimation by @xin3he in #981
fix cuda ut bug by @n1ck-guo in #999
fix mllm device_map ut by @Kaihui-intel in #1000
refine md tables by @WeiweiZhang1 in #994
Refine exllamav2 ut by @WeiweiZhang1 in #1001
Support for immediate saving to reduce ram usage by @Kaihui-intel in #965
Fix diffusion multi-device ut issue by @mengniwang95 in #1002
fix multiple devices map issue in calibration by @wenhuach21 in #1003
Fix non auto device map by @WeiweiZhang1 in #1005
fix multiple devices issue in Compressor and AutoScheme by @wenhuach21 in #1007
fix cuda low_cpu_mem_usage ut by @Kaihui-intel in #1010
Fix param missing bug by @mengniwang95 in #1008
add device list to clear memory by @wenhuach21 in #1009
Minor refactor for LLMC by @yiliu30 in #993
fix one clear memory issue by @wenhuach21 in #1011
add ut for gguf alg_ext and update so file by @n1ck-guo in #1012
fix multi cuda ut bug by @n1ck-guo in #1014
Including auto_scheme.default_alg into pypi by @chensuyue in #1018
add num_device check for set_auto_device_map_for_block_with_tuning by @xin3he in #1021
dispatch model with real max memory by @xin3he in #1022
fix cuda ut by @n1ck-guo in #1020
disable itrex format first by @WeiweiZhang1 in #998
fix bug of lm_head and dispatch model, gguf eval by @n1ck-guo in #1025
Fix the missing temporary name by @yiliu30 in #1029
Reduce mem usage of GPT-OSS by @yiliu30 in #1013
update gguf alg ext by @n1ck-guo in #1026
optimize vram for gguf and add momentum by @wenhuach21 in #1031
fix incorrect model name in readme by @wenhuach21 in #1035
Bump into v0.9.0 by @XuehaoSun in #1024

Full Changelog: v0.8.0...v0.9.0

Contributors

chensuyue, mengniwang95, and 7 other contributors

23 Oct 08:53

wenhuach21

v0.8.0

cee6ac3

v0.8.0

Highlights

merge all api(MLLM, Adam) into AutoRound by @n1ck-guo in #791
MXFP4 and MXFP8 loading support by @yiliu30 in #832
Support Flux quantization by @mengniwang95 in #850

What's Changed

fix cuda ut bug of use_deterministic_algorithms by @n1ck-guo in #805
remove torch compile in nv quant by @wenhuach21 in #807
Support loading for static quant weight fp8 act fp8 by @yiliu30 in #730
fix bug of q_layer_inputs by @n1ck-guo in #811
fix gptqmodel inference issue by @wenhuach21 in #813
Bump version to v0.7.0 by @XuehaoSun in #814
fix nsamples in get_dataloader by @wenhuach21 in #804
Refine logger and add envs by @yiliu30 in #817
Fix llm-compressor export by @Kaihui-intel in #820
enhance auto-round eval with vllm backend by @xin3he in #815
rm triton from requirements and correct the supported python version to 3.10(+) by @wenhuach21 in #824
move environment variable setting into eval function by @xin3he in #829
bump version to 0.8.0.dev by @XuehaoSun in #830
[STEP 1] merge all api(MLLM, Adam) into AutoRound by @n1ck-guo in #791
add support for scheme FP8_STATIC to export llm_compressor format by @n1ck-guo in #816
fix format checking bug by @WeiweiZhang1 in #836
MXFP4 and MXFP8 loading support by @yiliu30 in #832
hpu build with auto_round package name by @chensuyue in #838
fix hpu detect issue by @xin3he in #823
fix severe vram leak regression in auto-round format packing by @wenhuach21 in #842
fix tp device issue caused by device_map by @xin3he in #833
fix log error by @n1ck-guo in #843
[High Risk]Refine inference code by @wenhuach21 in #840
fix gguf fp8 input model and vram issue by @wenhuach21 in #844
NVFP4 Loading support by @yiliu30 in #839
fix extra config by @n1ck-guo in #847
change the method of detecting linear by @n1ck-guo in #849
fix device_map setting by @Kaihui-intel in #854
Add typo checker by @XuehaoSun in #846
fix parse layer config bug by @wenhuach21 in #856
Refine BackendInfo to include act fields by @yiliu30 in #848
fix bug of data_type fp8_sym by @n1ck-guo in #855
fix save_quantized format checker by @WeiweiZhang1 in #857
fix bug of get_layer_names_in_block by @wenhuach21 in #861
raise vlm loading error by @wenhuach21 in #863
fix FP8 model as input and backend issue by @wenhuach21 in #864
fix seqlen bug and calib slow of mllm tuning by @n1ck-guo in #871
fix device bug by @xin3he in #873
fix vllm backend evaluation by @xin3he in #872
Optimize CPU unit test workflow by @XuehaoSun in #881
Fix Cuda CI failures due to Transformers and AWQ incompatibility by @WeiweiZhang1 in #882
Support Flux quantization by @mengniwang95 in #850
fp8 exporting bugfix by @WeiweiZhang1 in #874
lm_eval stop try except and add back missing arguments by @xin3he in #884
Fix act calibration bug by @mengniwang95 in #880
restrict accelerate version by @wenhuach21 in #885
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #868
update require accelerate version by @n1ck-guo in #888

Full Changelog: v0.7.1...v0.8.0

Contributors

chensuyue, pre-commit-ci, and 8 other contributors

23 Sep 04:54

wenhuach21

v0.7.1

4d72b45

v0.7.1 patch release

fix severe vram leak regression in auto-round format packing in #842

10 Sep 09:12

wenhuach21

v0.7.0

072cb8b

v0.7.0

🚀 Highlights

Enhanced NVFP4 algorithm and added support to export MXFP4/NVFP4 to the llm-compressor format, by @WeiweiZhang1 and @wenhuach21
Improved W2A16 quantization algorithm, by @wenhuach21
Introduced the scheme interface for easier configuration of quantization settings, by @wenhuach21
Added support for using FP8 models as input and str name as model input in API, by @wenhuach21 and @n1ck-guo
Unified device and device_map arguments and introduced device_map="auto" to simplify quantization of extremely large models, by @Kaihui-intel

What's Changed

fix ut import issue by @WeiweiZhang1 in #686
support to export static afp8 model by @n1ck-guo in #662
Add ruff and isort by @XuehaoSun in #578
Improved log message for unsupported dataset by @wenhuach21 in #688
support rceil for mxfp by @wenhuach21 in #660
Add black and blacken-docs in pre-commit by @XuehaoSun in #692
support static global scale for nvfp4 and update readme by @wenhuach21 in #691
Update readme by @wenhuach21 in #695
Add script for cuda unit test by @XuehaoSun in #567
support to save image_processor by @n1ck-guo in #694
support for static activation quantization calibration with group_size by @n1ck-guo in #693
fix xpu oom checker by @n1ck-guo in #705
FIXBUG: CPU Offloading for Cache Blocks in Low-Memory GPU Systems or Single GPU on ROCM Configs by @JartX in #703
fix bug of zero accuracy for mx-fp by @n1ck-guo in #709
catch oom error and move to cpu directly by @n1ck-guo in #708
code optimization of vlm by @n1ck-guo in #704
fix critical bug of gguf tuning by @wenhuach21 in #710
support fp8 model and str as input in llm quantization by @wenhuach21 in #699
change act_scale to input_scale for fp8 export by @n1ck-guo in #711
simplify CpuInfo class by @wenhuach21 in #715
Update step_by_step.md by @wenhuach21 in #717
fix bug of activation quant when act_max is None by @n1ck-guo in #718
Bump transformers in /test/test_cuda by @dependabot[bot] in #719
Freeze torchvision version in CI by @XuehaoSun in #720
update autoround mllm and support Mistral 3.2 series by @n1ck-guo in #713
Fix hpu CI by @XuehaoSun in #723
fix fp8 model input issue by @wenhuach21 in #724
update gguf convert.py and support for gpt-oss by @n1ck-guo in #721
new cast_to_nvfp4 with high performance by @xin3he in #727
make the tuning deterministic and move infrequently used arguments to kwargs by @wenhuach21 in #726
add original convert file and support for the newest llama.cpp by @n1ck-guo in #729
fix bug for exporting afp8 fake format by @n1ck-guo in #731
Fix torch_zp infer bug & API disable_deterministic_algorithms bug by @WeiweiZhang1 in #733
fix gguf mistral_common import by @n1ck-guo in #736
Enable mxfp exporting by @WeiweiZhang1 in #649
support for glm4.5 gguf by @n1ck-guo in #735
support auto-round-mllm command by @n1ck-guo in #742
Optimize pack zeros for int sym by @WeiweiZhang1 in #743
fix UT check for int zp by @WeiweiZhang1 in #745
support llama4 quant by @mengniwang95 in #744
fix bug of loading fp8 model by @n1ck-guo in #747
improved algorithm for int2 by @wenhuach21 in #748
Add Static FP8 KV Support by @yiliu30 in #737
refine code by @wenhuach21 in #749
mllm supports loading fp8 model and fix bug of loading fp8 model by @n1ck-guo in #750
support deepspeed LinearLayer and LinearAllreduce by @xin3he in #698
fix alg_ext moe and model str input bug by @wenhuach21 in #751
api support for fp8 model and mllm api support load from str by @n1ck-guo in #752
fix some torch compile warnings by @wenhuach21 in #755
Speedup FP4 packing by @yiliu30 in #760
fix_script_fp_layer_config_for_bits_checking by @WeiweiZhang1 in #756
support quant lm_head for rtn w8afp8 static quant by @n1ck-guo in #754
Revert "Speedup FP4 packing" by @yiliu30 in #763
refine code and fix activation quantization eval regression by @wenhuach21 in #762
fix gguf ut bug by @n1ck-guo in #767
fix gguf bug by int zp by @n1ck-guo in #771
Keep the model’s buffer dtype unchanged in most cases by @wenhuach21 in #770
fix set_layer_config bug by @wenhuach21 in #768
fix bug of auto_round exporting by @n1ck-guo in #772
gguf format supports for fp8 model by @n1ck-guo in #778
[API CHANGE] Stage 1 add quant scheme and consolidate device and device_map by @wenhuach21 in #774
Speedup FP4 packing by @yiliu30 in #766
hot fix for nvfp4 scheme by @wenhuach21 in #784
fix alg_ext regression and support mxfp4 in it with slight improvement by @wenhuach21 in #785
refine nvfp code, typofix by @WeiweiZhang1 in #777
mxfp/nvfp/fp8 support torch compile in tuning by @wenhuach21 in #789
refine nvfp4 algorithm by @wenhuach21 in #790
add limit arg for eval by @n1ck-guo in #764
torch backend bugfix and speedup ut by @WeiweiZhang1 in #793
Support auto device mapping by @Kaihui-intel in #781
fix bug and add nvfp in alg-ext with slight improvement by @wenhuach21 in #794
rename llmcompressor to llm_compressor for align with other formats by @WeiweiZhang1 in #780
align formats packing device to API by @WeiweiZhang1 in #795
add fp8 export format check by @n1ck-guo in #779
fix several regressions including lm-head quantization, 3bit asym torch backend, etc. by @wenhuach21 in #796
refine readme by @wenhuach21 in #798
fix typo in readme by @wenhuach21 in #799
fix several cuda ut bug by @n1ck-guo in #797
enable model python files saving by @WeiweiZhang1 in #802
AutoRoundMLLM supports scheme and fix device_map=dict regression by @n1ck-guo in #801
improve the robustness of scheme by @wenhuach21 in #803
fix mxfp exporting by @WeiweiZhang1 in #806

New Contributors

@JartX made their first contribution in #703
@mengniwang95 made their first contribution in #744

Full Changelog: v0.6.0...v0.7.0

Contributors

JartX, dependabot, and 8 other contributors

24 Jul 02:33

wenhuach21

v0.6.0

dd95bdb

v0.6.0

Highlights

provide experimental support for gguf q*_k format and customized mixed bits setting
support xpu in triton backend by @wenhuach21 in #563
add torch backend by @WeiweiZhang1 in #555
provide initial support of llmcompressor format, only INT8 W8A8 dynamic quantization is supported by @xin3he in #646

What's Changed

bump version into v0.5.1 by @XuehaoSun in #540
Freeze pytorch & ipex version in CI by @XuehaoSun in #541
fix_quantization_config_for_inference by @WeiweiZhang1 in #542
[critical bug] remove redundant round in dq simulation by @wenhuach21 in #543
update readme by @wenhuach21 in #550
add recipes for qwen3 8b and 14b by @n1ck-guo in #552
itrex requires torch<2.7 by @XuehaoSun in #548
[GGUF STEP4] fix search bug and improve packing & eval speed by @n1ck-guo in #545
refine xpu requirement/config json and fix several issues by @wenhuach21 in #558
add UE5M3 simulation by @wenhuach21 in #562
support xpu in triton backend by @wenhuach21 in #563
fix typo in backend by @wenhuach21 in #564
update habana docker to 1.21.0 by @XuehaoSun in #566
Support for more gguf format and float zp for Q*_1 by @n1ck-guo in #560
update readme by @wenhuach21 in #569
update readme by @wenhuach21 in #571
support for llava-based hf model by @n1ck-guo in #568
add gguf accuracy data by @wenhuach21 in #574
add sym & asym gguf quant for gguf baseline (iter==0) by @n1ck-guo in #573
modify default asym 4bits auto-round format to awq, fix save folder typo for mllm by @WeiweiZhang1 in #575
improve the robustness of parsing vlm config by @wenhuach21 in #577
switch to transformers API in cpu ut by @wenhuach21 in #580
add torch backend by @WeiweiZhang1 in #555
fix awq exporting at group_size=-1 by @wenhuach21 in #579
refactor cuda ut to facilitate automation by @n1ck-guo in #559
fix tensor shape mismatch error for API usage by @WeiweiZhang1 in #582
fix device bug at calibration by @wenhuach21 in #587
Update gguf_accuracy (q3_ks) by @SinpackKonmakan in #590
add recipes for deepseek-r1-0528 by @n1ck-guo in #588
correct errors of deepseek-r1-0528 recipes by @n1ck-guo in #591
fix cuda ut by @wenhuach21 in #592
Bump protobuf from 3.20.1 to 3.20.2 in /test/test_cuda by @dependabot[bot] in #585
rm unnecessary forward to improve speed by @wenhuach21 in #593
update readme by @wenhuach21 in #597
fix q2k bug by @n1ck-guo in #599
support for q4_k_m by @n1ck-guo in #596
fix vlm uttest path error by @WeiweiZhang1 in #601
fix lots of critical gguf bugs and support imatrix in rtn mode by @wenhuach21 in #595
fix gguf bug by @wenhuach21 in #610
mv some checkers by @wenhuach21 in #611
fix gguf packing bug and moe regression by @wenhuach21 in #614
support customized mixed bits for gguf by @wenhuach21 in #615
fix double quant sym bug by @wenhuach21 in #616
FP8 WOQ export by @wenhuach21 in #617
fix bug of q5_k_s w/ imatrix by @n1ck-guo in #620
add auto-round related vllm and transformers UT by @WeiweiZhang1 in #613
refine_doc_0624 by @WeiweiZhang1 in #619
fix not using imatrix for gguf at rtn mode by @wenhuach21 in #623
fix vlm hf config loading issue by @WeiweiZhang1 in #624
refine gguf rtn algorithm and fix bugs by @wenhuach21 in #630
fix gguf bug of moe models and lmhead/embedding bits setting regression by @n1ck-guo in #628
[BUG FIX] fix bug of deepseek gguf:q*k by @n1ck-guo in #637
support packing immediately for gguf to reduce ram usage by @wenhuach21 in #638
support llmcompressor format by @xin3he in #646
fix norm_bias_tuning by @wenhuach21 in #639
[W4A8]Fix Packing by @yiliu30 in #648
Integrate RTN quantization into GGUF packing to enhance robustness by @n1ck-guo in #644
Remove vlm cuda UT dependencies version restrictions by @XuehaoSun in #651
speedup mxfp tuning and fix nvfp bug by @wenhuach21 in #647
support two more calib datasets and fix embedding layer bug by @wenhuach21 in #653
fix some issues by @wenhuach21 in #655
fix bug of q4_0 and q5_0 at iters==0 by @n1ck-guo in #658
support vlm models for gguf format by @n1ck-guo in #654
fix bug of block-wise quant imatrix by @n1ck-guo in #663
fix gguf block-wise issue by @wenhuach21 in #664
fix bugs of export deepseek gguf format when iters=0 and q3k accuracy by @n1ck-guo in #665
handle zeros in imatrix by @wenhuach21 in #667
fix ut issue by @WeiweiZhang1 in #668
fix cuda hanging issue during packing by @WeiweiZhang1 in #669
support to use lm_eval for vlm by @n1ck-guo in #670
add trust remote code to gguf format load tokenizer by @n1ck-guo in #675
fix 3bits asym accuracy and calib dataset issues by @WeiweiZhang1 in #674
restrict accelerate version to reduce ram usage by @wenhuach21 in #673
rm low_cpu when loading the model by @wenhuach21 in #676
rm_old_vlm_cuda_ut by @WeiweiZhang1 in #678
update gguf convert file and fix permute bug by @n1ck-guo in #679
fix gguf regression for large models by @wenhuach21 in #680
fix gemma vlm gguf regression by @wenhuach21 in #685

New Contributors

@SinpackKonmakan made their first contribution in #590
@xin3he made their first contribution in #646

Full Changelog: v0.5.1...v0.6.0

Contributors

dependabot, xin3he, and 6 other contributors

23 Apr 08:50

wenhuach21

v0.5.1

73669aa

v0.5.1: bug fix release

What's Changed

bump version into v0.5.0 by @XuehaoSun in #538
fix triton multiple gpus and some other issues by @wenhuach21 in #539

Full Changelog: v0.5.0...v0.5.1

Contributors

wenhuach21 and XuehaoSun

22 Apr 08:05

wenhuach21

v0.5.0

e90f991

v0.5.0

Highlights

refine autoround format inference, support 2,3,4,8 bits and marlin kernel and fix several bugs in auto-round format
support xpu in tuning and inference by @wenhuach21 in #481
support for more vlms by @n1ck-guo in #390
change quantization method name and made several refinements by @wenhuach21 in #500
support rtn via iters==0 by @wenhuach21 in #510
fix bug of mix calib dataset by @n1ck-guo in #492
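
For readers unfamiliar with the term in the highlight above, RTN (round-to-nearest) is quantization without any tuning iterations, which is what `iters==0` selects. The sketch below is a plain-Python illustration of the general technique only, not auto-round's implementation. The function name `rtn_quantize`, the symmetric per-group scaling, and the default group size are assumptions chosen for illustration.

```python
def rtn_quantize(weights, bits=4, group_size=4):
    """Quantize then dequantize a flat list of floats with round-to-nearest.

    Hypothetical helper for illustration: symmetric, per-group scaling.
    """
    qmax = 2 ** (bits - 1) - 1   # e.g. 7 for 4 bits
    qmin = -qmax - 1             # e.g. -8 for 4 bits
    dequantized = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Avoid division by zero for an all-zero group.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        for w in group:
            # Round to the nearest integer level, then clamp to the valid range.
            q = max(qmin, min(qmax, round(w / scale)))
            dequantized.append(q * scale)
    return dequantized


if __name__ == "__main__":
    original = [0.7, -0.32, 0.0, 0.12]
    print(rtn_quantize(original, bits=4, group_size=4))
```

Each weight moves by at most half a quantization step; tuning-based methods try to choose better rounding directions than this baseline.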

What's Changed

support xpu in tuning and inference by @wenhuach21 in #481
add light ut, fix typos by @WeiweiZhang1 in #483
bump into v0.4.7 by @XuehaoSun in #487
fix dataset combine bug by @wenhuach21 in #489
fix llama 8b time cost by @WeiweiZhang1 in #490
update 2bits acc results by @WeiweiZhang1 in #491
fix bug of mix calib dataset by @n1ck-guo in #492
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #494
[GGUF support step3]patch for double quant by @n1ck-guo in #473
refine inference backend/code step 1 by @wenhuach21 in #486
refine inference step 2 by @wenhuach21 in #498
change quantization method name and made several refinements by @wenhuach21 in #500
fix bug of awq/gptq modules_to_not_convert by @n1ck-guo in #501
use --tasks to control evaluation enabling by @wenhuach21 in #505
fix gguf eval regression bug by @n1ck-guo in #506
change to new api in readme by @wenhuach21 in #507
fix setup issue on cuda machine by @wenhuach21 in #511
support rtn via iters==0 by @wenhuach21 in #510
fix critical bug of get_multimodal_block_names by @n1ck-guo in #509
Update requirements-lib.txt by @yiliu30 in #513
add group_size divisible check in backend by @wenhuach21 in #512
support for more vlms by @n1ck-guo in #390
move gguf-dq test to cuda by @n1ck-guo in #520
fix bs!=1 for gemma and MiniMax-Text-01 by @wenhuach21 in #515
add regex support in layer_config setting by @wenhuach21 in #519
patch for vlm by @n1ck-guo in #518
rename backend to packing_format in config.json by @wenhuach21 in #521
fix example's model_dtype by @WeiweiZhang1 in #523
rm fp16 export in autoround format by @wenhuach21 in #525
update convert_hf_to_gguf to support more models by @n1ck-guo in #524
fix light config by @WeiweiZhang1 in #526
fix typos, add model card link for VLMs by @WeiweiZhang1 in #527
add backend readme by @wenhuach21 in #528
update mllm readme by @WeiweiZhang1 in #530
fix bug of cuda ut by @n1ck-guo in #532
fix inference issue by @wenhuach21 in #529
update readme by @wenhuach21 in #531
refine readme by @WeiweiZhang1 in #536
fix cuda ut by @n1ck-guo in #537
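
The regex support in `layer_config` listed above (#519) can be pictured as mapping name patterns to per-layer settings. These notes do not say how auto-round matches patterns or resolves conflicts, so the sketch below is hypothetical: the helper `resolve_layer_bits`, full-string matching, and first-match-wins precedence are all assumptions, and the layer names are made up for the example.

```python
import re


def resolve_layer_bits(layer_names, layer_config, default_bits=4):
    """Return {layer_name: bits}, using the first pattern that fully matches.

    Hypothetical helper for illustration; not part of auto-round.
    """
    resolved = {}
    for name in layer_names:
        bits = default_bits
        for pattern, cfg in layer_config.items():
            if re.fullmatch(pattern, name):
                bits = cfg["bits"]
                break  # assumed precedence: first matching pattern wins
        resolved[name] = bits
    return resolved


if __name__ == "__main__":
    layers = [
        "model.layers.0.mlp.down_proj",
        "model.layers.0.self_attn.q_proj",
        "lm_head",
    ]
    config = {
        r".*\.mlp\..*": {"bits": 8},  # every MLP projection
        r"lm_head": {"bits": 16},     # keep the output head at higher precision
    }
    print(resolve_layer_bits(layers, config))
```

Layers that match no pattern fall back to `default_bits`, so a few patterns can describe a mixed-bits layout without listing every layer.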

Full Changelog: v0.4.7...v0.5.0

Contributors

pre-commit-ci, yiliu30, and 4 other contributors

01 Apr 09:50

wenhuach21

v0.4.7

2d904a4

v0.4.7

Highlights

Support W4AFP8 for HPU. Please refer to Intel Neural Compressor for guidance on running these models. by @yiliu30 in #467

Support packing immediately in new quantization api to save ram usage by @wenhuach21 in #466

20x for awq and 4x for gptq packing speedup on cuda by @wenhuach21 in #459

Support auto-round-light to speed up the tuning process by @WeiweiZhang1 in #454

Fix critical bug of mxfp4 in tuning by @wenhuach21 in #451

What's Changed

step-1 support naive double quant in tuning by @wenhuach21 in #442
fix critical bug of mxfp4 by @wenhuach21 in #451
update readme by @wenhuach21 in #455
update eval by @n1ck-guo in #450
awq exporting bugfix by @WeiweiZhang1 in #456
Support force loading into autoround Format by @WeiweiZhang1 in #453
20x for awq and 4x for gptq packing speedup by @wenhuach21 in #459
fix eval bug by @n1ck-guo in #461
[STEP-1]W4Afp8 export by @wenhuach21 in #378
[HPU] Update W4A8 for HPU by @yiliu30 in #467
support for gemma3 by @n1ck-guo in #468
upload_auto-round-light results by @WeiweiZhang1 in #454
GGUF support step2: add naive Q2_KS and Q4_KS by @n1ck-guo in #448
fix incorrect recipe data by @WeiweiZhang1 in #471
support for mistral3 by @n1ck-guo in #472
support to export gemma3 gguf format by @n1ck-guo in #470
Increase unit test timeout from 120 to 240 minutes by @XuehaoSun in #474
support packing immediately in new quantization api to save ram usage by @wenhuach21 in #466
rm redundant line break by @WeiweiZhang1 in #475
Temporarily close qxk api for new release by @n1ck-guo in #478
add restrict for exporting act-quant models by @n1ck-guo in #480

Full Changelog: v0.4.6...v0.4.7

Contributors

yiliu30, wenhuach21, and 3 other contributors

24 Feb 09:23

wenhuach21

v0.4.6

1320752

v0.4.6

Highlights:

1. Set torch compile to false by default in #447
2. Fix packing hang and force to fp16 at exporting in #430
3. Align auto_quantizer with Transformers 4.49 in #437

What's Changed

Fix packing hang, torch compile and force to fp16 at exporting by @wenhuach21 in #430
fix nblocks issues by @wenhuach21 in #432
rm gc collect in packing by @wenhuach21 in #438
align auto_quantizer with main branch in Transformers by @WeiweiZhang1 in #437
[HPU]Fix compile bug when quant layer by @yiliu30 in #441
remove tricky setting in mxfp4 by @wenhuach21 in #445
fix bug of evaluate user model by @n1ck-guo in #444
Refine funcs by @WeiweiZhang1 in #446
set torch compile to false by default by @WeiweiZhang1 in #447

Full Changelog: v0.4.5...v0.4.6

Contributors

yiliu30, wenhuach21, and 2 other contributors

27 Jan 12:12

wenhuach21

v0.4.5

e38a306

v0.4.5

Highlights:
We have enhanced support for extremely large models with the following updates:

Multi-Card Tuning Support: Added basic support for multi-GPU tuning. #415 support naive multi-card tuning

Accelerated Packing Stage: Improved the packing speed (2X-4X) for AutoGPTQ and AutoAWQ formats by leveraging CUDA. #407 speedup packing stage for autogptq and autoawq format

Deepseek V3 GGUF Export: Added support for exporting DeepSeek V3 models to the GGUF format. #416 support to export deepseek v3 gguf format

What's Changed

update format readme by @wenhuach21 in #411
fix log bug and device "auto" bug by @n1ck-guo in #409
speedup packing stage for autogptq and autoawq format by @wenhuach21 in #407
support naive multi-card tuning by @wenhuach21 in #415
support bf16 inference for autoround format by @wenhuach21 in #420
enable backup pile dataset loading by @WeiweiZhang1 in #417
fix evaluation device bug, relate to issue 413 by @n1ck-guo in #419
support to export deepseek v3 gguf format by @n1ck-guo in #416
fix cuda UT torch_dtype by @WeiweiZhang1 in #423
fix eval trust_remote_code by @n1ck-guo in #424

Full Changelog: v0.4.4...v0.4.5

Contributors

wenhuach21, WeiweiZhang1, and n1ck-guo
