Reduction in the number of parameters #1584

sri-fiddler · 2025-01-28T07:30:29Z

The number of parameters for the Llama 3.1 8B model is 8.03B according to Hugging Face. The same model from Unsloth, quantised to 4 bits, is listed at 4.56B. How and why is there a reduction in the number of parameters? As far as I am aware, quantisation reduces the size of each parameter, but it doesn't do away with any of them.

Meta Llama 3.1 8B : https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct

Unsloth Llama 3.1 8B : https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit

danielhanchen · 2025-01-28T11:02:21Z

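Oh, not a bug - it's because the pre-quantized uploads save the weights directly in 4-bit and pack two 4-bit values into one 8-bit number, so the stored tensors have about half as many elements, hence the reduction in the reported count. No weights are actually dropped.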
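To make the packing concrete, here is a minimal sketch in plain Python. It is not the bitsandbytes implementation; the helper names `pack_nibbles` and `unpack_nibbles` and the sample values are made up for illustration. It only shows that storing two 4-bit values per byte halves the number of stored elements while every value stays recoverable.

```python
def pack_nibbles(values):
    """Pack 4-bit integers (0..15) into bytes, two values per byte."""
    if len(values) % 2:
        raise ValueError("expected an even number of 4-bit values")
    if any(not 0 <= v <= 15 for v in values):
        raise ValueError("every value must fit in 4 bits (0..15)")
    # First value of each pair goes in the high nibble, second in the low nibble.
    return bytes((hi << 4) | lo for hi, lo in zip(values[0::2], values[1::2]))


def unpack_nibbles(packed):
    """Recover the original 4-bit integers from the packed bytes."""
    out = []
    for byte in packed:
        out.append(byte >> 4)    # high nibble
        out.append(byte & 0x0F)  # low nibble
    return out


# Hypothetical 4-bit quantized weights (illustrative values only).
weights_4bit = [3, 12, 0, 15, 7, 8, 1, 9]
packed = pack_nibbles(weights_4bit)

print(len(weights_4bit), "logical weights stored in", len(packed), "bytes")
print(unpack_nibbles(packed) == weights_4bit)
```

A tool that counts stored tensor elements would see `len(packed)` entries rather than `len(weights_4bit)`. The reported figure is presumably a little over half of 8.03B because not every tensor is packed this way; that part is an assumption, not something confirmed in this thread.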

sri-fiddler · 2025-01-29T15:47:11Z

This is quite brilliant. Good optimisation.

sri-fiddler closed this as completed Jan 31, 2025
