
Conversation

@taronaeo
Collaborator

@taronaeo taronaeo commented Dec 20, 2025

fixes: #18182

This PR updates the heuristic detection logic for the default ftype. When --outtype is not specified, the heuristics will attempt to figure out the highest-fidelity 16-bit ftype based on the first tensor.

If the first tensor does not meet the following criteria, the check continues to the second, third, and nth tensor until one matches:

  1. The tensor is at least 2D
  2. The tensor is not F32 dtype

If no tensor matches the criteria above, it defaults to the f16 ftype (a rough sketch of this selection is shown below).
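
As an illustration only, a minimal sketch of this selection could look like the following; the helper and the `get_tensors` iterator name are assumptions, not the PR's actual code:

```python
import torch
import gguf

def guess_default_ftype(get_tensors):
    # Hypothetical sketch of the heuristic described above.
    # `get_tensors` is assumed to yield (name, torch.Tensor) pairs in model order.
    for _, tensor in get_tensors():
        # Skip tensors with little dtype signal: 1-D tensors (biases, norms)
        # and tensors stored as F32.
        if tensor.dim() < 2 or tensor.dtype == torch.float32:
            continue
        if tensor.dtype == torch.bfloat16:
            return gguf.LlamaFileType.MOSTLY_BF16
        if tensor.dtype == torch.float16:
            return gguf.LlamaFileType.MOSTLY_F16
    # No tensor matched the criteria: fall back to f16.
    return gguf.LlamaFileType.MOSTLY_F16
```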

Tested against the following models:

  1. Granite-4.0-1B, defaulted to bf16 ftype (correct)
  2. GPT-NeoX-20B, defaulted to f16 ftype (correct)

Note about alternative methods, such as relying on the config.json dtype: some finetunes or quantisations lie about the correct dtype, so it can't be trusted, and some models such as GPT-OSS do not contain a dtype key in their config.json at all, so heuristics were the easier route.

AI Declaration: AI was used when creating this PR to identify existing logic relating to heuristics and to scaffold the code.

@taronaeo taronaeo requested a review from CISC as a code owner December 20, 2025 04:36
@github-actions github-actions bot added the python (python script changes) label Dec 20, 2025
@taronaeo taronaeo requested a review from pwilkin December 20, 2025 04:37
@taronaeo
Collaborator Author

I was wondering if we should warn the user about using --outtype f16 when the model is trained using bfloat16, since this issue came up with Granite 4.0. It would be good to stop these occurrences of ??????? or faulty models being distributed online.
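
A minimal sketch of what such a warning could look like (hypothetical helper and logger name; not part of this PR):

```python
import logging

import torch
import gguf

logger = logging.getLogger("hf-to-gguf")  # assumed logger name

def warn_if_lossy_outtype(ftype: gguf.LlamaFileType, first_tensor: torch.Tensor) -> None:
    # Hypothetical check: warn when f16 output is requested for a model
    # whose weights are stored as bf16, which can overflow or lose precision.
    if ftype == gguf.LlamaFileType.MOSTLY_F16 and first_tensor.dtype == torch.bfloat16:
        logger.warning("model weights are bf16 but f16 output was requested; "
                       "consider --outtype bf16 (or auto) to avoid precision loss")
```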

Signed-off-by: Aaron Teo <[email protected]>

convert: fix type-check

Signed-off-by: Aaron Teo <[email protected]>

convert: bring back heuristics comment

Signed-off-by: Aaron Teo <[email protected]>
@taronaeo taronaeo force-pushed the feat/default_precision branch from bcc2001 to eae4555 Compare December 20, 2025 04:56
@CISC
Collaborator

CISC commented Dec 20, 2025

Is it really necessary to check all the tensors? Wasn't the issue just that it defaulted to f16 instead of auto (and that auto only really checks for f16)?

The reason I'm saying this is that it duplicates tensor-loading logic that really needs to be refactored; see #18043 (review)

Apart from MXFP4/FP8 models I can't remember seeing any safetensors with mixed datatypes...

@pwilkin
Collaborator

pwilkin commented Dec 20, 2025

@CISC once I finish my refactoring of convert_hf_to_gguf.py it won't really make that much of a difference :) I don't think it's that complicated, it's just like 10 lines of code.


@pwilkin pwilkin left a comment


The heuristics should work a bit differently - you should only take into account (a) at least 2D tensors (b) non-F32 tensors.

@taronaeo
Collaborator Author

taronaeo commented Dec 21, 2025

Is it really necessary to check all the tensors?

I was a bit unsure whether the first tensor gave enough information about the dtype to determine the correct type. But looking at how many tensors there are for models bigger than 1B, I guess reverting to the first tensor still makes sense.

Edit: Ignore what I said. Will revert to using first tensor :)

@taronaeo
Collaborator Author

The heuristics should work a bit differently - you should only take into account (a) at least 2D tensors (b) non-F32 tensors.

In retrospect, I think looping through all the tensors might not have been a good idea, especially if the model is large. In this case I'll revert to using the first tensor and, I guess, check that tensor.dim() >= 2 and tensor.dtype != torch.float32.

If the conditions aren't met, we'll just jump to the next tensor and check again. Let me know your thoughts about this :)

@taronaeo
Collaborator Author

Updated this PR and retained the original --outtype auto logic, whereby it only chooses the highest-fidelity 16-bit ftype. The heuristic now checks the first tensor for either an f16 or bf16 dtype and, if it doesn't match, continues until it finds a match.

If there are no matches, it defaults to the f16 ftype. Updated the PR description as well, PTAL again.


@CISC CISC left a comment


Might want to enumerate and have some threshold before giving up, but optional.

logger.info(f"choosing --outtype f16 from first tensor type ({first_tensor.dtype})")
self.ftype = gguf.LlamaFileType.MOSTLY_F16
for _, tensor in self.get_tensors():
    if tensor.dim() < 2 and tensor.dtype == torch.float32:

Suggested change:
-    if tensor.dim() < 2 and tensor.dtype == torch.float32:
+    if tensor.dim() < 2:
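
For reference, a minimal sketch of the optional "threshold before giving up" idea using enumerate (hypothetical names and an arbitrary cap; not code from this PR):

```python
import torch
import gguf

MAX_TENSORS_TO_PROBE = 8  # arbitrary cap chosen for this sketch

def guess_default_ftype_with_limit(tensors):
    # `tensors` is assumed to be an iterable of (name, torch.Tensor) pairs.
    # Stop probing after a fixed number of tensors instead of scanning them all.
    for i, (_, tensor) in enumerate(tensors):
        if i >= MAX_TENSORS_TO_PROBE:
            break
        if tensor.dim() < 2 or tensor.dtype == torch.float32:
            continue
        if tensor.dtype == torch.bfloat16:
            return gguf.LlamaFileType.MOSTLY_BF16
        if tensor.dtype == torch.float16:
            return gguf.LlamaFileType.MOSTLY_F16
    return gguf.LlamaFileType.MOSTLY_F16
```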


Labels

python (python script changes)


Development

Successfully merging this pull request may close these issues.

Feature Request: convert_hf_to_gguf.py to default to the original precision
