
Conversation

@hxssgaa

@hxssgaa hxssgaa commented Dec 16, 2025

Add ColQwen3 support

  • Add ColQwen3/BiQwen3 modeling and processors with Qwen3-VL checkpoint mapping and projection head.
  • Export the new classes (a short usage sketch follows this list), add a training entrypoint under scripts/configs/qwen3, and surface them in the README/CHANGELOG.
  • Add relevant testing scripts.
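For reference, a minimal usage sketch of the new classes. It assumes the model class is exported as ColQwen3 alongside ColQwen3Processor and that the API mirrors the existing ColQwen2/ColQwen2.5 classes (process_images, process_queries, score_multi_vector); the checkpoint name is a placeholder.

```python
import torch
from PIL import Image

from colpali_engine.models import ColQwen3, ColQwen3Processor

# Placeholder checkpoint name; substitute an actual fine-tuned ColQwen3 checkpoint.
model_name = "your-org/colqwen3-checkpoint"

model = ColQwen3.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
).eval()
processor = ColQwen3Processor.from_pretrained(model_name)

images = [Image.new("RGB", (448, 448), color="white")]
queries = ["What does the report say about revenue?"]

# Multi-vector embeddings for document images and queries.
with torch.no_grad():
    image_embeddings = model(**processor.process_images(images).to(model.device))
    query_embeddings = model(**processor.process_queries(queries).to(model.device))

# Late-interaction (MaxSim) scores between queries and document images.
scores = processor.score_multi_vector(query_embeddings, image_embeddings)
print(scores)
```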

The fine-tuned ColQwen3 models, along with benchmark results, are listed below:

@ManuelFay
Collaborator

Some tests don't run.
(1) The ruff CI (maybe not your fault, but it would be great if this can be fixed).

(2) The tests. Should we bump the transformers version and we're good?

@athrael-soju
Contributor

@hxssgaa thanks for bringing these over. Having a blast using them!

@hxssgaa
Author

hxssgaa commented Dec 17, 2025

Some tests don't run. (1) The ruff CI (maybe not your fault, but it would be great if this can be fixed).

(2) The tests. Should we bump the transformers version and we're good?

We have fixed the ruff lint errors in our commits; the remaining lint errors are not from our changes.

The transformers version must be at least 4.57.0.
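A quick way to confirm the installed version meets that floor (a small sanity-check sketch; the 4.57.0 requirement is the one stated above):

```python
from importlib.metadata import version
from packaging.version import Version

# Qwen3-VL support in transformers requires >= 4.57.0 (per this PR).
installed = Version(version("transformers"))
assert installed >= Version("4.57.0"), f"transformers {installed} is too old; upgrade to >= 4.57.0"
```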

@sunxichen

I tried to run tomoro-colqwen3-embed-8b with this PR, and I encountered the following errors:

(.venv) root@ubuntu:~/colqwen3-8b/service# python colqwen_vector_loader.py 

Loading weights: 0it [00:00, ?it/s]





ColQwen3 LOAD REPORT from: /root/tomoro-colqwen3-embed-8b-local

Key                                                                      | Status     | 

-------------------------------------------------------------------------+------------+-

vlm.model.visual.blocks.{0...26}.norm1.bias                              | UNEXPECTED | 

vlm.model.visual.blocks.{0...26}.norm2.weight                            | UNEXPECTED | 

vlm.model.visual.blocks.{0...26}.mlp.linear_fc1.bias                     | UNEXPECTED | 

vlm.model.visual.blocks.{0...26}.mlp.linear_fc1.weight                   | UNEXPECTED | 

vlm.model.language_model.layers.{0...35}.mlp.gate_proj.weight            | UNEXPECTED | 

vlm.model.visual.blocks.{0...26}.attn.proj.bias                          | UNEXPECTED | 

vlm.model.visual.blocks.{0...26}.attn.qkv.bias                           | UNEXPECTED | 

vlm.model.language_model.layers.{0...35}.post_attention_layernorm.weight | UNEXPECTED | 

vlm.model.language_model.layers.{0...35}.self_attn.q_norm.weight         | UNEXPECTED | 

vlm.model.visual.blocks.{0...26}.mlp.linear_fc2.weight                   | UNEXPECTED | 

vlm.model.visual.blocks.{0...26}.attn.proj.weight                        | UNEXPECTED | 

vlm.model.language_model.layers.{0...35}.mlp.down_proj.weight            | UNEXPECTED | 

vlm.model.language_model.layers.{0...35}.input_layernorm.weight          | UNEXPECTED | 

vlm.model.visual.blocks.{0...26}.attn.qkv.weight                         | UNEXPECTED | 

vlm.model.visual.deepstack_merger_list.{0, 1, 2}.linear_fc2.weight       | UNEXPECTED | 

vlm.model.language_model.layers.{0...35}.self_attn.k_norm.weight         | UNEXPECTED | 

vlm.model.language_model.layers.{0...35}.self_attn.k_proj.weight         | UNEXPECTED | 

vlm.model.language_model.layers.{0...35}.self_attn.o_proj.weight         | UNEXPECTED | 

vlm.model.visual.deepstack_merger_list.{0, 1, 2}.norm.weight             | UNEXPECTED | 

vlm.model.language_model.layers.{0...35}.mlp.up_proj.weight              | UNEXPECTED | 

vlm.model.language_model.layers.{0...35}.self_attn.v_proj.weight         | UNEXPECTED | 

vlm.model.visual.blocks.{0...26}.norm1.weight                            | UNEXPECTED | 

vlm.model.visual.deepstack_merger_list.{0, 1, 2}.norm.bias               | UNEXPECTED | 

vlm.model.language_model.layers.{0...35}.self_attn.q_proj.weight         | UNEXPECTED | 

vlm.model.visual.blocks.{0...26}.mlp.linear_fc2.bias                     | UNEXPECTED | 

vlm.model.visual.blocks.{0...26}.norm2.bias                              | UNEXPECTED | 

vlm.model.visual.deepstack_merger_list.{0, 1, 2}.linear_fc1.bias         | UNEXPECTED | 

vlm.model.visual.deepstack_merger_list.{0, 1, 2}.linear_fc2.bias         | UNEXPECTED | 

vlm.model.visual.deepstack_merger_list.{0, 1, 2}.linear_fc1.weight       | UNEXPECTED | 

vlm.model.visual.merger.linear_fc2.weight                                | UNEXPECTED | 

vlm.model.visual.merger.norm.weight                                      | UNEXPECTED | 

vlm.model.language_model.embed_tokens.weight                             | UNEXPECTED | 

embedding_proj_layer.bias                                                | UNEXPECTED | 

vlm.model.visual.pos_embed.weight                                        | UNEXPECTED | 

vlm.model.visual.merger.linear_fc1.bias                                  | UNEXPECTED | 

vlm.model.visual.merger.linear_fc1.weight                                | UNEXPECTED | 

vlm.lm_head.weight                                                       | UNEXPECTED | 

vlm.model.visual.merger.linear_fc2.bias                                  | UNEXPECTED | 

embedding_proj_layer.weight                                              | UNEXPECTED | 

vlm.model.visual.patch_embed.proj.weight                                 | UNEXPECTED | 

vlm.model.language_model.norm.weight                                     | UNEXPECTED | 

vlm.model.visual.patch_embed.proj.bias                                   | UNEXPECTED | 

vlm.model.visual.merger.norm.bias                                        | UNEXPECTED | 

visual.blocks.{0...26}.mlp.linear_fc2.weight                             | MISSING    | 

language_model.layers.{0...31}.input_layernorm.weight                    | MISSING    | 

language_model.layers.{0...31}.self_attn.k_proj.weight                   | MISSING    | 

language_model.layers.{0...31}.mlp.up_proj.weight                        | MISSING    | 

visual.blocks.{0...26}.mlp.linear_fc1.bias                               | MISSING    | 

visual.blocks.{0...26}.mlp.linear_fc2.bias                               | MISSING    | 

language_model.layers.{0...31}.self_attn.q_proj.weight                   | MISSING    | 

visual.blocks.{0...26}.norm1.weight                                      | MISSING    | 

visual.blocks.{0...26}.mlp.linear_fc1.weight                             | MISSING    | 

language_model.layers.{0...31}.self_attn.v_proj.weight                   | MISSING    | 

language_model.layers.{0...31}.mlp.down_proj.weight                      | MISSING    | 

visual.blocks.{0...26}.attn.proj.weight                                  | MISSING    | 

visual.blocks.{0...26}.norm2.bias                                        | MISSING    | 

visual.blocks.{0...26}.norm1.bias                                        | MISSING    | 

language_model.layers.{0...31}.self_attn.q_norm.weight                   | MISSING    | 

visual.blocks.{0...26}.attn.qkv.weight                                   | MISSING    | 

language_model.layers.{0...31}.mlp.gate_proj.weight                      | MISSING    | 

language_model.layers.{0...31}.post_attention_layernorm.weight           | MISSING    | 

visual.blocks.{0...26}.attn.qkv.bias                                     | MISSING    | 

language_model.layers.{0...31}.self_attn.k_norm.weight                   | MISSING    | 

language_model.layers.{0...31}.self_attn.o_proj.weight                   | MISSING    | 

visual.blocks.{0...26}.attn.proj.bias                                    | MISSING    | 

visual.blocks.{0...26}.norm2.weight                                      | MISSING    | 

visual.patch_embed.proj.weight                                           | MISSING    | 

visual.deepstack_merger_list.{0, 1, 2}.linear_fc2.weight                 | MISSING    | 

visual.merger.linear_fc2.bias                                            | MISSING    | 

visual.pos_embed.weight                                                  | MISSING    | 

visual.patch_embed.proj.bias                                             | MISSING    | 

visual.merger.norm.bias                                                  | MISSING    | 

visual.deepstack_merger_list.{0, 1, 2}.norm.bias                         | MISSING    | 

custom_text_proj.weight                                                  | MISSING    | 

visual.deepstack_merger_list.{0, 1, 2}.linear_fc1.bias                   | MISSING    | 

custom_text_proj.bias                                                    | MISSING    | 

visual.merger.linear_fc1.bias                                            | MISSING    | 

language_model.embed_tokens.weight                                       | MISSING    | 

visual.deepstack_merger_list.{0, 1, 2}.norm.weight                       | MISSING    | 

visual.deepstack_merger_list.{0, 1, 2}.linear_fc2.bias                   | MISSING    | 

visual.merger.linear_fc2.weight                                          | MISSING    | 

visual.deepstack_merger_list.{0, 1, 2}.linear_fc1.weight                 | MISSING    | 

visual.merger.norm.weight                                                | MISSING    | 

language_model.norm.weight                                               | MISSING    | 

visual.merger.linear_fc1.weight                                          | MISSING    | 



Notes:

- UNEXPECTED    :can be ignored when loading from different task/architecture; not ok if you expect identical arch.

- MISSING       :those params were newly initialized because missing from the checkpoint. Consider training on your downstream task.

ERROR:base_colqwen:Model loading failed: 'NoneType' object has no attribute 'convert_tokens_to_ids'

Traceback (most recent call last):

  File "/root/colqwen3-8b/service/colqwen_vector_loader.py", line 102, in <module>

    loader.batch_insert_to_milvus()

  File "/root/colqwen3-8b/service/colqwen_vector_loader.py", line 32, in batch_insert_to_milvus

    self.initialize_model()  # load the model first

    ^^^^^^^^^^^^^^^^^^^^^^^

  File "/root/colqwen3-8b/service/base_colqwen.py", line 33, in initialize_model

    self.processor = ColQwen3Processor.from_pretrained(

                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/root/colqwen3-8b/.venv/lib/python3.12/site-packages/colpali_engine/models/qwen3/colqwen3/processing_colqwen3.py", line 42, in from_pretrained

    instance = super().from_pretrained(

               ^^^^^^^^^^^^^^^^^^^^^^^^

  File "/root/colqwen3-8b/.venv/lib/python3.12/site-packages/transformers/processing_utils.py", line 1414, in from_pretrained

    return cls.from_args_and_dict(args, processor_dict, **instantiation_kwargs)

           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/root/colqwen3-8b/.venv/lib/python3.12/site-packages/transformers/processing_utils.py", line 1182, in from_args_and_dict

    processor = cls(*args, **valid_kwargs)

                ^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/root/colqwen3-8b/.venv/lib/python3.12/site-packages/colpali_engine/models/qwen3/colqwen3/processing_colqwen3.py", line 32, in __init__

    super().__init__(*args, **kwargs)

  File "/root/colqwen3-8b/.venv/lib/python3.12/site-packages/transformers/models/qwen3_vl/processing_qwen3_vl.py", line 69, in __init__

    else tokenizer.convert_tokens_to_ids(self.image_token)

         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

AttributeError: 'NoneType' object has no attribute 'convert_tokens_to_ids'

How can I fix this? Or is it a configuration issue on my end?

@hxssgaa
Author

hxssgaa commented Dec 19, 2025

I tried to run tomoro-colqwen3-embed-8b with this PR, and I encountered the following errors: [...]

How can I fix this? Or is it a configuration issue on my end?

Hi, note that the ColPali format isn't directly compatible with the current Tomoro ColQwen3 HF models, as those were converted from this ColPali format. You can refer to the Tomoro HF repo for how to run the models for now. We intend to share the conversion script later as well, but it doesn't belong in this repo.
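Until that conversion script is shared, here is a rough sketch of the kind of key renaming involved, inferred only from the UNEXPECTED/MISSING key names in the load report above. It is not the official mapping, and the differing layer counts in that report ({0...35} vs {0...31}) suggest a config mismatch that renaming alone won't fix.

```python
from typing import Dict

import torch


def remap_tomoro_colqwen3_keys(state_dict: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
    """Rename checkpoint keys from the transformers-style layout (vlm.model.*, embedding_proj_layer.*)
    to the layout this PR expects (visual.*, language_model.*, custom_text_proj.*).

    Sketch only, inferred from the load report; the official conversion script may differ.
    """
    converted = {}
    for key, value in state_dict.items():
        if key == "vlm.lm_head.weight":
            # The retrieval model in this PR has no LM head, so this tensor is dropped.
            continue
        if key.startswith("embedding_proj_layer."):
            # The projection head is named custom_text_proj in this PR.
            key = key.replace("embedding_proj_layer.", "custom_text_proj.")
        elif key.startswith("vlm.model."):
            # e.g. vlm.model.visual.blocks.0.norm1.bias -> visual.blocks.0.norm1.bias
            key = key[len("vlm.model."):]
        converted[key] = value
    return converted
```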

@ManuelFay
Collaborator

So one thing I don't understand yet is that this PR is not compatible with the Tomoro checkpoints you shared?

Isn't it better to make this compatible?
Is it just the naming of the parameters? Maybe we can have a copy of the model with a prefix in its name, or something that loads directly?

@01234568

01234568 commented Dec 22, 2025

We based our Hugging Face repo on the ColQwen2 implementation in transformers here: https://github.com/huggingface/transformers/tree/main/src/transformers/models/colqwen2, which uses a different naming convention for parameters than this repo. Should we unify the names or keep two separate versions?

@Mungeryang

Mungeryang commented Dec 25, 2025

I trained a smaller 2B ColQwen3 model based on Qwen3-VL-2B-Instruct; feel free to follow and use it:
https://github.com/Mungeryang/colqwen3
https://huggingface.co/goodman2001/colqwen3-v0.2

Collaborator

@ManuelFay ManuelFay left a comment


Is it possible to rebase on main so that the linting gets corrected?

The other question I have is whether the pyproject should get updated. From my understanding, Qwen3 is only available in newer versions of transformers, so I am guessing we should bump the minimum transformers version, right?

The rest looks great!

TransAMrit and others added 4 commits December 26, 2025 20:26

* looks like colqwen 2.5 omni support was accidentally removed in illuin-tech#339

EDIT: that was based upon just looking at the main __init__.py. looking
at the other files, perhaps it was intentionally removed...

* found & fixed resize_token_embeddings() breakage
* lint

* lint examples
@hxssgaa
Author

hxssgaa commented Dec 26, 2025

Is it possible to rebase on main so that the linting gets corrected?

The other question I have is whether the pyproject should get updated. From my understanding, Qwen3 is only available in newer versions of transformers, so I am guessing we should bump the minimum transformers version, right?

The rest looks great!

Already rebased onto main, and yes, I have updated the minimum transformers version to >=4.57.0 for Qwen3-VL support.

@hxssgaa hxssgaa requested a review from ManuelFay December 26, 2025 13:23
@hxssgaa hxssgaa requested a review from ManuelFay December 26, 2025 15:50