
Conversation

@hxssgaa

@hxssgaa hxssgaa commented Dec 16, 2025

Add ColQwen3 support

  • Add ColQwen3/BiQwen3 modeling and processors with Qwen3-VL checkpoint mapping and projection head.
  • Export the new classes (a short usage sketch follows this list), add a training entrypoint under scripts/configs/qwen3, and surface them in the README/CHANGELOG.
  • Add relevant testing scripts.
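For reference, a minimal usage sketch of the new classes. It assumes the model class is exported as ColQwen3 alongside ColQwen3Processor and that the API mirrors the existing ColQwen2/ColQwen2.5 classes (process_images, process_queries, score_multi_vector); the checkpoint name is a placeholder.

```python
import torch
from PIL import Image

from colpali_engine.models import ColQwen3, ColQwen3Processor

# Placeholder checkpoint name; substitute an actual fine-tuned ColQwen3 checkpoint.
model_name = "your-org/colqwen3-checkpoint"

model = ColQwen3.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
).eval()
processor = ColQwen3Processor.from_pretrained(model_name)

images = [Image.new("RGB", (448, 448), color="white")]
queries = ["What does the report say about revenue?"]

# Multi-vector embeddings for document images and queries.
with torch.no_grad():
    image_embeddings = model(**processor.process_images(images).to(model.device))
    query_embeddings = model(**processor.process_queries(queries).to(model.device))

# Late-interaction (MaxSim) scores between queries and document images.
scores = processor.score_multi_vector(query_embeddings, image_embeddings)
print(scores)
```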

The fine-tuned ColQwen3 models, along with benchmark results, are listed below:

@ManuelFay
Collaborator

Some tests don't run.
(1) The ruff CI (maybe not your fault, but it would be great if this can be fixed).

(2) The tests. Should we bump the transformers version and we're good?

@athrael-soju
Contributor

@hxssgaa thanks for bringing these over. Having a blast using them!

@hxssgaa
Author

hxssgaa commented Dec 17, 2025

Some tests don't run. (1) The ruff CI (maybe not your fault, but it would be great if this can be fixed).

(2) The tests. Should we bump the transformers version and we're good?

We have fixed the ruff lint errors in our commits; the remaining lint errors are not from our changes.

The transformers version must be at least 4.57.0.
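A quick way to confirm the installed version meets that floor (a small sanity-check sketch; the 4.57.0 requirement is the one stated above):

```python
from importlib.metadata import version
from packaging.version import Version

# Qwen3-VL support in transformers requires >= 4.57.0 (per this PR).
installed = Version(version("transformers"))
assert installed >= Version("4.57.0"), f"transformers {installed} is too old; upgrade to >= 4.57.0"
```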

@sunxichen

I tried to run tomoro-colqwen3-embed-8b with this PR, and I encountered the following errors:

(.venv) root@ubuntu:~/colqwen3-8b/service# python colqwen_vector_loader.py 

Loading weights: 0it [00:00, ?it/s]





ColQwen3 LOAD REPORT from: /root/tomoro-colqwen3-embed-8b-local

Key                                                                      | Status     | 

-------------------------------------------------------------------------+------------+-

vlm.model.visual.blocks.{0...26}.norm1.bias                              | UNEXPECTED | 

vlm.model.visual.blocks.{0...26}.norm2.weight                            | UNEXPECTED | 

vlm.model.visual.blocks.{0...26}.mlp.linear_fc1.bias                     | UNEXPECTED | 

vlm.model.visual.blocks.{0...26}.mlp.linear_fc1.weight                   | UNEXPECTED | 

vlm.model.language_model.layers.{0...35}.mlp.gate_proj.weight            | UNEXPECTED | 

vlm.model.visual.blocks.{0...26}.attn.proj.bias                          | UNEXPECTED | 

vlm.model.visual.blocks.{0...26}.attn.qkv.bias                           | UNEXPECTED | 

vlm.model.language_model.layers.{0...35}.post_attention_layernorm.weight | UNEXPECTED | 

vlm.model.language_model.layers.{0...35}.self_attn.q_norm.weight         | UNEXPECTED | 

vlm.model.visual.blocks.{0...26}.mlp.linear_fc2.weight                   | UNEXPECTED | 

vlm.model.visual.blocks.{0...26}.attn.proj.weight                        | UNEXPECTED | 

vlm.model.language_model.layers.{0...35}.mlp.down_proj.weight            | UNEXPECTED | 

vlm.model.language_model.layers.{0...35}.input_layernorm.weight          | UNEXPECTED | 

vlm.model.visual.blocks.{0...26}.attn.qkv.weight                         | UNEXPECTED | 

vlm.model.visual.deepstack_merger_list.{0, 1, 2}.linear_fc2.weight       | UNEXPECTED | 

vlm.model.language_model.layers.{0...35}.self_attn.k_norm.weight         | UNEXPECTED | 

vlm.model.language_model.layers.{0...35}.self_attn.k_proj.weight         | UNEXPECTED | 

vlm.model.language_model.layers.{0...35}.self_attn.o_proj.weight         | UNEXPECTED | 

vlm.model.visual.deepstack_merger_list.{0, 1, 2}.norm.weight             | UNEXPECTED | 

vlm.model.language_model.layers.{0...35}.mlp.up_proj.weight              | UNEXPECTED | 

vlm.model.language_model.layers.{0...35}.self_attn.v_proj.weight         | UNEXPECTED | 

vlm.model.visual.blocks.{0...26}.norm1.weight                            | UNEXPECTED | 

vlm.model.visual.deepstack_merger_list.{0, 1, 2}.norm.bias               | UNEXPECTED | 

vlm.model.language_model.layers.{0...35}.self_attn.q_proj.weight         | UNEXPECTED | 

vlm.model.visual.blocks.{0...26}.mlp.linear_fc2.bias                     | UNEXPECTED | 

vlm.model.visual.blocks.{0...26}.norm2.bias                              | UNEXPECTED | 

vlm.model.visual.deepstack_merger_list.{0, 1, 2}.linear_fc1.bias         | UNEXPECTED | 

vlm.model.visual.deepstack_merger_list.{0, 1, 2}.linear_fc2.bias         | UNEXPECTED | 

vlm.model.visual.deepstack_merger_list.{0, 1, 2}.linear_fc1.weight       | UNEXPECTED | 

vlm.model.visual.merger.linear_fc2.weight                                | UNEXPECTED | 

vlm.model.visual.merger.norm.weight                                      | UNEXPECTED | 

vlm.model.language_model.embed_tokens.weight                             | UNEXPECTED | 

embedding_proj_layer.bias                                                | UNEXPECTED | 

vlm.model.visual.pos_embed.weight                                        | UNEXPECTED | 

vlm.model.visual.merger.linear_fc1.bias                                  | UNEXPECTED | 

vlm.model.visual.merger.linear_fc1.weight                                | UNEXPECTED | 

vlm.lm_head.weight                                                       | UNEXPECTED | 

vlm.model.visual.merger.linear_fc2.bias                                  | UNEXPECTED | 

embedding_proj_layer.weight                                              | UNEXPECTED | 

vlm.model.visual.patch_embed.proj.weight                                 | UNEXPECTED | 

vlm.model.language_model.norm.weight                                     | UNEXPECTED | 

vlm.model.visual.patch_embed.proj.bias                                   | UNEXPECTED | 

vlm.model.visual.merger.norm.bias                                        | UNEXPECTED | 

visual.blocks.{0...26}.mlp.linear_fc2.weight                             | MISSING    | 

language_model.layers.{0...31}.input_layernorm.weight                    | MISSING    | 

language_model.layers.{0...31}.self_attn.k_proj.weight                   | MISSING    | 

language_model.layers.{0...31}.mlp.up_proj.weight                        | MISSING    | 

visual.blocks.{0...26}.mlp.linear_fc1.bias                               | MISSING    | 

visual.blocks.{0...26}.mlp.linear_fc2.bias                               | MISSING    | 

language_model.layers.{0...31}.self_attn.q_proj.weight                   | MISSING    | 

visual.blocks.{0...26}.norm1.weight                                      | MISSING    | 

visual.blocks.{0...26}.mlp.linear_fc1.weight                             | MISSING    | 

language_model.layers.{0...31}.self_attn.v_proj.weight                   | MISSING    | 

language_model.layers.{0...31}.mlp.down_proj.weight                      | MISSING    | 

visual.blocks.{0...26}.attn.proj.weight                                  | MISSING    | 

visual.blocks.{0...26}.norm2.bias                                        | MISSING    | 

visual.blocks.{0...26}.norm1.bias                                        | MISSING    | 

language_model.layers.{0...31}.self_attn.q_norm.weight                   | MISSING    | 

visual.blocks.{0...26}.attn.qkv.weight                                   | MISSING    | 

language_model.layers.{0...31}.mlp.gate_proj.weight                      | MISSING    | 

language_model.layers.{0...31}.post_attention_layernorm.weight           | MISSING    | 

visual.blocks.{0...26}.attn.qkv.bias                                     | MISSING    | 

language_model.layers.{0...31}.self_attn.k_norm.weight                   | MISSING    | 

language_model.layers.{0...31}.self_attn.o_proj.weight                   | MISSING    | 

visual.blocks.{0...26}.attn.proj.bias                                    | MISSING    | 

visual.blocks.{0...26}.norm2.weight                                      | MISSING    | 

visual.patch_embed.proj.weight                                           | MISSING    | 

visual.deepstack_merger_list.{0, 1, 2}.linear_fc2.weight                 | MISSING    | 

visual.merger.linear_fc2.bias                                            | MISSING    | 

visual.pos_embed.weight                                                  | MISSING    | 

visual.patch_embed.proj.bias                                             | MISSING    | 

visual.merger.norm.bias                                                  | MISSING    | 

visual.deepstack_merger_list.{0, 1, 2}.norm.bias                         | MISSING    | 

custom_text_proj.weight                                                  | MISSING    | 

visual.deepstack_merger_list.{0, 1, 2}.linear_fc1.bias                   | MISSING    | 

custom_text_proj.bias                                                    | MISSING    | 

visual.merger.linear_fc1.bias                                            | MISSING    | 

language_model.embed_tokens.weight                                       | MISSING    | 

visual.deepstack_merger_list.{0, 1, 2}.norm.weight                       | MISSING    | 

visual.deepstack_merger_list.{0, 1, 2}.linear_fc2.bias                   | MISSING    | 

visual.merger.linear_fc2.weight                                          | MISSING    | 

visual.deepstack_merger_list.{0, 1, 2}.linear_fc1.weight                 | MISSING    | 

visual.merger.norm.weight                                                | MISSING    | 

language_model.norm.weight                                               | MISSING    | 

visual.merger.linear_fc1.weight                                          | MISSING    | 



Notes:

- UNEXPECTED    :can be ignored when loading from different task/architecture; not ok if you expect identical arch.

- MISSING       :those params were newly initialized because missing from the checkpoint. Consider training on your downstream task.

ERROR:base_colqwen:Model loading failed: 'NoneType' object has no attribute 'convert_tokens_to_ids'

Traceback (most recent call last):

  File "/root/colqwen3-8b/service/colqwen_vector_loader.py", line 102, in <module>

    loader.batch_insert_to_milvus()

  File "/root/colqwen3-8b/service/colqwen_vector_loader.py", line 32, in batch_insert_to_milvus

    self.initialize_model()  # load the model first

    ^^^^^^^^^^^^^^^^^^^^^^^

  File "/root/colqwen3-8b/service/base_colqwen.py", line 33, in initialize_model

    self.processor = ColQwen3Processor.from_pretrained(

                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/root/colqwen3-8b/.venv/lib/python3.12/site-packages/colpali_engine/models/qwen3/colqwen3/processing_colqwen3.py", line 42, in from_pretrained

    instance = super().from_pretrained(

               ^^^^^^^^^^^^^^^^^^^^^^^^

  File "/root/colqwen3-8b/.venv/lib/python3.12/site-packages/transformers/processing_utils.py", line 1414, in from_pretrained

    return cls.from_args_and_dict(args, processor_dict, **instantiation_kwargs)

           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/root/colqwen3-8b/.venv/lib/python3.12/site-packages/transformers/processing_utils.py", line 1182, in from_args_and_dict

    processor = cls(*args, **valid_kwargs)

                ^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/root/colqwen3-8b/.venv/lib/python3.12/site-packages/colpali_engine/models/qwen3/colqwen3/processing_colqwen3.py", line 32, in __init__

    super().__init__(*args, **kwargs)

  File "/root/colqwen3-8b/.venv/lib/python3.12/site-packages/transformers/models/qwen3_vl/processing_qwen3_vl.py", line 69, in __init__

    else tokenizer.convert_tokens_to_ids(self.image_token)

         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

AttributeError: 'NoneType' object has no attribute 'convert_tokens_to_ids'

How can I fix this? Or is it a configuration issue on my end?

@hxssgaa
Author

hxssgaa commented Dec 19, 2025

I tried to run tomoro-colqwen3-embed-8b with this PR, and I encountered the following errors: [...]

How can I fix this? Or is it a configuration issue on my end?

Hi, note that the ColPali format isn't directly compatible with the current Tomoro ColQwen3 HF models, as those were converted from this ColPali format. You can refer to the Tomoro HF repo for how to run the models for now. We intend to share the conversion script later as well, but it doesn't belong in this repo.
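Until that conversion script is shared, here is a rough sketch of the kind of key renaming involved, inferred only from the UNEXPECTED/MISSING key names in the load report above. It is not the official mapping, and the differing layer counts in that report ({0...35} vs {0...31}) suggest a config mismatch that renaming alone won't fix.

```python
from typing import Dict

import torch


def remap_tomoro_colqwen3_keys(state_dict: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
    """Rename checkpoint keys from the transformers-style layout (vlm.model.*, embedding_proj_layer.*)
    to the layout this PR expects (visual.*, language_model.*, custom_text_proj.*).

    Sketch only, inferred from the load report; the official conversion script may differ.
    """
    converted = {}
    for key, value in state_dict.items():
        if key == "vlm.lm_head.weight":
            # The retrieval model in this PR has no LM head, so this tensor is dropped.
            continue
        if key.startswith("embedding_proj_layer."):
            # The projection head is named custom_text_proj in this PR.
            key = key.replace("embedding_proj_layer.", "custom_text_proj.")
        elif key.startswith("vlm.model."):
            # e.g. vlm.model.visual.blocks.0.norm1.bias -> visual.blocks.0.norm1.bias
            key = key[len("vlm.model."):]
        converted[key] = value
    return converted
```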

@ManuelFay
Collaborator

So one thing I don't understand yet is that this PR is not compatible with the Tomoro checkpoints you shared?

Isn't it better to make this compatible?
Is it just the naming of the parameters? Maybe we can have a copy of the model with a prefix in its name, or something that loads directly?

@01234568

01234568 commented Dec 22, 2025

We based our Hugging Face repo on the ColQwen2 implementation in transformers here: https://github.com/huggingface/transformers/tree/main/src/transformers/models/colqwen2, which uses a different naming convention for parameters than this repo. Should we unify the names or keep two separate versions?

@Mungeryang

Mungeryang commented Dec 25, 2025

I trained a smaller 2B ColQwen3 model based on Qwen3-VL-2B-Instruct; feel free to follow and use it:
https://github.com/Mungeryang/colqwen3
https://huggingface.co/goodman2001/colqwen3-v0.2

Collaborator

@ManuelFay ManuelFay left a comment


Is it possible to rebase on main so that the linting gets corrected?

The other question I have is whether the pyproject should get updated. From my understanding, Qwen3 is only available in newer versions of transformers, so I am guessing we should bump the minimum transformers version, right?

The rest looks great!

TransAMrit and others added 4 commits December 26, 2025 20:26

* looks like colqwen 2.5 omni support was accidentally removed in illuin-tech#339

EDIT: that was based upon just looking at the main __init__.py. looking
at the other files, perhaps it was intentionally removed...

* found & fixed resize_token_embeddings() breakage
* lint

* lint examples
@hxssgaa
Author

hxssgaa commented Dec 26, 2025

Is it possible to rebase on main so that the linting gets corrected?

The other question I have is whether the pyproject should get updated. From my understanding, Qwen3 is only available in newer versions of transformers, so I am guessing we should bump the minimum transformers version, right?

The rest looks great!

Already rebased onto main, and yes, I have updated the minimum transformers version to >=4.57.0 for Qwen3-VL support.

@hxssgaa hxssgaa requested a review from ManuelFay December 26, 2025 13:23
@hxssgaa hxssgaa requested a review from ManuelFay December 26, 2025 15:50