
Conversation


cuichenx (Contributor) commented Oct 29, 2025

NVIDIA Nemotron Nano v2 VL is an open 12B multimodal reasoning model for document intelligence and video understanding. It enables AI assistants to extract, interpret, and act on information across text, images, tables, and videos. This makes the model valuable for agents focused on data analysis, document processing, and visual understanding in applications such as report generation, video curation, dense captioning for media asset management, and retrieval-augmented search.

NeMo Megatron Bridge supports finetuning this model (including LoRA finetuning) on single-image, multi-image, and video datasets. The finetuned model can be converted back to the 🤗 Hugging Face format for downstream evaluation.
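Part of the appeal of LoRA finetuning for a 12B model is that only small low-rank adapter matrices are trained while the base weights stay frozen. As a back-of-envelope illustration (the layer sizes below are made up for the example, not taken from this model), a rank-r adapter on a d_out x d_in projection adds r * (d_in + d_out) trainable parameters:

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Extra trainable parameters LoRA adds to one d_out x d_in weight:
    a (rank x d_in) matrix A plus a (d_out x rank) matrix B."""
    return rank * (d_in + d_out)

# Illustrative 5120 x 5120 projection with rank-16 adapters
full_params = 5120 * 5120                  # 26,214,400 frozen weights
extra = lora_param_count(5120, 5120, 16)   # 163,840 adapter weights
print(extra, extra / full_params)          # well under 1% of the layer
```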

The model is currently available in the nvcr.io/nvidia/nemo:25.09.nemotron_nano_v2_vl container; this PR upstreams that support to the main branch.
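For anyone who wants to try the feature before it lands on main, a typical way to enter that container looks like the following (the image tag is the one named above; the docker flags are illustrative assumptions, not taken from this PR):

```shell
# Pull the NGC image that ships Nemotron Nano v2 VL support
docker pull nvcr.io/nvidia/nemo:25.09.nemotron_nano_v2_vl

# Start an interactive shell with all GPUs visible (flags are an assumption)
docker run --rm -it --gpus all \
    nvcr.io/nvidia/nemo:25.09.nemotron_nano_v2_vl bash
```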

Documentation: https://docs.nvidia.com/nemo/megatron-bridge/latest/models/vlm/nemotron-nano-v2-vl.html
Notable differences compared to the code in the nvcr.io/nvidia/nemo:25.09.nemotron_nano_v2_vl container:

  1. The forward step is named llava_step instead of nemotron_nano_v2_vl_step.
  2. The VLM inference script is moved to a standalone script, hf_to_megatron_generate_nemotron_vlm.py, to distinguish between the two model types, and the --use_llava_model argument is removed (it is hard-coded in the new script).

Requires this megatron branch: NVIDIA/Megatron-LM#2115

cuichenx and others added 30 commits September 17, 2025 22:01
Signed-off-by: yaoyu-33 <[email protected]>
Nemotron Nano V2 VL bridge and provider

See merge request chcui/Megatron-Bridge!1
HF export

See merge request chcui/Megatron-Bridge!2
Signed-off-by: yaoyu-33 <[email protected]>
yaoyu-33 previously approved these changes Oct 29, 2025
cuichenx marked this pull request as ready for review October 31, 2025 23:38

cuichenx commented Nov 3, 2025

Blocked by the mcore version bump needed after NVIDIA/Megatron-LM#2115 is merged.


cuichenx commented Nov 4, 2025

/ok to test ced4190

Signed-off-by: Chen Cui <[email protected]>
@adithya-s-k

Hey @cuichenx,

Thank you for this message: NVIDIA-NeMo/NeMo#15023 (comment)

I’m trying out the 12B variant, but it would be great if you could also support the 8B VLM model. We have some work where we prefer models under 10B parameters.

I’m also open to contributing — please let me know if there are any plans to support it.


cuichenx commented Nov 5, 2025

> Hey @cuichenx
>
> thank you for this message NVIDIA-NeMo/NeMo#15023 (comment)
>
> I’m trying out the 12B variant, but it would be great if you could also support the 8B VLM model. We have some work where we prefer models under 10B parameters.
>
> I’m also open to contributing — please let me know if there are any plans to support it.

Hi @adithya-s-k, thanks for your interest. Currently NVIDIA has only released a 12B v2 VL model (nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16). We don't have any plans to support the 9B v2 VL model since the checkpoint is not released. Are you trying to fine-tune the 9B VL model from scratch?

We would welcome your contribution of other VL models :)

@adithya-s-k

> Hey @cuichenx
> thank you for this message NVIDIA-NeMo/NeMo#15023 (comment)
> I’m trying out the 12B variant, but it would be great if you could also support the 8B VLM model. We have some work where we prefer models under 10B parameters.
> I’m also open to contributing — please let me know if there are any plans to support it.

> Hi @adithya-s-k, thanks for your interest. Currently NVIDIA has only released a 12B v2 VL model (nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16). We don't have any plans to support the 9B v2 VL model since the checkpoint is not released. Are you trying to fine-tune the 9B VL model from scratch?
>
> We would welcome your contribution of other VL models :)

I’m referring to this model: nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1, which is the 8B variant we’ve been working with and really like. We were planning to fine-tune it but ran into several dependency conflicts. It would be great to have support for this model in Megatron-Bridge.

If not, please let me know if it’s possible to add it manually. I can take a stab at it, though I understand it might be tricky if there have been underlying API changes.


Labels

r0.2.0 Cherry-pick label for r0.2.0 release branch

5 participants