
Conversation


cuichenx (Contributor) commented Oct 29, 2025

NVIDIA Nemotron Nano v2 VL is an open 12B multimodal reasoning model for document intelligence and video understanding. It enables AI assistants to extract, interpret, and act on information across text, images, tables, and videos. This makes the model valuable for agents focused on data analysis, document processing, and visual understanding in applications such as report generation, video curation, dense captioning for media asset management, and retrieval-augmented search.

NeMo Megatron Bridge supports finetuning this model (including LoRA finetuning) on single-image, multi-image, and video datasets. The finetuned model can be converted back to the 🤗 Hugging Face format for downstream evaluation.
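Part of the appeal of LoRA finetuning for a 12B model is that only small low-rank adapter matrices are trained while the base weights stay frozen. As a back-of-envelope illustration (the layer sizes below are made up for the example, not taken from this model), a rank-r adapter on a d_out x d_in projection adds r * (d_in + d_out) trainable parameters:

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Extra trainable parameters LoRA adds to one d_out x d_in weight:
    a (rank x d_in) matrix A plus a (d_out x rank) matrix B."""
    return rank * (d_in + d_out)

# Illustrative 5120 x 5120 projection with rank-16 adapters
full_params = 5120 * 5120                  # 26,214,400 frozen weights
extra = lora_param_count(5120, 5120, 16)   # 163,840 adapter weights
print(extra, extra / full_params)          # well under 1% of the layer
```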

The model is currently available in the nvcr.io/nvidia/nemo:25.09.nemotron_nano_v2_vl container; this PR upstreams that support to the main branch.
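For anyone who wants to try the feature before it lands on main, a typical way to enter that container looks like the following (the image tag is the one named above; the docker flags are illustrative assumptions, not taken from this PR):

```shell
# Pull the NGC image that ships Nemotron Nano v2 VL support
docker pull nvcr.io/nvidia/nemo:25.09.nemotron_nano_v2_vl

# Start an interactive shell with all GPUs visible (flags are an assumption)
docker run --rm -it --gpus all \
    nvcr.io/nvidia/nemo:25.09.nemotron_nano_v2_vl bash
```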

Documentation: https://docs.nvidia.com/nemo/megatron-bridge/latest/models/vlm/nemotron-nano-v2-vl.html
Notable differences compared to the code in the nvcr.io/nvidia/nemo:25.09.nemotron_nano_v2_vl container:

  1. The forward step is named llava_step instead of nemotron_nano_v2_vl_step.
  2. The VLM inference script is moved to a standalone script, hf_to_megatron_generate_nemotron_vlm.py, to distinguish between the two model types, and the --use_llava_model argument is removed (it is hard-coded in the new script).

Requires this megatron branch: NVIDIA/Megatron-LM#2115

cuichenx and others added 30 commits September 17, 2025 22:01
Signed-off-by: yaoyu-33 <[email protected]>
Nemotron Nano V2 VL bridge and provider

See merge request chcui/Megatron-Bridge!1
HF export

See merge request chcui/Megatron-Bridge!2
Signed-off-by: yaoyu-33 <[email protected]>
yaoyu-33 previously approved these changes Oct 29, 2025
cuichenx marked this pull request as ready for review October 31, 2025 23:38

cuichenx commented Nov 3, 2025

Blocked by the mcore version bump needed after NVIDIA/Megatron-LM#2115 is merged.


cuichenx commented Nov 4, 2025

/ok to test ced4190

Signed-off-by: Chen Cui <[email protected]>
@adithya-s-k

Hey @cuichenx,

Thank you for this message: NVIDIA-NeMo/NeMo#15023 (comment)

I’m trying out the 12B variant, but it would be great if you could also support the 8B VLM model. We have some work where we prefer models under 10B parameters.

I’m also open to contributing — please let me know if there are any plans to support it.


cuichenx commented Nov 5, 2025

> Hey @cuichenx
>
> thank you for this message NVIDIA-NeMo/NeMo#15023 (comment)
>
> I’m trying out the 12B variant, but it would be great if you could also support the 8B VLM model. We have some work where we prefer models under 10B parameters.
>
> I’m also open to contributing — please let me know if there are any plans to support it.

Hi @adithya-s-k, thanks for your interest. Currently NVIDIA has only released a 12B v2 VL model (nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16). We don't have any plans to support the 9B v2 VL model since the checkpoint is not released. Are you trying to fine-tune the 9B VL model from scratch?

We would welcome your contribution of other VL models :)

@adithya-s-k

> Hey @cuichenx
> thank you for this message NVIDIA-NeMo/NeMo#15023 (comment)
> I’m trying out the 12B variant, but it would be great if you could also support the 8B VLM model. We have some work where we prefer models under 10B parameters.
> I’m also open to contributing — please let me know if there are any plans to support it.

> Hi @adithya-s-k, thanks for your interest. Currently NVIDIA has only released a 12B v2 VL model (nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16). We don't have any plans to support the 9B v2 VL model since the checkpoint is not released. Are you trying to fine-tune the 9B VL model from scratch?
>
> We would welcome your contribution of other VL models :)

I’m referring to this model: nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1, which is the 8B variant we’ve been working with and really like. We were planning to fine-tune it but ran into several dependency conflicts. It would be great to have support for this model in Megatron-Bridge.

If not, please let me know if it’s possible to add it manually. I can take a stab at it, though I understand it might be tricky if there have been underlying API changes.


Labels

r0.2.0 Cherry-pick label for r0.2.0 release branch

5 participants