Open
Commits
89 commits
e63ed61
add wip code
cuichenx Sep 18, 2025
7858117
update utils for transformers config in hydra
yaoyu-33 Sep 19, 2025
457bace
temp save
yaoyu-33 Sep 19, 2025
22233a2
pipeclean conversion (forward wip)
cuichenx Sep 19, 2025
6937da4
Merge branch 'refs/heads/main' into qwen-25vl-training
yaoyu-33 Sep 22, 2025
c67f734
vlm generate script updates for nemotron vl
cuichenx Sep 25, 2025
fcca45c
Merge remote-tracking branch 'refs/remotes/origin/main' into chcui/ne…
cuichenx Sep 25, 2025
790cd8d
fix after merging with main
cuichenx Sep 25, 2025
3a9ab4f
clean up
cuichenx Sep 25, 2025
e0fc7d1
fix forward pass
cuichenx Sep 26, 2025
44faee0
add /no_think sys prompt
cuichenx Sep 29, 2025
8a51440
Merge branch 'refs/heads/main' into qwen-25vl-training
yaoyu-33 Sep 30, 2025
3bc6ba5
lint
yaoyu-33 Sep 30, 2025
8061e0f
revert qwen-vl changes in gpt
yaoyu-33 Sep 30, 2025
df4755a
revert qwen-vl changes in gpt #2
yaoyu-33 Sep 30, 2025
975efd2
Add mock dataset provider for qwen25 vl
yaoyu-33 Sep 30, 2025
be708c2
add qwen25 vl dataset support from auto
yaoyu-33 Sep 30, 2025
6822d34
lint
yaoyu-33 Sep 30, 2025
ec9c7cd
enable multi image and video inputs
cuichenx Sep 30, 2025
bc8c605
update _attn_implementation
yaoyu-33 Oct 1, 2025
689f491
update comments
yaoyu-33 Oct 1, 2025
cf2c769
Merge branch 'chcui/nemotron-nano-v2-vl' into 'dev/nemotron-nano-v2-vl'
cuichenx Oct 1, 2025
4f0e90f
add preloaded dataset provider
yaoyu-33 Oct 1, 2025
4959ea5
enable hf export (need to manually copy over modeling files)
cuichenx Oct 2, 2025
98caa7a
expose strict
cuichenx Oct 2, 2025
2af0c2e
update _processor to a private attr
yaoyu-33 Oct 2, 2025
4a3ef3b
Merge branch 'chcui/hf_export' into 'dev/nemotron-nano-v2-vl'
cuichenx Oct 2, 2025
7f3818e
Merge branch 'refs/heads/main' into chcui/nano-v2-vl-training
cuichenx Oct 2, 2025
ccf6abe
update qwen training utils
yaoyu-33 Oct 2, 2025
94c6192
training bug fix
yaoyu-33 Oct 2, 2025
95d3002
fix finalize grad
yaoyu-33 Oct 3, 2025
4b7ef60
save qwen25 vl recipes
yaoyu-33 Oct 3, 2025
c37ffa0
training WIP
cuichenx Oct 3, 2025
03e3a7c
undo ckpt modification, loading works
cuichenx Oct 3, 2025
b095aae
Merge branch 'chcui/nano-v2-vl-training' into 'dev/nemotron-nano-v2-vl'
cuichenx Oct 3, 2025
608117e
add padding logic for pp
yaoyu-33 Oct 3, 2025
a9f0e15
vlm step general
yaoyu-33 Oct 6, 2025
6ddd4b3
default update
yaoyu-33 Oct 6, 2025
f30aa39
Merge branch 'main' into qwen-25vl-training
yaoyu-33 Oct 6, 2025
e425113
update to model specific visual inputs, also update mock dataset to b…
yaoyu-33 Oct 6, 2025
5bc1f29
Merge branch 'main' into qwen-25vl-training
yaoyu-33 Oct 6, 2025
90a0ff0
add ci tests
yaoyu-33 Oct 7, 2025
49759bc
lint
yaoyu-33 Oct 8, 2025
62ffa88
update dependency
yaoyu-33 Oct 8, 2025
6af4e4c
build: add qwen-vl-utils and update lockfile
yaoyu-33 Oct 8, 2025
7e0ceaf
remove `start_of_response_token` use
yaoyu-33 Oct 8, 2025
a7e5fdc
add few more unit tests
yaoyu-33 Oct 8, 2025
1e44b97
fix wandb reinit issue
yaoyu-33 Oct 8, 2025
18012cd
Revert "fix wandb reinit issue"
yaoyu-33 Oct 9, 2025
b0b910e
lint
yaoyu-33 Oct 9, 2025
d2031ca
update and fix tests for vlm dataset
yaoyu-33 Oct 9, 2025
3d8f4b3
Merge remote-tracking branch 'origin/qwen-25vl-training' into chcui/n…
cuichenx Oct 10, 2025
70aafe2
training works
cuichenx Oct 14, 2025
398a812
add raven and llava-video datasets
cuichenx Oct 14, 2025
a44d26c
push discussion code
cuichenx Oct 15, 2025
cbc25d4
Merge branch 'chcui/nano-v2-vl-training' into 'dev/nemotron-nano-v2-vl'
cuichenx Oct 15, 2025
56f9ad9
support video training
liding-nv Oct 17, 2025
a8ad5fd
add peft merge
cuichenx Oct 17, 2025
46cd9b9
change wording
cuichenx Oct 17, 2025
6008b3e
save every 200
cuichenx Oct 17, 2025
2da5696
clean up internal paths
cuichenx Oct 17, 2025
d3dd155
add merge lora script..
cuichenx Oct 18, 2025
3a13a6c
fix import
liding-nv Oct 20, 2025
b9da6cf
support multi subset video
liding-nv Oct 20, 2025
0bcfcb8
export with copy
cuichenx Oct 26, 2025
e9ee70d
qa fixes
cuichenx Oct 27, 2025
546c233
Merge remote-tracking branch 'refs/remotes/origin/main' into chcui/ne…
cuichenx Oct 28, 2025
e69586d
clean up code
cuichenx Oct 28, 2025
85c6a44
Merge remote-tracking branch 'origin/main' into chcui/nemotron-nano-v…
cuichenx Oct 28, 2025
d31d50f
Merge remote-tracking branch 'origin/main' into chcui/nemotron-nano-v…
cuichenx Oct 28, 2025
2e223e8
change to supported HF architectures
cuichenx Oct 28, 2025
1eb8fa3
add tests
cuichenx Oct 28, 2025
6f739cf
Merge remote-tracking branch 'refs/remotes/origin/main' into chcui/ne…
cuichenx Oct 29, 2025
0abb526
Merge remote-tracking branch 'refs/remotes/origin/main' into chcui/ne…
cuichenx Oct 29, 2025
0567e20
address comments
cuichenx Oct 29, 2025
edc2d98
copy over py and json files only
cuichenx Oct 31, 2025
9e80f35
merge causal lm and vlm so that output saves preprocessor config auto…
cuichenx Oct 31, 2025
bd447ae
move nemotron vlm generation to a new script
cuichenx Oct 31, 2025
bac193a
address comment
cuichenx Oct 31, 2025
c0756ce
move path helper to common utils
cuichenx Oct 31, 2025
707562a
Merge branch 'main' into chcui/nemotron-nano-v2-vl
cuichenx Oct 31, 2025
f7e0d3b
update model name
cuichenx Oct 31, 2025
b6a60d7
Merge branch 'chcui/nemotron-nano-v2-vl' of github.com:NVIDIA-NeMo/Me…
cuichenx Oct 31, 2025
bfda67e
refactor to llava_step
cuichenx Nov 1, 2025
71b4e78
clean up
cuichenx Nov 1, 2025
8813087
Merge branch 'main' into chcui/nemotron-nano-v2-vl
cuichenx Nov 3, 2025
e67e9f1
revert previous export copy code
cuichenx Nov 3, 2025
ced4190
raise error if trying to access validation split for raven and llava …
cuichenx Nov 4, 2025
f603601
Fix typo
cuichenx Nov 4, 2025
6 changes: 5 additions & 1 deletion examples/conversion/convert_checkpoints.py
@@ -141,6 +141,7 @@ def export_megatron_to_hf(
     megatron_path: str,
     hf_path: str,
     show_progress: bool = True,
+    strict: bool = True,
 ) -> None:
     """
     Export a Megatron checkpoint to HuggingFace format.
@@ -175,14 +176,15 @@ def export_megatron_to_hf(

     # For demonstration, we'll create a bridge from a known config
     # This would typically be extracted from the checkpoint metadata
-    bridge = AutoBridge.from_hf_pretrained(hf_model)
+    bridge = AutoBridge.from_hf_pretrained(hf_model, trust_remote_code=True)

     # Export using the convenience method
     print("📤 Exporting to HuggingFace format...")
     bridge.export_ckpt(
         megatron_path=megatron_path,
         hf_path=hf_path,
         show_progress=show_progress,
+        strict=strict,
     )

     print(f"✅ Successfully exported model to: {hf_path}")
@@ -232,6 +234,7 @@ def main():
         "--hf-path", required=True, help="Directory path where the HuggingFace model will be saved"
     )
     export_parser.add_argument("--no-progress", action="store_true", help="Disable progress bar during export")
+    export_parser.add_argument("--not-strict", action="store_true", help="Allow source and target checkpoint to have different keys")

     args = parser.parse_args()

@@ -254,6 +257,7 @@ def main():
             megatron_path=args.megatron_path,
             hf_path=args.hf_path,
             show_progress=not args.no_progress,
+            strict=not args.not_strict,
         )
     else:
         raise RuntimeError(f"Unknown command: {args.command}")
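
The change above exposes `strict` through a negative CLI flag (`--not-strict`) that is inverted before being passed on, mirroring the existing `--no-progress` / `show_progress` pair. A minimal, standalone sketch of that pattern follows. The helper names `build_parser` and `export_kwargs` are hypothetical and are not part of `convert_checkpoints.py`. Only the two flag definitions and the `not args...` inversion are taken from the diff.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser holding only the two negative export flags from the diff."""
    parser = argparse.ArgumentParser(description="Sketch of the export flags")
    parser.add_argument("--no-progress", action="store_true", help="Disable progress bar during export")
    parser.add_argument(
        "--not-strict",
        action="store_true",
        help="Allow source and target checkpoint to have different keys",
    )
    return parser


def export_kwargs(argv: list[str]) -> dict[str, bool]:
    """Hypothetical helper: turn the negative CLI flags into positive keyword arguments."""
    args = build_parser().parse_args(argv)
    # store_true flags default to False, so both keyword arguments default to True.
    return {"show_progress": not args.no_progress, "strict": not args.not_strict}


if __name__ == "__main__":
    print(export_kwargs([]))
    print(export_kwargs(["--not-strict"]))
```

Because both flags use `action="store_true"`, omitting them keeps the previous behaviour (`strict=True`), so existing invocations of the script should be unaffected by the new option.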