This repository was archived by the owner on Oct 20, 2025. It is now read-only.
question about torch 2.1.0 integration #22
Open
Description
Thanks for sharing! I greatly appreciate your work on reducing CUDA memory fragmentation. I recently integrated GMLake into torch 2.1.0 and it compiled without errors. How can I confirm that GMLake is working properly? I did not see any reduction in peak reserved memory when training Llama2-7B with LoRA.
The garbage_collect_fused_blocks() function jumps to its error-handling section. Could that be what keeps GMLake from working?
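To make the comparison concrete, one option is to look at the gap between peak reserved and peak allocated memory, since that gap is what fragmentation inflates. Below is a minimal sketch, assuming the two byte counts are read with `torch.cuda.max_memory_reserved()` and `torch.cuda.max_memory_allocated()` at the end of the same run, once with and once without GMLake. The helper name `fragmentation_ratio` and the sample numbers are hypothetical, not measurements from this run.

```python
def fragmentation_ratio(peak_reserved: int, peak_allocated: int) -> float:
    """Fraction of peak reserved memory that was not backing live tensors."""
    if peak_reserved <= 0:
        return 0.0
    return 1.0 - peak_allocated / peak_reserved


# On a GPU machine the inputs would come from
#   torch.cuda.max_memory_reserved() and torch.cuda.max_memory_allocated().
# Hypothetical byte counts, for illustration only:
example = fragmentation_ratio(40 * 2**30, 30 * 2**30)
print(f"unused share of reserved memory: {example:.2%}")
```

If the ratio is already small without GMLake, there may be little fragmentation left to recover, which could explain an unchanged peak.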

Here are the logs from a run with only 6 training iterations.
node-9658:4032281:4036703 [0] GMLAKE_INFO get_fused_fragmented_blocks():4159 fused block 0x25f46060, ptr 0x12a0000000 of size 512.000000MB
node-9658:4032281:4036703 [0] GMLAKE_INFO get_fused_fragmented_blocks():4171 try 0: fuse 256 physical blocks to ptr 0x12a0000000 of size 512.000000MB for allocate size 512.000000MB succeeded, takes 20.435480ms, total_fuse_size 32558.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO get_fused_fragmented_blocks():4159 fused block 0x2a994d70, ptr 0x12c0000000 of size 954.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO get_fused_fragmented_blocks():4171 try 0: fuse 477 physical blocks to ptr 0x12c0000000 of size 954.000000MB for allocate size 954.000000MB succeeded, takes 40.207251ms, total_fuse_size 33512.000000MB
node-9658:4032281:4036703 [0] GMLAKE_INFO get_fused_fragmented_blocks():4159 fused block 0x2ba9e650, ptr 0x1320000000 of size 512.000000MB
node-9658:4032281:4036703 [0] GMLAKE_INFO get_fused_fragmented_blocks():4171 try 0: fuse 256 physical blocks to ptr 0x1320000000 of size 512.000000MB for allocate size 512.000000MB succeeded, takes 20.692452ms, total_fuse_size 34024.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO get_fused_fragmented_blocks():4159 fused block 0x2bab4010, ptr 0x1340000000 of size 954.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO get_fused_fragmented_blocks():4171 try 0: fuse 477 physical blocks to ptr 0x1340000000 of size 954.000000MB for allocate size 954.000000MB succeeded, takes 51.173343ms, total_fuse_size 34978.000000MB
node-9658:4032281:4036703 [0] GMLAKE_INFO get_fused_fragmented_blocks():4159 fused block 0x2b83a6b0, ptr 0x13a0000000 of size 512.000000MB
node-9658:4032281:4036703 [0] GMLAKE_INFO get_fused_fragmented_blocks():4171 try 0: fuse 256 physical blocks to ptr 0x13a0000000 of size 512.000000MB for allocate size 512.000000MB succeeded, takes 30.265250ms, total_fuse_size 35490.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO get_fused_fragmented_blocks():4159 fused block 0x26fa6af0, ptr 0x13c0000000 of size 954.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO get_fused_fragmented_blocks():4171 try 0: fuse 477 physical blocks to ptr 0x13c0000000 of size 954.000000MB for allocate size 954.000000MB succeeded, takes 49.731019ms, total_fuse_size 36444.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO get_fused_fragmented_blocks():4159 fused block 0x28e575f0, ptr 0x13fc000000 of size 954.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO get_fused_fragmented_blocks():4171 try 0: fuse 477 physical blocks to ptr 0x13fc000000 of size 954.000000MB for allocate size 954.000000MB succeeded, takes 40.066690ms, total_fuse_size 37398.000000MB
{'train_runtime': 25.9383, 'train_samples_per_second': 1.851, 'train_steps_per_second': 0.231, 'train_loss': 1.7313324610392253, 'epoch': 1.0}
node-9658:4032281:4032281 [0] GMLAKE_INFO garbage_collect_fused_blocks():3824 gc from fragmented_free_fused_blocks: blocks 0, size 0.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO garbage_collect_fused_blocks():3893 gc from free_fused_blocks_in_release_order: blocks 0, size 0.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO emptyCache():2425 garbage_collect_fused_blocks() return 0MB garbage memory
node-9658:4032281:4032281 [0] GMLAKE_INFO garbage_collect_fused_blocks():3824 gc from fragmented_free_fused_blocks: blocks 0, size 0.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO garbage_collect_fused_blocks():3893 gc from free_fused_blocks_in_release_order: blocks 0, size 0.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO emptyCache():2425 garbage_collect_fused_blocks() return 0MB garbage memory
node-9658:4032281:4032281 [0] GMLAKE_INFO garbage_collect_fused_blocks():3824 gc from fragmented_free_fused_blocks: blocks 0, size 0.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO garbage_collect_fused_blocks():3893 gc from free_fused_blocks_in_release_order: blocks 0, size 0.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO emptyCache():2425 garbage_collect_fused_blocks() return 0MB garbage memory
node-9658:4032281:4032281 [0] GMLAKE_INFO garbage_collect_fused_blocks():3824 gc from fragmented_free_fused_blocks: blocks 0, size 0.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO garbage_collect_fused_blocks():3893 gc from free_fused_blocks_in_release_order: blocks 0, size 0.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO emptyCache():2425 garbage_collect_fused_blocks() return 0MB garbage memory
node-9658:4032281:4032281 [0] GMLAKE_INFO garbage_collect_fused_blocks():3824 gc from fragmented_free_fused_blocks: blocks 0, size 0.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO garbage_collect_fused_blocks():3893 gc from free_fused_blocks_in_release_order: blocks 0, size 0.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO emptyCache():2425 garbage_collect_fused_blocks() return 0MB garbage memory
node-9658:4032281:4032281 [0] GMLAKE_INFO garbage_collect_fused_blocks():3824 gc from fragmented_free_fused_blocks: blocks 0, size 0.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO garbage_collect_fused_blocks():3893 gc from free_fused_blocks_in_release_order: blocks 0, size 0.000000MB
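To read the fuse lines above at a glance, a small parser can total them up. This is only a sketch: the regular expression is inferred from the log lines in this post rather than from GMLake's source, and `summarize_fuses` is a hypothetical helper name.

```python
import re

# Matches the "try N: fuse ..." lines printed by get_fused_fragmented_blocks().
# Pattern inferred from the log excerpt above, not from GMLake's source.
FUSE_RE = re.compile(
    r"fuse (?P<blocks>\d+) physical blocks to ptr \S+ of size (?P<size>[\d.]+)MB"
    r".*?takes (?P<ms>[\d.]+)ms, total_fuse_size (?P<total>[\d.]+)MB"
)


def summarize_fuses(log_text: str) -> dict:
    """Aggregate count, size, and time of successful fuse operations."""
    rows = [m.groupdict() for m in FUSE_RE.finditer(log_text)]
    if not rows:
        return {"fuses": 0}
    sizes = [float(r["size"]) for r in rows]
    blocks = [int(r["blocks"]) for r in rows]
    return {
        "fuses": len(rows),
        "fused_mb": sum(sizes),
        "mb_per_block": sum(sizes) / sum(blocks),
        "fuse_ms": sum(float(r["ms"]) for r in rows),
        "last_total_mb": float(rows[-1]["total"]),
    }


# Two lines copied from the log above.
sample = (
    "node-9658:4032281:4036703 [0] GMLAKE_INFO get_fused_fragmented_blocks():4171 "
    "try 0: fuse 256 physical blocks to ptr 0x12a0000000 of size 512.000000MB for "
    "allocate size 512.000000MB succeeded, takes 20.435480ms, total_fuse_size 32558.000000MB\n"
    "node-9658:4032281:4032281 [0] GMLAKE_INFO get_fused_fragmented_blocks():4171 "
    "try 0: fuse 477 physical blocks to ptr 0x12c0000000 of size 954.000000MB for "
    "allocate size 954.000000MB succeeded, takes 40.207251ms, total_fuse_size 33512.000000MB\n"
)
print(summarize_fuses(sample))
```

Running it over the full log would show how many allocations were served by fusing and how much time that took, which may help judge whether the fuse path is actually being exercised.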