This repository was archived by the owner on Oct 20, 2025. It is now read-only.

question about torch 2.1.0 integration #22

@Pegessi

Thanks for sharing! I greatly appreciate your work on reducing CUDA memory fragmentation. I recently integrated GMLake into torch 2.1.0 and it compiled without errors. How can I confirm that GMLake is working properly? I did not see any reduction in peak reserved memory when training Llama2-7B with LoRA.
Also, the garbage_collect_fused_blocks() function jumps to its error-handling section. Could that be what keeps GMLake from working?
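
For reference, this is roughly how the peak numbers can be compared; it is a minimal sketch, not GMLake's own tooling. It assumes the stock PyTorch counters `torch.cuda.max_memory_allocated` and `torch.cuda.max_memory_reserved` still report meaningful values in a GMLake build, which I have not verified. The helper names `fragmentation_ratio` and `report_cuda_peaks` are hypothetical.

```python
def fragmentation_ratio(peak_allocated: int, peak_reserved: int) -> float:
    """Share of peak reserved memory that was not backing live tensors.

    Both arguments are byte counts. A value near 0 means little memory is
    lost to fragmentation or caching; a value near 1 means most of the
    reserved memory was unused at the peak.
    """
    if peak_reserved <= 0:
        return 0.0
    return 1.0 - peak_allocated / peak_reserved


def report_cuda_peaks(device: int = 0) -> dict:
    """Read PyTorch's peak counters for one device (needs a CUDA build)."""
    import torch  # imported lazily so fragmentation_ratio works without PyTorch

    allocated = torch.cuda.max_memory_allocated(device)
    reserved = torch.cuda.max_memory_reserved(device)
    return {
        "peak_allocated_mb": allocated / 2**20,
        "peak_reserved_mb": reserved / 2**20,
        "fragmentation": fragmentation_ratio(allocated, reserved),
    }
```

Calling `report_cuda_peaks()` after training with the stock allocator and again with the GMLake build gives two ratios to compare. If the gap between allocated and reserved is already small in the baseline run, there may be little for GMLake to recover in this workload.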

Here are some logs from a run with only 6 training iterations.

node-9658:4032281:4036703 [0] GMLAKE_INFO get_fused_fragmented_blocks():4159 fused block 0x25f46060, ptr 0x12a0000000 of size 512.000000MB
node-9658:4032281:4036703 [0] GMLAKE_INFO get_fused_fragmented_blocks():4171 try 0: fuse 256 physical blocks to ptr 0x12a0000000 of size 512.000000MB for allocate size 512.000000MB succeeded, takes 20.435480ms, total_fuse_size 32558.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO get_fused_fragmented_blocks():4159 fused block 0x2a994d70, ptr 0x12c0000000 of size 954.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO get_fused_fragmented_blocks():4171 try 0: fuse 477 physical blocks to ptr 0x12c0000000 of size 954.000000MB for allocate size 954.000000MB succeeded, takes 40.207251ms, total_fuse_size 33512.000000MB
node-9658:4032281:4036703 [0] GMLAKE_INFO get_fused_fragmented_blocks():4159 fused block 0x2ba9e650, ptr 0x1320000000 of size 512.000000MB
node-9658:4032281:4036703 [0] GMLAKE_INFO get_fused_fragmented_blocks():4171 try 0: fuse 256 physical blocks to ptr 0x1320000000 of size 512.000000MB for allocate size 512.000000MB succeeded, takes 20.692452ms, total_fuse_size 34024.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO get_fused_fragmented_blocks():4159 fused block 0x2bab4010, ptr 0x1340000000 of size 954.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO get_fused_fragmented_blocks():4171 try 0: fuse 477 physical blocks to ptr 0x1340000000 of size 954.000000MB for allocate size 954.000000MB succeeded, takes 51.173343ms, total_fuse_size 34978.000000MB
node-9658:4032281:4036703 [0] GMLAKE_INFO get_fused_fragmented_blocks():4159 fused block 0x2b83a6b0, ptr 0x13a0000000 of size 512.000000MB
node-9658:4032281:4036703 [0] GMLAKE_INFO get_fused_fragmented_blocks():4171 try 0: fuse 256 physical blocks to ptr 0x13a0000000 of size 512.000000MB for allocate size 512.000000MB succeeded, takes 30.265250ms, total_fuse_size 35490.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO get_fused_fragmented_blocks():4159 fused block 0x26fa6af0, ptr 0x13c0000000 of size 954.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO get_fused_fragmented_blocks():4171 try 0: fuse 477 physical blocks to ptr 0x13c0000000 of size 954.000000MB for allocate size 954.000000MB succeeded, takes 49.731019ms, total_fuse_size 36444.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO get_fused_fragmented_blocks():4159 fused block 0x28e575f0, ptr 0x13fc000000 of size 954.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO get_fused_fragmented_blocks():4171 try 0: fuse 477 physical blocks to ptr 0x13fc000000 of size 954.000000MB for allocate size 954.000000MB succeeded, takes 40.066690ms, total_fuse_size 37398.000000MB
{'train_runtime': 25.9383, 'train_samples_per_second': 1.851, 'train_steps_per_second': 0.231, 'train_loss': 1.7313324610392253, 'epoch': 1.0}
node-9658:4032281:4032281 [0] GMLAKE_INFO garbage_collect_fused_blocks():3824 gc from fragmented_free_fused_blocks: blocks 0, size 0.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO garbage_collect_fused_blocks():3893 gc from free_fused_blocks_in_release_order: blocks 0, size 0.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO emptyCache():2425 garbage_collect_fused_blocks() return 0MB garbage memory
node-9658:4032281:4032281 [0] GMLAKE_INFO garbage_collect_fused_blocks():3824 gc from fragmented_free_fused_blocks: blocks 0, size 0.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO garbage_collect_fused_blocks():3893 gc from free_fused_blocks_in_release_order: blocks 0, size 0.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO emptyCache():2425 garbage_collect_fused_blocks() return 0MB garbage memory
node-9658:4032281:4032281 [0] GMLAKE_INFO garbage_collect_fused_blocks():3824 gc from fragmented_free_fused_blocks: blocks 0, size 0.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO garbage_collect_fused_blocks():3893 gc from free_fused_blocks_in_release_order: blocks 0, size 0.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO emptyCache():2425 garbage_collect_fused_blocks() return 0MB garbage memory
node-9658:4032281:4032281 [0] GMLAKE_INFO garbage_collect_fused_blocks():3824 gc from fragmented_free_fused_blocks: blocks 0, size 0.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO garbage_collect_fused_blocks():3893 gc from free_fused_blocks_in_release_order: blocks 0, size 0.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO emptyCache():2425 garbage_collect_fused_blocks() return 0MB garbage memory
node-9658:4032281:4032281 [0] GMLAKE_INFO garbage_collect_fused_blocks():3824 gc from fragmented_free_fused_blocks: blocks 0, size 0.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO garbage_collect_fused_blocks():3893 gc from free_fused_blocks_in_release_order: blocks 0, size 0.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO emptyCache():2425 garbage_collect_fused_blocks() return 0MB garbage memory
node-9658:4032281:4032281 [0] GMLAKE_INFO garbage_collect_fused_blocks():3824 gc from fragmented_free_fused_blocks: blocks 0, size 0.000000MB
node-9658:4032281:4032281 [0] GMLAKE_INFO garbage_collect_fused_blocks():3893 gc from free_fused_blocks_in_release_order: blocks 0, size 0.000000MB
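
To make the log above easier to read, the successful fuse events can be totalled with a small parser. This is a sketch that assumes the line format stays exactly as printed in this run; the regex and the name `summarize_fuse_log` are my own and not part of GMLake.

```python
import re

# Matches the "fuse N physical blocks ... succeeded" lines shown in the log.
FUSE_RE = re.compile(
    r"fuse (?P<blocks>\d+) physical blocks to ptr (?P<ptr>0x[0-9a-fA-F]+) "
    r"of size (?P<size>[\d.]+)MB for allocate size (?P<req>[\d.]+)MB succeeded, "
    r"takes (?P<ms>[\d.]+)ms, total_fuse_size (?P<total>[\d.]+)MB"
)


def summarize_fuse_log(lines):
    """Total the successful fuse events found in an iterable of log lines."""
    summary = {
        "events": 0,
        "physical_blocks": 0,
        "fused_mb": 0.0,
        "fuse_time_ms": 0.0,
        "last_total_fuse_mb": None,
    }
    for line in lines:
        match = FUSE_RE.search(line)
        if match is None:
            continue  # not a fuse-success line (e.g. gc or trainer output)
        summary["events"] += 1
        summary["physical_blocks"] += int(match["blocks"])
        summary["fused_mb"] += float(match["size"])
        summary["fuse_time_ms"] += float(match["ms"])
        summary["last_total_fuse_mb"] = float(match["total"])
    return summary


# Two lines copied from the log above.
sample = [
    "node-9658:4032281:4036703 [0] GMLAKE_INFO get_fused_fragmented_blocks():4171 "
    "try 0: fuse 256 physical blocks to ptr 0x12a0000000 of size 512.000000MB "
    "for allocate size 512.000000MB succeeded, takes 20.435480ms, "
    "total_fuse_size 32558.000000MB",
    "node-9658:4032281:4032281 [0] GMLAKE_INFO get_fused_fragmented_blocks():4171 "
    "try 0: fuse 477 physical blocks to ptr 0x12c0000000 of size 954.000000MB "
    "for allocate size 954.000000MB succeeded, takes 40.207251ms, "
    "total_fuse_size 33512.000000MB",
]
summary = summarize_fuse_log(sample)
print(summary)
```

In this excerpt total_fuse_size grows from 32558MB to 37398MB, so the fuse path does appear to run. What I cannot tell from the log is whether it lowers peak reserved memory.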
