Description
System Info
verl: main (2026.01.27)
vllm: 0.13.0 (updated from the official docker image vllm012.exp3)
torch: 2.9.1+cu129
GPU: 8 × A800 80GB
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
When I run the DAPO scripts with FSDP2 + vLLM async rollout for Qwen3-Next-80B-A3B-Instruct on 4 nodes, I hit a host (CPU) OOM while the actor model is being loaded.
With fsdp_size=16, host memory usage on Node 0 and Node 2 is about 320 GB higher than on Node 1 and Node 3 after the ref model is initialized, and it climbs further when the actor/rollout is initialized.
It looks like the weights held in full_state are not freed after the code below, or there is a bug in PyTorch's set_state_dict.
Has anyone run into the same issue? Or should I switch to Megatron instead of FSDP?
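To show where the imbalance appears, a small sketch like the one below can log host memory per rank at each initialization step; psutil and the log_host_memory helper are my own additions for illustration, not part of verl.

import os
import psutil
import torch.distributed as dist

def log_host_memory(tag: str) -> None:
    # Node-wide host memory plus the current process's own RSS.
    vm = psutil.virtual_memory()
    rss = psutil.Process(os.getpid()).memory_info().rss
    rank = dist.get_rank() if dist.is_initialized() else 0
    print(
        f"[rank {rank}] {tag}: node used={vm.used / 2**30:.1f} GiB, "
        f"available={vm.available / 2**30:.1f} GiB, process rss={rss / 2**30:.1f} GiB"
    )

# e.g. log_host_memory("after ref init") and log_host_memory("after actor init")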
def _build_model_optimizer(...):
    init_context = get_init_weight_context_manager(
        use_meta_tensor=not actor_model_config.tie_word_embeddings, mesh=self.device_mesh
    )
    ...
    with init_context(), warnings.catch_warnings():
        ...
        actor_module = actor_module_class.from_pretrained(
            pretrained_model_name_or_path=local_path,
            torch_dtype=torch_dtype,
            config=actor_model_config,
            trust_remote_code=trust_remote_code,
            attn_implementation=attn_implementation,
        )
        ...
        # Full params are loaded to CPU on rank 0 here
        full_state = actor_module.state_dict()
        # Params are converted to DTensor shards
        apply_fsdp2(actor_module, fsdp_kwargs, fsdp_config)
        # Broadcast param weights from rank 0 to the other ranks
        fsdp2_load_full_state_dict(actor_module, full_state, fsdp_mesh, cpu_offload)
        actor_module_fsdp = actor_module
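As a sanity check, here is a minimal workaround sketch, assuming the extra host memory is just the rank-0 CPU copy kept alive by the full_state reference (not verified to fix the OOM; the names follow the snippet above):

import gc

# Inside _build_model_optimizer, right after the broadcast above:
fsdp2_load_full_state_dict(actor_module, full_state, fsdp_mesh, cpu_offload)
del full_state  # explicitly drop the rank-0 CPU copy of the full state dict
gc.collect()    # assumption: the extra ~320 GB is this dict staying alive
actor_module_fsdp = actor_module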
Expected behavior
Host memory usage should be roughly the same across all nodes.