When I used the finetuned model for MD simulation, an "out-of-memory" error occurred #5167
Replies: 2 comments 1 reply
Hi @Yancy-Luke! I'm Dosu and I'm helping the deepmd-kit team. Out-of-memory errors in DeePMD-kit MD simulations with finetuned models can happen for several reasons. If the batch size is already 1, the model or system may simply be too large for your available memory. Common causes include a large model network size, a high atom count, or a memory-hungry descriptor type.

A few things to check:

- If you are running inference or model deviation in a loop, make sure you are not repeatedly loading the same model, as TensorFlow or PyTorch may not release that memory until the process exits [docs].
- For large systems, consider using an external neighbor list compatible with ASE.
- If the error occurs at a specific simulation step, it may indicate a memory leak or gradual accumulation.
- Evaluating descriptors for each system in a loop (rather than all at once) can also help avoid OOM errors [discussion].
- Check your hardware specs (RAM/GPU memory), confirm whether the simulation is running on CPU or GPU, and try running a smaller system or simpler model to isolate the issue.

For more details on tuning model/network size and other memory-related settings, see the troubleshooting guide. If you can share your hardware specs, model architecture, and system size, I can help with more targeted suggestions.

To reply, just mention @dosu.
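The two loop-related points above (load the model once; evaluate in small batches) can be sketched as a model-free pattern. This is an illustration, not DeePMD-kit API: the `evaluate` callable stands in for something like `DeepPot.eval`, and the chunking helper is plain Python, so only the memory-relevant structure is shown.

```python
import numpy as np

def chunk_ranges(n_frames, chunk_size):
    """Yield (start, stop) index pairs covering n_frames in fixed-size chunks."""
    for start in range(0, n_frames, chunk_size):
        yield start, min(start + chunk_size, n_frames)

def evaluate_in_chunks(evaluate, coords, cells, atom_types, chunk_size=10):
    """Run `evaluate` on small batches of frames instead of the whole
    trajectory at once, so peak memory scales with chunk_size rather
    than with the number of frames."""
    energies = []
    for start, stop in chunk_ranges(len(coords), chunk_size):
        # The model object behind `evaluate` is created once, outside this
        # loop; only the per-chunk inputs change between calls.
        e = evaluate(coords[start:stop], cells[start:stop], atom_types)
        energies.append(np.asarray(e))
    return np.concatenate(energies)
```

With DeePMD-kit this might look like creating `dp = DeepPot("model.ckpt.pt")` once at the top of the script and passing `lambda c, b, t: dp.eval(c, b, t)[0]` as `evaluate` — the model filename and the exact `eval` signature here are assumptions, so check them against the version of the Python inference docs you have installed.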
@dosu

CPU(s): 208

```
       total   used   free   shared  buff/cache  available
Mem:   251Gi   59Gi   95Gi   130Mi   97Gi        191Gi
```

The descriptor is dpa3, the model branch is H2O_H2O_PD, and the system size is 3000 atoms.
Dear Developers,
When I used the finetuned model for MD simulation, an "out-of-memory" error occurred.
deepmd.utils.errors.OutOfMemoryError: The callable still throws an out-of-memory (OOM) error even when batch size is 1!
version: DeePMD-kit v3.1.2
installation: conda
Attachments: input.json, ASE_md.py
If you could offer me some suggestions, I would be very grateful.