DPA3 fine-tuning does not seem to support ZBL #5160
Replies: 1 comment
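In DeePMD-kit 3.1.0, adding ZBL during DPA3 fine-tuning raises a KeyError because the model's parameter structure changes: once ZBL is added, the state dict of the pretrained model is no longer compatible with the current model definition. This issue has been confirmed and fixed by the community and in official PRs, but 3.1.0 does not yet include those fixes.

ZBL fine-tuning has been officially supported only since PR #4849 (merged 2025-08-13), which fixed the parameter transfer and key mapping and added a test path for ZBL fine-tuning. If you need ZBL during fine-tuning, upgrade to a DeePMD-kit release that includes that PR; otherwise you will keep hitting the KeyError (for example …).

If you cannot upgrade for now, you can try manually modifying the checkpoint or the code so that only compatible parameters are loaded and missing keys are skipped; see PR #4145 for the approach. This carries some risk, because details are easy to miss.

Best practice: the model structure (including ZBL) should be exactly the same in pretraining and fine-tuning. Adding ZBL only at the fine-tuning stage leaves the parameters unmatched and tends to cause errors. If you need ZBL, train from scratch or fine-tune with a newer version.

(A side note on the bash script: …)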
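The "load only compatible parameters, skip missing keys" workaround mentioned above can be sketched roughly as follows. This is a minimal illustration and not the code from PR #4145 or from DeePMD-kit itself: the helper `split_compatible_keys` and the toy key names are hypothetical, and plain lists stand in for tensors so the snippet runs without PyTorch.

```python
from collections.abc import Mapping


def split_compatible_keys(pretrained: Mapping, target: Mapping):
    """Partition checkpoint entries by whether the target model can accept them.

    Hypothetical helper for illustration only; not part of DeePMD-kit.
    """
    loadable = {}
    shape_mismatch = []
    for key, value in pretrained.items():
        if key not in target:
            continue  # present in the checkpoint, absent from the new model
        # Tensors expose .shape; objects without it compare as None == None.
        if getattr(value, "shape", None) != getattr(target[key], "shape", None):
            shape_mismatch.append(key)
            continue
        loadable[key] = value
    # Keys the new model expects but the checkpoint cannot provide.
    missing = [k for k in target if k not in pretrained]
    return loadable, missing, shape_mismatch


# Toy stand-ins for two state dicts; the key names are assumptions.
pretrained = {
    "descriptor.weight": [1.0, 2.0],
    "atomic_model.out_bias": [0.0],
}
target = {
    "descriptor.weight": [0.0, 0.0],
    "atomic_model.models.0.out_bias": [0.0],
}

loadable, missing, shape_mismatch = split_compatible_keys(pretrained, target)
print("loadable:", sorted(loadable))
print("missing :", missing)
```

With real PyTorch state dicts, `loadable` could then be passed to `model.load_state_dict(loadable, strict=False)`. Every key reported in `missing` keeps its freshly initialized value, which is exactly where this workaround can silently go wrong, so the list is worth reviewing by hand.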
Adding ZBL triggers the error below; removing ZBL makes it go away:
Alex2D-step-10000.zip
cuda-12.9 loaded successful
/var/spool/slurmd/job784291/slurm_script: line 5: export: `968': not a valid identifier
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
[2026-01-15 17:52:42,080] DEEPMD INFO DeePMD version: 3.1.0
[2026-01-15 17:52:42,081] DEEPMD INFO Configuration path: finetune_input.json
[DeePMD-kit ASCII-art banner]
[2026-01-15 17:52:43,891] DEEPMD INFO Please read and cite:
[2026-01-15 17:52:43,891] DEEPMD INFO Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
[2026-01-15 17:52:43,892] DEEPMD INFO Zeng et al, J. Chem. Phys., 159, 054801 (2023)
[2026-01-15 17:52:43,892] DEEPMD INFO Zeng et al, J. Chem. Theory Comput., 21, 4375-4385 (2025)
[2026-01-15 17:52:43,893] DEEPMD INFO See https://deepmd.rtfd.io/credits/ for details.
[2026-01-15 17:52:43,893] DEEPMD INFO --------------------------------------------------------------------------------------------------------------------------
[2026-01-15 17:52:43,894] DEEPMD INFO installed to: /data/home/sczc382/run/deepmd-kit/lib/python3.12/site-packages/deepmd
[2026-01-15 17:52:43,894] DEEPMD INFO source:
[2026-01-15 17:52:43,894] DEEPMD INFO source branch: HEAD
[2026-01-15 17:52:43,895] DEEPMD INFO source commit: 8b3dc08
[2026-01-15 17:52:43,895] DEEPMD INFO source commit at: 2025-06-11 13:00:46 +0200
[2026-01-15 17:52:43,896] DEEPMD INFO use float prec: double
[2026-01-15 17:52:43,896] DEEPMD INFO build variant: cuda
[2026-01-15 17:52:43,897] DEEPMD INFO Backend: PyTorch
[2026-01-15 17:52:43,897] DEEPMD INFO PT ver: v2.6.0-gUnknown
[2026-01-15 17:52:43,898] DEEPMD INFO Enable custom OP: True
[2026-01-15 17:52:43,898] DEEPMD INFO build with PT ver: 2.6.0
[2026-01-15 17:52:43,898] DEEPMD INFO build with PT inc: /data/home/sczc382/run/deepmd-kit/lib/python3.12/site-packages/torch/include
[2026-01-15 17:52:43,899] DEEPMD INFO /data/home/sczc382/run/deepmd-kit/lib/python3.12/site-packages/torch/include/torch/csrc/api/include
[2026-01-15 17:52:43,899] DEEPMD INFO build with PT lib: /data/home/sczc382/run/deepmd-kit/lib/python3.12/site-packages/torch/lib
[2026-01-15 17:52:43,900] DEEPMD INFO running on: g0018
[2026-01-15 17:52:43,900] DEEPMD INFO computing device: cuda:0
[2026-01-15 17:52:43,901] DEEPMD INFO CUDA_VISIBLE_DEVICES: 0
[2026-01-15 17:52:43,901] DEEPMD INFO Count of visible GPUs: 1
[2026-01-15 17:52:43,901] DEEPMD INFO num_intra_threads: 0
[2026-01-15 17:52:43,902] DEEPMD INFO num_inter_threads: 0
[2026-01-15 17:52:43,902] DEEPMD INFO --------------------------------------------------------------------------------------------------------------------------
[2026-01-15 17:52:44,993] DEEPMD INFO Constructing DataLoaders from 1 systems
[2026-01-15 17:52:45,538] DEEPMD INFO ---Summary of DataSystem: training -----------------------------------------------
[2026-01-15 17:52:45,538] DEEPMD INFO found 1 system(s):
[2026-01-15 17:52:45,539] DEEPMD INFO system natoms bch_sz n_bch prob pbc
[2026-01-15 17:52:45,539] DEEPMD INFO ../../init_data/C222N20 242 1 100 1.000e+00 T
[2026-01-15 17:52:45,540] DEEPMD INFO --------------------------------------------------------------------------------------
[2026-01-15 17:52:45,540] DEEPMD INFO Resuming from ../DPA-3.1-3M.pt.
Traceback (most recent call last):
File "/data/home/sczc382/run/deepmd-kit/bin/dp", line 10, in <module>
sys.exit(main())
^^^^^^
File "/data/home/sczc382/run/deepmd-kit/lib/python3.12/site-packages/deepmd/main.py", line 930, in main
deepmd_main(args)
File "/data/home/sczc382/run/deepmd-kit/lib/python3.12/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/data/home/sczc382/run/deepmd-kit/lib/python3.12/site-packages/deepmd/pt/entrypoints/main.py", line 532, in main
train(
File "/data/home/sczc382/run/deepmd-kit/lib/python3.12/site-packages/deepmd/pt/entrypoints/main.py", line 342, in train
trainer = get_trainer(
^^^^^^^^^^^^
File "/data/home/sczc382/run/deepmd-kit/lib/python3.12/site-packages/deepmd/pt/entrypoints/main.py", line 188, in get_trainer
trainer = training.Trainer(
^^^^^^^^^^^^^^^^^
File "/data/home/sczc382/run/deepmd-kit/lib/python3.12/site-packages/deepmd/pt/train/training.py", line 529, in __init__
collect_single_finetune_params(
File "/data/home/sczc382/run/deepmd-kit/lib/python3.12/site-packages/deepmd/pt/train/training.py", line 523, in collect_single_finetune_params
_origin_state_dict[new_key].clone().detach()
~~~~~~~~~~~~~~~~~~^^^^^^^^^
KeyError: 'model.Alex2D.atomic_model.models.0.out_bias'
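To narrow down a KeyError like the one above, it can help to check which keys in the pretrained checkpoint end in the same parameter name as the missing key. The sketch below is only a diagnostic aid under stated assumptions: `suggest_keys` is a hypothetical helper, and the checkpoint key list is made up for illustration (the real keys in `DPA-3.1-3M.pt` may differ).

```python
def suggest_keys(missing_key, available_keys, n_suffix=1):
    """Return available keys sharing the last n_suffix dotted components.

    Hypothetical helper for illustration only; not part of DeePMD-kit.
    """
    suffix = missing_key.split(".")[-n_suffix:]
    return [k for k in available_keys if k.split(".")[-n_suffix:] == suffix]


# The key reported by the traceback above.
missing_key = "model.Alex2D.atomic_model.models.0.out_bias"

# Made-up checkpoint keys; in practice these would come from the loaded
# state dict of the pretrained model.
checkpoint_keys = [
    "model.Alex2D.atomic_model.out_bias",
    "model.Alex2D.atomic_model.out_std",
    "model.Alex2D.atomic_model.descriptor.weight",
]

candidates = suggest_keys(missing_key, checkpoint_keys)
print(candidates)
```

If a candidate differs from the missing key only by an extra `models.0.` segment, that would be consistent with the ZBL-enabled model nesting the original model inside a combined model, so that the pretrained keys no longer line up.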