Hi fairchem team!
I'm researching fine-tuning uma-s-1p1 with all heads pretrained. Following PR #1766, I deleted the head section in my finetune config.
uma_sm_finetune_template.yaml

from:

```yaml
model:
  _target_: fairchem.core.units.mlip_unit.mlip_unit.initialize_finetuning_model
  checkpoint_location:
    _target_: fairchem.core.calculate.pretrained_mlip.pretrained_checkpoint_path_from_name
    model_name: ${base_model_name}
  overrides:
    backbone:
      otf_graph: true
      max_neighbors: ${max_neighbors}
      regress_stress: ${data.regress_stress}
      always_use_pbc: false
    pass_through_head_outputs: ${data.pass_through_head_outputs}
  heads: ${data.heads}
```
to:

```yaml
model:
  _target_: fairchem.core.units.mlip_unit.mlip_unit.initialize_finetuning_model
  checkpoint_location:
    _target_: fairchem.core.calculate.pretrained_mlip.pretrained_checkpoint_path_from_name
    model_name: ${base_model_name}
  overrides:
    backbone:
      otf_graph: true
      max_neighbors: ${max_neighbors}
      regress_stress: ${data.regress_stress}
      always_use_pbc: false
    pass_through_head_outputs: ${data.pass_through_head_outputs}
```
After some basic debugging to get the program running, the loss at step 0 is still abnormally high:

```
INFO:root:{'train/loss': 8990.620638182878, 'train/lr': 1e-05, 'train/step': 0, 'train/epoch': 0.0, 'train/samples_per_second(approx)': 6.1664387292354395, 'train/atoms_per_second(approx)': 197.7114417561113, 'train/num_atoms_on_rank': 1026, 'train/num_samples_on_rank': 32}
/data/sunxuetin/anaconda3/envs/UMA/lib/python3.12/site-packages/torch/optim/lr_scheduler.py:332: UserWarning: To get the last learning rate computed by the scheduler, please use `get_last_lr()`.
  _warn_get_lr_called_within_step(self)
INFO:root:{'train/loss': 11867.852604975855, 'train/lr': 1e-05, 'train/step': 0, 'train/epoch': 0.0, 'train/samples_per_second(approx)': 6.168348108451524, 'train/atoms_per_second(approx)': 197.5799003488379, 'train/num_atoms_on_rank': 1025, 'train/num_samples_on_rank': 32}
/data/sunxuetin/anaconda3/envs/UMA/lib/python3.12/site-packages/torch/optim/lr_scheduler.py:332: UserWarning: To get the last learning rate computed by the scheduler, please use `get_last_lr()`.
  _warn_get_lr_called_within_step(self)
INFO:root:{'train/loss': 10284.808734129449, 'train/lr': 1e-05, 'train/step': 0, 'train/epoch': 0.0, 'train/samples_per_second(approx)': 6.165504733162675, 'train/atoms_per_second(approx)': 197.68149550702827, 'train/num_atoms_on_rank': 1026, 'train/num_samples_on_rank': 32}
/data/sunxuetin/anaconda3/envs/UMA/lib/python3.12/site-packages/torch/optim/lr_scheduler.py:332: UserWarning: To get the last learning rate computed by the scheduler, please use `get_last_lr()`.
  _warn_get_lr_called_within_step(self)
INFO:root:{'train/loss': 10899.674122548213, 'train/lr': 1e-05, 'train/step': 0, 'train/epoch': 0.0, 'train/samples_per_second(approx)': 6.131760968243467, 'train/atoms_per_second(approx)': 195.83311592327573, 'train/num_atoms_on_rank': 1022, 'train/num_samples_on_rank': 32}
```
This is the same as the loss I get with re-initialized heads.
Do I need to do anything else when editing the config to make sure the heads are successfully loaded?
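For reference, here is the sanity check I used to see whether the head weights actually made it into the fine-tuning model. This is just a sketch: the `heads.` key prefix and the checkpoint layout are my guesses about the state dict structure, not something I verified against the fairchem internals:

```python
import torch


def compare_head_params(model_sd, ckpt_sd, prefix="heads."):
    """Compare head parameters between a model state dict and a checkpoint
    state dict. Returns the keys that match exactly, the keys whose tensor
    values differ, and checkpoint keys missing from the model.

    NOTE: the "heads." prefix is an assumption about how fairchem names
    head parameters; adjust it to whatever your state dict actually uses.
    """
    matched, mismatched, missing = [], [], []
    for key, ckpt_tensor in ckpt_sd.items():
        if not key.startswith(prefix):
            continue  # only inspect head parameters
        if key not in model_sd:
            missing.append(key)
        elif torch.equal(model_sd[key], ckpt_tensor):
            matched.append(key)
        else:
            mismatched.append(key)
    return matched, mismatched, missing
```

If the heads were loaded, I would expect every `heads.*` key from the pretrained checkpoint to end up in `matched` when compared against `model.state_dict()`; a large `mismatched` list would indicate re-initialized heads.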
Thanks for your reply!