Bug summary
Running dp --pt change-bias on a pre-trained multi-head model produces a model file that is much larger than expected. However, fine-tuning the same model with numb_steps: 0 works without any problem:
(base) [2201110432@wm2-login01 fine2]$ ll -h
total 465M
-rw-rw-r-- 1 2201110432 2201110432 24 Nov 13 15:46 checkpoint
lrwxrwxrwx 1 2201110432 2201110432 27 Nov 13 15:36 dpa230m.pt -> DPA2_medium_28_10M_beta4.pt
-rw-rw-r-- 1 2201110432 2201110432 338M Nov 13 15:45 dpa230m_updated.pt
-rw-rw-r-- 1 2201110432 2201110432 800 Nov 13 15:46 dpa2.hdf5
-rw-rw-r-- 1 2201110432 2201110432 119M Nov 13 15:35 DPA2_medium_28_10M_beta4.pt
-rw-rw-r-- 1 2201110432 2201110432 108K Nov 13 15:46 dpfine_4279321.err
-rw-rw-r-- 1 2201110432 2201110432 0 Nov 13 15:43 dpfine_4279321.out
-rw-r--r-- 1 2201110432 2201110432 692 Nov 13 15:43 fine.slurm
-rw-rw-r-- 1 2201110432 2201110432 2.4K Nov 13 15:36 input.json
-rw-rw-r-- 1 2201110432 2201110432 3.0K Nov 13 15:45 input_v2_compat.json
-rw-rw-r-- 1 2201110432 2201110432 0 Nov 13 15:46 lcurve.out
-rw-rw-r-- 1 2201110432 2201110432 7.9M Nov 13 15:46 model_finetune.ckpt-0.pt
lrwxrwxrwx 1 2201110432 2201110432 24 Nov 13 15:46 model_finetune.ckpt.pt -> model_finetune.ckpt-0.pt
-rw-rw-r-- 1 2201110432 2201110432 4.8K Nov 13 15:45 out.json
The model produced by change-bias, dpa230m_updated.pt (338M), is even larger than the original model (119M), while the 0-step fine-tuned model model_finetune.ckpt-0.pt (7.9M) has the much smaller size that is desired; the sketch below can be used to check where the extra size comes from.
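For reference, a minimal sketch for inspecting what each checkpoint actually contains. My assumption is that the size gap comes from the change-bias output retaining the full multi-task state; listing the top-level keys of the saved dicts should show this (the exact key layout may differ between DeePMD-kit versions):

import torch

# Compare the two checkpoints: if the change-bias output still carries
# all model branches while the 0-step fine-tuned one keeps only a single
# branch, that would explain the size gap.
for path in ("dpa230m_updated.pt", "model_finetune.ckpt-0.pt"):
    ckpt = torch.load(path, map_location="cpu", weights_only=False)
    print(path, "->", list(ckpt.keys()))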
Moreover, loading the model produced by change-bias still requires selecting a head, which is also not desired:
In [1]: from deepmd.infer.deep_pot import DeepPot
In [2]: model = DeepPot("dpa230m_updated.pt")
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
/data/softwares/miniconda3/envs/deepmd-3b4/lib/python3.11/site-packages/deepmd/pt/infer/deep_eval.py:110: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
state_dict = torch.load(model_file, map_location=env.DEVICE)
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[2], line 1
----> 1 model = DeepPot("dpa230m_updated.pt")
File /data/softwares/miniconda3/envs/deepmd-3b4/lib/python3.11/site-packages/deepmd/infer/deep_eval.py:334, in DeepEval.__init__(self, model_file, auto_batch_size, neighbor_list, *args, **kwargs)
326 def __init__(
327 self,
328 model_file: str,
(...)
332 **kwargs: Any,
333 ) -> None:
--> 334 self.deep_eval = DeepEvalBackend(
335 model_file,
336 self.output_def,
337 *args,
338 auto_batch_size=auto_batch_size,
339 neighbor_list=neighbor_list,
340 **kwargs,
341 )
342 if self.deep_eval.get_has_spin() and hasattr(self, "output_def_mag"):
343 self.deep_eval.output_def = self.output_def_mag
File /data/softwares/miniconda3/envs/deepmd-3b4/lib/python3.11/site-packages/deepmd/pt/infer/deep_eval.py:121, in DeepEval.__init__(self, model_file, output_def, auto_batch_size, neighbor_list, head, *args, **kwargs)
118 if isinstance(head, int):
119 head = model_keys[0]
120 assert (
--> 121 head is not None
122 ), f"Head must be set for multitask model! Available heads are: {model_keys}"
123 assert (
124 head in model_keys
125 ), f"No head named {head} in model! Available heads are: {model_keys}"
126 self.input_param = self.input_param["model_dict"][head]
AssertionError: Head must be set for multitask model! Available heads are: ['Domains_Alloy', 'Domains_Anode', 'Domains_Cluster', 'Domains_Drug', 'Domains_FerroEle', 'Domains_OC2M', 'Domains_SSE-PBE', 'Domains_SemiCond', 'H2O_H2O-PD', 'Metals_AgAu-PBE', 'Metals_AlMgCu', 'Metals_Cu', 'Metals_Sn', 'Metals_Ti', 'Metals_V', 'Metals_W', 'Others_C12H26', 'Others_HfO2', 'Domains_ANI', 'Domains_SSE-PBESol', 'Domains_Transition1x', 'H2O_H2O-DPLR', 'H2O_H2O-PBE0TS-MD', 'H2O_H2O-PBE0TS', 'H2O_H2O-SCAN0', 'Metals_AgAu-PBED3', 'Others_In2Se3', 'MP_traj_v024_alldata_mixu']
The 0-step fine-tuned model, by contrast, loads without any problem:
In [3]: model = DeepPot("model_finetune.ckpt-0.pt")
/data/softwares/miniconda3/envs/deepmd-3b4/lib/python3.11/site-packages/deepmd/pt/infer/deep_eval.py:110: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
state_dict = torch.load(model_file, map_location=env.DEVICE)
You can use the environment variable DP_INFER_BATCH_SIZE tocontrol the inference batch size (nframes * natoms). The default value is 1024.
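As a temporary workaround (a sketch based on the DeepEval.__init__ signature visible in the traceback above, not a fix for the underlying issue), the head can be passed explicitly when loading the change-bias output:

from deepmd.infer.deep_pot import DeepPot

# Selecting a branch explicitly sidesteps the assertion, but the
# expectation remains that change-bias should emit a single-head model.
model = DeepPot("dpa230m_updated.pt", head="Domains_OC2M")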
DeePMD-kit Version
v3.0.0b4
Backend and its version
pytorch 2.5.1
How did you download the software?
Offline packages
Input Files, Running Commands, Error Log, etc.
Command for change-bias:
dp --pt change-bias dpa230m.pt -s ../../data-clean4_radsp/train --model-branch Domains_OC2M
Command for 0-step fine-tune:
dp --pt train input.json --finetune dpa230m.pt --model-branch Domains_OC2M
Corresponding input.json:
{
"_comment": "that's all",
"model": {
"type_map": [
"C",
"Fe",
"H",
"O"
],
"descriptor": {
"type": "dpa2",
"repinit": {
"tebd_dim": 8,
"rcut": 6.0,
"rcut_smth": 0.5,
"nsel": 120,
"neuron": [
25,
50,
100
],
"axis_neuron": 12,
"activation_function": "tanh",
"three_body_sel": 40,
"three_body_rcut": 4.0,
"three_body_rcut_smth": 3.5,
"use_three_body": true
},
"repformer": {
"rcut": 4.0,
"rcut_smth": 3.5,
"nsel": 40,
"nlayers": 6,
"g1_dim": 128,
"g2_dim": 32,
"attn2_hidden": 32,
"attn2_nhead": 4,
"attn1_hidden": 128,
"attn1_nhead": 4,
"axis_neuron": 4,
"update_h2": false,
"update_g1_has_conv": true,
"update_g1_has_grrg": true,
"update_g1_has_drrd": true,
"update_g1_has_attn": false,
"update_g2_has_g1g1": false,
"update_g2_has_attn": true,
"update_style": "res_residual",
"update_residual": 0.01,
"update_residual_init": "norm",
"attn2_has_gate": true,
"use_sqrt_nnei": true,
"g1_out_conv": true,
"g1_out_mlp": true
},
"add_tebd_to_repinit_out": false
},
"fitting_net": {
"neuron": [
240,
240,
240
],
"resnet_dt": true,
"seed": 19090,
"_comment": " that's all"
},
"_comment": " that's all"
},
"learning_rate": {
"type": "exp",
"decay_steps": 2000,
"start_lr": 0.001,
"stop_lr": 3.51e-08,
"_comment": "that's all"
},
"loss": {
"type": "ener",
"start_pref_e": 0.02,
"limit_pref_e": 1,
"start_pref_f": 1000,
"limit_pref_f": 1,
"start_pref_v": 0,
"limit_pref_v": 0,
"_comment": " that's all"
},
"training": {
"stat_file": "./dpa2.hdf5",
"training_data": {
"systems": "../../data-clean4_radsp/train/",
"batch_size": "auto",
"_comment": "that's all"
},
"numb_steps": 0,
"warmup_steps": 0,
"gradient_max_norm": 5.0,
"max_ckpt_keep":20,
"seed": 19090,
"save_ckpt": "model_finetune.ckpt",
"disp_file": "lcurve.out",
"disp_freq": 1000,
"save_freq": 20000,
"_comment": "that's all"
}
}
Steps to Reproduce
Run the commands above on any dataset.
Further Information, Files, and Links
No response