[BUG] dp change-bias will give much large model #4348

@QuantumMisaka

Description

Bug summary

Running dp --pt change-bias on a pre-trained multi-head model produces a model file that is much larger than the original. However, finetuning with numb_steps: 0 has no such problem:

(base) [2201110432@wm2-login01 fine2]$ ll -h
total 465M
-rw-rw-r-- 1 2201110432 2201110432   24 Nov 13 15:46 checkpoint
lrwxrwxrwx 1 2201110432 2201110432   27 Nov 13 15:36 dpa230m.pt -> DPA2_medium_28_10M_beta4.pt
-rw-rw-r-- 1 2201110432 2201110432 338M Nov 13 15:45 dpa230m_updated.pt
-rw-rw-r-- 1 2201110432 2201110432  800 Nov 13 15:46 dpa2.hdf5
-rw-rw-r-- 1 2201110432 2201110432 119M Nov 13 15:35 DPA2_medium_28_10M_beta4.pt
-rw-rw-r-- 1 2201110432 2201110432 108K Nov 13 15:46 dpfine_4279321.err
-rw-rw-r-- 1 2201110432 2201110432    0 Nov 13 15:43 dpfine_4279321.out
-rw-r--r-- 1 2201110432 2201110432  692 Nov 13 15:43 fine.slurm
-rw-rw-r-- 1 2201110432 2201110432 2.4K Nov 13 15:36 input.json
-rw-rw-r-- 1 2201110432 2201110432 3.0K Nov 13 15:45 input_v2_compat.json
-rw-rw-r-- 1 2201110432 2201110432    0 Nov 13 15:46 lcurve.out
-rw-rw-r-- 1 2201110432 2201110432 7.9M Nov 13 15:46 model_finetune.ckpt-0.pt
lrwxrwxrwx 1 2201110432 2201110432   24 Nov 13 15:46 model_finetune.ckpt.pt -> model_finetune.ckpt-0.pt
-rw-rw-r-- 1 2201110432 2201110432 4.8K Nov 13 15:45 out.json

The model produced by change-bias, dpa230m_updated.pt (338M), is much larger than the original model (119M). In contrast, the 0-step finetuned model model_finetune.ckpt-0.pt is much smaller (7.9M), which is the desired behavior.

Also, loading the model produced by change-bias requires selecting a head, which is likewise not the desired behavior:
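The expected behavior would be that change-bias with --model-branch keeps only the parameters of the selected branch, which is roughly what the 0-step finetune appears to do, and which would also explain the smaller file size. A minimal sketch of that expectation, using a made-up checkpoint layout (the key names and the keep_head/nbytes helpers are hypothetical illustrations, not the real DeePMD-kit checkpoint format or API):

```python
import io
import pickle

# Hypothetical layout (an assumption for illustration, NOT the real
# DeePMD-kit checkpoint format): each branch's fitting parameters are
# stored under a "model.<head>." key prefix.
full_state = {
    "model.Domains_OC2M.fitting.weight": [0.0] * 1000,
    "model.Metals_Cu.fitting.weight": [0.0] * 1000,
    "model.Domains_Alloy.fitting.weight": [0.0] * 1000,
}

def keep_head(state, head):
    """Keep only the parameters belonging to the selected head."""
    prefix = f"model.{head}."
    return {k: v for k, v in state.items() if k.startswith(prefix)}

def nbytes(state):
    """Serialized size of a state dict, a stand-in for the .pt file size."""
    buf = io.BytesIO()
    pickle.dump(state, buf)
    return buf.getbuffer().nbytes

single = keep_head(full_state, "Domains_OC2M")
print(len(single), nbytes(single) < nbytes(full_state))  # 1 True
```

If change-bias instead serializes all branches plus extra state, that would account for both symptoms reported here: the larger file and the loader demanding a head.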

In [1]: from deepmd.infer.deep_pot import DeepPot

In [2]: model = DeepPot("dpa230m_updated.pt")
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
/data/softwares/miniconda3/envs/deepmd-3b4/lib/python3.11/site-packages/deepmd/pt/infer/deep_eval.py:110: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(model_file, map_location=env.DEVICE)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[2], line 1
----> 1 model = DeepPot("dpa230m_updated.pt")

File /data/softwares/miniconda3/envs/deepmd-3b4/lib/python3.11/site-packages/deepmd/infer/deep_eval.py:334, in DeepEval.__init__(self, model_file, auto_batch_size, neighbor_list, *args, **kwargs)
    326 def __init__(
    327     self,
    328     model_file: str,
   (...)
    332     **kwargs: Any,
    333 ) -> None:
--> 334     self.deep_eval = DeepEvalBackend(
    335         model_file,
    336         self.output_def,
    337         *args,
    338         auto_batch_size=auto_batch_size,
    339         neighbor_list=neighbor_list,
    340         **kwargs,
    341     )
    342     if self.deep_eval.get_has_spin() and hasattr(self, "output_def_mag"):
    343         self.deep_eval.output_def = self.output_def_mag

File /data/softwares/miniconda3/envs/deepmd-3b4/lib/python3.11/site-packages/deepmd/pt/infer/deep_eval.py:121, in DeepEval.__init__(self, model_file, output_def, auto_batch_size, neighbor_list, head, *args, **kwargs)
    118 if isinstance(head, int):
    119     head = model_keys[0]
    120 assert (
--> 121     head is not None
    122 ), f"Head must be set for multitask model! Available heads are: {model_keys}"
    123 assert (
    124     head in model_keys
    125 ), f"No head named {head} in model! Available heads are: {model_keys}"
    126 self.input_param = self.input_param["model_dict"][head]

AssertionError: Head must be set for multitask model! Available heads are: ['Domains_Alloy', 'Domains_Anode', 'Domains_Cluster', 'Domains_Drug', 'Domains_FerroEle', 'Domains_OC2M', 'Domains_SSE-PBE', 'Domains_SemiCond', 'H2O_H2O-PD', 'Metals_AgAu-PBE', 'Metals_AlMgCu', 'Metals_Cu', 'Metals_Sn', 'Metals_Ti', 'Metals_V', 'Metals_W', 'Others_C12H26', 'Others_HfO2', 'Domains_ANI', 'Domains_SSE-PBESol', 'Domains_Transition1x', 'H2O_H2O-DPLR', 'H2O_H2O-PBE0TS-MD', 'H2O_H2O-PBE0TS', 'H2O_H2O-SCAN0', 'Metals_AgAu-PBED3', 'Others_In2Se3', 'MP_traj_v024_alldata_mixu']

whereas the 0-step finetuned model loads without any problem:

In [3]: model = DeepPot("model_finetune.ckpt-0.pt")
/data/softwares/miniconda3/envs/deepmd-3b4/lib/python3.11/site-packages/deepmd/pt/infer/deep_eval.py:110: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(model_file, map_location=env.DEVICE)
You can use the environment variable DP_INFER_BATCH_SIZE tocontrol the inference batch size (nframes * natoms). The default value is 1024.

DeePMD-kit Version

v3.0.0b4

Backend and its version

pytorch 2.5.1

How did you download the software?

Offline packages

Input Files, Running Commands, Error Log, etc.

command for change-bias:

dp --pt change-bias dpa230m.pt -s ../../data-clean4_radsp/train --model-branch Domains_OC2M

command for 0-step finetune:

dp --pt train input.json --finetune dpa230m.pt --model-branch Domains_OC2M

corresponding input.json:

{
  "_comment": "that's all",
  "model": {
    "type_map": [
      "C",
      "Fe",
      "H",
      "O"
    ],
    "descriptor": {
      "type": "dpa2",
      "repinit": {
        "tebd_dim": 8,
        "rcut": 6.0,
        "rcut_smth": 0.5,
        "nsel": 120,
        "neuron": [
          25,
          50,
          100
        ],
        "axis_neuron": 12,
        "activation_function": "tanh",
        "three_body_sel": 40,
        "three_body_rcut": 4.0,
        "three_body_rcut_smth": 3.5,
        "use_three_body": true
      },
      "repformer": {
        "rcut": 4.0,
        "rcut_smth": 3.5,
        "nsel": 40,
        "nlayers": 6,
        "g1_dim": 128,
        "g2_dim": 32,
        "attn2_hidden": 32,
        "attn2_nhead": 4,
        "attn1_hidden": 128,
        "attn1_nhead": 4,
        "axis_neuron": 4,
        "update_h2": false,
        "update_g1_has_conv": true,
        "update_g1_has_grrg": true,
        "update_g1_has_drrd": true,
        "update_g1_has_attn": false,
        "update_g2_has_g1g1": false,
        "update_g2_has_attn": true,
        "update_style": "res_residual",
        "update_residual": 0.01,
        "update_residual_init": "norm",
        "attn2_has_gate": true,
        "use_sqrt_nnei": true,
        "g1_out_conv": true,
        "g1_out_mlp": true
      },
      "add_tebd_to_repinit_out": false
    },
    "fitting_net": {
      "neuron": [
        240,
        240,
        240
      ],
      "resnet_dt": true,
      "seed": 19090,
      "_comment": " that's all"
    },
    "_comment": " that's all"
  },
  "learning_rate": {
    "type": "exp",
    "decay_steps": 2000,
    "start_lr": 0.001,
    "stop_lr": 3.51e-08,
    "_comment": "that's all"
  },
  "loss": {
    "type": "ener",
    "start_pref_e": 0.02,
    "limit_pref_e": 1,
    "start_pref_f": 1000,
    "limit_pref_f": 1,
    "start_pref_v": 0,
    "limit_pref_v": 0,
    "_comment": " that's all"
  },
  "training": {
    "stat_file": "./dpa2.hdf5",
    "training_data": {
      "systems": "../../data-clean4_radsp/train/",
      "batch_size": "auto",
      "_comment": "that's all"
    },
    "numb_steps": 0,
    "warmup_steps": 0,
    "gradient_max_norm": 5.0,
    "max_ckpt_keep": 20,
    "seed": 19090,
    "save_ckpt": "model_finetune.ckpt",
    "disp_file": "lcurve.out",
    "disp_freq": 1000,
    "save_freq": 20000,
    "_comment": "that's all"
  }
}

Steps to Reproduce

Run the commands above on any dataset.

Further Information, Files, and Links

No response

Metadata

Labels: bug, reproduced (This bug has been reproduced by developers)
Status: Todo