DPA4/SeZM .pt2 export rejects zero-ghost nall==nloc; question about ZBL bridge switching

### Summary

We observed two related but distinct issues while testing DPA4/SeZM exported models:

1. **`.pt2` / AOTInductor crash when no ghost atoms are present**: exported SeZM/DPA4 `.pt2` models can contain a runtime guard that requires `nall != nloc`. This rejects valid inputs with zero ghost atoms, e.g. `nall == nloc == 375`.
2. **Question about SeZM ZBL bridge implementation**: the current Python `SeZMModel` path appears to store `bridging_r_inner` / `bridging_r_outer`, but the active energy path seems to add `InterPotential` directly. I want to confirm whether this additive behavior is intentional or whether a switch/mixing function is expected.

These may deserve separate issues; they were found while debugging the same DPA4/SeZM workflow.

---

### Environment

- Repository: `deepmodeling/deepmd-kit`
- Master commit inspected/tested: `99c1ece2e5087c77267fba4ca84932b53621e42c`
- `deepmd-kit`: `3.2.0b1.dev1+g99c1ece2e`
- Python: `3.12`
- PyTorch: `2.11.0+cu126`
- Model family: DPA4 / SeZM

---

## 1. `.pt2` crash: AOTInductor guard requires `nall != nloc`

The exported `.pt2` compiled code can contain a guard similar to:

```cpp
int64_t s22 = arg147_1_size[1];  // nall: total atoms including ghosts
int64_t s24 = arg149_1_size[1];  // nloc: local atoms

if (!(s22 != s24)) {
    throw std::runtime_error("Expected Ne(s22, s24) to be True but received 375");
}
```

At runtime, valid zero-ghost inputs may have:

```text
nall == nloc == 375
```

This causes the exported model to throw before inference. Both ZBL and non-ZBL exported models appear to contain the same `Ne(s22, s24)` guard, so the `.pt2` crash itself should not be attributed only to ZBL.
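To check which exported artifacts carry this guard, one option is to scan the generated wrapper source for the error text. The following is a minimal sketch: `find_ne_guards` is a hypothetical helper, and the regular expression assumes the message format shown in the snippet above.

```python
import re

# Matches the message text of the guard shown above, e.g.
# "Expected Ne(s22, s24) to be True"; the format is assumed from that snippet.
_NE_GUARD = re.compile(r"Expected Ne\((\w+),\s*(\w+)\) to be True")


def find_ne_guards(source: str) -> list[tuple[str, str]]:
    """Return the (lhs, rhs) symbol pairs of every Ne(...) guard in `source`."""
    return _NE_GUARD.findall(source)


# Sample text copied from the guard quoted in this issue.
sample = '''
if (!(s22 != s24)) {
    throw std::runtime_error("Expected Ne(s22, s24) to be True but received 375");
}
'''
print(find_ne_guards(sample))  # [('s22', 's24')]
```

Running it over the generated code of both a ZBL and a non-ZBL export would show whether the same symbol pair is guarded in each.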

### Suspected source

In current master, `_build_dynamic_shapes()` defines `nall` and `nloc` as independent dynamic dimensions. For example:

```python
# deepmd/pt_expt/utils/serialization.py
nall_dim = torch.export.Dim("nall", min=nall_min)
nloc_dim = torch.export.Dim("nloc", min=1)
```

and similarly in:

```python
# deepmd/pt/entrypoints/freeze_pt2.py
nall_dim = torch.export.Dim("nall", min=4 if has_spin else 1)
nloc_dim = torch.export.Dim("nloc", min=1)
```

If the export sample has `nall > nloc`, PyTorch/AOTInductor can infer and preserve `nall != nloc` as a runtime invariant, even though `nall == nloc` is valid when there are no ghosts.

### Expected behavior

The exported model should allow:

```text
nall >= nloc
```

including the equality case `nall == nloc`.
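The difference between the expected relation and the observed guard can be stated as two predicates. This is only an illustrative sketch; `expected_contract` and `observed_guard` are hypothetical names and are not part of deepmd-kit or PyTorch.

```python
def expected_contract(nall: int, nloc: int) -> bool:
    """Shape relation the export should accept: ghost atoms are optional."""
    return nloc >= 1 and nall >= nloc


def observed_guard(nall: int, nloc: int) -> bool:
    """Relation enforced by the generated Ne(s22, s24) guard quoted earlier."""
    return nall != nloc


# Zero-ghost case from this report, plus a case with ghost atoms.
for nall, nloc in [(375, 375), (500, 375)]:
    print(nall, nloc, expected_contract(nall, nloc), observed_guard(nall, nloc))
```

The zero-ghost case `(375, 375)` satisfies the expected contract but fails the observed guard, which is the mismatch reported here.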

### Possible fixes

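One possible approach is to express the relationship explicitly. The snippet below shows the intended constraint with PyTorch 2.11 (note that `torch.export.Dim` documents `min`/`max` as integers, so this exact form may not be accepted as written):

```python
nloc_dim = torch.export.Dim("nloc", min=1)
# Intent: nall >= nloc. `min` is documented as an integer, so this spelling is unverified.
nall_dim = torch.export.Dim("nall", min=nloc_dim)
```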

Alternatively, the export process could cover both zero-ghost (`nall == nloc`) and nonzero-ghost (`nall > nloc`) sample cases, if that is the preferred way to avoid an over-specialized inequality guard.

---

## 2. Question: should SeZM ZBL use `bridging_r_inner` / `bridging_r_outer` to switch/mix?

This is separate from the `.pt2` shape-guard crash. While inspecting `deepmd/pt/model/model/sezm_model.py`, I noticed that `SeZMModel.__init__` stores:

```python
self.bridging_r_inner = float(bridging_r_inner)
self.bridging_r_outer = float(bridging_r_outer)
self.inter_potential = InterPotential(...)
```

but in the observed `core_compute()` path, the analytical potential is added directly:

```python
fit_ret["energy"] = fit_ret["energy"] + self.inter_potential(
    extended_coord,
    extended_atype,
    nlist,
    nloc,
    real_type_count=self._get_inter_potential_real_type_count(),
)
```

The observed `InterPotential.forward()` computes ZBL pair energy over the normal neighbor list and sums it:

```python
pair_e = self._zbl_pair_energy(r, zi, zj)
pair_e = pair_e * valid
atom_pair_energy = (pair_e * 0.5).sum(dim=-1, keepdim=True)
```

I could not find use of `bridging_r_inner` / `bridging_r_outer` in this active energy path. This looks like an additive ZBL term rather than a switched/mixed short-range bridge.
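For concreteness, the two behaviors in question can be contrasted with a small sketch. Everything here is an assumption for illustration: `switch_weight`, `additive_energy`, and `bridged_energy` are hypothetical names, the quintic polynomial is one common smooth switching choice rather than anything confirmed in `SeZMModel`, and the mixing is shown for a single scalar distance rather than per pair or per atom.

```python
def switch_weight(r: float, r_inner: float, r_outer: float) -> float:
    """Weight on the analytical term: 1 below r_inner, 0 above r_outer."""
    if r_outer <= r_inner:
        raise ValueError("r_outer must be greater than r_inner")
    if r <= r_inner:
        return 1.0
    if r >= r_outer:
        return 0.0
    # Quintic polynomial that goes smoothly from 1 (u = 0) to 0 (u = 1).
    u = (r - r_inner) / (r_outer - r_inner)
    return u**3 * (-6.0 * u**2 + 15.0 * u - 10.0) + 1.0


def additive_energy(e_learned: float, e_zbl: float) -> float:
    """Behavior that the observed core_compute() path appears to have."""
    return e_learned + e_zbl


def bridged_energy(
    e_learned: float, e_zbl: float, r: float, r_inner: float, r_outer: float
) -> float:
    """Hypothetical switched behavior: ZBL at short range, learned beyond."""
    w = switch_weight(r, r_inner, r_outer)
    return w * e_zbl + (1.0 - w) * e_learned


# Made-up energies and radii, only to show how the two forms differ.
for r in (0.5, 1.5, 2.5):
    print(r, additive_energy(-1.0, 3.0), bridged_energy(-1.0, 3.0, r, 1.0, 2.0))
```

In the additive form the result does not depend on the bridging radii, which matches the observation that `bridging_r_inner` / `bridging_r_outer` are stored but not used in the active path.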

### Question

Is the current additive behavior intended for SeZM ZBL, or should `bridging_r_inner` / `bridging_r_outer` be used to switch/mix the ZBL term with the learned energy?

If the additive behavior is intentional, it would be helpful to document it and clarify the intended training/inference configuration for SeZM ZBL models.

---

### Questions

1. For `.pt2` export, is the proposed `nall >= nloc` relationship the right fix for the zero-ghost guard crash?
2. For SeZM ZBL, is the current additive `fit_ret["energy"] + InterPotential(...)` behavior intended, or should `bridging_r_inner` / `bridging_r_outer` switch/mix the ZBL term?
3. Should the `.pt2` guard issue and the ZBL bridge behavior be tracked as separate issues?

