Copilot AI commented on Aug 26, 2025

The dpdata.MultiSystems.from_file() method with fmt="deepmd/npy/mixed" failed when loading data that had been dumped from systems without labels (i.e., plain System objects rather than LabeledSystem objects). The loader always tried to create LabeledSystem objects, which require "energies" data, and unlabeled systems do not have it.

Example of the issue:

import dpdata
import numpy as np

# Create unlabeled systems (no energies/forces)
system_data = {
    'atom_names': ['H', 'O'],
    'atom_numbs': [2, 1],
    'atom_types': np.array([0, 0, 1]),
    'orig': np.array([0., 0., 0.]),
    'cells': np.random.random((2, 3, 3)),
    'coords': np.random.random((2, 3, 3))
}

system = dpdata.System(data=system_data)
ms = dpdata.MultiSystems(system)

# Dump works fine
ms.to_deepmd_npy_mixed("mixed_dir")

# But loading fails with: DataError: energies not found in data
ms_loaded = dpdata.MultiSystems.from_file("mixed_dir", fmt="deepmd/npy/mixed")

Solution:
Modified MultiSystems.from_fmt_obj() to auto-detect whether loaded data contains labels by checking for the presence of required label fields (specifically "energies"). When labeled=True (the default), the method now:

  1. Loads data using the labeled format loader to get all available fields
  2. Checks if the returned data contains "energies"
  3. Creates LabeledSystem objects if labels are present, System objects otherwise

This approach maintains full backward compatibility while enabling unlabeled data to be loaded correctly. The existing labeled=False parameter continues to work as before.
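
The detection step described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: the System and LabeledSystem classes here are simplified stand-ins for the dpdata classes, and build_system is a hypothetical helper name that takes data which has already been loaded into a dict.

```python
# Simplified stand-ins for dpdata.System and dpdata.LabeledSystem.
# They are assumptions for illustration, not the real dpdata classes.
class System:
    def __init__(self, data):
        self.data = data


class LabeledSystem(System):
    def __init__(self, data):
        # Mirrors the failure mode described above: labels are mandatory here.
        if "energies" not in data:
            raise KeyError("energies not found in data")
        super().__init__(data)


def build_system(raw, labeled=True):
    """Choose the system class for one block of already-loaded data.

    `build_system` is a hypothetical helper, not a dpdata function.
    """
    if labeled and "energies" in raw:
        # Labels were requested and are present.
        return LabeledSystem(raw)
    # Labels were not requested, or the data carries none.
    return System(raw)


unlabeled = {"coords": [[0.0, 0.0, 0.0]]}
with_labels = {"coords": [[0.0, 0.0, 0.0]], "energies": [-1.0]}

print(type(build_system(unlabeled)).__name__)
print(type(build_system(with_labels)).__name__)
print(type(build_system(with_labels, labeled=False)).__name__)
```

Keying the decision on a single required field keeps the check cheap, at the cost of treating data that has forces but no energies as unlabeled.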

Testing:

  • Added comprehensive test coverage for unlabeled systems in test_deepmd_mixed_unlabeled.py
  • Verified all existing deepmd/npy/mixed tests continue to pass
  • Tested mixed scenarios with both labeled and unlabeled data
  • Confirmed the labeled=False parameter still functions correctly

Fixes #817.



Copilot AI changed the title from "[WIP] [Feature Request] how can dpdata.MultiSystems deal with systems without label" to "Fix MultiSystems loading of unlabeled data from deepmd/npy/mixed format" on Aug 26, 2025
Copilot AI requested a review from njzjz on August 26, 2025 18:47
Copilot finished work on behalf of njzjz August 26, 2025 18:47