aims-PAX

aims-PAX, short for ab initio molecular simulation - Parallel Active eXploration, is a flexible, fully automated, open-source software package for active learning of machine learning force fields, using a parallelized algorithm that enables efficient resource management.

Documentation

Note: aims-PAX is under active development. Backward compatibility with older versions is not currently guaranteed.

Installation

aims-PAX

To install aims-PAX and all its requirements, follow these steps:

  1. Create a (mini)conda environment (if one does not exist already), e.g.: conda create -n my_env python=3.10
  2. Activate the conda environment: conda activate my_env
  3. Clone this repository
  4. Move to the aims-PAX directory: cd aims-PAX
  5. Run the setup script: bash setup.sh

The setup script installs the PARSL tools, the other packages specified in requirements.txt, and aims-PAX itself.

FHI aims

To install FHI aims, you can follow the instructions in the official manual.

Here is a quick rundown of how to compile it on Meluxina:

  1. Clone the source code from the FHI aims gitlab.
  2. Move to the root FHI aims directory
  3. Create a build directory and change to it: mkdir build && cd build
  4. Create an initial_cache.cmake file with all necessary flags (an example is provided under fhi_aims_help/example_initial_cache.cmake)
  5. Load the necessary modules, e.g. Intel compilers etc.
  6. Run cmake -C initial_cache.cmake ..
  7. Run make -j x (where x is the number of parallel build processes)

After compilation you will find an executable, e.g. aims.XXX.scalapack.mpi.x, which can then be used in aims-PAX.

A .cmake file for Meluxina can be found in fhi_aims_help, alongside a script to run the compilation (step 7 above). The latter also includes the necessary modules that have to be loaded at step 5 above.

Running

To run aims-PAX, one has to specify the settings for FHI aims, the MACE model, and the active learning procedure.

  1. Make sure the following files are in the working directory:
    • control.in,
    • mace.yaml,
    • aimsPAX.yaml
    • either specify a geometry as geometry.in or provide a path in the MISC settings under path_to_geometry. For the latter, you can also specify a path to a folder containing multiple geometry files (see Multi-system sampling).
  2. Run aims-PAX in that directory

The settings are explained below and aims-PAX automatically runs all necessary steps. During the run you can observe its progress in initial_dataset.log and active_learning.log inside the log directory (default is ./logs). At the end, the model(s) and data can be found in the results/ folder.

Take a look at the example and its explanation (example/explanation.md) for more details.

Note: The models are named {name_exp}_{seed}.model, where name_exp is set in mace.yaml (see the Settings/MACE settings section below).

Both procedures, initial dataset acquisition and active learning, are classes that can be used independently of each other. To only create an initial dataset, run aims-PAX-initial-ds. Equivalently, to only run the active learning procedure (given that the previous step has been done or all the necessary files are present), just run aims-PAX-al.

Note: You can also change the names of the settings files and run, for example, aims-PAX --mace_settings path/to/my_mace.yaml --aimsPAX-settings path/to/my_aimspax.yaml.

Common Pitfalls

  1. Not specifying all required settings: Take a look at the settings below. Mandatory ones are marked by *.
  2. Not properly specifying SLURM settings in CLUSTER settings: PARSL launches independent jobs from the main process, which collects the jobs' results. This means that these jobs need all the information and settings that a normal job on an HPC environment also needs. In practice, you have to make sure that all the necessary SLURM variables are provided under slurm_str. Setting environment variables, loading modules, and sourcing the correct conda environment have to be specified under worker_str. You can find an example in the example folder (and a sketch in the CLUSTER settings below).

Multi-system sampling

One of the major strengths of aims-PAX is running multiple trajectories at once to sample new points during active learning. These trajectories can contain distinct geometries or even different chemical species. For example, you might want to sample the same system from various starting geometries, or train a model on different molecules, materials, etc. at the same time.

To do so, as mentioned in the Running section above, you must provide the path to a folder containing ASE-readable files under path_to_geometry in the MISC settings of the aimsPAX.yaml file.

aims-PAX then automatically reads all ASE-readable files inside that directory and assigns each of them to a trajectory. If num_trajectories is smaller than the number of geometries present in the folder, aims-PAX will warn you and adjust num_trajectories to the actual number of geometries. If num_trajectories is larger, aims-PAX will loop through the geometries again, assigning them to new trajectories until num_trajectories is satisfied (e.g., 3 geometries and num_trajectories: 5 yields trajectories with geometries 0, 1, 2, 0, 1).

If you want to use varying numbers of trajectories for each geometry you provide, the simplest solution for now is to duplicate each geometry file in the specified directory as many times as the number of trajectories you wish to run for that geometry.

During the initial dataset generation, n_points_per_sampling_step_idg × ensemble_size × (number of geometries) points are sampled at each step, so that points are sampled for each geometry. With n_points_per_sampling_step_idg: 2, ensemble_size: 4, and 3 geometries, for example, 24 points are acquired per step.

During active learning, the uncertainty threshold is currently shared across all geometries.

Restarting

By default, create_restart is set to True in the MISC settings (see below). In that case, the state of the initial dataset generation or active learning run is frequently saved to a .npy file inside the restart directory. The procedure can be continued by simply running aims-PAX (or aims-PAX-al / aims-PAX-initial-ds) again.

Settings

Description of all settings and their default values for FHI aims, aims-PAX, and MACE. Mandatory keywords are listed first; the remaining ones are sorted alphabetically.

FHI aims settings (control.in)

The settings here are the same as for usual FHI aims calculations (see the official FHI aims manual) and are parsed internally by aims-PAX. MD settings are not specified here.

As we are using ASE/ASI to run FHI aims, there is no need to append the basis set information at the end of the control.in file. That information is taken directly from the indicated species directory (see species_dir in the settings).

aims-PAX (aimsPAX.yaml)

Settings are given in a .yaml file. The file itself is split into multiple dictionaries: INITIAL_DATASET_GENERATION, ACTIVE_LEARNING, MD, MISC, CLUSTER.

Mandatory settings are indicated by *. All others are optional, and default values are used unless specified otherwise.

Example settings can be found in the examples folder.
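
As a quick orientation, a minimal aimsPAX.yaml could look like the following sketch (only mandatory keywords are set, so defaults apply everywhere else; the species_dir paths are placeholders):

INITIAL_DATASET_GENERATION:
  species_dir: "path/to/species_defaults/light"
  n_points_per_sampling_step_idg: 2
ACTIVE_LEARNING:
  species_dir: "path/to/species_defaults/light"
  num_trajectories: 4
MD:
  stat_ensemble: nvt
MISC:
  path_to_geometry: "./geometry.in"
CLUSTER:
  # mandatory PARSL/SLURM keys omitted here; see the CLUSTER settings below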

INITIAL_DATASET_GENERATION:

Parameter Type Default Description
*species_dir str Path to the directory containing the FHI AIMS species defaults.
*n_points_per_sampling_step_idg int Number of points that are sampled at each step for each model in the ensemble and each geometry.
analysis bool False Saves metrics such as losses during initial dataset generation.
desired_acc float 0.0 Force MAE (eV/Å) that the ensemble should reach on the validation set. Needs to be combined with desired_acc_scale_idg.
desired_acc_scale_idg float 10.0 Scales desired_acc during initial dataset generation. The resulting product is the accuracy that the ensemble has to reach on the validation set before this stage of the procedure is stopped.
ensemble_size int 4 Number of models in the ensemble for uncertainty estimation.
foundational_model str mace-mp Which foundational model to use for structure generation. Possible options: mace-mp or so3lr.
initial_foundational_size str "small" Size of the foundational model used when initial_sampling is set to mace-mp0.
foundational_model_settings dict {mace_model: small} Settings for the chosen foundational model for structure generation.
mace_model str small Type of MACE foundational model. See here for their names.
dispersion_lr_damping str None Damping parameter for dispersion interaction in SO3LR. Needed if r_max_lr is not None! Part of foundational_model_settings.
r_max_lr float None Cutoff of long-range modules of SO3LR. Part of foundational_model_settings.
intermediate_epochs_idg int 5 Number of intermediate epochs between dataset growth steps in initial training.
max_initial_epochs int or float np.inf Maximum number of epochs for the initial training stage.
max_initial_set_size int or float np.inf Maximum size of the initial training dataset.
progress_dft_update int 10 Intervals at which progress of DFT calculations is logged.
scheduler_initial bool True Whether to use a learning rate scheduler during initial training.
skip_step_initial int 25 Interval (in MD steps) at which a structure is taken from the MD simulation, either for the dataset (in the case of AIMD) or for a DFT calculation (in the case of using an MLFF).
valid_ratio float 0.1 Fraction of data reserved for validation.
valid_skip int 1 Number of training steps between validation runs in initial training.
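
For example, an INITIAL_DATASET_GENERATION block that samples initial structures with the mace-mp foundational model could look like this (a sketch; the path is a placeholder and the nesting of foundational_model_settings follows the table above):

INITIAL_DATASET_GENERATION:
  species_dir: "path/to/species_defaults/light"
  n_points_per_sampling_step_idg: 2
  ensemble_size: 4
  foundational_model: mace-mp
  foundational_model_settings:
    mace_model: small
  intermediate_epochs_idg: 5
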
Convergence

After the initial dataset generation is finished, aims-PAX does not converge the model(s) on the final dataset by default. In case you want to do this, the following keywords are relevant.

Parameter Type Default Description
converge_initial bool False Whether to converge the model(s) on the initial training set after a stopping criterion was met.
convergence_patience int 50 Number of epochs without improvement before halting convergence.
margin float 0.002 Margin to decide if a model has improved over the previous training epoch.
max_convergence_epochs int 500 Maximum total epochs allowed before halting convergence.

ACTIVE_LEARNING:

Parameter Type Default Description
*species_dir str Path to the directory containing the FHI AIMS species defaults.
*num_trajectories int How many trajectories are sampled from during the active learning phase.
analysis bool False Whether to run DFT calculations at specified intervals and save predictions, uncertainties, etc.
analysis_skip int 50 Interval (in MD steps) at which analysis DFT calculations are performed.
c_x float 0.0 Weighting factor for the uncertainty threshold (see Eq. 2 in the paper). < 0 tightens, > 0 relaxes the threshold.
desired_acc float 0.0 Force MAE (eV/Å) that the ensemble should reach on the validation set.
ensemble_size int 4 Number of models in the ensemble for uncertainty estimation.
epochs_per_worker int 2 Number of training epochs per worker after DFT is done.
freeze_threshold_dataset float np.inf Training set size at which the uncertainty threshold is frozen; np.inf disables freezing.
intermediate_epochs_al int 1 Number of intermediate training epochs after DFT is done.
margin float 0.002 Margin to decide if a model has improved over the previous training epoch.
max_MD_steps int np.inf Maximum number of steps taken using the MLFF during active learning per trajectory.
max_train_set_size int np.inf Maximum size of training set before procedure is stopped.
seeds_tags_dict dict or None None Optional mapping of seed indices to trajectory tags for reproducible runs.
skip_step_mlff int 25 Step interval for evaluating the uncertainty criterion during MD in active learning.
uncertainty_type str "max_atomic_sd" Method for estimating prediction uncertainty. Default is max force standard deviation (See Eq. 1 in the paper).
uncert_not_crossed_limit int 50000 Maximum number of consecutive steps without crossing the uncertainty threshold, after which a point is treated as if it had crossed the threshold. This is done in case the models are overly confident for a long time.
valid_ratio float 0.1 Fraction of data reserved for validation during active learning.
valid_skip int 1 Number of training steps between validation runs during active learning.
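
An illustrative ACTIVE_LEARNING block (a sketch; the species_dir path is a placeholder and the numeric values are examples, not recommendations):

ACTIVE_LEARNING:
  species_dir: "path/to/species_defaults/light"
  num_trajectories: 4
  ensemble_size: 4
  desired_acc: 0.043
  skip_step_mlff: 25
  max_train_set_size: 2000
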
Convergence

After the active learning is finished, aims-PAX converges the model(s) on the final dataset by default. The following keywords are only applied to this part.

Parameter Type Default Description
converge_al bool True Whether to converge the model(s) on the final training set at the end of active learning.
converge_best bool True Whether to only converge the best performing model of the ensemble.
convergence_patience int 50 Number of epochs without improvement before halting convergence.
max_convergence_epochs int 500 Maximum total epochs allowed before halting convergence.

CLUSTER:

Settings for PARSL.

Parameter Type Default Description
type str 'slurm' Cluster backend type. Currently only slurm is available.
*parsl_options dict Parsl configuration options.
     *nodes_per_block int Number of nodes per block.
     *init_blocks int Initial number of blocks to launch.
     *min_blocks int Minimum number of blocks allowed.
     *max_blocks int Maximum number of blocks allowed.
     *label str Unique label for this Parsl configuration. IMPORTANT: If you run multiple instances of aims-PAX on the same machine make sure that the labels are unique for each instance!
     run_dir str None Directory to store runtime files.
     function_dir str None Directory for Parsl function storage.
*slurm_str str (multiline) SLURM job script header specifying job resources and options.
*worker_str str (multiline) Shell commands to configure the environment for each worker process e.g. loading modules, activating conda environment. IMPORTANT: On most systems it's necessary to set the following environment variable so that multiple jobs don't interfere with each other: export WORK_QUEUE_DISABLE_SHARED_PORT=1.
*launch_str str Command to run FHI aims e.g. "srun path/to/aims/aims.XXX.scalapack.mpi.x >> aims.out"
*calc_dir str Path to the directory used for calculation outputs.
clean_dirs bool True Whether to remove calculation directories after DFT computations.
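
Putting this together, a CLUSTER block could look like the following sketch. The partition, account, and module names are placeholders that have to be adapted to your machine (compare the example folder):

CLUSTER:
  type: slurm
  parsl_options:
    nodes_per_block: 1
    init_blocks: 1
    min_blocks: 0
    max_blocks: 4
    label: "aimspax_run_1"
  slurm_str: |
    #SBATCH --partition=cpu
    #SBATCH --account=my_account
    #SBATCH --time=02:00:00
  worker_str: |
    module load intel
    conda activate my_env
    export WORK_QUEUE_DISABLE_SHARED_PORT=1
  launch_str: "srun path/to/aims/aims.XXX.scalapack.mpi.x >> aims.out"
  calc_dir: "./calcs"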

MD:

This part defines the settings for the molecular dynamics simulations during initial dataset generation and/or active learning. If only one set of settings is given, it is used for all systems/geometries. If you want to use different settings for different systems or geometries, you have to specify which settings each trajectory/system uses via their indices. In practice this means using a nested dictionary in the settings file:

MD:
  0:
    stat_ensemble: nvt
    thermostat: langevin
    temperature: 500
  1:    
    stat_ensemble: nvt
    thermostat: langevin
    temperature: 300

Here 0 and 1 refer to the indices used for giving the paths to the geometries and/or control files (see MISC settings below).

Currently these settings are used for ab initio and MLFF MD.

Parameter Type Default Description
*stat_ensemble str Statistical ensemble for molecular dynamics (e.g., NVT, NPT).
barostat str MTK Barostat used when NPT is chosen. MTK stands for the full Martyna-Tobias-Klein barostat.
friction float 0.001 Friction coefficient for Langevin dynamics (in fs⁻¹).
MD_seed int 42 Random number generator seed for Langevin dynamics.
pchain int 3 Number of thermostats in the barostat chain for MTK dynamics.
pdamp float 500 Pressure damping for MTK dynamics (1000*timestep).
ploop int 1 Number of loops for barostat integration in MTK dynamics.
pressure float 101325. Pressure used for NPT (in Pa; the default corresponds to 1 atm).
tchain int 3 Number of thermostats in the thermostat chain for MTK dynamics.
tdamp float 50 Temperature damping for MTK dynamics (100*timestep).
temperature float 300 Target temperature (in K).
thermostat str Langevin Thermostat used when NVT is chosen.
timestep float 0.5 Time step for molecular dynamics (in femtoseconds).
tloop int 1 Number of loops for thermostat integration in MTK dynamics.
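
For instance, an NPT run with the MTK barostat could be specified as follows (a sketch; the values simply mirror the defaults above, and the lowercase npt/mtk strings follow the nvt/langevin example from the introduction of this section):

MD:
  stat_ensemble: npt
  barostat: mtk
  temperature: 300
  pressure: 101325.
  timestep: 0.5
  tdamp: 50
  pdamp: 500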

MISC:

The source of the geometries can be either a single path, a folder (where all ASE readable files will be loaded) or a dictionary of paths like:

  path_to_geometry:
    0: "path/geo1.in"
    1: "path/geo2.in"

Similarly, the source of the control files can be either a single path or a dictionary of paths like:

  path_to_control:
    0: "path/control1.in"
    1: "path/control2.in"

Note: Ideally you don't want to train your model on different levels of theory or DFT settings. Using different control files is mostly intended for applying aims-PAX to periodic and non-periodic systems simultaneously. That way you can specify a k-grid for the periodic structure but not for the non-periodic one!
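
To make the index mapping concrete, here is a sketch of how the MISC and MD dictionaries line up (the paths are hypothetical; index 0 is a periodic structure, index 1 a molecule):

MISC:
  path_to_geometry:
    0: "path/periodic.in"
    1: "path/molecule.in"
  path_to_control:
    0: "path/control_periodic.in"
    1: "path/control_molecule.in"
MD:
  0:
    stat_ensemble: nvt
    thermostat: langevin
    temperature: 300
  1:
    stat_ensemble: nvt
    thermostat: langevin
    temperature: 500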

Parameter Type Default Description
create_restart bool True Whether to create restart files during the run.
dataset_dir str "./data" Directory where dataset files will be stored.
log_dir str "./logs" Directory where log files are saved.
path_to_control str "./control.in" Path to the FHI aims control input file.
path_to_geometry str "./geometry.in" Path to the geometry input file or folder.

MACE settings (mace.yaml)

The settings for the MACE model(s) are specified in a YAML file called mace.yaml. We use exactly the same names as employed in the MACE code. The following tables mostly serve to show the structure of the .yaml file and its default values.
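
A minimal mace.yaml could look like the following sketch (assuming a flat key-value layout; consult the examples folder for the exact file structure):

name_exp: "my_system"
seed: 42
device: "cuda"
default_dtype: "float32"
model: "ScaleShiftMACE"
num_channels: 128
num_interactions: 2
r_max: 5.0
batch_size: 5
lr: 0.01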

GENERAL

Parameter Type Default Description
name_exp str - This is the name given to the experiment and subsequently to the models and datasets.
checkpoints_dir str "./checkpoints" Directory path for storing model checkpoints.
compute_stress bool False Whether to compute stress tensors.
default_dtype str "float32" Default data type for model parameters (float32/float64).
loss_dir str "./losses" Directory path for storing training losses for each ensemble member.
model_dir str "./model" Directory path for storing final trained models.
seed int 42 Random seed (the seeds of the ensemble members are drawn using this seed).

ARCHITECTURE

Parameter Type Default Description
atomic_energies dict or None None Atomic energy references for each element, structured as {atomic_number: energy}. If None, atomic energies are determined from the training set via linear least squares.
compute_avg_num_neighbors bool True Whether to compute average number of neighbors.
correlation int 3 Correlation order for many-body interactions.
gate str "silu" Activation function.
interaction str "RealAgnosticResidualInteractionBlock" Type of interaction block.
interaction_first str "RealAgnosticResidualInteractionBlock" Type of first interaction block.
max_ell int 3 Maximum degree of direction embeddings.
max_L int 1 Maximum degree for equivariant features.
MLP_irreps str "16x0e" Irreps of the multi-layer perceptron in the last readout. Format is a str as defined in e3nn
model str "ScaleShiftMACE" Type of MACE model architecture to use.
num_channels int 128 Number of channels (features).
num_cutoff_basis int 5 Number of cutoff basis functions.
num_interactions int 2 Number of interaction layers.
num_radial_basis int 8 Number of radial basis functions.
r_max float 5.0 Cutoff radius (Å).
radial_MLP list [64, 64, 64] Architecture of the radial MLP (hidden layer sizes).
radial_type str "bessel" Type of radial basis functions.
scaling str "rms_forces_scaling" Scaling method used.

TRAINING

Parameter Type Default Description
amsgrad bool True Whether to use AMSGrad variant of Adam optimizer.
batch_size int 5 Batch size for training data.
clip_grad float 10.0 Gradient clipping threshold.
config_type_weights dict {"Default": 1.0} Weights for different configuration types.
ema bool True Whether to use Exponential Moving Average.
ema_decay float 0.99 Decay factor for exponential moving average.
energy_weight float 1.0 Weight for energy loss component.
forces_weight float 1000.0 Weight for forces loss component.
loss str "weighted" Loss function type.
lr float 0.01 Initial learning rate for optimizer.
lr_factor float 0.8 Factor by which learning rate is reduced.
lr_scheduler_gamma float 0.9993 Learning rate decay factor for scheduler.
optimizer str "adam" Optimizer type (adam/adamw).
scheduler str "ReduceLROnPlateau" Learning rate scheduler type.
scheduler_patience int 5 Number of epochs to wait before reducing LR.
stress_weight float 1.0 Weight for stress loss component.
swa bool False Whether to use Stochastic Weight Averaging.
valid_batch_size int 5 Batch size for validation data.
virials_weight float 1.0 Weight for virials loss component.
weight_decay float 5.e-07 L2 regularization weight decay factor.

MISC

Parameter Type Default Description
device str "cpu" Device for training (cpu/cuda).
error_table str "PerAtomMAE" Type of error metrics to compute and display.
log_level str "INFO" Logging level (DEBUG/INFO/WARNING/ERROR).

The workflow


Please consult the publication for a description of the aims-PAX workflow.

References

If you are using aims-PAX, please cite the main publication:

aims-PAX:

@misc{henkes2025aimspax,
  title = {aims-PAX: Parallel Active eXploration for the automated construction of Machine Learning Force Fields},
  author = {Henkes, Tobias and Sharma, Shubham and Tkatchenko, Alexandre and Rossi, Mariana and Poltavskyi, Igor},
  year = {2025},
  eprint = {2508.12888},
  archivePrefix = {arXiv},
  primaryClass = {physics.chem-ph},
  doi = {10.48550/arXiv.2508.12888},
  url = {https://arxiv.org/abs/2508.12888}
}

FHI-aims:

@misc{aims-roadmap-2025,
      title={Roadmap on Advancements of the FHI-aims Software Package}, 
      author={FHI aims community},
      year={2025},
      eprint={2505.00125},
      archivePrefix={arXiv},
      primaryClass={cond-mat.mtrl-sci},
      url={https://arxiv.org/abs/2505.00125}, 
}

MACE:

@inproceedings{Batatia2022mace,
  title={{MACE}: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields},
  author={Ilyes Batatia and David Peter Kovacs and Gregor N. C. Simm and Christoph Ortner and Gabor Csanyi},
  booktitle={Advances in Neural Information Processing Systems},
  editor={Alice H. Oh and Alekh Agarwal and Danielle Belgrave and Kyunghyun Cho},
  year={2022},
  url={https://openreview.net/forum?id=YPpSngE-ZU}
}

@misc{Batatia2022Design,
  title = {The Design Space of E(3)-Equivariant Atom-Centered Interatomic Potentials},
  author = {Batatia, Ilyes and Batzner, Simon and Kov{\'a}cs, D{\'a}vid P{\'e}ter and Musaelian, Albert and Simm, Gregor N. C. and Drautz, Ralf and Ortner, Christoph and Kozinsky, Boris and Cs{\'a}nyi, G{\'a}bor},
  year = {2022},
  number = {arXiv:2205.06643},
  eprint = {2205.06643},
  eprinttype = {arxiv},
  doi = {10.48550/arXiv.2205.06643},
  archiveprefix = {arXiv}
 }

So3krates/SO3LR:

If you are using SO3LR, please cite:

@article{kabylda2024molecular,
  title={Molecular Simulations with a Pretrained Neural Network and Universal Pairwise Force Fields},
  author={Kabylda, A. and Frank, J. T. and Dou, S. S. and Khabibrakhmanov, A. and Sandonas, L. M.
          and Unke, O. T. and Chmiela, S. and M{\"u}ller, K.R. and Tkatchenko, A.},
  journal={ChemRxiv},
  year={2024},
  doi={10.26434/chemrxiv-2024-bdfr0-v2}
}

@article{frank2024euclidean,
  title={A Euclidean transformer for fast and stable machine learned force fields},
  author={Frank, Thorben and Unke, Oliver and M{\"u}ller, Klaus-Robert and Chmiela, Stefan},
  journal={Nature Communications},
  volume={15},
  number={1},
  pages={6539},
  year={2024}
}

Contact

If you have questions, you can reach us at [email protected] or [email protected].

For bugs or feature requests, please use GitHub Issues.

License

The aims-PAX code is published and distributed under the MIT License.
