A reinforcement learning framework in latent diffusion models for crystal structure generation using group relative policy optimization.
Chemeleon2 implements a three-stage pipeline for crystal structure generation:
- VAE Module: Encodes crystal structures into latent space representations
- LDM Module: Samples crystal structures in latent space using a diffusion Transformer
- RL Module: Fine-tunes the LDM with reward functions
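The RL stage is described as using group relative policy optimization (GRPO). As a rough illustration of the core idea only, the sketch below normalizes rewards within one group of samples, which is how GRPO is commonly formulated. It is not taken from the Chemeleon2 code base: the function name `group_relative_advantages`, the `eps` parameter, and the reward values are all hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Return (r - group mean) / (group std + eps) for each reward in one group.

    Hypothetical helper for illustration; not the Chemeleon2 implementation.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four structures sampled in the same group.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.2f} advantage={advantage:+.3f}")
```

Samples that score above the group mean receive a positive advantage and those below receive a negative one, so no separate value network is needed to form a baseline.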
# Clone the repository
git clone https://github.com/hspark1212/chemeleon2
cd chemeleon2
# Install dependencies with uv
uv sync
Tip: uv sync installs dependencies based on the uv.lock file, ensuring reproducible environments. If you encounter issues with uv.lock (e.g., lock file conflicts or compatibility problems), you can use uv pip install -e . as an alternative to install the package in editable mode directly from pyproject.toml.
# (Optional) Install development dependencies (pytest, ruff, pyright, etc.)
uv sync --extra dev
# (Optional) Install metrics dependencies for evaluation (mace-torch, smact)
uv sync --extra metrics
After completing uv sync, install a PyTorch version compatible with your CUDA environment to prevent compatibility issues.
For version-specific installation commands, visit the PyTorch official website.
# (Optional) Example command for PyTorch 2.7.0 with CUDA 12.8
uv pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu128
For a simple walkthrough of sampling and evaluation, see tutorial.ipynb.
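After installing PyTorch, a quick check can confirm that the installed build matches your CUDA environment. The following is a minimal sketch, not part of the repository; the helper name `describe_torch` is hypothetical, and it only relies on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib.util


def describe_torch():
    """Report the installed PyTorch build, or note that PyTorch is missing.

    Hypothetical helper for illustration; not part of Chemeleon2.
    """
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch  # imported lazily so the check also works without PyTorch

    return {
        "installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


print(describe_torch())
```

If `built_for_cuda` is `None` or `cuda_available` is `False` on a GPU machine, reinstalling PyTorch from the index URL that matches your CUDA version is the usual next step.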
Chemeleon2 uses a three-stage training pipeline: VAE → LDM → RL.
For detailed instructions, see:
- Training Guide - VAE, LDM, RL, and predictor training
- Evaluation Guide - Sampling and model evaluation/metrics
To benchmark de novo generation (DNG), 10,000 sampled structures are available in the benchmarks/dng/ directory:
- MP-20: chemeleon2_rl_dng_mp_20.json.gz (10,000 generated structures using the RL-trained model on MP-20)
- Alex-MP-20: chemeleon2_rl_dng_alex_mp_20.json.gz (10,000 generated structures using the RL-trained model on Alex-MP-20)
The compressed JSON files can be loaded with loadfn from monty.serialization (from monty.serialization import loadfn).
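Loading a benchmark file might look like the sketch below. Only `loadfn` from `monty.serialization` comes from the text above. The helper names `load_structures` and `load_raw` are hypothetical, and the assumption that each file holds a top-level list of entries is not confirmed here. The stdlib-only `load_raw` is an added convenience for inspecting the plain JSON without monty; it does not decode serialized objects.

```python
import gzip
import json
from pathlib import Path


def load_structures(path):
    """Load a .json.gz file with monty, which decodes serialized objects."""
    from monty.serialization import loadfn  # requires monty to be installed

    return loadfn(str(path))


def load_raw(path):
    """Stdlib-only alternative: return the plain JSON content, undecoded."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    path = Path("benchmarks/dng/chemeleon2_rl_dng_mp_20.json.gz")
    if path.exists():
        structures = load_structures(path)
        # Assumes the top-level object supports len(), e.g. a list.
        print(f"Loaded {len(structures)} entries from {path}")
```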
We welcome contributions! Please see CONTRIBUTING.md for detailed setup instructions, development workflow, and guidelines.
@article{Park2025chemeleon2,
title={Guiding Generative Models to Uncover Diverse and Novel Crystals via Reinforcement Learning},
author={Hyunsoo Park and Aron Walsh},
year={2025},
url={https://arxiv.org/abs/2511.07158}
}
This work is inspired by the following projects:
Chemeleon2 is licensed under the MIT License. See LICENSE for more information.

