Code associated with the paper *A Learning Framework for Atomic-level Polymer Structure Generation*.
This library contains code for atomic-level structure generation of polymers. The repo consists of two main models and various iterations of them.
- polyGen: a polymer structure generation framework based on a latent diffusion model.
  - An autoencoder learns a joint latent space between molecular DFT-optimized structures and polymer DFT-optimized structures.
  - The diffusion model works within this latent space to generate new structures from noise, given initial conditioning based on a molecular graph.
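The sampling loop described above can be sketched as a toy numpy example. Everything here is illustrative: the noise schedule, the stand-in noise predictor, and the conditioning vector are placeholders standing in for the trained networks, not the repo's actual model or API.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, T = 8, 50

# Standard DDPM bookkeeping: linear beta schedule and cumulative alphas.
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def denoise_eps(z_t, t, cond):
    """Stand-in for a learned noise predictor eps_theta(z_t, t, cond).
    In the real model this would be a transformer conditioned on a
    molecular-graph embedding; here it just pulls the latent toward
    the conditioning vector."""
    return (z_t - cond) * np.sqrt(1.0 - alpha_bar[t])

def sample_latent(cond):
    """Ancestral diffusion sampling in the latent space, starting from noise."""
    z = rng.standard_normal(latent_dim)
    for t in range(T - 1, -1, -1):
        eps = denoise_eps(z, t, cond)
        # Posterior mean of z_{t-1} given the predicted noise.
        z = (z - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            z = z + np.sqrt(betas[t]) * rng.standard_normal(latent_dim)
    return z

cond = np.ones(latent_dim)   # hypothetical graph-embedding conditioning
z0 = sample_latent(cond)
print(z0.shape)              # → (8,)
```

In the real pipeline the sampled latent would then be passed through the autoencoder's decoder to recover an atomic structure; here the loop only shows the denoising schedule.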
The recommended way to set up the environment is via the provided environment.yml file, which ensures all dependencies (both conda and pip) are installed correctly.
```shell
# create and activate the conda environment
conda env create -f environment.yml
conda activate Poly-DiT
```

This will install all required packages, including those from both conda and pip.
- If you prefer a pip-only install, you may use requirements.txt, but note that it may not include all dependencies or the correct versions for all platforms.
- You might need to install a package manually if the initial runs raise errors.
Paper Results
All the evaluations from the paper, for the model versions mentioned, can be found in the data_analysis folder.
Evaluating models:
Our codebase is built upon the lightning-hydra template, whose README provides a general overview of usage.
Before running any scripts, ensure you have activated the conda environment:
```shell
conda activate Poly-DiT
```

Project structure:

├── configs <- Hydra configs
│ ├── autoencoder_module <- VAE LitModule configs
│ ├── callbacks <- Callbacks configs
│ ├── data <- Data configs
│ ├── encoder <- VAE encoder configs
│ ├── decoder <- VAE decoder configs
│ ├── diffusion_module <- Latent diffusion/DiT LitModule configs
│ ├── extras <- Extra utilities configs
│ ├── hydra <- Hydra configs
│ ├── logger <- Logger configs (for W&B)
│ ├── paths <- Project paths configs
│ ├── trainer <- Trainer configs
│ │
│ ├── train_autoencoder.yaml <- Main config for training VAEs
│ ├── train_diffusion.yaml <- Main config for training DiTs
│ ├── eval_autoencoder.yaml <- Main config for evaluating trained VAEs
│ └── eval_diffusion.yaml <- Main config for sampling from/evaluating trained DiTs
|
├── src <- Source code (same directory structure as configs)
|
├── slurm <- Example slurm scripts for launching training runs and experiments
|
├── data <- Datasets directory
|
├── data_analysis <- Scripts for analyzing results and generating plots; contains output plots and analysis scripts
|
├── logs <- Model checkpoints and raw generated files from training/evaluation runs
|
├── .gitignore <- List of files ignored by git
├── .pre-commit-config.yaml <- Configuration of pre-commit hooks for code formatting
├── .project-root <- File for inferring the position of project root directory
├── CODE_OF_CONDUCT.md <- Code of conduct to define community standards
├── pyproject.toml <- Configuration options for testing and linting
├── requirements.txt <- File for installing python dependencies (pip only; may be incomplete)
├── environment.yml <- Full environment specification (conda + pip)
|
└── README.md
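As context for the analysis scripts in data_analysis: a common way to compare a generated atomic structure against a DFT-optimized reference is the RMSD after optimal rigid alignment (the Kabsch algorithm). This is a standard structure-comparison metric, not necessarily the exact one used in the paper, and the function below is a hypothetical helper, not part of this repo.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate arrays after optimal rigid
    alignment. Both point sets are centered, then the rotation that
    minimizes the RMSD is found via SVD (Kabsch algorithm)."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                              # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                       # optimal rotation
    P_rot = P @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Q) ** 2, axis=1))))
```

Because both inputs are centered and optimally rotated, the metric is invariant to rigid-body motion, which matters when generated structures are produced in an arbitrary frame.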
Portions of the code are adapted from All-Atom_Diffusion_Transformer and the lightning-hydra template.
If you would like to use this study's dataset for academic research, please contact ramprasad-group-github@mse.gatech.edu.