
mxhulab/cryopros


CryoPROS: Correcting Misalignment Caused by Preferred Orientation Using AI-generated Auxiliary Particles.

CryoPROS is a computational framework designed to correct misalignment errors caused by preferred orientation in single-particle cryo-EM. It uses a self-supervised deep generative model to synthesize auxiliary particles, then co-refines these synthesized particles together with the experimental data, which effectively eliminates the misalignment errors.

Video Tutorial

[TBD]

Preprint

For more details, please refer to the preprint "Addressing preferred orientation in single-particle cryo-EM through AI-generated auxiliary particles".

The List of Available Demo Cases

| Dataset | Case study | Expected result |
|---|---|---|
| untitled HA-trimer (EMPIAR-10096) | link | 3.49Å model-to-map resolution density map |

Installation

CryoPROS is free software developed in Python and is available as a Python package. You can access its distributions on GitHub.

Prerequisites

  • Python version 3.10 or 3.12.
  • NVIDIA CUDA library 9.2 or later installed in the user's environment.

Dependencies

  • torch
  • torchvision
  • mrcfile>=1.3
  • scipy>=1.6.2
  • tqdm>=4.59
  • numpy>=1.21.5
  • pandas>=1.3.2
  • opencv-python
  • matplotlib

Preparation of CUDA Environment

Creating and Activating a Conda Virtual Environment

First, create a Conda virtual environment named cryopros with Python 3.10 by running the following command:

conda create -n cryopros python==3.10

After creating the environment, activate it using:

conda activate cryopros

Installing CryoPROS

Clone the repository and install CryoPROS with pip using the following commands:

git clone https://github.com/mxhulab/cryopros.git
cd cryoPROS_source_code
pip install .

Verifying Installation

You can verify whether cryoPROS has been installed successfully by running the following command:

cryopros-generate -h

This should display the help information for cryoPROS, indicating a successful installation.

Tutorial

Workflow Diagram of CryoPROS


CryoPROS is composed of two primary modules: the generative module and the co-refinement module, and includes an optional sub-module for heterogeneous reconstruction.

Four Executable Binaries Included in CryoPROS

CryoPROS consists of four executable binaries, listed in the following table:

| Binary name | Category | Purpose | Options/arguments |
|---|---|---|---|
| cryopros-train | core | Training a conditional VAE deep neural network model from an input initial volume and raw particles with given imaging parameters. | see Options/Arguments |
| cryopros-generate | core | Generating an auxiliary particle stack from a pre-trained conditional VAE deep neural network model. | see Options/Arguments |
| cryopros-gen-mask | utility | Generating a volume mask for a given input volume and corresponding threshold. | see Options/Arguments |
| cryopros-recondismic | optional | Reconstructing the micelle/nanodisc density map from an input initial volume, a mask volume and raw particles with given imaging parameters. | see Options/Arguments |

Integrating CryoPROS's Executable Binaries with Cryo-EM Software to Address Preferred Orientation Challenges

Using cryoPROS to address preferred orientation in single-particle cryo-EM involves combining its submodules with other cryo-EM software, such as Relion, CryoSPARC, EMReady and cryoDRGN. The integration is user-defined and can be tailored to each dataset. The case study below demonstrates the effectiveness of cryoPROS.

Case Study: Achieving 3.49Å Resolution for an Untitled HA-Trimer (EMPIAR-10096)

CryoPROS facilitates the recovery of near-atomic-resolution details from the untitled HA-trimer dataset.

  • a, The top row shows the pose distribution obtained with the tilt strategy (130,000 particles), while the middle (side view) and bottom (top view) rows show the density maps reconstructed from the tilt-collected dataset: autorefinement (pink) and the state-of-the-art result (violet). Notably, the state-of-the-art result requires intricate per-particle refinements afterwards, including multi-round 3D classification, defocus refinement and Bayesian polishing.
  • b, Similar to a, with the first row showing the pose distributions of the untilted raw particle stack (130,000 particles), the generated particles (130,000 particles) and the subset selected for local refinement (31,146 particles). The density maps reconstructed from the untilted dataset are shown for autorefinement (yellow), cryoPROS (cyan) and cryoPROS with follow-up local refinement (magenta). Maps in a and b are superimposed on the envelope of the HA-trimer atomic model (PDB ID: 3WHE, grey).
  • c, Close-ups of selected parts of the density maps shown in a and b, in the same order. The first and second rows show alpha-helix and beta-sheet regions, respectively, with low transparency. The third and fourth rows show the selected regions in grey mesh style, with the embedded atomic model colored by average Q-score.sc value and average Q-score.bb value, respectively.

Step 1: Download Untitled HA-Trimer Dataset (EMPIAR-10096)

Download EMPIAR-10096 (~32GB). You can download it directly from the command line:

wget -nH -m ftp://ftp.ebi.ac.uk/empiar/world_availability/10096/data/Particle-Stack/

This dataset contains 130,000 extracted particles with a box size of 256 pixels and a pixel size of 1.31 Å/pixel.
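As a quick sanity check, the physical box edge and the Nyquist limit follow directly from these two numbers. The snippet below is a plain Python illustration, not part of cryoPROS. Note that the Nyquist limit (2 × 1.31 = 2.62 Å) coincides with the resolution value passed to molmap in Step 3.

```python
# Derived quantities for the EMPIAR-10096 particle stack,
# using the box size and pixel size stated above.
BOX = 256     # pixels per side
APIX = 1.31   # Angstrom per pixel

box_extent = BOX * APIX   # physical edge length of a particle box (Angstrom)
nyquist = 2 * APIX        # finest resolution representable at this sampling (Angstrom)

print(f"box extent: {box_extent:.2f} A")  # 335.36 A
print(f"Nyquist:    {nyquist:.2f} A")     # 2.62 A
```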


The CTF parameters for each particle are in the metadata file T00_HA_130K-Equalized_run-data.star.

Step 2: Ab-initio auto-refinement

Perform ab-initio auto-refinement:

  • Import the downloaded data into Relion and run the 3D initial model job.
  • Import the raw particles and the initial volume obtained by Relion into CryoSPARC, then run a Non-uniform Refinement job on the raw particles with C3 symmetry.

The expected outcome of the process described above is a density map accompanied by a pose metafile:

[Figure: cryoSPARC job J379]

This pose metafile must be converted to the STAR format for downstream training. The conversion can be done with csparc2star.py from the pyem package:

python csparc2star.py cryosparc_P68_J379_005_particles.cs autorefinement.star

The expected result, autorefinement.star, which includes the estimated pose parameters, can be downloaded from this link.
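Before training, it can help to check that autorefinement.star has the expected columns (for example rlnImageName and the CTF defocus values). Below is a minimal, dependency-free sketch of a reader for the data loop of a STAR file. It is not part of cryoPROS or pyem: read_star_loop is a hypothetical helper, and it only handles plain loop_ blocks without multi-line values.

```python
def read_star_loop(text, required_label="rlnImageName"):
    """Return the rows of the first loop_ whose labels include
    `required_label`, as a list of {label: value} dicts (values as strings)."""
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        if lines[i].strip() != "loop_":
            i += 1
            continue
        i += 1
        # Column labels: lines such as "_rlnImageName #1".
        labels = []
        while i < len(lines) and lines[i].strip().startswith("_"):
            labels.append(lines[i].split()[0].lstrip("_"))
            i += 1
        # Data rows: stop at a blank line, a new data block, or a new loop.
        rows = []
        while i < len(lines):
            stripped = lines[i].strip()
            if not stripped or stripped.startswith(("data_", "loop_", "_")):
                break
            if not stripped.startswith("#"):
                rows.append(dict(zip(labels, stripped.split())))
            i += 1
        if required_label in labels:
            return rows
    raise ValueError(f"no loop containing _{required_label} found")
```

For example, `read_star_loop(open("autorefinement.star").read())[0].keys()` lists the columns of the particle table. The search by label lets the same function pick the optics table instead when the file contains one, e.g. with `required_label="rlnImagePixelSize"`.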

Step 3: Generate the initial latent volume

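Generate the reference volume from the 6IDD atomic model: simulate its density at 2.62 Å on the grid of the auto-refined map from Step 2, fit it into that map, resample it onto the same grid and save it as 6idd_align.mrc. In the commands below, the auto-refined map and the atomic model are opened as #0 and #1 in Chimera (#1 and #2 in ChimeraX).

  • In the Chimera command line:
molmap #1 2.62 onGrid #0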
fit #2 in #0
vop resample #2 onGrid #0
save #2 6idd_align.mrc
  • Or in the ChimeraX command line:
molmap #2 2.62 onGrid #1
fitmap #3 inMap #1 
vop resample #3 onGrid #1
save 6idd_align.mrc #3

Finally, use Relion to low-pass filter the aligned volume (6idd_align.mrc) to 10 Å. This produces the initial latent volume 6idd_align_lp10.mrc required for the first training iteration of cryoPROS:

relion_image_handler --i 6idd_align.mrc --o 6idd_align_lp10.mrc --lowpass 10

The expected result, 6idd_align_lp10.mrc, can be downloaded from this link.
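To illustrate what the low-pass step does, the sketch below applies a hard spherical cutoff in Fourier space with NumPy. It is an illustration only: the function lowpass is hypothetical, and it does not reproduce the exact filter shape that relion_image_handler uses for --lowpass.

```python
import numpy as np

def lowpass(volume, apix, resolution):
    """Hard spherical low-pass filter: zero every Fourier component whose
    spatial frequency exceeds 1/resolution (in 1/Angstrom)."""
    # Spatial frequency (1/Angstrom) along each axis; `d` is the voxel size.
    freqs = [np.fft.fftfreq(n, d=apix) for n in volume.shape]
    k0, k1, k2 = np.meshgrid(*freqs, indexing="ij")
    radius = np.sqrt(k0**2 + k1**2 + k2**2)
    spectrum = np.fft.fftn(volume)
    spectrum[radius > 1.0 / resolution] = 0
    # The mask is symmetric under k -> -k, so the inverse transform of a real
    # volume stays real up to rounding error; keep only the real part.
    return np.real(np.fft.ifftn(spectrum))

# Example: filter a random 32^3 test volume sampled at 1.31 A/pixel to 10 A.
rng = np.random.default_rng(0)
vol = rng.standard_normal((32, 32, 32))
filtered = lowpass(vol, apix=1.31, resolution=10.0)
```

A hard cutoff like this can introduce ringing in real space, which is one reason practical tools soften the filter edge.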

Step 4: Iteration 1: Train the neural network in the generative module

The raw particles (T00_HA_130K-Equalized-Particle-Stack.mrcs) and their refined poses (autorefinement.star) are used to train the neural network in the generative module, starting from the initial latent volume 6idd_align_lp10.mrc. Run:

cryopros-train \
--gpu_ids 0 1 2 3 \
--task_name HAtrimer_iteration_1 \
--box_size 256 \
--Apix 1.31 \
--volume_scale 50 \
--init_volume_path 6idd_align_lp10.mrc \
--data_path T00_HA_130K-Equalized-Particle-Stack.mrcs \
--param_path autorefinement.star \
--invert \
--dataloader_batch_size 8 \
--dataloader_num_workers 0

The setting above trains on four GPUs. Adjust the --gpu_ids option to match your computing environment.
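Because the same training command is rerun with different GPUs and inputs (see Step 8), it can be convenient to build it programmatically. The sketch below only reproduces flags that appear in this tutorial; train_command is a hypothetical helper and not part of cryoPROS.

```python
import shlex

def train_command(task_name, init_volume, data_path, param_path,
                  gpu_ids=(0, 1, 2, 3), box_size=256, apix=1.31):
    """Assemble a cryopros-train argument list with the settings used in
    this tutorial; only --gpu_ids and the inputs vary between runs."""
    return ["cryopros-train",
            "--gpu_ids", *[str(g) for g in gpu_ids],
            "--task_name", task_name,
            "--box_size", str(box_size),
            "--Apix", str(apix),
            "--volume_scale", "50",
            "--init_volume_path", init_volume,
            "--data_path", data_path,
            "--param_path", param_path,
            "--invert",
            "--dataloader_batch_size", "8",
            "--dataloader_num_workers", "0"]

# Iteration 1 on a two-GPU machine (the GPU IDs are only an example).
cmd = train_command("HAtrimer_iteration_1", "6idd_align_lp10.mrc",
                    "T00_HA_130K-Equalized-Particle-Stack.mrcs",
                    "autorefinement.star", gpu_ids=(0, 1))
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` launches training without shell quoting issues.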

Upon completion of the above command:

  • A directory named ./generate/HAtrimer_iteration_1 will be created.
  • The training log will be stored at ./generate/HAtrimer_iteration_1/train.log.
  • The trained neural networks will be saved under ./generate/HAtrimer_iteration_1/models/.

Monitor the KL loss in the training log. If it stays very low (e.g., ~1e-9), training has most likely run into posterior collapse, a common failure mode; in that case, restart the training.
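The collapse check can be automated once the KL values have been extracted from train.log. The log format is not documented here, so extraction is left to the reader; the helper below, looks_collapsed, and its threshold and window defaults are hypothetical choices, not cryoPROS settings.

```python
def looks_collapsed(kl_values, threshold=1e-6, window=100):
    """Heuristic posterior-collapse check: True if each of the last
    `window` KL-loss values is below `threshold` in magnitude."""
    if len(kl_values) < window:
        return False  # not enough history yet to judge
    return all(abs(v) < threshold for v in kl_values[-window:])
```

Checking a window rather than a single value avoids restarting because of one transiently small KL term.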

The expected trained network, HAtrimer_iteration_1.pth (the latest.pth file under ./generate/HAtrimer_iteration_1/models/, renamed), can be downloaded from this link.

Step 5: Iteration 1: Generate auxiliary particles with the trained neural network

Generate the auxiliary particles with the network trained in the previous step, here on GPU 0:

export CUDA_VISIBLE_DEVICES=0
cryopros-generate \
--model_path HAtrimer_iteration_1.pth \
--param_path autorefinement.star \
--output_path generated_HAtrimer_iteration_1 \
--gen_name HAtrimer_iteration_1_generated_particles \
--batch_size 50 \
--box_size 256 \
--Apix 1.31 \
--invert \
--gen_mode 2

Only the CTF parameters are taken from autorefinement.star; the original poses in that file are replaced by uniformly distributed poses.
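One standard way to draw uniformly distributed poses is sketched below; cryoPROS's own sampler may differ. With ZYZ Euler angles (rot, tilt, psi), as used by Relion, rot and psi are drawn uniformly, while tilt must be drawn so that cos(tilt) is uniform. Sampling tilt uniformly instead would over-represent top views. uniform_poses is a hypothetical helper.

```python
import math
import random

def uniform_poses(n, seed=None):
    """Sample n orientations uniformly over SO(3) as ZYZ Euler angles
    (rot, tilt, psi) in degrees."""
    rng = random.Random(seed)
    poses = []
    for _ in range(n):
        rot = rng.uniform(-180.0, 180.0)
        psi = rng.uniform(-180.0, 180.0)
        # Uniform on SO(3) requires cos(tilt), not tilt itself, to be uniform.
        tilt = math.degrees(math.acos(rng.uniform(-1.0, 1.0)))
        poses.append((rot, tilt, psi))
    return poses
```

In a Relion STAR file, the three angles would go into the rlnAngleRot, rlnAngleTilt and rlnAnglePsi columns.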

Generated auxiliary particles are saved in ./generated_HAtrimer_iteration_1/HAtrimer_iteration_1_generated_particles.mrcs with the corresponding star file in ./generated_HAtrimer_iteration_1/HAtrimer_iteration_1_generated_particles.star.

Step 6: Iteration 1: Co-refinement using a combination of raw particles and synthesized auxiliary particles

Perform Non-uniform Refinement in cryoSPARC using a combination of raw particles (T00_HA_130K-Equalized-Particle-Stack.mrcs) and synthesized auxiliary particles (HAtrimer_iteration_1_generated_particles.mrcs). The parameter settings for this process are:

Step 7: Iteration 1: Reconstruction-only with raw particles and their poses estimated in the co-refinement step

After completing the co-refinement, use the Particle Sets Tool in cryoSPARC to separate the raw particles from the combination of raw and synthesized auxiliary particles. The parameter settings for this process are:

  • Particles (A): raw and auxiliary particles in Step 6.
  • Particles (B): raw particles.
  • Action: intersect.
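Conceptually, the intersect action keeps the particles of set A that also belong to set B, carrying over A's co-refined poses. The sketch below shows that logic on particle tables stored as lists of dicts keyed by STAR labels. cryoSPARC itself matches particles by its internal particle UIDs; using rlnImageName as the key here is an assumption for illustration, and intersect_particles is a hypothetical helper.

```python
def intersect_particles(set_a, set_b, key="rlnImageName"):
    """Keep the rows of set_a (with their co-refined poses) whose `key`
    also occurs in set_b, preserving the order of set_a."""
    keys_b = {row[key] for row in set_b}
    return [row for row in set_a if row[key] in keys_b]
```

Using a set for the lookup keeps the intersection linear in the number of particles, which matters for stacks of 260,000 combined particles.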

[Optional] Run 2D classification on the raw particles and manually select a subset with fewer top views (62,952 particles).
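To quantify how top-view-heavy a particle set is, one can count the particles whose tilt angle lies near 0° or 180°. This assumes the C3 symmetry axis is aligned with z, as is conventional when Cn symmetry is imposed. The helper top_view_fraction and its 20° tolerance are hypothetical choices, not part of the cryoPROS workflow.

```python
def top_view_fraction(tilts_deg, max_offset=20.0):
    """Fraction of particles whose tilt is within `max_offset` degrees of
    0 or 180, i.e. views roughly along the symmetry (z) axis."""
    top = sum(1 for t in tilts_deg if t <= max_offset or t >= 180.0 - max_offset)
    return top / len(tilts_deg)
```

Comparing this fraction before and after selection gives a simple number to go with the visual inspection of 2D classes.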

[Figure: cryoSPARC job J2581]

Then, run a Homogeneous Reconstruction Only job on this raw-particle subset. The expected density map (cryosparc_P68_J2581_volume_map_sharp.mrc) can be downloaded from this link.

Next, export the poses of the raw particles as a star file (2581.star) by exporting the cryoSPARC job and using the csparc2star.py script from the pyem package.

The expected result (2581.star) can be downloaded from this link.

Finally, use Relion to create the subset stack (raw_iter_2.mrcs) with this command:

relion_stack_create --i 2581.star --o raw_iter_2

Note that 2581.star refers to the raw particles by their file paths, so it must be placed where those paths resolve to the raw particle stacks. Here, put it in the directory where the cryoSPARC project is stored.
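A quick way to catch misplaced STAR files is to check that every stack referenced in rlnImageName (entries of the form index@path) exists relative to the chosen base directory. This sketch assumes relative paths are resolved against that directory (here, the cryoSPARC project directory where 2581.star is placed); missing_stacks is a hypothetical helper.

```python
import os

def missing_stacks(image_names, base_dir):
    """Return the stack paths referenced by rlnImageName entries
    ('<index>@<path>') that do not exist relative to base_dir."""
    missing = set()
    for name in image_names:
        path = name.split("@", 1)[-1]  # drop the '<index>@' prefix if present
        full = path if os.path.isabs(path) else os.path.join(base_dir, path)
        if not os.path.exists(full):
            missing.add(path)
    return missing
```

An empty result means relion_stack_create should find all the raw particle stacks from that directory.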

Step 8: Iteration 2: Train the neural network in the generative module

The training process follows the approach outlined in Step 4.

cryopros-train \
--gpu_ids 0 1 2 3 \
--task_name HAtrimer_iteration_2 \
--box_size 256 \
--Apix 1.31 \
--volume_scale 50 \
--init_volume_path cryosparc_P68_J2581_volume_map_sharp.mrc \
--data_path raw_iter_2.mrcs \
--param_path 2581.star \
--invert \
--dataloader_batch_size 8 \
--dataloader_num_workers 0

Upon completion of the above command:

  • A directory named ./generate/HAtrimer_iteration_2 will be created.
  • The training log will be stored at ./generate/HAtrimer_iteration_2/train.log.
  • The trained neural networks will be saved under ./generate/HAtrimer_iteration_2/models/.

Monitor the KL loss in the training log. If it stays very low (e.g., ~1e-9), training has most likely run into posterior collapse, a common failure mode; in that case, restart the training.

The expected trained network, HAtrimer_iteration_2.pth (the latest.pth file under ./generate/HAtrimer_iteration_2/models/, renamed), can be downloaded from this link.

Step 9: Iteration 2: Generate auxiliary particles with the trained neural network

The generating process follows the approach outlined in Step 5.

cryopros-generate \
--model_path HAtrimer_iteration_2.pth \
--param_path autorefinement.star \
--output_path generated_HAtrimer_iteration_2/ \
--gen_name HAtrimer_iteration_2_generated_particles \
--batch_size 50 \
--box_size 256 \
--Apix 1.31 \
--invert \
--gen_mode 2

Only the CTF parameters are taken from autorefinement.star; the original poses in that file are replaced by uniformly distributed poses.

Generated auxiliary particles are saved in ./generated_HAtrimer_iteration_2/HAtrimer_iteration_2_generated_particles.mrcs with the corresponding star file in ./generated_HAtrimer_iteration_2/HAtrimer_iteration_2_generated_particles.star.

Step 10: Iteration 2: Co-refinement using a combination of raw particles and synthesized auxiliary particles

This co-refinement is identical to Step 6, except that it uses the auxiliary particle stack HAtrimer_iteration_2_generated_particles.mrcs instead of HAtrimer_iteration_1_generated_particles.mrcs.

Step 11: Iteration 2: Reconstruction-only with raw particles and their poses estimated in the co-refinement step

This step mirrors Step 7, except that the optional 2D classification and subset selection are skipped.

The expected result, 2596.star, which contains the raw-particle poses from the second co-refinement iteration, can be downloaded from this link.

The expected density map (cryosparc_P68_J2599_volume_map_sharp.mrc) can be downloaded from this link.

Step 12: Post-processing by EMReady

Install EMReady by following the instructions provided.

Next, post-process the density map obtained in the previous step with the following commands:

EMReady.sh cryosparc_P68_J2599_volume_map_sharp.mrc 2599_refined.mrc
relion_image_handler --i 2599_refined.mrc --o 2599_refined.mrc --new_box 256 --rescale_angpix 1.31

The expected processed density map (2599_refined.mrc) can be downloaded from this link.
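The relion_image_handler call above resets the box size to 256 and the pixel size to 1.31 Å/pixel, presumably because EMReady writes its output on a different grid. To confirm the geometry, the relevant fields can be read directly from the MRC2014 header. The sketch below uses only the standard library and assumes a little-endian file; header_geometry and mrc_geometry are hypothetical helpers, and the mrcfile package listed in the dependencies exposes the same information (e.g. its voxel_size attribute).

```python
import struct

def header_geometry(header):
    """Box dimensions and voxel size (Angstrom) from the first 1024 bytes of
    an MRC2014 file. Assumes little-endian byte order, the common case."""
    nx, ny, nz = struct.unpack_from("<3i", header, 0)   # NX, NY, NZ: grid dimensions
    mx, my, mz = struct.unpack_from("<3i", header, 28)  # MX, MY, MZ: sampling per axis
    cx, cy, cz = struct.unpack_from("<3f", header, 40)  # CELLA: cell lengths in Angstrom
    return (nx, ny, nz), (cx / mx, cy / my, cz / mz)

def mrc_geometry(path):
    """Read the header of an MRC file on disk and return its geometry."""
    with open(path, "rb") as f:
        return header_geometry(f.read(1024))
```

For example, `mrc_geometry("2599_refined.mrc")` should then report a (256, 256, 256) box and a voxel size of about 1.31 Å along each axis.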

Step 13: Local Refinement

This step involves performing local refinement on a subset of particles that exhibit relatively balanced poses:

First, run 2D classification of the raw particles in 2596.star and manually select a subset with balanced poses, such as the 31,146 particles in 2599_subset.star.

Then, perform a Homogeneous Reconstruction Only on this selected subset to obtain a density map and corresponding mask file. The results are available at cryosparc_P68_J4657_volume_map_sharp.mrc and cryosparc_P68_J4657_volume_mask_fsc.mrc respectively.

Finally, perform a Local Refinement task using the following settings:

The expected refined density map (cryosparc_P68_J4826_volume_map_sharp.mrc) can be downloaded from this link.

Step 14: Post-processing by EMReady

This step is identical to Step 12.

The expected processed density map (4826_refined.mrc) can be downloaded from this link.

Options/Arguments

Options/Arguments of cryopros-train

$ cryopros-train -h
usage: cryopros-train [-h] --box_size BOX_SIZE --Apix APIX --init_volume_path INIT_VOLUME_PATH --data_path
                     DATA_PATH --param_path PARAM_PATH --gpu_ids GPU_IDS [GPU_IDS ...] [--invert]
                     [--task_name TASK_NAME] [--volume_scale VOLUME_SCALE]
                     [--dataloader_batch_size DATALOADER_BATCH_SIZE]
                     [--dataloader_num_workers DATALOADER_NUM_WORKERS] [--lr LR] [--KL_weight KL_WEIGHT]
                     [--max_iter MAX_ITER]

Training a conditional VAE deep neural network model from an input initial volume and raw particles with given
imaging parameters.

options:
  -h, --help            show this help message and exit
  --box_size BOX_SIZE   box size
  --Apix APIX           pixel size in Angstrom
  --init_volume_path INIT_VOLUME_PATH
                        input initial volume path
  --data_path DATA_PATH
                        input raw particles path
  --param_path PARAM_PATH
                        path of star file which contains the imaging parameters
  --gpu_ids GPU_IDS [GPU_IDS ...]
                        GPU IDs to utilize
  --invert              invert the image sign
  --task_name TASK_NAME
                        task name
  --volume_scale VOLUME_SCALE
                        scale factor
  --dataloader_batch_size DATALOADER_BATCH_SIZE
                        batch size to load data
  --dataloader_num_workers DATALOADER_NUM_WORKERS
                        number of workers to load data
  --lr LR               learning rate
  --KL_weight KL_WEIGHT
                        KL weight
  --max_iter MAX_ITER   max number of iterations

Options/Arguments of cryopros-generate

$ cryopros-generate -h
usage: cryopros-generate [-h] --model_path MODEL_PATH --output_path OUTPUT_PATH --box_size BOX_SIZE --Apix APIX --gen_name GEN_NAME --param_path
                              PARAM_PATH [--invert] [--batch_size BATCH_SIZE] [--num_max NUM_MAX] [--data_scale DATA_SCALE] [--gen_mode GEN_MODE]
                              [--nls NLS [NLS ...]]

Generating an auxiliary particle stack from a pre-trained conditional VAE deep neural network model.

options:
  -h, --help            show this help message and exit
  --model_path MODEL_PATH
                        input pretrained model path
  --output_path OUTPUT_PATH
                        output path of the synthesized auxiliary particle stack
  --box_size BOX_SIZE   box size
  --Apix APIX           pixel size in Angstrom
  --gen_name GEN_NAME   filename of the generated auxiliary particle stack
  --param_path PARAM_PATH
                        path of star file which contains the imaging parameters
  --invert              invert the image sign
  --batch_size BATCH_SIZE
                        batch size
  --num_max NUM_MAX     maximum number of particles to generate
  --data_scale DATA_SCALE
                        scale factor
  --gen_mode GEN_MODE   storage mode of the synthesized particles; mode 0 is int; mode 2 is float
  --nls NLS [NLS ...]   number of layers of the neural network
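The choice of --gen_mode matters for disk usage. Assuming the mode numbers follow the MRC2014 mode codes (0 = 8-bit signed integer, 2 = 32-bit float), a float stack is four times larger than an integer one. The estimate below is a sketch under that assumption. For the 130,000 generated particles of the case study, mode 2 comes to roughly the same size as the ~32GB raw dataset downloaded in Step 1. stack_size_gib is a hypothetical helper.

```python
# Assumption: --gen_mode values follow the MRC2014 mode codes,
# 0 = 8-bit signed integer and 2 = 32-bit float.
BYTES_PER_PIXEL = {0: 1, 2: 4}

def stack_size_gib(n_particles, box_size, gen_mode):
    """Approximate size of a generated particle stack in GiB (header ignored)."""
    n_bytes = n_particles * box_size * box_size * BYTES_PER_PIXEL[gen_mode]
    return n_bytes / 2**30

# 130,000 particles at box size 256, as in the HA-trimer case study.
print(f"gen_mode 2: {stack_size_gib(130_000, 256, 2):.1f} GiB")  # 31.7 GiB
print(f"gen_mode 0: {stack_size_gib(130_000, 256, 0):.1f} GiB")  # 7.9 GiB
```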

Options/Arguments of cryopros-gen-mask

$ cryopros-gen-mask -h
usage: cryopros-gen-mask [-h] --volume_path VOLUME_PATH --result_path RESULT_PATH --threshold THRESHOLD

Generating a volume mask for a given input volume and corresponding threshold.

options:
  -h, --help            show this help message and exit
  --volume_path VOLUME_PATH
                        input volume path
  --result_path RESULT_PATH
                        output mask path
  --threshold THRESHOLD
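A volume mask of this kind is typically built by thresholding the density and optionally growing the result by a few voxels. The sketch below shows that idea with NumPy. It is not the cryopros-gen-mask implementation: threshold_mask and its dilate option are hypothetical.

```python
import numpy as np

def threshold_mask(volume, threshold, dilate=0):
    """Binary mask of voxels above `threshold`, optionally grown by `dilate`
    voxels (6-connected) to add a margin around the density."""
    mask = volume > threshold
    for _ in range(dilate):
        p = np.pad(mask, 1)  # pad with False so the edges do not wrap around
        mask = (p[1:-1, 1:-1, 1:-1]
                | p[:-2, 1:-1, 1:-1] | p[2:, 1:-1, 1:-1]
                | p[1:-1, :-2, 1:-1] | p[1:-1, 2:, 1:-1]
                | p[1:-1, 1:-1, :-2] | p[1:-1, 1:-1, 2:])
    return mask.astype(np.float32)
```

Padding instead of np.roll keeps density at one face of the box from leaking into the opposite face.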

Options/Arguments of cryopros-recondismic

$ cryopros-recondismic -h
usage: cryopros-recondismic [-h] --box_size BOX_SIZE --Apix APIX --init_volume_path INIT_VOLUME_PATH --mask_path
                        MASK_PATH --data_path DATA_PATH --param_path PARAM_PATH --gpu_ids GPU_IDS [GPU_IDS ...]
                        [--invert] [--task_name TASK_NAME] [--volume_scale VOLUME_SCALE]
                        [--dataloader_batch_size DATALOADER_BATCH_SIZE]
                        [--dataloader_num_workers DATALOADER_NUM_WORKERS] [--lr LR] [--KL_weight KL_WEIGHT]
                        [--max_iter MAX_ITER]

Reconstructing the micelle/nanodisc density map from an input initial volume, a mask volume and raw particles
with given imaging parameters.

options:
  -h, --help            show this help message and exit
  --box_size BOX_SIZE   box size
  --Apix APIX           pixel size in Angstrom
  --init_volume_path INIT_VOLUME_PATH
                        input initial volume path
  --mask_path MASK_PATH
                        mask volume path
  --data_path DATA_PATH
                        input raw particles path
  --param_path PARAM_PATH
                        path of star file which contains the imaging parameters
  --gpu_ids GPU_IDS [GPU_IDS ...]
                        GPU IDs to utilize
  --invert              invert the image sign
  --task_name TASK_NAME
                        task name
  --volume_scale VOLUME_SCALE
                        scale factor
  --dataloader_batch_size DATALOADER_BATCH_SIZE
                        batch size to load data
  --dataloader_num_workers DATALOADER_NUM_WORKERS
                        number of workers to load data
  --lr LR               learning rate
  --KL_weight KL_WEIGHT
                        KL weight
  --max_iter MAX_ITER   max number of iterations
