Prepare PDB files for MD minimization with OpenMM Amber14 forcefield.

pdbprep is not a Python package. It is a series of Python scripts orchestrated by a bash file that defines the preparation pipeline; hence, you can run each script individually if needed.

Performed tasks

pdbprep ensures the treated PDBs comply with the consistency standards needed for DeepRank input. It is a modular pipeline that performs the following tasks:

Cleans PDBs (uses scripts adapted from the pdb-tools package):
1. Keeps only coordinate lines to simplify the input PDBs
2. Removes water (HOH) molecules
3. Replaces certain residue names with standard names, for example MSE to MET and HIP to HIS
4. Selects the most probable alternative locations (discards others)
5. Fixes inserts
6. Sorts chains and residues (necessary for OpenMM)
7. Renumbers residues starting from 1
8. Renumbers atoms starting from 1
9. Cleans the PDB (uses pdb_tidy)
Runs PRAS to add missing heavy atoms
Runs pdb2pqr to calculate the protonation states of the polar hydrogens
Reads the calculated protonation states and prepares a file that will serve as input to OpenMM
Adds all hydrogens with OpenMM and the Amber14 forcefield using the information provided in the previous step
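
The cleaning steps operate on the fixed-column PDB text format. As a rough illustration only, and not the adapted pdb-tools scripts themselves, a few of them (steps 1, 2, 3 and 8) could be sketched as follows. The function name `clean_pdb_lines` and the `RESIDUE_RENAMES` table are hypothetical, and the table lists only the two examples given above.

```python
# Hypothetical sketch of a few cleaning steps; not the pdbprep scripts themselves.
RESIDUE_RENAMES = {"MSE": "MET", "HIP": "HIS"}  # only the examples named above


def clean_pdb_lines(lines):
    """Keep coordinate lines, drop waters, rename residues, renumber atoms."""
    cleaned = []
    serial = 0
    for line in lines:
        # Step 1: keep only coordinate records.
        if not line.startswith(("ATOM", "HETATM")):
            continue
        line = line.rstrip("\n").ljust(80)
        resname = line[17:20]  # residue name, PDB columns 18-20
        # Step 2: remove water molecules.
        if resname == "HOH":
            continue
        # Step 3: map non-standard residue names to standard ones
        # (the record name in columns 1-6 is left unchanged here).
        if resname in RESIDUE_RENAMES:
            line = line[:17] + RESIDUE_RENAMES[resname] + line[20:]
        # Step 8: renumber atoms starting from 1 (PDB columns 7-11).
        serial += 1
        line = line[:6] + f"{serial:5d}" + line[11:]
        cleaned.append(line)
    return cleaned
```

In the pipeline each of these steps lives in its own script, which is what allows them to be run individually.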

How to install

Clone this repository

Clone this repository:

git clone https://github.com/DeepRank/pdbprep

Install dependencies

pdbprep requires Python 3 and a bash shell to run. Ensure these are installed. We suggest using Anaconda.

Install the following dependencies in the Python environment in which you want to run pdbprep.

Install OpenMM: conda install -c conda-forge openmm
Install chardet: conda install chardet
Install pdb2pqr: pip install pdb2pqr
Install @joaomcteixeira's fork of Pras_Server as follows, outside the pdbprep folder:

# clone the fork and compile the software from the `nolog` branch
git clone https://github.com/joaomcteixeira/Pras_Server
cd Pras_Server
git fetch
git checkout nolog
cd Pras_Server_C++
g++ -std=c++17 src/*.cpp -o PRAS

Why install the fork and not the original source? Because the fork's `nolog` branch removes the logging operations to avoid writing thousands of log files to disk. All credit for PRAS should be given to the original authors.

Give execution permission to files

In the pdbprep repository folder, give the necessary permissions to the *.py and run_pdbprep.sh files:

chmod u+x *.py run_pdbprep.sh

The pdb_*.py files were adapted from the pdb-tools project by the Alexandre Bonvin lab.

http://www.bonvinlab.org/pdb-tools/

Here, @joaomcteixeira modified the scripts, reducing their versatility to improve their speed. Hence, the pdb-tools scripts provided here won't work outside the pdbprep context. If you want to use pdb-tools for any other purpose, install the official package (pip install pdb-tools) and cite the original work.

Configure the path to the PRAS executable

Add the absolute path to the PRAS file in the run_pdbprep.sh file (edit line 4 of run_pdbprep.sh).

How to run

Source the `setup.sh` file

From within the pdbprep folder, source the setup.sh file: source setup.sh. You need to perform this operation every time you want to use pdbprep in a new terminal window.

Prepare a list of the target PDBs

To prepare the PDB files:

Navigate to the folder where you want the new PDBs to be saved.
Create a file with the list of paths to the input PDB files. Paths can be relative to the current folder or absolute. You can use ls path/to/my/pdbs/*.pdb > pdblist to perform this operation. The file should contain lines like the following, pointing to the input PDBs:

A6/6A6I.pdb
AY/7AYE.pdb
D2/7D2T.pdb
GS/6GS2.pdb
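
If you prefer to build and check the list from Python rather than with ls, a minimal sketch could look like the following. The helper names `write_pdblist` and `read_pdblist` are hypothetical and not part of pdbprep; the second one only reports which listed paths do not exist, so you can catch typos before starting a run.

```python
from pathlib import Path


def write_pdblist(pdb_dir, list_path="pdblist"):
    """Write one path per line for every *.pdb file found in pdb_dir."""
    paths = sorted(str(p) for p in Path(pdb_dir).glob("*.pdb"))
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return paths


def read_pdblist(list_path="pdblist"):
    """Return (found, missing) paths; relative paths use the current folder."""
    text = Path(list_path).read_text()
    entries = [ln.strip() for ln in text.splitlines() if ln.strip()]
    found = [e for e in entries if Path(e).is_file()]
    missing = [e for e in entries if not Path(e).is_file()]
    return found, missing
```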

Run the pipeline - prepare the PDBs!

To execute the pipeline on the list of target PDBs, run:

run_pdbprep.sh pdblist <N>

Where N is the number of threads (cores) you want to use. The multithreading operation follows an embarrassingly parallel scheme where each thread will take a PDB from the list and process it independently until the end.
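
To illustrate what "embarrassingly parallel" means here: the PDBs share no state, so a pool of N workers can each take the next PDB from the list and process it to completion. The sketch below uses Python's `concurrent.futures` as a stand-in; how run_pdbprep.sh actually dispatches the work is not described in this document, and `prepare_one` is a hypothetical placeholder for the per-PDB steps.

```python
from concurrent.futures import ThreadPoolExecutor


def prepare_one(pdb_path):
    """Hypothetical placeholder for the full preparation of a single PDB."""
    return pdb_path, "ok"


def prepare_all(pdb_paths, n_workers):
    """Process every PDB independently using n_workers workers."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each worker pulls the next path; no PDB depends on another.
        return dict(pool.map(prepare_one, pdb_paths))
```

Because the tasks are independent, the run time should scale roughly with the number of PDBs divided by N, as long as enough cores are available.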

What to expect?

The run_pdbprep.sh script will create a series of numbered folders to store the temporary PDBs for the different intermediate steps (0_*, 1_*, 2_*, ...). If the preparation succeeds, the temporary PDBs will be deleted and only those in the last folder, 4_ready_to_minimize, will be kept. If something goes wrong with a PDB, its intermediate temporary files won't be deleted so that errors can be traced.

At startup, run_pdbprep.sh will delete all temporary folders (and the files inside), keeping only the folder with the ready-to-minimize structures. run_pdbprep.sh will skip those PDBs listed in the input pdblist that were already treated and are present in the 4_ready_to_minimize folder. Therefore, you can restart a previously halted run without needing to repeat the already completed PDBs.
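
The restart behaviour amounts to filtering the input list against the contents of the output folder. A minimal sketch of that idea follows; it assumes that a finished PDB keeps the file name of its input, which this document does not state, and the function name `pending_pdbs` is hypothetical.

```python
from pathlib import Path


def pending_pdbs(pdb_paths, ready_dir="4_ready_to_minimize"):
    """Return the input paths with no same-named file in ready_dir."""
    ready = Path(ready_dir)
    # Assumption: output files are named after their input files.
    done = {p.name for p in ready.glob("*.pdb")} if ready.is_dir() else set()
    return [p for p in pdb_paths if Path(p).name not in done]
```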

Additional Features

Calculate interface pairwise energies

Use the evaluate_pairwise_energies.py script to calculate atom-atom energies of a protein complex interface. Here, the interface is defined by a configurable distance parameter (defaults to 5 Angstroms). The script has several parameters to configure the execution and the calculation. Example:

$ python evaluate_pairwise_energies.py -h  # for listing all options
$ python evaluate_pairwise_energies.py -s FILE.pdb  # to run on FILE.pdb

This will output a file of the following format:

chainA resnameA resiA atomA - chainB resnameB resiB atomA LJ Coulomb (kcal/mole)
A LEU 20   HD13 - B ASN 60   OD1     -0.00803    -4.03991                       
A PRO 21   N    - B ALA 46   HB1     -0.01046    -1.06165                       
A PRO 21   N    - B ARG 48   HD2     -0.00725    -1.16984 
(...)
Total LJ: -22.92113 (kcal/mole)
Total Coulomb: -278.20533 (kcal/mole)

Troubleshooting

Can't find Python

If the script cannot find the Python interpreter, type whereis python (or which python) and update the Python path in the shebang (first line) of all *.py files accordingly.
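
Editing the first line of every script by hand is tedious, so the update can also be scripted. The helper below is a hypothetical sketch, not part of pdbprep: it rewrites (or adds) the first line of a script so that it points to a given interpreter, defaulting to the interpreter running the helper.

```python
import sys
from pathlib import Path


def set_shebang(script_path, interpreter=sys.executable):
    """Point the first line of a script to the given Python interpreter."""
    path = Path(script_path)
    lines = path.read_text().splitlines(keepends=True)
    new_first = f"#!{interpreter}\n"
    if lines and lines[0].startswith("#!"):
        lines[0] = new_first  # replace the existing shebang
    else:
        lines.insert(0, new_first)  # no shebang yet: add one
    path.write_text("".join(lines))
```

Run from the pdbprep folder inside the target environment, it could be applied to every script with `for p in Path(".").glob("*.py"): set_shebang(p)`.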

Need help?

Citing

When using pdbprep, you should acknowledge the following software; follow the links for information about how to cite:
