Prepare PDB files for MD minimization with OpenMM Amber14 forcefield.
pdbprep is not a Python package. It is a series of Python scripts orchestrated
by a bash file that defines the preparation pipeline; hence, you can run each script
individually if needed.
pdbprep ensures the treated PDBs comply with the consistency standards needed
for DeepRank input. pdbprep is a modular pipeline that performs the following
tasks:
- Cleans PDBs (uses scripts adapted from the pdb-tools package):
  - Keeps only coordinate lines to simplify the input PDBs
  - Removes water (HOH) molecules
  - Replaces certain residue names with standard names, for example MSE to MET and HIP to HIS
  - Selects the most probable alternative locations (discards others)
  - Fixes inserts
  - Sorts chains and residues (necessary for OpenMM)
  - Renumbers residues starting from 1
  - Renumbers atoms starting from 1
  - Cleans the PDB (uses pdb_tidy)
- Runs PRAS to add missing heavy atoms
- Runs pdb2pqr to calculate the protonation states of the polar hydrogens
- Reads the calculated protonation states and prepares a file that will serve as input to OpenMM
- Adds all hydrogens with OpenMM and the Amber14 forcefield using the information provided in the previous step
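A few of the cleaning steps listed above can be illustrated with a minimal sketch. It is not the code of the pdbprep scripts: the function name `clean_pdb_lines` and the `RESIDUE_RENAMES` table are hypothetical, and the sketch assumes fixed-column PDB records. It only swaps the residue-name field, so any atom-level changes a full residue conversion may need are ignored.

```python
# Hypothetical illustration; not the pdbprep scripts themselves.
RESIDUE_RENAMES = {"MSE": "MET", "HIP": "HIS"}  # the two examples named above


def clean_pdb_lines(lines):
    """Keep coordinate records, drop waters, rename residues, renumber atoms."""
    cleaned = []
    serial = 0
    for line in lines:
        record = line[:6]
        if record not in ("ATOM  ", "HETATM"):
            continue  # keep only coordinate lines
        resname = line[17:20]  # residue name sits in columns 18-20
        if resname == "HOH":
            continue  # remove water molecules
        resname = RESIDUE_RENAMES.get(resname, resname)
        serial += 1  # renumber atoms starting from 1
        cleaned.append(f"{record}{serial:>5}{line[11:17]}{resname}{line[20:]}")
    return cleaned
```

In the actual pipeline each of these tasks is handled by its own script, which is what allows the steps to be run individually.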
Clone this repository:
git clone https://github.com/DeepRank/pdbprep
pdbprep requires Python 3 and a bash shell to run. Ensure these are
installed. We suggest using Anaconda.
Install the following dependencies in the Python environment in which you wish
to run pdbprep.
- Install OpenMM:
  conda install -c conda-forge openmm
- Install chardet:
  conda install chardet
- Install pdb2pqr:
  pip install pdb2pqr
- Install the @joaomcteixeira fork of Pras_Server as follows, outside the pdbprep folder:
# clone the fork and compile the software from the `nolog` branch
git clone https://github.com/joaomcteixeira/Pras_Server
cd Pras_Server
git fetch
git checkout nolog
cd Pras_Server_C++
g++ -std=c++17 src/*.cpp -o PRAS
Why install the fork and not the original source? Because in the fork's branch the logging operations were removed to avoid writing thousands of log files to disk. All credit for PRAS should be given to the original authors:
- https://github.com/osita-sunday-nnyigide/Pras_Server
- https://pubs.acs.org/doi/10.1021/acs.jcim.2c00571
In the pdbprep repository folder, give the necessary permissions to the *.py
and run_pdbprep.sh files:
chmod u+x *.py run_pdbprep.sh
The pdb_*.py files were adapted from the pdb-tools project from the Alexandre
Bonvin lab.
http://www.bonvinlab.org/pdb-tools/
Here, @joaomcteixeira modified the scripts, reducing their versatility to
improve their speed. Hence, the pdb-tools scripts provided here won't work
outside the pdbprep context. If you want to use pdb-tools for any other need,
install the official package (pip install pdb-tools) and cite the original work.
Add the absolute path to the PRAS file in the run_pdbprep.sh file (edit line 4 of
run_pdbprep.sh).
From within the pdbprep folder, source the setup.sh file: source setup.sh.
You need to perform this operation every time you want to use pdbprep in a new
terminal window.
To prepare the PDB files:
- Navigate to the folder where you want the new PDBs to be saved.
- Create a file with the list of paths to the input PDB files. Paths can be
relative to the current folder or absolute.
You can use
ls path/to/my/pdbs/*.pdb > pdblist
to perform this operation. The file should contain lines like the following, pointing to the input PDBs:
A6/6A6I.pdb
AY/7AYE.pdb
D2/7D2T.pdb
GS/6GS2.pdb
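If you prefer to build the list from Python, for instance when a folder holds so many files that the shell glob becomes unwieldy, something along these lines should give an equivalent file. The function name `write_pdb_list` is hypothetical and not part of pdbprep.

```python
from pathlib import Path


def write_pdb_list(pdb_dir, list_path="pdblist"):
    """Write one path per line for every *.pdb file found in pdb_dir."""
    paths = sorted(str(p) for p in Path(pdb_dir).glob("*.pdb"))
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return paths
```

Calling `write_pdb_list("path/to/my/pdbs")` writes `pdblist` in the current folder, with paths relative to it whenever `pdb_dir` is given as a relative path.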
To execute the pipeline on the list of target PDBs, run:
run_pdbprep.sh pdblist <N>
Where N is the number of threads (cores) you want to use. The multithreading
operation follows an embarrassingly parallel scheme where each thread will
take a PDB from the list and process it independently until the end.
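The scheme can be pictured with the following sketch. It is an illustration only, since the real pipeline is orchestrated by the bash script: `prepare_one` is a hypothetical stand-in for the per-PDB steps, and Python threads are used just to keep the example short (CPU-bound work would typically use separate processes).

```python
from concurrent.futures import ThreadPoolExecutor


def prepare_one(pdb_path):
    """Hypothetical stand-in for the per-PDB preparation steps."""
    if not pdb_path.endswith(".pdb"):
        raise ValueError(f"not a PDB file: {pdb_path}")
    return "prepared"


def run_independently(pdb_paths, n_workers, worker=prepare_one):
    """Process every path on its own; one failure never stops the others."""

    def safe(path):
        try:
            return path, worker(path), None
        except Exception as err:  # record the failure and move on
            return path, None, str(err)

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # each idle worker picks up the next unprocessed path
        return list(pool.map(safe, pdb_paths))
```

Because no PDB depends on the result of another, the results can be collected in any order and a failed entry simply carries its error message.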
The run_pdbprep.sh script will create a series of numerically indexed folders to
store the temporary PDBs for the different intermediate steps (0_*, 1_*,
2_*, ...). If the preparation succeeds, the temporary PDBs will be deleted and
only those in the last folder 4_ready_to_minimize will be saved. If something
goes wrong with a PDB, its intermediate temporary files won't be deleted so that
errors can be traced.
At startup, run_pdbprep.sh will delete all temporary folders (and the files
inside), keeping only the folder with the ready-to-minimize structures.
run_pdbprep.sh will skip those PDBs listed in the input pdblist that were
already treated and are present in the 4_ready_to_minimize folder. Therefore,
you can restart a previously halted run without needing to repeat the already
completed PDBs.
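To preview which entries a restarted run would still have to process, a check like the one below may help. It is a sketch under one assumption that the text above does not state: that a prepared file keeps the same file name as its input. The function name `pending_pdbs` is hypothetical.

```python
from pathlib import Path


def pending_pdbs(list_path, ready_dir="4_ready_to_minimize"):
    """Return the listed PDB paths with no same-named file in ready_dir."""
    ready = Path(ready_dir)
    pending = []
    for line in Path(list_path).read_text().splitlines():
        path = line.strip()
        if not path:
            continue  # ignore blank lines
        # Assumption: the prepared file keeps the input file name.
        if not (ready / Path(path).name).exists():
            pending.append(path)
    return pending
```

If the ready folder does not exist yet, every listed path is reported as pending.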
Use the evaluate_pairwise_energies.py script to calculate atom-atom energies
of a protein complex interface. Here, the interface is defined by a configurable
distance parameter (defaults to 5 Angstroms). The script has several parameters
to configure the execution and the calculation. Example:
$ python evaluate_pairwise_energies.py -h # for listing all options
$ python evaluate_pairwise_energies.py -s FILE.pdb # to run on FILE.pdb
This will output a file of the following format:
chainA resnameA resiA atomA - chainB resnameB resiB atomA LJ Coulomb (kcal/mole)
A LEU 20 HD13 - B ASN 60 OD1 -0.00803 -4.03991
A PRO 21 N - B ALA 46 HB1 -0.01046 -1.06165
A PRO 21 N - B ARG 48 HD2 -0.00725 -1.16984
(...)
Total LJ: -22.92113 (kcal/mole)
Total Coulomb: -278.20533 (kcal/mole)
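For downstream analysis, the per-pair rows can be read back with a small parser such as the sketch below. It assumes the whitespace-separated layout shown above, where a pair row has eleven fields and a `-` between the two atoms; the function name `sum_pairwise_energies` is hypothetical and not part of the script.

```python
def sum_pairwise_energies(lines):
    """Return (number of pairs, summed LJ, summed Coulomb) from output lines."""
    n_pairs = 0
    total_lj = 0.0
    total_coulomb = 0.0
    for line in lines:
        fields = line.split()
        # a pair row has 11 fields, with "-" separating the two atoms
        if len(fields) != 11 or fields[4] != "-":
            continue  # header, "(...)" and "Total" lines are skipped
        try:
            lj, coulomb = float(fields[9]), float(fields[10])
        except ValueError:
            continue  # not a numeric row
        n_pairs += 1
        total_lj += lj
        total_coulomb += coulomb
    return n_pairs, total_lj, total_coulomb
```

Summing the rows this way offers a quick consistency check against the `Total LJ` and `Total Coulomb` lines at the end of the file.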
In case the script cannot find the Python interpreter, type whereis python (or
which python) and update the python path in the shebang (1st line) of all
*.py files accordingly.
Contact us by opening a new issue.
When using pdbprep, you should acknowledge the following software. Follow the
links for information about how to cite: