A tool to model the evolution and structural impact of alternative splicing
PhyloSofS (Phylogenies of Splicing isoforms Structures) is a
fully automated computational tool that infers plausible evolutionary scenarios
explaining a set of transcripts observed in several species and models the
three-dimensional structures of the produced protein isoforms.
The phylogenetic reconstruction algorithm relies on a combinatorial approach
and the maximum parsimony principle. The generation of the isoforms' 3D models
is performed using comparative modeling.
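To build intuition for the maximum parsimony principle, the sketch below implements Fitch's classic small-parsimony count. Given a fixed tree and one character observed at the leaves, it returns the minimum number of state changes needed to explain the observations. This is a generic textbook illustration, not PhyloSofS' own combinatorial algorithm. The nested-tuple tree encoding, the species names and the exon inclusion states are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding: a tree is a leaf name (str) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return the candidate root states and the minimum number of changes."""
    if isinstance(tree, str):
        # A leaf carries its observed state and costs nothing.
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can agree, so no extra change is needed at this node.
        return common, left_cost + right_cost
    # Disjoint state sets: one change must happen below this node.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical data: is a given exon included ("1") or skipped ("0")?
species_tree = (("human", "mouse"), ("frog", "zebrafish"))
exon_included = {"human": "1", "mouse": "1", "frog": "0", "zebrafish": "1"}
root_states, changes = fitch(species_tree, exon_included)
print(root_states, changes)  # {'1'} 1
```

In this example, the frog/zebrafish disagreement forces one change. The most parsimonious state at the root is "exon included".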
PhyloSofS was applied to the c-Jun N-terminal kinase (JNK) family (60 transcripts in 7 species). It made it possible to date the appearance of an alternative splicing event (ASE) that modulates substrate affinity to the ancestor common to mammals, amphibians and fishes. It also identified the key residues responsible for this modulation. In addition, it highlighted a new ASE that induces a large deletion yet is conserved across several species. The resulting isoform is stable in solution and could play a role in the cell. More details about this case study, together with the algorithm description, can be found in the PhyloSofS preprint available on bioRxiv.
You can clone this PhyloSofS package using git:
git clone https://github.com/PhyloSofS-Team/PhyloSofS.git
Then go into the cloned PhyloSofS folder and install the package using Python 3's pip:
cd PhyloSofS
python -m pip install .
To run the phylogenetic module of PhyloSofS, you need to have
Graphviz installed.
The easiest way to install Graphviz in...
- Debian/Ubuntu is:
sudo apt-get install graphviz
- Windows is using Chocolatey:
choco install graphviz
- macOS is using Homebrew:
brew install graphviz
The molecular modelling pipeline depends on Julia, HH-suite3 and MODELLER. This module can only run on Unix systems (because of the HH-suite). To alleviate that, we offer a Docker image with all these dependencies installed (see the Docker section for more details).
You can download the Julia 1.1.1 binaries from the Julia website.
Some BioJulia packages may need LibZ to precompile. If you get a related
error, you can install LibZ from its website.
On Ubuntu 18.04, you can install it with: sudo apt-get install zlib1g-dev
Clone our HH-suite fork at AntoineLabeeuw/hh-suite and follow the Compilation instructions in its README.md file.
PhyloSofS needs MODELLER version 9.21. Follow the instructions on the MODELLER website to install it and obtain a license key.
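Before running PhyloSofS, you may want to check which of these dependencies are visible from your environment. The following is a minimal sketch that makes three assumptions: Graphviz provides the dot program, hhblits is representative of the HH-suite3 programs, and MODELLER is importable as the modeller Python module. check_dependencies is a hypothetical helper, not part of PhyloSofS.

```python
import importlib.util
import shutil


def check_dependencies() -> dict:
    """Report whether each external tool mentioned in this README can be found."""
    return {
        # Assumption: Graphviz's `dot` program is what the phylogenetic module needs.
        "graphviz (dot)": shutil.which("dot") is not None,
        # Julia executable used by the molecular modelling pipeline.
        "julia": shutil.which("julia") is not None,
        # Assumption: hhblits stands in for the whole HH-suite3 installation.
        "hh-suite (hhblits)": shutil.which("hhblits") is not None,
        # find_spec only locates the module; it does not import it or check the license.
        "modeller": importlib.util.find_spec("modeller") is not None,
    }


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```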
To run the molecular modelling module, you need the HH-suite databases:
- Sequence database: uniclust30_yyyy_mm_hhsuite.tar.gz (we have tested PhyloSofS using 2018_08 as yyyy_mm)
- Structural database: pdb70_from_mmcif_latest.tar.gz
The mmCIF PDB files needed by MODELLER are downloaded on demand into a specified folder if they are not already present.
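The on-demand download described above follows a simple cache pattern: use the local file if it exists, otherwise fetch it. The sketch below illustrates that pattern only. The helper ensure_mmcif, the lowercase file naming and the RCSB download URL are assumptions for illustration, not PhyloSofS' actual code. A stand-in fetch function keeps the demo offline, and 1ABC is a placeholder identifier.

```python
import os
import tempfile
import urllib.request
from typing import Callable

# Assumption: RCSB's public download URL pattern for mmCIF files.
RCSB_URL = "https://files.rcsb.org/download/{pdb_id}.cif"


def ensure_mmcif(pdb_id: str, cache_dir: str,
                 fetch: Callable[[str, str], object] = urllib.request.urlretrieve) -> str:
    """Return the local mmCIF path, downloading the file only if it is missing."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f"{pdb_id.lower()}.cif")
    if not os.path.exists(path):
        # Cache miss: download the file into the cache folder.
        fetch(RCSB_URL.format(pdb_id=pdb_id.upper()), path)
    return path


# Demo with a stand-in fetch function that records URLs and writes a dummy file.
downloaded = []


def fake_fetch(url: str, path: str) -> None:
    downloaded.append(url)
    with open(path, "w") as handle:
        handle.write("data_demo\n")


with tempfile.TemporaryDirectory() as cache:
    first = ensure_mmcif("1ABC", cache, fetch=fake_fetch)
    second = ensure_mmcif("1abc", cache, fetch=fake_fetch)  # served from the cache

print(len(downloaded))  # 1
```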
To set up the databases, we recommend using the setup_databases script.
Alternatively, you can perform a manual installation by following the instructions
in docs/get_databases.md.
The setup_databases script downloads and decompresses the needed databases. It creates
the following folder structure, which PhyloSofS can use directly through the
--databases argument:
databases
├── pdb
├── pdb70
└── uniclust
Run setup_databases -h to learn more about the script and its arguments.
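To check that a databases folder has the layout shown above before passing it to --databases, you could use a small helper such as the hypothetical missing_database_folders below. This sketch only verifies that the pdb, pdb70 and uniclust subfolders exist. It does not check whether their contents are complete.

```python
import tempfile
from pathlib import Path

# Subfolder names taken from the folder structure created by setup_databases.
EXPECTED_SUBFOLDERS = ("pdb", "pdb70", "uniclust")


def missing_database_folders(databases: str) -> list:
    """Return the expected subfolders that are absent from the databases folder."""
    root = Path(databases)
    return [name for name in EXPECTED_SUBFOLDERS if not (root / name).is_dir()]


# Demo: a folder where the uniclust database has not been set up yet.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "pdb").mkdir()
    (Path(tmp) / "pdb70").mkdir()
    missing = missing_database_folders(tmp)

print(missing)  # ['uniclust']
```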
You can directly use PhyloSofS via Docker without cloning this GitHub repository. To run PhyloSofS' Docker image you need to install Docker following these instructions.
The following example runs PhyloSofS' Docker image using Windows PowerShell.
The databases for the molecular modelling module, stored in D:\databases, are
mounted in /databases, and the current directory is mounted in /project. The
current directory is written ${PWD} in Windows PowerShell, %cd% in the Windows
Command Line (cmd), and $(pwd) in Unix shells.
docker run -ti --rm --mount type=bind,source=d:\databases,target=/databases --mount type=bind,source=${PWD},target=/project diegozea/phylosofs
After this, you have access to the bash terminal of an Ubuntu 18.04 image
with PhyloSofS and all its dependencies installed. You only need to provide
your MODELLER license key to use PhyloSofS. To do that, run the following
command after replacing license_key with your MODELLER license key:
sed -i 's/xxx/license_key/' /usr/lib/modeller9.21/modlib/modeller/config.py
After installing Docker CE following these instructions, you can create a folder to work with the app, e.g.:
mkdir phylosofs
Then go into that folder and run the PhyloSofS Docker image, bind-mounting
the local folder into /project:
cd phylosofs
sudo docker run -ti --rm --mount type=bind,source=$(pwd),target=/project diegozea/phylosofs
This starts a bash console with PhyloSofS and all its dependencies installed. The sources are taken from diegozea/phylosofs.
First, replace xxx with your MODELLER license key using sed, as indicated in the banner.
Then, the first time, run the setup_databases script to install the needed
databases into the project folder. Downloading and decompressing the databases
takes time, depending on your internet connection and disk speed. You need
almost 129 GB of free disk space before downloading and decompressing them:
setup_databases
This creates a databases folder in /project (and therefore in the phylosofs
folder of your system) containing the sequence and structure databases needed
for the homology modelling step.
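Because the databases need almost 129 GB, it is worth checking free space on a new machine before running setup_databases, so that a download does not fail partway through. This minimal sketch uses Python's shutil.disk_usage. The helper name has_enough_space is hypothetical, and the sketch assumes the README's figure is in decimal gigabytes. Inside the Docker container, you would pass /project as the path.

```python
import shutil

# Assumption: "almost 129 GB" from this README, read as decimal gigabytes.
REQUIRED_BYTES = 129 * 10**9


def has_enough_space(path: str = ".", required: int = REQUIRED_BYTES) -> bool:
    """Return True if the file system holding `path` has at least `required` free bytes."""
    return shutil.disk_usage(path).free >= required


if __name__ == "__main__":
    if has_enough_space("."):
        print("enough free space for the databases")
    else:
        print("not enough free space for the databases")
```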
To test the molecular modelling suite, create an example input PIR file in a
GeneName folder. PhyloSofS looks for transcripts.pir files in the indicated
folder and its subfolders:
mkdir GeneName
echo ">P1;gene transcript ABCDE" >> ./GeneName/transcripts.pir
echo "AAAAAABBBBBBBBBBBBBBBBBBBCCCCCCCCCCCCCCCCCDDDDDDDDDEEEEEEEEE" >> ./GeneName/transcripts.pir
echo "ACTNEFCASCPTFLRMDNNAAAIKALELYKINAKLCDPHPSKKGASSAYLENSKGAPNNS*" >> ./GeneName/transcripts.pir
The line below the PIR identifier annotates the exon (A, B, C, ...) to which
each residue of the protein isoform belongs. You can then run the molecular
modelling module on this folder:
phylosofs -M -i GeneName --databases databases
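The following minimal sketch makes the exon annotation concrete. It splits an isoform sequence into the residue segments belonging to each exon, using the annotation line described above. The function exon_segments and the toy sequence are hypothetical. The sketch assumes the annotation has one label per residue and that the sequence ends with the PIR terminator *.

```python
from itertools import groupby
from typing import List, Tuple


def exon_segments(annotation: str, sequence: str) -> List[Tuple[str, str]]:
    """Pair each consecutive run of exon labels with the residues it covers."""
    residues = sequence.rstrip("*")  # drop the PIR terminator
    if len(annotation) != len(residues):
        raise ValueError("annotation and sequence lengths differ")
    segments = []
    position = 0
    for label, run in groupby(annotation):
        length = len(list(run))
        segments.append((label, residues[position:position + length]))
        position += length
    return segments


# Hypothetical toy isoform: exon A covers 3 residues, B covers 3, C covers 2.
print(exon_segments("AAABBBCC", "MKTAYIAK*"))
# [('A', 'MKT'), ('B', 'AYI'), ('C', 'AK')]
```

The function returns a list of (label, segment) pairs rather than a dictionary. This preserves exon order and avoids silently merging two non-contiguous runs that share a label.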
If you are using the PhyloSofS Docker image, be aware that errors can occur
when very large files are written to bind-mounted NTFS file systems. This
happens particularly when running setup_databases, because it downloads and
decompresses large files. To avoid this problem, you can install PhyloSofS on
Windows and run setup_databases.exe to set up the databases before using the
Docker image.
You can run phylosofs -h to see the help and the list of arguments.
For example, to run the phylogenetic module:
phylosofs -P -s 100 --tree path_to_newick_tree --transcripts path_to_transcripts
If the databases were installed using setup_databases and the HH-Suite3
scripts and programs are in the executable path, you can run:
phylosofs -M -i path_to_input_dir --databases path_to_databases_folder
PhyloSofS looks for transcripts.pir files in path_to_input_dir and its
subfolders, and performs the homology modelling of each sequence in those files.
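You can mimic the recursive search described above with pathlib to preview which files PhyloSofS would pick up. This sketch assumes that any file named exactly transcripts.pir under the input folder counts. find_transcript_files is a hypothetical helper, and PhyloSofS' own traversal may handle details such as symbolic links differently.

```python
import tempfile
from pathlib import Path


def find_transcript_files(input_dir: str) -> list:
    """Return every transcripts.pir file under input_dir, in sorted order."""
    return sorted(Path(input_dir).rglob("transcripts.pir"))


# Demo: two gene folders, one of them nested one level deeper.
with tempfile.TemporaryDirectory() as tmp:
    for gene in ("GeneA", "GeneB/variants"):
        folder = Path(tmp, gene)
        folder.mkdir(parents=True)
        (folder / "transcripts.pir").write_text(">P1;demo\n")
    found = [p.relative_to(tmp).as_posix() for p in find_transcript_files(tmp)]

print(found)  # ['GeneA/transcripts.pir', 'GeneB/variants/transcripts.pir']
```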
If you installed the databases manually and/or the HH-Suite3 scripts and programs are not in the executable path, run:
phylosofs -M -i path_to_input_dir --hhlib path_to_hhsuite_folder --hhdb path_to_uniclust_database/uniclust_basename --structdb path_to_pdb70/pdb70 --allpdb path_mmcif_pdb_cache_folder
Please note that for the --hhdb and --structdb databases, you need to
provide the path to the folder followed by the basename of the files in it.
For example, if the database folder uniclust30_2018_08 is located in /home,
you need to write:
--hhdb /home/uniclust30_2018_08/uniclust30_2018_08
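If you are unsure of the basename of a manually installed HH-suite database, you can derive the --hhdb or --structdb value from the files in its folder. The sketch below assumes the HH-suite3 convention that each database includes a <basename>_cs219.ffindex file. hh_database_prefix is a hypothetical helper, not part of PhyloSofS.

```python
import tempfile
from pathlib import Path

# Assumption: HH-suite3 databases ship a <basename>_cs219.ffindex file.
SUFFIX = "_cs219.ffindex"


def hh_database_prefix(folder: str) -> str:
    """Return 'folder/basename' for a folder holding exactly one HH-suite database."""
    matches = sorted(Path(folder).glob("*" + SUFFIX))
    if len(matches) != 1:
        raise ValueError(f"expected one *{SUFFIX} file in {folder}, found {len(matches)}")
    basename = matches[0].name[: -len(SUFFIX)]
    return str(Path(folder) / basename)


# Demo with an empty placeholder file that mimics the uniclust example above.
with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp, "uniclust30_2018_08")
    db.mkdir()
    (db / "uniclust30_2018_08_cs219.ffindex").touch()
    hhdb = hh_database_prefix(str(db))
    relative = Path(hhdb).relative_to(tmp).as_posix()

print(relative)  # uniclust30_2018_08/uniclust30_2018_08
```

The returned string is what you would pass to --hhdb (or to --structdb for the pdb70 folder).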
You may also find these arguments useful:
--ncpu number_of_cpu
--julia path_to_julia_executable
If you installed the databases using setup_databases in a folder that you
have bind-mounted to /databases, you only need to run:
phylosofs -M --databases /databases
This is enough because the Docker image has HH-Suite3 installed, with its programs and scripts in the executable path.
The PhyloSofS package is released under the MIT License.
For questions, comments or suggestions, feel free to contact Elodie Laine or Hugues Richard.