A comprehensive method to analyze and identify structurally similar RNA motif families
- Md Mahfuzur Rahaman†, mahfuzur at knights dot ucf dot edu
- Nabila Shahnaz Khan†, nabilakhan at knights dot ucf dot edu
- Shaojie Zhang*†, shzhang at cs dot ucf dot edu
†Department Computer Science, University of Central Florida, Orlando, FL 32816 USA
*To whom correspondence should be addressed.
RNAMotifComp is a method to generate similarity graph from RNA structural motifs using either RNAMotifScanX or RNA-align alignment tool. It is developed and tested in a 64-bit Linux machine. For the basic features, only python (3.x recommended) is required to be installed on the computer with 64-bit Linux environment. The preliminary files to run this code is included here.
All recent Linux systems normally come with python installed by default. If not, please install python and pip before proceeding to the next step.
It is required to install several python libraries to run RNAMotifComp. These libraries are included in the requirements.txt file. To install all required python libraries, please navigate to the RNAMotifComp home directory in the terminal and execute the following command.
pip install -r requirements.txt
Motif Family Similarity Analysis tool takes input from any text file (e.g. ‘*.in’) which needs to be in the RNAMotifFamilySimilarity/data/ directory or any subdirectory inside. Each line in that file represents a motif family. Each line starts with the motif family name, followed by a comma-separated list of motifs (the indices for motifs are PDB index, but FASTA index can also be used by setting a parameter in the configuration file). Please check the sample input file (sample1.in) provided in the data directory to look into the formatting of input data in detail.
usage: python3 run.py [-h] -i I [-r [R]][-p [P]][-t [T]]
I - <input_file_name> [Input file name under the data directory (Required). e.g.: sample1.in]
R - <rmsd_threshold> [RMSD threshold to consider as similar. e.g.: 1.0]
P - <participating_motif_instance_threshold> [Percentage of participating motif instances threshold. e.g.: 20.0]
T - <alignment_tool> [Alignment tool to be used, default - ScanX, options: ScanX / RNAalign]
Examples:
To generate a similarity graph along with the text files containing similar motif instances from sample1.in using default alignment tool (RNAMotifScanX), use the following command:
python3 run.py -i sample1.in
To generate a similarity graph along with the text files containing similar motif instances from sample1.in using ‘RNAalign’ alignment tool, use the following command:
python3 run.py -i sample1.in -t RNAalign
To generate a similarity graph along with the text files containing similar motif instances from sample1.in using ‘ScanX’ alignment tool while setting the RMSD threshold as 2.0 and participating motif instances threshold as 10%, use the following command:
python3 run.py -i sample1.in -t ScanX -r 2.0 -p 10.0
We provided pre-generated data for 360 internal loop motifs from 11 families. For any new dataset, it will automatically download and/or generate required data files (e.g. *.cif, *.fasta, *.aln, etc.) which might take some time. Please make sure to provide valid (not obsolete) PDB number in the input data.
A subdirectory will be created inside the ‘output’ directory using the name of the alignment tool used. Inside that subdirectory there will be an image file (‘*.png’) that represents the generated graph after analyzing the similarities among the input motif families. There will also be a couple of text files (‘*.csv’), each of which represents the similar motif instances for a motif family pair.
Motif Family Similarity Analysis is developed for an NIH funded project (R01GM102515).
The author of the source code is Md Mahfuzur Rahaman. The benchmarking using ML part is authored by Nabila Shahnaz Khan. For bug reports or comments please contact mahfuz@ucf.edu.