PseudoRA : Pseudogene Re-Aligner for short-read next-generation sequencing data

1. What is PseudoRA?

PseudoRA realigns reads in general short-read NGS BAM files that were mismapped because of the highly homologous sequences shared by gene-pseudogene pairs. Briefly, the software works in the following steps: (1) collect all reads from both the functional gene and the pseudogene, (2) phase each read independently, (3) rank each read by haplotype, and (4) rebuild the BAM and VCF. The software outputs (1) a corrected BAM file containing only the reads that belong to the functional gene and (2) a VCF file generated by GATK HaplotypeCaller. These files can be merged into your existing BAM and VCF for further analysis. By default, the program targets the SBDS/SBDSP1 region, but users can change this setting (see section 5).
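To illustrate the general idea behind steps (2) and (3), the Python sketch below scores each read against the two gene copies at positions where the functional gene and the pseudogene differ (paralogous sequence variants, PSVs) and labels the read accordingly. This is not PseudoRA's actual algorithm: the PSV table, the `Read` class and the majority-vote rule are all hypothetical, and reads are assumed to be gap-free and already placed in functional-gene coordinates.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Read:
    """Hypothetical read model: gap-free bases placed in functional-gene coordinates."""
    name: str
    start: int  # 0-based position of the read's first base
    seq: str


# Hypothetical PSV table: position -> (functional-gene base, pseudogene base)
PSVS: Dict[int, Tuple[str, str]] = {105: ("C", "T"), 130: ("G", "A"), 162: ("A", "G")}


def score_read(read: Read, psvs: Dict[int, Tuple[str, str]]) -> Tuple[int, int]:
    """Count covered PSVs supporting the functional gene vs. the pseudogene."""
    gene_hits = pseudo_hits = 0
    for pos, (gene_base, pseudo_base) in psvs.items():
        offset = pos - read.start
        if 0 <= offset < len(read.seq):  # skip PSVs the read does not cover
            base = read.seq[offset]
            if base == gene_base:
                gene_hits += 1
            elif base == pseudo_base:
                pseudo_hits += 1
    return gene_hits, pseudo_hits


def assign(read: Read, psvs: Dict[int, Tuple[str, str]]) -> str:
    """Label a read 'gene', 'pseudogene' or 'ambiguous' by majority PSV support."""
    gene_hits, pseudo_hits = score_read(read, psvs)
    if gene_hits > pseudo_hits:
        return "gene"
    if pseudo_hits > gene_hits:
        return "pseudogene"
    return "ambiguous"
```

For example, `assign(Read("r1", 100, "AAAAACAAAA"), PSVS)` returns `"gene"` because the read carries the functional-gene base C at position 105, while a read covering no PSV is left `"ambiguous"`.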

2. Installation

Download the latest release from the GitHub repository (ParkYJ217/pseudoRA) and uncompress the tarball in a suitable directory. The tarball includes the PseudoRA script as well as the third-party software redistributed with PseudoRA (see section 6). The INSTALL.sh script contains the installation steps, including all the external libraries required to run PseudoRA on Ubuntu:

bash </path/to/PseudoRA/>INSTALL.sh

Run the test_install.pl script to check whether the required dependencies are available in your environment:

</path/to/PseudoRA/>utils/test_install.pl
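As a rough illustration of what such a check involves, the sketch below looks up a list of executables on PATH with Python's `shutil.which`. The tool list is an assumption based on the software this README mentions (BWA, samtools, Java for the Picard jar, Python 3); the checks actually performed by test_install.pl may differ.

```python
import shutil
from typing import Dict, List, Optional, Sequence

# Assumed tool list, based on software mentioned in this README (not taken from test_install.pl).
REQUIRED_TOOLS = ("bwa", "samtools", "java", "python3")


def check_tools(tools: Sequence[str] = REQUIRED_TOOLS) -> Dict[str, Optional[str]]:
    """Map each tool to the path where it was found on PATH, or None if it is missing."""
    return {tool: shutil.which(tool) for tool in tools}


def missing_tools(tools: Sequence[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that could not be located on PATH."""
    return [tool for tool, path in check_tools(tools).items() if path is None]
```

Calling `print(missing_tools())` lists any tool that still needs to be installed or added to PATH.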

3. Testing PseudoRA (DEMO)

You can test PseudoRA with the following command:

bash </path/to/PseudoRA/>pseudoRA.sh -t

This runs PseudoRA on demo.bam in the demo folder and writes demo.correct.bam and demo.correct.hc.vcf. The results, visualized in IGV (Integrative Genomics Viewer), are shown below.

Figure: Before PseudoRA on exon 2 of the SBDS gene
Figure: After PseudoRA on exon 2 of the SBDS gene

4. Running scripts

PseudoRA uses hg19/GRCh37 as the reference genome. If you use hg38, you will need to build a custom reference FASTA before running PseudoRA (see section 5).
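Because hg19 and hg38 assign different lengths to chromosome 7 (where SBDS lies), one quick way to tell which build a BAM was aligned to is to inspect the chromosome 7 `@SQ` line of its header. The sketch below assumes the header text was obtained separately, e.g. with `samtools view -H input.bam`, and recognizes only the two chromosome 7 lengths listed.

```python
# Chromosome 7 lengths of the two builds (SBDS lies on chromosome 7).
CHR7_LENGTHS = {
    159138663: "GRCh37/hg19",
    159345973: "GRCh38/hg38",
}


def guess_build(header_text: str) -> str:
    """Return the build implied by the chr7 @SQ line of a SAM/BAM header, or 'unknown'."""
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Turn tab-separated tags such as 'SN:7' and 'LN:159138663' into a dict.
        tags = dict(tag.split(":", 1) for tag in line.split("\t")[1:] if ":" in tag)
        if tags.get("SN") in ("7", "chr7"):
            length = tags.get("LN")
            return CHR7_LENGTHS.get(int(length), "unknown") if length else "unknown"
    return "unknown"
```

A result of `"GRCh38/hg38"` indicates that the default reference does not apply and a custom reference is needed.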

The command for running PseudoRA has the following syntax:

bash </path/to/PseudoRA/>pseudoRA.sh -i <input BAM>

Arguments
-i <string>: Path to the original BAM file (REQUIRED)
-r <string>: Reference FASTA (OPTIONAL) [Default: reference/chr7_b37_SBDS.fa]
-b <string>: Region-of-interest BED (OPTIONAL) [Default: reference/SBDS.bed]
-t: Run the demo (see section 3)
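The options above can be mirrored in Python with `argparse`, which makes the required/optional split and the defaults explicit. This is only an illustration: pseudoRA.sh is a shell script, the `-t` demo flag comes from the demo command in section 3, and treating `-i` as optional when `-t` is given is an assumption about how the demo mode behaves.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented pseudoRA.sh flags and their defaults."""
    parser = argparse.ArgumentParser(description="PseudoRA options (illustrative mirror)")
    parser.add_argument("-i", metavar="BAM", help="path to the original BAM file")
    parser.add_argument("-r", metavar="FASTA", default="reference/chr7_b37_SBDS.fa",
                        help="reference FASTA")
    parser.add_argument("-b", metavar="BED", default="reference/SBDS.bed",
                        help="region-of-interest BED")
    parser.add_argument("-t", action="store_true", help="run the bundled demo")
    return parser


def parse_options(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv; -i is mandatory unless the demo flag -t is given (assumed behavior)."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.t and args.i is None:
        parser.error("-i <input BAM> is required unless -t is given")  # exits with status 2
    return args
```

For example, `parse_options(["-i", "sample.bam"])` fills `r` and `b` with the default SBDS reference and BED paths.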

Output files

Corrected BAM file (only reads belonging to the functional gene)
HaplotypeCaller VCF file
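Section 1 notes that these outputs can be merged into an existing VCF. A minimal sketch of one way to do that for plain-text VCF data lines: drop the existing calls that fall inside the region of interest and substitute the PseudoRA calls. It assumes both files use the same contig names and a single BED interval, ignores headers, and sorts contigs lexicographically for simplicity; a dedicated VCF tool is preferable for real data.

```python
from typing import Iterable, List


def in_region(record: str, chrom: str, start: int, end: int) -> bool:
    """True if a VCF data line falls in a BED interval (0-based, half-open)."""
    fields = record.split("\t")
    pos = int(fields[1])  # VCF POS is 1-based, so BED [start, end) covers start+1..end
    return fields[0] == chrom and start < pos <= end


def merge_calls(existing: Iterable[str], corrected: Iterable[str],
                chrom: str, start: int, end: int) -> List[str]:
    """Replace existing calls inside the region with the corrected calls."""
    kept = [rec for rec in existing if not in_region(rec, chrom, start, end)]
    merged = kept + list(corrected)
    # Sort by contig then position; lexicographic contig order is a simplification.
    return sorted(merged, key=lambda rec: (rec.split("\t")[0], int(rec.split("\t")[1])))
```

With a region of 400 to 600 on contig 7, an existing call at position 500 is discarded and replaced by the corrected call at 510, while calls at 100 and 900 are kept.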

5. Customization for other genes

By default, PseudoRA handles only variants in SBDS and SBDSP1, but you can build a custom reference FASTA for other gene-pseudogene pairs with the commands below. First, prepare a BED file listing the functional gene and its pseudogene(s); make sure the functional gene is on the first line. customization.py then masks all nucleotides except the region corresponding to the functional gene. BWA, samtools and Picard are used to index the output FASTA.

python3 </path/to/PseudoRA/>utils/customization.py -r <reference FASTA> -b <region-of-interest BED> -o <output FASTA>
bwa index <output FASTA>
samtools faidx <output FASTA>
java -jar </path/to/PseudoRA/>jar/picard.jar CreateSequenceDictionary R=<output FASTA> O=<output DICT>
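The masking step that customization.py performs can be sketched as follows. This is not the script's code; it assumes the chromosome sequence is already loaded as a string and that the BED file uses standard 0-based, half-open coordinates with the functional gene on the first line.

```python
from typing import Iterable, List, Tuple


def read_bed(lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Parse BED lines into (chrom, start, end), skipping blank, comment and header lines."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def mask_outside(seq: str, start: int, end: int, mask_char: str = "N") -> str:
    """Replace every base outside [start, end) with mask_char; length is preserved."""
    start, end = max(0, start), min(len(seq), end)
    return mask_char * start + seq[start:end] + mask_char * (len(seq) - end)
```

With a toy functional-gene interval of 2 to 5 (the first BED line), `mask_outside("ACGTACGTAC", 2, 5)` returns `"NNGTANNNNN"`; keeping the sequence length unchanged means coordinates in the masked FASTA still match the original reference.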

6. License and third-party software

PseudoRA is distributed under the GPL-3 license. Additionally, PseudoRA redistributes the following third-party software:

7. Reference

Acknowledgements

Author of pipeline: Yu Jin Park

Principal Investigators: Saeam Shin and Seung-Tae Lee

Institution: Yonsei University, College of Medicine, Department of Laboratory Medicine
