Extract species specific orthogroups from Orthofinder results

JWDebler/orthofinder_extractor

orthofinder_extractor

A Python 3 script that parses the Orthogroups.tsv file produced by Orthofinder and extracts species-specific orthogroups, i.e. orthogroups whose proteins all come from a single one of the species analysed.
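For orientation, here is a minimal sketch of reading Orthogroups.tsv into a mapping from orthogroup ID to protein IDs. It assumes the usual Orthofinder layout (a tab-separated header row starting with "Orthogroup" and naming one column per species, then one row per orthogroup with ", "-separated protein IDs in each species cell). The function name read_orthogroups and the two-row table are hypothetical illustrations, not code from this repository.

```python
import csv
import io


def read_orthogroups(handle):
    """Map each orthogroup ID to the protein IDs it contains.

    Assumed layout: a tab-separated header row that starts with "Orthogroup"
    and names one column per species, then one row per orthogroup whose
    species cells hold ", "-separated protein IDs; a cell is empty when that
    species has no protein in the orthogroup.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row (species column names)
    groups = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        og_id, cells = row[0], row[1:]
        groups[og_id] = [
            protein.strip()
            for cell in cells
            for protein in cell.split(",")
            if protein.strip()
        ]
    return groups


# Made-up two-row table using the species prefixes from the examples below.
example = (
    "Orthogroup\tfile1\tfile2\n"
    "OG0000000\tAlen_Al4_ctg00.g1.t1, Alen_Al4_ctg00.g2.t1\t\n"
    "OG0000001\tAlen_Al4_ctg00.g3.t1\tArab_Me14_ctg00_-_Arab_Me14_ctg00.g1.t1\n"
)
groups = read_orthogroups(io.StringIO(example))
print(groups["OG0000000"])  # ['Alen_Al4_ctg00.g1.t1', 'Alen_Al4_ctg00.g2.t1']
```

Flattening all species cells into one list keeps the sketch independent of column order; species membership is then decided from the protein ID prefixes, as the -p option describes.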

Prerequisites

  • Python3
  • Orthogroups.tsv (created by Orthofinder in the Orthogroups results directory)
  • FASTA files containing the protein sequences that were used to run Orthofinder

Usage

usage: extract_species_specific_orthogroups.py [-h] [-p PREFIX [PREFIX ...]] [-i INPUT] [-o OUTPUT] [-f FASTA]

-h: show the help message and exit
-p: one or more protein ID prefixes, used in the FASTA headers, that distinguish the species
-i: path to the Orthogroups.tsv file
-o: path to the directory where the extracted orthogroups are saved
-f: path to the directory containing the FASTA files
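The usage line has the shape that Python's argparse prints, so the interface can be sketched with argparse as below. This is a hypothetical re-creation, not the script's actual source: only the flags from the usage line are defined, and the defaults (Orthogroups.tsv and the current directory) are assumptions chosen so that a call without -i, -f, or -o behaves like usage example 1.

```python
import argparse


def build_parser():
    """Hypothetical re-creation of the command-line interface shown above.

    Only the flags from the usage line are defined; the default values are
    assumptions, not taken from the original script.
    """
    parser = argparse.ArgumentParser(
        prog="extract_species_specific_orthogroups.py",
        description="Extract species specific orthogroups from Orthofinder results",
    )
    parser.add_argument("-p", dest="prefix", metavar="PREFIX", nargs="+",
                        help="protein ID prefixes that distinguish the species")
    parser.add_argument("-i", dest="input", metavar="INPUT", default="Orthogroups.tsv",
                        help="path to Orthogroups.tsv")
    parser.add_argument("-o", dest="output", metavar="OUTPUT", default=".",
                        help="directory for the extracted orthogroup FASTA files")
    parser.add_argument("-f", dest="fasta", metavar="FASTA", default=".",
                        help="directory containing the input FASTA files")
    return parser


args = build_parser().parse_args(["-p", "Alen", "Arab", "-i", "/path/to/Orthogroups.tsv"])
print(args.prefix, args.input, args.fasta)  # ['Alen', 'Arab'] /path/to/Orthogroups.tsv .
```

Because -p uses nargs="+", it consumes every following word until the next flag, which is why the prefixes can be listed one after another.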

Usage example 1

Without -i, -f, or -o, the script expects Orthogroups.tsv and all FASTA files to be in the same directory.
The FASTA headers begin with a species prefix (here Alen and Arab):

file1.fasta:

>Alen_Al4_ctg00.g1.t1
MPTGDKLIEIKYSDAVHKFSNWWIE...
...

file2.fasta:

>Arab_Me14_ctg00_-_Arab_Me14_ctg00.g1.t1
MLHQLDRIVIDECHVLLELTQDWRP...
...
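To extract orthogroups, the sequences have to be looked up by protein ID. A minimal FASTA reader could look like the sketch below. It assumes that the IDs in Orthogroups.tsv match the full header text after ">", as they do in the examples above; read_fasta is a hypothetical helper name.

```python
import io


def read_fasta(handle):
    """Return {protein_id: sequence} for one FASTA stream.

    The protein ID is the full header text after '>' (an assumption about how
    the IDs in Orthogroups.tsv relate to the headers). Sequence lines are
    concatenated, so wrapped sequences are supported.
    """
    records = {}
    current_id = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            current_id = line[1:].strip()
            chunks = []
        elif current_id is not None:
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)  # flush the last record
    return records


# Headers and truncated sequences from the two example files, combined into
# one stream; the second sequence is wrapped over two lines on purpose.
fasta_text = (
    ">Alen_Al4_ctg00.g1.t1\n"
    "MPTGDKLIEIKYSDAVHKFSNWWIE\n"
    ">Arab_Me14_ctg00_-_Arab_Me14_ctg00.g1.t1\n"
    "MLHQLDRIVIDE\n"
    "CHVLLELTQDWRP\n"
)
records = read_fasta(io.StringIO(fasta_text))
print(records["Arab_Me14_ctg00_-_Arab_Me14_ctg00.g1.t1"])  # MLHQLDRIVIDECHVLLELTQDWRP
```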

Command:

extract_species_specific_orthogroups.py -p Alen Arab

This parses Orthogroups.tsv, finds the orthogroups in which every protein starts with Alen or every protein starts with Arab, and then uses the FASTA files to write each of those orthogroups to its own FASTA file.
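The selection and extraction step can be sketched as follows, assuming orthogroups are already loaded as {orthogroup: [protein IDs]} and sequences as {protein ID: sequence}. The helper names, the first-matching-prefix rule and the <orthogroup>.fasta file naming are assumptions for illustration; the original script may differ in these details.

```python
import tempfile
from pathlib import Path


def species_specific(groups, prefixes):
    """Return {orthogroup: prefix} for orthogroups whose proteins all share one prefix.

    A protein is assigned to the first prefix it starts with; orthogroups that
    contain a protein matching none of the prefixes are skipped.
    """
    selected = {}
    for og_id, proteins in groups.items():
        owners = {next((p for p in prefixes if prot.startswith(p)), None) for prot in proteins}
        if len(owners) == 1 and None not in owners:
            selected[og_id] = owners.pop()
    return selected


def write_orthogroups(selected, groups, sequences, out_dir):
    """Write one FASTA file per selected orthogroup, named <orthogroup>.fasta."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for og_id in selected:
        with open(out / f"{og_id}.fasta", "w") as fh:
            for prot in groups[og_id]:
                fh.write(f">{prot}\n{sequences[prot]}\n")


# Made-up orthogroups and sequences for illustration.
groups = {
    "OG0000000": ["Alen_a", "Alen_b"],
    "OG0000001": ["Alen_c", "Arab_d"],
    "OG0000002": ["Arab_e", "Arab_f"],
}
sequences = {"Alen_a": "MPTG", "Alen_b": "MKLV", "Alen_c": "MSSA",
             "Arab_d": "MLHQ", "Arab_e": "MAAR", "Arab_f": "MTTE"}

selected = species_specific(groups, ["Alen", "Arab"])
print(selected)  # {'OG0000000': 'Alen', 'OG0000002': 'Arab'}
with tempfile.TemporaryDirectory() as tmp:
    write_orthogroups(selected, groups, sequences, tmp)
    written = sorted(p.name for p in Path(tmp).iterdir())
print(written)  # ['OG0000000.fasta', 'OG0000002.fasta']
```

With plain prefix matching, overlapping prefixes (for example Al and Alen) would be ambiguous, so in this sketch prefixes should not be prefixes of one another.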

Usage example 2

  • provide the path to Orthogroups.tsv
  • provide the path to the FASTA files
  • provide the path to the output directory

extract_species_specific_orthogroups.py -p Alen Arab -i /path/to/Orthogroups.tsv -f /path/to/directory/containing/fastas/ -o /path/to/output/directory/

This command:

  • uses Alen and Arab as the prefixes to look for in the Orthogroups.tsv file
  • uses the Orthogroups.tsv file located at /path/to/Orthogroups.tsv
  • reads the FASTA files in /path/to/directory/containing/fastas/
  • saves the output files to /path/to/output/directory/
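Since -f points at a directory rather than a single file, the sequences from all FASTA files have to be pooled before extraction. The sketch below does this under two assumptions: FASTA files are recognised by the extensions .fasta, .fa or .faa (the README only shows .fasta), and protein IDs are unique across files, which the species prefixes suggest. load_fasta_dir is a hypothetical helper name.

```python
import tempfile
from pathlib import Path

# Assumed FASTA file extensions; the README only shows ".fasta".
FASTA_SUFFIXES = {".fasta", ".fa", ".faa"}


def load_fasta_dir(fasta_dir):
    """Pool every FASTA file in fasta_dir into one {protein_id: sequence} dict.

    Raises ValueError if the same protein ID appears twice, since the species
    prefixes are expected to make IDs unique across files.
    """
    chunks = {}
    for path in sorted(Path(fasta_dir).iterdir()):
        if path.suffix.lower() not in FASTA_SUFFIXES:
            continue  # skip Orthogroups.tsv and any other non-FASTA files
        current = None
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if line.startswith(">"):
                    current = line[1:].strip()
                    if current in chunks:
                        raise ValueError(f"duplicate protein ID {current!r} in {path.name}")
                    chunks[current] = []
                elif line and current is not None:
                    chunks[current].append(line)
    return {pid: "".join(parts) for pid, parts in chunks.items()}


# Recreate the layout of usage example 1 in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "file1.fasta").write_text(">Alen_Al4_ctg00.g1.t1\nMPTGDKLIEIKYSDAVHKFSNWWIE\n")
    Path(tmp, "file2.fasta").write_text(">Arab_Me14_ctg00_-_Arab_Me14_ctg00.g1.t1\nMLHQLDRIVIDECHVLLELTQDWRP\n")
    Path(tmp, "Orthogroups.tsv").write_text("Orthogroup\tfile1\tfile2\n")
    sequences = load_fasta_dir(tmp)
print(sorted(sequences))  # ['Alen_Al4_ctg00.g1.t1', 'Arab_Me14_ctg00_-_Arab_Me14_ctg00.g1.t1']
```

Filtering by extension lets Orthogroups.tsv share the directory with the FASTA files, as in usage example 1.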
