README

AnnotationHive

Annotation is the process by which pertinent information about raw DNA sequences is added to genome databases. Multiple software applications (e.g., ANNOVAR, SnpEff) have been developed to annotate genetic variants that can be derived automatically from diverse genomes. Existing tools have two shortcomings. First, users must download the software along with large build files. Second, the tools scale poorly: because they are mainly sequential, or parallel only at the node level (requiring a large machine with many cores and a large main memory), annotating large numbers of patients is tedious and takes a significant amount of time.
The pay-as-you-go model of cloud computing, which eliminates the maintenance effort required for a high performance computing (HPC) facility while simultaneously offering elastic scalability, is well suited for genomic analysis.
In this project, we developed a cloud-based annotation engine that annotates input datasets (e.g., VCF, mVCF files) in the cloud using distributed algorithms.
Version 1.0

Quickstart

Install the Google Cloud SDK, including the gcloud tool.
Set up the gcloud tool.
```
gcloud init
```
Authentication
```
gcloud auth application-default login
```

Clone this repo.

git clone https://github.com/StanfordBioinformatics/AnnotationHive.git

Install Maven.
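
Before moving on, it can help to confirm that the tools named in this Quickstart are actually on your `PATH`. The snippet below is a minimal sketch; `check_tools` is a hypothetical helper written for this README, not something shipped with AnnotationHive.

```shell
# Hypothetical helper: report which of the given commands are on PATH.
# Returns 0 if all are found, 1 if at least one is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# The Quickstart above relies on gcloud, git, and Maven (mvn).
check_tools gcloud git mvn || echo "Install the missing tools before continuing."
```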

Containerized Version

Create a container.

docker run -it annotationhive/annotationhive_public:v1.6 bash

Authentication
```
gcloud auth application-default login
```
Set your GCP project
```
gcloud config set project <PROJECT-ID>
```

Section 1: Import VCF/mVCF/Annotation Files

This section explains how to import VCF, mVCF and annotation files to BigQuery.
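
To give a feel for what a tabular view of a VCF file looks like, the sketch below pulls the fixed CHROM, POS, REF, and ALT columns out of a VCF file as tab-separated rows. It is only an illustration under simplifying assumptions (uncompressed input, multi-allelic ALT values left unsplit); `vcf_to_tsv` is a hypothetical helper and is not AnnotationHive's importer.

```shell
# Hypothetical helper: print CHROM, POS, REF, ALT for each VCF record.
# Assumes an uncompressed, tab-delimited VCF file passed as $1.
vcf_to_tsv() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    /^#/ { next }                 # skip "##" meta lines and the "#CHROM" header
    { print $1, $2, $4, $5 }      # columns: CHROM, POS, REF, ALT
  ' "$1"
}

# Example usage (file names are placeholders):
#   vcf_to_tsv sample.vcf > sample.tsv
```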

Section 2: List Available Public Annotation Datasets

This part of the code demonstrates how to list AnnotationHive's public datasets.

Section 3: Variant-based Annotation

This section explains how to annotate a VCF/mVCF table against any number of variant-based annotation datasets.
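
Conceptually, variant-based annotation matches each variant to annotation records describing the same variant. The sketch below shows that idea on two small local TSV files, assuming a match means identical chromosome, position, reference allele, and alternate allele. The file layouts and the `annotate_variants` helper are hypothetical; AnnotationHive itself runs this kind of matching in the cloud, and its exact matching rules are not reproduced here.

```shell
# Hypothetical helper: exact-match join on (chrom, pos, ref, alt).
# $1: annotation TSV with columns chrom, pos, ref, alt, label (assumed non-empty)
# $2: variant TSV with columns chrom, pos, ref, alt
# Prints each variant followed by its label, or "." when nothing matches.
annotate_variants() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    NR == FNR { ann[$1 FS $2 FS $3 FS $4] = $5; next }   # first file: index annotations
    {
      key = $1 FS $2 FS $3 FS $4
      label = (key in ann) ? ann[key] : "."
      print $0, label
    }
  ' "$1" "$2"
}

# Example usage (file names are placeholders):
#   annotate_variants annotations.tsv variants.tsv
```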

Section 4: Interval-based Annotation

This section explains how to annotate a VCF/mVCF table against any number of interval-based annotation datasets.
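
Interval-based annotation differs from the variant-based case in that a variant matches any annotation whose genomic interval contains its position. The following sketch illustrates this with a naive scan over local TSV files. It assumes 1-based, closed intervals `[start, end]`, and the `annotate_intervals` helper and file layouts are hypothetical rather than AnnotationHive's implementation.

```shell
# Hypothetical helper: label each variant with every interval that contains it.
# $1: interval TSV with columns chrom, start, end, label (assumed non-empty)
# $2: variant TSV with columns chrom, pos, ref, alt
# Intervals are assumed 1-based and closed. Every variant is compared with every
# interval, which is fine for a small illustration but not for large inputs.
annotate_intervals() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    NR == FNR {                       # first file: load intervals
      n++
      chrom[n] = $1; start[n] = $2 + 0; stop[n] = $3 + 0; label[n] = $4
      next
    }
    {
      hits = ""
      pos = $2 + 0
      for (i = 1; i <= n; i++)
        if (chrom[i] == $1 && pos >= start[i] && pos <= stop[i])
          hits = hits (hits == "" ? "" : ",") label[i]
      print $0, (hits == "" ? "." : hits)
    }
  ' "$1" "$2"
}

# Example usage (file names are placeholders):
#   annotate_intervals intervals.tsv variants.tsv
```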

Section 5: Variant-based and Interval-based Annotation

This section explains how to run a combination of interval-based and variant-based annotation datasets.

Section 6: Gene-based Annotation

This section demonstrates how to run our gene-based annotation process for a VCF/mVCF table.

Section 7: Sample Experiments

This section presents several experiments on scalability and the cost of the system.

Section 8: Export Annotated VCF Table

This section explains how to export an annotated VCF file.

Section 9: Annotate a Small Number of Variants or Regions

This section explains how to annotate a small number of regions/variants.

Section 10: Import Private Annotation Datasets

This section explains how to import private annotation datasets.
