-
Notifications
You must be signed in to change notification settings - Fork 10
Description
I'm trying to use the VariantAnnotation package to annotate a VCF file from nf-core/sarek (UCSC style) using AnnotationHub EnsDb
objects (Ensembl style), where I encountered the following issues:
- The VariantAnnotation package does not accept
EnsDb
objects. I raised this (Allow 'ANY' txdb, eg. EnsDb objects Bioconductor/VariantAnnotation#74), but it will probably not get fixed; I worked around this by providing my own S4 method in my code - The
seqlevelsStyles()
do not match
For point (2), I can change the EnsDb
style to UCSC
:
ens106 = AnnotationHub::AnnotationHub()[["AH100643"]]
seqlevelsStyle(ens106) = "UCSC"
and then either load UCSC-style genome or change the Ensembl-style genome to UCSC:
asm = BSgenome.Hsapiens.NCBI.GRCh38::BSgenome.Hsapiens.NCBI.GRCh38
seqlevelsStyle(asm) = "UCSC"
asm = BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38
However, this does not work, because the genome()
of the EnsDb
object is GRCh38
, while the one of the assembly is hg38
, raising an assertion error in VariantAnnotation. So this needs an additional line changing the internal state of the S4 genome object (we can't change the EnsDb
object):
genome(ens106)[] = "hg38"
# Error: unable to find an inherited method for function 'seqinfo<-' for signature 'x = "EnsDb"'
asm@seqinfo@genome[] = "GRCh38" # works, but is messing with internals
# this was previously the default, but they explicitly changed it
I'm not sure what a good solution is here. It seems to be that a check if they genomes are identical (as performed by VariantAnnotation) is reasonable. I'm raising this issue more to document it rather than suggesting a change in ensembldb.