Skip to content

Compatibility between EnsDb objects and BSgenome objects #159

@mschubert

Description

@mschubert

I'm trying to use the VariantAnnotation package to annotate a VCF file from nf-core/sarek (UCSC style) using AnnotationHub EnsDb objects (Ensembl style), where I encountered the following issues:

  1. The VariantAnnotation package does not accept EnsDb objects. I raised this (Allow 'ANY' txdb, eg. EnsDb objects Bioconductor/VariantAnnotation#74), but it will probably not get fixed; I worked around this by providing my own S4 method in my code
  2. The seqlevelsStyles() do not match

For point (2), I can change the EnsDb style to UCSC:

ens106 = AnnotationHub::AnnotationHub()[["AH100643"]]
seqlevelsStyle(ens106) = "UCSC"

and then either load UCSC-style genome or change the Ensembl-style genome to UCSC:

asm = BSgenome.Hsapiens.NCBI.GRCh38::BSgenome.Hsapiens.NCBI.GRCh38
seqlevelsStyle(asm) = "UCSC"

asm = BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38

However, this does not work, because the genome() of the EnsDb object is GRCh38, while the one of the assembly is hg38, raising an assertion error in VariantAnnotation. So this needs an additional line changing the internal state of the S4 genome object (we can't change the EnsDb object):

genome(ens106)[] = "hg38"
# Error: unable to find an inherited method for function 'seqinfo<-' for signature 'x = "EnsDb"'

asm@seqinfo@genome[] = "GRCh38" # works, but is messing with internals
# this was previously the default, but they explicitly changed it

I'm not sure what a good solution is here. It seems to be that a check if they genomes are identical (as performed by VariantAnnotation) is reasonable. I'm raising this issue more to document it rather than suggesting a change in ensembldb.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions