seqApply results in dims error #98

@gnxsf

@zhengxwen
I get an "incorrect dimensions" error when I try to run the seqApply function on my GDS object. Here is the code I'm using:

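library(SeqArray)
library(stringr)

vcf_fn <- "Variant_set.vcf.gz"
# fixed() matches ".vcf.gz" literally instead of as a regex
gds_fn <- str_replace(vcf_fn, fixed(".vcf.gz"), ".gds")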

seqVCF2GDS(
  vcf.fn = vcf_fn,
  out.fn = gds_fn,
  fmt.import = c("AD", "DP", "GQ")
)

seqOptimize(
  gds_fn,
  target = "by.sample",
  verbose = TRUE
)

genofile <- seqOpen(gds_fn)

seqApply(
  genofile,
  "genotype",
  as.is = "list",
  margin = "by.sample",
  FUN = print
)

The error I receive is:

Error in seqApply(genofile, "genotype", as.is = "list", margin = "by.sample", : dims [product 11112236] do not match the length of object [11163598]

but I'm not sure exactly what this means or how to troubleshoot it. I've tried this on two different VCF files and get the same error, which makes me suspect a bug. Other functions (e.g. seqGetData) run without issue on the same GDS object. For context, these VCFs contain a mix of biallelic and multi-allelic variants, as well as SNPs and InDels. Unfortunately, I cannot share my VCF data as it's proprietary, but any suggestions on what to troubleshoot would be much appreciated.

I also calculated the length of the object manually, and it does match the product (11112236 per sample), so I suspect a bug in how seqApply computes the length of the object.

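# genotype array with dimensions ploidy x sample x variant
geno_array <- seqGetData(genofile, "genotype")
# number of genotype values per sample (ploidy * number of variants)
apply(geno_array, 2, length)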
 [1] 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236
[15] 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236
[29] 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236
[43] 11112236 11112236 11112236 11112236 11112236 11112236
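For anyone reading along, here is a small base-R sketch of what the check above measures. It uses a made-up toy array whose sizes are arbitrary; it only stands in for the real `seqGetData(genofile, "genotype")` result, assumed to be laid out as ploidy x sample x variant. Each per-sample slice then holds ploidy x n_variant values. The last lines just do the arithmetic on the two numbers in the error message and, assuming diploid genotypes, the variant count they imply.

```r
# Toy stand-in for seqGetData(genofile, "genotype"):
# a ploidy x sample x variant array (sizes are arbitrary)
ploidy <- 2L
n_sample <- 3L
n_variant <- 5L
geno_toy <- array(0L, dim = c(ploidy, n_sample, n_variant))

# Per-sample length, computed the same way as apply(geno_array, 2, length)
per_sample_len <- apply(geno_toy, 2, length)
print(per_sample_len)  # 10 10 10

# Each sample slice holds ploidy * n_variant genotype values
stopifnot(all(per_sample_len == ploidy * n_variant))

# Gap between the two numbers reported in the error message
gap <- 11163598 - 11112236
print(gap)  # 51362

# Implied number of variants, assuming ploidy = 2 (an assumption, not checked)
n_variant_implied <- 11112236 / 2
print(n_variant_implied)  # 5556118
```

So the "product" in the error agrees with the per-sample length from seqGetData, and seqApply appears to be seeing 51362 extra values per sample.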

Edit: the GDS object passes both the hash and dimension checks in seqCheck:

Hash check:
    sample.id:  ‘md5’ [OK]
    variant.id: ‘md5’ [OK]
    chromosome: ‘md5’ [OK]
    position:   ‘md5’ [OK]
    allele:     ‘md5’ [OK]
    genotype/data:      ‘md5’ [OK]
    genotype/@data:     ‘md5’ [OK]
    phase/data: ‘md5’ [OK]
    annotation/id:      ‘md5’ [OK]
    annotation/qual:    ‘md5’ [OK]
    annotation/filter:  ‘md5’ [OK]
    annotation/info/BaseQRankSum:       ‘md5’ [OK]
    annotation/info/DP: ‘md5’ [OK]
    annotation/info/MQ: ‘md5’ [OK]
    annotation/info/MQRankSum:  ‘md5’ [OK]
    annotation/info/QD: ‘md5’ [OK]
    annotation/info/ReadPosRankSum:     ‘md5’ [OK]
    annotation/info/SOR:        ‘md5’ [OK]
    annotation/format/AD/data:  ‘md5’ [OK]
    annotation/format/AD/@data: ‘md5’ [OK]
    annotation/format/DP/data:  ‘md5’ [OK]
    annotation/format/DP/@data: ‘md5’ [OK]
    annotation/format/GQ/data:  ‘md5’ [OK]
    annotation/format/GQ/@data: ‘md5’ [OK]
Dimension check:
    variant.id  [OK]
    position    [OK]
    chromosome  [OK]
    allele      [OK]
    annotation/id       [OK]
    annotation/qual     [OK]
    annotation/filter   [OK]
    annotation/info/BaseQRankSum        [OK]
    annotation/info/DP  [OK]
    annotation/info/MQ  [OK]
    annotation/info/MQRankSum   [OK]
    annotation/info/QD  [OK]
    annotation/info/ReadPosRankSum      [OK]
    annotation/info/SOR [OK]
