@zhengxwen
I get an "incorrect dimensions" error when I try to run the seqApply function on my gds object. Here is the code I'm using:
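```r
library(SeqArray)
library(stringr)  # provides str_replace() and fixed()

vcf_fn <- "Variant_set.vcf.gz"
# fixed() treats ".vcf.gz" as a literal string rather than a regex
gds_fn <- str_replace(vcf_fn, fixed(".vcf.gz"), ".gds")

# Convert the VCF to GDS, importing only the AD, DP and GQ FORMAT fields
seqVCF2GDS(
  vcf.fn = vcf_fn,
  out.fn = gds_fn,
  fmt.import = c("AD", "DP", "GQ")
)

# Add a sample-major copy of the data so it can be read by sample
seqOptimize(
  gds_fn,
  target = "by.sample",
  verbose = TRUE
)

genofile <- seqOpen(gds_fn)

# Iterate over samples; this is the call that fails
seqApply(
  genofile,
  "genotype",
  as.is = "list",
  margin = "by.sample",
  FUN = print
)
```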
The error I receive is:

```
Error in seqApply(genofile, "genotype", as.is = "list", margin = "by.sample", : dims [product 11112236] do not match the length of object [11163598]
```
I'm not sure exactly what this means or how to troubleshoot it. I've tried this on two different VCF files and get the same error both times, which makes me suspect a bug. Other functions (e.g. seqGetData) run without issue on the same gds object. For context, these VCFs contain a mix of biallelic and multi-allelic variants, including both SNPs and InDels. Unfortunately, I cannot share my VCF data as it's proprietary, but any suggestions on what to troubleshoot would be much appreciated.
I also manually calculated the length of the object for each sample, and it does match the product, so I think there must be a bug in the way "length" is being calculated in seqApply.
```r
geno_array <- seqGetData(genofile, "genotype")
apply(geno_array, 2, length)
```

```
 [1] 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236
[15] 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236
[29] 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236 11112236
[43] 11112236 11112236 11112236 11112236 11112236 11112236
```
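As a quick sanity check on the two numbers in the error message, here is a small base-R sketch of the arithmetic. It assumes the calls are diploid (ploidy of 2), which is an assumption rather than something read from the file, and the variable names are only for illustration.

```r
# Numbers copied from the error message
dims_product  <- 11112236  # length seqApply expects per sample
object_length <- 11163598  # length of the object it actually got

ploidy <- 2  # assumption: diploid genotype calls

# Number of variants implied by the expected per-sample length
n_variants <- dims_product / ploidy

# How many extra values the object holds beyond the expected length
surplus <- object_length - dims_product

# The same surplus expressed per allele slot
surplus_per_slot <- surplus / ploidy

cat("variants implied by dims product:", n_variants, "\n")
cat("surplus values per sample:", surplus, "\n")
cat("surplus / ploidy:", surplus_per_slot, "\n")
```

The surplus comes to 51362 values per sample, or 25681 per allele slot under the diploid assumption. One thing that might be worth comparing against that figure is the number of variants in some subset of the file (for example the multi-allelic ones), though that is only a guess at where the extra values come from.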
Edit: the gds object passes the dimension check with the seqCheck function:

```
Hash check:
    sample.id: ‘md5’ [OK]
    variant.id: ‘md5’ [OK]
    chromosome: ‘md5’ [OK]
    position: ‘md5’ [OK]
    allele: ‘md5’ [OK]
    genotype/data: ‘md5’ [OK]
    genotype/@data: ‘md5’ [OK]
    phase/data: ‘md5’ [OK]
    annotation/id: ‘md5’ [OK]
    annotation/qual: ‘md5’ [OK]
    annotation/filter: ‘md5’ [OK]
    annotation/info/BaseQRankSum: ‘md5’ [OK]
    annotation/info/DP: ‘md5’ [OK]
    annotation/info/MQ: ‘md5’ [OK]
    annotation/info/MQRankSum: ‘md5’ [OK]
    annotation/info/QD: ‘md5’ [OK]
    annotation/info/ReadPosRankSum: ‘md5’ [OK]
    annotation/info/SOR: ‘md5’ [OK]
    annotation/format/AD/data: ‘md5’ [OK]
    annotation/format/AD/@data: ‘md5’ [OK]
    annotation/format/DP/data: ‘md5’ [OK]
    annotation/format/DP/@data: ‘md5’ [OK]
    annotation/format/GQ/data: ‘md5’ [OK]
    annotation/format/GQ/@data: ‘md5’ [OK]
Dimension check:
    variant.id [OK]
    position [OK]
    chromosome [OK]
    allele [OK]
    annotation/id [OK]
    annotation/qual [OK]
    annotation/filter [OK]
    annotation/info/BaseQRankSum [OK]
    annotation/info/DP [OK]
    annotation/info/MQ [OK]
    annotation/info/MQRankSum [OK]
    annotation/info/QD [OK]
    annotation/info/ReadPosRankSum [OK]
    annotation/info/SOR [OK]
```