Jagged kmer coverage profiles with gzipped FASTA #46

@warrenlr

We discovered inconsistencies in the kmer histograms of two experimental ONT datasets, depending on whether the FASTA input file was uncompressed or gzip-compressed*. In independent runs testing different k values (16, 18, 20, 22, 25), the gzipped FASTA read files of both ONT datasets (NA19240 [PRJEB29523] and NA12878 [SRR10965087]) yielded jagged, uninterpretable kmer profiles. The problem is exacerbated at higher k values. We observed the issue with ntcard v1.1.1, v1.2.1 and v1.2.2.

NA12878 ONT FASTA
[Image: HG12878_FASTAlog10]

NA12878 ONT FASTA GZIPPED
[Image: HG12878_GZFASTA_log10]

====

NA19240 ONT FASTA
[Image: NA19240log10FASTAuncompressed]

NA19240 ONT FASTA GZIPPED
[Image: NA19240log10FASTAcompressed]

*We have only observed this with FASTA files, not FASTQ files, and only when using experimental nanopore data.
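
For anyone trying to reproduce this, an independent check may help separate a damaged gzip file from a reader problem: count kmers exactly from both the plain and the gzipped FASTA and confirm the two histograms match. The sketch below is not how ntcard works (ntcard estimates the histogram by streaming; this counts exactly in memory), and the function names `open_maybe_gzip`, `read_fasta` and `kmer_histogram` are hypothetical helpers written for this illustration. It assumes canonical kmers (a kmer and its reverse complement counted together) and skips kmers containing non-ACGT characters; both are assumptions, not confirmed ntcard behaviour.

```python
import gzip
from collections import Counter

# Translation table used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def open_maybe_gzip(path):
    """Open a text file, decompressing if it starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_fasta(path):
    """Yield each FASTA sequence as one uppercase string (wrapped lines joined)."""
    chunks = []
    with open_maybe_gzip(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if chunks:
                    yield "".join(chunks).upper()
                chunks = []
            elif line:
                chunks.append(line)
    if chunks:
        yield "".join(chunks).upper()


def kmer_histogram(path, k):
    """Return a Counter mapping multiplicity -> number of distinct canonical kmers.

    Exact in-memory counting: only suitable for a small subsample of reads.
    """
    valid = set("ACGT")
    counts = Counter()
    for seq in read_fasta(path):
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= valid:
                # Canonical form: lexicographically smaller of kmer and its
                # reverse complement (an assumption made for this sketch).
                rc = kmer.translate(_COMPLEMENT)[::-1]
                counts[min(kmer, rc)] += 1
    return Counter(counts.values())
```

Running `kmer_histogram("subset.fa", 20) == kmer_histogram("subset.fa.gz", 20)` on a small subsample of the reads (the file names here are placeholders) should return `True` if the two files hold the same sequences. If it does, and the jagged profile still appears only with the gzipped input, that would point toward how the compressed stream is being read rather than toward the data.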
