
Test k-mer frequency distribution idea #21

@dkoslicki


The goal is a fast way to reconstruct k-mer count distributions (similar to what publications like this one, and the references therein, attempt to do).

The current code base already keeps track of the counts of the k-mers in each sketch (see here and here).
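For concreteness, here is a minimal sketch of how a bottom-s MinHash sketch can carry k-mer counts and yield a count histogram. It is not the repository's implementation: the hash (BLAKE2b truncated to 64 bits), forward-strand-only k-mers, and the names `hash64`, `sketch_with_counts` and `count_histogram` are hypothetical choices made for illustration.

```python
import hashlib
import heapq
from collections import Counter

# Hypothetical illustration, not the project's actual sketching code.


def hash64(kmer: str) -> int:
    """Stable 64-bit hash of a k-mer (first 8 bytes of BLAKE2b)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch_with_counts(seq: str, k: int, s: int) -> dict:
    """Streaming bottom-s MinHash sketch that also counts each retained k-mer.

    Returns {hash: count} for the s smallest distinct hash values in seq.
    The admission threshold only ever decreases, so a k-mer that ends up in
    the final sketch was admitted at its first occurrence and its count is exact.
    """
    counts = {}  # hash -> count, for k-mers currently in the sketch
    heap = []    # max-heap of the sketch's hashes (stored negated)
    for i in range(len(seq) - k + 1):
        h = hash64(seq[i:i + k])
        if h in counts:
            counts[h] += 1
        elif len(heap) < s:
            counts[h] = 1
            heapq.heappush(heap, -h)
        elif h < -heap[0]:
            # Evict the current largest hash and admit h.
            evicted = -heapq.heapreplace(heap, -h)
            del counts[evicted]
            counts[h] = 1
    return counts


def count_histogram(counts: dict) -> dict:
    """Fraction of distinct k-mers (sketched or all) that occur exactly c times."""
    hist = Counter(counts.values())
    n = sum(hist.values())
    return {c: hist[c] / n for c in sorted(hist)}
```

When `s` is at least the number of distinct k-mers, `count_histogram(sketch_with_counts(seq, k, s))` matches the histogram of the exact counts; shrinking `s` gives the approximation whose quality the project would measure.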

The project would be to:

  • compare the histogram of sketch k-mer counts with the actual k-mer count histogram
  • implement a metric to quantify the difference between the two distributions (probably the total variation distance, the Wasserstein metric, or a simple L1 metric)
  • compare how these metrics change as the sketch size is increased
  • compare to existing methods that reconstruct k-mer count distributions (e.g. this one and the methods it compares to)
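A minimal sketch of the three candidate metrics, assuming each histogram is a Python dict mapping a count value c to the fraction of distinct k-mers observed exactly c times. The function names are hypothetical; if SciPy is available, `scipy.stats.wasserstein_distance` (with the count values as `u_values`/`v_values` and the fractions as weights) could replace the hand-written W1.

```python
def l1_distance(p: dict, q: dict) -> float:
    """Sum over all count values c of |p(c) - q(c)|; missing keys count as 0."""
    support = set(p) | set(q)
    return sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in support)


def total_variation(p: dict, q: dict) -> float:
    """Total variation distance: half the L1 distance, so it lies in [0, 1]."""
    return 0.5 * l1_distance(p, q)


def wasserstein_1(p: dict, q: dict) -> float:
    """1-Wasserstein (earth mover's) distance between distributions on the real line.

    Equals the integral of |F_p(x) - F_q(x)|; both CDFs are step functions
    that stay constant between consecutive support points.
    """
    support = sorted(set(p) | set(q))
    cdf_p = cdf_q = 0.0
    total = 0.0
    for left, right in zip(support, support[1:]):
        cdf_p += p.get(left, 0.0)
        cdf_q += q.get(left, 0.0)
        total += abs(cdf_p - cdf_q) * (right - left)
    return total
```

Total variation is exactly half the L1 distance, so those two rank sketch sizes identically. W1 additionally reflects how far off the counts are: moving mass from count 2 to count 3 costs less than moving it from 2 to 100, which may matter for heavy-tailed count distributions.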

This would be sufficient for a conference paper.

Optional: for a journal paper, we would also need to:

  • characterize (and prove) how quickly the estimated distribution converges to the true one as a function of sketch size (not too difficult, but it would take a bit of probability work)
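One possible starting point for the convergence argument, assuming the sketch behaves like a uniform random sample of s of the n distinct k-mers drawn without replacement (an idealization of bottom-s MinHash with a random hash function, ignoring hash collisions). Here K denotes the number of distinct count values c with p(c) > 0.

```latex
% Setup: n >= 2 distinct k-mers with counts a_1, ..., a_n; the sketch S holds s of them.
\[
  p(c) = \frac{1}{n}\,\bigl|\{ j : a_j = c \}\bigr|, \qquad
  \hat p(c) = \frac{1}{s}\,\bigl|\{ j \in S : a_j = c \}\bigr|
\]
% s \hat p(c) is hypergeometric (population n, n p(c) successes, s draws), so
\[
  \mathbb{E}[\hat p(c)] = p(c), \qquad
  \operatorname{Var}[\hat p(c)] = \frac{p(c)\,\bigl(1 - p(c)\bigr)}{s} \cdot \frac{n - s}{n - 1}.
\]
% Jensen on each term, drop the factor 1 - p(c) <= 1, then Cauchy--Schwarz over the K values of c:
\[
  \mathbb{E}\,\lVert \hat p - p \rVert_1
  \le \sum_{c} \sqrt{\operatorname{Var}[\hat p(c)]}
  \le \sqrt{\frac{n - s}{s\,(n - 1)}} \sum_{c} \sqrt{p(c)}
  \le \sqrt{\frac{K\,(n - s)}{s\,(n - 1)}},
  \qquad
  \mathbb{E}\, d_{\mathrm{TV}}(\hat p, p) = \tfrac{1}{2}\, \mathbb{E}\,\lVert \hat p - p \rVert_1 .
\]
```

Under this idealization the expected total variation error shrinks like O(sqrt(K/s)) and is exactly 0 when s = n. A full treatment would still need a high-probability version (e.g. via Hoeffding's inequality, which Hoeffding showed also holds for sampling without replacement) and some control of K, which can be large for heavy-tailed count distributions.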
