Membership Testing False Positive Rate: Count Min Sketch

Each Count-Min sketch is of size 4096 bits. We vary the depth and width of each sketch, while maintaining its size at 4096 bits, to see how the false positive rate changes. (Assuming we store each count in the count table as an integer, the size of a Count-Min sketch with depth d and width w is 32 * dw bits.)

Simulated FP Rate

Let's also plot the expected false positive rate as a black line on top of the simulated one.

Expected and Simulated FP Rate

And here's a zoomed in version:

Zoomed

False Positive Rate

Here's a derivation of the expected false positive rate, i.e., the probability that a new element is claimed as already being in the set. Let d be the depth of the Count-Min sketch (i.e., the number of hashes used), let w be the width of the sketch (so that the counts table has size d*w), and let there be n distinct elements seen so far (note that, in practice, we usually do not know the exact value of n when running a CMS).

On inserting a new element x, the probability we erroneously claim it as an existing element is:

P(false positive)
  = P(each of the d hashes maps x to a non-zero slot)
  = P(hash h_0 maps x to a non-zero slot)^d [since these are identical distributions]
  = [1 - P(no previous element has been mapped by h_0 to the same slot as x)]^d
  = [1 - (1 - 1/w)^n]^d

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Membership Testing False Positive Rate: Count Min Sketch

False Positive Rate

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally