From 83694e78b5ae825601bd17da1da4e151de6d73a3 Mon Sep 17 00:00:00 2001
From: Gary Gregory
Bloom filters are generally used where hash tables would be too large, or as a filter front end for longer processes.
- * For example most browsers have a Bloom filter that is built from all known bad URLs (ones that serve up malware).
+ * For example most browsers have a Bloom filter that is built from all known bad URLs (ones that serve up malicious software).
 * When you enter a URL the browser builds a Bloom filter and checks to see if it is "in" the bad URL filter. If not the
 * URL is good, if it matches, then the expensive lookup on a remote system is made to see if it actually is in the
 * list. There are lots of other uses, and in most cases the reason is to perform a fast check as a gateway for a longer
 * operation.
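The gateway pattern described in this paragraph can be sketched with the JDK alone. This is a toy illustration, not the library's implementation: the class `GatewaySketch`, its method names, the index derivation from `String.hashCode()`, and the in-memory set standing in for the remote list are all assumptions made for the example.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: a Bloom filter used as a cheap front end for an expensive lookup. */
final class GatewaySketch {
    private final BitSet bits;
    private final int numberOfBits;
    private final int numberOfHashFunctions;
    /** Stands in for the authoritative list held on a remote system. */
    private final Set<String> authoritative = new HashSet<>();

    GatewaySketch(int numberOfBits, int numberOfHashFunctions) {
        this.bits = new BitSet(numberOfBits);
        this.numberOfBits = numberOfBits;
        this.numberOfHashFunctions = numberOfHashFunctions;
    }

    /** Toy index derivation (an assumption): simple double hashing over String.hashCode(). */
    private int index(String url, int i) {
        int h1 = url.hashCode();
        int h2 = (h1 >>> 16) | 1; // odd step taken from the same hash
        return Math.floorMod(h1 + i * h2, numberOfBits);
    }

    void addBadUrl(String url) {
        authoritative.add(url);
        for (int i = 0; i < numberOfHashFunctions; i++) {
            bits.set(index(url, i));
        }
    }

    /** Fast check: false means definitely not listed, true means possibly listed. */
    boolean mightBeBad(String url) {
        for (int i = 0; i < numberOfHashFunctions; i++) {
            if (!bits.get(index(url, i))) {
                return false;
            }
        }
        return true;
    }

    /** The expensive lookup runs only when the filter reports a possible match. */
    boolean isBad(String url) {
        return mightBeBad(url) && authoritative.contains(url);
    }
}
```

A false positive from `mightBeBad` only costs one extra lookup. A negative answer is always correct, which is what makes the filter usable as a gateway.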
* - *Some Bloom filters (e.g. CountingBloomFilter) use counters rather than bits. In this case each counter + *
Some Bloom filters (e.g. {@link CountingBloomFilter}) use counters rather than bits. In this case each counter * is called a {@code cell}.
 *
- *There is an obvious association between the BitMap and the Index, as defined above, in that if bit 5 is enabled in the
- * BitMap than the Index must contain the value 5.
+ *There is an obvious association between the bit map and the Index, as defined above, in that if bit 5 is enabled in the
+ * bit map then the Index must contain the value 5.
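The association between the two representations can be shown with a small JDK-only sketch. It assumes a bit map stored as an array of `long` words with bit 0 in the lowest bit of the first word. The class `BitMapIndexSketch` and its methods are hypothetical names for illustration and are not the library's Extractor API.

```java
import java.util.Arrays;

/** Hypothetical sketch: converting between a long[] bit map and a list of indices. */
final class BitMapIndexSketch {

    /** Returns the indices of all enabled bits, in ascending order. */
    static int[] toIndices(long[] bitMap) {
        int[] out = new int[bitMap.length * Long.SIZE];
        int n = 0;
        for (int word = 0; word < bitMap.length; word++) {
            long bits = bitMap[word];
            while (bits != 0) {
                out[n++] = word * Long.SIZE + Long.numberOfTrailingZeros(bits);
                bits &= bits - 1; // clear the lowest enabled bit
            }
        }
        return Arrays.copyOf(out, n);
    }

    /** Builds a bit map of numberOfBits bits with the given indices enabled. */
    static long[] toBitMap(int numberOfBits, int... indices) {
        long[] bitMap = new long[(numberOfBits + Long.SIZE - 1) / Long.SIZE];
        for (int index : indices) {
            bitMap[index / Long.SIZE] |= 1L << (index % Long.SIZE);
        }
        return bitMap;
    }

    public static void main(String[] args) {
        long[] bitMap = toBitMap(72, 5, 64);
        System.out.println(Arrays.toString(toIndices(bitMap))); // prints [5, 64]
    }
}
```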
 *
 *
 *The architecture is designed so that the implementation of the storage of bits is abstracted. Rather than specifying a
- * specific state representation we require that all Bloom filters implement the BitMapExtractor and IndexExtractor interfaces,
- * Counting-based Bloom filters implement {@code CellExtractor} as well. There are static
+ * specific state representation we require that all Bloom filters implement the {@link BitMapExtractor} and {@link IndexExtractor} interfaces,
+ * Counting-based Bloom filters implement {@link CellExtractor} as well. There are static
 * methods in the various Extractor interfaces to convert from one type to another.
* - *Programs that utilize the Bloom filters may use the {@code BitMapExtractor} or {@code IndexExtractor} to retrieve + *
Programs that utilize the Bloom filters may use the {@link BitMapExtractor} or {@link IndexExtractor} to retrieve * or process a representation of the internal structure. - * Additional methods are available in the {@code BitMaps} class to assist in manipulation of BitMap representations.
+ * Additional methods are available in the {@link BitMaps} class to assist in manipulation of bit map representations. * *The Bloom filter is an interface that requires implementation of 9 methods:
*The counting Bloom filter extends the Bloom filter by counting the number of times a specific bit has been + *
The {@link CountingBloomFilter} extends the Bloom filter by counting the number of times a specific bit has been * enabled or disabled. This allows the removal (opposite of merge) of Bloom filters at the expense of additional * overhead.
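Why counters make removal possible can be seen in a minimal JDK-only sketch. It assumes each item is already reduced to a set of unique indices and that only previously added items are removed. The class `CountingSketch` is a hypothetical name and does not reflect the library's `CountingBloomFilter` API.

```java
/** Hypothetical sketch: cells hold counts, so a merge can later be undone. */
final class CountingSketch {
    private final int[] cells;

    CountingSketch(int numberOfBits) {
        this.cells = new int[numberOfBits];
    }

    /** Merge: increment the cell at each index of the item. */
    void add(int... indices) {
        for (int i : indices) {
            cells[i]++;
        }
    }

    /** Removal: decrement the same cells. Assumes the item was added earlier. */
    void remove(int... indices) {
        for (int i : indices) {
            cells[i]--;
        }
    }

    /** An item may be present only if every one of its cells is non-zero. */
    boolean contains(int... indices) {
        for (int i : indices) {
            if (cells[i] == 0) {
                return false;
            }
        }
        return true;
    }
}
```

With plain bits, clearing an index shared by two items would also erase the second item. Counts keep the shared cells non-zero until both items are removed, at the cost of one `int` per position instead of one bit.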
* *The layered Bloom filter extends the Bloom filter by creating layers of Bloom filters that can be queried as a single + *
The {@link LayeredBloomFilter} extends the Bloom filter by creating layers of Bloom filters that can be queried as a single * Filter or as a set of filters. This adds the ability to perform windowing on streams of data.
* *The Shape describes the Bloom filter using the number of bits and the number of hash functions. It can be specified + *
The {@link Shape} describes the Bloom filter using the number of bits and the number of hash functions. It can be specified * by the number of expected items and desired false positive rate.
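As an illustration of deriving a shape from the expected item count n and the desired false positive rate p, the sketch below uses the standard textbook formulas m = ceil(-n ln p / (ln 2)^2) and k = round((m / n) ln 2). The class `ShapeSketch` is hypothetical, and the library's own `Shape` may round or validate differently.

```java
/** Hypothetical sketch: number of bits (m) and hash functions (k) from n and p. */
final class ShapeSketch {

    /** m = ceil(-n * ln(p) / (ln 2)^2) */
    static int numberOfBits(int numberOfItems, double falsePositiveRate) {
        double ln2 = Math.log(2);
        return (int) Math.ceil(-numberOfItems * Math.log(falsePositiveRate) / (ln2 * ln2));
    }

    /** k = round((m / n) * ln 2), with a minimum of one hash function. */
    static int numberOfHashFunctions(int numberOfItems, int numberOfBits) {
        long k = Math.round((double) numberOfBits / numberOfItems * Math.log(2));
        return (int) Math.max(1, k);
    }

    public static void main(String[] args) {
        int m = numberOfBits(1000, 0.01);
        int k = numberOfHashFunctions(1000, m);
        System.out.println(m + " bits, " + k + " hash functions"); // prints 9586 bits, 7 hash functions
    }
}
```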
* *A Hasher converts bytes into a series of integers based on a Shape. Each hasher represents one item being added + *
A {@link Hasher} converts bytes into a series of integers based on a Shape. Each hasher represents one item being added * to the Bloom filter.
 *
- *The EnhancedDoubleHasher uses a combinatorial generation technique to create the integers. It is easily
- * initialized by using a byte array returned by the standard {@code MessageDigest} or other hash function to
- * initialize the Hasher. Alternatively a pair of a long values may also be used.
+ *The {@link EnhancedDoubleHasher} uses a combinatorial generation technique to create the integers. It is easily
+ * initialized by using a byte array returned by the standard {@link java.security.MessageDigest} or other hash function to
+ * initialize the Hasher. Alternatively, a pair of long values may also be used.
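The following JDK-only sketch shows one way to generate indices with the published enhanced double hashing scheme, where the i-th index is (a + i*b + (i^3 - i)/6) mod m. It is offered as an illustration under assumptions and is not necessarily how `EnhancedDoubleHasher` is implemented. The class `EnhancedDoubleHashSketch`, its method names, and the choice of MD5 to obtain the two long values are all assumptions.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of enhanced double hashing. */
final class EnhancedDoubleHashSketch {

    /** Generates k indices in [0, m) from two long values. */
    static int[] fromLongs(long initial, long increment, int numberOfBits, int numberOfHashFunctions) {
        int[] result = new int[numberOfHashFunctions];
        // Treat both inputs as unsigned so the remainders are never negative.
        long x = Long.remainderUnsigned(initial, numberOfBits);
        long y = Long.remainderUnsigned(increment, numberOfBits);
        for (int i = 0; i < numberOfHashFunctions; i++) {
            result[i] = (int) x;
            x = (x + y) % numberOfBits;     // advance by the current step
            y = (y + i + 1) % numberOfBits; // grow the step; this yields the cubic term
        }
        return result;
    }

    /** Splits a 16-byte MD5 digest of the item into the two long values. */
    static int[] fromBytes(byte[] item, int numberOfBits, int numberOfHashFunctions) {
        try {
            ByteBuffer digest = ByteBuffer.wrap(MessageDigest.getInstance("MD5").digest(item));
            return fromLongs(digest.getLong(), digest.getLong(), numberOfBits, numberOfHashFunctions);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }
}
```

The generated sequence can contain repeated values, so a caller that needs unique indices still has to filter out duplicates.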
* - *Other implementations of the Hasher are easy to implement, and should make use of the {@code Hasher.Filter} + *
Other implementations of the {@link Hasher} are easy to implement, and should make use of the {@code Hasher.Filter} * and/or {@code Hasher.FileredIntConsumer} classes to filter out duplicate indices when implementing * {@code Hasher.uniqueIndices(Shape)}.
*