Conceived by Burton Howard Bloom in 1970, a Bloom filter is a
probabilistic data structure used to test whether an element is a member
of a set.
Bloom filters have a strong space advantage over other data
structures such as a Java Set, in that each element uses the same amount
of space, no matter its actual size. For example, a string of 32
characters takes up the same amount of memory in a Bloom filter as a
string of 1024 characters, which is drastically different from other
data structures. Bloom filters are introduced as part of a pattern in
Bloom Filtering.
While the data structure itself has vast memory advantages, it is not always 100% accurate. False positives are possible, but false negatives are not: the result of each membership test is either a definitive “no” or a “maybe,” never a definitive “yes.”

With a traditional Bloom filter, elements can be added to the set, but not removed. A number of Bloom filter variants address this limitation, such as the Counting Bloom Filter, but they typically require more memory. As more elements are added to the set, the probability of false positives increases. Bloom filters also cannot be resized like other data structures: once a filter has been sized and trained, it cannot be reverse-engineered to recover the original set, nor resized while still representing the same data set.
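The behavior described above can be sketched in a few lines of Java. This is a minimal, illustrative implementation, not a production one: the class name, the choice of double hashing to derive multiple hash values, and the bit-rotation second hash are all assumptions made for the sketch. Note how a test can return false (definitely absent) or true (maybe present), but never a guaranteed “yes,” and how there is no remove operation.

```java
import java.util.BitSet;

// Minimal Bloom filter sketch (hypothetical class, for illustration only).
public class BloomFilterSketch {
    private final BitSet bits;
    private final int size;      // m: number of bits in the filter
    private final int numHashes; // k: number of hash functions

    public BloomFilterSketch(int size, int numHashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.numHashes = numHashes;
    }

    // Derive the i-th bit index via double hashing: h1 + i*h2 mod m.
    // The rotated second hash is an assumption chosen for simplicity.
    private int index(String item, int i) {
        int h1 = item.hashCode();
        int h2 = (h1 >>> 16) | (h1 << 16);
        return Math.floorMod(h1 + i * h2, size);
    }

    // Adding an element sets k bits; bits are never cleared,
    // which is why a traditional Bloom filter cannot remove elements.
    public void add(String item) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(item, i));
        }
    }

    // false => definitely not in the set; true => "maybe" in the set.
    public boolean mightContain(String item) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(item, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1024, 3);
        filter.add("alice");
        filter.add("bob");
        System.out.println(filter.mightContain("alice")); // true ("maybe")
        // "carol" was never added; printing true here would be a false positive
        System.out.println(filter.mightContain("carol"));
    }
}
```

Every added element occupies the same k bits of the filter regardless of its byte length, which is the fixed-space property noted earlier; the trade-off is that as the bit array fills up, the chance that an unrelated element happens to hit k already-set bits (a false positive) grows.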
The following variables are used in the more detailed explanation of a Bloom filter below: ...