User:AlinaValta/sandbox

= Approximate Membership Query Filter =
Approximate Membership Query Filters (AMQ-Filters) are a group of space-efficient probabilistic data structures that support approximate membership queries. An approximate membership query determines whether an element is definitely not in a set or probably in the set. For an element that is not in the set, the query wrongly answers "probably in the set" with probability $$\epsilon$$, the false positive rate.

The best-known AMQ-Filter is the Bloom filter, but other AMQ-Filters support additional operations or have different space requirements.

AMQ-Filters have numerous applications, mainly in distributed systems and databases.

Approximate Membership Query Problem
The approximate membership query problem is to maintain a set of elements $$S$$ in a space-efficient way. All AMQ-Filters support the operations insert and lookup. Some AMQ-Filters support additional operations, such as deleting elements or merging two filters.

Lookup
A lookup determines whether an element is definitely not in the set or probably in the set:

$$s \in S$$: always returns true.

$$s \notin S$$: returns false with probability $$1-\epsilon$$.

A false positive occurs when a lookup returns true for an element that is not in the set. The probability of this happening is the false positive rate $$\epsilon$$. False negatives (a lookup returns false although the element is in the set) are not allowed for AMQ-Filters.

Insertion
AMQ-Filters support dynamic insertions. After an element has been inserted, every subsequent lookup for this element must return true.
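The insert and lookup contract can be illustrated with a deliberately simple AMQ-Filter. The sketch below is a hypothetical `FingerprintSet`, not one of the data structures discussed in this article: assuming SHA-256 as the hash function, it stores a $$b$$-bit fingerprint of every inserted element in a Python set. Inserted elements always pass the lookup, and a non-member is reported as "probably in the set" only if its fingerprint matches a stored one, which for $$n$$ inserted elements happens with probability at most $$n/2^b$$.

```python
import hashlib


class FingerprintSet:
    """Hypothetical minimal AMQ-Filter, used only to illustrate insert/lookup.
    It keeps a `bits`-bit fingerprint of every inserted element."""

    def __init__(self, bits: int) -> None:
        self.bits = bits
        self.fingerprints = set()  # stored b-bit fingerprints

    def _fingerprint(self, element: str) -> int:
        digest = hashlib.sha256(element.encode()).digest()
        # Keep only the lowest `bits` bits of the hash as the fingerprint.
        return int.from_bytes(digest[:8], "big") & ((1 << self.bits) - 1)

    def insert(self, element: str) -> None:
        self.fingerprints.add(self._fingerprint(element))

    def lookup(self, element: str) -> bool:
        # True means "probably in the set"; False means "definitely not".
        return self._fingerprint(element) in self.fingerprints


members = [f"member-{i}" for i in range(1000)]
f = FingerprintSet(bits=16)
for m in members:
    f.insert(m)
assert all(f.lookup(m) for m in members)  # no false negatives

trials = 100_000
fp_rate = sum(f.lookup(f"other-{i}") for i in range(trials)) / trials
print(f"empirical false positive rate: {fp_rate:.4f}")  # bounded by 1000 / 2**16
```

Python's `set` adds far more overhead per element than the $$b$$ bits of a fingerprint, so this only demonstrates the interface, not the space efficiency that real AMQ-Filters achieve.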

False positive rate vs. space
There is a trade-off between storage size and the false positive rate $$\epsilon$$: increasing the storage space reduces the false positive rate. The theoretical lower bound is $$\log_2 (1/\epsilon)$$ bits per element. Different AMQ-Filters cover different ranges of false positive rates and space requirements, so the best choice depends on the application.
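As a worked example, the sketch below compares the lower bound with the space used by a Bloom filter with the optimal number of hash functions. The Bloom filter figure, $$\log_2(e) \cdot \log_2(1/\epsilon) \approx 1.44 \log_2(1/\epsilon)$$ bits per element, is a standard result that is not derived in this article; the function names are illustrative.

```python
import math


def lower_bound_bits(eps: float) -> float:
    """Information-theoretic minimum bits per element: log2(1/eps)."""
    return math.log2(1 / eps)


def bloom_bits(eps: float) -> float:
    """Bits per element of a Bloom filter with the optimal number of hash
    functions: log2(e) * log2(1/eps), about 1.44 * log2(1/eps)."""
    return math.log2(math.e) * math.log2(1 / eps)


for eps in (0.1, 0.01, 0.001):
    print(f"eps={eps}: lower bound {lower_bound_bits(eps):.2f} bits, "
          f"Bloom filter {bloom_bits(eps):.2f} bits")
```

For $$\epsilon = 0.01$$ the lower bound is about 6.64 bits per element, while the Bloom filter needs about 9.59 bits per element.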

Data Structures
There are different ways to solve the approximate membership query problem. The best-known data structures are Bloom filters, but other data structures perform better for some combinations of false positive rate and space, support additional operations, or offer different insertion and lookup times. Some well-known AMQ-Filters are:

Bloom filter
A Bloom filter is a bit array of $$m$$ bits with $$k$$ hash functions. Each hash function maps an element to one of the $$m$$ positions in the array. Initially, all bits of the array are set to zero. To insert an element, all $$k$$ hash functions are evaluated and all corresponding bits are set to one. To look up an element, the $$k$$ hash functions are evaluated as well; the lookup returns true only if all corresponding bits are set. The false positive rate decreases as $$m$$ grows relative to the number of inserted elements $$n$$; for fixed $$m$$ and $$n$$ it is minimized by choosing $$k \approx (m/n) \ln 2$$.
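A minimal Python sketch of the insert and lookup procedure follows. Deriving the $$k$$ positions from two parts of one SHA-256 digest by double hashing, $$(h_1 + i \cdot h_2) \bmod m$$, is an assumption of this sketch; any $$k$$ hash functions would do. The sizing $$m = -n \ln \epsilon / (\ln 2)^2$$ and $$k = (m/n) \ln 2$$ is the standard choice for a target false positive rate $$\epsilon$$ with $$n$$ elements. The class name `BloomFilter` is hypothetical.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # m bits, all zero initially

    def _positions(self, element: str) -> list[int]:
        # Double hashing: k positions from two 64-bit values of one digest.
        digest = hashlib.sha256(element.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def insert(self, element: str) -> None:
        for p in self._positions(element):
            self.bits[p // 8] |= 1 << (p % 8)  # set bit p to one

    def lookup(self, element: str) -> bool:
        # True only if every one of the k bits is set.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(element))


n, eps = 10_000, 0.01
m = math.ceil(-n * math.log(eps) / math.log(2) ** 2)
k = max(1, round(m / n * math.log(2)))
bf = BloomFilter(m, k)
for i in range(n):
    bf.insert(f"item-{i}")
assert all(bf.lookup(f"item-{i}") for i in range(n))  # no false negatives
fp_rate = sum(bf.lookup(f"other-{i}") for i in range(n)) / n
print(m, k, fp_rate)  # fp_rate should be close to eps
```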

Quotient filter
The idea of Quotient filters is to hash an element and to split its fingerprint into the $$r$$ least significant bits, called the remainder $$d_R$$, and the remaining most significant bits, called the quotient $$d_Q$$. The quotient determines the position in the hash table where the remainder is stored. Three additional bits per slot are used to resolve soft collisions (same quotient but different remainders). Hard collisions (same quotient and same remainder) can lead to false positives.
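The split into quotient and remainder can be sketched as follows. This is a hypothetical simplification, not a real Quotient filter: instead of the three metadata bits and linear probing, each slot holds a small Python set of remainders, so soft collisions are resolved trivially. It does show why only hard collisions cause false positives. SHA-256 and the class name `QuotientingSketch` are assumptions.

```python
import hashlib


class QuotientingSketch:
    """Simplified quotienting: the q-bit quotient d_Q selects a slot and the
    r-bit remainder d_R is stored there (metadata bits replaced by a set)."""

    def __init__(self, q: int, r: int) -> None:
        self.q, self.r = q, r
        self.slots = [set() for _ in range(1 << q)]  # one slot per quotient

    def _split(self, element: str) -> tuple[int, int]:
        digest = hashlib.sha256(element.encode()).digest()
        fingerprint = int.from_bytes(digest[:8], "big") & ((1 << (self.q + self.r)) - 1)
        d_q = fingerprint >> self.r              # most significant q bits
        d_r = fingerprint & ((1 << self.r) - 1)  # least significant r bits
        return d_q, d_r

    def insert(self, element: str) -> None:
        d_q, d_r = self._split(element)
        self.slots[d_q].add(d_r)

    def lookup(self, element: str) -> bool:
        d_q, d_r = self._split(element)
        # A false positive requires a hard collision: same quotient AND remainder.
        return d_r in self.slots[d_q]


qs = QuotientingSketch(q=10, r=8)
for i in range(500):
    qs.insert(f"k{i}")
print(qs.lookup("k0"), qs.lookup("not-inserted"))
```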

Quotient filters use space comparable to Bloom filters, but two Quotient filters can be merged without affecting their false positive rate.
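Merging works on the stored fingerprints rather than on the original elements. Assuming both filters use the same hash function and the same fingerprint length, and that scanning a Quotient filter's slots yields its fingerprints in sorted order, merging reduces to one merge-sort style pass. The sketch below models each filter only as a sorted list of fingerprints and leaves out rebuilding the slot layout and metadata bits; `merge_fingerprints` and `contains` are hypothetical helper names.

```python
import heapq
from bisect import bisect_left


def merge_fingerprints(a: list[int], b: list[int]) -> list[int]:
    """Merge two sorted fingerprint lists in one linear pass, dropping duplicates."""
    merged: list[int] = []
    for f in heapq.merge(a, b):
        if not merged or merged[-1] != f:
            merged.append(f)
    return merged


def contains(fingerprints: list[int], f: int) -> bool:
    """Membership test on a sorted fingerprint list via binary search."""
    i = bisect_left(fingerprints, f)
    return i < len(fingerprints) and fingerprints[i] == f


fp_a = [3, 17, 42, 99]
fp_b = [5, 17, 64]
merged = merge_fingerprints(fp_a, fp_b)
print(merged)  # [3, 5, 17, 42, 64, 99]
```

Dropping duplicate fingerprints is sufficient for membership queries; a filter that also supports deletion would need to keep duplicates.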

Cuckoo filter
Cuckoo filters are based on Cuckoo hashing, but only fingerprints of the elements are stored in the hash table. Each element has two possible locations. The second location is calculated from the first location and the element's fingerprint. This is necessary because the element itself is not stored: when both possible locations for a new element are full, an already inserted fingerprint must be moved to its alternate location, which therefore has to be computable from the current location and the fingerprint alone.
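In the original Cuckoo filter design (partial-key cuckoo hashing), the second location is $$i_2 = i_1 \oplus \mathrm{hash}(\text{fingerprint})$$. With a power-of-two number of buckets, applying the same rule to $$i_2$$ leads back to $$i_1$$, so either location is enough to find the other. The sketch assumes SHA-256, 8-bit fingerprints and 1024 buckets; all names are hypothetical.

```python
import hashlib

NUM_BUCKETS = 1 << 10  # power of two, so masking keeps XOR results in range
MASK = NUM_BUCKETS - 1


def _hash(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def fingerprint(element: str) -> int:
    # 8-bit fingerprint; 0 is avoided so it could mark an empty slot.
    return (_hash(element.encode()) & 0xFF) or 1


def index1(element: str) -> int:
    return _hash(b"idx" + element.encode()) & MASK


def alt_index(i: int, fp: int) -> int:
    # Uses only the current bucket and the fingerprint, never the element.
    return (i ^ _hash(bytes([fp]))) & MASK


key = "example-key"
fp = fingerprint(key)
i1 = index1(key)
i2 = alt_index(i1, fp)
assert alt_index(i2, fp) == i1  # either bucket leads back to the other
```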

After reaching a certain load threshold, insertions into a Cuckoo filter slow down. An insertion may also fail, in which case the table must be rehashed. Bloom filters always have constant insertion time, but their false positive rate increases as the load increases.

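A Cuckoo filter supports deleting an element, provided it is known for certain that the element was inserted; otherwise a matching fingerprint that belongs to a different element could be removed, causing a false negative. This is an advantage over the standard Bloom filter, which does not support deletion. Quotient filters also support deletion.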
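The insertion, lookup and deletion behaviour described above can be sketched as a small Python class. The parameters (8-bit fingerprints, buckets of 4 slots, 1024 buckets, at most 500 evictions) and the single "victim" slot that keeps a fingerprint which found no free place are assumptions of this sketch; the class and method names are hypothetical. When an insertion runs out of evictions, the homeless fingerprint is parked so that no false negative arises, and further insertions are refused until a deletion frees space.

```python
import hashlib
import random


class CuckooFilter:
    """Minimal cuckoo filter sketch: 8-bit fingerprints, buckets of 4 slots."""

    def __init__(self, num_buckets: int = 1024, bucket_size: int = 4,
                 max_kicks: int = 500) -> None:
        # A power-of-two bucket count keeps i ^ hash(fp) inside the table.
        assert num_buckets & (num_buckets - 1) == 0
        self.mask = num_buckets - 1
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.victim = None  # (bucket, fingerprint) that found no free slot

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _alt(self, i: int, fp: int) -> int:
        # The alternate bucket depends only on the current bucket and the fingerprint.
        return (i ^ self._hash(bytes([fp]))) & self.mask

    def _locate(self, element: str) -> tuple[int, int, int]:
        h = self._hash(element.encode())
        fp = (h & 0xFF) or 1  # 8-bit fingerprint, never 0
        i1 = (h >> 8) & self.mask
        return fp, i1, self._alt(i1, fp)

    def _place(self, fp: int, i1: int, i2: int) -> None:
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return
        # Both buckets full: evict a random fingerprint and move it to its other bucket.
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            j = random.randrange(self.bucket_size)
            fp, self.buckets[i][j] = self.buckets[i][j], fp
            i = self._alt(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return
        self.victim = (i, fp)  # give up, but keep the fingerprint for lookups

    def _victim_matches(self, fp: int, i1: int, i2: int) -> bool:
        return self.victim is not None and self.victim[1] == fp and self.victim[0] in (i1, i2)

    def insert(self, element: str) -> bool:
        if self.victim is not None:
            return False  # filter is full; a real implementation would rebuild it
        self._place(*self._locate(element))
        return True

    def lookup(self, element: str) -> bool:
        fp, i1, i2 = self._locate(element)
        return self._victim_matches(fp, i1, i2) or fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, element: str) -> bool:
        """Remove one copy of the element's fingerprint. Only call this for
        elements that were actually inserted."""
        fp, i1, i2 = self._locate(element)
        if self._victim_matches(fp, i1, i2):
            self.victim = None
            return True
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                if self.victim is not None:
                    # A slot was freed, so try to place the parked fingerprint again.
                    vi, vfp = self.victim
                    self.victim = None
                    self._place(vfp, vi, self._alt(vi, vfp))
                return True
        return False


cf = CuckooFilter()
for i in range(1000):
    cf.insert(f"key-{i}")
cf.delete("key-0")
print(cf.lookup("key-1"))  # True: inserted and not deleted
```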

Application
Typical applications of AMQ-Filters are distributed systems and database systems. The AMQ-Filter acts as a proxy for the set of keys of a database or remote memory. Before a potentially slow query to the database is performed, the AMQ-Filter is used to check whether the key may be in the database. The query is only performed if the AMQ-Filter returns true, so only a false positive leads to an unnecessary I/O operation or remote access. Applications are numerous and include packet and resource routing, peer-to-peer (P2P) networks, and distributed caching.
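The proxy pattern can be sketched in a few lines, assuming a Python `dict` as a stand-in for the slow database and a small Bloom filter (double hashing over SHA-256, $$m = 2^{16}$$ bits, $$k = 5$$) as the AMQ-Filter. `FilteredStore`, `positions` and `slow_queries` are hypothetical names. The counter records how many requests actually reach the backing store; for absent keys, every such request is a false positive.

```python
import hashlib


def positions(key: str, m: int, k: int) -> list[int]:
    # k bit positions derived from one SHA-256 digest by double hashing.
    d = hashlib.sha256(key.encode()).digest()
    h1 = int.from_bytes(d[:8], "big")
    h2 = int.from_bytes(d[8:16], "big") | 1
    return [(h1 + i * h2) % m for i in range(k)]


class FilteredStore:
    def __init__(self, backing: dict[str, str], m: int = 1 << 16, k: int = 5) -> None:
        self.backing = backing  # stands in for the slow database
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)
        self.slow_queries = 0
        for key in backing:
            for p in positions(key, m, k):
                self.bits[p // 8] |= 1 << (p % 8)

    def get(self, key: str):
        if not all(self.bits[p // 8] >> (p % 8) & 1 for p in positions(key, self.m, self.k)):
            return None  # definitely absent: the slow query is skipped
        self.slow_queries += 1  # a true hit or a false positive
        return self.backing.get(key)


store = FilteredStore({f"user-{i}": f"profile-{i}" for i in range(5000)})
for i in range(10_000):
    store.get(f"missing-{i}")
print(store.slow_queries)  # only false positives reached the backing store
```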