User:Guanaco96/sandbox

Definitions
Consider a family $$ \mathcal F$$ of functions $$h\colon M \to S$$. We say that $$ \mathcal F $$ is an LSH family for
 * a metric space $$\mathcal M =(M, d)$$,
 * a threshold $$R>0$$,
 * an approximation factor $$c>1$$,
 * and probabilities $$P_1 > P_2$$,

if it satisfies the following condition. For any two points $$p, q \in M$$ and a hash function $$h$$ chosen uniformly at random from $$\mathcal F$$:
 * If $$d(p,q) \le R$$, then $$h(p)=h(q)$$ (i.e., $p$ and $q$ collide) with probability at least $$P_1$$,
 * If $$d(p,q) \ge cR$$, then $$h(p)=h(q)$$ with probability at most $$P_2$$.

Such a family $$\mathcal F$$ is called $$(R,cR,P_1,P_2)$$-sensitive.
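
As an illustration, the definition can be checked on a small example. The sketch below assumes the bit-sampling family on binary vectors under the Hamming distance, where each hash function returns one fixed coordinate; this family is not named in the text above and is used here only as a concrete instance. The helper names (`bit_sampling_family`, `collision_probability`) and the chosen values of $R$ and $c$ are hypothetical. Because the family is finite, the collision probability is computed exactly by enumerating every function rather than by sampling.

```python
from fractions import Fraction


def hamming(p, q):
    """Hamming distance between two equal-length bit tuples."""
    return sum(a != b for a, b in zip(p, q))


def bit_sampling_family(dim):
    """Assumed example family: one hash function per coordinate, h_i(x) = x[i]."""
    return [lambda x, i=i: x[i] for i in range(dim)]


def collision_probability(family, p, q):
    """Exact Pr[h(p) == h(q)] for h drawn uniformly at random from the family."""
    hits = sum(h(p) == h(q) for h in family)
    return Fraction(hits, len(family))


dim = 8
family = bit_sampling_family(dim)

p = (0, 0, 0, 0, 0, 0, 0, 0)
near = (1, 0, 0, 0, 0, 0, 0, 0)  # Hamming distance 1 from p
far = (1, 1, 1, 1, 0, 0, 0, 0)   # Hamming distance 4 from p

# Illustrative parameters: for bit sampling, points at distance t
# collide with probability exactly 1 - t/dim.
R, c = 1, 4
P1 = 1 - Fraction(R, dim)      # lower bound for d(p, q) <= R
P2 = 1 - Fraction(c * R, dim)  # upper bound for d(p, q) >= cR

assert hamming(p, near) <= R and collision_probability(family, p, near) >= P1
assert hamming(p, far) >= c * R and collision_probability(family, p, far) <= P2
```

With these values the family is $$(1, 4, 7/8, 1/2)$$-sensitive on the two pairs tested; the check covers only the sample points, not all of $M$.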

Alternatively, an LSH scheme can be defined with respect to a universe of items $U$ equipped with a similarity function $$\phi\colon U \times U \to [0,1]$$. In this formulation, an LSH scheme is a family of hash functions $H$ coupled with a probability distribution $D$ over the functions such that a function $$h \in H$$ chosen according to $D$ satisfies $$\Pr_{h \in H} [h(a) = h(b)] = \phi(a,b)$$ for any $$a,b \in U$$.
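
A minimal sketch of the similarity-based definition follows, assuming min-wise hashing with the Jaccard similarity as $$\phi$$; neither is named in the text above, and both are used here only as a well-known instance. Here $H$ is the set of hash functions induced by all permutations of a small universe and $D$ is the uniform distribution over them. The function names and the example sets are hypothetical. The probability is computed exactly by enumerating all permutations, which is feasible only for tiny universes.

```python
from fractions import Fraction
from itertools import permutations


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two non-empty sets."""
    return Fraction(len(a & b), len(a | b))


def minhash_collision_probability(a, b, universe):
    """Exact Pr[h(a) == h(b)] where h is induced by a uniformly random
    permutation of the universe and h(s) is the smallest rank in s."""
    items = sorted(universe)
    hits = total = 0
    for perm in permutations(range(len(items))):
        rank = dict(zip(items, perm))  # one hash function per permutation
        h_a = min(rank[x] for x in a)
        h_b = min(rank[x] for x in b)
        hits += h_a == h_b
        total += 1
    return Fraction(hits, total)


universe = {"a", "b", "c", "d", "e"}
A = {"a", "b", "c"}
B = {"b", "c", "d"}

# The collision probability equals the similarity, as the definition requires.
assert minhash_collision_probability(A, B, universe) == jaccard(A, B)
```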

Locality-preserving hashing
A locality-preserving hash is a hash function $f$ that maps points in a metric space $$\mathcal M =(M, d)$$ to a scalar value such that
 * $$d(p,q) < d(q,r) \Rightarrow |f(p) - f(q)| < |f(q) - f(r)|$$

for any three points $$p,q,r \in M$$.

In other words, these are hash functions where the relative distance between the input values is preserved in the relative distance between the output hash values; input values that are closer to each other will produce output hash values that are closer to each other.
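
The condition can be tested by brute force on a finite sample of points. The sketch below assumes the metric space is the real line with $$d(x,y) = |x-y|$$; the checker `is_locality_preserving` and the two candidate functions are hypothetical names chosen for illustration. An affine map with non-zero slope passes the check, while a modular scrambling function does not.

```python
from itertools import product


def is_locality_preserving(f, points, dist):
    """Check d(p,q) < d(q,r)  =>  |f(p)-f(q)| < |f(q)-f(r)|
    for every triple (p, q, r) drawn from the sample points."""
    for p, q, r in product(points, repeat=3):
        if dist(p, q) < dist(q, r) and not abs(f(p) - f(q)) < abs(f(q) - f(r)):
            return False
    return True


def dist(x, y):
    """Assumed metric: absolute difference on the real line."""
    return abs(x - y)


def scale(x):
    """Affine map; it multiplies every distance by 3, so order is kept."""
    return 3 * x + 5


def scramble(x):
    """Modular mixing; nearby inputs may land far apart."""
    return (x * 7) % 13


points = [0, 1, 3, 7, 12]

assert is_locality_preserving(scale, points, dist)
assert not is_locality_preserving(scramble, points, dist)
```

A passing check on a sample does not prove the property for the whole space; it can only find counterexamples.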

This is in contrast to cryptographic hash functions and checksums, which are designed so that adjacent inputs produce outputs that differ in a seemingly random way.

Locality-preserving hashes are related to space-filling curves.
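
One way to see the relation is through the Z-order (Morton) curve, a space-filling curve chosen here as an assumed example. It maps a two-dimensional grid cell to a single integer by interleaving the bits of its coordinates. The function name `morton_encode` and the `bits` parameter are hypothetical. The mapping keeps many neighbouring cells close together, but it does not satisfy the strict condition given above, as the last lines of the sketch show.

```python
def morton_encode(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one integer,
    with bits of x in even positions and bits of y in odd positions."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


# The four cells of a 2x2 block receive four consecutive codes.
block = [morton_encode(x, y) for y in range(2) for x in range(2)]
assert block == [0, 1, 2, 3]

# Locality is only approximate: (3, 0) and (4, 0) are adjacent cells,
# yet their codes are 5 and 16.
assert morton_encode(3, 0) == 5
assert morton_encode(4, 0) == 16
```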