Talk:Universal hashing

The two articles on universal hashing should clearly be merged. Any volunteer? Pagh 11:23, 24 April 2007 (UTC)

I don't see two articles on universal hashing, just this one. One might merge this universal_hashing article into the main article hash_function, but since the present article is reasonably coherent and self-contained, I don't think that's urgent. I have removed the Unreferenced tag at the beginning of this universal_hashing article since, in its present form, the article does indeed cite the main basic references on universal hashing. CharlesHBennett (talk) 11:17, 23 September 2008 (UTC)

In the Examples section, the symbols H and h are used without having been defined or introduced. I think h is supposed to refer to the function called f in the definitions (a clue: f is defined but never used), and H would be the family of all such h. But I'm not editing the page because I don't know enough about the subject to be confident that my interpretation is correct. 217.109.185.32 (talk) 16:25, 7 December 2009 (UTC)

The section on avoiding modular arithmetic was heavily edited. The previous discussion suggested that the function h_a(x) had a collision probability of 1/m; it doesn't. It has a collision probability as high as 2/m. A reference was added to the multiply-add-shift scheme, which does have collision probability 1/m. Finally, the discussion of hashing with the uniform difference property was removed, since the same issue with the analysis of h_a(x) applies to this scheme. At best, it has some almost-uniform difference property. (Patmorin) Tue May 17 10:53:11 EDT 2011 —Preceding undated comment added 14:53, 17 May 2011 (UTC).
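
The 2/m figure can be checked by brute force for small parameters. The following is a minimal Python sketch, assuming the scheme in question is h_a(x) = (a*x mod 2^w) div 2^(w-M), with a drawn uniformly from the odd w-bit integers and m = 2^M. The function names and the tiny word sizes are illustrative choices, not taken from the article.

```python
from fractions import Fraction
from itertools import combinations

def multiply_shift(a, x, w, M):
    """h_a(x) = (a*x mod 2^w) div 2^(w-M): keep the top M of the low w bits."""
    return ((a * x) % (1 << w)) >> (w - M)

def max_collision_probability(w, M):
    """Largest Pr_a[h_a(x) == h_a(y)] over all pairs x != y of w-bit keys,
    with a uniform over the odd integers in [1, 2^w)."""
    odd_multipliers = range(1, 1 << w, 2)
    worst = Fraction(0)
    for x, y in combinations(range(1 << w), 2):
        collisions = sum(
            multiply_shift(a, x, w, M) == multiply_shift(a, y, w, M)
            for a in odd_multipliers
        )
        worst = max(worst, Fraction(collisions, len(odd_multipliers)))
    return worst

if __name__ == "__main__":
    for w, M in [(3, 1), (6, 2)]:
        m = 1 << M
        print(f"w={w}, M={M}: worst case {max_collision_probability(w, M)}, "
              f"1/m = {Fraction(1, m)}, 2/m = {Fraction(2, m)}")
```

For w = 3 and M = 1, the keys x = 1 and y = 3 collide for every odd a, so the worst case there is 1 = 2/m rather than 1/m.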

multiply-shift
Something doesn't seem right with one equation for "the multiply-shift scheme described by Dietzfelbinger et al. in 1997", which currently states:


 * $$h_{a,b}(x) = ((ax + b) \bmod 2^w)\, \mathrm{div}\, 2^{w-M}$$
 * ... where ... $$b$$ is a random non-negative integer with $$b < 2^{w-M}$$.

The first step is to multiply $$a*x$$, right? Let's define T(ax) as the top M bits of the bottom w bits of a*x -- i.e.,
 * $$T(ax) = ((ax) \bmod 2^w)\, \mathrm{div}\, 2^{w-M}$$

For example, I'm using a machine where the word size is w=32 bits, and I'm trying to hash to a table of 1024 entries so M=10. When I think I'm getting too many collisions, I pick some random unsigned integer b < 2^(32-10), or in other words, b < 2^(22), and rehash -- right?

Each time I add the 22-bit integer b to that intermediate value $$a*x$$, it directly affects the bottom 22 bits of the result and may or may not cause a carry into the higher bits. But then the hash function immediately discards those bottom 22 bits, and so I am left with either $$h_{a,b}(x) = T(ax)$$ or $$h_{a,b}(x) = T(ax) + 1$$.

So with this equation, the value of b has very little effect on the final hash value.
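
The observation above can be checked numerically. Here is a minimal sketch in Python, assuming w = 32 and M = 10 as in the example, with b drawn so that b < 2^(w-M) as the quoted text states. The helper names and the random sampling are illustrative only. Note that the "+ 1" case wraps around modulo 2^M.

```python
import random

W, M = 32, 10  # word size and number of output bits from the example above

def T(a, x, w=W, m_bits=M):
    """Top M bits of the low w bits of a*x (the hash value with b = 0)."""
    return ((a * x) % (1 << w)) >> (w - m_bits)

def h(a, b, x, w=W, m_bits=M):
    """h_{a,b}(x) = ((a*x + b) mod 2^w) div 2^(w-M)."""
    return ((a * x + b) % (1 << w)) >> (w - m_bits)

def hashes_seen_over_b(a, x, trials=10_000, seed=0):
    """Set of hash values observed for fixed a, x while b < 2^(w-M) varies."""
    rng = random.Random(seed)
    return {h(a, rng.randrange(1 << (W - M)), x) for _ in range(trials)}

if __name__ == "__main__":
    rng = random.Random(1)
    a = rng.randrange(1, 1 << W)
    x = rng.randrange(1 << W)
    base = T(a, x)
    seen = hashes_seen_over_b(a, x)
    print(f"T(ax) = {base}, values of h over random b: {sorted(seen)}")
    # Only T(ax) and T(ax) + 1 (mod 2^M) can occur.
    assert seen <= {base, (base + 1) % (1 << M)}
```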

I expected the hash value to depend more strongly on the value of b, such that given any particular value of a and x, which leads to some particular value of T(ax), the output hash value would ideally be any possible value 0 <= h < m.

Should that equation be something more like the following?
 * $$h_{a,b}(x) = ((a(x + b)) \bmod 2^w)\, \mathrm{div}\, 2^{w-M}$$

--DavidCary (talk) 03:28, 26 March 2015 (UTC)
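
For comparison, the variant proposed above can be tried on a small word size. This is a rough Python sketch under two assumptions that the comment does not state: a is odd, and b ranges over all w-bit values rather than b < 2^(w-M). Under those assumptions a(x + b) mod 2^w takes every w-bit value as b varies, so every bucket is reachable. The parameter values are arbitrary.

```python
def h_variant(a, b, x, w, M):
    """Proposed variant: ((a*(x + b)) mod 2^w) div 2^(w-M)."""
    return ((a * (x + b)) % (1 << w)) >> (w - M)

def reachable_buckets(a, x, w, M, b_limit):
    """Hash values reachable for fixed a, x as b ranges over [0, b_limit)."""
    return {h_variant(a, b, x, w, M) for b in range(b_limit)}

if __name__ == "__main__":
    w, M = 8, 3      # small illustrative sizes, m = 2^M = 8 buckets
    a, x = 77, 200   # arbitrary odd multiplier and key
    full = reachable_buckets(a, x, w, M, 1 << w)
    small = reachable_buckets(1, x, w, M, 1 << (w - M))
    print(f"b over all w-bit values, a={a}: {len(full)} buckets")
    print(f"b < 2^(w-M), a=1: {len(small)} buckets")
```

If b stays restricted to b < 2^(w-M), the number of reachable buckets depends on a; for a = 1 it is again at most two.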

uniform distance property
The term uniform distance property is used, but not explained. If it is the same as uniform difference property, then it should be changed to match. Cebus (talk) 08:42, 31 March 2016 (UTC)

Universe for hash functions
In the introduction we define "A family of functions $$H = \{ h : U \to [m] \}$$", but in the "Constructions" section we "Let the universe to be hashed be $$U = \{0, \dots, m-1\}$$". This is clearly problematic, since if the universe is not larger than the space we are hashing to, we could just use the identity function. The consequence is that the section is harder to read, because statements like $$p \gg m$$ could mean either "larger than the domain" or "larger than the range". --Thomasda (talk) 02:23, 30 March 2017 (UTC)

Construction section incorrect
The sentence "for some integer $$i$$ between $$0$$ and $$(p-1)/m$$." is incorrect. It is true that i can take on that many values, but i can be negative as well. — Preceding unsigned comment added by Johnmichaelwu (talk • contribs) 19:31, 23 April 2019 (UTC)
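
A small numeric case illustrates the point. This Python sketch assumes the sentence refers to the family h_{a,b}(x) = ((ax + b) mod p) mod m, where a collision means the two residues modulo p differ by i*m for a nonzero integer i. That reading, the helper names, and the parameter values are assumptions made for illustration.

```python
def residue(a, b, x, p):
    """(a*x + b) mod p, the value before the final reduction mod m."""
    return (a * x + b) % p

def collision_multiple(a, b, x, y, p, m):
    """Return i with residue(x) - residue(y) == i*m, or None if x and y
    do not collide under h(x) = residue(x) mod m."""
    diff = residue(a, b, x, p) - residue(a, b, y, p)
    return diff // m if diff % m == 0 else None

if __name__ == "__main__":
    p, m = 7, 3
    a, b = 1, 0
    print(collision_multiple(a, b, 1, 4, p, m))  # residues 1 and 4, so i = -1
    print(collision_multiple(a, b, 4, 1, p, m))  # same pair reversed, so i = 1
```

Since both residues lie in [0, p-1], their difference lies in [-(p-1), p-1], so the sign of i depends only on which key is written first.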