Wikipedia:Reference desk/Archives/Mathematics/2016 October 27

= October 27 =

Hash functions based on quadratic residues
Initially, I wrote this simple JavaScript implementation (you can download the raw HTML file here) as an exercise in creating a hash function based on the Quadratic residuosity problem. The algorithm uses several large safe primes (including the modulus) on the order of roughly ten thousand digits and generates hashes of comparable size which seem to be of the highest quality. At first I was convinced that the fact that the primes in question are "exposed" (and thus known variables) shouldn't matter, given their size (Quadratic_residue seems to support that conclusion), but now I'm not so sure. Was my assumption correct, or am I missing something here? Earl of Arundel (talk) 01:11, 27 October 2016 (UTC)
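The linked implementation isn't reproduced here, but the general shape of a quadratic-residue hash can be sketched in a few lines of JavaScript. This is a toy illustration only: the seed value, the byte-absorption step, and the tiny modulus are invented stand-ins, where the real code uses safe primes of roughly ten thousand digits.

```javascript
// Minimal sketch only (not the linked implementation): fold each message
// byte into a running state, then square modulo a safe prime so the state
// becomes a quadratic residue at every step.
const P = 2027n;                    // toy safe prime (2027 = 2*1013 + 1);
                                    // the real code uses ~10,000-digit primes
function qrHash(message) {
  let state = 3n;                   // arbitrary nonzero seed (assumption)
  for (const byte of new TextEncoder().encode(message)) {
    state = (state + BigInt(byte) + 1n) % P; // absorb one byte
    state = (state * state) % P;             // square: state is a residue mod P
  }
  return state;
}
console.log(qrHash("b"), qrHash("c")); // one-bit input change, different outputs
```

Even at this toy size, flipping the low bit of the input ("b" versus "c") changes the output, since squaring mod P scatters the state.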
 * Perhaps there's something I'm not understanding, but if the modulus is prime then the quadratic residuosity problem would not apply since it depends on the modulus being composite with unknown factors. The section you pointed to in the quadratic residue article talks about finding the actual square root once you know you have a residue, a different problem. The article suggests that there are algorithms which are efficient in practice that find square roots for prime moduli, and from there you could get square roots for moduli whose factorization is known. In fact it seems like pretty much everything depends on the factors of the modulus being unknown. --RDBury (talk) 12:43, 27 October 2016 (UTC)
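The point that square roots modulo a known prime are easy to find can be made concrete. Every safe prime p = 2q + 1 with q odd satisfies p ≡ 3 (mod 4), so a square root of any quadratic residue a is simply a^((p+1)/4) mod p — no search at all. A small BigInt sketch (the toy prime is a stand-in, not from the linked code):

```javascript
// Square roots mod a safe prime are immediate: p = 2q + 1 with q odd
// gives p ≡ 3 (mod 4), so sqrt(a) ≡ a^((p+1)/4) (mod p).
function modPow(base, exp, mod) {
  let result = 1n;
  base %= mod;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % mod;
    base = (base * base) % mod;
    exp >>= 1n;
  }
  return result;
}

const p = 23n;                         // toy safe prime: 23 = 2*11 + 1
const x = 7n;
const a = (x * x) % p;                 // a known quadratic residue: 49 mod 23 = 3
const r = modPow(a, (p + 1n) / 4n, p); // candidate square root
console.log(r, (r * r) % p);           // r*r ≡ a (mod p), so r is ±x mod p
```

Since the exponentiation is polynomial-time even for ten-thousand-digit p, exposing a prime modulus gives away the square roots too.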
 * Yes, I was afraid that might be the case! Fortunately, I think there is a very simple solution. The basic idea is this: generate two hashes using a pair of distinct safe-prime triplets and then just add them together. That should be more than sufficient, considering that it is infeasible to deduce which two addends have been used to produce any given sum (provided, of course, that the numbers are large enough to preclude a brute-force search). I've updated the implementation to further demonstrate the concept here (also at pastebin). Earl of Arundel (talk) 16:50, 27 October 2016 (UTC)
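The "two residues, then add" construction described above might look something like this toy sketch. The moduli, the seed handling, and the function names here are invented stand-ins; the linked code uses two distinct safe-prime triplets of roughly ten thousand digits each.

```javascript
// Toy sketch of the proposed fix: form a quadratic residue of the seed
// under each of two distinct safe-prime moduli, then publish only the sum.
const p1 = 23n, p2 = 47n;     // stand-ins for two huge, distinct safe primes
function residueHash(seed, p) {
  return (seed * seed) % p;   // quadratic residue of the seed mod p
}
function combinedHash(seed) {
  const s1 = residueHash(seed, p1);
  const s2 = residueHash(seed, p2);
  return s1 + s2;             // H = S1 + S2; only the sum is revealed
}
console.log(combinedHash(9n));
```

The intent is that an attacker who sees only H cannot tell how it splits into the two addends S1 and S2.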
 * My expertise on this is limited at best, but I think the idea of using a well-known problem as the basis for encryption is that you already know that many people have attempted to find an easy solution and failed. In other words, the problem has already beaten the best brains around, so you're pretty certain that some random hacker isn't going to be able to do it. This isn't the ideal situation, since we'd like to say the problem is provably difficult, but short of solving the P versus NP problem or something similar this is the best we can hope for. So yes, you can make the hash more complicated, which would seem to make it more secure, but in essence you're creating a new problem which no one has tried to solve yet, and there may be a clever but easy solution for it which someone may eventually find. --RDBury (talk) 17:26, 29 October 2016 (UTC)
 * Yes, that's a reasonable concern, but it isn't really an issue in this particular case. For one thing, understand that this isn't exactly encryption, per se. That may seem like a pedantic point at first, but the important distinction is that encryption *requires* reversibility. What we're aiming for, rather, is an irreversible pseudorandom number. It's not difficult to prove that we have that here either. The final step in the calculation reduces to the equation H = S1 + S2. Now only the value of H is known, so we have to solve for either S1 or S2. But the solution set is larger than H itself. Example: for H = 10, every pair (S1, S2) with S1 in {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10} and S2 = 10 - S1 satisfies the equation. In that respect, the solutions for S1 and S2 taken together simply represent all of the integral points of a two-dimensional plane. Still, the sum might be deduced if there existed some sort of correlation between S1 and S2. But there isn't. First of all, the two distinct moduli that we're working with each individually form a "multiplicative group of integers modulo N". Strictly speaking, this means there isn't even the slightest notional relationship between S1 and S2. Of course, if we were doing just straight multiplication using our "seed" values, adding or subtracting some constant to our seed *would* create a causal relationship, to the effect that the difference between S1 and S2 would always be the same. However, in this case the seed is instead used to form two separate quadratic residues. This effectively removes any such relationship, which helps to explain why hashing the letter "b" (binary: 01100010) and "c" (binary: 01100011) results in two very different hashes, even though we've just changed a single bit of an eight-bit seed. As far as collision-resistance goes, this algorithm also performs exceedingly well. 
Unlike a finite field, where generating the full set of numbers below N requires a primitive root, multiplicative groups are guaranteed to do so. So each partial result, for that very reason, forms a perfect hash. Of course, truncating the value from ten thousand digits to, say, four will naturally induce collisions, but fortunately their number will be proportional to the logarithm of the number of digits: a thirty-two-digit number will indeed produce roughly twice as many collisions as one of just four digits, but in terms of collisions per range the actual ratio is infinitesimal in the former case in comparison with the latter. Even so, the sum of two perfect hashes is by no means guaranteed to be perfect. That said, due to the lack of correlation we *can* say that the statistical probability is still theoretically ideal. That is, it should perform only about as badly as a truly random hash, which turns out to be not so bad after all. Some other benefits of this algorithm are that it's quite scalable, the equation is so simple that it ports well to different programming languages, and it's reasonably efficient even for pretty large numbers. In fact, there are some circumstances where you might actually want to tweak the moduli parameters to produce a much more "expensive" hash. If you're the owner of a website, it could effectively prevent an attacker from producing brute-force guesses at your users' passwords quickly enough to be useful. Which brings me to one final point. The majority of crypto-hashes in use today fail in just about every one of the above categories. Most are suspected to be insecure to one degree or another (some being outright reversible), poorly collision-resistant, and often lacking more than superficial bit-propagation. The main reason for that is that they all basically boil down to bit-twiddling routines. 
That's somewhat of an over-simplification, I know, but it really captures the essence of the problem. And that's precisely why I've chosen to put this one together. Just straightforward math, no tricky-dicky assumptions. Earl of Arundel (talk) 18:24, 31 October 2016 (UTC)
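The H = S1 + S2 ambiguity argument in the post above is easy to check mechanically for its own H = 10 example: enumerating the candidate splits shows eleven pairs, so the sum alone pins down neither addend (this says nothing about the separate correlation question, only about the count).

```javascript
// Toy check of the H = S1 + S2 point: for H = 10 there are eleven
// nonnegative integer pairs (S1, S2) with S1 + S2 = H.
function candidatePairs(H) {
  const pairs = [];
  for (let s1 = 0; s1 <= H; s1++) pairs.push([s1, H - s1]);
  return pairs;
}
console.log(candidatePairs(10).length); // 11 candidate splits for H = 10
```

With ten-thousand-digit sums, the analogous candidate set is astronomically large, which is the brute-force-search point made earlier in the thread.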