Talk:Fingerprint (computing)

Please add an example.

Hash function
How is this different than a hash function? — sligocki (talk) 18:08, 12 November 2009 (UTC)
 * "Hash function" is more ambiguous, or more general, depending on what you mean by it. The hash function article mentions this. The difference also leads to different criteria for choosing good functions.
 * It's not unusual for concepts that are technically this similar to have several names for specific cases, and several articles. See e.g. the articles about Light and X-ray.
 * HFuruseth (talk) 18:29, 6 February 2012 (UTC)
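A minimal Python sketch of the distinction, assuming SHA-256 from the standard-library `hashlib` as the underlying function (the helper names `fingerprint64` and `bucket_index` are hypothetical). The same digest acts as a fingerprint when it is kept wide enough to stand in for the data. It acts as an ordinary hash-table index when it is reduced to a few buckets, where collisions are routine and are resolved by comparing the stored keys.

```python
import hashlib

def fingerprint64(data: bytes) -> int:
    """Hypothetical 64-bit fingerprint: the first 8 bytes of SHA-256.

    Wide enough that two different inputs are unlikely (not guaranteed)
    to share a value, so the value can stand in for the data itself.
    """
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], "big")

def bucket_index(data: bytes, n_buckets: int) -> int:
    """Hypothetical hash-table use: map data to one of n_buckets slots.

    Collisions are expected here; a hash table resolves them by comparing
    the stored keys, so a small output range is fine.
    """
    return fingerprint64(data) % n_buckets

a, b = b"hello world", b"hello world!"
print(fingerprint64(a) == fingerprint64(b))   # different inputs, almost surely different
print(bucket_index(a, 8), bucket_index(b, 8)) # may easily coincide with only 8 buckets
```

The difference lies in how the output is used and how wide it is, which is why the criteria for a "good" function differ between the two cases.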

What.
So basically, this article can be summarized by #REDIRECT Hash function ? I won't blank the page entirely, but I'm very skeptical that this deserves an independent article. "Overlapping to some extent" (as written now) is an understatement. .froth. (talk) 22:09, 24 January 2011 (UTC)


 * Is the subject of the article distinct from a Cryptographic hash function?
 * The topic seems to be defined as a Cryptographic hash function that may or may not have the Rolling hash property.
 * Is there a nuance that the commenters above and I are missing? Pinging @Jorge Stolfi. --PaulT2022 (talk) 04:50, 7 January 2023 (UTC)
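For reference, the rolling-hash property mentioned above means that the hash of a sliding window can be updated in constant time as the window moves by one byte. A minimal Python sketch, assuming a Rabin-Karp-style polynomial hash modulo a prime. This is neither Rabin's GF(2) polynomial fingerprint nor a cryptographic hash, and `BASE` and `MOD` are arbitrary illustrative choices:

```python
BASE = 257                # illustrative base, larger than any byte value
MOD = (1 << 61) - 1       # illustrative prime modulus (Mersenne prime 2^61 - 1)

def poly_hash(window: bytes) -> int:
    """Hash of a whole window: sum of byte_j * BASE^(k-1-j), mod MOD."""
    h = 0
    for byte in window:
        h = (h * BASE + byte) % MOD
    return h

def rolling_hashes(data: bytes, k: int) -> list[int]:
    """Hash of every length-k window, each derived from the previous one in O(1)."""
    if k <= 0 or len(data) < k:
        return []
    top = pow(BASE, k - 1, MOD)   # weight of the byte about to leave the window
    h = poly_hash(data[:k])
    out = [h]
    for i in range(k, len(data)):
        h = (h - data[i - k] * top) % MOD   # remove the outgoing byte
        h = (h * BASE + data[i]) % MOD      # shift and append the incoming byte
        out.append(h)
    return out

hashes = rolling_hashes(b"abracadabra", 4)
print(hashes[0] == hashes[7])  # True: both windows are b"abra"
```

Each step costs a constant number of operations instead of rehashing all k bytes, which is what makes content-defined chunking and substring search practical.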

Virtual Uniqueness
The phrase "virtual uniqueness" is an oxymoron. A thing is unique or it is not. What you are trying to say is that, for a given hash function, the probability of a collision is below some upper bound.

Calling this "virtual uniqueness" is misleading, since it is not uniqueness, and without knowing the upper bound one cannot even say that a collision is unlikely. Shrugging your shoulders and saying the equivalent of "eh, good enough" is not mathematical rigor.
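The kind of upper bound asked for here can be written down once the outputs are assumed to be independent and uniformly distributed over 2^b values, which is exactly the assumption questioned in this thread. By the union bound over all n(n-1)/2 pairs of inputs, P(collision) ≤ n(n-1)/2^(b+1). A minimal Python sketch of that arithmetic (the function name `collision_upper_bound` is hypothetical):

```python
from fractions import Fraction

def collision_upper_bound(n: int, bits: int) -> Fraction:
    """Union bound on P(at least one collision) among n items with `bits`-bit outputs.

    Valid ONLY if outputs are independent and uniform over 2**bits values;
    it says nothing about a real hash function that lacks that property.
    """
    pairs = n * (n - 1) // 2
    return min(Fraction(pairs, 2 ** bits), Fraction(1))

# One billion items with a 160-bit output
bound = collision_upper_bound(10**9, 160)
print(float(bound))  # roughly 3.4e-31
```

The bound is only as good as the uniformity assumption behind it, which is the point made in the next paragraph.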

Also, the only difference between a cryptographic hash and an otherwise ordinary hash is that a crypto hash is hard to reverse; that is, given an output, calculating a corresponding input is difficult enough that the time and resources involved preclude it from being useful to an adversary. This is of no benefit in data reduction or fingerprinting.

What is needed to make data reduction via hash functions viable, crypto or otherwise, is a mathematical proof that the probability of an output collision is below some upper bound, and that this bound is below the commonly accepted risk for the class of device or application. For commonly used crypto hashes (the MD and SHA families), no such proof is known. The birthday attack is frequently cited, but its probability estimate assumes that the hash function's output distribution is uniform. No one has proved this for the MD and SHA families; if it could be proved, it would probably diminish their cryptographic usefulness greatly. Recent cryptanalysis showing that SHA-1 collision pairs can be generated in fewer than 2^62 operations, when output uniformity would require about 2^80, strongly suggests that the output distribution is not uniform. In short, no one can bound the probability of a collision for the MD and SHA families of hashes. '''Therefore they *SHOULD NOT* be used independently for duplicate detection (data reduction). '''
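The 2^80 figure is the generic birthday bound for SHA-1's 160-bit output. Under the uniformity assumption, with n outputs of b bits, the collision probability and the work needed to reach an even chance are approximately as follows; the final line compares that with the 2^62 figure cited above:

```latex
P(\text{collision}) \approx 1 - e^{-n(n-1)/2^{b+1}},
\qquad
P = \tfrac{1}{2} \;\Rightarrow\; n \approx \sqrt{2\ln 2}\,\cdot 2^{b/2} \approx 1.18 \cdot 2^{b/2}

b = 160 \;\Rightarrow\; n \approx 1.18 \cdot 2^{80},
\qquad
\frac{2^{80}}{2^{62}} = 2^{18} = 262\,144
```

That is, the cited attack finds collisions roughly 2^18 times faster than a uniform 160-bit function would allow.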

If a hash is used together with a source compare to establish that the inputs really are duplicates, this is, of course, viable. Some manufacturers of storage systems do this (e.g. HP 3PAR, EMC VNX, Kaminario K2 storage systems). However, for such purposes it is better to use lower-overhead hashes, since the source compare eliminates the consequences of false positives (collisions). — Preceding unsigned comment added by 73.238.3.45 (talk) 23:25, 8 September 2017 (UTC)
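The hash-plus-source-compare approach can be sketched as a hypothetical in-memory block store. The hash only nominates candidate duplicates, and a byte-for-byte comparison confirms them. This is an illustration under stated assumptions, not how any of the named storage systems work; `zlib.crc32` stands in for a "lower overhead hash", since a collision here costs only an extra comparison, and `DedupStore` is a hypothetical name:

```python
import zlib

class DedupStore:
    """Hypothetical block store: a cheap hash finds candidates, a compare confirms."""

    def __init__(self) -> None:
        self.blocks: list[bytes] = []            # unique blocks, indexed by block id
        self.index: dict[int, list[int]] = {}    # crc32 value -> ids of blocks with it

    def put(self, block: bytes) -> int:
        """Store block unless an identical one exists; return its block id."""
        h = zlib.crc32(block)
        for block_id in self.index.get(h, []):
            if self.blocks[block_id] == block:   # source compare rules out false positives
                return block_id
        # New content, or a hash collision with different content: store it separately.
        self.blocks.append(block)
        block_id = len(self.blocks) - 1
        self.index.setdefault(h, []).append(block_id)
        return block_id

store = DedupStore()
ids = [store.put(b) for b in [b"alpha", b"beta", b"alpha"]]
print(ids, len(store.blocks))  # [0, 1, 0] 2
```

Because every match is verified against the stored bytes, a weak 32-bit hash affects only speed (more comparisons), never correctness.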

soft
mio 2806:370:5580:2CCB:55F2:9E2:DC18:C01D (talk) 09:01, 30 November 2022 (UTC)