Talk:Rabin fingerprint

Cryptographic or any hash?
The article states: "Now, it is clear that any cryptographic hash could be used for this purpose". However, are Rabin fingerprints actually cryptographic? I am aware of no reference for this, and since a data stream of zeros always has the same fingerprint in the rolling version of Rabin fingerprints, I would claim that they are not. (I work on storage systems that use Rabin fingerprints; I am not a cryptography expert.) What Michael Rabin claimed in the original paper is that Rabin fingerprints are a class of hash functions with well-understood collision probabilities, but this doesn't mean that they are cryptographic. Dirk --84.60.33.214 (talk) 18:06, 20 July 2011 (UTC)
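
To make the zero-stream point concrete, here is a minimal Python sketch of a plain (non-rolling) Rabin-style fingerprint: the input bits are read as a polynomial over GF(2) and reduced modulo a fixed polynomial. The function name `rabin_fingerprint` and the toy degree-8 polynomial 0x11B are assumptions made for this illustration; they are not taken from the article or from Rabin's paper, and a real deployment would use a much larger, randomly chosen polynomial.

```python
def rabin_fingerprint(data: bytes, poly: int = 0x11B) -> int:
    """Remainder of `data`, read as a GF(2) polynomial, modulo `poly`.

    `poly` is a hypothetical toy choice (x^8 + x^4 + x^3 + x + 1).
    """
    degree = poly.bit_length() - 1
    fp = 0
    for byte in data:
        for i in range(7, -1, -1):           # most significant bit first
            fp = (fp << 1) | ((byte >> i) & 1)
            if fp >> degree:                  # degree reached: subtract (XOR) poly
                fp ^= poly
    return fp


# Any all-zero input is the zero polynomial, so its remainder is 0
# whatever its length.
print(rabin_fingerprint(b"\x00" * 10))    # 0
print(rabin_fingerprint(b"\x00" * 1000))  # 0

# Knowing the polynomial also makes collisions trivial: the bit pattern
# of the polynomial itself (0x011B) reduces to 0 as well.
print(rabin_fingerprint(b"\x01\x1b"))     # 0
```

In this sketch every run of zero bytes collides with every other, and anyone who knows the polynomial can construct further colliding inputs, which fits the claim above that the well-understood collision probabilities do not amount to a cryptographic guarantee.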

The author's claim is not that Rabin fingerprints are cryptographic, but that cryptographic hashes make good shift-resistant fingerprints. --Hkc94501 (talk) 05:13, 25 February 2012 (UTC)

I think you are right. Would "any hash function" work? So I changed that sentence from "any cryptographic hash" to "any hash function". If I've misunderstood something, please revert. --DavidCary (talk) 12:19, 6 September 2013 (UTC)

I see that someone took me up on my offer and reverted back to "any cryptographic hash". Great! Perhaps I will learn something new!

If I understand the "A Low-bandwidth Network File System" reference in this article correctly, LBFS uses 2 different hash functions for two different purposes:
 * (a) To break up long files into reasonably-sized blocks in a way that is "shift invariant". (LBFS uses Rabin for this).
 * (b) When Bob requests a file from Alice, Alice sends a short summary of the file (for each block in the file, the hash of that block). For each block, if Bob is lucky he already has an identical block (in Bob's old version of the file, or perhaps as part of some other file entirely); otherwise he asks Alice to literally send that block. (LBFS uses SHA-1 for this).
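
The two roles (a) and (b) can be sketched in Python as follows. This is only a toy illustration under stated assumptions, not LBFS's actual code: the names `fingerprint`, `split_blocks` and `summarize`, the 16-byte window, the 4-bit boundary mask and target, and the degree-8 polynomial 0x11B are all hypothetical and far smaller than anything a real system would use. The fingerprint is also recomputed from scratch at each position instead of being rolled, to keep the sketch short.

```python
import hashlib
import random

POLY = 0x11B    # toy polynomial x^8 + x^4 + x^3 + x + 1 (assumption)
WINDOW = 16     # bytes fingerprinted at each position (assumption)
MASK = 0x0F     # low bits inspected for a boundary (assumption)
TARGET = 0x0B   # arbitrary boundary value (assumption)


def fingerprint(data: bytes, poly: int = POLY) -> int:
    """Remainder of `data`, read as a GF(2) polynomial, modulo `poly`."""
    degree = poly.bit_length() - 1
    fp = 0
    for byte in data:
        for i in range(7, -1, -1):
            fp = (fp << 1) | ((byte >> i) & 1)
            if fp >> degree:
                fp ^= poly
    return fp


def split_blocks(data: bytes) -> list:
    """(a) Cut a block wherever the window fingerprint hits TARGET."""
    blocks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        if (fingerprint(data[end - WINDOW:end]) & MASK) == TARGET:
            blocks.append(data[start:end])
            start = end
    if start < len(data):
        blocks.append(data[start:])   # trailing partial block
    return blocks


def summarize(data: bytes) -> list:
    """(b) Describe a file by the SHA-1 digest of each block."""
    return [hashlib.sha1(block).hexdigest() for block in split_blocks(data)]


# Prepending one byte shifts every later byte, yet boundaries depend only
# on window contents, so blocks after the first are cut identically.
rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(2048))
new = b"!" + old
reused = set(summarize(old)) & set(summarize(new))
print(len(reused), "of", len(summarize(old)), "block hashes are reused")
```

Because a boundary is decided by the bytes inside the window alone, an insertion only disturbs the blocks around the edit; Bob can then match the remaining SHA-1 digests against blocks he already holds.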

(Is there some other article that discusses what kinds of hashes are suitable for (a) ?)

Most hash functions won't work for (b), even in the absence of malicious adversaries. If Alice were to summarize each block by its CRC16 checksum, then whenever a file has several different blocks with the same CRC16 checksum -- as often happens with common files -- it would be impossible for Bob to figure out which of those blocks needs to be copied. The summary (b) requires at least a fingerprint (computing) type hash function. If a malicious adversary could attack some system by creating two different files that have exactly the same summary, then a cryptographic checksum is required for (b) to prevent that attack.
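
A small Python sketch of why a 16-bit checksum cannot identify blocks: there are only 65,536 possible values, so any set of more than 65,536 distinct blocks must contain a collision, and in practice one tends to appear much earlier. The example uses `binascii.crc_hqx` from the Python standard library, which is one CRC-16 variant (CRC-CCITT); treating it as a stand-in for "CRC16" is an assumption, as are the function name `first_crc16_collision` and the synthetic block contents.

```python
import binascii


def first_crc16_collision(limit: int = 70000):
    """Return (block_a, block_b, crc) for the first two distinct blocks
    that share a 16-bit CRC, or None if none is found within `limit`."""
    seen = {}
    for i in range(limit):
        block = f"block {i}".encode()       # synthetic, all distinct
        crc = binascii.crc_hqx(block, 0)    # 16-bit CRC-CCITT, initial value 0
        if crc in seen:
            return seen[crc], block, crc
        seen[crc] = block
    return None


a, b, crc = first_crc16_collision()
print(a, b, hex(crc))
```

With the default limit above 65,536, the pigeonhole principle guarantees that a pair is returned. If Alice sent only this checksum, Bob could not tell the two blocks apart.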

Is there any reason CRC16 won't work for (a)? Is a cryptographic hash required? If so, could someone add a few words explaining *why* it is required -- or, if it requires a long explanation, add a link to that explanation? (Even if CRC16 would work for (a), a rolling hash such as the Rabin fingerprint may be a better choice because it is faster to compute). --DavidCary (talk) 17:03, 9 September 2013 (UTC)


 * Sorry, my bad. I was the one who reverted and I didn't see this discussion on the talk page. In fact, I was confused and thought Rabin fingerprints meant something else. So a total mind fart on my part. Please ignore my edit. Since you have already changed the given text, I don't know whether I should revert or not. I'll leave that up to you. -- intgr [talk] 19:59, 9 September 2013 (UTC)


 * Thank you, intgr. It looks fine to me now. --DavidCary (talk) 02:48, 8 June 2015 (UTC)

Implementation link(s)
I'm convinced https://github.com/joeltucci/rabin-fingerprint-c does not implement a Rabin-type fingerprint, so I'm deleting the link. For shifting I see multiplication by a fixed prime (=3!), and it's done in the 64-bit word ring (not a field).
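
To illustrate the distinction between arithmetic in the 64-bit word ring and polynomial arithmetic over GF(2), here is a hedged Python sketch of the two kinds of update step. Neither function is copied from the linked repository; `word_ring_step`, `gf2_step`, `run` and the toy polynomial 0x11B are hypothetical names and values chosen for this example.

```python
MASK64 = (1 << 64) - 1
POLY = 0x11B   # toy polynomial x^8 + x^4 + x^3 + x + 1 (assumption)


def word_ring_step(h: int, byte: int, prime: int = 3) -> int:
    """Integer multiply-and-add modulo 2**64: carries propagate."""
    return (h * prime + byte) & MASK64


def gf2_step(h: int, byte: int, poly: int = POLY) -> int:
    """Shift in 8 bits and reduce modulo `poly` over GF(2): no carries."""
    degree = poly.bit_length() - 1
    for i in range(7, -1, -1):
        h = (h << 1) | ((byte >> i) & 1)
        if h >> degree:
            h ^= poly
    return h


def run(step, data: bytes) -> int:
    """Fold `step` over all bytes of `data`, starting from 0."""
    h = 0
    for byte in data:
        h = step(h, byte)
    return h


a, b = b"\x01\x00", b"\x03\x00"
a_xor_b = bytes(x ^ y for x, y in zip(a, b))   # b"\x02\x00"

# Polynomial reduction is linear over XOR for equal-length inputs.
print(run(gf2_step, a_xor_b) == run(gf2_step, a) ^ run(gf2_step, b))   # True

# The integer version is not: 6 on the left, 3 ^ 9 == 10 on the right.
print(run(word_ring_step, a_xor_b) == run(word_ring_step, a) ^ run(word_ring_step, b))   # False
```

The XOR-linearity is one visible sign that the second function works with polynomials over GF(2); Rabin's collision analysis relies on that structure together with an irreducible modulus, which the integer version lacks.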

I don't know a good replacement. https://github.com/lemire/rollinghashcpp/blob/master/generalhash.h seems perfectly correct except that the irreducible polynomial there is fixed instead of chosen at random.
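
For reference, choosing the irreducible polynomial at random can be sketched in Python as below. This is a minimal illustration, not code from any of the linked projects: it draws random candidates of the requested degree and keeps the first one that passes an irreducibility test over GF(2) (the Ben-Or style check that gcd(f, x^(2^i) - x) = 1 for every i up to half the degree). All function names are hypothetical, and polynomials are represented as Python integers whose bits are the coefficients.

```python
import secrets


def pmod(a: int, m: int) -> int:
    """Remainder of a modulo m, both as GF(2) polynomials (m != 0)."""
    dm = m.bit_length()
    while a.bit_length() >= dm:
        a ^= m << (a.bit_length() - dm)
    return a


def pmulmod(a: int, b: int, m: int) -> int:
    """Carry-less product a*b reduced modulo m."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a = pmod(a << 1, m)
    return pmod(r, m)


def pgcd(a: int, b: int) -> int:
    """Greatest common divisor of two GF(2) polynomials."""
    while b:
        a, b = b, pmod(a, b)
    return a


def is_irreducible(f: int) -> bool:
    """True if f has no non-trivial factor over GF(2)."""
    n = f.bit_length() - 1
    if n < 1:
        return False
    x = pmod(2, f)                 # the polynomial "x"
    t = x
    for _ in range(n // 2):
        t = pmulmod(t, t, f)       # t = x^(2^i) mod f
        if pgcd(f, t ^ x) != 1:    # shares a factor of degree dividing i
            return False
    return True


def random_irreducible(degree: int, randbits=secrets.randbits) -> int:
    """Draw random monic candidates until one is irreducible."""
    while True:
        # Force the leading term and the constant term to 1.
        candidate = (1 << degree) | randbits(degree) | 1
        if is_irreducible(candidate):
            return candidate


print(hex(random_irreducible(64)))
```

Roughly one in `degree` monic polynomials is irreducible, so the loop is expected to finish after a modest number of tries. `secrets.randbits` is used so that the polynomial is hard to predict, which is the point of choosing it at random.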

--Neználek (talk) 10:53, 5 June 2016 (UTC)


 * I adapted an implementation of the Rabin fingerprint from restic in https://github.com/chmduquesne/rollinghash (the "simple" version is https://github.com/chmduquesne/rollinghash/blob/master/rabinkarp64/rabinkarp64.go#L185, and the rest is a lot of caches for doing it in a rolling fashion)
 * The author of restic wrote https://github.com/fd0/rabin-cdc, which contains an implementation in C.
 * lbfs has https://github.com/fd0/lbfs/blob/bdf4f17d23b68536e7805c88e269026c74c32d59/liblbfs/rabinpoly.C and https://github.com/fd0/lbfs/blob/bdf4f17d23b68536e7805c88e269026c74c32d59/liblbfs/rabinpoly.h — Preceding unsigned comment added by Chmduquesne (talk • contribs) 14:04, 4 February 2018 (UTC)