Talk:Birthday attack/Archive 1

Birthday attack recommendation

 * Copied from User talk:Matt Crypto:

Hi, re Birthday attack, you removed the recommendation of changing a foreign-supplied text befor signing, saying that using a longer hash is better. While using a long hash is certainly good, it seems to me that changing the text can only help: it squares the time your adversary needs, especially since you never know whether the hashes are long enough for the adversary's hardware, or whether the adversary has discovered a weakness in the hash function. Do you know any downsides to changing the text before signing? Thanks, AxelBoldt 11:44, 18 Aug 2004 (UTC)


 * The text I removed was, "to avoid this birthday attack, it is recommended that Alice slightly modify any digital contract that's presented to her, before signing it.". The downside is that Alice has to modify any digital contract presented to her; the user should be able to treat the crypto primitives as "black boxes", to the greatest possible extent &mdash; this is a point brought out in, say, Ferguson and Schneier's Practical Cryptography. The birthday attack is defeated at the minor cost of an extra, say, 80 bits in the hash length; the user shouldn't have to worry about modifying her actions at a higher level of abstraction. &mdash; Matt 21:07, 18 Aug 2004 (UTC)


 * Matt is right. It would be possible to imagine a signing system in which a document is randomly "salted" before hashing, which would provide similar security, but I don't know of any analysis of the security of such signing.  As seems to be the case inevitably with hash functions, it's easy to prove the security of such a measure in the random oracle model, but very hard to state the properties a hash function would have to have to make such a measure secure. Universal one-way hash functions are a distinct but related idea. ciphergoth 10:32, 2004 Dec 8 (UTC)

I just wanted to check in regarding "It has also been recommended that Alice slightly change any contract presented to her before signing" &mdash; has this practice actually been recommended to any degree? &mdash; Matt Crypto 09:02, 24 Feb 2005 (UTC)


 * Sorry for the late reply. It's on page 430 of Schneier's Applied Cryptography, 2nd ed.. After describing the birthday attack, he writes: "This is a big problem. (One Moral is to always make a cosmetic change to any document you sign.)". I know that I initially read the recommendation somewhere else, but now I don't remember where, and that source probably got it from Schneier anyway. Cheers, AxelBoldt 05:38, 5 Mar 2005 (UTC)


 * OK, I added that reference. --68.0.120.35 20:02, 9 March 2007 (UTC)

I can see two problems with this recommendation. The first one is that an attacker, who tries to forge a signature, can now exploit the birthday attack. This has already been mentioned in this discussion. The second problem is that the recommendation is very fuzzy. A concrete proposal, how a message should be modified and an analysis of that proposal is missing. Here is an example showing why a simple modification may not help at all. Assume an attacker finds two messages m and m'  that hash to the same value. Most hash functions have the property that collisions can be extended. I.e., given h(m)=h(m') we can add further blocks s at the end of the message and get h(m || s) = h(m' || s) and in particular we can even choose s randomly. Hence an attack is still possible. All the attacker has to do is to reject modifications in the first part of the message. In particular, if the signer is given m||s to sign, but signs m||s'  instead then the attacker gets the signature of the message m'||s' . This doesn't mean that randomizing messages before hashing them can not improve the security of the scheme. However, any such proposal must be analyzed first. I'd generally be careful about any statement in Applied Cryptography that is neither referenced nor justified. 85.2.52.225 07:22, 31 May 2007 (UTC)

bob's a she?
"It has also been recommended that Bob cosmetically modify any contract presented to her before signing." now, i guess this is related to the other discussion, since it seems this sentence used to be about alice. but now it's about bob, who is apparently having a gender crisis. is this common among cryptographers? is mallory involved? pauli 11:53, 12 Apr 2005 (UTC)


 * It's my fault, I changed Bob and Alice around, because I thought it wouldn't be fair if Bob is always portrayed as trying to trick Alice. But I forgot to change one instance of her <-> him. Sorry Bob!


 * What kind of editor are you?!?

modifying the contract is no solution
It doesn't matter whether it was Alice or Bob who last modified the contract - if this attack is possible, either could suspect the other of attempting to defraud her or him. So, Bob, altering the contract yourself is no solution, now Alice suspects you of trying to trick her into a fraudulent contract!


 * I removed the sentence

However, this does not solve the problem, because now Alice suspects Bob of attempting to use a birthday attack.


 * Yes, it is true that no matter which person "last" modifies the contract, the other could suspect that person of specially crafting it.


 * But what about this alternative:
 * Alice signs ( hash( fair contract with Alice's cosmetic modifications ) )
 * Bob signs ( hash( fair contract with Bob's cosmetic modifications ) )
 * Each person stores both versions of the contract (and the signatures).


 * This makes it impossible for Alice to use a birthday attack to "prove" that Bob signed some other contract, and it makes it impossible for Bob to use a birthday attack to "prove" that Alice signed some other contract. (But plausible deniability is a different issue).


 * Should we add more clarification that *each* person adds cosmetic modifications,
 * and each person only signs the version that she modified?


 * --68.0.120.35 20:02, 9 March 2007 (UTC)


 * No. Any such proposal would be original research. Cosmetic modifications to documents is not used in practice, unless you count probabilistic signature schemes. The usual way to defeat birthday attacks is -- as already stated in the article -- to use large enough collision resistant hashes. 85.2.115.82 10:50, 10 March 2007 (UTC)

Hard to understand
While I understand the mathmatics behind the Birthday Paradox, I'm having trouble seeing its application to the Birthday Attack. I think I understand a bit of it, but the article isn't very clear.


 * Agreed! I'm finding this all a bit confusing. The Birthday Paradox page is simple to understand however it seems that a clear example hasn't been applied to this topic.


 * I added a derivation of the probability and added an example. Removed the 'confusing' tag.--Kbk 18:14, 2 May 2006 (UTC)

deja vu

 * http://www.md5online.com/hash/Birthday_attack.php

Birthday vs Collision Attack
I've noticed that most other articles seem refer to this as a Collision attack not a Birthday attack. Which is more correct? This is not my area expertise so I don't know if this is noteworthy. k —Preceding unsigned comment added by 203.152.115.183 (talk) 22:58, 23 September 2007 (UTC)

Hi guys... Something is wrong with the references. People reference this page as "collision attack" and "collision resistant". When they get here, it talks about the "birthday attack", which is really a kind-of bruteforce attack; it assumes that the hash is really collision resistant and that no better collision attacks exist. As we all know better attacks exist (for e.g. MD5, SHA, some HMAC implementations), we really really need a page about the general collision attack problem. It would be great if someone more familiar with how to use wikipedia would create a page for "collision attack" /"collision resistant" and let us start pumping information to it. Birthday attack would of course be referenced on the new page.... --Blaufish 15:48, 29 September 2007 (UTC)


 * We've already got hash collision, preimage attack and collision resistance. Are you saying that these are not enough? -- intgr [talk] 13:28, 30 September 2007 (UTC)


 * Extremely late reply: Not really, I guess what I am saying is that the references from other articles to this article are (or at least, were) weird. For example, Collision attack still links here, and I don't see how that can be correct. Pages referencing the general subject of collision attacks comes here, which describes the specific (and least efficient) type of collision attacks. It makes the subject hard to comprehend. Actually the easiest thing would be redirect Collision attack to hash collision instead. I'm not sure how that is done though. Blaufish (talk) 14:20, 15 February 2009 (UTC)

Page is incomprehensible to non-mathematicians - plain-language introduction would be good.
I don't consider myself mathematically illiterate, but I am not a mathematician, and I find this page completely incomprehensible. This actually illustrates a problem I've seen with dozens of Wikipedia pages which deal with subjects which have mathematical theory behind them, and I've just chosen this article to raise a problem that I think could apply to many, many pages.

I'm not suggesting that any of the mathematical theory (including specialist notation, if relevant) should be removed; but it would seem a good idea to me that the article should open with a broad explanation or description that should give readers an overall view of what the term means. I think I read a Wikipedia policy somewhere that longer articles should contain two main parts: a relatively brief description that gives an overall sense of what the term means, with perhaps a few elementary details, which should be understandable by a person with no knowledge of that topic; and then a much longer section (perhaps subdivided into further sections) that contains all the technical detail.

In articles like this one, I really think this first section is either lacking altogether or inadequate. I would like to be able to read an opening section for articles like this that tell me what the term means, with perhaps some of the broad mathematical theory included if that can be described approximately in plain language without detailed mathematical notation. The technical detail, including mathematical notation, would be better placed in the second, more detailed section; and then, after I've read the opening description and got a sense of what the term means, I can start the technical section, and be in a good position to decide if I'm likely to understand it, or if I'm interested enough to wade through difficult details.

M.J.E. 09:19, 19 October 2007 (UTC)


 * I agree, this is a pretty prevalent on specialist topics. The birthday paradox article does a much better job explaining the theory behind this, and some of it should probably be repeated on this article.
 * When you spot problems like these, it can be useful to tag the article or section with "context", "introrewrite" or "technical" to draw attention to the problem (just insert that, including the braces, to the beginning of the article). More such tags can be found at WP:TC. When the tag isn't self-explanatory, it's a good practice to leave a short edit summary or a talk page message, explaining how the tag applies to the particular article. -- intgr [talk] 13:57, 19 October 2007 (UTC)

OK I added some intro stuff, see if it makes sense. OsamaBinLogin (talk) 00:14, 20 June 2010 (UTC)

Implications of the space/time tradeoff
What is the nature of the space-time tradoff? Do you need to store every source and every hash that you compute to get the stated probability? —Preceding unsigned comment added by NeilenMarais (talk • contribs) 21:50, 19 November 2007 (UTC)


 * It means that, instead of naively picking one hash, and comparing it against different hashes (which would require 2n computations), you store the hash somehow, and compare it to all previously calculated hashes, requiring "just" $$\sqrt{2^n} = 2^{n/2}$$ operations according to the birthday problem. This doesn't mean that you have to store all 2n/2 hashes; an obvious alternative is using rainbow tables to reduce the storage requirements, although still quite inefficient.


 * A much better approach is to start off a large number of computers at a random hash value, and make them simply calculate hashes of the previous hash (or alternatively a reduce function of the previous hash). Hashes that match a particular criteria (for example, have upper half bits set to 0), would be stored ("checkpointed") in a database, along with the previous checkpoint. Eventually, two such threads will "coincide", producing an equal checkpoint hash. At this point, the colliding inputs can be re-calculated from the previous checkpoint hashes. This is how the distributed.net MD5 collision search worked. -- intgr [talk] 01:02, 20 November 2007 (UTC)


 * Aren't you mixing collision search and preimage attacks here? Rainbow tables are mainly used to store a large number of (input, hash) pairs in a very space-efficient way, at the cost that given a hash retrieving the corresponding input can be done with modest computation depending on how compressed the rainbow table is. Here we have a time-memory tradeoff, but it is not a birthday attack. Collision searches which are birthday attacks can be done in time $$O(2^{n/2})$$ regardless of how much memory is available. I.e. cycle finding algorithms require just constant memory, but are only by a constant slower than if we stored all the input/outputs during a collision search or if we used the distributed method that you describe above. The situation is similar for discrete logarithms in generic cyclic groups. If the order of the group is a prime p then the best known generic algorithms takes time $$O(\sqrt{p})$$. Pollard rho needs only constant memory and runs in $$O(\sqrt{p})$$ too.
 * Hence again, memory does not improve the time complexity. Hence classifying a birthday attack as a time-memory attack is not justified. 85.2.91.75 (talk) 14:53, 20 November 2007 (UTC)


 * You're right, I was describing collision attacks and didn't realize that. -- intgr [talk] 21:19, 20 November 2007 (UTC)

Birthday attacks and cyphers
Not a comment on this article, but I've removed a comment re birthday attacks on the Blowfish and Disk encryption hardware pages which make reference to birthday attacks and cypher block size. Birthday attacks can potentially be effective wrt hashed values, but aren't effective against encrypted data. In encryption, being deterministic, any given block of cyphertext may only be decrypted to a single plaintext, unless a different key is used

i.e. E(k, a) == E(k, b) only when a == b. Unlike a hash algorithm where H(a) == H(b) can be true when a != b for various a and b

If anyone can find a citation otherwise, feel free to add them back in with a suitable citation! Nuwewsco (talk) 18:47, 21 January 2008 (UTC)


 * It depends on the block cipher mode that is used. For many modes the birthday paradox does apply. E.g., when using a cipher in CBC mode the i-th ciphertext block is computed as $$C_i = E_K(P_i \oplus C_{i-1})$$, where $$P_i$$ is the i-th plaintext block. Here collisions $$C_i=C_j$$ are possible and imply $$P_i \oplus C_{i-1}=P_j \oplus C_{j-1}$$, hence $$P_i\oplus P_j$$ is computable and thus some information about the plaintext leaks. Such leaks are expected to appear once the number of encrypted blocks reaches the birthday bound, i.e. about $$2^{k/2}$$ blocks where $$k$$ is the block size in bits of the cipher. 85.2.62.68 (talk) 19:54, 21 January 2008 (UTC)

Alice and bob
In the section about digital signatures it says "Suppose Alice wants to trick Bob into signing a fraudulent contract", Shouldn't it be Mallory or Eve instead of Alice? I thought you didn't want to use alice or bob when doing "mean" things to avoid confusion? -RichoDemus 2008-05-09 09:15 gtm+1 edit: hm, I'll be bold and do it myself —Preceding unsigned comment added by RichoDemus (talk • contribs) 08:05, 15 May 2008 (UTC)

"On average"
The "Birthday Attack" article says that a collision occurs after 1.25*sqrt(N) "on average" which I interpret as "mean value" but the article obtains 1.25*sqrt(N) as the number of trials required for at least 50% probability of collision -- the "median value". In general the median and the mean are not equal. If they are in this case that should be justified. Otherwise, if the "on average" statement refers to the median it should be clarified to that effect. 70.50.11.172 (talk) 00:04, 3 November 2008 (UTC)
 * You are certainly right that the average number of trials before a collision and the mean value for the number of trials before a collision are not the same. However your change mixes the two results. The average number of trials before a collision is $$Q(H) \approx 1.25\sqrt{H}$$. The mean value before a collision is $$n(0.5,H)\approx 1.17\sqrt{H}$$. Therefore I'm reverting your change. For more results of this type see the page on the birthday paradox. 83.77.97.81 (talk) 11:15, 25 November 2008 (UTC)
 * The value of $$\approx 1.25\sqrt{H}$$ is frequently referred to as the "expected" value of trials before a collision. Isn't this normally taken to mean that it is the mean and not the median?

144.32.80.201 (talk) 19:38, 12 February 2009 (UTC)
 * Also, the phrase "the expected number of values we have to choose before finding the first collision." seems ambiguous. Is this the number of values we have to choose so that E(number of collisions) = 1?144.32.80.201 (talk) 22:06, 12 February 2009 (UTC)
 * No these are not the same things. The expected number of collisions after n trials is $$n(n-1)/2H$$. Hence E(number of collisions)=1 needs about $$\sqrt{2H}$$ trials. So this is different than both the mean $$Q(H)$$ or the median $$n(0.5, H)$$ number of trials before the first collision. 92.106.14.98 (talk) 23:28, 12 February 2009 (UTC)
 * My thanks for that, but I'm afraid I'm not sure this is the case. Although the $$n$$ trials generate $$n(n-1)$$ pairs, these pairs are not statistically independent of each other. The equation you give would apply if they were, but I'm not sure if it still applies in this case.144.32.80.201 (talk) 17:12, 13 February 2009 (UTC)

144.32.80.201 (talk) 20:15, 13 February 2009 (UTC)
 * For any random variables X and Y we have $$E(X + Y) = E(X) + E(Y)$$ independently of whether X and Y are statistically independent or not. Hence if a pair of values collide with probability $$1/H$$ then expected value of the number of collisions between n(n-1) pairs is exactly $$n(n-1)/2H$$ regardless whether the collisions are independent of each other or not. 92.106.56.48 (talk) 20:28, 13 February 2009 (UTC)
 * You're right! Thanks. 144.32.80.201 (talk) 23:59, 13 February 2009 (UTC)

What's with Mary?
People - it is general practice to use the names Alice, Bob, Eve, Mallory, and Trent when discussing cryptographic protocols. Why? Because Schneier uses the names and his works on cryptography are just the most important ones around. Quit fussing and making Bob into Eve or Alice into Mary - that just confuses things more. Alice and Bob are instead of A and B; Eve evesdrops; Mallory is malicious; Trent is trusted. Do we see a pattern here? --WiseWoman (talk) 08:01, 8 December 2008 (UTC)
 * Your comment lacks precison. First, Schneier did not introduce the Alice and Bob conventions. Nor is his work anywhere reliable enough that his claims would become instant standards. The Alice and Bob conventions are not binding at all. Authors are free to choose whatever names they like to use in their papers. As such using Mary for the attacker would be acceptable as long as the author makes clear that Mary is an attacker. Your choice to use Alice as the attacker, (given that Alice frequently is used for an honest participant in a cryptographic protocol) is actually much more confusing. Preferable to Mary was the previous choice of Mallory as this name is often used for active attackers. 62.203.12.206 (talk) 17:00, 26 December 2008 (UTC)