Talk:Birthday attack

Readability
Could someone who is not a mathematician add a layman's description of the problem and its social significance right at the start? After reading the entire article, I'm still struggling to grasp what the Birthday Attack is exactly. Thanks. Meservy (talk) 10:09, 10 June 2010 (UTC)

I am not sure if the above has been answered.

[Electronic] Messages usually have some sort checksum (hash) to verify that the message has not been corrupted in transit. Examples of hashes people are more familiar with would be their zip code when placing something on layaway, the last four numbers of their social security number, or their day of the year they were born. The birthday problem is used to illustrate a counterintuitive phenomenon of hashes. It takes only 23 people in a room to have a greater than 50% chance of any two people in the room being born on the same day.

Applied to communication an illustrative scenario might go like this. Bob sends a message to Alice stating: “Eve should be demoted”. Eve intercepts the message and tries to get something like “Eve should be promoted” or “Steve should be demoted”, to hash to the same value as the true message. Coining it the ‘birthday attack’ emphasizes that it is not implausible given the probability. Though it is probably not something a script kiddie hacker would successfully execute or be interested in. More like a state sponsor or major corporate entity motivated to engage in misinformation such as “do launch the missiles now” verses “do not launch the missiles now”

Another application might be a piece of software with a publicly know hash code with just a few lines of malicious code planted in it in just the right way so that the software still hashes to the same value. Possible, but no simple feat. Bdayresponse (talk) 05:31, 15 April 2012 (UTC)


 * Bdayresponse, please note that the birthday attack does not apply to the scenarios that you describe. What you describe are situations for a preimage attack. The birthday attack does however work against digitial signatures in the scenario that is already described in the article (as well as many other situations). 178.195.230.127 (talk) 06:30, 15 April 2012 (UTC)


 * Very good clarification. As a lay person, this is the first I have heard of pre-image to distinguish the subtlety, but it makes sense.  Both are trying to discover a hash collision, except in the fair/fraud example Alice might have a little bit of control over the intended target message, although in limited way since the target message is constrained to be a fair contract (assuming the number of ways to make a contract fair is far less than the number of ways to make a contract fraudulent).  The birthday problem is not someone in the room having YOUR birthday, but ANY two people in the room having the same birthday, ergo to qualify as a birthday attack, the attacker must have room to play with both messages.


 * Though, in the contract problem it is not entirely clear why Bob is signing a hash prepared by someone else. One would think Bob would add something unique to him, such as a serial number and time stamp, to every contract before hashing and signing, ergo Alice would not know the hash value to be signed a priori.  Also, in court it would seem easy to demonstrate that both the fair and fraud contracts have the same hash and that is not a coincidence.  The only question would be who did it, Bob or Alice, so it sounds more a like a non-repudiation problem solved by always adding something unique before hashing.


 * As another aside, organizations may have documents that required 5 signatures that protocol dictates occur in a precise order. With pen and paper it may difficult to prove if something was signed out of order, but with hashes and digital signatures it seem signature order could be enforced.Bdayresponse (talk) 02:46, 17 April 2012 (UTC)

Structure
The headers 'Understanding the Problem', 'source code', and 'simple approximation' relate more to the birthday problem than defining a birthday attack. The code is particularly egregious in that it focuses on finding the probability of a birthday match, not the mechanics of creating that match. 'Understanding the problem' could be called 'Definition' (and rewritten).

The information under the "Understanding the Problem" section is mostly related to smart contracts, but they're never actually defined or referenced anywhere, making the entire paragraph related to that mostly useless information.

The mathematics section is of secondary interest and should be moved after the 'Digital Signature susceptibility' (which in itself should be renamed). — Preceding unsigned comment added by 24.104.59.76 (talk) 01:11, 14 December 2016 (UTC)

Digital signature susceptibility
I changed the name "Alice" to "Mallet". Because in IT Sec. it is common to use the names Bob and Alice for good communication partners and Eve or Mallet for bad communication partners. So Alice wouldn't fake signatur, but Mallet would. — Preceding unsigned comment added by 138.246.2.74 (talk) 10:58, 2 July 2011 (UTC)

Which table is correct?
The table given here and the table at Birthday_problem have different values. Which one is correct? Or if they're calculating different things, that probably should be made clearer. -- 205.175.124.30 (talk) 19:15, 12 November 2012 (UTC)
 * They seem to match to me. Don't forget that that table has a different unit for key size. Number774 (talk) 16:22, 20 March 2014 (UTC)

Git hash collisions
It would be interesting to add some discussion about the probability of git hash collisions, since git absolutely relies on the probability of hash collision being so low. — Preceding unsigned comment added by 108.68.105.246 (talk) 03:36, 24 November 2012 (UTC)
 * Doesn't the security of git depend on the collision resistance of the underlying hash function and not on the probability of a random collision? 178.195.225.28 (talk) 11:25, 24 November 2012 (UTC)

Math details
it would be nice to have a reference for the full mathematical details in the "Mathematics" section. Why do the approximation holds? How is Q computed? A reference would be appreciated. — Preceding unsigned comment added by 158.110.147.162 (talk) 08:58, 5 April 2013 (UTC)
 * The page Birthday problem contains derivations and references for the for the equations used here. 83.77.189.6 (talk) 18:11, 5 April 2013 (UTC)
 * It doesn't appear to show how $$Q(H)$$ is derived, specificially where the approximation/derivation of $$\sqrt{\frac{\pi}{2}}$$ comes from (there's no mention of pi in the birthday problem article) or how it is different from $$p(n) = 0.5$$. WhiteCrane (talk) 10:21, 24 July 2023 (UTC)

Integral number of birthdays
The table had such interesting values as 6.1 for the number of times you need to run an attack to get the desired probability of a collision. Since you've either run a hash through, or you haven't, the number of attacks is an integer. I've therefore rounded them off to the nearest integer.

Note that where the probability is low, and the key size is small, you only need to run two attacks. I've marked all these as "less than two". Number774 (talk) 16:30, 20 March 2014 (UTC)

Source code example
The example is unfortunately not written in a totally valid standard C++ way. — Preceding unsigned comment added by 82.236.187.110 (talk) 22:47, 28 March 2015 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified 1 one external link on Birthday attack. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20040913080209/http://www.rsasecurity.com:80/rsalabs/node.asp?id=2182 to http://www.rsasecurity.com/rsalabs/node.asp?id=2182

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at ).

Cheers.— InternetArchiveBot  (Report bug) 05:18, 3 November 2016 (UTC)

Misstatement of Bellare and Kohno Paper results
From the main article:

''"It is easy to see that if the outputs of the function are distributed unevenly, then a collision could be found even faster. The notion of 'balance' of a hash function quantifies the resistance of the function to birthday attacks (exploiting uneven key distribution) and allows the vulnerability of popular hashes such as MD and SHA to be estimated (Bellare and Kohno, 2004)."

The Bellare and Kohno "balance" concept requires that the hash function be evaluated for its entire input range. They state that this is "hard to compute" and "infeasible" so they experimentally calculate for very small subsets (a subrange of 32 input bits, note SHA-1 allows 65536 input bits).

They state "These results do not imply that the functions SHAn;t1...t2 or SHAn, n > 32 and t1,t2 as before, are regular." Regular here means uniform probability of collision across the input range.

Implying that their paper solves the problem of understanding the probability density of algorithms like SHA1 and MD5 or even provides an estimate is a misstatement. They refer to their results as only "encouraging."

Given that, there is no estimate for the uniformity of MD style hashes (SHA1, MD5) you can not imply that the birthday bound is the typical case for determining collision probability. It is only a lower bound on likelihood, actual collisions may be much easier to find and thus much more probable.

As a result, your statement:

"Table shows number of hashes n(p) needed to achieve the given probability of success, assuming all hashes are equally likely. For comparison, 10−18 to 10−15 is the uncorrectable bit error rate of a typical hard disk [1]. In theory, MD5 hashes or UUIDs, being 128 bits, should stay within that range until about 820 billion documents, even if its possible outputs are many more."

can not be supported, 820 Billion is only an upper bound, the actual number may be very much, perhaps catastrophically, lower.

You should remove it or caveat it as not guaranteeing real world results.

Corrected the statement regarding Bellare and Kohno 'balance' calculation
Edited the line stating the B&K work allowed quantifying or even estimating frequency of collisions of a hash function. — Preceding unsigned comment added by 73.69.182.76 (talk) 00:45, 15 May 2017 (UTC)

Possible citogenesis
One of the equations in the mathematics section is cited from the article "What is birthday attack??", by Ganesh Gupta. However, in the article itself, Wikipedia is cited, and the specific items are not individually cited, simply placed in a references section at the end. The cite was added on the 18th July 2020, with the article being written in 2015. Could this be an example of citogenesis? 2A00:23C7:1D08:4C01:41F4:1158:48C8:F77C (talk) 17:24, 4 November 2020 (UTC)


 * I've replaced the citation with lecture notes that provide a proof for the inequality. Not sure about the approximation though. Phesch-de (talk) 16:37, 31 March 2023 (UTC)

Delete “inserting commas”
The article cites “inserting commas [and] empty lines” as two ways to alter a document “without changing [its] meaning”. In my opinion, inserting commas is a bad example, and should be deleted from that section of the article. In support of this, here’s a case where a single missing comma, in a legal document, lead to a US$10 million lawsuit: https://www.abc.net.au/news/2017-03-21/the-case-of-the-$13-million-comma/8372956 07:56, 17 June 2021 (UTC) TC

Probability of one common birthday date (any date) vs probability of one student having a specific (chosen) birthday date?...
I'm no expert at all, but still wondering: If in the example of a class of 30 students, there is 70% probability of having 2 students having the same birthday, then how can the probability of one student having a specific date so low? I mean in the worst case scenario, you'd suppose no 2 students have the same birthday, in that case you can simplify the "16 September" probability to this: 30*(1/365) = 30/365 ~= 0.0822 ~= 8.22% (very bad math I know). But we have 70% probability that 2 students have the same birthday, so the chances that one has a specific birthday date should actually be higher than 8.22%, not lower! So to me it seems formula 1-(364/365)^30 gives a false result and doesn't take into account the 70% probability we found out one line above with formula 1-(365!/(365-30)!*365^30 … Shouldn't be the result for the "16 September" example more something like "about 7.9%" / 0.70 ~= 11.3% ? My math is probably wrong but one could make the idea work with better math I guess? Thanks for bearing with me... and sorry if it sounds silly :) 80.215.65.208 (talk) 08:27, 20 July 2022 (UTC)


 * Me again, I just realized that with 100% probability of 2 students having the same birthday we would not get an increase in probability of one student having for birthday the 16th of September but actually a small decrease, wich might after all be accounted for already since 7.9% is indeed a bit lower than 8.22%. I'd do that horrible math again saying it's like among our 30 students we are certain to have only 29 different dates so n becomes 29 and thus 30/365 (too simple I know) becomes indeed 29/365 ~= 0.07945 ~= 7.945%. I know we'd need more than 30 students to get that 100% in the first place, it's just for the sake of simplicity (taking the same numbers again and again) and to compare to the "7.9%" of the article.
 * Still, one could argue the value should intuitively be slightly higher than 7.9% … ;) 80.215.65.208 (talk) 08:39, 20 July 2022 (UTC)
 * Aaand me again: we could do an approximation with a linear interpolation between 7.9% and 8.22% in our specific case of a class of 30 students and thus a 70% probability that 2 students have the same birthday => if 7.945% corresponds to certainty to have only 29 different dates among the 30 students, and if 8.22% corresponds to the certainty of having 30 different dates, then for 70% probability we'd have approx 8.22-(8.22-7.945)*0.70 ~= 8.03% probability that at least one student was born on a 16 September. Ugly math I guess... and who cares anyway? ^^ 80.215.65.208 (talk) 08:57, 20 July 2022 (UTC)
 * Pfew I'm getting bored of myself lol OK one last editing. 30/365 and 29/365 can and should be replaced respectively with the article's formula "1-(364/365)^30" ~= 7.901% and 1-(364/365)^29 ~= 7.65% now we do the linear interpolation:
 * 7.901 - (7.901 - 7.65) * 0.70 = 7.65 + (7.901 - 7.65) * 0.30 ~= 7.725%
 * Less ugly math now... xD 80.215.65.208 (talk) 09:33, 20 July 2022 (UTC)

Add SHA1 160 length to table
Suggesting that since SHA1 is widely used, more-so than SHA384, that it would be important enough to add a row to the table to show the results at 160 bits, at both of the birthday pages. Silversplash (talk) 03:37, 22 October 2022 (UTC)
 * I would argue no because SHA-1 is one of the few algorithms with this bit size and, more importantly, has been deprecated since it has been cryptographically broken.


 * WhiteCrane (talk) 01:03, 24 July 2023 (UTC)

Possible Clarity Required for Q(X)
It's not immediately clear how we get from this...


 * $$n(0.5;H) \approx 1.1774 \sqrt H$$

... to this...


 * $$Q(H)\approx \sqrt{\frac{\pi}{2}H}$$

I assume it's the expected value as defined in statistics, but it is unclear how $$\sqrt{\frac{\pi}{2}}\approx 1.25$$ is generated (I would hazard a guess that $$\sqrt{\pi}$$ is related to the gaussian curve, which fits with the exponential approximation).

WhiteCrane (talk) 00:57, 24 July 2023 (UTC)