Rainbow table

A rainbow table is a precomputed table for caching the outputs of a cryptographic hash function, usually for cracking password hashes. Passwords are typically stored not in plain text form, but as hash values. If such a database of hashed passwords falls into the hands of attackers, they can use a precomputed rainbow table to recover the plaintext passwords. A common defense against this attack is to compute the hashes using a key derivation function that adds a "salt" to each password before hashing it, with different passwords receiving different salts, which are stored in plain text along with the hash.

Rainbow tables are a practical example of a space–time tradeoff: they use less computer processing time and more storage than a brute-force attack which calculates a hash on every attempt, but more processing time and less storage than a simple table that stores the hash of every possible password.

Rainbow tables were invented by Philippe Oechslin as an application of an earlier, simpler algorithm by Martin Hellman.

Background
For user authentication, passwords are stored either as plaintext or hashes. Since passwords stored as plaintext are easily stolen if database access is compromised, databases typically store hashes instead. Thus, no one – including the authentication system – can learn a password merely by looking at the value stored in the database.

When a user enters a password for authentication, a hash is computed for it and then compared to the stored hash for that user. Authentication fails if the two hashes do not match; moreover, authentication would equally fail if a hashed value were entered as a password, since the authentication system would hash it a second time.

To learn a password from a hash is to find a string which, when input into the hash function, creates that same hash.

Though brute-force attacks (e.g. dictionary attacks) may be used to try to invert a hash function, they can become infeasible when the set of possible passwords is large enough. An alternative to brute-force is to use precomputed hash chain tables. Rainbow tables are a special kind of such table that overcome certain technical difficulties.

Etymology
The term rainbow tables was first used in Oechslin's initial paper. The term refers to the way different reduction functions are used to increase the success rate of the attack. The original method by Hellman uses many small tables with a different reduction function each. Rainbow tables are much bigger and use a different reduction function in each column. When colors are used to represent the reduction functions, a rainbow appears in the rainbow table. Figure 2 of Oechslin's paper contains a black-and-white graphic that illustrates how these sections are related. For his presentation at the Crypto 2003 conference, Oechslin added color to the graphic in order to make the rainbow association more clear. The enhanced graphic that was presented at the conference is shown in the illustration.

Precomputed hash chains
Given a password hash function H and a finite set of passwords P, the goal is to precompute a data structure that, given any output h of the hash function, can either locate an element p in P such that H(p) = h, or determine that there is no such p in P. The simplest way to do this is compute H(p) for all p in P, but then storing the table requires Θ(|P|n) bits of space, where |P| is the size of the set P and n is the size of an output of H, which is prohibitive for large |P|. Hash chains are a technique for decreasing this space requirement. The idea is to define a reduction function R that maps hash values back into values in P. Note, however, that the reduction function is not actually an inverse of the hash function, but rather a different function with a swapped domain and codomain of the hash function. By alternating the hash function with the reduction function, chains of alternating passwords and hash values are formed. For example, if P were the set of lowercase alphabetic 6-character passwords, and hash values were 32 bits long, a chain might look like this:
 * $${\color{Red}\mathtt{aaaaaa}}\,\xrightarrow[\;H\;]{}\,\mathtt{281DAF40}\,\xrightarrow[\;R\;]{}\,\mathtt{sgfnyd}\,\xrightarrow[\;H\;]{}\,\mathtt{920ECF10}\,\xrightarrow[\;R\;]{}\,{\color{Violet}\mathtt{kiebgt}}$$

The only requirement for the reduction function is to be able to return a "plain text" value in a specific size.

To generate the table, we choose a random set of initial passwords from P, compute chains of some fixed length k for each one, and store only the first and last password in each chain. The first password is called the starting point and the last one is called the endpoint. In the example chain above, "aaaaaa" would be the starting point and "kiebgt" would be the endpoint, and none of the other passwords (or the hash values) would be stored.

Now, given a hash value h to invert (find the corresponding password for), compute a chain starting with h by applying R, then H, then R, and so on. If at any point a value matches one of the endpoints in the table, the corresponding starting point allows to recreate the complete chain. There's a high chance that this chain will contain the value h, and if so, the immediately preceding value in the chain is the password p that we seek.

For example, given the hash 920ECF10, its chain can be computed by first applying R:


 * $$\mathtt{920ECF10}\,\xrightarrow[\;R\;]{}\,{\color{Violet}\mathtt{kiebgt}}$$

Since "kiebgt" is one of the endpoints in our table, the corresponding starting password "aaaaaa" allows to follow its chain until 920ECF10 is reached:


 * $${\color{Red}\mathtt{aaaaaa}}\,\xrightarrow[\;H\;]{}\,\mathtt{281DAF40}\,\xrightarrow[\;R\;]{}\,\mathtt{sgfnyd}\,\xrightarrow[\;H\;]{}\,\mathtt{920ECF10}$$

Thus, the password is "sgfnyd" (or a different password that has the same hash value).

Note however that this chain does not always contain the hash value h; it may so happen that the chain starting at h merges with a chain having a different starting point. For example, the chain of hash value FB107E70, also leads to kiebgt:


 * $$\mathtt{FB107E70}\,\xrightarrow[\;R\;]{}\,\mathtt{bvtdll}\,\xrightarrow[\;H\;]{}\,\mathtt{0EE80890}\,\xrightarrow[\;R\;]{}\,{\color{Violet}\mathtt{kiebgt}}$$

The chain generated by the corresponding starting password "aaaaaa" is then followed until FB107E70 is reached. The search will end without reaching FB107E70 because this value is not contained in the chain. This is called a false alarm. In this case, the match is ignored and the chain of h is extended looking for another match. If the chain of h gets extended to length k with no good matches, then the password was never produced in any of the chains.

The table content does not depend on the hash value to be inverted. It is created once and then repeatedly used for the lookups unmodified. Increasing the length of the chain decreases the size of the table. However, it also increases the time required to perform lookups, and this is the time-memory trade-off of the rainbow table. In a simple case of one-item chains, the lookup is very fast, but the table is very big. Once chains get longer, the lookup slows, but the table size goes down.

Simple hash chains have several flaws. Most serious if at any point two chains collide (produce the same value), they will merge and consequently the table will not cover as many passwords despite having paid the same computational cost to generate. Because previous chains are not stored in their entirety, this is impossible to detect efficiently. For example, if the third value in chain 3 matches the second value in chain 7, the two chains will cover almost the same sequence of values, but their final values will not be the same. The hash function H is unlikely to produce collisions as it is usually considered an important security feature not to do so, but the reduction function R, because of its need to correctly cover the likely plaintexts, cannot be collision resistant.

Other difficulties result from the importance of choosing the correct function for R. Picking R to be the identity is little better than a brute force approach. Only when the attacker has a good idea of likely plaintexts will they be able to choose a function R that makes sure time and space are only used for likely plaintexts, not the entire space of possible passwords. In effect R shepherds the results of prior hash calculations back to likely plaintexts but this benefit comes with the drawback that R likely won't produce every possible plaintext in the class the attacker wishes to check denying certainty to the attacker that no passwords came from their chosen class. Also it can be difficult to design the function R to match the expected distribution of plaintexts.

Rainbow tables
Rainbow tables effectively solve the problem of collisions with ordinary hash chains by replacing the single reduction function R with a sequence of related reduction functions R1 through Rk. In this way, for two chains to collide and merge they must hit the same value on the same iteration: consequently, the final values in these chain will be identical. A final postprocessing pass can sort the chains in the table and remove any "duplicate" chains that have the same final values as other chains. New chains are then generated to fill out the table. These chains are not collision-free (they may overlap briefly) but they will not merge, drastically reducing the overall number of collisions.

Using sequences of reduction functions changes how lookup is done: because the hash value of interest may be found at any location in the chain, it's necessary to generate k different chains. The first chain assumes the hash value is in the last hash position and just applies Rk; the next chain assumes the hash value is in the second-to-last hash position and applies Rk−1, then H, then Rk; and so on until the last chain, which applies all the reduction functions, alternating with H. This creates a new way of producing a false alarm: an incorrect "guess" of the position of the hash value may needlessly evaluate a chain.

Although rainbow tables have to follow more chains, they make up for this by having fewer tables: simple hash chain tables cannot grow beyond a certain size without rapidly becoming inefficient due to merging chains; to deal with this, they maintain multiple tables, and each lookup must search through each table. Rainbow tables can achieve similar performance with tables that are k times larger, allowing them to perform a factor of k fewer lookups.

Example

 * 1) Starting from the hash ("re3xes") in the image below, one computes the last reduction used in the table and checks whether the password appears in the last column of the table (step 1).
 * 2) If the test fails (rambo doesn't appear in the table), one computes a chain with the two last reductions (these two reductions are represented at step 2)
 * Note: If this new test fails again, one continues with 3 reductions, 4 reductions, etc. until the password is found. If no chain contains the password, then the attack has failed.
 * 1) If this test is positive (step 3, linux23 appears at the end of the chain and in the table), the password is retrieved at the beginning of the chain that produces linux23. Here we find passwd at the beginning of the corresponding chain stored in the table.
 * 2) At this point (step 4), one generates a chain and compares at each iteration the hash with the target hash. We find the hash re3xes in the chain, and the password that produced it (culture) one step earlier in the chain: the attack is successful.



Rainbow tables use a refined algorithm with a different reduction function for each "link" in a chain, so that when there is a hash collision in two or more chains, the chains will not merge as long as the collision doesn't occur at the same position in each chain. This increases the probability of a correct crack for a given table size, at the cost of squaring the number of steps required per lookup, as the lookup routine now also needs to iterate through the index of the first reduction function used in the chain.

Rainbow tables are specific to the hash function they were created for e.g., MD5 tables can crack only MD5 hashes. The theory of this technique was invented by Philippe Oechslin as a fast form of time/memory tradeoff, which he implemented in the Windows password cracker Ophcrack. The more powerful RainbowCrack program was later developed that can generate and use rainbow tables for a variety of character sets and hashing algorithms, including LM hash, MD5, and SHA-1.

In the simple case where the reduction function and the hash function have no collision, given a complete rainbow table (one that makes sure to find the corresponding password given any hash) the size of the password set |P|, the time T that had been needed to compute the table, the length of the table L and the average time t needed to find a password matching a given hash are directly related:


 * $$T = |P| $$


 * $$t = \frac {|P|}{2L}$$

Thus the 8-character lowercase alphanumeric passwords case (|P| ≃ 3&times;1012) would be easily tractable with a personal computer while the 16-character lowercase alphanumeric passwords case (|P| ≃ 1025) would be completely intractable.

Defense against rainbow tables
A rainbow table is ineffective against one-way hashes that include large salts. For example, consider a password hash that is generated using the following function (where "+" is the concatenation operator):

Or

The salt value is not secret and may be generated at random and stored with the password hash. A large salt value prevents precomputation attacks, including rainbow tables, by ensuring that each user's password is hashed uniquely. This means that two users with the same password will have different password hashes (assuming different salts are used). In order to succeed, an attacker needs to precompute tables for each possible salt value. The salt must be large enough, otherwise an attacker can make a table for each salt value. For older Unix passwords which used a 12-bit salt this would require 4096 tables, a significant increase in cost for the attacker, but not impractical with terabyte hard drives. The SHA2-crypt and bcrypt methods—used in Linux, BSD Unixes, and Solaris—have salts of 128 bits. These larger salt values make precomputation attacks against these systems infeasible for almost any length of a password. Even if the attacker could generate a million tables per second, they would still need billions of years to generate tables for all possible salts.

Another technique that helps prevent precomputation attacks is key stretching. When stretching is used, the salt, password, and some intermediate hash values are run through the underlying hash function multiple times to increase the computation time required to hash each password. For instance, MD5-Crypt uses a 1000 iteration loop that repeatedly feeds the salt, password, and current intermediate hash value back into the underlying MD5 hash function. The user's password hash is the concatenation of the salt value (which is not secret) and the final hash. The extra time is not noticeable to users because they have to wait only a fraction of a second each time they log in. On the other hand, stretching reduces the effectiveness of brute-force attacks in proportion to the number of iterations because it reduces the number of attempts an attacker can perform in a given time frame. This principle is applied in MD5-Crypt and in bcrypt. It also greatly increases the time needed to build a precomputed table, but in the absence of salt, this needs only be done once.

An alternative approach, called key strengthening, extends the key with a random salt, but then (unlike in key stretching) securely deletes the salt. This forces both the attacker and legitimate users to perform a brute-force search for the salt value. Although the paper that introduced key stretching referred to this earlier technique and intentionally chose a different name, the term "key strengthening" is now often (arguably incorrectly) used to refer to key stretching.

Rainbow tables and other precomputation attacks do not work against passwords that contain symbols outside the range presupposed, or that are longer than those precomputed by the attacker. However, tables can be generated that take into account common ways in which users attempt to choose more secure passwords, such as adding a number or special character. Because of the sizable investment in computing processing, rainbow tables beyond fourteen places in length are not yet common. So, choosing a password that is longer than fourteen characters may force an attacker to resort to brute-force methods.

Specific intensive efforts focused on LM hash, an older hash algorithm used by Microsoft, are publicly available. LM hash is particularly vulnerable because passwords longer than 7 characters are broken into two sections, each of which is hashed separately. Choosing a password that is fifteen characters or longer guarantees that an LM hash will not be generated.

Common uses
Nearly all distributions and variations of Unix, Linux, and BSD use hashes with salts, though many applications use just a hash (typically MD5) with no salt. The Microsoft Windows NT/2000 family uses the LAN Manager and NT LAN Manager hashing method (based on MD4) and is also unsalted, which makes it one of the most popularly generated tables. Rainbow tables have seen reduced usage as of 2020 as salting is more common and GPU-based brute force attacks have become more practical. However, rainbow tables are available for eight and nine character NTLM passwords.