Take-the-best heuristic

In psychology, the take-the-best heuristic is a heuristic (a simple strategy for decision-making) which decides between two alternatives by choosing based on the first cue that discriminates them, where cues are ordered by cue validity (highest to lowest). In the original formulation, the cues were assumed to have binary values (yes or no) or have an unknown value. The logic of the heuristic is that it bases its choice on the best cue (reason) only and ignores the rest.

Psychologists Gerd Gigerenzer and Daniel Goldstein discovered that the heuristic did surprisingly well at making accurate inferences in real-world environments, such as inferring which of two cities is larger. The heuristic has since been modified and applied in domains such as medicine, artificial intelligence, and political forecasting. It has also been shown that the heuristic can accurately model how experts, such as airport customs officers and professional burglars, make decisions. The heuristic can also predict details of the cognitive process, such as the number of cues used and response times, often better than complex models that integrate all available cues; as such, it is an example of the less-is-more effect.

One-reason decision-making
Theories of decision making typically assume that all relevant reasons (features or cues) are searched and integrated into a final decision. Yet under uncertainty (as opposed to risk), the relevant cues are typically not all known, nor are their precise weights and the correlations between cues. In these situations, relying only on the best cue available may be a reasonable alternative that allows for fast, frugal, and accurate decisions. This is the logic of a class of heuristics known as "one-reason decision making," which includes take-the-best. Consider cues with binary values (0, 1), where 1 indicates the cue value that is associated with a higher criterion value. The task is to infer which of two alternatives has the higher criterion value. An example is inferring which of two NBA teams will win a game, based on cues such as which team plays at home and which team won the previous match. The take-the-best heuristic entails three steps to make such an inference:

Search rule: Look through cues in the order of their validity.

Stopping rule: Stop search when the first cue is found where the values of the two alternatives differ.

Decision rule: Predict that the alternative with the higher cue value has the higher value on the outcome variable.
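The three rules can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes the cue profiles are already listed from the most to the least valid cue and handles only binary values (unknown values are left out for simplicity). The function name `take_the_best` and the two NBA cues used in the usage lines are hypothetical.

```python
from typing import Optional, Sequence


def take_the_best(a: Sequence[int], b: Sequence[int]) -> Optional[str]:
    """Infer whether alternative A or B has the higher criterion value.

    `a` and `b` are binary cue profiles (1 = value associated with a higher
    criterion, 0 otherwise), already ordered from the most to the least valid cue.
    """
    # Search rule: look through the cues in order of validity.
    for cue_a, cue_b in zip(a, b):
        # Stopping rule: stop at the first cue whose values differ.
        if cue_a != cue_b:
            # Decision rule: the alternative with the higher cue value wins.
            return "A" if cue_a > cue_b else "B"
    # No cue discriminates: the heuristic has to guess.
    return None


# Hypothetical cue order: (plays at home, won the previous match)
print(take_the_best([1, 0], [0, 1]))  # A: decided by the home cue alone
print(take_the_best([0, 0], [0, 1]))  # B: decided by the second cue
print(take_the_best([1, 1], [1, 1]))  # None: no cue discriminates
```

Note that in the first call the second cue favours B, but it is never inspected: search stops at the first discriminating cue.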

The validity v of a cue is given by v = C/(C+W), where C is the number of correct inferences when a cue discriminates, and W is the number of wrong inferences, all estimated from samples.
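A short sketch of how C and W might be counted from a sample of pairs. The observation format `(cue_a, cue_b, a_is_higher)` and the name `cue_validity` are assumptions made for illustration; pairs on which the cue does not discriminate contribute to neither C nor W.

```python
from fractions import Fraction
from typing import Iterable, Optional, Tuple


def cue_validity(observations: Iterable[Tuple[int, int, bool]]) -> Optional[Fraction]:
    """Validity v = C / (C + W) of a single cue.

    Each observation is (cue_a, cue_b, a_is_higher): the cue values of the two
    alternatives in a sampled pair and whether A truly had the higher criterion.
    """
    correct = wrong = 0
    for cue_a, cue_b, a_is_higher in observations:
        if cue_a == cue_b:
            continue  # cue does not discriminate: counts as neither C nor W
        if (cue_a > cue_b) == a_is_higher:
            correct += 1  # C: the cue pointed to the truly higher alternative
        else:
            wrong += 1    # W: the cue pointed to the truly lower alternative
    if correct + wrong == 0:
        return None  # undefined when the cue never discriminates in the sample
    return Fraction(correct, correct + wrong)


sample = [(1, 0, True), (0, 1, False), (1, 0, False), (1, 1, True), (0, 1, False)]
print(cue_validity(sample))  # 3/4 (C = 3, W = 1; the tied pair is ignored)
```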

Take-the-best for the comparison task
Consider the task of inferring which object, A or B, has the higher value on a numerical criterion. As an example, imagine someone having to judge whether the German city of Cologne has a larger population than the German city of Stuttgart. This judgment or inference has to be based on information provided by binary cues, such as "Is the city a state capital?". From a formal point of view, the task is a categorization: a pair (A, B) is to be categorized as X_A > X_B or X_B > X_A (where X denotes the criterion), based on cue information.

Cues are binary; this means they take two values and can be modeled, for instance, as having the values 1 and 0 (for "yes" and "no"). They are ranked according to their cue validity, defined as the proportion of correct comparisons among the pairs A and B for which the cue has different values, i.e., for which it discriminates between A and B. Take-the-best analyses the cues one after the other, in the order of their validity ranking; it stops the first time a cue discriminates between the items and concludes that the item with the larger cue value also has the larger value on the criterion.
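The following sketch ranks the cues of a small, made-up environment by validity, computed over all pairs of objects as described above. The helper `rank_cues` and the toy data are hypothetical, and skipping pairs with tied criterion values is an assumption of this sketch.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Environment = Sequence[Tuple[float, Sequence[int]]]  # (criterion value, cue profile)


def rank_cues(objects: Environment) -> Tuple[List[int], Dict[int, float]]:
    """Rank cue indices from highest to lowest validity over all object pairs."""
    n_cues = len(objects[0][1])
    validity: Dict[int, float] = {}
    for cue in range(n_cues):
        correct = discriminating = 0
        for (x_a, cues_a), (x_b, cues_b) in combinations(objects, 2):
            # Only pairs for which the cue discriminates count; pairs with equal
            # criterion values are skipped (an assumption of this sketch).
            if cues_a[cue] == cues_b[cue] or x_a == x_b:
                continue
            discriminating += 1
            # Correct if the object with cue value 1 has the larger criterion value.
            if (cues_a[cue] > cues_b[cue]) == (x_a > x_b):
                correct += 1
        validity[cue] = correct / discriminating if discriminating else 0.0
    ranking = sorted(validity, key=validity.get, reverse=True)
    return ranking, validity


# Made-up environment: four objects, two cues.
objects = [(100, [1, 1]), (80, [0, 1]), (60, [1, 0]), (40, [0, 0])]
print(rank_cues(objects))  # ([1, 0], {0: 0.75, 1: 1.0})
```

Here cue 1 never points the wrong way, while cue 0 misjudges the pair (80, 60), so cue 1 is searched first.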

The matrix of all objects of the reference class, from which A and B have been taken, and of the cue values which describe these objects constitutes a so-called environment. Gigerenzer and Goldstein, who introduced take-the-best (Gigerenzer and Goldstein, 1996), considered, as a walk-through example, precisely pairs of German cities, yet only those with more than 100,000 inhabitants. The comparison task for a given pair (A, B) of German cities in the reference class consisted of establishing which one has the larger population, based on nine cues. Cues were binary-valued, such as whether the city is a state capital or whether it has a soccer team in the national league.

The cue values could be modeled by 1s (for "yes") and 0s (for "no"), so that each city could be identified with its "cue profile", i.e., a vector of 1s and 0s ordered according to the ranking of cues. The question was: how can one infer which of two objects, for example city A with cue profile (100101010) and city B with cue profile (100010101), scores higher on the established criterion, i.e., population size? The take-the-best heuristic simply compares the profiles lexicographically, just as numbers written in base two are compared: the first cue value is 1 for both, which means that the first cue does not discriminate between A and B. The second cue value is 0 for both, again with no discrimination. The same happens for the third cue value, while the fourth cue value is 1 for A and 0 for B, implying that A is judged as having a higher value on the criterion. In other words, take-the-best infers X_A > X_B if and only if (100101010) > (100010101). Mathematically, this means that the cues found for the comparison allow a quasi-order isomorphism between the objects compared on the criterion, in this case cities with their populations, and their corresponding binary vectors. Here "quasi" means that the isomorphism is, in general, not perfect, because the set of cues is not perfect.
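The walk-through can be reproduced in a few lines of Python. The sketch below (function name hypothetical) returns the inferred winner and the position of the deciding cue, and also checks that, for complete binary profiles of equal length, the lexicographic comparison gives the same answer as comparing the profiles as base-two numbers.

```python
from typing import Optional, Tuple


def take_the_best_profiles(profile_a: str, profile_b: str) -> Tuple[Optional[str], Optional[int]]:
    """Compare two cue profiles written as '0'/'1' strings, most valid cue first.

    Returns the inferred winner and the 1-based position of the deciding cue.
    """
    for position, (cue_a, cue_b) in enumerate(zip(profile_a, profile_b), start=1):
        if cue_a != cue_b:  # first discriminating cue decides
            return ("A" if cue_a > cue_b else "B"), position
    return None, None  # no cue discriminates


a, b = "100101010", "100010101"
print(take_the_best_profiles(a, b))  # ('A', 4)
# For complete profiles of equal length this matches a base-two comparison:
print(int(a, 2), int(b, 2), int(a, 2) > int(b, 2))  # 298 277 True
```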

What is surprising is that this simple heuristic performs very well compared with other strategies. One obvious measure of the performance of an inference mechanism is the percentage of correct judgements. Furthermore, what matters most is not just the performance of the heuristic when fitting known data, but its performance when generalizing from a known training set to new items.

Czerlinski, Goldstein and Gigerenzer compared several strategies with take-the-best: a simple tallying, or unit-weight, model (also called "Dawes' rule" in that literature), a linear model with the cues weighted by their validities (also called "Franklin's rule" in that literature), linear regression, and Minimalist. Their results show the robustness of take-the-best in generalization.
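To make the difference between the strategies concrete, the sketch below applies Dawes' rule, Franklin's rule, and take-the-best to one pair of profiles; regression and Minimalist are left out. The validities and cue values are invented for illustration, and Dawes' rule is sketched here as a plain count of 1s and Franklin's rule as a validity-weighted sum, which are simplified readings of the two models.

```python
from typing import Optional, Sequence

Profile = Sequence[int]


def choose(score_a: float, score_b: float) -> Optional[str]:
    """Pick the alternative with the higher score; None on a tie (guess)."""
    if score_a == score_b:
        return None
    return "A" if score_a > score_b else "B"


def dawes_rule(a: Profile, b: Profile, validities: Sequence[float]) -> Optional[str]:
    # Unit weights: count the 1s; the validities are deliberately ignored.
    return choose(sum(a), sum(b))


def franklins_rule(a: Profile, b: Profile, validities: Sequence[float]) -> Optional[str]:
    # Weighted linear model: each cue value is weighted by its cue's validity.
    return choose(sum(v * x for v, x in zip(validities, a)),
                  sum(v * x for v, x in zip(validities, b)))


def take_the_best(a: Profile, b: Profile, validities: Sequence[float]) -> Optional[str]:
    # One reason: the most valid cue that discriminates decides alone.
    order = sorted(range(len(validities)), key=lambda i: validities[i], reverse=True)
    for i in order:
        if a[i] != b[i]:
            return "A" if a[i] > b[i] else "B"
    return None


validities = [0.9, 0.7, 0.6, 0.55]  # invented values
a, b = [1, 0, 0, 0], [0, 1, 1, 1]
for rule in (dawes_rule, franklins_rule, take_the_best):
    print(rule.__name__, rule(a, b, validities))
# dawes_rule B
# franklins_rule B
# take_the_best A
```

The integrating strategies let three weaker cues outvote the best one, whereas take-the-best never looks past the first discriminating cue.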

For example, consider the task of selecting the bigger of two cities when:
 * Models are fit to a data set of 83 German cities
 * Models select the bigger of a pair of cities for all 83 × 82 / 2 = 3,403 pairs of cities.
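The fitting setup amounts to scoring a strategy on every pair of objects. A minimal sketch, assuming each object is a `(criterion, cue_profile)` tuple with distinct criterion values, a hypothetical helper `percent_correct`, and that a pair the strategy cannot decide is credited as a coin flip (0.5). The four-object environment is invented, not the German city data.

```python
from itertools import combinations
from typing import Callable, Optional, Sequence, Tuple

Obj = Tuple[float, Sequence[int]]  # (criterion value, cue profile)
Predictor = Callable[[Sequence[int], Sequence[int]], Optional[str]]


def percent_correct(predict: Predictor, objects: Sequence[Obj]) -> float:
    """Percent of all pairs for which `predict` picks the object with the larger criterion.

    Assumes distinct criterion values. An undecided pair (predict returns None)
    is credited 0.5, as a coin flip would be on average; this is an assumption.
    """
    pairs = list(combinations(objects, 2))
    score = 0.0
    for (x_a, cues_a), (x_b, cues_b) in pairs:
        answer = predict(cues_a, cues_b)
        if answer is None:
            score += 0.5
        elif (answer == "A") == (x_a > x_b):
            score += 1.0
    return 100 * score / len(pairs)


def ttb(order: Sequence[int]) -> Predictor:
    """Take-the-best with a fixed cue order (index of the most valid cue first)."""
    def predict(a: Sequence[int], b: Sequence[int]) -> Optional[str]:
        for i in order:
            if a[i] != b[i]:
                return "A" if a[i] > b[i] else "B"
        return None
    return predict


print(83 * 82 // 2)  # 3403 pairs for 83 cities

# Made-up environment; cue 1 is the more valid cue here.
objects = [(100, [1, 1]), (80, [0, 1]), (60, [1, 0]), (40, [0, 0])]
print(percent_correct(ttb([1, 0]), objects))            # 100.0
print(round(percent_correct(ttb([0, 1]), objects), 1))  # 83.3
```

Searching the less valid cue first costs one of the six pairs, which illustrates why the validity ranking matters.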

The percentage correct was roughly 74% for regression, take-the-best, and the unit-weight linear model. More specifically, the scores were 74.3%, 74.2%, and 74.1%, respectively, so regression won by a small margin.

However, the paper also considered generalization (also known as out-of-sample prediction).
 * Models are fit to a randomly selected half of the 83 German cities
 * Models select the bigger of a pair of cities drawn from the other half of cities.

In this case, when 10,000 different random splits were used, regression had on average 71.9% correct, take-the-best had 72.2% correct, and the unit-weight linear model had 71.4% correct. Take-the-best was thus more accurate than regression in this case.
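The out-of-sample procedure can be sketched as repeated random half splits: cue validities are estimated on the training half only, and take-the-best is then scored on all pairs from the other half. Everything here is illustrative: the function names, the synthetic environment (each cue is 1 with probability equal to a random "size"), the 0.5 credit for undecided pairs, and the default of 1,000 splits rather than the 10,000 used in the study are assumptions of this sketch.

```python
import random
from itertools import combinations
from typing import Iterator, List, Sequence, Tuple

Obj = Tuple[float, Sequence[int]]  # (criterion value, cue profile)


def labelled_pairs(objects: Sequence[Obj]) -> Iterator[Tuple[Sequence[int], Sequence[int], bool]]:
    """Yield (cues_a, cues_b, a_is_larger) for every pair with distinct criterion values."""
    for (x_a, cues_a), (x_b, cues_b) in combinations(objects, 2):
        if x_a != x_b:
            yield cues_a, cues_b, x_a > x_b


def fit_cue_order(training: Sequence[Obj]) -> List[int]:
    """Rank cues by validity C / (C + W), estimated on the training objects only."""
    n_cues = len(training[0][1])
    validity = []
    for cue in range(n_cues):
        correct = wrong = 0
        for cues_a, cues_b, a_larger in labelled_pairs(training):
            if cues_a[cue] != cues_b[cue]:
                if (cues_a[cue] > cues_b[cue]) == a_larger:
                    correct += 1
                else:
                    wrong += 1
        validity.append(correct / (correct + wrong) if correct + wrong else 0.0)
    return sorted(range(n_cues), key=lambda i: validity[i], reverse=True)


def ttb_accuracy(order: Sequence[int], test: Sequence[Obj]) -> float:
    """Percent correct of take-the-best on all test pairs (undecided pairs count 0.5)."""
    score, total = 0.0, 0
    for cues_a, cues_b, a_larger in labelled_pairs(test):
        total += 1
        for i in order:
            if cues_a[i] != cues_b[i]:
                score += 1.0 if (cues_a[i] > cues_b[i]) == a_larger else 0.0
                break
        else:  # no cue discriminated: credit a coin flip
            score += 0.5
    return 100 * score / total


def cross_validate(objects: Sequence[Obj], n_splits: int = 1000, seed: int = 0) -> float:
    """Average accuracy over random half splits: fit on one half, test on the other."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_splits):
        shuffled = list(objects)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        training, test = shuffled[:half], shuffled[half:]
        results.append(ttb_accuracy(fit_cue_order(training), test))
    return sum(results) / len(results)


# Synthetic environment: each cue is 1 with probability equal to the object's "size".
data_rng = random.Random(42)
sizes = [data_rng.random() for _ in range(40)]
objects = [(s, [int(data_rng.random() < s) for _ in range(5)]) for s in sizes]
print(round(cross_validate(objects, n_splits=200), 1))
```

Because the environment is synthetic, the printed accuracy only illustrates the procedure and is not comparable with the published figures for the German cities.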