Utility assessment

Utility assessment, also called utility measurement, is a process by which the utility function of individuals or groups can be estimated. There are many different methods for utility assessment.

Assessing single-attribute utility functions
A single-attribute utility function maps the amount of money a person has (or gains) to a number representing the subjective satisfaction he derives from it. The motivation to define a utility function comes from the St. Petersburg paradox: the observation that people are not willing to pay much for a lottery even though its expected monetary gain is infinite. The classical solution to this paradox, suggested by Daniel Bernoulli and Gabriel Cramer, is that most people have a strictly concave utility function, and they aim to maximize their expected utility rather than their expected gain.
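The Bernoulli-Cramer resolution can be checked numerically. The sketch below assumes the common textbook version of the game (the first head appears on toss k with probability 2^-k, and the player then receives 2^k dollars), ignores the player's initial wealth, and uses the natural logarithm as the concave utility. The function names are illustrative only.

```python
import math

def st_petersburg_terms(n_terms):
    """Yield (probability, payoff) pairs for tosses k = 1..n_terms.

    Assumed convention: the first head appears on toss k with
    probability 2**-k, and the player then receives 2**k dollars.
    """
    for k in range(1, n_terms + 1):
        yield 0.5 ** k, 2.0 ** k

def truncated_expected_value(n_terms):
    # Every term contributes 2**-k * 2**k = 1, so the sum grows without bound.
    return sum(p * x for p, x in st_petersburg_terms(n_terms))

def log_utility_certainty_equivalent(n_terms):
    # Expected log utility converges (to 2 * ln 2); the certainty equivalent
    # is the sure amount with the same utility, i.e. exp(E[ln X]).
    expected_utility = sum(p * math.log(x) for p, x in st_petersburg_terms(n_terms))
    return math.exp(expected_utility)

for n in (10, 100, 1000):
    print(n, truncated_expected_value(n), round(log_utility_certainty_equivalent(n), 6))
```

The truncated expected value equals the number of terms and keeps growing, while the certainty equivalent under log utility settles near $4 (exactly 4 in the limit, because the expected log payoff is 2 ln 2).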

Power-log utility
Bernoulli himself assumed that utility is logarithmic, that is, $u(x) = \log(x)$, where x is the amount of money; this was sufficient to resolve the St. Petersburg paradox. Gustav Fechner also supplied a psychophysical justification for the logarithmic function (known as the Weber–Fechner law). But Stanley Smith Stevens showed that the relation between physical stimulus and psychological perception is better explained by a power function, that is, $u(x) = x^p$, with exponent p between 0.3 and 2.

Many investigators tried to determine whether utility is better represented by logarithmic functions or by power functions. Using various methods, they showed that power functions fit utility data better. As a result, power functions were incorporated into psychological decision theories, such as cumulative prospect theory, the rank-affected multiplicative (RAM) weights model, and the transfer of attention exchange (TAX) model. Some economic applications, however, still use logarithmic functions.

Wakker noted that power functions can have a negative exponent, but in that case their sign must be changed so that they remain increasing. One way to define this generalized family of functions is $$u_r(x) = x^r/r,$$ which is increasing for any exponent r ≠ 0. Subtracting the constant 1/r gives the equivalent function $(x^r - 1)/r$ (utility functions that differ by a positive affine transformation represent the same preferences), and the limit of this function as r → 0 is exactly the logarithmic function: $$u_0(x) = \ln(x).$$ Therefore, the family of functions $u_r(x)$ for all real r is sometimes called power-log utility.
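A minimal sketch of the power-log family follows. It assumes the shifted form (x^r - 1)/r, which ranks lotteries exactly as x^r/r does, and uses ln(x) at r = 0; the function name `power_log_utility` is hypothetical.

```python
import math

def power_log_utility(x, r):
    """Power-log utility of a positive amount x with exponent r.

    Uses the shifted form (x**r - 1) / r, which differs from x**r / r only
    by the constant 1/r, and returns ln(x) at r == 0 (the limiting case).
    """
    if x <= 0:
        raise ValueError("power-log utility is defined here only for x > 0")
    if r == 0:
        return math.log(x)
    return (x ** r - 1) / r

# The family is increasing in x for negative, zero, and positive exponents.
for r in (-1.0, 0.0, 0.5, 2.0):
    values = [power_log_utility(x, r) for x in (1, 2, 5, 10)]
    assert values == sorted(values)

# As r approaches 0, the power branch approaches the logarithm.
for r in (0.1, 0.01, 0.001):
    print(r, power_log_utility(10, r), math.log(10))
```

Shifting by 1/r is the design choice that makes the r → 0 limit finite; with the unshifted x^r/r the values diverge as r → 0, even though the preferences they represent do not change.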

Procedures for assessing utility
Utility functions are usually assessed in experiments checking subjects' preferences over lotteries. Two general types of procedures have been used; both share several problems, which are discussed after the procedures are described:
 * 1) In equivalence procedures, subjects are asked to adjust a sum of money in one lottery so that it becomes equivalent, in their eyes, to another lottery. For example, subjects may be asked to consider two lotteries: (a) getting $x for sure; (b) getting $20 with probability 60% and $0 with probability 40%. They are asked: "For what value of x would you be indifferent between these two lotteries?" Each such question yields an equation of the form $$\sum_i p_{1,i} u(x_{1,i}) = \sum_i p_{2,i} u(x_{2,i}),$$ which can be used to assess the form of the utility function. For example, if the subject answers x = $10, we get the equation $$u(10) = 0.6\, u(20) + 0.4\, u(0).$$ Assuming power utility $u(x) = x^r$ with r > 0 (so that u(0) = 0), this yields $${10}^r = 0.6 \cdot {20}^r,$$ that is, $$0.5^r = 0.6,$$ so $$r = \log(0.6)/\log(0.5) \approx 0.74.$$
 * 2) In choice procedures, subjects are shown two or more lotteries and asked which lottery they prefer. For example, subjects may be asked to choose between two options: "(a) getting $10 for sure; (b) getting $20 with probability 60% and $0 with probability 40%". Each such question yields an inequality of the form $$\sum_i p_{1,i} u(x_{1,i}) > \sum_i p_{2,i} u(x_{2,i}),$$ where $p_{1,i}$ and $x_{1,i}$ are the probabilities and amounts in the preferred lottery, and $p_{2,i}$ and $x_{2,i}$ are those in the other lottery. The advantage of choice questions is that they are easier to answer; a sequence of such questions can narrow down the point of indifference.
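Both procedures can be sketched for the running example, assuming power utility u(x) = x^r with r > 0 and objective probabilities. The first function solves the single equivalence equation for r; the second simulates a sequence of choice questions as a bisection search on the sure amount, answered by a hypothetical simulated subject whose exponent is known. The function names and the bisection scheme are illustrative assumptions, not a documented experimental protocol.

```python
import math

def exponent_from_certainty_equivalent(ce, prize, p):
    """Solve ce**r = p * prize**r (power utility, u(0) = 0) for r."""
    # Dividing by prize**r gives (ce / prize)**r = p, so r = ln(p) / ln(ce / prize).
    return math.log(p) / math.log(ce / prize)

def prefers_sure_amount(x, prize, p, r):
    """Hypothetical subject with u(x) = x**r answering one choice question:
    (a) x for sure versus (b) prize with probability p, else $0."""
    return x ** r > p * prize ** r

def certainty_equivalent_by_choices(prize, p, r, questions=30):
    """Narrow down the indifference point with a sequence of choice questions."""
    low, high = 0.0, float(prize)
    for _ in range(questions):
        x = (low + high) / 2
        if prefers_sure_amount(x, prize, p, r):
            high = x  # the sure amount is preferred, so offer less next time
        else:
            low = x   # the lottery is preferred, so offer more next time
    return (low + high) / 2

r = exponent_from_certainty_equivalent(ce=10, prize=20, p=0.6)
print(round(r, 3))  # 0.737
print(round(certainty_equivalent_by_choices(prize=20, p=0.6, r=r), 3))  # 10.0
```

Each choice question halves the interval that must contain the indifference point, so 30 questions pin it down to within about $0.00000002 in this idealized, noise-free simulation; real subjects answer inconsistently, which is one reason practical designs use far fewer, more robust steps.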

First, both procedures assume that people weight events by their true (objective) probabilities, $p_{1,i}$ and $p_{2,i}$. In fact, much evidence shows that people weight events by subjective probabilities. In particular, people tend to overweight small probabilities and underweight medium to large probabilities (see prospect theory). This non-linearity in subjective probability may be confounded with the concavity of the utility function. For example, a person who is indifferent between the lotteries [100%: $10] and [60%: $20, 40%: $0] can be modeled by a linear utility function if we assume that he underweights the probability 60% to around 50%.
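The confound can be made concrete. The sketch below compares two hypothetical models of the same subject: concave power utility with objective probabilities, and linear utility with the 60% chance underweighted to 50%. The helper name `certainty_equivalent` and the chosen weight are assumptions for illustration.

```python
import math

def certainty_equivalent(prize, weight, r):
    """Sure amount judged equal to 'prize with decision weight `weight`, else $0'
    under power utility u(x) = x**r (r = 1 is linear utility)."""
    # Solve ce**r = weight * prize**r for ce.
    return prize * weight ** (1 / r)

# Model A: concave power utility, objective probability 0.6 used as the weight.
r_concave = math.log(0.6) / math.log(0.5)
ce_a = certainty_equivalent(prize=20, weight=0.6, r=r_concave)

# Model B: linear utility, with the 60% chance underweighted to 50%
# (a hypothetical weight chosen to match the example in the text).
ce_b = certainty_equivalent(prize=20, weight=0.5, r=1.0)

print(round(ce_a, 6), round(ce_b, 6))  # 10.0 10.0
```

Both models predict the same $10 answer, so a single indifference judgment cannot tell concave utility apart from probability underweighting.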

One way to avoid this confounding is to use equal probabilities in all queries; this was done, for example, by Coombs and Komorita. This trick works for non-configural weight theories, which assume that the subjective probability is a function of the objective probability alone (that is, every objective probability is translated to a unique subjective probability). In this case, when all probabilities in the queries are equal, they cancel out of the equations. The equations then involve only the utilities, and can again be used to infer the form of the utility function.
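As a worked illustration, suppose (hypothetically) a subject is indifferent between two 50-50 lotteries, one paying $a$ or $b$ and the other paying $c$ or $d$. Writing $w(\cdot)$ for the non-configural weighting function, the indifference equation and its simplification are:

```latex
w(0.5)\,u(a) + w(0.5)\,u(b) = w(0.5)\,u(c) + w(0.5)\,u(d)
\quad\Longrightarrow\quad
u(a) + u(b) = u(c) + u(d), \qquad \text{provided } w(0.5) > 0.
```

The resulting equation constrains $u$ alone, whatever value the subject's weighting function assigns to the probability 0.5.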

However, configural weight theories, motivated by the Allais paradox, hold that subjective probability may depend both on the objective probability and on the outcome. Kirby presented a way to design the queries such that, for power-log utilities and negative-exponential utilities, the predictions do not rely on subjective probabilities canceling out.

A second problem is that some experiments use both gains and losses. However, later research showed that the concavity of the utility function may differ between gains and losses (see prospect theory and loss aversion). Combining the gain and loss domains may therefore yield an incorrect utility function.

A possible solution is to measure each of these two domains separately. Eugene Galanter devised another solution, which addresses both the first and the second problem. He conducted experiments in which no probabilities were used; instead, he asked questions such as "How much money would you need in order to feel twice as happy as with $10?" If the answer is, e.g., $18, we get the equation $$u(18) = 2\, u(10),$$ which gives information on the utility function without any dependence on probabilities or risk attitudes. His experiments consistently showed that power functions fit the data better than log functions.
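A ratio judgment like the one above can be turned into an exponent directly. This sketch assumes power utility u(x) = x^r through the origin and treats the $18 answer as the illustrative example from the text, not as Galanter's data; the function names are hypothetical.

```python
import math

def exponent_from_ratio_judgment(base, answer, ratio=2.0):
    """Solve u(answer) = ratio * u(base) for r, assuming u(x) = x**r."""
    # (answer / base)**r = ratio  =>  r = ln(ratio) / ln(answer / base)
    return math.log(ratio) / math.log(answer / base)

def log_utility_prediction(base, ratio=2.0):
    """Amount y with log(y) = ratio * log(base), i.e. y = base**ratio (any log base)."""
    return base ** ratio

print(round(exponent_from_ratio_judgment(base=10, answer=18), 3))  # 1.179
print(log_utility_prediction(base=10))  # 100.0
```

With these illustrative numbers the implied exponent is slightly above 1, while utility taken literally as log(x) would require an answer of $100; no probabilities enter either calculation.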

A third problem is that most experiments only compare the relative fit of different utility models to the data. For example, they can show that power functions fit the data better than logarithmic functions, but they cannot test whether power functions actually fit the data in absolute terms. Kirby presented a novel experimental design that allowed him to derive point predictions for each model separately. His experiments indicate that neither power-log functions nor negative-exponential functions fit the data. He leaves finding a better-fitting function as an open problem.

Assessing multi-attribute utility functions
A multi-attribute utility (MAU) function maps a bundle with two or more attributes (e.g. money and free time) to a number representing the subjective satisfaction from that bundle.

Assessing MAU is relevant even in conditions of certainty. For example, whereas most people prefer $12,000 for sure to $10,000 for sure, different people may have different preferences between the bundles ($10,000 salary, 8 work hours per day) and ($12,000 salary, 9 work hours per day), even when both bundles are certain. A procedure for assessing MAU in conditions of certainty is presented in the article on ordinal utility.
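To see how two people can rank the same certain bundles differently, here is a sketch assuming a simple additive two-attribute utility (a swapped-in illustrative form, not the ordinal-utility procedure referred to above). Salary is measured in thousands of dollars, free time is taken as 24 minus daily work hours, and both people and their weights are hypothetical.

```python
def additive_utility(salary, work_hours, w_money, w_time):
    """Hypothetical additive utility: weighted salary (in $1000s) plus weighted
    free time (24 minus daily work hours). The weights are illustrative only."""
    return w_money * (salary / 1000) + w_time * (24 - work_hours)

BUNDLE_A = (10_000, 8)  # $10,000 salary, 8 work hours per day
BUNDLE_B = (12_000, 9)  # $12,000 salary, 9 work hours per day

def preferred_bundle(w_money, w_time):
    """Return "A" if bundle A has strictly higher additive utility, else "B"."""
    u_a = additive_utility(*BUNDLE_A, w_money, w_time)
    u_b = additive_utility(*BUNDLE_B, w_money, w_time)
    return "A" if u_a > u_b else "B"

print(preferred_bundle(w_money=1.0, w_time=1.0))  # B
print(preferred_bundle(w_money=1.0, w_time=3.0))  # A
```

Both hypothetical people prefer more money to less, yet they disagree about the bundles because they trade money against free time at different rates.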

Assessing MAU in conditions of uncertainty is more complex; see multi-attribute utility for details.

In health
Assessment of MAU functions is particularly relevant in health economics. It is often necessary to choose among different possible treatments, where each treatment has different attributes regarding life expectancy, quality of life, safety, and cost. Following extensive surveys, MAU functions for health-related conditions were developed; see Quality-adjusted life year and EQ-5D. The most common method for assessing health-related utilities is time trade-off. To enable decision-making at the national level, a MAU function for health is constructed at the national level, as an "average" utility function of all patients in the country.
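Quality-adjusted life years combine life expectancy and quality of life by weighting each period of life by the utility of the health state it is spent in. The sketch below assumes the usual normalization (1 = full health, 0 = death), a utility that is constant within each period, and no discounting; the treatments and numbers are hypothetical.

```python
def qalys(periods):
    """Sum of years x utility over (years, utility) periods, with utilities
    normalized so that 1 = full health and 0 = death (no discounting)."""
    return sum(years * utility for years, utility in periods)

# Hypothetical treatments; the numbers are illustrative, not clinical data.
treatment_a = [(2, 0.9), (3, 0.6)]  # 2 years at 0.9, then 3 years at 0.6
treatment_b = [(6, 0.5)]            # 6 years at 0.5

print(round(qalys(treatment_a), 2), round(qalys(treatment_b), 2))  # 3.6 3.0
```

Here the shorter treatment A yields more QALYs than the longer treatment B, which is the kind of trade-off between length and quality of life that a health MAU function is meant to capture.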

The utility functions are usually normalized such that a utility of 1 means "full health" and a utility of 0 means "death". Negative utility values are possible, for states considered "worse than death". As an example, here is a description of a protocol for constructing a value set for EQ-5D-Y (the EuroQol 5-Dimensional scale for Young people). The construction is done in two steps: an online discrete choice experiment (DCE) survey, and a face-to-face composite time trade-off (cTTO) interview.

In both steps, the subjects are adults, and they are asked to answer the queries from the point of view of a 10-year-old child. The reasons for asking adults rather than children were: (1) adults are the taxpayers, so they should decide how the health budget is used; (2) some queries are about death, which may be inappropriate for children; (3) children may misunderstand the questions.
 * 1) The DCE step measures the relative importance of the five dimensions, but yields results on a latent scale, rather than normalized utility values. The recommended experimental design contains 10 blocks, with 15 pairs of health-states in each block (150 pairs overall). Each subject is given a single block (15 pairs). For each pair, the subject has to say which of these two states he prefers (e.g. "do you prefer 12333 or 31122?").
 * 2) The cTTO step is used to compute a normalized value for the anchor state 33333, the state with the worst level (3) in all five dimensions. Each subject answers 10 TTO queries. Each TTO query presents two alternatives, each a health state with a duration, and the subject says which one he prefers. There are two kinds of queries:
   * One kind assumes that the state is better than death. The questions look like: "Which life do you prefer: 2 years in full health, or 10 years in state 33333?"
   * The other kind assumes that the state is worse than death. The questions look like: "Which life do you prefer: 2 years in full health, or 5 years in full health plus 10 years in state 33333?"
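The arithmetic behind both steps can be sketched as follows. The TTO helpers assume a linear QALY model (value proportional to duration, no discounting) and assume the interviewer has already varied the full-health duration until the respondent is indifferent; the example numbers from the questions above are treated as if they were such indifference points. The DCE helper assumes a binary logit link on the latent scale, the same logit family as the models used in the Slovenian study. All function names are hypothetical.

```python
import math

def tto_value_better_than_dead(full_health_years, state_years):
    """Value of a state when the respondent is indifferent between
    `full_health_years` in full health and `state_years` in the state."""
    # Linear QALY model: full_health_years * 1 = state_years * u(state)
    return full_health_years / state_years

def tto_value_worse_than_dead(full_health_years, lead_years, state_years):
    """Value of a state when the respondent is indifferent between
    `full_health_years` in full health and `lead_years` in full health
    plus `state_years` in the state."""
    # full_health_years = lead_years * 1 + state_years * u(state)
    return (full_health_years - lead_years) / state_years

def dce_choice_probability(value_a, value_b, scale=1.0):
    """Binary logit probability of choosing state A over state B,
    given their values on the DCE's latent scale."""
    return 1.0 / (1.0 + math.exp(-scale * (value_a - value_b)))

print(tto_value_better_than_dead(2, 10))    # 0.2
print(tto_value_worse_than_dead(2, 5, 10))  # -0.3
```

The worse-than-death format is what allows negative values: the respondent gives up part of the full-health lead time to avoid the state, so the state's contribution is negative.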

The above protocol was first executed in Slovenia as follows:


 * The DCE survey was administered to a representative sample of 1276 Slovenian adults, through the online panel of a commercial market research company (Valicon Ljubljana), using LimeSurvey. The experimental design was a D-efficient design, divided into 10 blocks of 15 pairs, so 150 pairs were compared in all. The 150 pairs were selected randomly from among pairs that maximize the determinant of the Fisher information matrix (the D-efficiency criterion). The design allowed the estimation of a multinomial logistic regression model with 50 parameters: 10 parameters for main effects and 40 parameters for two-way interactions.
 * As quality checks, 202 respondents were removed: those who chose incorrectly in two of three fixed pairs in which one health state dominates the other, and those who answered too quickly.
 * The cTTO interviews were conducted face-to-face with 210 adults. They were not a representative sample (all lived in Primorska, and most were young), but the responses were weighted to increase representativeness. The elements of the cTTO task included the evaluation of "worse than dead" and "better than dead" health states, using a wheelchair scenario. Interviewees were asked to value 10 cTTO states and to complete an EQ-5D-Y descriptive profile and the VAS (visual analogue scale).
 * As quality checks, 8 respondents were removed: those to whom the "worse than dead" task was not explained, those whose cTTO ratings were inconsistent (33333 was not their lowest-valued state), and those who did not spend enough time on the task.
 * The value of the anchor state 33333 was found to be -0.691; other values were 33233:-0.48, 31133:-0.049, 32223:0.139, 22232:0.237, 11121:0.911, 21111:0.962.
 * Based on these two phases, a linear additive utility model was estimated using a mixed logit model. The coefficients for the worst level (3) in the five dimensions were (from most to least important): pain/discomfort: -0.463, anxiety/depression: -0.380, usual activities: -0.322, mobility: -0.305, self-care: -0.221. Since full health has value 1 and the model is additive, the model value of 33333 is 1 - 0.463 - 0.380 - 0.322 - 0.305 - 0.221 = -0.691, which is indeed the anchor value of the 33333 state.
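The closing check can be written as a small helper that applies the linear additive model to any state whose levels are all 1 or 3. Only the level-3 coefficients appear above, so this hypothetical helper rejects level-2 digits rather than guessing their coefficients. The digit order of a state code is assumed to follow the usual EQ-5D-Y dimension order (mobility, self-care, usual activities, pain/discomfort, anxiety/depression).

```python
# Level-3 decrements reported above for the Slovenian EQ-5D-Y value set.
LEVEL3_DECREMENTS = {
    "mobility": 0.305,
    "self-care": 0.221,
    "usual activities": 0.322,
    "pain/discomfort": 0.463,
    "anxiety/depression": 0.380,
}
# Assumed digit order of a health-state code such as "33333".
DIMENSION_ORDER = ["mobility", "self-care", "usual activities",
                   "pain/discomfort", "anxiety/depression"]

def additive_value(state):
    """Value of a five-digit state with levels 1 or 3 under the additive model."""
    if len(state) != len(DIMENSION_ORDER):
        raise ValueError("a state code must have exactly five digits")
    value = 1.0  # full health
    for dimension, level in zip(DIMENSION_ORDER, state):
        if level == "3":
            value -= LEVEL3_DECREMENTS[dimension]
        elif level != "1":
            raise ValueError("level-2 coefficients are not listed in the text")
    return value

print(round(additive_value("33333"), 3))  # -0.691
print(additive_value("11111"))            # 1.0
```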