
When modelling discrete choice, it is typically assumed that the choice is determined by a comparison of underlying latent utilities. Denote the population of agents as $$ T $$ and the common choice set for each agent as $$ C $$. For agent $$ t \in T $$, denote her choice as $$ y_{t,i} $$, which is equal to 1 if choice $$ i $$ is chosen and 0 otherwise. Assume the latent utility is linear in the parameters and the error term is additive. Then for an agent $$ t \in T $$, $$ y_{t,i} = 1 \leftrightarrow x_{t,i}\beta + \epsilon_{t,i} > x_{t,j}\beta + \epsilon_{t,j}, \ \forall j \in C, j \neq i $$

where $$ x_{t,i} $$ and $$ x_{t,j} $$ are the $$ q $$-dimensional observable covariates about the agent and the choice, and $$ \epsilon_{t,i} $$ and $$ \epsilon_{t,j} $$ are decision errors caused by cognitive limitations or incomplete information. The construction of the observable covariates is very general. For instance, if $$ C $$ is a set of different brands of coffee, then $$ x_{t,i} $$ includes characteristics both of the agent $$ t $$, such as age, gender, income and ethnicity, and of the coffee $$ i $$, such as price, taste and whether it is local or imported. All of the error terms are assumed i.i.d., and we need to estimate $$ \beta $$, which characterizes the effect of the different factors on the agent's choice.
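The choice rule above can be sketched numerically. The following is a minimal simulation in Python; the dimensions, parameter values, and variable names are illustrative assumptions, not part of the model's definition:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: N agents, J alternatives, q covariates per (agent, alternative).
N, J, q = 1000, 3, 2
beta = np.array([1.0, -0.5])          # true parameter vector (assumed for the example)
x = rng.normal(size=(N, J, q))        # observable covariates x_{t,i}
eps = rng.gumbel(size=(N, J))         # decision errors epsilon_{t,i}

utility = x @ beta + eps              # latent utility x_{t,i} beta + eps_{t,i}
choice = utility.argmax(axis=1)       # agent t chooses the i with the highest utility

# y_{t,i} = 1 if alternative i is chosen, 0 otherwise
y = np.zeros((N, J))
y[np.arange(N), choice] = 1.0
```

Each row of `y` contains exactly one 1, reflecting that every agent picks a single alternative.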

Usually a specific distributional assumption on the error term is imposed, so that the parameter $$ \beta $$ can be estimated parametrically. For instance, if the distribution of the error term is assumed to be normal, then the model is a multinomial probit model; if it is assumed to be a Type I extreme value distribution, then the model becomes a multinomial logit model. The parametric model is convenient for computation but may not be consistent if the distribution of the error term is misspecified.
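Under the i.i.d. Type I extreme value assumption, the logit choice probabilities take the well-known closed form $$ P(i) = \exp(x_{t,i}\beta) / \sum_{j \in C} \exp(x_{t,j}\beta) $$. A minimal sketch (the helper name is hypothetical, not from the source):

```python
import numpy as np

def logit_probs(x, beta):
    """Multinomial logit choice probabilities under i.i.d. Type I extreme value errors.

    x: (J, q) covariates of one agent's J alternatives; beta: (q,) parameter vector.
    Returns P(i) = exp(x_i beta) / sum_j exp(x_j beta), i.e. a softmax over the
    systematic utilities.
    """
    v = x @ beta
    v = v - v.max()           # subtract the max for numerical stability
    e = np.exp(v)
    return e / e.sum()
```

The probabilities are positive and sum to one, which is what makes maximum likelihood estimation of $$ \beta $$ straightforward in the logit case.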

To make the estimator more robust to the distributional assumption, Manski (1975) proposed a non-parametric model to estimate the parameters. In this model, denote the number of elements of the choice set as $$ J $$, the total number of agents as $$ N $$, and let $$ W(J-1) > W(J-2) > \dots > W(1) > W(0) $$ be a sequence of real numbers. The Maximum Score Estimator is defined as: $$ \hat{b} = \operatorname{arg\,max}_b \frac{1}{N} \sum_{t=1}^N \sum_{i=1}^J y_{t,i} W\left(\sum\nolimits_{j \in C, j \neq i} 1 (x_{t,i}b > x_{t,j}b)\right) $$

Here, $$ \sum\nolimits_{j \in C, j \neq i} 1 (x_{t,i}b > x_{t,j}b) $$ is the rank of the deterministic part of the underlying utility of choosing $$ i $$. The intuition of this model is that the higher the rank, the more weight is assigned to the choice; based on this, an objective function analogous to the likelihood function of a parametric model is constructed and maximized. For more about the consistency and asymptotic properties of the maximum score estimator, refer to Manski (1975).
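Since only the direction of $$ \beta $$ is identified (the score is invariant to rescaling $$ b $$), the estimator can be computed by a grid search over unit-norm candidates. A minimal Python sketch for the case $$ q = 2 $$, using the increasing weights $$ W(k) = k $$ (an illustrative choice; any increasing sequence satisfies the definition, and the function names are hypothetical):

```python
import numpy as np

def score(b, x, y):
    """Manski's score: average weight W(rank) of the chosen alternative under b.

    x: (N, J, q) covariates, y: (N, J) one-hot choices, b: (q,) candidate parameter.
    Uses the increasing weights W(k) = k.
    """
    v = x @ b                                        # (N, J) systematic utilities x_{t,i} b
    # rank[t, i] = number of j != i with x_{t,i} b > x_{t,j} b
    rank = (v[:, :, None] > v[:, None, :]).sum(axis=2)
    return (y * rank).sum() / len(y)

def max_score_2d(x, y, n_grid=360):
    """Grid search over unit-norm b (scale is not identified); q = 2 only."""
    angles = np.linspace(0, 2 * np.pi, n_grid, endpoint=False)
    cands = np.column_stack([np.cos(angles), np.sin(angles)])
    scores = [score(b, x, y) for b in cands]
    return cands[int(np.argmax(scores))]
```

Note that the objective is a step function of $$ b $$, so gradient-based optimizers do not apply directly; grid search or combinatorial methods are the usual route in low dimensions.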