Unit-weighted regression

In statistics, unit-weighted regression is a simplified and robust version (Wainer & Thissen, 1976) of multiple regression analysis where only the intercept term is estimated. That is, it fits a model


 * $$\hat{y} = \hat{f}(\mathbf{x}) = \hat{b} + \sum_i x_i$$

where each of the $$x_i$$ are binary variables, perhaps multiplied with an arbitrary weight.

Contrast this with the more common multiple regression model, where each predictor has its own estimated coefficient:


 * $$\hat{y} = \hat{f}(\mathbf{x}) = \hat{b} + \sum_i \hat{w}_i x_i$$

In the social sciences, unit-weighted regression is sometimes used for binary classification, i.e. to predict a yes-no answer where $$\hat{y} < 0$$ indicates "no", $$\hat{y} \ge 0$$ "yes". It is easier to interpret than multiple linear regression (known as linear discriminant analysis in the classification case).

Unit weights
Unit-weighted regression is a method of robust regression that proceeds in three steps. First, predictors for the outcome of interest are selected; ideally, there should be good empirical or theoretical reasons for the selection. Second, the predictors are converted to a standard form. Finally, the predictors are added together, and this sum is called the variate, which is used as the predictor of the outcome.

Burgess method
The Burgess method was first presented by the sociologist Ernest W. Burgess in a 1928 study to determine success or failure of inmates placed on parole. First, he selected 21 variables believed to be associated with parole success. Next, he converted each predictor to the standard form of zero or one (Burgess, 1928). When predictors had two values, the value associated with the target outcome was coded as one. Burgess selected success on parole as the target outcome, so a predictor such as a history of theft was coded as "yes" = 0 and "no" = 1. These coded values were then added to create a predictor score, so that higher scores predicted a better chance of success. The scores could possibly range from zero (no predictors of success) to 21 (all 21 predictors scored as predicting success).

For predictors with more than two values, the Burgess method selects a cutoff score based on subjective judgment. As an example, a study using the Burgess method (Gottfredson & Snyder, 2005) selected as one predictor the number of complaints for delinquent behavior. With failure on parole as the target outcome, the number of complaints was coded as follows: "zero to two complaints" = 0, and "three or more complaints" = 1 (Gottfredson & Snyder, 2005. p. 18).

Kerby method
The Kerby method is similar to the Burgess method, but differs in two ways. First, while the Burgess method uses subjective judgment to select a cutoff score for a multi-valued predictor with a binary outcome, the Kerby method uses classification and regression tree (CART) analysis. In this way, the selection of the cutoff score is based not on subjective judgment, but on a statistical criterion, such as the point where the chi-square value is a maximum.

The second difference is that while the Burgess method is applied to a binary outcome, the Kerby method can apply to a multi-valued outcome, because CART analysis can identify cutoff scores in such cases, using a criterion such as the point where the t-value is a maximum. Because CART analysis is not only binary, but also recursive, the result can be that a predictor variable will be divided again, yielding two cutoff scores. The standard form for each predictor is that a score of one is added when CART analysis creates a partition.

One study (Kerby, 2003) selected as predictors the five traits of the Big five personality traits, predicting a multi-valued measure of suicidal ideation. Next, the personality scores were converted into standard form with CART analysis. When the CART analysis yielded one partition, the result was like the Burgess method in that the predictor was coded as either zero or one. But for the measure of neuroticism, the result was two cutoff scores. Because higher neuroticism scores correlated with more suicidal thinking, the two cutoff scores led to the following coding: "low Neuroticism" = 0, "moderate Neuroticism" = 1, "high Neuroticism" = 2 (Kerby, 2003).

z-score method
Another method can be applied when the predictors are measured on a continuous scale. In such a case, each predictor can be converted into a standard score, or z-score, so that all the predictors have a mean of zero and a standard deviation of one. With this method of unit-weighted regression, the variate is a sum of the z-scores (e.g., Dawes, 1979; Bobko, Roth, & Buster, 2007).

Literature review
The first empirical study using unit-weighted regression is widely considered to be a 1928 study by sociologist Ernest W. Burgess. He used 21 variables to predict parole success or failure, and the results suggest that unit weights are a useful tool in making decisions about which inmates to parole. Of those inmates with the best scores, 98% did in fact succeed on parole; and of those with the worst scores, only 24% did in fact succeed (Burgess, 1928).

The mathematical issues involved in unit-weighted regression were first discussed in 1938 by Samuel Stanley Wilks, a leading statistician who had a special interest in multivariate analysis. Wilks described how unit weights could be used in practical settings, when data were not available to estimate beta weights. For example, a small college may want to select good students for admission. But the school may have no money to gather data and conduct a standard multiple regression analysis. In this case, the school could use several predictors—high school grades, SAT scores, teacher ratings. Wilks (1938) showed mathematically why unit weights should work well in practice.

Frank Schmidt (1971) conducted a simulation study of unit weights. His results showed that Wilks was indeed correct and that unit weights tend to perform well in simulations of practical studies.

Robyn Dawes (1979) discussed the use of unit weights in applied studies, referring to the robust beauty of unit weighted models. Jacob Cohen also discussed the value of unit weights and noted their practical utility. Indeed, he wrote, "As a practical matter, most of the time, we are better off using unit weights" (Cohen, 1990, p. 1306).

Dave Kerby (2003) showed that unit weights compare well with standard regression, doing so with a cross validation study—that is, he derived beta weights in one sample and applied them to a second sample. The outcome of interest was suicidal thinking, and the predictor variables were broad personality traits. In the cross validation sample, the correlation between personality and suicidal thinking was slightly stronger with unit-weighted regression (r = .48) than with standard multiple regression (r = .47).

Gottfredson and Snyder (2005) compared the Burgess method of unit-weighted regression to other methods, with a construction sample of N = 1,924 and a cross-validation sample of N = 7,552. Using the Pearson point-biserial, the effect size in the cross validation sample for the unit-weights model was r = .392, which was somewhat larger than for logistic regression (r = .368) and predictive attribute analysis (r = .387), and less than multiple regression only in the third decimal place (r = .397).

In a review of the literature on unit weights, Bobko, Roth, and Buster (2007) noted that "unit weights and regression weights perform similarly in terms of the magnitude of cross-validated multiple correlation, and empirical studies have confirmed this result across several decades" (p. 693).

Andreas Graefe applied an equal weighting approach to nine established multiple regression models for forecasting U.S. presidential elections. Across the ten elections from 1976 to 2012, equally weighted predictors reduced the forecast error of the original regression models on average by four percent. An equal-weights model that includes all variables provided calibrated forecasts that reduced the error of the most accurate regression model by 29% percent.

Example
An example may clarify how unit weights can be useful in practice.

Brenna Bry and colleagues (1982) addressed the question of what causes drug use in adolescents. Previous research had made use of multiple regression; with this method, it is natural to look for the best predictor, the one with the highest beta weight. Bry and colleagues noted that one previous study had found that early use of alcohol was the best predictor. Another study had found that alienation from parents was the best predictor. Still another study had found that low grades in school were the best predictor. The failure to replicate was clearly a problem, a problem that could be caused by bouncing betas.

Bry and colleagues suggested a different approach: instead of looking for the best predictor, they looked at the number of predictors. In other words, they gave a unit weight to each predictor. Their study had six predictors: 1) low grades in school, 2) lack of affiliation with religion, 3) early age of alcohol use, 4) psychological distress, 5) low self-esteem, and 6) alienation from parents. To convert the predictors to standard form, each risk factor was scored as absent (scored as zero) or present (scored as one). For example, the coding for low grades in school were as follows: "C or higher" = 0, "D or F" = 1. The results showed that the number of risk factors was a good predictor of drug use: adolescents with more risk factors were more likely to use drugs.

The model used by Bry and colleagues was that drug users do not differ in any special way from non-drug users. Rather, they differ in the number of problems they must face. "The number of factors an individual must cope with is more important than exactly what those factors are" (p. 277). Given this model, unit-weighted regression is an appropriate method of analysis.

Beta weights
In standard multiple regression, each predictor is multiplied by a number that is called the beta weight, regression weight or weighted regression coefficients (denoted βW or BW). The prediction is obtained by adding these products along with a constant. When the weights are chosen to give the best prediction by some criterion, the model referred to as a proper linear model. Therefore, multiple regression is a proper linear model. By contrast, unit-weighted regression is called an improper linear model.

Model specification
Standard multiple regression hinges on the assumption that all relevant predictors of the outcome are included in the regression model. This assumption is called model specification. A model is said to be specified when all relevant predictors are included in the model, and all irrelevant predictors are excluded from the model. In practical settings, it is rare for a study to be able to determine all relevant predictors a priori. In this case, models are not specified and the estimates for the beta weights suffer from omitted variable bias. That is, the beta weights may change from one sample to the next, a situation sometimes called the problem of the bouncing betas. It is this problem with bouncing betas that makes unit-weighted regression a useful method.