User:Zanderredux/sandbox

Attribute Log Odds
For a given characteristic i (the subscript i is omitted for brevity), the log odds of each attribute j is:

$$\mbox{LogOdds}_j=\ln\left(\frac{\mbox{bad rate}_j}{1-\mbox{bad rate}_j}\right)$$

where
 * $$\mbox{LogOdds}_j$$ is the log odds of the j-th attribute
 * $$\mbox{bad rate}_j$$ is the bad rate of the j-th attribute
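A minimal Python sketch of the attribute log odds, assuming hypothetical per-attribute counts `bads` (bad accounts) and `totals` (all accounts) for one characteristic. The bad rate must lie strictly between 0 and 1 for the logarithm to be finite.

```python
import math

def attribute_log_odds(bads, totals):
    """Return ln(p / (1 - p)) for each attribute j, where p is its bad rate.

    bads[j] and totals[j] are hypothetical counts of bad accounts and of all
    accounts falling into attribute j of a single characteristic.
    """
    log_odds = []
    for b, t in zip(bads, totals):
        p = b / t  # bad rate of attribute j
        if p <= 0.0 or p >= 1.0:
            raise ValueError("bad rate must lie strictly between 0 and 1")
        log_odds.append(math.log(p / (1.0 - p)))
    return log_odds

print(attribute_log_odds([10, 50], [100, 100]))
```

An attribute with a 50% bad rate has odds of 1 and therefore log odds of 0; attributes with lower bad rates get negative log odds.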

Normalized Score
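The normalized score standardizes each attribute's log odds against the population and multiplies the result by −50, so attributes with higher bad rates receive lower scores: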

$$\mbox{NS}_{ij}=-50\frac{\mbox{LogOdds}_j-\operatorname{mean}(\mbox{population log odds})}{\operatorname{stdev}(\mbox{population log odds})}$$

where
 * $$\mbox{LogOdds}_j$$ is the log odds of the j-th attribute
 * $$\operatorname{mean}(\mbox{population log odds})$$ is the average log odds of the population
 * $$\operatorname{stdev}(\mbox{population log odds})$$ is the standard deviation of the population log odds
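A sketch of the normalized score in Python. What exactly counts as the population log odds is not spelled out above; here it is assumed to be a hypothetical list `population_log_odds` with one value per observation, and the population standard deviation (`statistics.pstdev`) is used rather than the sample one.

```python
import statistics

def normalized_scores(attr_log_odds, population_log_odds):
    """Return NS_j = -50 * (LogOdds_j - mean) / stdev for each attribute j.

    Assumption: population_log_odds holds one log-odds value per observation,
    and the population (not sample) standard deviation is used.
    """
    mu = statistics.mean(population_log_odds)
    sigma = statistics.pstdev(population_log_odds)
    if sigma == 0:
        raise ValueError("population log odds have zero spread")
    # The -50 multiplier gives riskier attributes (higher log odds) lower scores.
    return [-50.0 * (lo - mu) / sigma for lo in attr_log_odds]
```

For example, with population log odds of -1 and 1 (mean 0, standard deviation 1), an attribute with log odds -1 scores 50 and one with log odds 1 scores -50.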

Scaling
The scaling function below computes the scaled score of each observation using 20 points to double the odds (PDO), aligned so that odds of 20:1 correspond to a score of 600.
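The constants 20/ln 2 and 600 in the formula follow from the usual assumption that the score is linear in the log of the odds. A short derivation under that assumption, with odds_0 denoting the odds assigned 600 points:

$$\text{Score} = \text{Offset} + \text{Factor}\cdot\ln(\text{odds})$$

$$\text{Score}(2\,\text{odds}) - \text{Score}(\text{odds}) = \text{Factor}\cdot\ln 2 = \text{PDO} = 20 \quad\Rightarrow\quad \text{Factor} = \frac{20}{\ln 2} \approx 28.85$$

$$\text{Offset} = 600 - \text{Factor}\cdot\ln(\text{odds}_0) = 600 - \frac{20}{\ln 2}\ln(\text{odds}_0)$$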

$$S = \sum_{ij}\left[\text{NS}_{ij}\cdot\beta_i\frac{20}{\ln 2}+\frac{1}{n}\left(600-\frac{20}{\ln 2}\ln 4\right)+\frac{1}{n}\left(\frac{20}{\ln 2}\beta_0\right)\right]$$

where
 * $$i$$ is the i-th characteristic in the model
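 * $$j$$ is the j-th attribute of the i-th characteristic; for a given observation, only the attribute it falls into contributes, so the sum has one term per characteristic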
 * $$\text{NS}_{ij}$$ is the normalized score of the j-th attribute of the i-th characteristic
 * $$\beta_i$$ is the logistic regression coefficient of the i-th characteristic
 * $$\beta_0$$ is the logistic regression intercept
 * $$n$$ is the number of characteristics in the model
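A minimal Python sketch of the scaled score for one observation. `ns` and `betas` are hypothetical parallel lists holding, for each characteristic i, the normalized score NS_ij of the attribute the observation falls into and the coefficient β_i. The sketch assumes both the offset and the intercept are spread evenly (1/n each) over the n characteristics, so an observation whose NS_ij·β_i terms all vanish lands on offset + factor·β_0. The offset constant reproduces the 600 − (20/ln 2)·ln 4 term of the formula.

```python
import math

PDO = 20.0                              # points to double the odds
FACTOR = PDO / math.log(2)              # 20 / ln 2
OFFSET = 600.0 - FACTOR * math.log(4)   # offset term as written in the formula

def scaled_score(ns, betas, beta0):
    """Scaled score of one observation.

    ns[i]    -- normalized score NS_ij of the observation's attribute for
                characteristic i (one attribute per characteristic)
    betas[i] -- logistic regression coefficient beta_i
    beta0    -- logistic regression intercept
    Assumption: the offset and the intercept are each split 1/n per
    characteristic.
    """
    n = len(betas)
    if n == 0 or len(ns) != n:
        raise ValueError("need one normalized score per characteristic")
    return sum(
        ns_i * b_i * FACTOR + (OFFSET + FACTOR * beta0) / n
        for ns_i, b_i in zip(ns, betas)
    )
```

Because ln 4 = 2 ln 2, the offset constant works out to 600 − 40 = 560 points, which is the score of an observation with all NS_ij·β_i terms equal to zero and β_0 = 0.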

Standard Error of the ROC Area
se_roc = ((AUC * (1 - AUC) + (sumbads - 1) * (AUC / (2 - AUC) - AUC ^ 2) + (sumgoods - 1) * (2 * AUC ^ 2 / (1 + AUC) - AUC ^ 2)) / (sumbads * sumgoods)) ^ 0.5

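$$ \mbox{SEROC} = \sqrt{ \frac{ a\left(1 - a\right) + \left(N_B - 1\right) \left(\frac{a}{2-a} - a^2\right) + \left(N_G - 1\right) \left(\frac{2a^2}{1+a} - a^2\right)}{ N_B N_G } } $$

where
 * $$a$$ is the area under the ROC curve (AUC)
 * $$N_B$$ is the number of bads (sumbads)
 * $$N_G$$ is the number of goods (sumgoods)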
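The se_roc expression can be sketched in Python as a function; the names `se_auc`, `n_bads`, and `n_goods` are illustrative. The expression has the form of the Hanley–McNeil standard error of the area under the ROC curve, with bads playing the role of the positive class.

```python
import math

def se_auc(auc, n_bads, n_goods):
    """Standard error of the ROC area, mirroring the se_roc expression.

    auc     -- area under the ROC curve (a)
    n_bads  -- number of bads (sumbads)
    n_goods -- number of goods (sumgoods)
    """
    if not 0.0 <= auc <= 1.0:
        raise ValueError("auc must lie in [0, 1]")
    q1 = auc / (2.0 - auc)              # term weighted by (bads - 1)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)   # term weighted by (goods - 1)
    var = (auc * (1.0 - auc)
           + (n_bads - 1) * (q1 - auc ** 2)
           + (n_goods - 1) * (q2 - auc ** 2)) / (n_bads * n_goods)
    return math.sqrt(var)
```

With one bad and one good, only the a(1 − a) term survives, so an AUC of 0.5 gives a standard error of 0.5.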