User:Sean3000/sandbox

Information Value is a feature selection measure widely used in credit scoring. The formula is:


 * $$\sum_{i=1}^{n}\left(\text{Distr Good}_{i}-\text{Distr Bad}_{i}\right) \times \ln\left(\frac{\text{Distr Good}_{i}}{\text{Distr Bad}_{i}}\right)$$

Here the sum runs over the n categories of the feature. ‘Distr Bad’ is the number of defaulting customers in a category divided by the total number of defaulting customers in the portfolio, and ‘Distr Good’ is the number of non-defaulting customers in a category divided by the total number of non-defaulting customers.
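As a rough illustration of the formula, the calculation could be sketched in Python as below. The function name `information_value` and the counts in the usage lines are hypothetical and chosen only for illustration; they are not taken from any real portfolio. The sketch assumes every category contains at least one good and one bad customer, since the logarithm is undefined otherwise.

```python
import math


def information_value(goods, bads):
    """Return the Information Value of one categorical feature.

    goods[i] and bads[i] are the counts of non-defaulting and
    defaulting customers in category i (hypothetical inputs).
    """
    if len(goods) != len(bads):
        raise ValueError("goods and bads must have the same length")
    if any(g <= 0 for g in goods) or any(b <= 0 for b in bads):
        # Assumption of this sketch: no empty cells, so ln() is defined.
        raise ValueError("every category needs at least one good and one bad")

    total_good = sum(goods)
    total_bad = sum(bads)

    iv = 0.0
    for g, b in zip(goods, bads):
        distr_good = g / total_good  # share of all goods in this category
        distr_bad = b / total_bad    # share of all bads in this category
        iv += (distr_good - distr_bad) * math.log(distr_good / distr_bad)
    return iv


# Hypothetical counts for three categories (illustrative only).
example_goods = [1860, 3000, 2500]
example_bads = [140, 120, 60]
example_iv = information_value(example_goods, example_bads)
print(f"Information Value: {example_iv:.4f}")
```

Each term of the sum is non-negative, because the difference and the logarithm always share the same sign, so the result is zero only when the goods and the bads are distributed identically across the categories.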

For example, if the category is age band, it may be calculated as follows:

Thus for the first row ("<18"), the Count of Goods is the Count of Customers minus the Count of Bads (2000 − 140 = 1860), and the distribution of Goods is