Tversky index

The Tversky index, named after Amos Tversky, is an asymmetric similarity measure on sets that compares a variant to a prototype. The Tversky index can be seen as a generalization of the Sørensen–Dice coefficient and the Jaccard index.

For finite sets X and Y, the Tversky index is a number between 0 and 1 given by

$$S(X, Y) = \frac{| X \cap Y |}{| X \cap Y | + \alpha | X \setminus Y | + \beta | Y \setminus  X |} $$

Here, $$X \setminus Y$$ denotes the relative complement of Y in X.

Further, $$\alpha, \beta \ge 0 $$ are parameters of the Tversky index. Setting $$\alpha = \beta = 1 $$ produces the Jaccard index; setting $$\alpha = \beta = 0.5 $$ produces the Sørensen–Dice coefficient.
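The definition maps almost directly onto set operations. The Python function below is a minimal sketch, assuming the inputs are finite Python `set` objects. The names `tversky_index`, `jaccard` and `dice` are illustrative and do not come from any library. The sketch raises an error when the denominator is zero (for example, two empty sets), a case the formula leaves undefined.

```python
def tversky_index(x: set, y: set, alpha: float, beta: float) -> float:
    """Tversky index S(X, Y) for finite sets (illustrative sketch)."""
    if alpha < 0 or beta < 0:
        raise ValueError("alpha and beta must be non-negative")
    common = len(x & y)   # |X ∩ Y|
    only_x = len(x - y)   # |X \ Y|, relative complement of Y in X
    only_y = len(y - x)   # |Y \ X|, relative complement of X in Y
    denominator = common + alpha * only_x + beta * only_y
    if denominator == 0:
        # e.g. both sets empty: the formula reduces to 0/0
        raise ValueError("Tversky index is undefined for these inputs")
    return common / denominator


def jaccard(x: set, y: set) -> float:
    # |X ∩ Y| / |X ∪ Y|
    return len(x & y) / len(x | y)


def dice(x: set, y: set) -> float:
    # 2|X ∩ Y| / (|X| + |Y|)
    return 2 * len(x & y) / (len(x) + len(y))


X = {"a", "b", "c"}
Y = {"b", "c", "d", "e"}
print(tversky_index(X, Y, 1, 1), jaccard(X, Y))
print(tversky_index(X, Y, 0.5, 0.5), dice(X, Y))
```

With these example sets, $$|X \cap Y| = 2$$, $$|X \setminus Y| = 1$$ and $$|Y \setminus X| = 2$$. Hence α = β = 1 and the Jaccard index both give 2/5, and α = β = 0.5 and the Sørensen–Dice coefficient both give 4/7.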

If we consider X to be the prototype and Y to be the variant, then $$\alpha$$ corresponds to the weight of the prototype and $$\beta$$ corresponds to the weight of the variant. Tversky measures with $$\alpha + \beta = 1$$ are of special interest.
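The prototype/variant asymmetry is easy to check numerically. The Python sketch below scores a variant against a prototype and then swaps the two sets. The sets and the weights α = 0.8, β = 0.2 (chosen so that α + β = 1) are hypothetical and serve only as an illustration. From the formula, swapping the two sets has the same effect as swapping α and β, so with unequal weights the two scores generally differ.

```python
def tversky_index(x: set, y: set, alpha: float, beta: float) -> float:
    # Same formula as above; no zero-denominator guard in this short sketch
    common = len(x & y)
    return common / (common + alpha * len(x - y) + beta * len(y - x))


prototype = {"a", "b", "c", "d"}   # hypothetical prototype X
variant = {"a", "b", "e"}          # hypothetical variant Y
alpha, beta = 0.8, 0.2             # illustrative weights with alpha + beta = 1

forward = tversky_index(prototype, variant, alpha, beta)    # S(X, Y)
backward = tversky_index(variant, prototype, alpha, beta)   # S(Y, X)
swapped = tversky_index(prototype, variant, beta, alpha)    # S(X, Y) with alpha, beta swapped
print(forward, backward, swapped)
```

Here $$S(X, Y) = 2/3.8 \approx 0.53$$ while $$S(Y, X) = 2/3.2 = 0.625$$. The third value matches the second up to floating-point rounding.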

Because of this inherent asymmetry, the Tversky index does not meet the criteria for a similarity metric. However, where symmetry is needed, a variant of the original formulation has been proposed that uses the max and min functions:

$$S(X,Y)=\frac{| X \cap Y |}{| X \cap Y |+\beta\left(\alpha a+(1-\alpha)b\right)}$$

where

$$a=\min\left(|X \setminus Y|,|Y \setminus X|\right),$$

$$b=\max\left(|X \setminus Y|,|Y \setminus X|\right).$$
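The following is a minimal Python sketch of the symmetric variant. It assumes finite sets, 0 ≤ α ≤ 1 and β ≥ 0. These ranges are an assumption made for illustration, and the function name `symmetric_tversky` is hypothetical. Because a and b depend only on the unordered pair of relative-complement sizes, swapping the arguments cannot change the result.

```python
def symmetric_tversky(x: set, y: set, alpha: float, beta: float) -> float:
    """Symmetric Tversky variant built from min/max of the set differences (sketch)."""
    common = len(x & y)                   # |X ∩ Y|
    d_xy, d_yx = len(x - y), len(y - x)   # |X \ Y| and |Y \ X|
    a, b = min(d_xy, d_yx), max(d_xy, d_yx)
    denominator = common + beta * (alpha * a + (1 - alpha) * b)
    if denominator == 0:
        # e.g. both sets empty: the formula reduces to 0/0
        raise ValueError("index is undefined for these inputs")
    return common / denominator


X = {"a", "b", "c", "d"}
Y = {"a", "b", "e"}
# Order of arguments does not matter: min/max discard which set is which
print(symmetric_tversky(X, Y, 0.3, 1.0) == symmetric_tversky(Y, X, 0.3, 1.0))  # True
```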

This formulation also rearranges the parameters $$\alpha$$ and $$\beta$$. Here, $$\alpha$$ controls the balance between $$|X \setminus Y|$$ and $$|Y \setminus X|$$ in the denominator: it weights the smaller of the two by $$\alpha$$ and the larger by $$1-\alpha$$. Since $$a + b = |X \triangle Y|$$, the term $$\alpha a+(1-\alpha)b$$ is a weighted form of the symmetric difference, and $$\beta$$ controls its effect relative to $$|X \cap Y|$$ in the denominator.
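The symmetric variant connects back to the original index for a balanced choice of parameters. Below, α and β are the parameters of the symmetric formulation. Because a and b are the two relative-complement sizes in some order, $$a + b = |X \setminus Y| + |Y \setminus X|$$. Setting $$\alpha = \tfrac{1}{2}$$ therefore gives

$$S(X,Y)=\frac{|X \cap Y|}{|X \cap Y| + \tfrac{\beta}{2}\,|X \setminus Y| + \tfrac{\beta}{2}\,|Y \setminus X|}$$

This is the original Tversky index with both of its weights equal to $$\beta/2$$. In particular, $$\beta = 1$$ recovers the Sørensen–Dice coefficient and $$\beta = 2$$ recovers the Jaccard index.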