User:Ottavia00/sandbox

Overview
Many sports have long needed rating systems that accurately assess the performance of players or teams. The algorithms used differ across sports: American college football teams are ranked according to the BCS ranking, whereas chess masters are ranked using the Elo rating system. Several problems arise when one tries to infer a performance ordering from pairwise comparisons. First, it is not enough to pick the best player by comparing the number of wins with the number of losses, because some players consistently face tougher opponents than others. Moreover, a loss against a stronger opponent should be treated differently from a loss against an equal or weaker opponent. Finally, a rating system has to account for a player's periods of inactivity and distinguish between a decisive win and a draw.

Rating systems based on Networks
Extensive research has been devoted to building and explaining network-based algorithms that measure the strength of a player in a simpler and more efficient way. Many of them have proved to be intuitive while still giving sound results and predictions, which makes them attractive.

One example of such a change is the replacement of the Harkness rating system by the Elo rating system. The latter, although statistically based, has the advantage of awarding additional points for a win against a higher-rated player and deducting additional points for a loss against a lower-rated player, using a Bayesian framework.

Park-Newman model
Park and Newman built a rating model using a network representation of American college football games. Their main idea was to include the effect of indirect wins in a team's score. The model closely reproduces the official BCS rankings, which are mainly expert-based. As a preliminary step, the schedule of games is represented as a network: the vertices are the football teams, and two teams are connected by an edge if they played against each other during the season. The network is directed, with each edge pointing towards the loser of the match. Indirect wins and losses are defined by directed paths of length two between teams, that is, second-order relations of the form: X beats Y and Y beats Z, hence X has an indirect win over Z.

This network-based ranking algorithm rewards a player who consistently wins against tougher opponents. A win against an opponent with many direct wins produces more indirect links than a win against an opponent with few. Indirect links of higher order are also considered, but they count for less than a direct win or loss.
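The counting of direct and second-order indirect wins can be sketched in a few lines of Python. This is only an illustration: the game results and team names are made up, and the matrix convention (entry `a[j, i]` counts the wins of team i over team j) is chosen to match the formulas given further down in this section.

```python
import numpy as np

# Hypothetical results for illustration only: (winner, loser) pairs.
games = [("X", "Y"), ("Y", "Z"), ("X", "W"), ("W", "Z")]

teams = sorted({team for game in games for team in game})  # ['W', 'X', 'Y', 'Z']
index = {team: n for n, team in enumerate(teams)}

# a[j, i] = number of times team i has beaten team j.
a = np.zeros((len(teams), len(teams)), dtype=int)
for winner, loser in games:
    a[index[loser], index[winner]] += 1

direct_wins = a.sum(axis=0)          # sum_j a[j, i]
indirect_wins = (a @ a).sum(axis=0)  # sum_{k,j} a[k, j] * a[j, i]

for team in teams:
    i = index[team]
    print(team, "direct:", direct_wins[i], "indirect:", indirect_wins[i])
```

In this toy schedule X beats Y and W, and both Y and W beat Z, so X collects two indirect wins over Z in addition to its two direct wins.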

Juyong Park and Mark Newman expressed the scoring model mathematically using the adjacency matrix of the American college football network, and defined the total score of a team using a generalization of Katz centrality, as follows.

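The total score of team i is the difference between its win score and its loss score: $$ s_{i} = w_{i} - l_{i}. $$

Win score of team i: the total number of direct wins plus the indirect wins, discounted according to their distance (here $$ a_{ji} $$ is the number of times team i has beaten team j):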

$$ w_{i} = \sum_{j} a_{ji} + \alpha \sum_{k,j} a_{kj} a_{ji} + \alpha^{2} \sum_{h,k,j} a_{hk} a_{kj} a_{ji} + ... = \sum_{j} (1 + \alpha w_{j}) a_{ji} = k_{out}^{i} + \alpha \sum_{j} w_{j} a_{ji}, $$

Loss score of team i: the total number of direct losses plus the indirect losses, discounted according to their distance:

$$ l_{i} = \sum_{j} a_{ij} + \alpha \sum_{j,k} a_{ij} a_{jk} + \alpha^{2} \sum_{j,k,h} a_{ij} a_{jk} a_{kh} + ... = \sum_{j} a_{ij} (1 + \alpha l_{j}) = k_{in}^{i} + \alpha \sum_{j} a_{ij} l_{j}, $$
 * The parameter $$ \alpha $$ sets the weight given to indirect wins and losses; a value of 0 means that the rating takes into account only direct wins and losses.
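Because each score appears on both sides of its own equation, the win and loss scores can be obtained by solving a linear system rather than by summing the series. In vector form the two recursions read $$ w = a^{T} \mathbf{1} + \alpha a^{T} w $$ and $$ l = a \mathbf{1} + \alpha a l $$. The following Python sketch solves them with NumPy; the function name `park_newman_scores` and the three-team example are illustrative assumptions, not part of the original model description. The underlying series only converges when $$ \alpha $$ is smaller than the inverse of the largest eigenvalue of the adjacency matrix, which the sketch does not check.

```python
import numpy as np

def park_newman_scores(a, alpha):
    """Return (win, loss, total) scores.

    a[j, i] is the number of times team i has beaten team j.
    Assumes alpha is small enough for the series to converge.
    """
    a = np.asarray(a, dtype=float)
    n = a.shape[0]
    ones = np.ones(n)
    identity = np.eye(n)
    # w = a^T 1 + alpha a^T w   =>   (I - alpha a^T) w = a^T 1
    w = np.linalg.solve(identity - alpha * a.T, a.T @ ones)
    # l = a 1 + alpha a l       =>   (I - alpha a) l = a 1
    l = np.linalg.solve(identity - alpha * a, a @ ones)
    return w, l, w - l

# Hypothetical example: team 0 beats team 1, and team 1 beats team 2.
a = np.zeros((3, 3))
a[1, 0] = 1
a[2, 1] = 1
w, l, s = park_newman_scores(a, alpha=0.5)
print(w)  # [1.5 1.  0. ]
print(l)  # [0.  1.  1.5]
print(s)  # [ 1.5  0.  -1.5]
```

Team 0 has one direct win plus half an indirect win over team 2, which gives its win score of 1.5.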

Callaghan, Mucha and Porter model
Another algorithm applied to football that makes use of complex networks is the random walker ranking, developed by Callaghan, Mucha and Porter.

The researchers mimic the voting process with voting automatons that act as random walkers, each casting its vote for a single team. A voter's preference changes as follows: it repeatedly selects at random a game from the schedule of its currently preferred team and may switch to the opponent in that game. These random changes of preference trace paths in the graph in which the teams (vertices) are connected by an edge if they played against each other.

The expected rate of change of the number of votes cast for each team is given by $$ v' = D v, $$ where $$ v $$ is the vector of expected vote counts and the elements of the matrix $$ D $$ are:

 * $$ D_{ii}=-p l_{i} - (1-p) w_{i}, $$
 * $$ D_{ij}= \frac{1}{2} n_{ij} + \frac{(2p-1)}{2} a_{ij} $$ for $$ i \neq j, $$

with:

 * $$ l_{i} $$ - number of games lost by team i,
 * $$ w_{i} $$ - number of games won by team i,
 * $$ n_{ij} $$ - number of games team i played against team j,
 * $$ a_{ij} $$ - number of games team i won against team j minus the number of games team i lost against team j,
 * $$ p $$ - probability that a random walker sides with the winner of the selected game, so that for $$ p > 1/2 $$ votes flow on average from team j to team i when team i has beaten team j.

In the end the ranking of the football teams is given by the dynamics of the random walkers in the network.
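The matrix $$ D $$ and the resulting vote distribution can be sketched in Python as follows. The sketch assumes that the ranking is read off the equilibrium of the dynamics, where $$ v' = D v = 0 $$ and the votes are normalised to sum to one; the function names, the input format (`wins[i, j]` as the number of games team i won against team j) and the three-team schedule are illustrative assumptions. A unique equilibrium is only expected when the schedule network is connected and $$ 0 < p < 1 $$.

```python
import numpy as np

def random_walker_matrix(wins, p):
    """Build D from wins[i, j] = number of games team i won against team j."""
    wins = np.asarray(wins, dtype=float)
    n_games = wins + wins.T   # n_ij: games played between i and j
    a = wins - wins.T         # a_ij: wins of i over j minus losses of i to j
    w = wins.sum(axis=1)      # w_i: games won by team i
    l = wins.sum(axis=0)      # l_i: games lost by team i
    d = 0.5 * n_games + 0.5 * (2 * p - 1) * a
    np.fill_diagonal(d, -p * l - (1 - p) * w)
    return d

def steady_state_votes(d):
    """Solve D v = 0 subject to sum(v) = 1 (assumes a unique solution)."""
    n = d.shape[0]
    system = d.copy()
    system[-1, :] = 1.0       # replace one redundant equation by the normalisation
    rhs = np.zeros(n)
    rhs[-1] = 1.0
    return np.linalg.solve(system, rhs)

# Hypothetical schedule: team 0 beat teams 1 and 2, team 1 beat team 2.
wins = np.array([[0, 1, 1],
                 [0, 0, 1],
                 [0, 0, 0]])
d = random_walker_matrix(wins, p=0.75)
v = steady_state_votes(d)
ranking = np.argsort(-v)      # teams ordered from most to fewest votes
print(v, ranking)
```

Each column of `d` sums to zero, which reflects that votes are only moved between teams and never created or destroyed.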

Motegi and Masuda model
Motegi and Masuda extended the Park-Newman scoring system to the dynamic case and applied it to tennis data. Their model accounts for the time at which two players competed against each other: beating a highly rated opponent at the peak of his performance should count differently from beating the same opponent when he is close to retirement. To this end, the researchers proposed a dynamic centrality measure suited to this type of network. One shortcoming of the static Park-Newman model is that it fails to account for changes in a player's strength over time. For instance, once player $$ i $$ beats player $$ j $$ in a game, he is rewarded with indirect wins from the future wins of player $$ j $$ against other players, and hence the scoring system overestimates the rating of player $$ i $$.

The proposed rating system therefore assumes that the Park-Newman win and loss scores of player $$ i $$ depend only on the scores of his opponent $$ j $$ at the moment of their game, and that each player's win and loss scores decay exponentially over time.

The dynamic score for winning at time $$ t_{n} $$ ( $$ w_{t_{n}} $$ ) is defined as follows:



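$$ w_{t_{n}} = a_{t_{n}} + e^{-\beta( t_{n} - t_{n-1})} \sum_{m_{n}\in\{0,1\}} \alpha^{m_{n}} a_{t_{n-1}} a_{t_{n}}^{m_{n}} + e^{-\beta( t_{n} - t_{n-2})} \sum_{m_{n-1},m_{n}\in\{0,1\}} \alpha^{m_{n-1}+m_{n}} a_{t_{n-2}} a_{t_{n-1}}^{m_{n-1}} a_{t_{n}}^{m_{n}} + ... + e^{-\beta( t_{n} - t_{1})} \sum_{m_{2},...,m_{n}\in\{0,1\}} \alpha^{\sum_{i=2}^{n} m_{i}} a_{t_{1}} a_{t_{2}}^{m_{2}} ... a_{t_{n}}^{m_{n}} $$ (1),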


 * The adjacency matrix $$ a_{t_{n}} $$ records the result of the game played at time $$ t_{n} $$: if player $$ i $$ lost to player $$ j $$ at time $$ t_{n} $$, the $$ (i,j) $$ element of the matrix is 1, and 0 otherwise;
 * The parameter $$ \alpha $$ has the same interpretation as in the Park-Newman model;
 * The parameter $$ \beta $$ is the discount rate of the score.

The first term counts the direct wins of player $$ i $$ at time $$ t_{n} $$. The second term gives, for $$ m_{n}=0 $$, the discounted number of direct wins at time $$ t_{n-1} $$ and, for $$ m_{n}=1 $$, the discounted number of indirect wins through the games played at $$ t_{n-1} $$ and $$ t_{n} $$. The third term collects four contributions, all discounted from time $$ t_{n-2} $$: the direct wins at $$ t_{n-2} $$ when $$ m_{n-1}=0 $$ and $$ m_{n}=0 $$; the indirect wins through the games played at $$ t_{n-2} $$ and $$ t_{n} $$ when $$ m_{n-1}=0 $$ and $$ m_{n}=1 $$; the indirect wins through the games played at $$ t_{n-2} $$ and $$ t_{n-1} $$ when $$ m_{n-1}=1 $$ and $$ m_{n}=0 $$; and the indirect wins through the games played at $$ t_{n-2} $$, $$ t_{n-1} $$ and $$ t_{n} $$ when $$ m_{n-1}=1 $$ and $$ m_{n}=1 $$.

Equation (1) can be written recursively, and the loss score follows in the same way with the transposed matrices:

$$ w_{t_{n}} = a_{t_{n}} + e^{-\beta( t_{n} - t_{n-1})} w_{t_{n-1}} ( I + \alpha a_{t_{n}}), $$

$$ l_{t_{n}} = a_{t_{n}}^{T} + e^{-\beta( t_{n} - t_{n-1})} l_{t_{n-1}} ( I + \alpha a_{t_{n}}^{T}). $$

The dynamic scoring at time $$ t_{n} $$ is: $$ s_{t_{n}} = w_{t_{n}} - l_{t_{n}}. $$
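The recursive form lends itself to a simple update loop over the games in chronological order. The Python sketch below is a minimal illustration under several assumptions that are not stated in the text above: games are given as hypothetical `(time, winner, loser)` tuples with strictly increasing times, the scores start from zero before the first game, and the score of an individual player is read off as a column sum of the score matrices, by analogy with the static Park-Newman sums.

```python
import numpy as np

def dynamic_scores(games, n_players, alpha, beta):
    """Return s = w - l per player after the last game.

    games: list of (time, winner, loser), strictly increasing in time.
    """
    identity = np.eye(n_players)
    w = np.zeros((n_players, n_players))  # win-score matrix
    l = np.zeros((n_players, n_players))  # loss-score matrix
    previous_time = None
    for time, winner, loser in games:
        a = np.zeros((n_players, n_players))
        a[loser, winner] = 1.0            # (i, j) = 1 if player i lost to player j
        if previous_time is None:
            decay = 0.0                   # no earlier scores to carry over
        else:
            decay = np.exp(-beta * (time - previous_time))
        w = a + decay * w @ (identity + alpha * a)
        l = a.T + decay * l @ (identity + alpha * a.T)
        previous_time = time
    # Assumed read-out: the score of player i is the i-th column sum.
    return w.sum(axis=0) - l.sum(axis=0)

# Hypothetical example: player 1 beats player 2 at t=0, player 0 beats player 1 at t=1.
scores = dynamic_scores([(0.0, 1, 2), (1.0, 0, 1)], n_players=3,
                        alpha=0.5, beta=np.log(2))
print(scores)  # approximately [ 1.25 -0.5  -0.5 ]
```

In this example player 0 earns a discounted indirect win over player 2, because player 1 had already beaten player 2 when player 0 beat him. Player 2 receives no indirect loss to player 0, since player 1 had no losses at the time of their game.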

According to the authors' results, the dynamic scoring system replicates the official ATP rankings, displaying a higher accuracy in predictions.

Overall, network-based rating systems offer certain advantages: they rest on intuitive arguments, are more comprehensive and are computationally less costly. Moreover, these models can readily reproduce the rankings of statistical rating models such as the Elo rating system and the Bradley-Terry model, and of expert-based ratings such as the BCS standings.