
Discussion: Options for Embeddings
Discussion: Options for Embeddings
Embeddings are flexible, and there are many options for using them in sports analytics. We have discussed several options for embeddings (see Mike's summary in this directory). This page provides a brief overview. There are usually two versions of a model: an instantaneous version, which applies embeddings only to the current match state, and a historical version, which conditions on the observation history.

Available Statistical Information
Our basic data are event logs of the form $$ (\mathbf{x}_{t},a_{t},\mathit{player}_{t}) $$

with integer time stamps $t$. The local features $\mathbf{x}_{t}$ include things like game time, $x$-$y$ location, score differential, manpower, etc. At time $t$, action $a_t$ is taken by $\mathit{player}_t$.

Focusing on player $i$, the data for this player are the tuples

$(\mathbf{x}_t,a_t)$ with $\mathit{player}_t=i.$
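As a concrete illustration, the event log and the per-player restriction above can be sketched as follows (the field names and values are hypothetical, not the actual schema):

```python
from dataclasses import dataclass

@dataclass
class Event:
    """One row of the event log: time stamp, local features, action, acting player."""
    t: int       # integer time stamp
    x: dict      # local features x_t, e.g. x-y location, score differential
    a: str       # action a_t
    player: str  # player_t, the acting player

# A toy event log (values are illustrative only).
log = [
    Event(0, {"xy": (10.0, 5.0), "score_diff": 0}, "pass", "p7"),
    Event(1, {"xy": (30.0, 8.0), "score_diff": 0}, "shot", "p9"),
    Event(2, {"xy": (12.0, 4.0), "score_diff": 1}, "pass", "p7"),
]

def events_for(log, i):
    """The (x_t, a_t) tuples at the times where player_t == i."""
    return [(e.x, e.a) for e in log if e.player == i]
```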


 * 1) Discriminative Embeddings


 * 1.1) Current-Time Version

A player embedding can be used to predict the current state-features and action:

$P(\mathbf{x}_{t},a_{t}|\mathit{player}_t=i).$

This can be learned using [skip-gram techniques](https://bitbucket.org/sportlogiqteam/spg-sfu-waterloo-project/src/master/foundations/README.md). It seems similar to [Michael's 'causal model'](https://bitbucket.org/sportlogiqteam/spg-sfu-waterloo-project/src/master/embeddings/VI%20Player%20Embeddings%20Summary.pdf). The equation can be decomposed into quantities of interest for sports:

$P(\mathbf{x}_{t},a_{t}|\mathit{player}_t=i) = P(\mathbf{x}_{t}|\mathit{player}_t=i) \times P(a_t|\mathbf{x}_{t},\mathit{player}_{t}=i)$

The second term can be interpreted as the player's policy, and the first as the probability of finding player $i$ in a specific location in state-feature space. Thus the first arguably captures a player's style or role.
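Under a coarse discretization of the state features into a few bins (an assumption made purely for this sketch), both factors of the decomposition can be estimated by simple counting:

```python
from collections import Counter, defaultdict

# Toy event log as (state_bin, action, player) triples; discretizing x_t
# into named state bins is an assumption of this sketch.
events = [
    ("off_zone", "shot", "p9"),
    ("off_zone", "pass", "p9"),
    ("def_zone", "pass", "p9"),
    ("off_zone", "shot", "p9"),
]

def factor_estimates(events, i):
    """Count-based estimates of P(x | player=i) and P(a | x, player=i)."""
    state_counts = Counter(x for x, _, p in events if p == i)
    policy_counts = defaultdict(Counter)
    for x, a, p in events:
        if p == i:
            policy_counts[x][a] += 1
    n = sum(state_counts.values())
    p_state = {x: c / n for x, c in state_counts.items()}
    p_policy = {x: {a: c / sum(cs.values()) for a, c in cs.items()}
                for x, cs in policy_counts.items()}
    return p_state, p_policy

p_state, p_policy = factor_estimates(events, "p9")
# The decomposition: P(x, a | player) = P(x | player) * P(a | x, player)
joint = p_state["off_zone"] * p_policy["off_zone"]["shot"]
```

Here `p_state` captures where the player tends to be found (style or role), and `p_policy` is the count-based estimate of the player's policy.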


 * 1.2) Historical Version

A [marked point process model](https://bitbucket.org/sportlogiqteam/spg-sfu-waterloo-project/src/master/foundations/README.md) aims to predict the next observation given the history so far:

$P(\mathbf{x}_{t},a_{t}|\mathbf{x}_{< t},a_{<t}).$

Adding the information that player $i$ acts at time $t$, we can consider a conditional model

$P(\mathbf{x}_{t},a_{t}|\mathbf{x}_{< t},a_{<t}, \mathit{player}_t=i).$

Reasons why this quantity is of interest include the following.

1. We have proven a theorem for our goal impact metric: the metric is equivalent to the expected value for a team that results when we replace the conditional probability for the average player with that for a specific player (i.e., when we replace the first equation with the second).

2. As before, the equation can be decomposed into quantities of interest for sports:

$P(\mathbf{x}_{t},a_{t}|\mathbf{x}_{< t},a_{<t}, \mathit{player}_t=i) = P(\mathbf{x}_{t}|\mathbf{x}_{< t},a_{<t}, \mathit{player}_t=i) \times P(a_t|\mathbf{x}_{\leq t},a_{<t},\mathit{player}_{t}=i)$

The second term can be interpreted as the player's policy, and the first as the probability of finding player $i$ in a specific location in state-feature space, both as functions of the match history.
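A minimal sketch of such a history-conditioned policy, using an exponentially decayed feature summary as a stand-in for a learned recurrent state (the decay constant, the linear scoring, and all weights below are assumptions of this sketch, not a proposed model):

```python
import math

def history_summary(xs, decay=0.8):
    """Exponentially decayed summary of the past feature vectors x_{<t}.
    A stand-in for an RNN hidden state; decay=0.8 is an arbitrary choice."""
    h = [0.0] * len(xs[0]) if xs else []
    for x in xs:
        h = [decay * hi + (1 - decay) * xi for hi, xi in zip(h, x)]
    return h

def policy(h, x_t, weights):
    """Softmax policy P(a_t | x_{<=t}, player=i); scores are linear in [h, x_t].
    `weights` holds one (hypothetical) weight vector per action for player i."""
    feats = h + x_t
    scores = {a: sum(w * f for w, f in zip(ws, feats)) for a, ws in weights.items()}
    m = max(scores.values())
    exps = {a: math.exp(s - m) for a, s in scores.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}

# Illustrative weights for a two-feature state and two actions.
weights = {"pass": [1.0, 0.0, 0.5, 0.0], "shot": [0.0, 1.0, 0.0, 0.5]}
h = history_summary([[1.0, 0.0], [0.0, 1.0]])
p = policy(h, [0.5, 0.5], weights)
```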


 * 2) Conditional VAE

A generative model defines a distribution over players. One possibility championed by Galen is to use a conditional VAE (see [tutorial](https://arxiv.org/abs/1606.05908)) to model

$P(\mathit{player}_t=i|\mathbf{x}_{t},a_{t})$

A conditional VAE produces an encoding distribution $q(z|\mathbf{x}_{t},a_{t},\mathit{player}_t=i)$ which depends on the observed features. Therefore the *conditional VAE does not produce a single embedding for each player*.

The history version is especially interesting if we include other players in the history:

$P(\mathit{player}_t=i|\mathbf{x}_{\leq t},a_{\leq t},\mathit{player}_{\leq t})$,

because then it captures interactions among players. For example the model would represent the information usually visualized in a *passing graph.*
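For instance, a passing graph can be read off the event log by counting which player touches the ball after each pass (the event vocabulary below is hypothetical):

```python
from collections import Counter

# Toy sequence of (action, player) events.  A pass followed by the next
# player's touch is counted as an edge passer -> receiver.
events = [("pass", "p7"), ("reception", "p9"), ("pass", "p9"),
          ("reception", "p4"), ("pass", "p7"), ("reception", "p9")]

def passing_graph(events):
    """Counter over (passer, receiver) edges recoverable from the event order."""
    edges = Counter()
    for (a, p), (_, q) in zip(events, events[1:]):
        if a == "pass" and q != p:
            edges[(p, q)] += 1
    return edges

graph = passing_graph(events)
```

A history-conditioned model over $\mathit{player}_{\leq t}$ would represent these edge frequencies implicitly rather than as explicit counts.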


 * 3) Generative VAE: Point Process Model

A variational auto-encoder could produce a code $z_t$ that generates the next observation:

$ P(\mathbf{x}_{t},a_{t},\mathit{player}_t|z_t)$

Several models of this type, like Mike's Model VI, assume independence of the modelled components, e.g.

$ P(\mathbf{x}_{t},a_{t},\mathit{player}_t|z_t) = P(\mathbf{x}_{t},a_{t}|z_t) \times P(\mathit{player}_t|z_t)$

The encoding distribution provides a joint embedding of the three components $\mathbf{x}_{t},a_{t},\mathit{player}_t$ (see [general explanation](https://bitbucket.org/sportlogiqteam/spg-sfu-waterloo-project/src/master/embeddings/README.md)). Mike's model VI proposes decomposing the encoding distribution to get a component for players only:

$ Q(z_t|\mathbf{x}_{t},a_{t},\mathit{player}_t) = Q(z_t|\mathbf{x}_{t},a_{t}) \times Q(z_t|\mathit{player}_t)$.
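If both factors are taken to be diagonal Gaussians (an assumption of this sketch; the decomposition above does not fix the family), their renormalized product is again Gaussian with precision-weighted parameters, which gives a direct way to combine the two factors per latent dimension:

```python
def product_of_gaussians(mu1, var1, mu2, var2):
    """Renormalized product of two 1-D Gaussians N(mu1,var1) * N(mu2,var2):
    a Gaussian with precision-weighted mean and combined precision.
    Applied independently to each latent dimension of z_t."""
    prec = 1.0 / var1 + 1.0 / var2
    var = 1.0 / prec
    mu = var * (mu1 / var1 + mu2 / var2)
    return mu, var

# Combine a state-action expert Q(z|x_t,a_t) with a player expert Q(z|player_t):
mu, var = product_of_gaussians(0.0, 1.0, 2.0, 1.0)
```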

In the history version, the code can also depend on the previous history:

$ Q(z_t|\mathbf{x}_{\leq t},a_{\leq t},\mathit{player}_t)$

If we assume conditional independence and allow the code to depend on the history, we have an extension of the variational auto-encoder model for point processes described [here](https://www.borealisai.com/en/publications/variational-auto-encoder-model-stochastic-point-process/)  from CVPR 2019. This would be a powerful and exciting extension of this CVPR paper.
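A minimal sketch of the training objective such a model would optimize, assuming a diagonal Gaussian encoder, a standard normal prior, and a caller-supplied decoder log-likelihood (none of these choices are fixed by the summary above):

```python
import math
import random

def kl_diag_gaussian(mu, logvar):
    """KL( N(mu, exp(logvar)) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, logvar))

def elbo(mu, logvar, log_lik, n_samples=10, seed=0):
    """Monte-Carlo ELBO: E_q[log p(obs | z)] - KL(q || prior).
    `log_lik(z)` is the decoder's log-likelihood of the next observation
    (x_t, a_t, player_t); here it is a caller-supplied function."""
    rng = random.Random(seed)
    recon = 0.0
    for _ in range(n_samples):
        # Reparameterized sample z = mu + sigma * eps.
        z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
             for m, lv in zip(mu, logvar)]
        recon += log_lik(z)
    return recon / n_samples - kl_diag_gaussian(mu, logvar)
```

In the history version, `mu` and `logvar` would be produced by an encoder network reading $\mathbf{x}_{\leq t}, a_{\leq t}, \mathit{player}_t$.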


 * 4) Questions

- How does Mike's model capture correlations between state and player?
- Is it really necessary to embed states and actions if all we want is player embeddings?
