User:MJT027/sandbox

= Expected Goals =
In association football, expected goals (xG) is a performance metric used to evaluate team and player performance. It represents the probability that a scoring opportunity results in a goal, given the number and type of chances a team or player had in a match. The metric helps evaluate players, especially strikers, by the number of goals they score from season to season, and can show whether a team or player is outperforming or underperforming their xG value.

An xG metric assigns each shot a value between 0 and 1: the closer a shot’s xG is to 1, the higher the probability of scoring. The xG method is a valuable tool for predicting the goal-scoring and goal-conceding probabilities of players and teams.
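As a minimal illustration of how per-shot values combine, the sketch below sums hypothetical per-shot xG values into a total and compares it with the goals actually scored. The shot values, the goal count and the function name `total_xg` are assumptions made for illustration, not figures from any real model.

```python
def total_xg(shot_values):
    """Sum per-shot xG values, checking that each lies between 0 and 1."""
    for value in shot_values:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"xG value out of range: {value}")
    return sum(shot_values)


# Hypothetical per-shot xG values for one player (illustrative only).
shots_xg = [0.05, 0.32, 0.76, 0.11, 0.08]
goals_scored = 2  # hypothetical number of goals actually scored

expected = total_xg(shots_xg)
difference = goals_scored - expected  # positive: outperforming xG

print(f"xG: {expected:.2f}, goals: {goals_scored}, difference: {difference:+.2f}")
```

A positive difference suggests the player scored more than the chances would predict, while a negative one suggests underperformance relative to xG.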

History
The term Expected Goals was first mentioned in Vic Barnett and his colleague Sarah Hilditch's 1993 paper that investigated the effects of artificial pitch (AP) surfaces on home team performance in association football in England. They observed for the AP group about 0.15 more goals per home match than expected and, allowing for the lower than expected goals against in home matches, an excess goal difference (for home matches) of about 0.31 goals per home match. Over a season this yields about 3 more goals for, an improved goal difference of about 6 goals.

Jake Ensum, Richard Pollard and Samuel Taylor (2004) reported their study of data from 37 matches in the 2002 World Cup in which 930 shots and 93 goals were recorded. Their research sought "to investigate and quantify 12 factors that might affect the success of a shot". Their logistic regression identified five factors that had a significant effect on determining the success of a kicked shot: distance from the goal; angle from the goal; whether or not the player taking the shot was at least 1 m away from the nearest defender; whether or not the shot was immediately preceded by a cross; and the number of outfield players between the shot-taker and goal. They concluded "the calculation of shot probabilities allows a greater depth of analysis of shooting opportunities in comparison to recording only the number of shots".

In 2009, Howard Hamilton proposed a statistic that he believed would ultimately contribute to what he called an "expected goal value": for any action on the field in the course of a game, the probability that said action will create a goal.

In 2011, Sarah Rudd discussed probable goal-scoring patterns (P(Goal)) in her use of Markov chains for tactical analysis (including the proximity of defenders), drawing on 123 games from the 2010-2011 English Premier League season. In a video presentation of her paper at the 2011 New England Symposium on Statistics in Sports, Rudd reported her use of analysis methods to compare "expected goals" with actual goals and her process of applying weightings to incremental actions for P(Goal) outcomes.

In April 2012, Sam Green, an Advanced Data Analyst at the sports statistics company Opta, first explained his approach to assessing the performance of Premier League goal scorers, inspired by similar models used in American sports. The Opta team analyzed more than 300,000 shots and a number of different variables using Opta’s on-ball event data, such as the angle of the shot, assist type, shot location, the in-game situation, the proximity of opposition defenders and distance from goal. Their xG model was designed to return an xG value for each player, team or chance, depending on the dimension in which the data is being analyzed: a full season, a particular match, a specific half of a game or a group of goal attempts. Opta also took xG a step further and assessed the impact a player had on a specific chance using shot quality. They did so by factoring into the xG calculation the propensity of the player's shot to hit the target and then comparing the original xG(Overall) value against the new xG(On Target) value. The exact model with all the factors considered by Opta has not been made public yet.

It was not until the beginning of the 2017/18 season, when the BBC’s Match of the Day debuted the use of xG by its football pundits, that xG became a focal topic of conversation among many football fans.

Overview
On average, just over 2.5 goals are scored per football game, so the historical number of goals does not provide a large enough sample to predict the outcome of a match. Shots on target and total number of shots are therefore used as the next closest statistics for predicting the number of goals. There is no single model for calculating xG, so an xG value depends on the factors that the analyst creating the model chooses to incorporate in the calculations. Data and analytics companies use many variables to calculate xG in a football match: passing types, game situations, the part of the body used to take the shot, and even historical data can be used to assess the quality of chances.

Some of the factors that are looked at when calculating xG are the number of defenders around the attacker, the distance of defenders, the type of pass to the attacker, and the positioning of the goalkeeper. Along with this, different shot-taking situations are taken into account such as shot attempts from open play, penalties, free-kicks (direct or indirect), corners, and rebounds (from a save by the goalkeeper, hitting the woodwork, etc.).

Though there is no one way to calculate xG, the simplest model assigns every shot the same value: xG = 0.10 × shots. In models that rate each shot individually, the closer a shot’s xG is to 1, the higher the probability of scoring, and vice versa.
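The contrast between a flat-rate model and a per-shot model can be sketched as follows. The flat rate of 0.10 comes from the formula above; the logistic form and its coefficients (`b0`, `b_dist`, `b_angle`) are hypothetical values chosen only so that closer, wider-angle shots score higher. They are not fitted to data and are not taken from any provider's model.

```python
import math


def flat_xg(num_shots, rate=0.10):
    """Simplest model: every shot is worth the same fixed value."""
    return rate * num_shots


def logistic_xg(distance_m, angle_rad, b0=-1.0, b_dist=-0.10, b_angle=1.2):
    """Per-shot xG from a logistic function of distance and angle.

    The coefficients are hypothetical placeholders, not fitted values.
    """
    z = b0 + b_dist * distance_m + b_angle * angle_rad
    return 1.0 / (1.0 + math.exp(-z))


# Twelve shots under the flat model.
print(f"flat model, 12 shots: {flat_xg(12):.2f}")

# Two hypothetical shots: one close and central, one far and narrow.
close_shot = logistic_xg(distance_m=6.0, angle_rad=0.9)
long_shot = logistic_xg(distance_m=25.0, angle_rad=0.3)
print(f"close shot: {close_shot:.3f}, long shot: {long_shot:.3f}")
```

The flat model cannot tell a tap-in from a long-range effort, which is why more detailed models add shot-level factors such as those listed above.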

= Original Article: =

Expected Goals
In association football, expected goals (xG) is a performance metric used to evaluate football team and player performance. It can be used to represent the probability of a scoring opportunity that may result in a goal.

Metric
There is some debate about the origin of the term Expected Goals. Vic Barnett and his colleague Sarah Hilditch referred to "expected goals" in their 1993 paper that investigated the effects of artificial pitch (AP) surfaces on home team performance in association football in England. Their paper included this observation:

"Quantitatively we find for the AP group about 0.15 more goals per home match than expected and, allowing for the lower than expected goals against in home matches, an excess goal difference (for home matches) of about 0.31 goals per home match. Over a season this yields about 3 more goals for, an improved goal difference of about 6 goals."

Jake Ensum, Richard Pollard and Samuel Taylor (2004) reported their study of data from 37 matches in the 2002 World Cup in which 930 shots and 93 goals were recorded. Their research sought "to investigate and quantify 12 factors that might affect the success of a shot". Their logistic regression identified five factors that had a significant effect on determining the success of a kicked shot: distance from the goal; angle from the goal; whether or not the player taking the shot was at least 1 m away from the nearest defender; whether or not the shot was immediately preceded by a cross; and the number of outfield players between the shot-taker and goal. They concluded "the calculation of shot probabilities allows a greater depth of analysis of shooting opportunities in comparison to recording only the number of shots".

In a subsequent paper (2004), Pollard, Ensum and Taylor combined data from the 1986 and 2002 World Cup competitions to identify three significant factors that determined the success of a kicked shot: distance from the goal; angle from the goal; and whether or not the player taking the shot was at least 1 m away from the nearest defender.

In 2004, Alan Ryder shared a methodology for the study of the quality of an ice hockey shot at goal. His discussion started with this sentence: “Not all shots on goal are created equal”. Ryder's model for the measurement of shot quality was:
 * Collect the data and analyze goal probabilities for each shooting circumstance
 * Build a model of goal probabilities that relies on the measured circumstance
 * For each shot, determine its goal probability
 * Expected Goals: EG = the sum of the goal probabilities for each shot
 * Neutralize the variation in shots on goal by calculating Normalized Expected Goals
 * Shot Quality Against

Ryder concluded: "The model to get to expected goals given the shot quality factors is simply based on the data. There are no meaningful assumptions made. The analytic methods are the classics from statistics and actuarial science. The results are therefore very credible."

In 2007, Ryder issued a product recall notice for his shot quality model. He presented “a cautionary note on the calculation of shot quality” and pointed to “data quality problems with the measurement of the quality of a hockey team’s shots taken and allowed”. He reported: "I have been worried that there is a systemic bias in the data. Random errors don’t concern me. They even out over large volumes of data. But I do think that ... the scoring in certain rinks has a bias towards longer or shorter shots, the most dominant factor in a shot quality model. And I set out to investigate that possibility."

Howard Hamilton (2009) proposed "a useful statistic in soccer" that "will ultimately contribute to what I call an 'expected goal value' – for any action on the field in the course of a game, the probability that said action will create a goal".

Sander Itjsma (2011) discussed "a method to assign different value to different chances created during a football match" and in doing so concluded (link does not work):

"we now have a system in place in order to estimate the overall value of the chances created by either team during the match. Knowing how many goals a team is expected to score from its chances is of much more value than just knowing how many attempts to score a goal were made. Other applications of this method of evaluation would be to distinguish a lack of quality attempts created from a finishing problem or to evaluate defensive and goalkeeping performances. And a third option would be to plot the balance of play during the match in terms of the quality of chances created in order to graphically represent how the balance of play evolved during the match."

Sarah Rudd (2011) discussed probable goal-scoring patterns (P(Goal)) in her use of Markov chains for tactical analysis (including the proximity of defenders), drawing on 123 games from the 2010-2011 English Premier League season. In a video presentation of her paper at the 2011 New England Symposium on Statistics in Sports, Rudd reported her use of analysis methods to compare "expected goals" with actual goals and her process of applying weightings to incremental actions for P(Goal) outcomes.

The term 'expected goals' appeared in a paper about ice hockey performance presented by Brian Macdonald at the MIT Sloan Sports Analytics Conference in 2012. Macdonald's method for calculating expected goals was reported in the paper:

"We used data from the last four full NHL seasons. For each team, the season was split into two halves. Since midseason trades and injuries can have an impact on a team’s performance, we did not use statistics from the first half of the season to predict goals in the second half. Instead, we split the season into odd and even games, and used statistics from odd games to predict goals in even games. Data from 2007-08, 2008-09, and 2009-10 was used as the training data to estimate the parameters in the model, and data from the entire 2010-11 was set aside for validating the model. The model was also validated using 10-fold cross-validation. Mean squared error (MSE) of actual goals and predicted goals was our choice for measuring the performance of our models."
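The odd/even split and the MSE measure described in the quotation can be sketched in a few lines. This is a toy illustration only: the goal counts are invented, and the predictor (the mean of the odd-game goals) is a stand-in assumption, not the model from Macdonald's paper.

```python
def split_odd_even(values_by_game):
    """Split per-game values into odd-numbered and even-numbered games."""
    odd_games = values_by_game[0::2]   # games 1, 3, 5, ...
    even_games = values_by_game[1::2]  # games 2, 4, 6, ...
    return odd_games, even_games


def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical goals scored by one team in six consecutive games.
season_goals = [3, 2, 4, 1, 2, 3]

odd_games, even_games = split_odd_even(season_goals)

# Stand-in predictor: use the odd-game average for every even game.
baseline = sum(odd_games) / len(odd_games)
predictions = [baseline] * len(even_games)

mse = mean_squared_error(even_games, predictions)
print(f"odd-game average: {baseline:.2f}, MSE on even games: {mse:.3f}")
```

Interleaving odd and even games keeps both halves spread across the whole season, so a midseason trade or injury affects both sets roughly equally.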

(Make a separate section on hockey expected goals?)

In April 2012, Sam Green wrote about 'expected goals' in his assessment of Premier League goalscorers. He asked "So how do we quantify which areas of the pitch are the most likely to result in a goal and therefore, which shots have the highest probability of resulting in a goal?". He added: "If we can establish this metric, we can then accurately and effectively increase our chances of scoring and therefore winning matches. Similarly, we can use this data from a defensive perspective to limit the better chances by defending key areas of the pitch." Green proposed a model to determine "a shot's probability of being on target and/or scored". With this model "we can look at each player's shots and tally up the probability of each of them being a goal to give an expected goal (xG) value".

(More Info on this study in a different article)