User:Thomas Arao/sandbox

Intro
Sports analytics is the collection and analysis of relevant historical statistics that can provide a competitive advantage to a team or individual. By collecting and analyzing these data, sports analytics informs players, coaches, and other staff and facilitates decision making both during and prior to sporting events. The term "sports analytics" was popularized in mainstream sports culture following the release of the 2011 film Moneyball, in which Oakland Athletics General Manager Billy Beane (played by Brad Pitt) relies heavily on analytics to build a competitive team on a minimal budget.

There are two key aspects of sports analytics: on-field and off-field analytics. On-field analytics deals with improving the on-field performance of teams and players, digging into aspects such as game tactics and player fitness. Off-field analytics deals with the business side of sports. It helps a sports organization or governing body surface patterns and insights in its data that can increase ticket and merchandise sales, improve fan engagement, and so on. In essence, off-field analytics uses data to help rightsholders make decisions that lead to higher growth and increased profitability.

As technology has advanced in recent years, data collection has become more in-depth and can be conducted with relative ease. Advancements in data collection have allowed sports analytics to grow as well, leading to the development of advanced statistics and sport-specific technologies that allow teams to run game simulations prior to play, improve fan acquisition and marketing strategies, and even understand the impact of sponsorship on each team and its fans. Analytics is becoming more prominent in sports over time. Individual sports such as tennis and boxing have been harder to quantify and were late to adopt analytics, but they have begun incorporating it into their pregame strategy. Team sports such as baseball and football have reached the point where the vast majority of their strategy relies heavily on analytics. As time goes on, individual sports will catch up to where team sports are now, and team sports will continue to evolve.

Sports analytics has also had a significant impact on professional sports through sports gambling. In-depth analytics have taken sports gambling to new levels: whether in fantasy sports leagues or nightly wagers, bettors now have more information at their disposal to aid decision making. A number of companies and webpages have been developed to provide fans with up-to-the-minute information for their betting needs.

Major League Baseball (MLB)
MLB has set the benchmark in sports analytics for a number of years, with some of the game's brightest minds having never set foot on the field in a major or minor league baseball game. Theo Epstein of the Chicago Cubs is one of those minds; he has never suited up in a professional baseball game and instead relies on his Yale University education and the numbers behind the game to make many of his decisions. Epstein is known for his role in ending two of baseball's most historic droughts: the Boston Red Sox's Curse of the Bambino in 2004 and, in the 2016 World Series, the Chicago Cubs' 108-year wait between World Series wins. He is a member of a growing community in Major League Baseball who do not rely on years of major league playing experience. This community has been able to grow thanks to the in-depth collection of statistics that has existed in baseball for decades. With analytics being relatively common in MLB, a breadth of statistics has become vital in the analysis of the game, including:


 * Batting average is one of the most commonly discussed statistics in baseball. A player's batting average is determined by dividing his hits by his number of at bats. Analytics also shows players which pitches they struggle with at the plate, revealing their tendencies and which pitch usually strikes them out.
 * On-base percentage is the percentage of times a player reaches base on a hit, a walk, or by being hit by a pitch. This is a significant offensive stat because it looks beyond hits and illustrates how often a batter avoids being put out at the plate. It is a more in-depth offensive statistic than batting average because it takes into account walks and hit-by-pitches, both of which indicate how a player handles an at bat. Sabermetrics can help a player change his approach in order to raise his on-base percentage, increasing his productivity and ultimately his overall worth as a player.
 * Slugging average measures the number of bases a player earns on hits per at bat. To determine this stat, the number of bases earned is divided by the number of at bats. It is a good measure of a batter's power: the higher the slugging average, the more likely the batter is to hit for extra bases (i.e. a double, triple or home run). For sluggers, analytics can help improve decision making at the plate and help them look for their pitch. Hitters can now study the tendencies of the pitchers they are going to face, familiarizing themselves before they are up to bat.
 * WHIP stands for Walks plus Hits allowed per Inning Pitched and tends to be viewed as a strong way to measure the success of a pitcher, as it illustrates how many baserunners the pitcher allows on both hits and walks. It is also an established way of looking at a pitcher's efficiency. Pitchers can now study the upcoming lineup they are going to face and focus on the batters' tendencies, such as where they stand at the plate, what pitches they tend to chase, and what part of the field they like to hit to.
 * Defensive Efficiency Ratio (DER) is a team defense statistic based on how often batters reach base on balls put into play. Ultimately, it shows how likely a defense is to turn a ball into an out once it is put into play.
 * Ultimate Zone Rating (UZR) measures how many runs a fielder saves. It evaluates a player's defensive performance and takes into account errors, range, outfield arm and double play ability.
 * Runs Created measures how many total runs a batter contributes. The measure uses the player's opportunities as the denominator and the player's ability to get on base and hit for extra bases as the numerator.
 * Pitches Per Plate Appearance is a fairly simple measure that quantifies how many pitches a batter will take on average during an at-bat.
 * Win Probability Added (WPA) measures the percent change in a team's chance of winning from one event to the next. In practice, it measures how important or significant a plate appearance was for a team. For example, a 9th inning home run to tie the game is much more impactful than a home run in a blowout game.
 * WAR stands for Wins Above Replacement and tends to be viewed as a strong way to measure a player's value in terms of a specific number of wins. It is one of the more commonly used advanced measures because players at all positions can be compared against each other. A player's WAR is the number of additional wins he is worth compared with a replacement-level player at the same position.
 * True Earned Run Average is a variation of the classic earned run average measure which quantifies a pitcher's performance based on what he can control. The measure factors in home runs, strikeouts, walks, HBPs, and accounts for batted-ball tendencies.
 * Run Support per Nine Innings measures how many runs, on average, a pitcher's team scores while that pitcher is in the game. Only the runs scored while the pitcher is in the game count toward the statistic.
 * Ballpark Factor is a way of determining how favorable a stadium might be for pitchers or batters. The players and teams involved will not affect the measure because teams play games in other stadiums as well. Ultimately, it is a measure to determine how easy it is to score in the ballpark.
 * Pythagorean Winning Percentage is a statistic that helps determine whether a team has been overachieving or underachieving. For example, if a team has had a particularly hard or easy schedule early in the season, the measure can help estimate the team's expected winning percentage for the remainder of the season more accurately. It was created by statistician Bill James and was originally used to determine how many games a team should have won in a given year.
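
Several of the statistics above reduce to simple ratios, which the following Python sketch illustrates. It is only an illustration under stated assumptions: the function names are hypothetical, the sample numbers are invented, the on-base percentage denominator includes sacrifice flies (the commonly published definition, which the list above does not spell out), and the Pythagorean exponent of 2 is the value from Bill James's original formula.

```python
def batting_average(hits, at_bats):
    """Hits divided by at bats."""
    return hits / at_bats


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies=0):
    """Times on base (H + BB + HBP) over AB + BB + HBP + SF.

    Including sacrifice flies in the denominator is an assumption taken
    from the commonly published definition.
    """
    times_on_base = hits + walks + hit_by_pitch
    opportunities = at_bats + walks + hit_by_pitch + sacrifice_flies
    return times_on_base / opportunities


def slugging_average(singles, doubles, triples, home_runs, at_bats):
    """Total bases earned on hits divided by at bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def whip(walks, hits_allowed, innings_pitched):
    """Walks plus hits allowed per inning pitched."""
    return (walks + hits_allowed) / innings_pitched


def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


if __name__ == "__main__":
    # Invented season totals, for illustration only.
    print(f"AVG  {batting_average(150, 500):.3f}")
    print(f"OBP  {on_base_percentage(150, 60, 5, 500, 5):.3f}")
    print(f"SLG  {slugging_average(100, 30, 5, 15, 500):.3f}")
    print(f"WHIP {whip(50, 150, 200):.2f}")
    print(f"Pyth {pythagorean_win_pct(800, 700):.3f}")
```

Comparing a team's actual winning percentage with the value returned by pythagorean_win_pct is one way to judge whether the team has been overachieving or underachieving.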

National Hockey League (NHL)
The NHL has kept statistics since its inception, yet it is a relatively new adopter of analytics-based decision making. The Toronto Maple Leafs were the first team in the NHL to hire a member of management with a largely analytical background when they hired assistant general manager Kyle Dubas in 2014. Dubas, like Theo Epstein in MLB, has never suited up in a professional game and relies on the numbers generated by players on a nightly basis, both now and in the past, to make decisions. Gone are the days when coaches and staff relied on gut feelings to make their strategic decisions. Basic descriptive statistics currently in use include Corsi, Fenwick, and PDO. More complicated models such as xGF (expected goals for) and relative models such as CorsiRel use linear and multiple regression techniques to chart scoring chance and possession metrics relative to both teammates and the entire league over short, medium, and long-term sample sizes. Four variants of the Fenwick metric are Fenwick for at even strength (FF; shots plus misses), Fenwick against at even strength (FA; shots plus misses), Fenwick for percentage at even strength (FF%), and relative Fenwick for percentage (FF% rel). Thomas Chabot of the Ottawa Senators leads the entire NHL in both FF and FA.


 * The Corsi statistic is an advanced statistic that has been widely adopted throughout the NHL, as teams, fans and media alike rely on the Corsi statistic to track shot attempt differential. Corsi has been recognized as the most informative single statistic in the game of hockey as it can provide insight into both the offensive and defensive play of a team as well as the amount of time a team has possession of the puck.
 * Fenwick is essentially the same idea as Corsi, except that it does not count blocked shots. Including only shots on goal and shots wide gives credit to the idea that blocked shots are intentional and could be part of a coach’s system. Many of the ideas behind Corsi also apply to Fenwick. Common forms include Fenwick for percentage (FF%) and Fenwick plus-minus (F± or F+/-). Both Corsi and Fenwick are ways to see how a player performs compared to his teammates.
 * PDO is not an acronym for anything; it is simply called PDO. It is a statistic used to measure luck, whether a player's or a team's. For teams, it is simply the team's save percentage plus its shooting percentage. For individual players, it is the on-ice shooting percentage plus the on-ice save percentage. The idea behind the statistic is that every team and player should average out to 100.00 by the end of the season; a value that deviates from this suggests that the team or player was lucky or unlucky.
 * Zone Starts measures how many shifts a player begins in the offensive zone versus the defensive zone. It is used to describe how a player is deployed and how that deployment may affect him. It rests on the assumption that a player who begins in the offensive zone more often will be expected to have higher numbers in statistics like Corsi.
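
The shot-based metrics above can be sketched in a few lines of Python. This is a minimal illustration, not an official NHL calculation: the function names and sample counts are hypothetical, and Corsi is assumed to be the sum of shots on goal, missed shots, and blocked shot attempts, consistent with the description of Fenwick as Corsi without blocked shots.

```python
def corsi(shots_on_goal, missed_shots, blocked_attempts):
    """All shot attempts: on goal, wide, and blocked (assumed definition)."""
    return shots_on_goal + missed_shots + blocked_attempts


def fenwick(shots_on_goal, missed_shots):
    """Unblocked shot attempts: shots on goal plus shots wide."""
    return shots_on_goal + missed_shots


def for_percentage(events_for, events_against):
    """Share of events generated by one side, e.g. CF% or FF%."""
    return 100.0 * events_for / (events_for + events_against)


def pdo(goals_for, shots_for, goals_against, shots_against):
    """Shooting percentage plus save percentage, on a 0-200 scale."""
    shooting_pct = 100.0 * goals_for / shots_for
    save_pct = 100.0 * (1 - goals_against / shots_against)
    return shooting_pct + save_pct


if __name__ == "__main__":
    # Invented single-game totals, for illustration only.
    cf = corsi(shots_on_goal=30, missed_shots=12, blocked_attempts=15)
    ca = corsi(shots_on_goal=25, missed_shots=10, blocked_attempts=8)
    print("Corsi for:", cf, "Corsi against:", ca)
    print(f"CF%: {for_percentage(cf, ca):.1f}")
    print(f"FF%: {for_percentage(fenwick(30, 12), fenwick(25, 10)):.1f}")
    print(f"PDO: {pdo(3, 30, 2, 25):.1f}")
```

A PDO well above or below 100 over a small sample is the kind of result the statistic treats as luck that is likely to even out.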

Professional Golfers' Association (PGA) Tour
The PGA Tour collects vast amounts of data throughout the season. These statistics track each shot a player takes in tournament play, recording how far the ball travels, exactly where each shot is played from, and where it finishes. These data have been used for a number of years by players and their coaches during practice sessions and tournament preparation, highlighting the areas in which a player needs to improve before teeing it up in tournament play. The average performance of PGA Tour players was used as the benchmark against which performance was compared, and strokes gained was then used to explain the contribution of each shot to the total score. Several studies have looked at the sequential variance of consecutive golf scores across both holes and rounds. Round scores showed relatively weak correlations with scores on consecutive rounds once external influences on performance (weather conditions, course setup, etc.) were considered. Scores on successive holes also showed weak correlations once external influences such as par and difficulty were considered. Other than the obvious fact that good players tend to shoot low scores and poor players tend to shoot high scores, the results suggested that neither overall performance in golf nor performance on individual shots is subject to ‘streakiness’. Some current metrics used by the PGA Tour are driving accuracy percentage, greens in regulation percentage, and scrambling (missing the green in regulation but still making par or better).


 * ShotLink data collection has revolutionized the way that data is collected in the game of golf. Introduced on a full-time basis in 2003, ShotLink relies on a number of strategically placed on-course laser rangefinders and cameras to collect precise data from every shot that is struck on the PGA Tour. With these data, players are able to see the areas of their game that need improving, and on a broader year-to-year basis, players can review course statistics from previous years to allow for relevant tournament preparation. On top of the year-to-year stats provided, players and fans can also access these statistics almost as soon as they are recorded, giving the data an extremely high velocity. ShotLink has also made its mark on the world of golf course design, as designers have constant access to up-to-the-minute statistics of professional golfers, allowing them to create courses that can challenge the world's best players.
 * Driving Accuracy Percentage is a measurement of how frequently a golfer's tee shot comes to rest in the fairway on the holes that they play. While the correlation isn't entirely direct, golfers with a higher driving accuracy percentage tend to place higher overall in a tournament, because it is significantly easier to hit a second shot from the middle of the fairway than from deep in the rough. While the elite outliers will consistently place at the top regardless of their DAP, this statistic can be helpful to differentiate the middle of the pack.
 * Green in Regulation Percentage is a measure of how frequently a golfer is able to reach the green on a hole within the regulation number of strokes. A ball is considered to be on the green in regulation if any portion of the ball is touching the green after the GIR stroke has been taken. The GIR stroke is determined by subtracting 2 from the par of the hole. Ex: 3 on par 5, 2 on par 4, 1 on par 3.
 * Scoring Average is the weighted scoring average, which takes the stroke average of the field into account. It is computed by adding a player's total strokes to an adjustment and dividing by the total rounds played. The adjustment is computed by determining the stroke average of the field for each round played. The average is subtracted from the par to create an adjustment for each round. A player accumulates these adjustments for each round played.
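
The green in regulation rule and the adjusted scoring average described above can be expressed as a short Python sketch. The function names, the tuple layout for a round, and the sample scores are hypothetical choices made for illustration; the arithmetic follows the descriptions in the list.

```python
def gir_stroke(par):
    """Stroke by which the ball must be on the green: par minus 2."""
    return par - 2


def hit_green_in_regulation(par, strokes_to_reach_green):
    """True if the green was reached on or before the GIR stroke."""
    return strokes_to_reach_green <= gir_stroke(par)


def adjusted_scoring_average(rounds):
    """Weighted scoring average over a list of rounds.

    Each round is a tuple (player_strokes, course_par, field_average).
    The per-round adjustment is course par minus the field's stroke
    average, so a round played on a day when the field scored over par
    lowers the player's adjusted average.
    """
    total_strokes = sum(strokes for strokes, _, _ in rounds)
    total_adjustment = sum(par - field_avg for _, par, field_avg in rounds)
    return (total_strokes + total_adjustment) / len(rounds)


if __name__ == "__main__":
    print(gir_stroke(5), gir_stroke(4), gir_stroke(3))
    print(hit_green_in_regulation(par=4, strokes_to_reach_green=2))
    # Invented rounds: (player strokes, par, field stroke average).
    sample = [(70, 72, 73.0), (68, 72, 71.0)]
    print(adjusted_scoring_average(sample))
```

In the sample, the two adjustments cancel out (minus one stroke and plus one stroke), so the adjusted average equals the player's raw average for those rounds.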

National Football League (NFL)
The NFL benefits from being a stop-and-start game, so a new set of data can essentially be taken from every snap. There are advanced metrics for offense, defense, and special teams. Because it combines team play with stop-and-start action, the NFL produces vast amounts of data that can be interpreted to give teams a competitive advantage in a given scenario. When approaching a play, it is extremely beneficial to have an idea of what the opponent is going to do and what can be done to maximize the chance of success. While there are countless variations of analytics that can be broken down, here are some examples of the main ones used:


 * Expected Rushing Yards uses the performance of the individual ball carrier, contributions from the offensive line, scheme and situation to help quantify how many yards a team can expect per rush on a given play. For example, Nick Chubb ran for an 88-yard touchdown last year, and there was a less than 1% chance of that occurring. However, there was a 12.4% chance that he would gain a first down and a 52.1% chance that he would gain 6+ yards.
 * Expected Yards after Catch uses the same model as expected rushing yards. The EYAC model (combined with Completion Probability) can also estimate outcome probabilities such as first downs and touchdowns, in addition to a single point estimate.
 * Route Recognition combines conventional stats such as receptions and receiving yards with advanced stats like depth of target, separation window, and completion probability to determine which routes are most effective. This provides greater insight into the question of which route the pass catcher ran to get open before catching the ball.
 * Live Win Probability evaluates the likelihood of either team winning at any moment between plays in the game. The model, trained on every historical play in the last 10 seasons, looks at the score differential, down-and-distance, time remaining, timeouts remaining, expected points and team quality.
 * Next Gen Stats is a team of analysts and statisticians who are constantly developing new advanced metrics for the sport of football. The group is enhancing the ability to analyze the game, which ultimately helps in data storytelling for players, teams, and the league. They are one of the leaders in football analytics and continue to improve the way analytics is used in the sport.
 * Pro Football Focus (PFF) is another leading group in football analytics. They are best known for their PFF player grades, which evaluate every player on every play during a football game. The grading system focuses on "production" rather than traits or measurables. One way to describe it would be a player's "contribution to production" on a given play. Grading players is not strictly based on statistics like yardage and number of catches. Statistics can be indicative of performance, but they don't tell the whole story. For example, not all sacks are equal. Some sacks come at opportune times, some result from a missed assignment, and some require the player to fight through double teams. PFF grades are used to evaluate how well a player performs relative to other players, which helps in comparing players against each other. These grades are used to negotiate contracts, negotiate trades, evaluate team and player performance, and much more. Some say this may be the future of football analytics, as it is already widely used and will continue to improve over time.

National Basketball Association (NBA)
Much like the NFL and MLB, the NBA benefits from having a stop-and-start style of play. There is essentially a 'reset' after each basket, foul, or out-of-bounds call, which allows the following play to be analyzed from a clean slate. The use of data analytics in the NBA allows teams to design winning strategies, predict and avoid player injury, and scout up-and-coming talent more efficiently. Measurement can be applied to entire teams or player by player, with metrics ranging from fantasy points scored on FanDuel to a team's strength of schedule. The use of data analytics has changed NBA games, but until data models are perfect, analysts should consider other factors when making decisions, especially those that involve human psychology.


 * Approximate Value (AV) is a metric that estimates a player's value by distinguishing easily between very good seasons, average seasons, and poor seasons. The metric takes points, rebounds, assists, steals, blocks, missed field goals, missed free throws, and turnovers into account.
 * Box Plus-Minus (BPM) is a way of evaluating a player's contribution and quality to their team. It uses play-by-play regression to estimate a player's performance relative to the NBA average. The inputs come from box score stats at the team and individual level.
 * Diamond Rating is a metric created by Kevin Broom that works with any per-minute statistic. It takes a player's rating per game and subtracts that from the player's rating per 40 minutes to determine how much the player's per-game stats undervalue his potential contributions. This is mainly used to look at players who are not playing many minutes to see if they are undervalued and should play more.
 * Trade Value was invented by Bill James and is a pretty simple metric. It estimates trade value by factoring in a player's age and his approximate value to determine the value of the player for the remainder of his career.
 * True Shooting Percentage (TS%) is a team or player metric that considers efficiency on all types of shots as well as performance at the free-throw line. It is one of the more commonly used statistics in the NBA.
 * Wins Above Replacement Player (WARP) was invented by Kevin Pelton, who borrowed the approach from sabermetrics and built on the work of Dean Oliver. This metric is used to see how much better a player is than a replacement player, using wins as the result. Wins are a measure that is easy to understand and of value to teams. WARP fails to account for any contributions outside the box score.
 * Rating Percentage Index (RPI) is commonly used to produce power ratings. This rating system only considers whether a team won or lost, not the margin of victory or how well a team played. In practice, it measures a team's strength of schedule and how well the team does against that schedule. It is calculated by weighting a team's winning percentage (25%), the winning percentage of its opponents (50%), and the winning percentage of its opponents' opponents (25%).
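
Two of the metrics above have compact formulas that a short Python sketch can make concrete. The RPI weights come directly from the description in the list. The true shooting formula, including its 0.44 free-throw coefficient, is the commonly published definition rather than something stated above, so treat it as an assumption; the function names and sample numbers are hypothetical.

```python
def true_shooting_pct(points, field_goal_attempts, free_throw_attempts):
    """Points per scoring attempt, counting free throws.

    Uses the commonly published form PTS / (2 * (FGA + 0.44 * FTA));
    the 0.44 coefficient is an assumption not stated in the text.
    """
    scoring_attempts = field_goal_attempts + 0.44 * free_throw_attempts
    return points / (2 * scoring_attempts)


def rpi(win_pct, opp_win_pct, opp_opp_win_pct):
    """Weighted blend: 25% own, 50% opponents, 25% opponents' opponents."""
    return 0.25 * win_pct + 0.50 * opp_win_pct + 0.25 * opp_opp_win_pct


if __name__ == "__main__":
    # Invented box score line and records, for illustration only.
    print(f"TS%: {true_shooting_pct(25, 18, 6):.3f}")
    print(f"RPI: {rpi(0.600, 0.500, 0.500):.3f}")
```

Because half of the RPI weight sits on opponents' winning percentage, two teams with the same record can end up with noticeably different ratings depending on their schedules.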

History
Many statisticians attribute the popularization of sports analytics to current Oakland Athletics General Manager Billy Beane. Strapped with a minimal budget, Beane relied on sabermetrics, a form of sports analytics, to evaluate players and make personnel decisions. Understanding the importance of getting runners on base, Beane focussed on acquiring players with a high on-base percentage, on the logic that teams with a higher on-base percentage are more likely to score runs. He was also able to achieve success on a shoestring budget by acquiring overlooked starting pitchers, often for a fraction of the price that a big-name pitcher would command. When Beane's Athletics began to achieve success, other major league teams took notice. The second team to adopt a similar approach was the Boston Red Sox, who in 2003 made Theo Epstein the interim general manager. Epstein, who remains the youngest general manager ever hired in MLB, came into the position with no professional playing experience, which was highly irregular at the time. Using an approach similar to Beane's, Epstein built a Boston Red Sox team that in 2004 won the organization's first World Series in 86 years, breaking the alleged Curse of the Bambino. Many experts attribute some of Epstein's success to Boston Red Sox owner John W. Henry, who achieved significant success in the investment industry by using data-based decision making. As owner, Henry gave Epstein significant leeway in data-based decision making and the use of sabermetrics, as he knew the impact such tools can have in both sports and business. Since his success in Boston, Epstein has moved on to Chicago, where in 2016 he led the Chicago Cubs to their first World Series title in 108 years. With both Beane and Epstein still leading successful MLB clubs, it is easy to see the longevity associated with an analytical approach to managing teams.
More recently, teams like the Houston Rockets of the NBA have put a heavy focus on analytics to dictate front office and on-court decisions. Daryl Morey, the General Manager of the Rockets, decided to emphasize three-point shots and used analytics to support his argument. As a result, the Rockets began shooting many more three-point shots and even traded their budding big man, Clint Capela.

The success of analytics-based strategies and decision making in baseball was noted by executives in other professional sports leagues. Today, you would be hard pressed to find a professional organization that does not have at least one analytics expert on staff, if not an entire department dedicated to analytics.

Gambling
Sports analytics has had a significant impact on the field of play, but it has also contributed to the growing industry of sports gambling, which accounts for approximately 13% of the global gambling industry. Valued at somewhere between $700 billion and $1,000 billion, sports gambling is extremely popular among groups of all kinds, from avid sports fans to recreational gamblers; you would be hard pressed to find a professional sporting event with nothing riding on the results. Many gamblers are attracted to sports gambling because of the plethora of information and analytics at their disposal when making decisions. One gambler, Bob Stoll, has been ahead of the analytics curve for a number of years, successfully betting against the line 56% of the time (575–453) in college football, a significant rate given that a winning percentage above 52.4% is considered profitable. With so many statistics openly available to fans, Stoll combines a number of them, such as home and away records, records against divisional and non-divisional teams, and yards per rush, to make educated picks that have paid off more than half of the time.
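
The 52.4% threshold mentioned above can be reproduced with a small Python sketch. It assumes bets are priced at American odds of -110 (risk 110 to win 100), a common price for point-spread bets that the text does not state explicitly; the function names are hypothetical.

```python
def profit_per_unit(american_odds):
    """Profit on a winning bet per one unit risked."""
    if american_odds < 0:
        return 100 / -american_odds
    return american_odds / 100


def break_even_win_rate(american_odds):
    """Win rate at which wins exactly pay for losses.

    Solves w * profit = (1 - w) for w, with one unit risked per bet.
    """
    return 1 / (1 + profit_per_unit(american_odds))


def win_rate(wins, losses):
    """Share of bets won, ignoring pushes."""
    return wins / (wins + losses)


def net_units(wins, losses, american_odds):
    """Net result in units when one unit is risked on every bet."""
    return wins * profit_per_unit(american_odds) - losses


if __name__ == "__main__":
    odds = -110  # assumed standard spread price
    print(f"Break-even win rate: {break_even_win_rate(odds):.1%}")
    print(f"Record 575-453 win rate: {win_rate(575, 453):.1%}")
    print(f"Net units at {odds}: {net_units(575, 453, odds):.1f}")
```

Under that assumption, a 575–453 record sits a few percentage points above break-even, which is why a rate that looks only slightly better than a coin flip can still be profitable over many bets.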

Results from academic research show evidence that Twitter contains enough information to be useful for predicting outcomes in football games.

With the popularity of sports gambling came the development of a number of sports betting services. "Sports betting services are provided by companies such as William Hill, Ladbrokes, bet365, bwin, Paddy Power, betfair, Unibet and many more through their websites and in many cases betting shops. In 2012, William Hill generated around 2 billion U.S. dollars in revenue with about 30 billion U.S. dollars in total being staked / wagered with the company."

Some of the metrics that go into sports betting analytics algorithms include:


 * Team/Player Stats: Stats such as points for and points against are weighted to help determine spreads and money lines.
 * Team/Player Performance: Tracks how well teams have performed against the spread and the odds that have been set against them.
 * Betting data: The cumulative record of how bettors bet on a given night, typically showing which teams are bet against and which odds people are most likely to bet on.

Matchup grades, line analysis, and odds tracking are all important factors when using analytics to place a bet. However, there are many different sub-categories that go into these decisions:


 * Best Bets: The optimal bets that can be placed that will yield the highest profit given a reasonable amount of risk.
 * Predicted Performance: An estimate of the outcome of an upcoming contest given the matchup.
 * Line Analysis: Used to identify lines that would be considered unfavorable for bettors to take, whether because there is too much risk or not enough payout.
 * Public Money: Tracks what the general population is trending towards in a given contest.
 * Odds Tracking: Not all lines are exactly the same on each book, even for the same game. It is important for a bettor to find the most favorable odds so they can turn the largest profit should the bet hit.
 * Futures Trends: Shows how teams and sports trend, betting-wise, over the long term as opposed to on a nightly basis. Ex: whether a certain team, regardless of its record, has covered the spread in the last X contests.
 * Bet Optimizers: Used to determine when it can be worth placing a parlay or a teaser as opposed to a straight bet.
 * Pro Bettor Report: A report that shows where "professional" gamblers are placing their money.
 * Daily Fantasy Sports Tools: Not all sports bets are on sports games. Many books, such as DraftKings, use fantasy lineups. These tools track who is best to play or sit on a given night to help a bettor's chances of winning.
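
The odds tracking idea above, comparing the same bet across books and taking the most favorable price, can be sketched in Python. The sportsbook names and odds below are invented placeholders, the function names are hypothetical, and the prices are assumed to be quoted as American odds.

```python
def profit_on_win(american_odds, stake=100.0):
    """Profit returned on a winning bet for the given stake."""
    if american_odds > 0:
        return stake * american_odds / 100
    return stake * 100 / -american_odds


def best_line(lines):
    """Return the (book, odds) pair that pays the most on a win.

    `lines` maps a sportsbook name to its American odds for the same bet.
    """
    return max(lines.items(), key=lambda item: profit_on_win(item[1]))


if __name__ == "__main__":
    # Hypothetical books quoting the same side of the same game.
    quotes = {"Book A": -110, "Book B": -105, "Book C": -115}
    book, odds = best_line(quotes)
    print(f"Best price: {book} at {odds}")
    print(f"Profit on a 100 stake: {profit_on_win(odds):.2f}")
```

The same comparison works for underdog prices, since positive American odds are converted to profit by the first branch of profit_on_win.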