
Wednesday, November 9, 2011

How to Succeed in the EPL: Chances Created and Chance Conversion

A statistic that has drawn a lot of attention recently is chances created. Chances created, according to Opta's website, is defined as "assists plus Key passes" where a Key Pass is "the final pass or pass-cum-shot leading to the recipient of the ball having an attempt at goal without scoring" (Opta is a company that tracks and generates a ton of data in soccer). So basically, any pass that leads to a shot is considered a chance created.
Swansea's Mark Gower is a perfect example of a player highlighted by the chances created statistic.

Chances Created
The appeal of this measure is that it values players on weaker teams more fairly than assists do. For a player on a weaker team, it is harder to record assists because their teammates are less likely to score. Chances created is a fairer statistic because it depends less on the strength of your teammates. Overall, it can highlight creative players who are often overlooked because they play on weaker teams and do not have as many assists.

Do Chances Created Actually Matter?
With all this in mind, I was curious to find the actual worth of the chances created statistic. One way to measure this is to look at how chances created and wins are correlated. To make it a little easier, I looked at the relationship between goals scored and chances created for EPL teams. In other words, do teams that create more chances score more? Do teams that create fewer chances score less? The answer, in short, is yes: they are correlated. Below is a scatterplot of the relationship. There is a clear positive relationship between chances created and goals in the EPL last season. The coefficient is statistically different from 0 (p < .001), which is very strong evidence of a positive relationship.
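For readers who want to check a relationship like this themselves, here is a minimal sketch of a Pearson correlation in plain Python. The club numbers are made up for illustration and are not the EPL data behind the scatterplot, and `pearson_r` is just a helper name chosen for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-movement of the two series around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up season totals for five imaginary clubs (NOT real EPL data).
chances_created = [300, 340, 380, 420, 460]
goals = [38, 45, 44, 58, 66]

r = pearson_r(chances_created, goals)
print(f"r = {r:.2f}")
```

A value of r near 1 means clubs that create more chances also tend to score more. Whether that relationship is statistically different from 0 is a separate significance test that this sketch does not perform.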



Chance Conversion Percentages
This is only half the story though. Some teams get a lot of shots off but still score few goals, either because they are not good at shooting or because they take shots that have a smaller chance of going in. These teams have a poor conversion percentage. The conversion percentage is defined as goals divided by the total number of shots (excluding blocked shots). Below is a scatterplot similar to the one above, this time with conversion percentages on the x-axis. The conversion rates are rounded to 2 decimal places, hence the bunching. Again, this shows a positive relationship between conversion percentage and goals. Teams with higher conversion rates tend to score more and vice versa. This relationship is also statistically different from 0 (p=.002). A quick note: the product of chances created and conversion rate is very close to the number of goals a club has scored. I'm pretty sure the discrepancy comes from including blocked shots in shots attempted, but not in conversion rates.
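The two definitions used so far, and the quick note about their product, can be sketched in a few lines of Python. The season totals are invented for one imaginary club, and the function names are illustrative only.

```python
def chances_created(assists, key_passes):
    """Opta-style chances created: assists plus key passes."""
    return assists + key_passes


def conversion_rate(goals, unblocked_shots):
    """Goals divided by total shots, with blocked shots excluded."""
    return goals / unblocked_shots


# Made-up season totals for one imaginary club (not real EPL data).
assists, key_passes = 40, 390
goals, unblocked_shots = 60, 400

created = chances_created(assists, key_passes)  # every pass that led to a shot
rate = conversion_rate(goals, unblocked_shots)
estimate = created * rate                       # rough goal estimate

print(f"chances created: {created}")
print(f"conversion rate: {rate:.2f}")
print(f"chances x rate:  {estimate:.1f} vs actual goals {goals}")
```

In this made-up case the product lands a little above the actual goal total because the chances figure counts shots that the rate's denominator leaves out, which is the same kind of gap the quick note describes.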



EPL 2010-2011, Chances Created and Conversion %
With this in mind, I created a scatterplot of conversion rates and chances created for EPL teams last season. The plot shows that clubs found scoring success in different ways. The Manchester clubs did it by being efficient scorers; they had conversion percentages of .15 and .16. Chelsea and Tottenham were on the other end of the spectrum with higher chances created, but lower conversion percentages (.12 for both). The graphic also shows that West Ham's struggles did not come from a failure to create chances; they came from a low conversion percentage (.10). On the other hand, Birmingham struggled because they failed to create enough chances to score, despite a decent conversion percentage of .12.



EPL 2011-2012 thus far, Chances Created and Conversion %
What about this year? Below, I created the same scatterplot as above, this time for the current season. City's dominance is really highlighted. They are leading in both chances created AND conversion percentage, hence the massive number of goals this year. Again, United seems to be scoring because of their high conversion percentage. QPR and United actually have a very similar number of chances created; United just finishes their chances at a much higher rate. Liverpool sticks out because of their high number of chances created but really low conversion percentage (.09).




Conclusion
The bottom line is that chance creation and conversion rate are the keys to understanding goal scoring. A club can succeed with a high conversion rate (United) or by creating a lot of chances (Liverpool). A club can really dominate by doing both well (City). The graphic above can also suggest what kind of players each club needs. For example, Manchester United and Newcastle would benefit from picking up a creative midfielder who creates more chances, and Liverpool and QPR would benefit from picking up a more efficient scorer. The scatterplot also tells us why some clubs struggle. Wigan needs to raise their conversion percentage (currently a dismal .06) and Stoke needs to create more chances. City, on the other hand, should just continue to buy all the best players.

All data comes from eplindex.com (@EPLIndex)

Tuesday, October 11, 2011

An Analysis of the Performance of Promoted Clubs

Joey Barton, of newly promoted QPR


One aspect of English football that I love, and that does not exist in American sports, is promotion and relegation. It makes not just the race for first exciting, but also the race to avoid relegation. In American sports, last place teams often simply give up, a disappointment for fans.

I wanted to see exactly how promoted/relegated teams fared throughout the season. Some statistical research has already been done on the subject: Omar Chaudhuri, writer of the 5 Added Minutes blog, looks at conversion rates of promoted teams and their corresponding ability to stay in the top flight here. In part 1 of this post, I look at how promoted teams have done in their first season in the top flight. My original idea was that teams may struggle early in the season to adjust to the higher level of competition, and then even out as the weeks go on and they adjust. This also puts the performance of QPR, Swansea, and Norwich into perspective with past promoted teams' performances. I use data from promoted teams from the 2003/2004 to 2010/2011 seasons.

I've created 5 graphs to illustrate the performances of promoted teams. The first one, below, shows how all the promoted clubs' point totals have progressed over the 38 games. On average, promoted teams earn around a point per week. The greenish linear-looking line in the middle is the average. All the other jagged lines are the point totals over the season of promoted clubs. This graph isn't too informative, but is an interesting graphic nonetheless.


The next graph is the same as the one above, but only looks at the three promoted clubs this season in comparison to the average points line and the linear points line. To clarify, the linear line shows what would happen if a team earned the same number of points every week and ended up with the average point total for promoted clubs. The average line shows the average cumulative points that promoted clubs have actually earned by each week of the season. These may sound the same at first, but I will show in the next couple of paragraphs that there is an important distinction. Anyway, the graphic below illustrates that all 3 promoted clubs are faring about as well as the average promoted team does. QPR started off a little stronger, but has since returned to the average. Norwich and Swansea both started a little weaker, but have improved to end up just above the average 7 weeks into the season. All 3 teams have 8 points so far, just above the point per week average of promoted teams.
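The difference between the two reference lines is easier to see in code. This is a minimal sketch that assumes each club's season is stored as a list of points earned per week; the two four-week "seasons" are made up, and the function names are hypothetical.

```python
from itertools import accumulate


def average_line(seasons):
    """Mean cumulative points after each week, taken across clubs."""
    totals = [list(accumulate(season)) for season in seasons]
    weeks = len(totals[0])
    return [sum(t[w] for t in totals) / len(totals) for w in range(weeks)]


def linear_line(seasons):
    """Straight line from zero to the average final total:
    the same number of points is added every week."""
    final_total = average_line(seasons)[-1]
    weeks = len(seasons[0])
    return [final_total * (week + 1) / weeks for week in range(weeks)]


# Two made-up four-week "seasons" (points earned each week).
seasons = [
    [0, 0, 3, 3],
    [0, 1, 3, 1],
]

print("average:", average_line(seasons))
print("linear: ", linear_line(seasons))
```

Both lines finish at the same total. They differ along the way: in this made-up data the average line sits below the linear line early on, which is the kind of gap that would indicate a slow start.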


Another way of looking at the first graph is by looking at points per game of promoted teams. The graph below shows this. Unsurprisingly, clubs' points per game figures are spread out early in the season, when only a few games have been played. As the season progresses, teams settle at an average of about 1 point per game, as mentioned above. Some clubs have done a little better, and some a little worse, as is evident from the graph.
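Here is a small sketch of a running points per game figure for a single club. It assumes that "points per game" at a given week means cumulative points divided by games played so far; the results are made up.

```python
from itertools import accumulate


def running_points_per_game(points_by_week):
    """Cumulative points divided by games played, after each game."""
    totals = accumulate(points_by_week)
    return [total / games for games, total in enumerate(totals, start=1)]


# Made-up results: win, loss, draw, draw, win.
results = [3, 0, 1, 1, 3]
ppg = running_points_per_game(results)
print([round(value, 2) for value in ppg])
```

Early values swing a lot because one result is a large share of a small sample, which is why the lines in a graph like this start out spread apart and then settle.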


Next is the same graph, but again looking only at the performance of the 3 promoted teams this season. Again, the graph shows that QPR started off the campaign a little stronger, but has since regressed to be even with Norwich and Swansea.


The final and most informative graph shows the cumulative points per game of promoted clubs. This graph answers my question of how promoted teams fare throughout the season. As you can see below, promoted teams seem to struggle up until week 7, where they turn it around and do better than their average point total up until around week 20. After that, they hover around the point per game mark until the end of the season. There could be a lot of explanations for this trend. Maybe clubs struggle at first, and then adjust to the higher competition? Maybe clubs' transfer window acquisitions (think QPR) start to pay off around week 7? It would be tough to tell what is really driving the trend. However, the graph does highlight an interesting phenomenon.

I'm still working on doing a similar analysis of clubs that are relegated at the end of the season to analyze how their performance fluctuates throughout the season.

Tuesday, August 30, 2011

Expected Points Added (EPA) Leaders Through Week 3

Below are the Expected Points Added (EPA) leaders for the EPL through week 3. The week 1 leaders can be found in an earlier post here. To reiterate, EPA weights goals based on how important they are to the team's chance of winning the game. This is based on the notion that a go-ahead goal in the 90th minute is worth more than the 5th goal in a 5-0 win.


Some interesting things to point out...

  • While Rooney has 5 goals this season, Welbeck's 2 goals have actually been more beneficial to United. In fact, Rooney doesn't even make the top 15 list above, since most of his goals came in the recent Arsenal blowout.
  • Dzeko gets to the top of the list by scoring frequently and in important situations. His average goal weight is a solid .51 expected points added, but it is the fact that he has scored 6 goals that puts him at the top.
  • It's still early in the season. Arteta is third on the list with only 1 goal (a late game-winning goal). Soon we'll start to see the top dominated by players who have scored a lot, and in important situations.

Sunday, August 21, 2011

Expected Points Added (EPA) Data Through EPL Week 1

Before the season I promised to post Expected Points Added (EPA) totals after each week of the season. Here are the EPA totals from week 1. If you don't know what EPA is, check out a full explanation here.

To summarize it very basically, EPA is the total measure of how much each player's goals add to their team's expected points total. That is why you see some EPAs of 0 below. These players scored goals that added nothing to the team's expected points total. For example, a team is up 3-0 and a player scores a 4th in the 90th minute. This does not add to the team's chance of winning, because the team was already very likely to win.

Average Goal Weight (AGW) is just EPA divided by the number of goals a player has scored. This measures how important, on average, a player's goals are. It can show us that a player consistently scores clutch goals (high AGW) or that they are scoring useless goals in blowouts (low AGW).
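The two definitions above reduce to a sum and a division. In this minimal sketch each goal is represented by the expected points it added; the per-goal values and player labels are made up, not taken from the week 1 table.

```python
def expected_points_added(goal_values):
    """EPA: the sum of expected points added by each of a player's goals."""
    return sum(goal_values)


def average_goal_weight(goal_values):
    """AGW: EPA divided by goals scored (None if the player has no goals)."""
    if not goal_values:
        return None
    return expected_points_added(goal_values) / len(goal_values)


# Made-up per-goal values for two imaginary players.
clutch_scorer = [1.1, 0.9]         # two goals in tight games
blowout_scorer = [0.0, 0.1, 0.0]   # three goals with the game already decided

for name, goal_values in [("clutch", clutch_scorer), ("blowout", blowout_scorer)]:
    epa = expected_points_added(goal_values)
    agw = average_goal_weight(goal_values)
    print(f"{name}: goals={len(goal_values)} EPA={epa:.2f} AGW={agw:.2f}")
```

The second player scores more goals but adds far fewer expected points, which is the distinction a plain goal count hides.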


Dzeko has the highest EPA from his go-ahead goal in the 57th minute. This equated to a little more than a point for City. Klasnic, Muamba, and Silva all scored goals that added no expected points for their teams.

If you have any questions feel free to ask in the comment section. I'll be super busy this week between moving in to my apartment at school and 3-a-days for preseason but I'll try to keep some posts coming.

Wednesday, August 3, 2011

Refining The Win Probability Statistic


Last year I was planning to go to the Sloan Sports Conference but ended up not being able to make it. I was thinking about it again this year, and I decided it wouldn’t be a bad idea to submit something for this year’s conference. At first I wasn’t going to, but why the hell not? Might as well go for it, I guess.

My win probability added statistic has generated some interest, and I think it gives some pretty interesting insight, so I’ve been working on expanding it. If you have no idea what win probability added is, check out my first post on win probability and another on win probability added. Anyway, thus begins my quest to refine and expand the win probability added statistic for submission to the sports conference. Comments, criticisms, and suggestions are very much appreciated and would help me improve it a lot.

The first change I made was to the name, and it comes with a simple fix. The problem with “win probability added” is that it doesn’t necessarily calculate the win probability added. That’s a little bit problematic. For example, if two teams are tied in the 90th minute, the win probability under my old calculations was .333 for both teams. This doesn’t really make sense, because each team has close to a 0% chance of winning the game, not 1/3. This comes from modeling the statistic after the similar calculation in professional baseball. My fix for the problem is extremely simple: multiply all the values by 3. This changes the statistic from win probability added to expected points added. A team tied in the 90th minute now has a value of about 1, which matches the single point each side takes from a draw. It makes much more sense now. If a player scores a go-ahead goal in the 90th minute, the Expected Points Added (easier to write EPA from now on) is going to be almost 2. If a player scores a tying goal in the 90th minute the EPA would be almost 1. Much simpler and easier this way (originally got the idea from @11tegen11’s similar analysis).
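As a rough worked example of the rescaling, here is a sketch in Python. The .333 for a tied game in the 90th minute comes from the text; the 0.99 and 0.01 values for leading or trailing by a goal that late are assumptions picked for illustration, and the function names are hypothetical.

```python
POINTS_FOR_WIN = 3


def expected_points(old_value):
    """Rescale an old 'win probability' figure to the 0-3 expected points scale."""
    return POINTS_FOR_WIN * old_value


def expected_points_added(old_before, old_after):
    """EPA of a goal: expected points after the goal minus before it."""
    return expected_points(old_after) - expected_points(old_before)


TIED_90TH = 1 / 3      # from the text: .333 when tied in the 90th minute
LEADING_90TH = 0.99    # assumed: up a goal in the 90th is almost a sure win
TRAILING_90TH = 0.01   # assumed: down a goal in the 90th is almost a sure loss

go_ahead = expected_points_added(TIED_90TH, LEADING_90TH)
tying = expected_points_added(TRAILING_90TH, TIED_90TH)

print(f"go-ahead goal in the 90th: {go_ahead:.2f}")
print(f"tying goal in the 90th:    {tying:.2f}")
```

With these assumed inputs the go-ahead goal comes out just under 2 and the tying goal just under 1, in line with the "almost 2" and "almost 1" figures above.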

After this, I noticed the graphs were not nice easy curves. Even though I took a big sample of games (about 10 years’ worth), there isn’t enough data to give a nice curve. To fix this, I created lines of best fit for each game situation. The home and away graphs for each minute and goal differential are below. Before, a few situations didn’t give a realistic expected point total because there were so few examples of them (like a 2 goal lead in the 5th minute). Fitting smooth curves fixes this problem. It also allows me to use equations to calculate EPA instead of the annoying process of referencing a massive Excel chart.
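The post does not say what functional form the fitted curves take, so the following is only a sketch of the general idea using an ordinary least-squares straight line. The minutes and expected point values are invented for one game situation (a home team leading by one goal), and `fit_line` and `smoothed_expected_points` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up, noisy expected points for a home team leading by one goal.
minutes = [10, 30, 50, 70, 90]
observed = [2.10, 2.35, 2.30, 2.60, 2.90]

slope, intercept = fit_line(minutes, observed)


def smoothed_expected_points(minute):
    """Look up expected points from the fitted equation, not the raw table."""
    return intercept + slope * minute


print(f"fitted: {intercept:.3f} + {slope:.5f} * minute")
print(f"minute 5 estimate: {smoothed_expected_points(5):.2f}")
```

The fitted equation gives an estimate even for minutes with few or no observed games, which is the problem with sparse situations described above. A curve with a different shape would be fitted the same way, one situation at a time.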





I think there are a lot of possible paths to take from here. I’m going to recalculate the top goal scorers’ EPA using the equations. It won’t change much, but it’ll be nice to have some continuity because I’ll be calculating EPA week by week for every goal next EPL season.

I’m also working on creating a database of the top goal scorers in the last 10 years in the EPL with their goal totals and their EPA over the years. Looking at goals and EPA over time will hopefully give some insights into clutch (or lack thereof) goal scoring. If some players consistently have very high EPAs and some players consistently have low EPAs, it could be an indicator of clutch goal scoring in football.

Like I said before, I’d love comments and suggestions on where to go next, whether on the blog, via Twitter, or even by email.

Wednesday, July 27, 2011

Does More Possession=More Wins in the MLS?



In the past couple of blog posts I've looked at two common statistics and shown that they are not as meaningful as most people believe. Shots on goal do not predict success very well, and assists favor players on better clubs. In keeping with this theme of misleading statistics in football, I decided to look at possession data. The commonly held notion is that the team that has the ball more (has a possession percentage over 50) is more likely to win. This makes sense. A team with the ball more is more likely to score and less likely to concede. But does the data back it up? Does having more possession than your opponent mean you are more likely to win the game? I looked at the possession data from the MLS season so far. What I found goes completely against what most people would think. So far this season in the MLS, the average possession percentage for teams that have won the game is 48.5%. Teams that win actually possess the ball less. This means the average possession percentage for losing teams is 51.5%.

To get even more specific, I broke down the possession data further. Winning home teams average 50.9% possession, and winning away teams average 43.4% possession. On the other side, losing home teams average 56.6% possession and losing away teams average 49.1% possession. The histograms below illustrate these facts. I found that away teams, on average, have a possession percentage of 47.3%, and home teams have a possession percentage of 52.7%.
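A breakdown like this is a group-by average. Here is a minimal sketch that assumes each decided match contributes one row per team with its venue, result, and possession percentage; the rows are made up and are not the MLS data.

```python
from collections import defaultdict


def mean_possession(rows):
    """Average possession % for each (venue, result) group."""
    groups = defaultdict(list)
    for venue, result, possession in rows:
        groups[(venue, result)].append(possession)
    return {key: sum(values) / len(values) for key, values in groups.items()}


# Made-up rows: (venue, result, possession %). Each match appears twice,
# once per team, and the two possession figures sum to 100.
rows = [
    ("home", "win", 48.0), ("away", "loss", 52.0),
    ("home", "loss", 58.0), ("away", "win", 42.0),
    ("home", "win", 54.0), ("away", "loss", 46.0),
]

for key, value in sorted(mean_possession(rows).items()):
    print(key, f"{value:.1f}%")
```

Because the two teams' possession figures in a match sum to 100, the home-win and away-loss averages are complements of each other, as are the away-win and home-loss averages. The figures quoted above show the same pattern (50.9% with 49.1%, and 43.4% with 56.6%).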


So what does all this mean? It seems possession percentage in the MLS does not predict success. Teams that possess the ball more don't win more; they actually lose more. Home teams also have a slight advantage in possession percentage compared with away teams.

What about teams that completely dominate possession? You might think that a team that had the ball much more often than their opponent would be much more likely to win. I defined "dominating possession" as having the ball more than 60% of the time. So far this season, teams that have dominated possession have a record of 10 wins, 19 losses, and 18 ties. Domination in possession? Yes. Domination in wins? No.
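To put that record on a more familiar scale, here is a quick calculation. It assumes the usual 3 points for a win and 1 for a tie; only the 10-19-18 record comes from the data above.

```python
# Record of teams with more than 60% possession (from the text above).
wins, losses, ties = 10, 19, 18

games = wins + losses + ties
win_share = wins / games
# Assumes the standard 3 points for a win and 1 point for a tie.
points_per_game = (3 * wins + 1 * ties) / games

print(f"games: {games}")
print(f"share of games won: {win_share:.1%}")
print(f"points per game: {points_per_game:.2f}")
```

Teams in this group won roughly one game in five and lost nearly twice as often as they won.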

This analysis calls into question statements like "the Union had the run of play, they possessed the ball more and deserved the win." It's apparent that in the MLS, possession is not all that important when it comes to winning games. So what's the problem with possession? One reason could be that the best teams do not play possession football. The teams with the most success may play kick and run. Another possibility is that possessing the ball simply doesn't lead to wins. Either way, having the ball more than your opponent does not mean much in the MLS.

Monday, July 25, 2011

Why We Shouldn't Put Much Value in Assists



Last week I wrote a post on why shots on goal are a misleading statistic. In keeping with the analysis of the problems with some commonly kept statistics in football, I decided to look at assists. 

If you think about it, assists are highly misleading. Simply playing with good players boosts your assist total. As with shots on goal, not all assists are the same. There are the assists where a player makes a short pass in the midfield that leads to a teammate dribbling through all the opposing defenders and finishing, and the assists where a player makes a beautiful cross and their teammate simply has to tap the ball into the open net. These obviously shouldn't be counted as the same value to the team, yet they are. Hell, I could probably record an assist eventually in the EPL if I played for one of the top teams (OK, maybe an exaggeration but you get the point.)

First, let's look at the assists data for all the teams in the EPL. As the graph below shows, as the point total of a team increases (basically, the better the team is) the assist total also generally increases. This is no surprise. We would expect better teams to score more goals and thus have higher assist totals.



Basically what this means is that the assist statistic should favor players on better teams. Players on better teams play with better teammates and should therefore have more opportunities for assists. Below is a screenshot from the EPL website of the players with the top 20 assist totals.



9 players from top 5 clubs are in the top 20 for assist totals. The only player from a bottom 3 club in the top 20 is Blackpool's Charlie Adam, who was just signed by Liverpool. It's easy to see that assist totals are higher for players on better clubs.

A better statistic that is less influenced by the quality of your teammates is chances created. A chance created is defined as a pass that leads to a shot. Chances created are not as dependent on your teammates and give a fairer and truer assessment of how much of a playmaker a player is for their team.

The next time a club is looking to sign a player based solely on their assist totals, they should take a more in-depth look. Assists can tell an inaccurate, or at least biased, story.