Tuesday, October 11, 2011

An Analysis of the Performance of Promoted Clubs

Joey Barton, of newly promoted QPR


One thing I love about English football that does not exist in American sports is promotion and relegation. It makes not just the race for first place exciting, but also the race to avoid relegation. In American sports, last-place teams often simply give up, which is a disappointment for fans.

I wanted to see exactly how promoted/relegated teams fared throughout the season. Some statistical research has already been done on the subject: Omar Chaudhuri, writer of the 5 Added Minutes blog, looks at conversion rates of promoted teams and their corresponding ability to stay in the top flight here. In part 1 of this post, I look at how promoted teams have done in their first season in the top flight. My original idea was that teams may struggle early in the season as they adjust to the higher level of competition, and then even out as the weeks go on. This also puts the performance of QPR, Swansea, and Norwich into perspective against past promoted teams' performances. I use data on promoted teams from the 2003/2004 through 2010/2011 seasons.

I've created 5 graphs to illustrate the performances of promoted teams. The first one, below, shows how all the promoted clubs' point totals have progressed over the 38 games. On average, promoted teams earn around a point per week. The greenish linear-looking line in the middle is the average. All the other jagged lines are the point totals over the season of promoted clubs. This graph isn't too informative, but is an interesting graphic nonetheless.


The next graph is the same as the one above, but only looks at the three promoted clubs this season in comparison to the average points line and the linear points line. To clarify, the linear line shows what would happen if a team earned the same number of points every week and ended up with the average point total for promoted clubs. The average line shows the average points promoted clubs had actually earned by each week of the season. These may sound the same at first, but I will show in the next couple of paragraphs that there is an important distinction. Anyways, the graphic below illustrates that all 3 promoted clubs are faring about as well as the average promoted team does. QPR started off a little stronger, but has since returned to the average. Norwich and Swansea both started a little weaker, but have improved to end up just above the average 7 weeks into the season. All 3 teams have 8 points so far, just above the point-per-week average of promoted teams.
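The difference between the two lines can be sketched in a few lines of Python. This is only an illustration with made-up results, not the data or code behind the graphs; the function names `cumulative_points`, `average_line`, and `linear_line` are assumptions.

```python
from itertools import accumulate


def cumulative_points(weekly_points):
    """Running points total after each game."""
    return list(accumulate(weekly_points))


def average_line(seasons):
    """Mean running total across all clubs after each week."""
    totals = [cumulative_points(s) for s in seasons]
    n_weeks = len(totals[0])
    return [sum(t[w] for t in totals) / len(totals) for w in range(n_weeks)]


def linear_line(seasons):
    """Straight line that reaches the same final average at a constant weekly rate."""
    avg = average_line(seasons)
    final = avg[-1]
    n_weeks = len(avg)
    return [final * (w + 1) / n_weeks for w in range(n_weeks)]


# Hypothetical 4-week results (3 = win, 1 = draw, 0 = loss), not real clubs.
seasons = [
    [0, 0, 3, 3],
    [0, 1, 3, 1],
]

print(average_line(seasons))  # slow start, then catching up
print(linear_line(seasons))   # same end point, constant pace
```

Both lines finish on the same total, but in this toy example the average line sits below the linear line early on, which is the kind of slow start the linear line cannot show.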


Another way of looking at the first graph is by looking at points per game of promoted teams. The graph below shows this. Obviously, clubs' points per game are quite spread out early on. As the season progresses, teams settle toward an average of about 1 point per game, as mentioned above. Some clubs have done a little better, and some a little worse, as is evident from the graph.


Next is the graph above, but again looking at the performance of the 3 promoted teams this season. Again, the graph shows that QPR started off the campaign a little stronger, but has since regressed to be even with Norwich and Swansea.


The final and most informative graph shows the cumulative points per game of promoted clubs. This graph answers my question of how promoted teams fare throughout the season. As you can see below, promoted teams seem to struggle up until week 7, when they turn it around and do better than their average point total up until around week 20. From there they hover around the point per game mark until the end of the season. There could be a lot of explanations for this trend. Maybe clubs struggle at first, and then adjust to the higher competition? Maybe clubs' transfer window acquisitions (think QPR) start to pay off around week 7? It would be tough to tell what is really driving the trend. However, the graph does highlight an interesting phenomenon.

I'm still working on doing a similar analysis of clubs that are relegated at the end of the season to analyze how their performance fluctuates throughout the season.

Tuesday, August 30, 2011

Expected Points Added (EPA) Leaders Through Week 3

Below are the Expected Points Added (EPA) leaders for the EPL through week 3. The week 1 leaders can be found in an earlier post here. To reiterate, EPA weights goals based on how important they are to the team's chance of winning the game. This is based on the notion that a go-ahead goal in the 90th minute is worth more than the 5th goal in a 5-0 win.


Some interesting things to point out...

  • While Rooney has 5 goals this season, Welbeck's 2 goals have actually been more beneficial to United. In fact, Rooney doesn't even make the top 15 list above considering most of his goals were in the recent Arsenal blowout.
  • Dzeko gets to the top of the list by scoring frequently and in important situations. His average goal weight is a solid .51 expected points added, but it is the fact that he has scored 6 goals that puts him at the top.
  • It's still early in the season. Arteta is third on the list with only 1 goal (a late game-winning goal). Soon we'll start to see the top dominated by players who have scored a lot, and in important situations.

Sunday, August 21, 2011

Expected Points Added (EPA) Data Through EPL Week 1

Before the season I promised to post Expected Points Added (EPA) totals after each week of the season. Here are the EPA totals from week 1. If you don't know what EPA is, check out a full explanation here.

To summarize it very basically, EPA is the total measure of how much each player's goals add to the team's expected points total. That is why you see some EPAs of 0 below. These players scored goals that added nothing to the team's expected points total. For example, a team is up 3-0 and is already going to win, and a player scores a 4th in the 90th minute. This technically does not add to the team's chance of winning, because the team is already very likely to win.

Average Goal Weight (AGW) is just EPA divided by the number of goals a player has scored. This measures how important, on average, a player's goals are. It can show us that a player consistently scores clutch goals (high AGW) or that they are scoring useless goals in blowouts (low AGW).
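As a small worked example of the two definitions, here is a sketch in Python. The per-goal values are invented for illustration, and the function names `epa_total` and `average_goal_weight` are assumptions, not part of the original spreadsheet.

```python
def epa_total(goal_values):
    """EPA: sum of the expected points each goal added."""
    return sum(goal_values)


def average_goal_weight(goal_values):
    """AGW: EPA divided by number of goals (None if the player has no goals)."""
    if not goal_values:
        return None
    return epa_total(goal_values) / len(goal_values)


# Hypothetical per-goal expected points added for two made-up players.
clutch_scorer = [1.8, 1.2]             # two goals, both in tight games
blowout_scorer = [0.0, 0.1, 0.0, 0.3]  # four goals, mostly with the game decided

for name, goals in [("clutch", clutch_scorer), ("blowout", blowout_scorer)]:
    print(name, len(goals), round(epa_total(goals), 2), round(average_goal_weight(goals), 2))
```

The second player has twice as many goals but a much lower EPA and AGW, which is exactly the situation the two statistics are meant to expose.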


Dzeko has the highest EPA from his go-ahead goal in the 57th minute. This equated to a little more than a point for City. Klasnic, Muamba, and Silva all scored goals that added no expected points for their teams.

If you have any questions feel free to ask in the comment section. I'll be super busy this week between moving in to my apartment at school and 3-a-days for preseason but I'll try to keep some posts coming.

Wednesday, August 3, 2011

Refining The Win Probability Statistic


Last year I was planning on going to the Sloan Sports Conference but ended up not being able to make it. I was thinking about it again this year, and I decided it wouldn’t be a bad idea to submit something for this year’s conference. At first I wasn’t going to, but why the hell not? Might as well go for it, I guess.

My win probability added statistic has generated some interest, and I think it gives some pretty interesting insight, so I’ve been working on expanding it. If you have no idea what win probability added is, check out my first post on win probability and another on win probability added. Anyways, thus begins my quest to refine and expand the win probability added statistic for submission to the sports conference. Comments, criticisms, and suggestions are very much appreciated and would help a lot in making it better.

The first fix I made was a simple one that also changed the name. The problem with “win probability added” is that it doesn’t actually calculate win probability added. That’s a little bit problematic. For example, if two teams are tied in the 90th minute, the win probability under my old calculations was .333 for both teams. This doesn’t really make sense, because each team has close to a 0% chance of winning the game, not 1/3. This comes from modeling the statistic after the similar calculation in professional baseball. My fix for the problem is extremely simple: multiply all the values by 3. This changes the statistic from win probability added to expected points added, and it makes much more sense now. The tied 90th-minute value of .333 becomes 1, which is the point each team expects from a draw. If a player scores a go-ahead goal in the 90th minute, the Expected Points Added (easier to write EPA from now on) is going to be almost 2. If a player scores a tying goal in the 90th minute, the EPA would be almost 1. Much simpler and easier this way (I originally got the idea from @11tegen11’s similar analysis).
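To make the arithmetic concrete, here is a minimal sketch in Python. It assumes expected points are computed the usual way for football (3 points times the chance of a win plus 1 point times the chance of a draw). The probabilities are invented for illustration, and the function names are hypothetical rather than taken from the actual calculations.

```python
def expected_points(p_win, p_draw):
    """Expected league points: 3 for a win, 1 for a draw, 0 for a loss."""
    return 3 * p_win + 1 * p_draw


def expected_points_added(before, after):
    """EPA of a goal: expected points after the goal minus expected points before it."""
    return expected_points(*after) - expected_points(*before)


# Hypothetical (p_win, p_draw) pairs for a team in the 90th minute.
losing_by_one = (0.00, 0.02)
tied = (0.01, 0.98)
leading_by_one = (0.98, 0.02)

# Old-style value for the tied state: expected points divided by 3, roughly .333.
print(round(expected_points(*tied) / 3, 3))

# Go-ahead goal from a tied game: almost 2 points added.
print(round(expected_points_added(tied, leading_by_one), 2))

# Tying goal when losing by one: almost 1 point added.
print(round(expected_points_added(losing_by_one, tied), 2))
```

Under these made-up probabilities the go-ahead goal is worth about twice as much as the equalizer, which matches the "almost 2" and "almost 1" values described above.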

After this, I noticed the graphs were not nice easy curves. Even though I took a big sample of games (about 10 years’ worth), there isn’t enough data to give a nice curve. To fix this, I created lines of best fit for each game situation. The home and away graphs for each minute and goal differential are below. Before, there were a few situations that didn’t give a realistic expected point total because there were so few games in that situation (like a 2-goal lead in the 5th minute). Making the nice smooth curves fixes this problem. It also allows me to use equations to calculate EPA instead of the annoying process of referencing a massive Excel chart.
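For readers who want to try the smoothing idea, here is a rough sketch using an ordinary least-squares straight line. The form of the actual fitted curves is not stated in this post, so the straight line is only a stand-in, and the data points and the name `fit_line` are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical noisy expected points for one game situation
# (say, a home team leading by 1) at a handful of minutes.
minutes = [10, 30, 50, 70, 90]
observed = [2.1, 2.4, 2.3, 2.7, 2.9]

slope, intercept = fit_line(minutes, observed)

# The fitted equation replaces the table lookup for any minute,
# including minutes with few or no observed games.
smoothed = [slope * m + intercept for m in range(0, 91, 10)]
print(round(slope, 4), round(intercept, 4))
```

Once each game situation has its own equation, the expected points for any minute come from plugging the minute into the equation rather than from a sparse table cell.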





I think there are a lot of possible paths to take from here. I’m going to recalculate the top goal scorers’ EPA using the equations. It won’t change much, but it’ll be nice to have some continuity because I’ll be calculating EPA week by week for every goal next EPL season.

I’m also working on creating a database of the top goal scorers in the last 10 years in the EPL with their goal totals and their EPA over the years. Looking at goals and EPA over time will hopefully give some insights into clutch (or lack thereof) goal scoring. If some players consistently have very high EPAs and some players consistently have low EPAs, it could be an indicator of clutch goal scoring in football.

Like I said before, I’d love comments and suggestions on where to go next, whether on the blog, via Twitter, or even by email.

Wednesday, July 27, 2011

Does More Possession=More Wins in the MLS?



In the past couple of blog posts I've looked at two common statistics and shown that they are not as meaningful as most people believe. Shots on goal do not predict success very well, and assists favor players on better clubs. In keeping with this theme of misleading statistics in football, I decided to look at possession data. The commonly held notion is that the team that has the ball more (has a possession percentage over 50) is more likely to win. This makes sense. A team with the ball more is more likely to score and less likely to concede. But does the data back it up? Does having more possession than your opponent mean you are more likely to win the game? I looked at the possession data from the MLS season so far. What I found goes completely against what most people would think. So far this season in the MLS, the average possession percentage for teams that have won the game is 48.5%. Teams that win actually possess the ball less. Since the two teams' possession adds up to 100%, this means the average possession percentage for losing teams is 51.5%.

To get even more specific, I broke down the possession data further. Winning home teams average 50.9% possession, and winning away teams average 43.4% possession. On the other side, losing home teams average 56.6% possession and losing away teams average 49.1% possession. The histograms below illustrate these facts. I found that away teams, on average, have a possession percentage of 47.3%, and home teams have a possession percentage of 52.7%.
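Here is a sketch of how averages like these can be computed from a list of match results. The matches below are made up, and the record layout (home possession percentage plus a result code) is an assumption for illustration, not the format of the MLS data.

```python
from statistics import mean

# Each hypothetical match: (home possession %, result),
# where result is "H" (home win), "A" (away win), or "D" (draw).
matches = [
    (55.0, "A"),
    (48.0, "H"),
    (60.0, "D"),
    (52.0, "H"),
]


def winner_possession(matches):
    """Possession % of the winning side in every decided match (draws are skipped)."""
    values = []
    for home_poss, result in matches:
        if result == "H":
            values.append(home_poss)
        elif result == "A":
            values.append(100.0 - home_poss)  # away share is the remainder
    return values


winner_avg = mean(winner_possession(matches))
loser_avg = 100.0 - winner_avg  # the two sides always sum to 100%

print(round(winner_avg, 1), round(loser_avg, 1))
```

The same loop, filtered on `"H"` or `"A"` only, gives the home and away breakdowns separately.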


So what does all this mean? It seems possession percentage in the MLS does not predict success. Teams that possess the ball more don't win more; they actually lose more. Home teams also have a slight advantage in possession percentage compared with away teams.

What about teams that completely dominate possession? You might think that a team that had the ball much more often than their opponent would be much more likely to win. I defined "dominating possession" as having the ball more than 60% of the time. So far this season, teams that have dominated possession have a record of 10 wins, 19 losses, and 18 ties. Domination in possession? Yes. Domination in wins? No.
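The win-loss-tie record for dominant teams can be tallied along these lines. This is a sketch under the same assumed record layout (home possession percentage and a result code), with invented matches; `dominant_record` is a hypothetical name.

```python
def dominant_record(matches, threshold=60.0):
    """Wins, losses, and ties for teams with more than `threshold` % possession."""
    wins = losses = ties = 0
    for home_poss, result in matches:
        away_poss = 100.0 - home_poss
        if home_poss > threshold:
            dominant = "H"
        elif away_poss > threshold:
            dominant = "A"
        else:
            continue  # neither side dominated possession
        if result == "D":
            ties += 1
        elif result == dominant:
            wins += 1
        else:
            losses += 1
    return wins, losses, ties


# Hypothetical matches: (home possession %, result), result is "H", "A", or "D".
matches = [
    (65.0, "A"),  # home side dominated and lost
    (62.0, "D"),  # home side dominated and drew
    (38.0, "A"),  # away side dominated and won
    (55.0, "H"),  # nobody above 60%, ignored
    (61.0, "H"),  # home side dominated and won
]

print(dominant_record(matches))
```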

This analysis calls into question statements like "the Union had the run of play, they possessed the ball more and deserved the win." It's apparent that in the MLS, possession is not all that important when it comes to winning games. So what's the problem with possession? One reason could be that the best teams do not play possession football. The teams with the most success may play kick and run. Another possibility is that possessing the ball simply doesn't lead to wins. Either way, having the ball more than your opponent does not mean much in the MLS.

Monday, July 25, 2011

Why We Shouldn't Put Much Value in Assists



Last week I wrote a post on why shots on goal are a misleading statistic. In keeping with the analysis of the problems with some commonly kept statistics in football, I decided to look at assists. 

If you think about it, assists are highly misleading. Simply playing with good players boosts your assist total. Similar to shots on goal, not all assists are the same. There are the assists where a player makes a short pass in the midfield that leads to a teammate dribbling through all the opposing defenders and finishing, and the assists where a player makes a beautiful cross where their teammate simply has to tap the ball in the open net. These obviously shouldn't be counted as the same value to the team, yet they are. Hell, I could probably record an assist eventually in the EPL if I played for one of the top teams (OK, maybe an exaggeration but you get the point.)

First, let's look at the assists data for all the teams in the EPL. As the graph below shows, as a team's point total increases (basically, the better the team is), its assist total also generally increases. This is no surprise. We would expect better teams to score more goals and thus have higher assist totals.



Basically what this means is that the assist statistic should favor players on better teams. Players on better teams play with better teammates and should therefore have more opportunities for assists. Below is a screenshot from the EPL website of the players with the top 20 assist totals.



Nine players from the top 5 clubs are in the top 20 for assist totals. No players from the bottom 3 clubs are in the top 20, with the exception of Blackpool's Charlie Adam, who was just signed by Liverpool. It's easy to see that assist totals are higher for players on better clubs.

A better statistic that is less influenced by the quality of your teammates is chances created. A chance created is defined as a pass that leads to a shot. These are obviously not as dependent on your teammates and give a fairer and truer assessment of how much of a playmaker that player is for their team.

The next time a club is looking to sign a player based solely on their assist totals, they should take a more in-depth look. Assists can tell an inaccurate, or at the least biased, story.

Monday, July 18, 2011

Do Shots on Goal Matter?


The major point of this blog is to test commonly held notions in football for their validity. After watching the US women lose to Japan yesterday, I started to think about shots on goal. I don't have the exact numbers, but I'm pretty sure the US crushed Japan in the shots on goal category. This made me think, do shots on goal matter? Most people would quickly say yes. It would make sense that more shots on goal mean more chances to score and thus more goals. The only problem is that some things in football just don't make sense. I wanted to see if shots on goal equate to success in two categories: 1.) Do more shots on goal mean more success for a team as a whole? 2.) Do more shots on goal mean more goals for a specific player? To test these questions I used data from the MLS website. As an aside, mls.com has extensive statistics for every season in a bunch of categories. Great to see. Anyways, the data is from the 2010 MLS season.


First question: Do more shots on goal mean more success for a team as a whole?

If this were true, we would expect points to increase as shots on goal increase on a team level. In other words, teams that have more shots on goal would be more successful. The graph below tells us a different story.


The graph shows there is no real relationship between shots on goal and points. Most teams cluster around just under 140 shots on goal on the season. The line of best fit shows a positive relationship, but this relationship is not strong at all. The correlation between the two variables is r=.1311. As a reminder, correlation tells us how strong the linear relationship is between two variables. The correlation coefficient (the value of r) gives a numerical value of the strength of the relationship. A value of 0 means there is no linear relationship at all, a value of 1 means there is a perfect positive linear relationship, and a value of -1 means there is a perfect negative one. In this case, the value is .1311, telling us there is a very weak linear relationship.
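For anyone who wants to compute r themselves, here is a minimal sketch of the standard Pearson correlation formula in plain Python. The team numbers are invented and are not the 2010 MLS data; `pearson_r` is just an illustrative name.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical season totals for six made-up teams.
shots_on_goal = [128, 135, 139, 141, 150, 162]
points = [44, 31, 52, 38, 47, 40]

r = pearson_r(shots_on_goal, points)
print(round(r, 4))

# r squared is the share of variation in points that a straight line
# through shots on goal would account for.
print(round(r ** 2, 4))
```

Squaring the reported r of .1311 gives about .017, so a straight line through shots on goal would account for under 2% of the variation in points.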


Second question: Do more shots on goal mean more goals for a specific player?

Similar story for this question: is there a linear increase in the number of goals as the number of shots on goal increases? The graph below gives us the answer.



This graph shows a stronger relationship than the one above: the value of r in this case is .4722. However, a correlation under .5 is generally considered to be a weak relationship. This means that for individual players, shots on goal are not a very good indicator of goals.

Here's my best explanation for why shots on goal are not a very indicative statistic: not all shots on goal are the same. There are 40-yard weak rollers that the goalie easily saves, and there are 5-yard shots that the keeper barely gets a hand on. There are weak attempts by a center back getting forward and there are breakaways by forwards. In the shots on goal statistic, all of these are counted as equivalent. Obviously this makes no sense. A statistic that would be more indicative of goals scored for both questions I looked at above would be shots on goal inside the box. Shots on goal inside the box would get rid of the shots on goal that have no chance of going in. Not all shots inside the box are the same, so we have somewhat of the same problem as with shots on goal. However, I assume there would be a much stronger correlation between shots on goal inside the 18 and points, and between shots on goal inside the 18 and goals by an individual player. Unfortunately, I don't have the data to back up this claim (working on it). If/when I do get the data for shots inside the box, I'll post the graph and the correlation between shots on goal in the box and goals.

Even without the data, the point I'm making is still clear: shots on goal do not equate to more success from a team perspective, and they do not correlate with goals for individual players as strongly as most people assume. There are better statistics than shots on goal. This means statements like "New England had 5 more shots on goal than New York, they dominated the game" and "Donovan had 4 shots on goal in the game, he was due for a goal" are not necessarily valid. What if New England had a bunch of shots on goal from outside the 18 that never had a chance of going in? And what if Donovan's shots on goal were all weak rollers? Shots on goal are often misleading.