
Monday, June 20, 2011

An Accurate Formula for Points Using GF and GA

It may seem pretty obvious that goals for and goals against should be a good predictor of success in soccer. Teams that score a lot and concede few should win more than teams that don't score often and concede a lot. But how predictive is it? How accurately can we determine the standings based solely on a team's goals for and goals against in a season?

Apparently, if the formula is tweaked enough, we can get the average error down to about 3 points per team. Since a win is worth 3 points in most leagues, goals for and goals against alone predict a team's point total to within about one win. Not bad.

The equation I used to do this is based on Bill James' Pythagorean Expectation formula. The formula is pretty simple: (Runs Scored)^2/((Runs Scored)^2 + (Runs Allowed)^2). Bill James uses it to estimate a baseball team's winning ratio from the runs it scores and the runs it allows.
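As a rough sketch, the Pythagorean Expectation can be written as a small Python function. The function name, the exponent parameter, and the 800/700 run totals are illustrative assumptions, not figures from Bill James or from the data used in this post.

```python
def pythagorean_expectation(scored: float, allowed: float, exponent: float = 2.0) -> float:
    """Estimated winning ratio from runs (or goals) scored and allowed."""
    if scored < 0 or allowed < 0:
        raise ValueError("scored and allowed must be non-negative")
    s = scored ** exponent
    a = allowed ** exponent
    if s + a == 0:
        raise ValueError("at least one of scored/allowed must be positive")
    return s / (s + a)


# Hypothetical team: 800 runs scored, 700 runs allowed.
print(round(pythagorean_expectation(800, 700), 3))  # 0.566
```

A team that scores exactly as much as it allows comes out at 0.5, which is the sanity check you would expect from the formula.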

Of course, soccer is not the same as baseball, and a few adjustments have to be made. First, there are draws in soccer. How can the formula be adjusted to take draws into account? If you think about it, a draw is basically equal to 1/3 of a win (a draw counts for 1 point, and a win counts for 3 points). Therefore, we can calculate the winning percentage of a team as (wins + draws/3)/(total number of games), which is the same as points/(total number of games * 3). We can then convert the winning percentage back into points by multiplying it by 3 times the number of games.
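Here is a minimal sketch of that conversion, taking winning percentage to mean the share of available points, i.e. points / (3 * games), which equals (wins + draws/3) / games. The function names and the 15-win, 9-draw, 38-game record are made up for illustration.

```python
def league_points(wins: int, draws: int) -> int:
    """Standard scoring: 3 points for a win, 1 for a draw."""
    return 3 * wins + draws


def winning_percentage(wins: int, draws: int, games: int) -> float:
    """Share of available points, counting a draw as 1/3 of a win."""
    return (wins + draws / 3) / games


def points_from_percentage(pct: float, games: int) -> float:
    """Convert a winning percentage back into league points."""
    return pct * 3 * games


# Hypothetical record: 15 wins and 9 draws in a 38-game season.
pct = winning_percentage(15, 9, 38)
print(league_points(15, 9))                     # 54
print(round(points_from_percentage(pct, 38)))   # 54
```

The round trip returning the original 54 points is what shows the two definitions agree.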

The next change I made was to the exponent. Using the tables from 2000-2010 for the MLS and from 1997-98 to 2009-10 for the EPL, the exponent that minimized the average error was 1.4.

Next, I added a coefficient to further minimize the average error. After fooling around with different exponents and coefficients, the combination of 1.4 for the exponent and 0.9 for the coefficient got the average error to right around 3 points for both the MLS and the EPL. This gives an equation of 0.9*(GF^1.4)/((GF^1.4)+(GA^1.4)), where GF is goals for and GA is goals against. The result is the winning percentage described above, which still has to be converted back into points.
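A short Python sketch of the final equation may help. It assumes the output of the equation is the share of available points, so predicted points are that share times 3 times the number of games. The function names and the 30-game season length are assumptions for illustration; GF = 35 and GA = 22 are just example inputs.

```python
def points_share(gf: float, ga: float, exponent: float = 1.4, coefficient: float = 0.9) -> float:
    """coefficient * GF^e / (GF^e + GA^e): estimated share of available points."""
    gf_e = gf ** exponent
    ga_e = ga ** exponent
    return coefficient * gf_e / (gf_e + ga_e)


def predicted_points(gf: float, ga: float, games: int,
                     exponent: float = 1.4, coefficient: float = 0.9) -> float:
    """Scale the share by the maximum points available (3 per game)."""
    return points_share(gf, ga, exponent, coefficient) * 3 * games


share = points_share(35, 22)
print(round(share, 4))                            # roughly 0.59
print(round(predicted_points(35, 22, games=30), 1))  # assumed 30-game season
```

Note that a team with GF equal to GA gets a share of 0.45 rather than 0.5, which is the effect of the 0.9 coefficient.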

Overall, the average error for the EPL was 3.21 points and the average error for the MLS was 2.85 points.
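For anyone who wants to repeat the tuning, here is one way it could be sketched in Python. Two assumptions: "average error" is taken to be the mean absolute difference between predicted and actual points, and a simple grid search stands in for the manual trial and error. The table below is synthetic, generated from the equation itself purely to show that the search runs; it is not real league data.

```python
from itertools import product


def predicted_points(gf, ga, games, exponent, coefficient):
    """coefficient * GF^e / (GF^e + GA^e), scaled to points (3 per game)."""
    gf_e, ga_e = gf ** exponent, ga ** exponent
    return coefficient * gf_e / (gf_e + ga_e) * 3 * games


def mean_abs_error(table, exponent, coefficient):
    """Average absolute gap between predicted and actual points.

    Each row of table is (goals_for, goals_against, games, actual_points).
    """
    errors = [abs(predicted_points(gf, ga, games, exponent, coefficient) - pts)
              for gf, ga, games, pts in table]
    return sum(errors) / len(errors)


def grid_search(table, exponents, coefficients):
    """Return the (exponent, coefficient) pair with the lowest average error."""
    return min(product(exponents, coefficients),
               key=lambda pair: mean_abs_error(table, *pair))


# Synthetic table (NOT real data): points generated with exponent 1.4, coefficient 0.9.
goal_pairs = [(70, 30), (55, 40), (45, 45), (38, 52), (30, 65)]
synthetic = [(gf, ga, 38, predicted_points(gf, ga, 38, 1.4, 0.9)) for gf, ga in goal_pairs]

exponents = [round(1.0 + 0.1 * i, 1) for i in range(11)]      # 1.0 to 2.0
coefficients = [round(0.80 + 0.05 * i, 2) for i in range(5)]  # 0.80 to 1.00
best = grid_search(synthetic, exponents, coefficients)
print(best)  # (1.4, 0.9) on this synthetic table
```

With real season tables in place of the synthetic rows, the same search would report whichever pair fits those tables best.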

Interestingly enough, the same equation works for both the EPL and the MLS; the league doesn't seem to matter, at least for these two. I haven't looked, but I assume it would work for other leagues around the world.

Below is an example of one of the charts I created. This one uses the tables for one year in the MLS.


3 comments:

  1. Hello. Your formula .9*(GF^1.4)/((GF^1.4)+(GA^1.4)) gives results like 0.xxxxx. I think we should multiply it by 100 for it to make sense as points. For example, with GF=35 and GA=22, I get 0.5913 when I put them into the formula. So should I multiply 0.5913 by 100 to get 59.13?

  2. I didn't understand what the relation is between a team's winning percentage and its final points or standing.

  3. Do you think we can construct a model using both regression analysis between goal differential and points and a Poisson distribution for the number of goals in a game, and arrive at the final standings that way? I think we have some powerful and useful statistical techniques at hand. I'd like to hear your opinion.
