Sunday, June 17, 2018

Football learns from chess and finally uses Elo!



Finally, after 50 years, we are there: FIFA is introducing the Elo rating as a measure of the relative strength of national teams!

For the chess fans among us, this is well-deserved recognition for a classic tool from the chess world. For serious football fans, it is a huge opportunity to better understand playing strength and to become more informed about the real performance of their beloved team.

Let us start our journey with the hero of this story, Arpad Elo.

Arpad Elo

Let us clear this up at the very beginning: Elo is a name, not an acronym. Specifically, Elo is the family name of an American theoretical physicist and chess player of Hungarian origin who, in the 1950s, invented a statistics-based system for rating the relative strength of chess players.

It is difficult to find traces of him as a theoretical physicist. He seems to have done some research in analytical chemistry, but his activity was clearly focused on chess. A strong chess player in his own right, he was involved in the United States Chess Federation (USCF) from the very beginning and convinced first the USCF and then the international chess federation (FIDE) to adopt his system.
He died in 1992, aged 89.
He has a good Wikipedia page, which is, as usual, a very interesting read.
By the way, from this short introduction it should be clear that the name should be written Elo, not ELO.

How does it work?

The idea behind the Elo rating is to assign every chess player a number, the rating, with the following property: if we have two players A and B, with ratings r(A) and r(B), the average result of their games should be a function of the rating difference alone. This is very different from other types of ratings. Imagine, for instance, using the season standings of a typical football league as your rating. These are calculated by giving each team 3 points for a win, 1 point for a draw and 0 points for a loss. Since these numbers increase as the season advances, so do their differences, although the relative strengths of the teams do not change (to a first approximation).
In practice, there are several different implementations of Elo systems. They usually use some variation of the following procedure: when two players have a match, one computes their rating difference, say D, and divides it by a scaling coefficient S. The scaled difference D/S is then mapped to the expected result of the game, a number between 0 and 1, typically through a logistic curve. This is done for each of the two players, so each player has an expected result. In chess the expected results sum to 1, because you get 1 point for a win, half a point for a draw and none for a loss, so exactly 1 point is always awarded. After the game, one compares the expected result with the actual result for each player. This difference is multiplied by another number K and then added to the rating of the player.
There are many variations on this implementation. As usual, every rule invites problems and exploits.
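To make the procedure concrete, here is a minimal sketch of one such variation. It assumes the logistic curve commonly used in chess, with S = 400 and K = 20; these values and the function names are my choices for illustration, not something prescribed by any particular federation.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Expected result of A against B, between 0 and 1.

    Assumption: a base-10 logistic curve with scaling coefficient S = 400.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def update_rating(rating, expected, actual, k=20.0):
    """Add K times (actual result - expected result) to the rating."""
    return rating + k * (actual - expected)


# Hypothetical ratings: A is the favourite.
r_a, r_b = 1600.0, 1400.0
e_a = expected_score(r_a, r_b)
e_b = expected_score(r_b, r_a)

# Suppose the underdog B wins: A scores 0, B scores 1.
new_a = update_rating(r_a, e_a, 0.0)
new_b = update_rating(r_b, e_b, 1.0)

print(f"expected: A={e_a:.3f}, B={e_b:.3f}")
print(f"new ratings: A={new_a:.1f}, B={new_b:.1f}")
```

Note that with the same K for both players, the points A loses are exactly the points B gains, so the sum of the two ratings stays the same.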

History of Elo 

Until the 1990s the history of the Elo rating was quite straightforward: more countries adopted it, and more decisions in chess came to be based on the Elo rating of the players. For instance, chess has titles at both the national and the international level. If you are a good player, you can achieve the title of National Master, awarded by your federation according to its own rules: usually they require at least a certain Elo rating and a given Elo performance in one or more tournaments. What is Elo performance? Imagine you play a tournament: since the Elo rating is about averages, you can average the Elo ratings of the players you face in the tournament and consider your average score per game. If you scored 4.5 points in 6 games, this would be 0.75. Comparing this result with the average rating, using the same relation between rating difference and expected result as before, yields the Elo performance for the tournament. FIDE awards the titles of FIDE Master, International Master and International Grandmaster based on similar criteria. The world championship, in contrast, is awarded based on a lineal system.
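As a rough illustration of the Elo performance, the sketch below inverts a logistic expected-result curve: it looks for the rating at which your expected score against the average opponent equals the score you actually achieved. The logistic curve with S = 400, the function name and the opponent ratings are assumptions for the example; federations may use lookup tables or other formulas instead.

```python
import math


def performance_rating(opponent_ratings, points, scale=400.0):
    """Rating at which the expected score equals the achieved score.

    Assumption: base-10 logistic curve with scaling coefficient S = 400.
    """
    games = len(opponent_ratings)
    if games == 0:
        raise ValueError("at least one game is required")
    average_opponent = sum(opponent_ratings) / games
    score = points / games  # average result per game, e.g. 0.75
    if not 0.0 < score < 1.0:
        # The logistic curve cannot be inverted for 0% or 100% scores.
        raise ValueError("score must be strictly between 0 and 1")
    return average_opponent + scale * math.log10(score / (1.0 - score))


# Hypothetical tournament: 6 opponents averaging 2000, 4.5 points scored.
opponents = [1950, 2000, 2050, 1980, 2020, 2000]
perf = performance_rating(opponents, 4.5)
print(f"performance: {perf:.0f}")
```

Under these assumptions, scoring 0.75 per game corresponds to performing roughly 190 points above the average opponent.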
Until the advent of the internet, there was a problem: Elo ratings were updated rarely. In Italy, the country I come from, this happened every 6 months. Now, imagine that, for some reason, you lost more Elo than usual in one term. Say you were sick at the only two tournaments you played and performed way below your standard. You would then start the next term with a very low rating, making it very easy for you to gain points by playing weaker players with the same rating as you. It is as if you boxed in a weight category below your natural one. If you then happened to play many tournaments in that term, you would be able to boost your rating way above your real strength. Interestingly, this happened in Italy in 1997, when Roberto Ricca climbed to the top of the Italian rating list despite being an average national master.
Modern online chess platforms like chess.com update your rating after each and every game, so this exploit is no longer possible (just in case you wanted to try).
Much research has been done on this topic: what if you want to estimate Elo from a static database of results? What if you want to estimate the variance of the result as well, and not only the average?

Elo in other disciplines

Football is not the first non-chess discipline to use Elo. Online game-like platforms are a natural fit for this very simple system. Halo and Tinder seem to use it, too.
In fact, the classical Elo implementation in chess is just a very simple machine learning algorithm, compatible with data streams, for estimating the expected result of matches. It also shows that simple algorithms can go a very long way.
Interestingly, unofficial Elo-like ratings in football have been made available by freelancers for ages (like this). I remember finding one back in 2008, when I moved to computational neuroscience and tried to get an overview of machine learning and similar topics.
I hope you enjoyed this overview of the Elo system. If you would like to know more about Bayes Elo or have me dive into the difficulties of Elo for non-zero-sum games like football, let me know with a comment!
