Explained: The 10 Commandments of football analytics (2024)

Last year, The Athletic’s Ben Baldwin wrote a piece detailing the 10 Commandments of numbers-based analysis of the other football. The one with the funny-shaped ball. The beautiful game lends itself to plenty of analysis using numbers, too, but just because the data is there doesn’t mean that it’s always used correctly.


This guide will give you a better appreciation of the context required when talking about teams and players, which numbers to focus on and how to better question what you’re seeing.

Here are The 10 Commandments.

1) Thou shalt not use save percentage to evaluate a goalkeeper’s shot-stopping ability

Example: “Martin Dubravka has been the eighth-best shot-stopper in the Premier League this season with a save percentage of 73.9 per cent”

Why it’s misleading: The equation for save percentage is shots saved/total shots faced. Straight away, there’s no accounting for the difference in the type and quality of shots that a goalkeeper faces, which will have a large impact on his ability to make a save, and therefore, his save percentage.

Goalkeeper X, facing 10 shots from inside the six-yard box, is going to have a tougher time making saves than Goalkeeper Y, who faces all 10 of his shots from 30 yards out or more.

Expected Goals and its cousin, Expected Goals on Target, tell us that shots from further away are less likely to result in a goal and shots that are either right at the keeper or down the middle are more likely to be saved. Anyone reading who has watched enough football will, of course, tell you the same thing.

By equally weighting each shot to calculate save percentage, we are doing a disservice to Goalkeeper X and making Goalkeeper Y look better than they actually might be.

What to use instead: Comparing the quality of the on-target shots a goalkeeper faces, measured by Expected Goals on Target (or Post-Shot Expected Goals), with the number of goals he concedes, which I’ve written about previously here, adds much-needed context to his numbers.

Goals Prevented tells us how many goals a goalkeeper saved given the quality of shots he’s faced, compared to the average goalkeeper. On this measure, Martin Dubravka looks far better than his save percentage says he is, and Vicente Guaita looks like a world-beater.
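To make the gap between the two measures concrete, here is a minimal sketch. The shot values are invented for illustration, and Goals Prevented is assumed to be the sum of Expected Goals on Target faced minus goals conceded, which is one common way of defining it rather than necessarily the exact method behind the numbers in this piece.

```python
# Hypothetical data: each on-target shot is (xgot, was_goal), where xgot is
# an assumed Expected Goals on Target value between 0 and 1.

def save_percentage(shots):
    """Shots saved divided by total on-target shots faced."""
    saved = sum(1 for _, was_goal in shots if not was_goal)
    return saved / len(shots)

def goals_prevented(shots):
    """Expected goals on target faced minus goals actually conceded."""
    expected = sum(xgot for xgot, _ in shots)
    conceded = sum(1 for _, was_goal in shots if was_goal)
    return expected - conceded

# Goalkeeper X: 10 close-range shots (0.6 each), concedes 4.
keeper_x = [(0.6, True)] * 4 + [(0.6, False)] * 6
# Goalkeeper Y: 10 long-range shots (0.05 each), concedes 2.
keeper_y = [(0.05, True)] * 2 + [(0.05, False)] * 8

for name, shots in [("X", keeper_x), ("Y", keeper_y)]:
    print(name, round(save_percentage(shots), 2), round(goals_prevented(shots), 2))
```

With these made-up numbers, Goalkeeper X has the lower save percentage (60 per cent against 80 per cent) yet concedes about two goals fewer than expected, while Goalkeeper Y concedes about 1.5 more than expected.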


2) Thou shalt not use distance or sprint stats to indicate effort

Example: “Mesut Ozil has run more than any other player for Arsenal today, clocking up 11.2km”

Why it’s misleading: Premier League clubs have had access to tracking data since 2013-14 and, as part of that deal, the media get access to derived outputs too. Up to this point, all we’ve really seen is distance and speed statistics.


The reality is, these numbers are some of the most contextless around, yet they’re used frequently when analysing teams and players. The reasons for not using them are plentiful.

Firstly, there’s no correlation between the distance you run and your likelihood of winning a game. The amount of distance covered in a finite amount of time is only useful in a time trial, which football is not. From last year’s UEFA Technical Report on the Champions League, Shakhtar Donetsk ran the furthest on average of all 32 teams in the competition, yet finished third in their group and crashed out of the Europa League in the round of 32. Manchester United ran the second-least on average, yet were still able to reach the quarter-finals. Distance doesn’t really tell us much.

Secondly, distance and sprints are going to be stylistic, as in, the numbers that players rack up will be linked to what’s asked of them, the system they play in, how the opposition sets up, game state, and various other factors. Without controlling for — or at least mentioning — these other factors, these numbers don’t give us much insight.

Finally, there’s also some evidence to suggest that running less actually can be beneficial — just ask Lionel Messi. Most players have the fitness levels to last a full game but the manipulation of space is what matters. Similarly, there have been plenty of quick players to have played the sport but the very best know when to use their pace. Very rarely do players need to beat another in a foot race but it’s quick bursts of speed to get past someone or latch on to the end of a loose ball that are key.

There’s value in this data but it’s on the athlete-management side and ensuring that the players are in the right condition to be playing. Football is a game of space and time, and the current tools to measure these are too blunt to be interesting right now.

What to use instead: There’s not really a great substitute here. Either these numbers need to be framed properly before using or we’re probably better off without them.

3) Thou shalt not use possession as an indicator of quality

Example: “Tottenham had 79.8 per cent possession in their 0-1 defeat to Newcastle; the second-highest figure for a losing side in the Premier League since 2003-04.”

Why it’s misleading: As Marti Perarnau puts it in Pep Confidential (my pick for The Athletic’s list of favourite football books), “possession is only a means to an end. It’s a tool, not an objective or an end goal.” Leicester City won the league averaging 42.6 per cent of possession in 2015-16. Manchester City won the league last season averaging 67.7 per cent of the ball. In essence, it doesn’t matter how much you have; it’s what you do with it.


Winning the possession battle doesn’t really tell us much beyond how teams set up stylistically, and within a game it can be entirely dictated by the scoreline. Take Atletico Madrid’s recent 1-0 victory against Liverpool in the Champions League. After a fourth-minute goal, Atleti shut up shop, having just 27 per cent of possession. That figure may have looked entirely different had Atleti not scored early on.

What to use instead: Possession is still a useful nugget of information to understand which side had more of the ball — but just don’t use it to win any arguments that one team is better than another. Expected Goals is a far better indicator of the quality of a team, so if you want to argue about quality, see how good your team is at creating and preventing goalscoring chances.

4) Thou shalt not judge a player’s defensive ability on the number of tackles and interceptions they make

Example: “Ricardo Pereira is the best defender in the Premier League, making 119 tackles this season”

Why it’s misleading: Not all the defending that a player does is tangible and the measurable output that can be counted is often biased by team style. Logically, if a team has less possession, they have more opportunities to defend, and vice versa.

For that reason, tackle and interception numbers are better indicators of defensive style (i.e. is the player passive or active) and not necessarily the defensive quality of a player. Virgil van Dijk attempts just 0.76 tackles per 90 minutes, yet no one would make the case that that makes him a poor defender.

In addition, because these defensive numbers are at the mercy of the style of team that a player plays in (mainly the frequency of time they are out of possession and therefore are called into action), it’s hard to compare one player to another.

What to use instead: To combat this, we can adjust defensive statistics by counting how many of these actions a player makes for every 1,000 touches the opposition has while he is on the field of play, an interpretable method of putting all players on a level playing field. Jordan Henderson’s 2.6 tackles per 90 is 15th best in the league but, when adjusting for possession, he jumps to 4.6 per 1,000 opponent touches, making him the fifth most defensively active midfielder in the league.
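As a rough sketch of that adjustment, the calculation might look like the following. The function name and both players’ figures are hypothetical, chosen only to show how the same raw tackle count can translate into different levels of defensive activity.

```python
def per_1000_opponent_touches(actions, opponent_touches):
    """Defensive actions per 1,000 opponent touches while the player is on the pitch."""
    if opponent_touches <= 0:
        raise ValueError("opponent_touches must be positive")
    return actions / opponent_touches * 1000

# Two hypothetical midfielders with the same raw tackle count.
# Player A's team sees little of the ball, so opponents have more touches.
player_a = per_1000_opponent_touches(actions=60, opponent_touches=20000)
# Player B's team dominates possession, so he has fewer chances to defend.
player_b = per_1000_opponent_touches(actions=60, opponent_touches=12000)

print(round(player_a, 1), round(player_b, 1))
```

Both players made 60 tackles, but Player B did so against far fewer opponent touches, so he comes out as the more defensively active of the two once possession is accounted for.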

Possession-adjusted defensive numbers give a more rounded view of defensive activity but these still only show style and not overall quality.

5) Thou shalt not use tackle win-rate to judge a player’s tackling ability

Why it’s misleading: I’m going to let you into a secret: tackles lost and tackles won are practically the same thing, and counting only those two ignores two other key outcomes of an attempted tackle.

Tackles are usually split into two categories — those that are won and those that are lost. Winning a tackle consists of a player winning back possession when challenging for the ball, while losing a tackle sees a challenge take place but the ball isn’t won back. Losing a tackle could be due to the ball being poked out for a throw-in for the opposition, the ball knocked loose for the opposition to recover, or some other reason.


Tackle win-rate is currently defined as tackles won/(tackles won + tackles lost). What this currently tells us is the proportion of tackles that a player makes where his team wins the ball back.

What’s the problem? Well, this currently ignores times when a player attempts a tackle and gets bounced off the player currently in possession, or when attempting a tackle, commits a foul. Of full-backs in the Premier League with the highest tackle win-rate, Martin Kelly is the best with 80 per cent of tackles won. The eye test tells us Aaron Wan-Bissaka should be amongst the top players, yet he’s only 11th. What gives?

What to use instead: True tackle win-rate can help avoid this error by incorporating these two missing categories, with the equation of total tackles/(total tackles + challenges lost + fouls when attempting a tackle). Through this metric, Wan-Bissaka is top with a 78.9 per cent true tackle win-rate, and Martin Kelly is down in 29th. Much better.
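The two equations can be put side by side in a short sketch. It assumes “total tackles” means tackles won plus tackles lost, and the two players’ counts are invented for illustration rather than taken from real data.

```python
def tackle_win_rate(won, lost):
    """Conventional rate: tackles won / (tackles won + tackles lost)."""
    return won / (won + lost)

def true_tackle_win_rate(won, lost, challenges_lost, fouls):
    """Total tackles / (total tackles + challenges lost + fouls when tackling).

    Assumes total tackles = tackles won + tackles lost.
    """
    total_tackles = won + lost
    return total_tackles / (total_tackles + challenges_lost + fouls)

# Hypothetical full-back who rarely makes contact but wins most tackles he does make.
player_a = dict(won=8, lost=2, challenges_lost=10, fouls=5)
# Hypothetical full-back who makes contact far more reliably.
player_b = dict(won=30, lost=12, challenges_lost=6, fouls=2)

for name, p in [("A", player_a), ("B", player_b)]:
    print(name,
          round(tackle_win_rate(p["won"], p["lost"]), 3),
          round(true_tackle_win_rate(**p), 3))
```

Player A leads on the conventional rate (80 per cent against roughly 71 per cent), but once the challenges he was beaten in and the fouls he gave away are counted, he drops to 40 per cent while Player B rises to 84 per cent.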


6) Thou shalt not use goals minus expected goals as an indication of finishing ability in small samples

Example: “Roberto Firmino has only scored eight goals from 12.7 xG, therefore he’s a poor finisher.”

Why it’s misleading: When it comes to understanding goalscoring ability, there are two crucial elements that need to be considered and judged in isolation. The first is a striker’s ability to generate chances for himself. Goals are a striker’s main currency and to score goals, strikers need to take shots. To measure the quality of these shots, we use expected goals. If a player consistently gets into good goalscoring positions, over time, goals will come.

It’s one thing taking shots, it’s another thing to finish them. In small samples — such as a whole season — a player’s goals and xG may not match up. Take Roberto Firmino. This season, he’s scored fewer than you’d expect given the chances he has but it is his best in terms of getting into great goalscoring positions.


Firmino’s three prior seasons at Liverpool have seen him score above, below and on expectation. This isn’t enough data to give any concrete conclusions on his finishing ability.
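One way to get a feel for how noisy a single season can be is to treat every shot as an independent coin flip weighted by its xG and see how widely the goal total could plausibly vary. This is a simplifying assumption, and the 100 identical shots below are hypothetical, not Firmino’s actual shot data.

```python
import math

def goals_spread(shot_xg):
    """Mean and standard deviation of goals scored, assuming each shot is an
    independent event with scoring probability equal to its xG."""
    mean = sum(shot_xg)
    variance = sum(p * (1 - p) for p in shot_xg)
    return mean, math.sqrt(variance)

# Hypothetical season: 100 shots worth 0.127 xG each, 12.7 xG in total.
shots = [0.127] * 100
mean, sd = goals_spread(shots)

# How many standard deviations below expectation is an eight-goal season?
z = (8 - mean) / sd
print(round(mean, 1), round(sd, 2), round(z, 2))
```

Under these assumptions, eight goals sits roughly 1.4 standard deviations below the 12.7 expected, which is the sort of gap that can turn up through chance alone over one season.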

What to use instead: Comparing expected goals (the chances players have) with expected goals on target (what they do with those chances) is one method of considering finishing quality in a very basic way. Even over larger samples, use with caution, and consider at least several hundred shots.

There’s a lot of debate in football analytics circles over whether finishing is a repeatable skill, though, so until there’s a proper answer, go ahead and rely on expected goals’ indication that, over time, most players score in line with their xG.

7) Thou shalt not judge a team’s performance with or without a given player

Example: “Arsenal’s win percentage this season is higher without Mesut Ozil (40 per cent) than with him (28 per cent)”

Why it’s misleading: With or without you (or WOWY, as it’s known in sports analytics circles) stats are intended to isolate the impact of a single player in a team to see how results change with that player involved compared to when they’re missing.


These stats can work in sports with smaller segments to analyse such as basketball, which has more line-up changes and is far higher-scoring. In football, however, there are just way too many moving parts for this to be a good way of analysing if a player’s any good or not. There’s too much out of Ozil’s control that he gets penalised for in both situations.

Here’s just a sample of things that ideally should be taken into account but aren’t with WOWY: What was the quality of the opposition? What was the quality of the other players playing alongside Ozil? Was there a red card? Was Ozil subbed on?

Equally, you have the Burnley problem. Ben Mee and James Tarkowski have both played every minute of Premier League football this season. Which is better? We’ll never know.

What to use instead: It’s better to analyse players within the context of their position and focus on just what they can control. For Ozil and other creative midfielders, that’s chance creation; for strikers, it’s goalscoring, and so on. Leave WOWY stats to sports played by big lads indoors.

8) Thou shalt not judge a player’s pass ability on his passing accuracy

Example: “Phil Bardsley is the worst full-back at passing in the Premier League, completing just 63.6 per cent of his passes”

Why it’s misleading: The degree to which a player’s passing is accurate or not depends a lot on what they’re being asked to do, and the choices they make when on the ball. Some teams, such as Manchester City, play the ball very short and in certain areas of the field, under little pressure. Due to this, they’ll have a high pass-completion rate. Others, like Burnley, look to hit the channels and use longer passes instead of shorter ones, passes that are, on average, less likely to be completed.

The passes may be, by the definition of the data, inaccurate, but that doesn’t tell the whole picture. Consider the example below, from a recent Leeds United game:

Here, Helder Costa’s pass goes down as a failed pass into the area but it’s largely due to the excellent recovery run of the Hull City defender. Here, we should care about possession retention and the progression that Costa has enabled. There are various other times when this situation takes place — possession being retained but the pass incomplete — which players get unfairly judged on.


What to use instead: I’ll write more on other options for this in future, as currently I don’t think that there are many metrics that properly cater to this issue. Expected pass completion rates may give a more rounded view of why a player’s pass completion rates are low, but that data is relatively sparse in the public domain.

9) Thou shalt not judge players if they fail a lot

Example: “Trent Alexander-Arnold has made more unsuccessful passes than any other outfielder in the Premier League”

Why it’s misleading: The Athletic’s Michael Cox wrote at length back in January on what being a “failure” in the Premier League means, so I won’t go into too much depth here. The Golden Boot winner every season will fail to score more times than they succeed in doing so. But if we want to find out the most clinical finisher, we’d look at conversion rate and therefore need goals.

What to use instead: In most cases, if the focus is on how many times a player has failed, it’s worth turning that into a percentage to add more context. Have they failed a lot, or is it that they’ve tried something far more than other players?

10) Thou shalt not compare players with differing numbers of minutes played

Example: “Trent Alexander-Arnold and James Maddison are the joint second-best chance-creators in the league, with 75 each”

Why it’s misleading: Players who play more minutes have more chances to do things on the field that are counted. If players aren’t put on a level playing field in terms of minutes played, those who have played less will nearly always look worse.

I’m probably building some sort of a reputation for always fighting Emi Buendia’s corner but by adjusting for minutes played, Buendia is actually the second-best chance-creator in the league on a per 90 basis (3.3 per 90).


What to use instead: By adjusting stats per 90 minutes played (that is, dividing the stat by minutes played/90), players who have played differing numbers of minutes can have their numbers compared, and fairer comparisons can be made.
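The per-90 adjustment is simple enough to sketch in a few lines. The two players and their totals here are hypothetical and are only meant to show how a ranking can flip once minutes are taken into account.

```python
def per_90(stat_total, minutes_played):
    """Divide a raw total by the number of 90-minute blocks played."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return stat_total / (minutes_played / 90)

# Hypothetical players: A has created more chances in total,
# but B has played far fewer minutes.
player_a = per_90(stat_total=75, minutes_played=2700)
player_b = per_90(stat_total=50, minutes_played=1500)

print(round(player_a, 1), round(player_b, 1))
```

Player A leads on raw totals (75 against 50) but creates 2.5 chances per 90 to Player B’s 3.0, so the player with fewer minutes is the more productive of the two.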

(Photo: Getty Images)


Author: Carlyn Walter
