Football’s next frontier: the rise of detailed analytics

Let’s introduce the cast. The narrative suppliers, predominantly journalists and mainstream media outlets, are tasked with the most day-to-day part of football coverage: match reports, press conferences, transfer news, and so on. They make up the bulk of the coverage, but their importance has waned.

The analysts are more niche content creators; their pieces are mainly on the internet, and are as much a product of this new habitat as a part of it. They diverge predominantly into two methodological camps – tactics, the study of football strategy, and statistics, the application of numerical methods to the sport.

There is little, if any, overlap between the three camps. This may be a product of the context. The demand for football coverage has never been as big as it is now, and it has never been as diverse in its preferences. While the internet strangles traditional outlets with its clickbait revenue model on the one hand, it opens the door to a host of new types of coverage with the other.

You don’t have to look hard for an argument that this sort of specialisation may not be a bad thing. Adam Smith has been immortalised as the first to theorise an economic division of labour, but the concept is one that can be traced throughout history, all the way to Plato’s Republic: “Well then, how will our state supply these needs? It will need a farmer, a builder, and a weaver, and also, I think, a shoemaker.”

To an extent, specialisation is natural, and with it comes an ensuing boost in efficiency. In the context of football, our narrative suppliers and analysts are likely to be better at their respective tasks than if their pool of responsibilities were larger.

But the benefits of specialisation hinge on a potentially fatal caveat: all floors of a car factory can produce different parts of the end product, but someone needs to know how to put it together. In football, this extreme division has produced fake trade-offs.

You can have either a matter-of-fact match report, a tactical analysis, or a statistical summary. This makes it appear as though the approaches are substitutes for one another, alternative ways of approaching the same puzzles. Niche pieces are pigeonholed and presented as such – ‘The stats that tell you …’, ‘Tactical analysis of …’ – in a way that perpetuates these artificial dichotomies.

The trade-offs are fake because any persuasive football analysis would necessarily involve tactical theory, empirical methods, and the real footballing context. These are not different attempts at the same puzzle, but the building blocks with which we can build a bigger picture.

Let’s take an example that begins with the narrative creator – Louis van Gaal has just been sacked, and you have to write an article about it. To do so, you’ll probably contextualise the rationale: van Gaal’s football is seen as ‘boring’. You may not expand on this, but in saying so you have made a comment that rests on both tactical and empirical foundations.

Boring alludes to the style of play, tactical decisions that have an impact on the way that United’s football has been perceived. But the team can only be boring relatively, in comparison to others, and how do we know if they are actually less entertaining?

To convincingly argue that Manchester United are boring, we’re going to need to show that they don’t satisfy our criteria of entertaining football. We know our ideas of entertainment revolve predominantly around the number of goals and the style of football. After scraping past Sheffield United in the FA Cup, Paul Scholes said: “We haven’t seen anything different now for the last six months, that’s the way this team plays football and LvG will be happy with the 1-0 win.”

By averaging the total expected goals per match (for and against combined), where expected goals is a measure of chance quality produced by a statistical model, and by evaluating a team’s directness, the proportion of its passes that take the ball closer to the opposition’s goal, we can create a proxy for this colloquial concept of entertainment. It’s clear why Manchester United were resoundingly and unanimously derided as boring:


Credit: Opta and Analytics FC

Only Aston Villa’s matches had fewer chances for the teams in them, and United play markedly less direct football than anyone else in the league. At the other end of the spectrum, we have fairy-tale winners Leicester, who were by far the most direct and whose matches contained the most chances. If the Leicester of this season are the embodiment of our entertainment concept, Manchester United have been the opposite.
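For readers who want to see how such a proxy might be put together, here is a minimal sketch. It is not the method behind the plot: the `Pass` and `Match` structures, the 0-100 pitch coordinates with the opposition goal at (100, 50), and the sample numbers are all assumptions invented for illustration.

```python
from dataclasses import dataclass, field
from math import hypot

# Assumed convention: both pitch axes run 0-100 and the opposition
# goal is centred at (100, 50). This is an assumption of the sketch.
GOAL_X, GOAL_Y = 100.0, 50.0


@dataclass
class Pass:
    start_x: float
    start_y: float
    end_x: float
    end_y: float


@dataclass
class Match:
    xg_for: float       # expected goals created by the team
    xg_against: float   # expected goals conceded by the team
    passes: list = field(default_factory=list)  # the team's passes


def distance_to_goal(x, y):
    """Straight-line distance from a point to the opposition goal."""
    return hypot(GOAL_X - x, GOAL_Y - y)


def mean_total_xg(matches):
    """Average expected goals per match, for and against combined."""
    return sum(m.xg_for + m.xg_against for m in matches) / len(matches)


def directness(matches):
    """Share of passes that end closer to the opposition goal than they began."""
    passes = [p for m in matches for p in m.passes]
    forward = sum(
        1
        for p in passes
        if distance_to_goal(p.end_x, p.end_y) < distance_to_goal(p.start_x, p.start_y)
    )
    return forward / len(passes)


# Invented sample data, purely to show the functions running.
sample = [
    Match(1.0, 0.5, [Pass(50, 50, 60, 50), Pass(60, 50, 55, 50)]),
    Match(2.0, 1.5, [Pass(30, 20, 40, 40), Pass(70, 50, 70, 50)]),
]
print(mean_total_xg(sample))  # 2.5
print(directness(sample))     # 0.5
```

Plotting one number against the other for every team in a league would give a chart of the kind described above, with low-chance, indirect sides in one corner and high-chance, direct sides in the other.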

That argument, was it statistical, or tactical? The answer, unavoidably, is both.

When you look at it in plain terms, every decent argument will be, because epistemologically speaking, there are no a priori tactical truths. For example, abstractly determining which of zonal and man marking is better as an idea is impossible – theoretically, if enacted perfectly, both are of equal merit.

It may yet turn out that one is drastically easier to teach, or one tends to concede fewer goals, or their suitability depends on other tactical factors, but these conclusions can only come from practice and empiricism. Our trade-off is fake because analytical thinking and practice are as necessary to tactical strategy as econometrics is to economic theory.

Take another example, this time beginning in the world of the stats analyst. You’ve got a decent expected goals model which you’re using to assess team quality, and you’re wondering if there’s any reason Leicester over-performed other than variance: your model has them ranked fifth best in the league by expected goal difference, and yet they finished first.

To do this, you’ll need to see if there’s anything advantageous to the way Leicester play that your model isn’t picking up. From the earlier plot, we can clearly see that Leicester’s tactical system is the most direct in the league; maybe their unusual directness is indicative of other tactical idiosyncrasies that could be advantageous.


Credit: Opta and Analytics FC

We see that Leicester attacked more quickly than anyone else in the league. If the current model didn’t include attack pace as an explanatory variable, this is the point at which it would be worth thinking about testing its inclusion. But we still don’t know why this sort of set-up might have helped Leicester – this is a question that necessitates theory.

Marti Perarnau answers it articulately and intuitively: “The idea behind the counterattack is to take advantage of the fact that the opponent is in the transition phase. If one team is transitioning from defence to attack, the other team is transitioning from attack to defence. Attacking in the moment means the counterattacking team can take advantage of aspects that can be found when facing a disorganized defence.”

These insights about Manchester United and Leicester might have been expressed differently, but this was the easiest shortcut to my important qualifier – being convincing. A lot of tactical analysis uses snapshots or short videos to illustrate patterns of play without ever really establishing the macro-level patterns themselves. It is easy to illustrate why this is problematic.

Take the above gif, from one of United’s more traumatically dull attempts at football this season (thanks to Judah Davies for referring me to the match). Ander Herrera cycles the ball to Daley Blind on the right-hand side of the pitch. There are five players in the box. Blind shoots, wildly.

What insight can be gleaned from this? None. But if I told you Blind regularly shot from outside the box, and was frustratingly poor at cycling the ball into a normal transition, it would become an illustration of that observation. These snapshots can only be valuable if the pattern they’re serving to highlight has been sufficiently established; in much of tactical coverage, this necessity is severely neglected. Blind shot only twice from outside the box in the league this season. The snapshot is useless.

Randomness is often touted as the Achilles heel of statistical methods being applied to the beautiful game, a sport whose idioms and clichés deny its predictability: “anything can happen in football.” And it is its weakness, but randomness means something slightly different to the empiricist – in a regression model, the ‘random’ error term is the portion of variation that you cannot explain.

Of course this is problematic, but it is also an automatic stabiliser on the hubris of football analytics. Though Martin Samuel and Neil Ashton columns will regularly ascribe arrogance to the Moneyball revolution in the sport, it is one inherently aware of its own limitations.
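To make the point about the error term concrete, here is a small sketch of a one-variable least-squares fit. The numbers are invented (a hypothetical expected goal difference against a hypothetical points total) and stand in for no real team; the point is only that the residuals, the part the line cannot explain, are stated in the open rather than hidden.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(xs, ys, intercept, slope):
    """Observed value minus fitted value: the unexplained part."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def r_squared(ys, resid):
    """Share of the variation in ys that the fitted line accounts for."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_resid = sum(e ** 2 for e in resid)
    return 1 - ss_resid / ss_total


# Invented numbers for illustration only.
xg_diff = [-20, -10, 0, 10, 20]
points = [30, 45, 50, 62, 68]

intercept, slope = fit_line(xg_diff, points)
resid = residuals(xg_diff, points, intercept, slope)

print(intercept, slope)          # fitted line
print(resid)                     # what the model fails to explain, team by team
print(r_squared(points, resid))  # below 1: some variation stays unexplained
```

A model that reports its residuals alongside its predictions is admitting, line by line, how much it does not know.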

In stark contrast, the rest of our cast fail to even acknowledge the unexplainable. Mainstream analysis is particularly disgraceful, where columnists hired on the basis of their popularity barely even bother to justify their arguments.

Until tactical analysis properly accepts randomness and denounces abstract reason, it is cursed to be unable to convince meaningfully. By randomness, I mean the inability to explain everything; what is meant by ‘abstract reason’ is slightly more nuanced.

At first watch, Johan Cruyff explaining his diamond formation may seem like abstract tactical theory, proposing a certain system over others independent of the players within them. It isn’t. Cruyff is constantly referring to the players that he used within the system, and his perhaps dogmatic preference for midfield threes is only because the vast majority of the opponents he faced were playing 4-4-2. ‘Abstract reason’ would be proposing a system over all others, dealing in unfounded absolutes. Johan Cruyff does no such thing.

Contrast this with an analysis of the resurgence of the 4-4-2 by Danny Higginbotham:

“Football is a simple game which has been overcomplicated. What has always been true is this: 4-4-2 is the best formation.”

This is an extreme example, one that serves to emphasise the difference between grounded tactical analysis and abstracted dogma. Football is not a simple game; if anything, it has been oversimplified.

Let’s crystallise the reliance of meaningful tactical analysis on empiricism. You are a tactical analyst tasked with illustrating the difference in Liverpool’s counter-pressing since Jürgen Klopp took over. First, you must outline the theory.

Rene Maric defines gegenpressing: “[It is a] means to press the opposition right after losing possession, i.e. to press as an organized unit the moment you transition to defense. The entire team hunts the ball and, in the ideal case, immediately wins it back from the opponent. The aim is twofold; to prevent the opponent’s counterattack and to win the ball.”

Next, you must test whether or not the amount that Liverpool do so has changed since Klopp took over. There is no possible way to do this without some empirical methods. It could be that you watch every match and note successful counter-presses rather than trawl through other data, but this would just be a less efficient way of doing the same thing. You could watch two Liverpool games, one on each side of the Klopp appointment, but this would fail to be convincing.

After Klopp’s arrival at Liverpool, 3 percent more of their attacks started from them regaining the ball within four seconds of losing it. This proportion is normally relatively stable over a season – the median change for all teams over the same periods was about 1.7 percent, with Liverpool’s change putting them in the upper quartile of difference since Klopp’s appointment.


Credit: Opta and Analytics FC

We’ve established what counter-pressing is, and that Liverpool – by a fairly arbitrary definition of recovering the ball within four seconds – do more of this since Klopp’s appointment. It is here that the tactical analysis can develop, and snapshots can be used to illustrate how Liverpool are counter-pressing differently. Had their proportion of attacks started from a counter-press gone down since Klopp’s appointment, it might not have made sense to do so. The broader check is the process through which the more granular analysis can be validated.
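The broader check can be sketched in a few lines. This is an illustration under stated assumptions, not the calculation behind the figures above: each attack is represented by a hypothetical `regain_delay` (seconds between losing the ball and winning it back, or `None` if the attack began some other way), changes are treated as absolute differences in percentage points, and every number is invented.

```python
from statistics import median, quantiles


def counterpress_share(regain_delays, window=4.0):
    """Share of attacks that began with a regain inside `window` seconds.

    `regain_delays` holds one entry per attack: seconds between losing
    and regaining the ball, or None if the attack did not start from a regain.
    """
    started = sum(1 for d in regain_delays if d is not None and d <= window)
    return started / len(regain_delays)


def in_upper_quartile(value, all_values):
    """True if `value` is at or above the third quartile of `all_values`."""
    q3 = quantiles(all_values, n=4)[2]
    return value >= q3


# Invented delays for eight attacks; four fall inside the four-second window.
delays = [2.0, None, 5.5, 3.9, None, 4.0, 10.0, 1.0]
print(counterpress_share(delays))  # 0.5

# Invented absolute changes (percentage points) for eight hypothetical teams.
league_changes = [0.5, 1.0, 1.5, 1.7, 2.0, 2.5, 3.0, 3.5]
print(median(league_changes))                  # typical size of a change
print(in_upper_quartile(3.0, league_changes))  # True: an unusually large shift
```

Computing the share on either side of a managerial change, and setting the difference against the league-wide spread, is what tells you whether the snapshots are worth collecting in the first place.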

The fake trade-off between tactics and stats is as prevalent in actual football as it is in its coverage. A scout for a Premier League side informed me that opposition reports tend to be based on the previous three to five games of a future opponent, and real playing decisions – set pieces, tactical choices, and so on – are based on this small sample. Given that someone appropriately qualified could use data to identify relevant information over a far larger sample, freeing the scout to focus on more specific trends, this is almost depressing.

“I think we’re just scratching the surface in terms of [the possible union of] tactical analysis and statistics,” says Ted Knutson, analytics expert and former employee of Brentford and FC Midtjylland. “You’ll never get away from watching video for analysis, nor would you want to. However, one of the big benefits of adding stats into the mix is that you can make everyone’s time more productive. You can start by breaking down the larger trends ahead of time and then watching the film.”

Arsène Wenger is a recent convert, praising the advantage that empiricism lends him in translating findings to players: “The weight is greater if you can say, ‘look, this team has conceded 70 percent of their goals from the left.’”

“Ideally, the stats guys also sit down with the coaches and analysts regularly, find out how they normally analyse matches, learn exactly what information they want from each and every game, and then cook up special recipes of stats breakdowns to give them what they need,” continues Knutson.

Football’s specialisation is damaging, but there is another barrier to the progression of analysis: conservatism. The general animosity towards tactical theory and analytical methods stems from the attitude of an uninitiated old guard. Perhaps this is natural.

Or maybe there is genuine reason to be sceptical of the complexity of football. Bill Shankly once said “football is a simple game based on the giving and taking of passes, of controlling the ball and making yourself available to receive a pass. It is terribly simple.” The devil’s advocate may easily opine to the analyst that the actual players will have no comprehension of this complexity – “explain expected goals to Wayne Rooney.”

The flaw in this argument is equating the understanding of laws and dynamics with the existence of them at all. In a famous 1953 article, Milton Friedman argued that the theory of economics cannot be divorced from its positive (i.e. verifiable) counterpart. In it, he describes expert billiard players and a mathematical attempt to model their shots.

It wouldn’t be unreasonable to suggest that it may be a good idea to presume that the players were approximately optimising their shots subject to the mathematical constraints. This hypothesis, though, is very different to saying that the billiard players perform the actual calculations in their heads: “unless in some way or other they were capable of reaching essentially the same result, they would not in fact be expert billiard players.”

It is easy to draw a parallel in football. In Soccermatics, David Sumpter analyses the physics of that Zlatan Ibrahimović goal against England. He finds that the erratic Swede had effectively maximised the margin of error in the way he kicked the ball relative to the 40-degree angle he hit it at: “if he had hit the ball harder, for example at 20 metres per second [instead of the actual 16 m/s], the error margin would have been much smaller. Even when upside down, Zlatan minimised the probability of making a mistake.”

Modelling the best strategy in football may be a lot harder than billiards, and is almost definitely impossible to do perfectly. But the charge often raised by the old guard, that analysis aims to completely describe a sport or ‘boil it down to numbers’, is ridiculous for the same reason that football isn’t a simple game – as was mentioned earlier, empiricists can only know what they can predict in unison with what they cannot, and they are completely open about how there’s a lot of the latter. Were football a simple game, we would have solved it. That we haven’t is what makes it interesting.

The next frontier for football analysis may not be far away. There is a duality to media that is sometimes ignored – it can be said that mainstream coverage is simple because that is what readers want, but it is also true that they can only read what they get. The growing demand for niche analysis is indicative of a desire for nuance and detail that is lacking in the mainstream: These Football Times has 26,000 followers on Twitter, Spielverlagerung has 6,000, StatsBomb has 11,000, and Analytics FC has 4,000.

The biggest barrier to data analysis, getting actual data to play with, will probably be eroded by a shift in profitability from collecting data to actually using it. Trade-offs will be revealed to be fake by healthy communication between coaches and data analysts. Eventually, there is always a change of the guard.

By Bobby Gardiner. Follow @BobbyGardiner
