There was one question the nerds kept having to answer. Yes, the traditionalists would say, stats may well be useful in a stop-start game like baseball. The pitcher pitches, the batter hits, and that event provides oodles of clear data for nerds to crunch. But surely football is too fluid a game to measure?
Forde responds: “Well, I think it’s a really genuine question. It’s one that we ask ourselves all the time.” However, the nerds can answer it. For a start, good mathematicians can handle complex systems. At Chelsea, for instance, one of Forde’s statisticians has a past in insurance modelling. Football – a game of 22 men played on a limited field with set rules – is not of unparalleled complexity.
Second, in recent years the fluid game of basketball has found excellent uses for data. Beane says: “If it can be done there, it can be done on the soccer field.” And third, a third of all goals in football don’t come from fluid situations at all. They come from corners, free kicks, penalties and throw-ins – stop-start set-pieces that you can analyse much like a pitch in baseball.
The new nerds could point to so many obvious irrationalities in football, especially in the transfer market, so many areas where smart clubs could clean up. For instance: goalkeepers have longer careers than forwards, yet earn less and command much lower transfer fees. Clubs often sign large players but actually tend to use the smaller ones, having belatedly realised that they have overvalued size. And few clubs have asked themselves even basic questions such as: do they earn more points when certain players are on the field?
Given that you can hire perhaps 30 statisticians for the £1.5m that the average footballer in the Premier League earns each year, you’d think it might be worth paying some nerds to study these questions. Nonetheless, to some degree football’s suspicion of numbers persists. “Letting even a top-level statistician loose with a more traditional football manager is not really the right combination,” Forde once told me. He himself looks like a football man: trim, greying, regional accent, nice suit. That helps him sell numbers to old-style football men. But, in many clubs, the nerds are only slowly gaining power. Probably every club in the Premier League now employs analysts, but some of these people get locked in computer-filled back-rooms and never meet the manager.
That’s why the data revolution was led by clubs where the manager himself trusted numbers. Arsenal and Allardyce’s Bolton began to value players in much the way that financial investors value cattle futures. Take Bolton’s purchase of the 34-year-old central midfielder Gary Speed in 2004. On paper, Speed looked too old. But Bolton, said Fleig, “was able to look at his physical data, to compare it against young players in his position at the time who were at the top of the game, the Steven Gerrards, the Frank Lampards. For a 34-year-old to be consistently having the same levels of physical output as those players, and showing no decline over the previous two seasons, was a contributing factor to say: ‘You know what, this isn’t going to be a huge concern.’” Speed played for Bolton until he was 38.
Football’s shrewdest number-crunchers have always understood that data can only support a decision about a player. They cannot determine it. Biermann tells the story of how Wenger in 2004 was looking for an heir to Arsenal’s all-action midfielder Patrick Vieira. Wenger wanted a player who could cover lots of ground. He scanned the data from different European leagues and spotted an unknown teenager at Olympique Marseille named Mathieu Flamini, who was running 14km a game. Alone, that stat wasn’t enough. Did Flamini run in the right direction? Could he play football? Wenger went to look, established that he could, and signed him for peanuts. Flamini prospered at Arsenal before joining Milan to earn even more.
Conversely, the clubs that stuck with “gut” rather than numbers began to suffer. In 2003, Real Madrid sold Claude Makélélé to Chelsea for £17m. It seemed a big fee for an unobtrusive 30-year-old defensive midfielder. “We will not miss Makélélé,” said Madrid’s president Florentino Pérez. “His technique is average, he lacks the speed and skill to take the ball past opponents, and 90 per cent of his distribution either goes backwards or sideways. He wasn’t a header of the ball and he rarely passed the ball more than three metres. Younger players will cause Makélélé to be forgotten.”
Pérez’s critique wasn’t totally wrong, and yet Madrid had made a terrible error. Makélélé would have five excellent years at Chelsea. There’s now even a position in football named after him: the “Makélélé role”. If only Real had studied the numbers, they might have spotted what made him unique. Forde explained: “Most players are very active when they’re aimed towards the opposition’s goal, in terms of high-intensity activity. Few players are strong going the other way. If you look at Claude, 84 per cent of the time he did high-intensity work, it was when the opposition had the ball, which was twice as much as anyone else on the team.”
If you watched the game, you could miss Makélélé. If you looked at the data, there he was. Similarly, if you looked at Manchester City’s Yaya Touré, with his languid running style, you might think he was slow. If you looked at the numbers, you’d see that he wasn’t. Beane says, “What stats allow you to do is not take things at face value. The idea that I trust my eyes more than the stats, I don’t buy that because I’ve seen magicians pull rabbits out of hats and I just know that rabbit’s not in there.”
Yet by the mid-2000s, the numbers men in football were becoming uneasily aware that many of the stats they had been trusting for years were useless. In any industry, people use the data they have. The data companies had initially calculated passes, tackles and kilometres per player, and so the clubs had used these numbers to judge players. However, it was becoming clear that these raw stats – which now get beamed up on TV during big games – mean little. Forde remembers the early hunt for meaning in the data on kilometres. “Can we find a correlation between total distance covered and winning? And the answer was invariably no.”
Tackles seemed a poor indicator too. There was the awkward issue of the great Italian defender Paolo Maldini. “He made one tackle every two games,” Forde noted ruefully. Maldini positioned himself so well that he didn’t need to tackle. That rather argued against judging defenders on their number of tackles, the way Ferguson had when he sold Stam. Forde said, “I sat in many meetings at Bolton, and I look back now and think ‘Wow, we hammered the team over something that now we think is not relevant.’” Looking back at the early years of data, Fleig concludes: “We should be looking at something far more important.”
. . .
That is starting to happen now. Football’s “quants” are isolating the numbers that matter. “A lot of that is proprietary,” Forde told me. “The club has been very supportive of this particular space, so we want to keep some of it back.” But the quants will discuss certain findings that are becoming common knowledge in soccer. For instance, rather than looking at kilometres covered, clubs now prefer to look at distances run at top speed. “There is a correlation between the number of sprints and winning,” Daniele Tognaccini, AC Milan’s chief athletics coach, told me in 2008.
That’s why Fleig cares about “a player’s high-intensity output”. Different data companies measured this quality differently, he said, “but ultimately it’s a player’s ability to reach a speed threshold of seven metres per second.” If you valued this quality, you would probably have never made the mistake Juventus did in 1999 of selling Thierry Henry to Arsenal. “For Henry to reach seven metres per second, it’s a relative coast,” said Fleig admiringly. The Frenchman got there almost whenever he ran.
Equally crucial is the ability to make repeated sprints. Tévez, Manchester City’s little forward, is a bit like a wind-up doll: he’ll sprint, briefly collapse, then very soon afterwards be sprinting again. Fleig said, “If we want to press from the front, then we can look at Carlos’s physical output and know that he’s capable of doing that for 90 minutes-plus.”
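To make the idea concrete, here is a rough sketch of how a sprint count might be pulled out of a player's tracking data. Only the seven-metres-per-second threshold comes from Fleig; the one-reading-per-second speed trace, the function names and the numbers in the example are assumptions invented for illustration, and the data companies' real definitions will differ.

```python
from typing import List

# Threshold quoted by Fleig; everything else in this sketch is hypothetical.
SPRINT_THRESHOLD_MPS = 7.0


def count_sprints(speeds_mps: List[float],
                  threshold: float = SPRINT_THRESHOLD_MPS) -> int:
    """Count separate efforts in which speed reaches the threshold.

    Consecutive readings at or above the threshold count as one sprint.
    """
    sprints = 0
    above = False
    for speed in speeds_mps:
        if speed >= threshold and not above:
            sprints += 1  # a new sprint starts here
        above = speed >= threshold
    return sprints


def seconds_at_high_intensity(speeds_mps: List[float],
                              threshold: float = SPRINT_THRESHOLD_MPS,
                              sample_interval_s: float = 1.0) -> float:
    """Total time spent at or above the threshold."""
    return sum(sample_interval_s for s in speeds_mps if s >= threshold)


# Made-up trace: one speed reading per second, in metres per second.
trace = [3.0, 5.5, 7.2, 7.8, 6.1, 2.0, 1.5, 4.0, 7.1, 7.4, 7.0, 3.2]
print(count_sprints(trace))              # 2
print(seconds_at_high_intensity(trace))  # 5.0
```

Counting separate efforts rather than total seconds is what would distinguish a repeat sprinter of the Tévez type from a player who makes one long run and then fades.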
Just as clubs have learned to isolate sprints from other running, they have learned to isolate telling passes from meaningless square balls. On the screen in Carrington, Fleig flashed up a list of City’s players, ranked by how many chances each had created. One name stood out: David Silva had passed for a third more goal-scoring opportunities than any of his teammates.
The new wash of data has made it easy to compare players to players, and clubs to clubs. Wigan, for instance, were recently conceding a greater proportion of their goals from crosses than any other team in the Premier League. If you’re playing Wigan, that’s handy to know. Increasingly, clubs are acting on the data. A quant has controlled Arsenal for 15 years now, but last autumn the numbers guys took over another English giant. The Boston Red Sox’s owner John Henry, who in 2002 had tried to hire Billy Beane, bought Liverpool and immediately hired Beane’s mate Comolli to do a “Moneyball of soccer”.
From his perch at Anfield, Comolli often chats to the father of Moneyball 5,000 miles away. Beane says: “You can call him anytime. I’ll e-mail him and it will be two in the morning there and he’ll be up, and he’ll e-mail me and say, ‘Hey, I’m watching the A’s game’, because he watches on the computer. The guy never sleeps.” At Liverpool, Comolli has genuine power. He has said that data informed the club’s recent purchases of Andy Carroll and Luis Suarez for a combined £60m.
And football’s data revolution has only just got going. Fleig thinks there is an exciting future in sociograms: who passes to whom, who tends to start a team’s dangerous attacks? If you play Barcelona, that man is obviously Xavi. But in another team, the data may show that the launcher of attacks is someone unexpected. If you know the zones where he puts his key passes, you can try blocking them.
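A sociogram of this kind is, at bottom, a tally of who passes to whom. The sketch below assumes a match has been logged as a list of (passer, receiver) pairs; the players "A", "B" and "C" and the pass log are invented, and a real analysis would also need to record which passes led to dangerous attacks and where on the pitch they were played.

```python
from collections import Counter
from typing import List, Tuple

Pass = Tuple[str, str]  # (passer, receiver)


def pass_network(passes: List[Pass]) -> Counter:
    """Count how often each passer-to-receiver link occurs."""
    return Counter(passes)


def busiest_passer(passes: List[Pass]) -> str:
    """Return the player who played the most passes."""
    by_passer = Counter(passer for passer, _ in passes)
    return by_passer.most_common(1)[0][0]


# Hypothetical pass log for three anonymous players.
log = [("A", "B"), ("B", "C"), ("A", "C"), ("A", "B"), ("C", "A")]

network = pass_network(log)
print(network[("A", "B")])  # 2
print(busiest_passer(log))  # A
```

Restricting the log to passes that begin a dangerous attack would turn the second function into a crude answer to Fleig's question of who launches a team's attacks.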
Someone who has thought harder than most about the future of soccer stats is the director of baseball operations at the Oakland A’s. Farhan Zaidi is a round MIT economics graduate with a sense of humour. He’s the sort of guy you’d expect to meet late one night in a bar in a college town, after a gig, not at a professional sports club. For work, Zaidi crunches baseball stats. But he and Beane spend much of their time at the Coliseum arguing about their other loves: the British band Oasis, and soccer. In 2006, in the middle of the baseball season, they travelled to the soccer World Cup in Germany together. Zaidi chuckled: “We spend so much time together, that if all we ever talked about was the numbers on these spreadsheets, we would have killed each other a long time ago.”
Because Zaidi knows where the data revolution in baseball has gone, he can make predictions for soccer. The sport’s holy grail, he thinks, is a stat he calls “Goal Probability Added”. That stat would capture how much each player’s actions over his career increased the chance of his team scoring (for instance, whenever he successfully passed the ball five yards forward from the halfway line), or decreased it (for example, whenever his pass was unsuccessful). I asked Zaidi whether one day pundits might say things like, “Luis Suarez has a Goal Probability Added of 0.60, but Carroll’s GPA is only 0.56.”
Zaidi replied, “I tend to think that will happen, because that’s what happened in baseball. We talk now about players in ways that we wouldn’t have dreamed of 10 or 15 years ago.”
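Zaidi did not spell out a formula, but the stat as he describes it can be sketched as a running total: estimate the team's chance of scoring just before and just after each thing a player does, and add up the differences. Everything in the example below is an assumption made for illustration, including the probabilities, the names and the absence of any scaling per game or per season; producing the before-and-after estimates is the hard part, and it is not attempted here.

```python
from typing import List, Tuple

# (scoring chance before the action, scoring chance after it)
Action = Tuple[float, float]


def goal_probability_added(actions: List[Action]) -> float:
    """Sum the change in scoring probability across a player's actions."""
    return sum(after - before for before, after in actions)


# Invented actions for one hypothetical player.
actions = [
    (0.02, 0.05),  # successful forward pass: chance of scoring rises
    (0.05, 0.01),  # misplaced pass: chance of scoring falls
    (0.03, 0.10),  # through ball into the penalty area
]

gpa = goal_probability_added(actions)
print(round(gpa, 2))  # 0.06
```

A successful pass adds to the total and a failed one subtracts from it, which is the property Zaidi describes.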
In their ancient battle against the jocks, the nerds are finally taking revenge.