
Why Always Me – Mario Balotelli

Mario Balotelli’s apparently imminent signing at Liverpool looks like the most surprising transfer of the summer, especially after Brendan Rodgers explicitly denied any interest in the Italian during the summer tour of the United States. So why have Liverpool reneged on their initial public denial?

I have detailed figures going back to 10/11 so I can take a fairly comprehensive look at what he will bring to an already formidable Reds attack. So what is Balotelli’s style? Well, he won’t be a forward who creates a lot of chances for his teammates. The average key passes figure for a forward in my 10 season database is 1.4 per 90. Balotelli has hit somewhere between 0.7 and 1.6 per season in his time at City and Milan. He’ll probably create at around the rate an average forward would, so it’s not really a big tool in his attacking toolbox. His expected assists over the last 4 years have ranged between 0.10 and 0.13 per 90, with the average amongst strikers being 0.14.

Balotelli - Key Passes 13/14


Does he get involved in the attacking play? Will he make something happen in the final third? Well, the average number of attempted passes per 90 into the final third for top strikers is around 14, and Balotelli will give you circa 15 per 90. He likes to drift out to the right and come inside, but it’s pretty standard for a right-footed attacker to drift to that side and come in on his favourite shooting foot.

These average figures are in no way a reflection of the quality of the player, they’re just a stylistic guide, if you will. His dribbles per 90 are a little more interesting though. Whether it was tactical or not, he had a crazy first half year at Milan after the January transfer window, where he hit 6.2 attempted dribbles per 90. In his City days he was averaging around 3.3 dribbles per 90, then last season at Milan he hit 4.9 dribbles per 90. It would be interesting to know from somebody more clued up on Italian football whether this was more of a systemic issue or an individual/psychological one. Did he simply get more confidence?


I’ve seen a lot of tweets regarding Balotelli’s very poor shot conversion over a number of years. Shot conversion isn’t repeatable (expected goals is, somewhat) and thus what a player converts at in year N hasn’t much bearing on what that player might do in year N+1. Is 7% shot conversion bad? Well maybe, but in simple terms, think of this: 7% of 100 shots gives you a different goal return than 7% of 300 shots. So is 7% conversion still bad? As always, it depends on the context.
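To put rough numbers on that point, here is a minimal sketch. The function name is my own and the 100 and 300 shot totals are just the round figures from the paragraph above, not any player's real record.

```python
def goals_from_conversion(shots: int, conversion: float) -> float:
    """Goals implied by a flat conversion rate over a given number of shots."""
    return shots * conversion


# Same 7% conversion rate, very different shot volumes (illustrative only).
low_volume = goals_from_conversion(100, 0.07)
high_volume = goals_from_conversion(300, 0.07)

print(round(low_volume, 1), round(high_volume, 1))
```

Same conversion rate, three times the goals, which is why the rate on its own says little without the shot volume beside it.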

I have expected goal numbers for Balotelli over his last 4 seasons, so I’ll divide that up into the time he spent at Man City and the time he spent at AC Milan. Looking at his numbers for City in the 12/13 season (before he was transferred to Milan in the January transfer window) seems a little pointless, as he played less than 600 minutes, and there are just too many sample size issues, not to mention strength of schedule bias.

Expected Goals EPL Per 90 10/11 & 11/12


It’s worth remembering Balotelli was 20-21 years of age in these two seasons in the Premiership. Ok, so in his first season he played less than 30% of the available minutes, but he scored 0.34 goals P90, his expected goals per 90 was at 0.43, and his shots per 90 were at 3.8. These are all very good baseline numbers for a 20 year old, and kind of gave you the feeling something big was about to happen. And it did.

Balotelli Shots 11/12 - Larger=goals


Things really took off for Balotelli in the 11/12 season, which of course was Man City’s dramatic title-winning year. Again though, the problem here was he just didn’t get enough minutes. Having said that, these are some elite numbers for a striker. In the last 4 seasons in the Premier League, of the 486 players to play more than 900 minutes and take >30 shots, only 6 had an expected goals per 90 greater than 0.69. Both his goals per 90 and shots per 90 also went to an elite level in 11/12, which really was an indication that Balotelli’s career was on an upward curve. Onward to Milan.

Expected Goals Serie A 12/13 & 13/14


In 12/13 something happened to Balotelli’s shot volume. He started hitting 5.6 shots per 90. Over the last 2 seasons in the top 5 leagues only Suarez can better that number in a single season. But why had Balotelli suddenly become a shot monster? It’s difficult to figure out whether this was part of his natural progression as a striker or whether it was something more systemic that brought it out in him at Milan. Milan weren’t very good last year. It doesn’t look like it was brought about by position either, as I could only find 5 occasions in his Milan career where he started slightly wider of another striker. Incidentally, on those occasions the system used was a 4-1-2-1-2 diamond (Brendan Rodgers take note). This doesn’t take into account positional changes during matches though, so it might be a touch misleading.

Balotelli maintained his expected goals per 90, but his Xgoals per shot decreased dramatically from his time at City, plummeting from a high of 0.126 expected goals per shot to just 0.08. On a per shot basis Balotelli had lower value chances, but he was able to maintain his XGoals per 90 numbers by increasing his shot volume. He went from taking around 40% of his shots from prime at City to taking just 20% of his shots from prime at Milan. In that same year at Milan he took an incredible 75% of his shots from outside the box. Having done all of that, he still kept his goals per 90 at a very decent 0.47.
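The trade-off between chance quality and shot volume is just multiplication, so it can be sketched quickly. Pairing the 0.08 per shot figure with the 5.6 shots per 90 from 12/13 is my reading of the numbers above, and the function name is made up for illustration.

```python
def xg_per_90(xg_per_shot: float, shots_per_90: float) -> float:
    """Expected goals per 90 as chance quality multiplied by shot volume."""
    return xg_per_shot * shots_per_90


# Milan 12/13 (assumed pairing): low value chances, very high volume.
milan_1213 = xg_per_90(0.08, 5.6)

# Shots per 90 that would give the same output at the City-era high
# of 0.126 expected goals per shot.
shots_needed_at_city_quality = milan_1213 / 0.126

print(round(milan_1213, 3), round(shots_needed_at_city_quality, 2))
```

In other words, at his best City-era chance quality he would have needed only around three and a half shots per 90 to produce what took more than five and a half at Milan.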

Balotelli - Shots 12/13 (larger=goals)


Again there was a similar pattern last season. Expected goals was maintained above 0.4 per 90, not elite in itself, but a decent return for a striker, considering you’d expect your striker to outperform Xgoals in probably 3 out of every 4 seasons. In the context of a full season, if he played 38 full 90s that would garner him 15 goals. His shots per 90 increased again in 13/14 to nearly 5.8, which is the highest of any player playing more than 900 minutes in the last 2 seasons in the top 5 leagues. And for the first time in 4 seasons Balotelli managed to play more than 2,000 minutes in a single season. Again he took a measly 20% of his shots from prime, and a massive 65% of his shots from outside the box. Except this time he outperformed his expected goals from outside the box due to scoring 4 goals from 41 free kicks. It’s unclear whether this was skill or luck, as the previous season saw just a 1 goal return from 34 free kicks.
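The arithmetic behind the season projection and the free kick comparison can be sketched as follows. The helper names are my own, and 0.4 is used as a floor for his expected goals per 90 rather than his exact figure.

```python
def projected_goals(rate_per_90: float, full_90s: int) -> float:
    """Goals implied by a per-90 rate sustained over a number of full matches."""
    return rate_per_90 * full_90s


def conversion(goals: int, attempts: int) -> float:
    """Share of attempts that ended up as goals."""
    return goals / attempts


season_goals = projected_goals(0.4, 38)  # 38 full 90s at 0.4 per 90
fk_1314 = conversion(4, 41)              # free kicks, 13/14
fk_1213 = conversion(1, 34)              # free kicks, 12/13

print(round(season_goals, 1))
print(f"{fk_1314:.1%} vs {fk_1213:.1%}")
```

With samples of 41 and 34 free kicks, a swing of three goals is enough to move the rate a long way, which is why one season on its own can't settle the skill or luck question.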

Balotelli Shots - 13/14 (larger=goals)


I am always wary when I see a player score a number of goals from outside the box, so I tend to check their past record to see if they’ve previously shown any history of scoring regularly from outside. Balotelli’s done it just once in the last 4 seasons, which suggests to me he might have got a bit lucky with those long range efforts last season.

So in conclusion, what are the numbers telling us? He won’t create for his teammates at a high level. He will attempt a lot of dribbles and try to make something happen himself, and while he won’t get involved in the build up play to the extent of a striker like Suarez, he will get involved. He’s become a shot monster over the last 2 seasons, and my instinct tells me this is just a natural progression for him rather than a systemic one brought about by Milan’s tactics or deficiencies. Systemic or not though, it’s a worrying trend that only 20% of his shots came from dangerous areas and that on average at Milan 70% of his shots came from outside the box. That’s not where you want your strikers taking shots from. Lastly on the negative side, for whatever reason, he’s played less than 50% of the minutes available to him over the last 4 seasons. This is a big worry.

On the plus side, and I feel this is a major plus, he’s regularly managed greater than 0.4 expected goals per 90 in a season. In my database I could only find one other player who managed that, and it was Van Persie. Neither Suarez nor Sturridge could. A caveat applies to Balotelli’s lack of minutes in some of those seasons though. Apart from 10/11 at City, he’s also managed greater than 0.4 goals per 90 in each of his 3 other seasons. So his output is there, and this is really promising.

Weaknesses: reliability and consistency in getting minutes on the pitch. Too many shots from low value areas.

Strengths: Dribbles, shot volume (but needs to be proportioned better), consistent in expected goals and goals per 90.

Verdict: There’s a very, very good player in there. The question is, can Brendan Rodgers and Liverpool bring it out of him at a consistent level? Personally, he’s never really impressed me when I’ve seen him play. I always thought, hmm, “much ado about nothing”. Maybe I watched the wrong games though. But at 24 years of age, and at a good price, the risk to reward ratio is very positive. If I was asked for one word to describe his career to date? Erratic. And therein lies the crux of the matter.

Testing Repeatability – Player Level

So yeah, this is just going to be a quick post to deal with some house-keeping. I’ve run a series of tests to check the repeatability of the various metrics I use. These are all done at player level, and I plan on doing the same at team level at some stage. There will be no fancy Tableau graphics here! Just plain old Excel scatter plots. So here is a rundown of what I found. These may or may not be useful for somebody.

GPS – Goal Probability Per Shot per 90

GPS (Expected goals/non-pen Shots)

Expected Goals per 90

Expected Non-Pen Goals Per 90

Expected Goal Difference Per 90

EXPGoalDiffP90 (Actual Goals-EXPGoals) Top 4 Leagues

Expected Goals From Shot Placement per 90

EXPGoals From Shot Placement (on target shots)

Expected Goals Shot Placement Difference Per 90

XGSPDiff P90 (Actual Goals-XGSP)

Expected Goals Shot Placement per Shot per 90

XGSP_GPS (XGSP/non-pen shots)

Shot Placement Extra Goals per 90 (SPEG)



Expected Goals – Shot Placement

After the Premier League season ended last year I was wondering why there aren’t more shot placement models out there. There has been some work done on it over at www.statsbomb.com but nothing I could find of note since. I was surprised by this, because if you want to measure finishing skill, isn’t shot placement (along with technique and other variables) a rather large part of a player’s goal scoring armoury? There doesn’t seem to be ‘technique’ data available, at least not in the public domain, and I don’t even know if OPTA (or anyone else) collects data regarding how a player hits the ball, e.g. toe poke, instep, volley etc. It strikes me that EXPGoals is just a “quality of chance from shot location” measurement, and doesn’t directly deal with finishing skill. Indirectly you can measure the difference between actual goals scored and expected goals (which I have done, and which gives you an EXPGoalDiff + or -), which could indicate whether or not a player is better than the average player at converting his chances into goals. For me though, that’s taking a big leap forward without understanding why one player scores more than the average. Like others I have found no year on year correlation for over-performing in expected goals, with an R2 of just 0.002. So EXPGoalDiff can tell you what may have happened in a particular season, but has no predictive power for what might happen in the next season.
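For anyone wanting to run the same kind of year on year check, here is a minimal sketch of an R2 calculation in plain Python. My own tests were done in Excel, so this is an equivalent rather than what I actually used, and the player values below are invented purely to show the mechanics.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov ** 2 / (var_x * var_y)


# Hypothetical EXPGoalDiff per 90 for the same five players in year N
# and year N+1 (made-up values, not from my database).
year_n = [0.10, -0.05, 0.20, 0.00, -0.12]
year_n1 = [-0.02, 0.08, 0.01, -0.10, 0.05]

print(round(r_squared(year_n, year_n1), 3))
```

A value near 0 means year N tells you next to nothing about year N+1, and a value near 1 means the metric carries over almost perfectly.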

XGDiffP90 (Actual Goals-EXPGoals)

EXPGoals deals with variables up until the moment the player touches the ball to shoot. But a lot can happen between touching the ball and it ending up in the back of the net: how the ball is hit (with bend, without bend), velocity, shot placement, and external factors such as weather and opposition player positions etc. Even if EXPGoal difference was repeatable, it could INDICATE finishing ability, but it wouldn’t tell us why, and I like to understand things, so the why really bugs the hell out of me.

Shot Placement

Of all these other factors I mentioned, I think shot placement data is the only variable that is in the public domain, and even then only for the top 5 Leagues over the last 2 seasons. So after the EPL season finished last year I started collecting shot placement data. That was quickly put on hold during the World Cup, but since then I’ve been beavering away. I managed 4 of the top 5 Leagues. France will have to wait, I just didn’t have the staying power. Sorry France.

Upon finishing I got to work on the shot placement model and connecting the data between EXPGoals and shot placement. My idea was that I wanted to control for the exact same variables as the EXPGoal model. That way I could compare the same shot from both perspectives, i.e. I’d have an expected goal value just before the shot was struck, and an expected goal value after the shot was struck. I could then see the difference between the two values and by that, know how much any individual player had increased/decreased their chances of scoring, just by where they placed the ball in the goal. I’d also be controlling for a whole host of actions related to shooting and thus hopefully get some decent outputs.

And as I’m writing, I tweeted about shot placement models and have just been tweeted this, which is a piece by Devin Pleuler: http://www.optasportspro.com/about/optapro-blog/posts/2014/on-the-topic-of-expected-goals-and-the-repeatability-of-finishing-skill.aspx And there I was thinking I had an original idea.

EXP Goal Zones

Obviously off target shots can’t be scored and as such have an expected goal value of zero so won’t be included. I took all on target shots, and controlled for the same inputs (location, type of shot etc) as my EXPGoal model, with the added qualifier of separating each instance into separate parts of the goal.

Goal Sections

I divided the goal up into 6 boxes, see above, and got an EXPGoal value for each location in the goal. Why these boxes specifically? I needed at least 6 to separate the centre from the corners, but couldn’t go any higher than 6 as I’d run into sample size issues. Ideally you’d probably want at least 10 areas, an extra 2, top and bottom, either side of the central boxes. But like I said, sample size issues, and each box added creates a mountain of extra work. Let’s just take a quick example: one specific instance could be all non-headed on target shots taken from Zone C and placed in the top right corner of the goal, which are converted at 60% (or an XGSP value of 0.60). I did the same for each section of the shot placement area, top left, top centre and so on, the same for headed shots in Zone C, and the same for every other zone marked on the pitch above. This took a lot of bloody time, and I have to admit I nearly gave up on more than one occasion. Now each shot on target has an expected goal value before the shot is struck and after the shot is struck.
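The bucketing described above can be sketched as a simple lookup table. This is only an illustration of the idea, not my actual model: the field names, the section labels and the five toy shots are all assumptions, chosen so that the Zone C example comes out at 0.60.

```python
from collections import defaultdict


def build_xgsp_table(shots):
    """Conversion rate of on-target shots per (zone, headed, goal section)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [goals, shots]
    for shot in shots:
        key = (shot["zone"], shot["headed"], shot["section"])
        counts[key][0] += 1 if shot["goal"] else 0
        counts[key][1] += 1
    return {key: goals / total for key, (goals, total) in counts.items()}


# Toy data: five non-headed on-target shots from Zone C placed top right,
# three of which were scored (hypothetical, for illustration only).
toy_shots = [
    {"zone": "C", "headed": False, "section": "top_right", "goal": True},
    {"zone": "C", "headed": False, "section": "top_right", "goal": True},
    {"zone": "C", "headed": False, "section": "top_right", "goal": False},
    {"zone": "C", "headed": False, "section": "top_right", "goal": True},
    {"zone": "C", "headed": False, "section": "top_right", "goal": False},
]

xgsp_table = build_xgsp_table(toy_shots)
print(xgsp_table[("C", False, "top_right")])
```

Every extra split (another goal section, another shot type) multiplies the number of keys, which is where the sample size problem comes from: each bucket ends up with fewer shots behind its rate.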

On to those messy acronyms. For want of a better name, I’m going to call it Expected Goals Shot Placement, or XGSP for short. Let’s first take a look at whether XGSP-P90 correlates to GoalsP90.


A pretty strong correlation at 0.771, which is what you would expect: the better your shot placement, the more goals you should score.

Shot Placement Extra Goals

Now I’m going to introduce another pesky new acronym, SPEG, or Shot Placement Extra Goals, which is just the difference between expected goals (from on target shots, pre-shot – based on location, type of shot etc) and expected goals from shot placement (post-shot – based on all of the variables in EXPGoals, with shot placement added in). I’ve leant away from using ‘finishing skill’ as a name, because for me it’s not finishing skill, as I believe finishing skill incorporates a whole host of different skills, and shot placement is just one of those skills.
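As a rough sketch of how SPEG rolls up to a per 90 figure: take the post-shot value minus the pre-shot value for each on-target shot, sum them, and scale by minutes played. The function name, the three shots and the 180 minutes are all invented for illustration.

```python
def speg_per_90(on_target_shots, minutes):
    """Shot Placement Extra Goals per 90.

    on_target_shots: list of (pre_shot_xg, post_shot_xgsp) pairs.
    minutes: minutes played over the same period.
    """
    total = sum(post - pre for pre, post in on_target_shots)
    return total / minutes * 90


# Three hypothetical on-target shots as (pre-shot EXPGoals, post-shot XGSP).
toy_shots = [(0.05, 0.30), (0.30, 0.10), (0.10, 0.60)]

print(round(speg_per_90(toy_shots, 180), 3))
```

The second toy shot shows the negative case, where a good chance hit into a poor part of the goal lowers the probability of scoring.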

So at a basic level, over a full season, if we look at each shot a player takes, give it an EXP goal value pre-shot, then give it an EXP goal value post-shot based on shot placement, and that player can show that they have increased their probability of scoring just by their shot placement, doesn’t that show some skill at putting the ball in the back of the net? It should do, but we could run into the same problems as EXPGoalDiff and things like Shot Conversion %. They just aren’t very repeatable year on year. I ran two tests. The first was on the EPL alone over the last two seasons (because I needed to test before I continued collecting data for other leagues), where R2=0.47. Then I tested the Top 4 leagues (EPL, La Liga, Bundesliga, Serie A), where player x had >=10 shots in year N and year N+1, and here’s what I found.
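The sample selection for that second test can be sketched like this. The data layout and player labels are hypothetical, and whether the 10 shot cut-off counts all shots or only on-target ones is an assumption left open here.

```python
def paired_seasons(season_n, season_n1, min_shots=10):
    """Pair each player's metric across two seasons.

    Each season maps player -> (shots, metric value). Only players who
    reach min_shots in BOTH seasons are kept.
    """
    pairs = []
    for player, (shots_n, value_n) in season_n.items():
        if player not in season_n1:
            continue  # no second season to compare against
        shots_n1, value_n1 = season_n1[player]
        if shots_n >= min_shots and shots_n1 >= min_shots:
            pairs.append((value_n, value_n1))
    return pairs


# Hypothetical players with (shots, SPEG per 90) in each season.
season_n = {"A": (25, 0.20), "B": (8, 0.40), "C": (30, 0.05), "D": (12, 0.10)}
season_n1 = {"A": (31, 0.15), "B": (20, 0.10), "C": (14, 0.08)}

pairs = paired_seasons(season_n, season_n1)
print(pairs)
```

Player B drops out for having too few shots in year N and player D for having no year N+1 at all. The remaining pairs are what would go on the scatter plot.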

Shot Placement Extra Goals

An R2 of 0.427, while probably not a good result for any other type of metric, is significant enough when it comes to conversion/goal scoring. Certainly enough to warrant more investigation. Ideally I’d like to go back at least 5 seasons to test it, but still, there is some shot placement skill evident, and these are just my initial findings, so I haven’t had much time to digest the implications. I also decided to do some further visual tests to see if things are what they seem. As a side note, the huge outlier at 0.9 is Morata. I was wondering the same myself.

Visual Tests

If you follow me on Twitter you’ll know I like to post these scatter plots which I call dashboards. I like the fact that they can show 4 or 5 different metrics at any one time. I mostly plot them with similar type metrics that give some context to all the metrics as a whole. Here I’ve plotted EXPGoalsP90 on the vertical axis and SPEGP90 (shot placement extra goals) on the horizontal. GPS, or expected goals per shot, is coloured, and goalsP90 (output is also important!) is referenced by the size of the coloured circles.


Visually, SPEGP90 looks good, the players who you’d expect to do well are doing well. It’s encouraging that the likes of Messi, Ronaldo, Suarez, Dzeko, and Sturridge all appear above 2 standard deviations in both metrics for both seasons.

Edge Case – Mertens

Ok, so that’s good. Let’s take a look at some edge cases (apologies – that’s the programmer in me coming out) or outliers and see what we find. First of all, Dries Mertens. Colour-wise he’s in the kind of blue-green range, which means on a per shot basis he has low expected goals, and on a per 90 basis he’s also going to be low. His shots per 90 are at 3.9, so that’s quite high. So lots of shots, but low value chances of converting, which usually means shots from outside the box. But his SPEG-P90 is above 2 standard deviations, which indicates that by way of his shot placement he’s increased his expected rate of scoring somewhat. Visually, let’s see what that looks like. First his shot chart from last season. Remember, it’s heat map orientated: the hotter the shot, the higher the chance of converting, and vice versa for the colder shots. Larger dots represent goals, X’s represent headers.

Mertens Non-Pen Shots

Pretty much as expected based on his GPS and XGP90 on the scatter plot. Lots of shots from outside the box, that have an obvious low scoring probability. Next let’s take a look at his shot placement.

Mertens Shot Placement

Before we even consider the numbers, visually, if you look at the sheer volume of low value chances on his shot chart above (blue shots), then compare his shot placement, it looks quite good. Only 5 of his 36 shots on target were placed down the centre. 26 of his on target shots had an expected goal value (pre-shot) of less than 0.06, yet after the shot was taken 30 of those on target shots had a SPEG value of greater than 0.089. So yeah, in this instance you could say his SPEG numbers match what is happening visually.

Edge Case – Destro

Lastly, let’s take a quick look at another outlier: Destro in the 12/13 season. He’s in the top left of the plot above. Here’s his shot chart.

Destro Non-Pen Shots

GPS and EXPGoals both indicated high value chances and it’s clear from his shot chart that most of his shots came from prime central in the danger zone. Only 2 of Destro’s shots came from outside the box and 12 of his 22 shots on target had an expected goal value greater than 0.30. High quality chances indeed. But his SPEGP90 indicates he increased his expected probability of scoring by 0.167 per 90, whilst the average increase over the plot is 0.133. So he’s slightly above average, which is not really that good. Let’s look at his shot placement chart.

Destro Shot Placement

Again visually, it seems clear that the reason his increase from expected goals (pre-shot) to SPEG (post shot) is low, is because he hit most of his shots low centre, which is really goalkeeper territory and has a much lower chance of being scored. It’s still early days, but it’s nice to know the model is working as it should be, and that the numbers, for now, pan out visually.

Future Improvements

Well, the inputs in the model probably won’t be improved much as I can’t sub-divide the categories any further without running into sample size issues. Not to mention the enormous amount of work it would involve to tinker with it in that way. In fact, I spent so much time on it I’m fed up looking at the numbers at this stage. For now it will interest me just to use it for the coming season and see what I can learn and what its best application is. I have no formal statistical training or background, so this is a hobby, and a very time-consuming one at that. I’ll continue collecting the data and inputting it into the model for the coming season, but if it becomes too much of a burden I’ll have to stop.

In a visual sense, I would like to connect both shot charts and shot placements in the goal to show the increase before and after the shot has been taken. The holy grail would be in some sort of 3D environment, but that would take an awful lot of coding and again I’m not sure I have the time.

What I would like to do before the season starts is look at SPEG at a team level. I’m aware though that shot placement is really an individual based skill, but I think it might be interesting to discover what it says at a more macroscopic level. In particular, SPEG conceded, and maybe SPEG total shot ratio. Though I’m not that hopeful of either being that predictive, nonetheless, it will be fun to find out. I think.

Feedback welcome, as I got so caught up with this I might have missed something that’s just plain obvious.