After the Premier League season ended last year I was wondering why there aren’t more shot placement models out there. There has been some work done on it over at www.statsbomb.com but nothing I could find of note since. I was surprised by this, because if you want to measure finishing skill, isn’t shot placement (along with technique and other variables) a rather large part of a player’s goal scoring armoury? There doesn’t seem to be ‘technique’ data available, at least not in the public domain, and I don’t even know if OPTA (or anyone else) collects data on how a player hits the ball, e.g. toe poke, instep, volley etc. It strikes me that EXPGoals is just a “quality of chance from shot location” measurement, and doesn’t directly deal with finishing skill. Indirectly you can measure the difference between actual goals scored and expected goals (which I have done, giving an EXPGoalDiff of + or -), which could indicate whether or not a player is better than the average player at converting his chances into goals, but for me that’s taking a big leap without understanding why one player scores more than the average. Like others, I have found no year-on-year correlation for over-performing in expected goals, with an R2 of just 0.002. So EXPGoalDiff can tell you what may have happened in a particular season, but it has no predictive power for what might happen in the next season.
EXPGoals deals with variables up until the moment the player touches the ball to shoot. But a lot can happen between touching the ball and it ending up in the back of the net: how the ball is hit, with or without bend, velocity, shot placement, and external factors such as the weather and opposition player positions. Even if EXPGoal difference were repeatable, it could INDICATE finishing ability, but it wouldn’t tell us why, and I like to understand things, so the why really bugs the hell out of me.
Shot Placement

Of all the other factors I mentioned, I think shot placement data is the only variable that is in the public domain, and even then only for the top 5 leagues over the last 2 seasons. So after the EPL season finished last year I started collecting shot placement data. That was quickly put on hold during the World Cup, but since then I’ve been beavering away. I managed 4 of the top 5 leagues; France will have to wait, I just didn’t have the staying power. Sorry France. Upon finishing I got to work on the shot placement model and connecting the data between EXPGoals and shot placement. My idea was that I wanted to control for the exact same variables as the EXPGoal model. That way I could compare the same shot from both perspectives, i.e. I’d have an expected goal value just before the shot was struck, and an expected goal value after the shot was struck. I could then see the difference between the two values and, by that, know how much any individual player had increased or decreased their chances of scoring just by where they placed the ball in the goal. I’d also be controlling for a whole host of actions related to shooting and thus hopefully get some decent outputs. And as I’m writing, I tweeted about shot placement models and have just been sent this piece by Devin Pleuler: http://www.optasportspro.com/about/optapro-blog/posts/2014/on-the-topic-of-expected-goals-and-the-repeatability-of-finishing-skill.aspx And there I was thinking I had an original idea.
Obviously off target shots can’t be scored, so they have an expected goal value of zero and won’t be included. I took all on target shots and controlled for the same inputs (location, type of shot etc.) as my EXPGoal model, with the added qualifier of separating each instance into separate parts of the goal.
I divided the goal up into 6 boxes, see above, and got an EXPGoal value for each location in the goal. Why these boxes specifically? I needed at least 6 to distinguish the central areas from the corners, but couldn’t go beyond 6 as I’d run into sample size issues. Ideally you’d probably want at least 10 areas, with an extra 2, top and bottom, either side of the central boxes. But like I said, sample size issues, and each box added creates a mountain of extra work. Let’s take a quick example of an instance: all non-headed on target shots taken from Zone C and placed in the top right corner of the goal, which are converted at 60% (or an XGSP value of 0.60). I did the same for each section of the shot placement area, top left, top centre and so on. The same for headed shots in Zone C, and for every other zone marked on the pitch above. This took a lot of bloody time, and I have to admit I nearly gave up on more than one occasion. Now each shot on target has an expected goal value before the shot is struck and after the shot is struck.
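The bucketing above can be sketched in a few lines. This is a minimal illustration of the idea, not my actual workflow; the field names (zone, shot_type, goal_box) and the sample shots are made up for the example:

```python
from collections import defaultdict

def xgsp_table(shots):
    """Estimate a post-shot expected goal value (XGSP) per bucket.

    Each shot is a dict carrying the pre-shot qualifiers (pitch zone,
    shot type), the part of the goal it was placed in, and whether it
    went in. The XGSP for a bucket is simply the historical conversion
    rate of all on-target shots in that bucket.
    """
    tally = defaultdict(lambda: [0, 0])  # bucket -> [goals, shots]
    for s in shots:
        key = (s["zone"], s["shot_type"], s["goal_box"])
        tally[key][0] += s["goal"]
        tally[key][1] += 1
    return {k: goals / n for k, (goals, n) in tally.items()}

# Hypothetical sample: non-headed shots from Zone C, two placements
shots = (
    [{"zone": "C", "shot_type": "foot", "goal_box": "top_right", "goal": 1}] * 3
    + [{"zone": "C", "shot_type": "foot", "goal_box": "top_right", "goal": 0}] * 2
    + [{"zone": "C", "shot_type": "foot", "goal_box": "low_centre", "goal": 1}] * 1
    + [{"zone": "C", "shot_type": "foot", "goal_box": "low_centre", "goal": 0}] * 4
)
table = xgsp_table(shots)
print(table[("C", "foot", "top_right")])   # 0.6 - the Zone C example above
print(table[("C", "foot", "low_centre")])  # 0.2 - keeper territory converts far less
```

With real data every (zone, shot type, goal box) combination needs enough shots behind it for the conversion rate to be meaningful, which is exactly the sample size constraint that capped the goal at 6 boxes.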
On to those messy acronyms. For want of a better name, I’m going to call it Expected Goals Shot Placement, or XGSP for short. Let’s first take a look at whether XGSP-P90 correlates to GoalsP90.
A pretty strong correlation at 0.771, which is what you would expect: the better your shot placement, the more goals you should score.
Shot Placement Extra Goals

Now I’m going to introduce another pesky new acronym: SPEG, or Shot Placement Extra Goals, which is just the difference between expected goals (from on target shots, pre-shot, based on location, type of shot etc.) and expected goals from shot placement (post-shot, based on all of the variables in EXPGoals, with shot placement added in). I’ve shied away from calling it ‘finishing skill’, because for me it isn’t finishing skill: finishing skill incorporates a whole host of different skills, and shot placement is just one of those skills.
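So SPEG per 90 is just the post-shot value minus the pre-shot value, summed over a player's on-target shots and scaled to 90 minutes. A rough sketch, with hypothetical field names and numbers:

```python
def speg_per90(shots, minutes):
    """Shot Placement Extra Goals per 90: the expected goals a player
    added (or lost) purely through where they placed their on-target
    shots. Each shot carries a pre-shot expected goal value ("xg_pre")
    and a post-shot, placement-aware value ("xgsp")."""
    extra = sum(s["xgsp"] - s["xg_pre"] for s in shots)
    return extra * 90 / minutes

# Illustrative shots only - not real model output
shots = [
    {"xg_pre": 0.10, "xgsp": 0.35},  # found the corner: placement added value
    {"xg_pre": 0.30, "xgsp": 0.15},  # hit it straight at the keeper: value lost
    {"xg_pre": 0.05, "xgsp": 0.20},
]
print(speg_per90(shots, minutes=900))  # 0.025 extra expected goals per 90
```

A positive SPEG means the player's placement made his shots more likely to go in than their pre-shot quality suggested; a negative one means he wasted good chances by where he put them.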
So at a basic level, over a full season, we look at each shot a player takes, give it an expected goal value pre-shot, then give it an expected goal value post-shot based on shot placement. If a player can show that they have increased their probability of scoring just by their shot placement, doesn’t that show some skill at putting the ball in the back of the net? It should do, but we could run into the same problems as EXPGoalDiff and things like Shot Conversion %: they just aren’t very repeatable year on year. I ran two tests, firstly on the EPL alone (because I needed to test before I continued collecting data for other leagues) over the last two seasons, where R2=0.47, and then on the top 4 leagues (EPL, La Liga, Bundesliga, Serie A), where player x had >=10 shots in year N and year N+1, and here’s what I found.
An R2 of 0.427, while probably not a good result for any other type of metric, is significant enough when it comes to conversion/goal scoring. Certainly enough to warrant more investigation. Ideally I’d like to go back at least 5 seasons to test it, but still, there is some shot placement skill evident, and these are just my initial findings, so I haven’t had much time to digest the implications. I also decided to do some further visual tests to see if things are what they seem. As a side note, the huge outlier at 0.9 is Morata. I was wondering the same myself.
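The repeatability test itself is straightforward: pair each player's SPEG per 90 in season N with season N+1, filter on the shot minimum, and square the Pearson correlation. A sketch with invented players and numbers, purely to show the mechanics:

```python
def r_squared(xs, ys):
    """R^2 of a simple linear fit, i.e. the squared Pearson
    correlation between the two series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov * cov / (vx * vy)

def year_on_year(players, min_shots=10):
    """Pair season N and season N+1 SPEG per 90, keeping only players
    with at least min_shots shots in both seasons."""
    pairs = [
        (p["speg_y1"], p["speg_y2"])
        for p in players
        if p["shots_y1"] >= min_shots and p["shots_y2"] >= min_shots
    ]
    return r_squared([a for a, _ in pairs], [b for _, b in pairs])

# Fabricated example data - the last player fails the shot filter
players = [
    {"shots_y1": 40, "shots_y2": 35, "speg_y1": 0.10, "speg_y2": 0.12},
    {"shots_y1": 25, "shots_y2": 30, "speg_y1": 0.05, "speg_y2": 0.04},
    {"shots_y1": 60, "shots_y2": 55, "speg_y1": 0.20, "speg_y2": 0.15},
    {"shots_y1": 5,  "shots_y2": 50, "speg_y1": 0.90, "speg_y2": 0.01},
]
print(year_on_year(players))  # roughly 0.81 on this toy data
```

The shot minimum matters: small-sample players (like the filtered-out fourth one here) are mostly noise and would swamp any genuine signal.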
Visual Tests

If you follow me on Twitter you’ll know I like to post these scatter plots, which I call dashboards. I like the fact that they can show 4 or 5 different metrics at any one time. I mostly plot them with similar types of metric that give some context to all the metrics as a whole. Here I’ve plotted EXPGoalsP90 on the vertical axis and SPEGP90 (shot placement extra goals) on the horizontal. GPS, or expected goals per shot, is coloured, and GoalsP90 (output is also important!) is referenced by the size of the coloured circles.
Visually, SPEGP90 looks good, the players who you’d expect to do well are doing well. It’s encouraging that the likes of Messi, Ronaldo, Suarez, Dzeko, and Sturridge all appear above 2 standard deviations in both metrics for both seasons.
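Flagging who sits "above 2 standard deviations" is just a z-score over the per-90 values in the plot. A minimal sketch, with the values entirely made up for illustration:

```python
def z_scores(values):
    """Standardise a list of per-90 values: subtract the mean and
    divide by the (population) standard deviation. Players with a
    z-score above 2 are the dashboard's standouts."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]

# Nine ordinary players plus one outlier (hypothetical SPEGP90 values)
speg_p90 = [0.10] * 9 + [0.60]
zs = z_scores(speg_p90)
print(zs[-1])  # 3.0 - the outlier is three standard deviations above the mean
```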
Edge Case – Mertens

Ok, so that’s good. Let’s take a look at some edge cases (apologies, that’s the programmer in me coming out), or outliers, and see what we find. First of all, Dries Mertens. Colour-wise he’s in the blue-green range, which means on a per shot basis he has low expected goals, and on a per 90 basis he’s also going to be low. His shots per 90 are at 3.9, so that’s quite high. So lots of shots, but low value chances of converting, which usually means shots from outside the box. But his SPEG-P90 is above 2 standard deviations, which indicates that by way of his shot placement he’s increased his expected rate of scoring somewhat. Visually, let’s see what that looks like. First, his shot chart from last season. Remember, it’s heat map oriented: the hotter the shot, the higher the chance of converting, and vice versa for the colder shots. Larger dots represent goals; X’s represent headers.
Pretty much as expected based on his GPS and XGP90 on the scatter plot. Lots of shots from outside the box, with obviously low scoring probability. Next let’s take a look at his shot placement.
Before we even consider the numbers, visually, if you look at the sheer volume of low value chances on his shot chart above (blue shots), then compare that with his shot placement, it looks quite good. Only 5 of his 36 shots on target were placed down the centre. 26 of his on target shots had an expected goal value (pre-shot) of less than 0.06, yet after the shot was taken 30 of those on target shots had a SPEG value of greater than 0.089. So yeah, in this instance you could say his SPEG numbers match what is happening visually.
Edge Case – Destro

Lastly, let’s take a quick look at another outlier: Destro in the 12/13 season. He’s in the top left of the plot above. Here’s his shot chart.
GPS and EXPGoals both indicated high value chances, and it’s clear from his shot chart that most of his shots came from prime central positions in the danger zone. Only 2 of Destro’s shots came from outside the box, and 12 of his 22 shots on target had an expected goal value greater than 0.30. High quality chances indeed. But his SPEGP90 indicates he increased his expected probability of scoring by 0.167 per 90, whilst the average increase over the plot is 0.133. So he’s only slightly above average, which is not really that good. Let’s look at his shot placement chart.
Again, visually it seems clear that the reason his increase from expected goals (pre-shot) to SPEG (post-shot) is low is that he hit most of his shots low and central, which is really goalkeeper territory and has a much lower chance of being scored. It’s still early days, but it’s nice to know the model is working as it should, and that the numbers, for now, pan out visually.
Future Improvements

The inputs in the model probably won’t be improved much, as I can’t sub-divide the categories any further without running into sample size issues. Not to mention the enormous amount of work it would involve to tinker with it in that way. In fact, I’ve spent so much time on it that I’m fed up looking at the numbers at this stage. For now it will interest me just to use it for the coming season and see what I can learn and what its best application is. I have no formal statistical training or background, so this is a hobby, and a very time-consuming one at that. I’ll continue collecting the data and inputting it into the model for the coming season, but if it becomes too much of a burden I’ll have to stop.
In a visual sense, I would like to connect both shot charts and shot placements in the goal to show the increase before and after the shot has been taken. The holy grail would be in some sort of 3D environment, but that would take an awful lot of coding and again I’m not sure I have the time.
What I would like to do before the season starts is look at SPEG at a team level. I’m aware that shot placement is really an individual skill, but I think it might be interesting to discover what it says at a more macroscopic level. In particular, SPEG conceded, and maybe a SPEG total shot ratio. I’m not that hopeful of either being predictive, but nonetheless it will be fun to find out. I think.
Feedback welcome, as I got so caught up with this I might have missed something that’s just plain obvious.