And now, a few words about the Player Efficiency Rating, or PER.

As a statistics guy, I am generally wary of how statistics are used in sports. It's not that I don't believe in what I do; it's that I know where the numbers come from, so I know what they can say and what they can't. And it drives me a little batty to see some statisticians, people who I think should know better, put too much stock in their statistics, especially statistics that they had a hand in crafting.

Take, for instance, the PER, which has its roots in the Sabermetric movement in baseball and is the basketball equivalent of OPS (On-Base Percentage plus Slugging Average). Roughly speaking, we can divide all basketball statistics into two broad groups. One group consists of raw observables, such as steals, blocks, minutes played, three-pointers attempted and made, and so forth. PER does not fall into this category.

PER falls in the second category, aggregate statistics, which are combinations (often but not always linear combinations) of other statistics. As a way of accounting for all the various things that a player might do to help his team out, PER feeds a slew of raw observables into a formula that reduces them to a single number. There is no unique PER formula, but the most popular one was developed by John Hollinger. Its output is normalized, so that the league average is 15. Hollinger has developed a heuristic for judging players based on PER:

- A Year for the Ages: 35.0
- Runaway MVP Candidate: 30.0
- Strong MVP Candidate: 27.5
- Weak MVP Candidate: 25.0
- Bona Fide All-Star: 22.5
- Borderline All-Star: 20.0
- Solid 2nd Option: 18.0
- 3rd Banana: 16.5
- Pretty Good Player: 15.0
- In the Rotation: 13.0
- Scrounging for Minutes: 11.0
- Definitely Renting: 9.0
- On the Next Plane to Yakima: 5.0

Forget for the moment the bottom end of the ranking. By definition, the average player has a PER of 15, and since there are so many apparently average players in the NBA, there should be a lot of players around 15, and there are.

What about the top end? We would expect that there would be precious few players beyond a PER of 25 for any given season, and that turns out to be true. Doesn't that on its own mean that PER is a good measure of player performance?

On its own, no. The PER formula is not derived from first principles; it's an individual attempt to capture the effectiveness of a player, and as such is a carrier of the arbitrary priorities of the PER designer. One could also design PER to positively weight turnovers, missed shots, and personal fouls, and still have most of the players in the league around 15, and a precious few above 25. Only now it would be the very worst players who would show up at the top. That's an extreme example, of course (no one would actually design PER that way), but all that means is that the arbitrariness of PER is constrained, not eliminated.

To see what I mean, suppose for the sake of simplicity that we're only interested in capturing two raw observables: points and rebounds. Let's look at a few hypothetical players.

- Wade James: 30 points, 5 rebounds
- Howard Williams: 20 points, 14 rebounds
- Chris Bryant: 28 points, 8 rebounds

And suppose also that the league average for points is 10 and the league average for rebounds is 5. So one possible formula for PER would be points + rebounds. It's easy to see that the league average for this PER would be 10 + 5 = 15. By this measure, Wade James has a PER of 35, Howard Williams a PER of 34, and Chris Bryant a PER of 36. So Bryant has the highest PER. But it's close.

It's so close, in fact, that it's almost an incidental consequence of the way we designed PER. If we wanted a higher weighting for rebounds and a lower one for points, we could have another formula for PER: 0.5 × points + 2 × rebounds. In that case, the PERs would be 25 for James, 38 for Williams, and 30 for Bryant. Here, it's a runaway for Williams. Or, we could do the reverse, and make the formula 1.25 × points + 0.5 × rebounds. Then the PERs would be 40 for James, 32 for Williams, and 39 for Bryant, and James now has the highest PER. In all these cases, the league average PER is 15, and yet any of the three superstars could end up on top, depending on which PER formulation you choose.
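If you'd like to check that arithmetic, here is a minimal Python sketch of the toy example. To be clear, this is only the two-statistic hypothetical from this post, not Hollinger's actual formula, and the names (`PLAYERS`, `FORMULAS`, `toy_per`, `leader`) are ones I made up for illustration.

```python
# Toy "PER" from the hypothetical example: a weighted sum of points and rebounds.
# This is NOT Hollinger's formula; every name here is illustrative.

PLAYERS = {
    "Wade James": (30, 5),        # (points, rebounds)
    "Howard Williams": (20, 14),
    "Chris Bryant": (28, 8),
}
LEAGUE_AVERAGE = (10, 5)          # average points, average rebounds

# (points weight, rebounds weight) for the three formulas discussed in the post
FORMULAS = {
    "balanced": (1.0, 1.0),
    "rebound-heavy": (0.5, 2.0),
    "scoring-heavy": (1.25, 0.5),
}


def toy_per(stats, weights):
    """Weighted sum of a (points, rebounds) stat line."""
    points, rebounds = stats
    w_points, w_rebounds = weights
    return w_points * points + w_rebounds * rebounds


def leader(weights):
    """Name of the player with the highest toy PER under these weights."""
    return max(PLAYERS, key=lambda name: toy_per(PLAYERS[name], weights))


for label, weights in FORMULAS.items():
    scores = {name: toy_per(stats, weights) for name, stats in PLAYERS.items()}
    print(label, "| league average:", toy_per(LEAGUE_AVERAGE, weights))
    print("   scores:", scores, "| leader:", leader(weights))
```

All three formulas give the league-average stat line a 15, and each one puts a different player on top.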

There is, in mathematics, the notion of vector domination. In these terms, one player dominates another if none of his statistics are lower than the other's, and at least one is higher. For instance, 20 points and 6 rebounds dominates 14 points and 4 rebounds, and is in turn dominated by 25 points and 6 rebounds. None of them dominates, or is dominated by, 28 points and 3 rebounds. It can be shown that with any sensible definition of PER, in our limited context where we're only interested in points and rebounds, if one player dominates another, his PER is guaranteed to be higher. That's not surprising, since there should be no doubt that he's better, if we only care about points and rebounds.
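Here is a rough sketch of that idea in Python. I'm assuming that "sensible" means a linear formula with strictly positive weights; that's my simplification for the sketch, and the function names are made up. Under that assumption the guarantee is easy to see: the difference between the two PERs is a sum of weight × (stat difference) terms, none of which is negative and at least one of which is positive.

```python
def dominates(a, b):
    """True if stat line a dominates b: no stat lower, at least one higher."""
    pairs = list(zip(a, b))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


def weighted_sum(stats, weights):
    """A linear 'PER': each stat multiplied by its weight, then summed."""
    return sum(w * s for w, s in zip(weights, stats))


# The stat lines from the paragraph above, as (points, rebounds)
low, mid, high, scorer = (14, 4), (20, 6), (25, 6), (28, 3)

print(dominates(mid, low))    # 20 and 6 dominates 14 and 4
print(dominates(high, mid))   # 25 and 6 dominates 20 and 6
# 28 and 3 neither dominates nor is dominated by any of the other three
print(any(dominates(scorer, x) or dominates(x, scorer) for x in (low, mid, high)))

# Assumption: strictly positive weights. Then domination forces a higher score.
for weights in [(1.0, 1.0), (0.5, 2.0), (1.25, 0.5)]:
    assert weighted_sum(mid, weights) > weighted_sum(low, weights)
    assert weighted_sum(high, weights) > weighted_sum(mid, weights)
```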

Note that none of our three hypothetical players is dominated by any of the others. That's almost inevitable when you're comparing superstars. Because they're superstars, chances are good that each one does one thing better than all the rest, which means that no superstar can dominate another. Superstars will dominate the majority of players in the league, but not each other. As a result, one can define PER in such a way as to put almost any given superstar on top, and which one ends up on top says as much (if not more) about the PER designer's predilections for skills as it does about the top players.
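To make that concrete, here is a small Python sketch that sweeps through a family of toy PER formulas, all pinned to a league average of 15, and records who comes out on top. The setup is my own illustration: I assume a linear formula in points and rebounds with positive weights, and the function names and the step count of 29 are arbitrary.

```python
PLAYERS = {
    "Wade James": (30, 5),        # (points, rebounds)
    "Howard Williams": (20, 14),
    "Chris Bryant": (28, 8),
}


def dominates(a, b):
    """True if stat line a dominates b: no stat lower, at least one higher."""
    pairs = list(zip(a, b))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


def any_domination():
    """True if any of the three players dominates another."""
    return any(
        dominates(PLAYERS[x], PLAYERS[y])
        for x in PLAYERS for y in PLAYERS if x != y
    )


def leaders_by_weight(steps=29):
    """Sweep the points weight, choosing the rebounds weight so that the
    league-average line (10 points, 5 rebounds) always scores exactly 15."""
    result = {}
    for i in range(1, steps):
        w_points = 1.5 * i / steps             # keeps 0 < w_points < 1.5
        w_rebounds = 3.0 - 2.0 * w_points      # from 10*w_points + 5*w_rebounds = 15
        top = max(
            PLAYERS,
            key=lambda n: w_points * PLAYERS[n][0] + w_rebounds * PLAYERS[n][1],
        )
        result.setdefault(top, []).append(round(w_points, 3))
    return result


print("any domination among the three:", any_domination())
for name, weights in leaders_by_weight().items():
    print(name, "leads for points weights from", weights[0], "to", weights[-1])
```

Working through the algebra for this family, Williams leads when the points weight is below 0.9, Bryant when it is between 0.9 and 1.125, and James when it is above 1.125. Every one of those formulas has a league average of 15.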

The crazy thing is that PER is probably very good indeed for comparing journeyman players, and Hollinger routinely uses it for that. But most PER fans don't seem to be interested in that. They only want to compare the top players with PER, and as you've just seen if you've read this far, I think it's a pretty subjective way to do that. But most people associate statistics with objectivity, and people with subjectivity, with the end result that (a) fans of the player who ends up with the highest PER lord it over fans of the other stars, and (b) those fans of the other stars start hurling invective and accusations of bias at the PER designer (usually Hollinger). I can't count the number of times Hollinger has been called a Lakers hater just because Kobe Bryant doesn't end up with the highest PER.

To be fair, I think Hollinger brings some of that on himself, since he himself uses PER to compare the top players. Although I think he should know better than to do that, I don't really blame him; if I designed a PER, I'd probably use it for that, too.

Which is the very reason I've never been tempted to design a PER.

EDIT: Here's a graphical representation of the various PER formulas in our hypothetical scenario. Points are plotted along the vertical axis, and rebounds along the horizontal axis. The green lines represent "iso-PERs": lines along which the original (points + rebounds) PER is constant, at either 15, 25, or 35. Red lines represent the rebound-heavy iso-PERs, and blue lines represent the scoring-heavy iso-PERs.
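For anyone who wants to redraw those lines, here is a minimal Python sketch of the geometry. It only computes the lines rather than plotting them, and the function names are my own invention. An iso-PER line is the set of (rebounds, points) pairs where weight_points × points + weight_rebounds × rebounds equals a fixed level.

```python
# (points weight, rebounds weight); colors follow the figure description above
FORMULAS = {
    "green (points + rebounds)": (1.0, 1.0),
    "red (rebound-heavy)": (0.5, 2.0),
    "blue (scoring-heavy)": (1.25, 0.5),
}
LEVELS = (15, 25, 35)


def points_on_iso_per(rebounds, level, w_points, w_rebounds):
    """Points needed to hit `level` exactly, given a rebound total."""
    return (level - w_rebounds * rebounds) / w_points


def slope(w_points, w_rebounds):
    """Slope of an iso-PER line with points vertical and rebounds horizontal."""
    return -w_rebounds / w_points


for label, (w_points, w_rebounds) in FORMULAS.items():
    print(label, "| slope:", slope(w_points, w_rebounds))
    for level in LEVELS:
        at_zero = points_on_iso_per(0, level, w_points, w_rebounds)
        at_five = points_on_iso_per(5, level, w_points, w_rebounds)
        print("   PER", level, "| points at 0 rebounds:", at_zero,
              "| points at 5 rebounds:", at_five)
```

All three PER-15 lines pass through the league-average point (5 rebounds, 10 points). The red lines are the steepest, since a rebound trades for four points there, and the blue lines are the flattest.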

Take that Hollinger!!

Heh. I'm sure he doesn't give a darn what I think about his use of the PER.

As long as ESPN is still cutting the checks...

So if an effort is made to determine the relative importance of stats to winning, wouldn't that matter?

Relatedly, any thoughts on Dave Berri's approach?

@Westy: What you're suggesting is essentially the approach taken by Berri. It amounts to a PER (although I'm not *certain* it's linear), with the weights normalized to equate with wins, rather than to a league average of 15.

It's different in that the aggregate statistic is expressed in terms of something concrete--wins--rather than an arbitrary unit, as with Hollinger's PER. The problem still exists, though, that the weightings are underdetermined. There are lots of different weightings that equate to wins appropriately, and different weightings can still put different players at the top in win shares. They've chosen one that they believe works well, and I can respect that. Nonetheless there is still a degree of arbitrariness there.

More fundamentally, the problem with these aggregate statistics is that they don't incorporate enough information about the context: teammates, opponents, offensive or defensive approach, etc. They also rely largely on statistics that are somewhat outdated from a sports-o-metric perspective, with dubious reliability (especially with something like assists).

There's a kind of duality between PER- and WS-like statistics on one hand, and APM and related statistics on the other. Neither really shows the whole picture. I talk about that in my post on APM.

Ultimately (and this is something I was chatting about with Henry Abbott), you'll see something where the game is instrumented and encoded far more extensively than a human scorer can do--at least in real time. Analysis applications will be able to go through the database of plays and determine with increased fidelity what a given player actually contributed to a given play. You'll still have box scores as a concise way of expressing what happened in a game to humans (although some of the statistics might evolve), but computer analysis, for the purpose of teams trying to assess talent and design game strategies, will operate on a different plane. That day is coming for sure; it's just a question of when.

@wondahbap: Too cynical for me. :)