Tuesday, March 23, 2010

A Beginning, a Middle, and an End

One thing I alluded to in my previous post, but never made entirely explicit, is the notion that there are distinct phases to a basketball game (and indeed to many sports competitions), which we might call, by analogy to chess, the opening, the midgame, and the endgame. The difference between the opening and the midgame is pretty ill-defined. In my conception, it rests on the feeling that teams like to start games by trying out the various things they've worked on in practice, within a general framework. By the time they've gotten some way into the game (after the first set of substitutions, say), they have an idea of what's going to work in this game, and they put it into practice in earnest. As I say, it's not a clear-cut distinction, and we could argue endlessly (and, I think, pointlessly, though I'd be happy to be proved wrong) about where the exact division is.

But in my opinion, from a stats geek point of view, there is a clear-cut distinction between the midgame and the endgame. And the strategies are, empirically, different in the two parts of the game.

The whole objective of a basketball game (and of most games that involve points) is to outscore your opponent. And as basketball consists primarily of a sequence of alternating possessions, the goal should be to score more in each possession than your opponent does, by and large. That's why statistics such as points per possession are supplanting others like points per game, and rightly so. The former accounts for the fact that a game consists of a rather arbitrary but evenly matched number of possessions for each team, and the latter doesn't.

In fact, I'd argue that that objective—outscoring your opponent on a per-possession basis—is exactly the definition of the midgame. During this phase, which lasts for most of the game, you are trying to be as efficient as you can on the offensive end, while preventing your opponent from doing the same. Makes sense, doesn't it?

The question that you might be asking, though, is why this isn't your objective the entire game, why this is only the goal for the midgame. And the answer to that (you knew I had one coming, didn't you?) is that during practically any game, there comes a point where the actual scoring margin outweighs average efficiency.

Perhaps the simplest example is the decision about whether to shoot a two-point shot (a "deuce") or a three-point shot (a "trey"). Suppose the shooting percentage on the former is x percent, and on the latter is y percent. In the midgame, where all you're concerned about is the average number of points scored on the shot, you prefer the deuce if 2x > 3y, and you prefer the trey otherwise (ignoring offensive rebounding and the like, which a more extensive analysis would have to account for).
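
To make the midgame rule concrete, here is a minimal sketch in Python. The function names and the sample percentages are mine, chosen purely for illustration, and like the rule itself the sketch ignores offensive rebounds and fouls.

```python
def expected_points(make_prob: float, shot_value: int) -> float:
    """Average points from one shot, ignoring rebounds and fouls."""
    return make_prob * shot_value


def midgame_choice(x: float, y: float) -> str:
    """Pick the shot with the higher expected points.

    x is the make probability on the deuce, y on the trey (as fractions).
    """
    return "deuce" if expected_points(x, 2) > expected_points(y, 3) else "trey"


# Hypothetical shooter: 50% on twos, and either 30% or 36% on threes.
print(midgame_choice(0.50, 0.30))  # compares 2 * 0.50 with 3 * 0.30
print(midgame_choice(0.50, 0.36))  # compares 2 * 0.50 with 3 * 0.36
```

With these made-up numbers, the preference flips from the deuce to the trey once the three-point percentage climbs past two-thirds of the two-point percentage.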

In the endgame, however, it can be quite different. Suppose you're down two, and you have the ball with the shot clock off. You're going to hold for the final shot. The question is, what shot should that be?

If you shoot the deuce and you make it, you'll tie the game and go into overtime, where you'll win about half the time (studies apparently show that any seeming "skill" at winning overtime games is just a matter of small sample size). The winning probability is therefore x/2. On the other hand, if you shoot the trey and make it, you'll win the game outright, so the winning probability is y. So in this case, in the endgame, you prefer the deuce only if x > 2y (a strictly stronger condition than the midgame's x > 1.5y), and you prefer the trey otherwise. (And as the defensive team, you probably want to shift more of your attention to the three-point line than you would during the midgame.) The point of this little example is that your objective shifts, from efficiency in the midgame to winning probability in the endgame.
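
Here is a rough Python sketch of that shift in objective. The function name, the 50 percent overtime figure as a default argument, and the sample shooting percentages are all assumptions for illustration; the point is only that the same shooter can get different advice in the two phases.

```python
def endgame_win_prob(make_prob: float, shot_value: int,
                     overtime_win_prob: float = 0.5) -> float:
    """Win probability on a final shot when trailing by two.

    A made three wins outright; a made two only forces overtime.
    """
    if shot_value == 3:
        return make_prob
    return make_prob * overtime_win_prob


x, y = 0.50, 0.30  # hypothetical make probabilities on the deuce and the trey

# Midgame: compare expected points. Endgame: compare win probabilities.
midgame_pick = "deuce" if 2 * x > 3 * y else "trey"
endgame_pick = "deuce" if endgame_win_prob(x, 2) > endgame_win_prob(y, 3) else "trey"
print(midgame_pick, endgame_pick)
```

For this hypothetical shooter, the deuce is the more efficient shot, yet the trey gives the better chance of winning when down two on the final possession.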

The next question: When does this shift take place?

There's no one right answer, but I think one place to start is the point I mentioned in connection with a rule of thumb I came up with for determining when a game is mostly out of reach. (Not to put too fine a point on it, a fellow by the name of Bill James also came up with the same rule.) To first order, I think, that same epoch in the game is where the switch between midgame and endgame happens (or "ought" to happen). After that point, the team that's trailing tries tactics that are not the most efficient (and therefore wouldn't be used during the midgame) but nevertheless maximize its chances of winning the game; the team that's ahead plays to prevent its opponents from utilizing their preferred endgame tactics.

There's a bit of a catch, though, in that my rule (OK, Bill James's and my rule), strictly speaking, applies only to evenly matched teams. For the most part, that's not a stretch in the NBA, but you could imagine a game between an NBA team and a college team, even a very good college team. If both teams just try to be as efficient as they can, the NBA team will blow out the college team. In order to win, the college team would have to play their endgame practically from the opening jump, by employing some kind of gimmick, such as a non-stop trapping defense. Lest you think this is some kind of merely theoretical possibility, such a ploy has been tried in some circles, to some success.

And it likely has some statistical validity, for inferior teams can generally win only by introducing more chaos into the game (in the non-technical sense), which increases scoring variance. And there's no question gimmicks usually do that. Most of the time, they still won't work, but they'll give you a puncher's chance.
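
A back-of-the-envelope way to see the variance argument, sketched in Python. This is my own simplification, not anything measured: it assumes the final scoring margin is normally distributed, holds the underdog's expected deficit fixed, and uses made-up numbers for both the deficit and the spread.

```python
import math


def underdog_win_prob(expected_deficit: float, margin_sd: float) -> float:
    """P(final margin > 0) when the margin is Normal(-expected_deficit, margin_sd)."""
    z = -expected_deficit / margin_sd
    # Standard normal CDF evaluated at z, written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical underdog expected to lose by 15 points, with increasing chaos.
for sd in (10.0, 15.0, 20.0):
    print(sd, round(underdog_win_prob(15.0, sd), 3))
```

Under these assumptions the underdog's chances rise as the spread widens, though they stay well below even. A real gimmick might also change the expected deficit, which this sketch leaves out.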

What's the point, in the end? As a kind of pie-in-the-sky proposal, since the objectives in the various phases are different, analyze them differently. Collect or synthesize different statistics for them. And maybe, as a result, you learn something new about why some teams can finish, and others can't.

Thursday, March 11, 2010

Unifying Statistics

As a sometime scientist, I love to unify things—that is, discover that two things that look completely different are actually intimately related at some abstract level. Without unification, science is largely stamp collecting, to paraphrase Ernest Rutherford. (Actually, he said that all science is either physics or stamp collecting, but I like to think that by "physics," he really meant unification, so it's all the same.)

The state of basketball statistics is one of substantial disunion. The box score is a hodgepodge of parameters with little or nothing tying them together. Points, rebounds, assists, steals, blocks, turnovers, fouls, etc.: These all clearly have some role to play in a team's overall goal—to outscore its opponent—but comparing one to another is impossible from those statistics alone. It would be useful if all of these aspects of performance could be put on equal footing. That would enable a proper assessment of the relative importance of the box score statistics.

Maybe, even, it would enable something else: That "equal footing" might just be able to stand on its own two feet as an independent statistic.

This thought grew out of a couple of recent posts I found on ESPN's TrueHoop blog. One was Henry Abbott's take on Kobe Bryant's crunch-time performance, which by subjective standards has been through the roof this year, and (one would think) well above average in any year, given his long history of hitting game winners. By most objective quantifiers thus far, however, Kobe is human: a good, but by no means great, clutch player. Abbott has a fair point to make against these quantifiers: Kobe's pedestrian shooting percentage at the ends of games might not indicate substandard crunch-time shooting, but rather that his skill allows him to fight his way to shots that lesser players would never even be able to take. The same shots that lower his endgame shooting percentage (but which give his team a puncher's chance to win) are ones that never end up in the box score at all for other players.



Abbott's solution to this statistical problem is to find video of any situation where big-time players have the ball in crunch time, whether they hit, miss, or even fail to get a shot off at all, and watch it all. That certainly would give a better visceral idea of how stars perform at the ends of games, but it doesn't quite help in quantifying endgame performance.

The second post was an examination on Hardwood Paroxysm of a new way to view assists. In the box score, all assists are created equal, whether they lead to a highly contested three that just happened to swish through, or to an automatic, wide open dunk. Tom Haberstroh's suggestion is to weight those assists based on the expected scoring from the shot. So an assist to a dunk that scores 60 percent of the time would be worth 1.2, while one to a long deuce that scores 40 percent of the time would be worth 0.8, and one that goes to a wide open trey that scores 35 percent of the time would be worth 1.05. And so on.
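
The weighting is just the expected points of the shot the pass sets up. A minimal Python sketch, using the percentages from the examples above (the function name and the dictionary layout are mine):

```python
def weighted_assist(make_prob: float, shot_value: int) -> float:
    """Assist credit equal to the expected points of the shot it sets up."""
    return make_prob * shot_value


# Shot types and make percentages from the examples in the text.
shots = {
    "dunk": (0.60, 2),
    "long deuce": (0.40, 2),
    "open trey": (0.35, 3),
}
for name, (make_prob, value) in shots.items():
    print(f"{name}: {weighted_assist(make_prob, value):.2f}")
```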

My immediate thought on this proposal was that it sort of leaves unsuccessful attempted assists out in the cold. Suppose Chris Paul puts the ball on a dime to David West at the rim ten times throughout the course of a game, and West scores four times on those passes. (We'll assume for the sake of simplicity that he never gets fouled on these.) By the traditional count, CP3 gets 4 assists. By Haberstroh's count, he gets 4 times 1.2, or 4.8 adjusted assists. He gets a boost for having made West's job easier; West just didn't make very many of the shots. But why should Paul get penalized for West's misses? There was, plausibly, no real difference between the passes that led to scores and the ones that led to misses. Shouldn't they all count the same?
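
To put numbers on the objection, here is a small Python sketch of three ways to count that game. The first two are the counts described above. The third, crediting every pass at the same weight whether or not the shot falls, is only one possible reading of "count the same," and the 60 percent make rate is the assumption behind the 1.2 weight.

```python
passes = 10           # Paul's passes to West at the rim
makes = 4             # shots West converted
rim_make_prob = 0.6   # assumed make rate behind the 1.2 weight
shot_value = 2

weight = rim_make_prob * shot_value    # expected points per pass
traditional_assists = makes            # counts only converted passes
haberstroh_assists = makes * weight    # weights only converted passes
per_pass_credit = passes * weight      # hypothetical: credits every pass

print(traditional_assists, round(haberstroh_assists, 1), round(per_pass_credit, 1))
```

The gap between the last two numbers is the credit that depends entirely on whether West finishes, not on anything Paul did differently.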

My not-so-immediate thought was that one could unify all this by putting it on a consistent statistical foundation. The foundation? Expected scoring at the beginning of any usage, where a usage is the period of time during which the ball is in a player's possession. Put aside, for the moment, all notions of personal points, assists, rebounds, etc. Define a usage to start when a player gains possession of the ball. He can optionally dribble it for some period of time. The usage ends when he releases the ball, whether by a shot (which either goes in or doesn't, and a miss ends up in either the offense's or the defense's hands), a pass to a teammate, or a turnover. There are some interesting corner cases to deal with, but let's ignore those for the sake of discussion.

The statistic I'm proposing is this: How many points can the team expect to score on this possession when a player starts his usage, and how many when he ends it? The difference between those two is a measure of his offensive value for that usage.

Example: Chris Paul dribbles the ball up court, with everybody already set in a halfcourt set. In this scenario, the Hornets score, let's say, 0.8 points per possession on average. (Lower than their typical points per possession because all the high-value transition points are eliminated.) He dribbles around, and locates David West open underneath the basket, and gets the ball to him, whereupon the Hornets' expected scoring at this juncture is 1.5 points. (Not exactly 2.0 because maybe he geeks the dunk, gets fouled, or whatever.) Let's suppose West actually does score the basket. The ledger for this possession is as follows:

Initial expected scoring: 0.8
Increment by Chris Paul: +0.7
Increment by David West: +0.5
Actual score: 2.0

Let's take another, somewhat more complicated case. Jason Williams comes up the floor in semi-transition. The Magic's expected score in this situation is, let's say, 1.1 points per possession. He dribbles around for a few seconds, however, and doesn't locate anything easy, so he pulls the ball back out and passes it to Vince Carter on the left wing with 16 seconds left on the shot clock. Williams hasn't done anything terribly negative with the ball (no turnover), but he hasn't broken anyone down, and in the meantime he's frittered away 8 seconds, and that lowers the expected score for the possession to 0.7 points. Vince shot fakes a few times, then takes it toward the baseline, drawing a few defenders to him, and then passes to Dwight Howard in the lane. Doing so increases the Magic's expected score up to 1.2 points. Howard dribbles left, fakes, goes back to his right, then tosses up a right hand hook that bounces off the rim and is rebounded by the other team. Final score on this possession is, of course, 0.0 points. So the ledger looks like this:

Initial expected scoring: 1.1
Increment by Jason Williams: -0.4
Increment by Vince Carter: +0.5
Increment by Dwight Howard: -1.2
Actual score: 0.0
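
Both ledgers follow from the same bookkeeping, which can be sketched in a few lines of Python. The function name and the input format are hypothetical; the expected-scoring values are the made-up ones from the two examples, and where those values would come from in practice is left open.

```python
def usage_ledger(initial_expected, usages):
    """Return (player, increment) pairs for one possession.

    usages is a list of (player, expected points when his usage ends).
    For the last player, that value is the actual score on the possession.
    """
    ledger = []
    previous = initial_expected
    for player, value_at_end in usages:
        # Each player is credited with the change he produced.
        ledger.append((player, round(value_at_end - previous, 2)))
        previous = value_at_end
    return ledger


hornets = usage_ledger(0.8, [("Chris Paul", 1.5), ("David West", 2.0)])
magic = usage_ledger(1.1, [("Jason Williams", 0.7),
                           ("Vince Carter", 1.2),
                           ("Dwight Howard", 0.0)])
for player, increment in hornets + magic:
    print(f"Increment by {player}: {increment:+.1f}")
```

By construction, the initial expected scoring plus all the increments adds up to the actual score, so every point on the possession is accounted for.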

On average, the initial expected scoring equals the actual score, so the typical player would score an average increment of 0.0. (For instance, suppose that 60 percent of the time, Howard makes that shot and scores an increment of 0.8; then, 40 percent of the time, he misses it and scores an increment of -1.2. Those two balance each other out exactly.) Higher is better, naturally, and lower is worse. This approach dispenses with the coarse categorization of basketball actions into scores, turnovers, assists, rebounds, and non-box-score actions, and assesses every single usage in terms of its contribution to the final score. I think it would be much more representative of everybody's activity. (One thing that is left out: screens.) One could also rate defense this way, to a certain extent, although zone defenses and double teams definitely make things challenging.
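
The balancing act in that parenthetical is easy to check. A quick Python sketch, using the Howard numbers above and assuming (as the example does) that a miss is worth nothing further to the offense:

```python
make_prob = 0.6        # Howard's assumed make rate on this shot
expected_before = 1.2  # expected points when Howard receives the ball

increment_on_make = 2.0 - expected_before  # about +0.8
increment_on_miss = 0.0 - expected_before  # -1.2

# Weight each outcome by how often it happens.
average_increment = (make_prob * increment_on_make
                     + (1 - make_prob) * increment_on_miss)
print(abs(average_increment) < 1e-9)
```

The average comes out to zero (up to floating-point rounding) precisely because the 1.2 expected points is the 60 percent make rate times two points. A player who converts that look more often than 60 percent would average a positive increment.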

The drawback is that it's tremendously more work to encode all this information about the game. But diagnostically it might be worth it for teams to pay someone to do it; if you could figure out what a player is doing when his increment is 0.4 lower than average, that'd be very useful information. One benefit to this approach is that it only cares about what happens when the ball changes hands. Whatever a player does throughout his usage can be discarded as far as this statistic is concerned, so that would reduce the burden of encoding information.

The application to crunch-time shooting? I think it's pretty obvious. You've got 3.4 seconds left, down two, inbounding the ball 40 feet from the basket. In this case, you're in the endgame, not the midgame, so your objective is not to maximize scoring, but to maximize chance of winning. (A two-pointer is better than a three-pointer in midgame if it succeeds more than one and a half times as often, but it's only better in a two-point endgame if it succeeds more than twice as often.) When you start this possession, your probability of winning is, let's say, 0.15. You get the ball, and you can the trey. Your actual winning probability is 1.0 (you won the game). Your win increment is therefore +0.85. If you had missed it, it would have been -0.15. So, when the situation looks dire, success is rewarded much more than failure is penalized.

Now, on the other hand, suppose you went for the deuce. If you miss it, the winning probability still goes to 0.0 and the increment is -0.15, but if you make it, the increment is only +0.35 (assuming you have a 50 percent chance of winning in OT). You've improved matters significantly, but you still haven't won the game. By this analysis, the cold-blooded assassin quality that Kobe Bryant supposedly personifies is not only bravado, but potentially sound tactical thinking, and this aspect would be captured by compiling expected win increments.
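
The same bookkeeping works with win probabilities in place of points. A minimal Python sketch of the down-two scenario, using the 0.15 starting probability and the 50 percent overtime assumption from above (the function and variable names are mine):

```python
def win_increment(prob_before: float, prob_after: float) -> float:
    """Change in win probability produced by one play."""
    return prob_after - prob_before


start = 0.15             # assumed win probability when the possession begins
overtime_win_prob = 0.5  # assumed chance of winning in overtime

outcomes = {
    "trey made": 1.0,                 # wins the game outright
    "deuce made": overtime_win_prob,  # only forces overtime
    "either shot missed": 0.0,        # game over
}
for label, prob_after in outcomes.items():
    print(f"{label}: {win_increment(start, prob_after):+.2f}")
```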


You could even go so far as to assess the impact on winning the title (much as Hollinger's playoff calculator does). By that metric, LeBron's fadeaway three against Hedo Turkoglu in Game 2 of last season's ECF was an absolute monster. Assuming that the Cavaliers would have been even money against the Lakers in the NBA Finals, that shot (which took the Cavaliers' chances of winning that game from at best 0.1 to 1.0) was worth in the neighborhood of 0.1 to 0.2 of a title, an incredible value for a pre-Finals make. The fact that the Cavaliers did not go on to even make the Finals is immaterial in this valuation, as it couldn't have been known at the time. On the other side of the balance sheet would be Frank Selvy's miss at the end of regulation in Game 7 of the 1962 Finals, which ended up being worth an increment of about -0.2 or -0.3 of a title, as instead of winning the title outright on the shot, the Lakers had to go on to play OT, where they eventually lost.