Friday, June 4, 2010

Say It, You Know You Want To

I think it's safe to say the time has arrived.


You should be able to get a somewhat larger version by clicking on the above image.

Friday, April 30, 2010

Bending Over Backwards

One of my favorite science bedtime stories (didn't you have those when you were a kid? or now, if you're still one?) involves the French physicist Prosper-René Blondlot (1849-1930), whose principal claim to fame, sadly, was a non-discovery.

In this particular case, Blondlot was working in his laboratory in the wake of a flush of discoveries concerning radioactivity and X-rays. Apparently, he was trying to polarize X-rays (a tricky task owing to their high frequency and short wavelength), and as part of his attempt he placed a spark gap in front of an X-ray beam. After a few experiments with this set-up, it seemed to him that the spark was brighter when the beam was on than when it was off.

He attributed this to a new form of radiation, which he called N-rays after his home town and university of Nancy. He may have been influenced by all the work on radioactivity and X-rays then going on, but at any rate, he set about immediately to investigate attributes of the new radiation. It appeared, he said, to be emanated by many objects, including the human body. It was refracted by prisms made from various metals, although these had to be specially treated in order to prevent them from radiating N-rays themselves.

It was all very interesting, and for some time, there was a burst of scientific activity on N-rays. The problem was, the N-rays themselves were very shy and retiring, and many physicists had trouble reproducing the results obtained by Blondlot and his staff. But Blondlot always maintained either that they had inferior equipment, or inferior perception.

You see, there was no objective recording of N-rays. All one had was a subtle brightening of a spark, which Blondlot and his colleagues were already prepared to see. To lend at least some notion of objectivity to the research, Blondlot took photographs of the sparks and other N-ray phenomena, but this merely replaced subjective judgment of a live spark with subjective judgment of a photograph. Means for measuring the light output were not yet sufficiently reliable or accurate to resolve the matter.

What did resolve the matter in the end was a visit to Nancy by the American physicist Robert Wood (1868-1955). Wood had tried himself to detect N-rays and had failed signally. Frustrated at his wasted efforts, and curious as to the differences between Blondlot's staff and equipment and his own, he travelled across the ocean to see for himself.

Wood had by this time in his career established himself as something of a debunker, a sort of turn-of-the-century James Randi. But Blondlot was no charlatan; on the contrary, he was firmly convinced of his own discovery. So he had no misgivings about demonstrating his N-rays before Wood and others. He darkened the laboratory (the better to see the increase in brightness). He set his aluminum prism on a platform to refract the N-rays, made some measurements, rotated the platform a bit, made some more measurements, and so on, all the while casually detecting the N-rays. For his own part, Wood could see nothing of what Blondlot was describing. But he kept quiet, waiting for the experiments to conclude.

When they did, and the lights were turned back on, there was general astonishment, for despite all the careful measurements on the refraction of N-rays, there was no aluminum prism sitting on the platform. Wood had, it turned out, pocketed the prism early on in the experiment. The entire time, Blondlot and his staff had been obtaining gradually changing measurements of an unchanging experimental set-up. That spelled the end, for all intents and purposes, of N-rays.

What happened? Intentional deception can be ruled out rather easily, since Blondlot would have known that careful experimentation would eventually disprove N-rays; it would have been a most temporary fame. Nor was he a shoddy scientist. Before the N-ray affair, he was known for having measured both the speed of light and the speed of electricity through wires, a task that had stymied others, and which established that the two were very close (though not quite the same).

Consensus today is that Blondlot had simply wanted to believe in N-rays, expected and wanted to see the predicted brightening, so much that he really did see it, sincerely. It has been suggested that he may have been motivated by nationalism; X-rays were discovered by the German physicist Wilhelm Roentgen, and Germany had recently taken a sizable chunk of France, so that Nancy was now uncomfortably close to the French-German border. But my own feeling is that it almost doesn't matter. At some point, the desire to see his discovery of N-rays vindicated became its own driving force.

The N-ray affair is often cited in support of what is, in my opinion, a central—perhaps even the central—insight of scientific discovery: The easiest person to fool is yourself. And fooling yourself is a necessary prelude to fooling others; charlatanry would have been easier to expose. Exhibit A in support of this position is the sad fact that although N-rays essentially died a hard death in 1904, Blondlot lived on for another quarter century, continued to be productive in science, and took his belief in the existence of N-rays to his death.


It is because it is so easy to fool oneself that science is, and must be, an essentially social activity. It is often said that in science, experimental data carries the day. That's overstating it a bit. Experimental data is indeed necessary for science to progress, but that data means little without scientific theory to organize it (and vice versa). It's not that the data is more important than the theory, but that the data validates the theory, making it harder to fool yourself or anyone else. And there's a strong social pressure, within the scientific community, for one to bend over backwards in an attempt to subject one's theories to as much scrutiny as possible. It's that intense examination, which eliminates many theories but marks the ones that survive with an imprimatur of robustness, that distinguishes science from so many other human activities (ahem, politics?) and has made it one of the most successful endeavors of all.

Wednesday, April 14, 2010

Back, Slash, Back!

Remember this post? Probably not, but now I have some company and/or vindication. Check out this xkcd comic, drawn by the redoubtable Randall Munroe.


Observe: Friends don't let friends say "backslash" in their URLs.

Tuesday, March 23, 2010

A Beginning, a Middle, and an End

One thing I alluded to in my previous post, but never made entirely explicit, is the notion that there are distinct phases to a basketball game (and indeed to many sports competitions), which we might call—by analogy to chess—the opening, the midgame, and the endgame. The difference between the opening and the midgame is pretty ill-defined, and in my conception is based on the feeling that teams like to start games by trying out the various things they've worked on in practice, but within a general framework, and by the time they've gotten some ways into the game (after the first set of substitutions, say), they've got an idea for what's going to work in this game, and put it into practice in earnest. As I say, it's not a clear-cut distinction and we could argue endlessly (and, I think, pointlessly, though I'd be happy to be proved wrong) about where the exact division is.

But in my opinion, from a stats geek point of view, there is a clear-cut distinction between the midgame and the endgame. And the strategies are, empirically, different in the two parts of the game.

The whole objective of a basketball game (and in most games that involve points) is to outscore your opponent. And as basketball consists primarily of a sequence of alternating possessions, the goal should be to score more in each possession than your opponent does, by and large. That's why statistics such as points per possession are supplanting others like points per game, and rightly so. The former accounts for the fact that a game consists of a rather arbitrary but roughly equal number of possessions for each team, and the latter doesn't.

In fact, I'd argue that that objective—outscoring your opponent on a per-possession basis—is exactly the definition of the midgame. During this phase, which lasts for most of the game, you are trying to be as efficient as you can on the offensive end, while preventing your opponent from doing the same. Makes sense, doesn't it?

The question that you might be asking, though, is why this isn't your objective the entire game, why this is only the goal for the midgame. And the answer to that (you knew I had one coming, didn't you?) is that during practically any game, there comes a point where the actual scoring margin outweighs average efficiency.

Perhaps the simplest example is the decision whether to shoot a two-point shot (a "deuce") or a three-point shot (a "trey"). Suppose the shooting percentage on the former is x percent, and on the latter is y percent. In the midgame, where all you're concerned about is the average number of points scored on the shot, you prefer the deuce if 2x > 3y, and you prefer the trey otherwise (ignoring offensive rebounding and the like, which we shouldn't do in a more extensive example).

In the endgame, however, it can be quite different. Suppose you're down two, and you have the ball with the shot clock off. You're going to hold for the final shot. The question is, what shot should that be?

If you shoot the deuce and you make it, you'll tie the game and go into overtime, where you'll win about half the time (studies suggest that any apparent "skill" at winning overtime games is just a matter of small sample size). The winning probability is therefore x/2. On the other hand, if you shoot the trey and make it, you'll win the game outright, with probability y. So in this case, in the endgame, you prefer the deuce only if x > 2y (a strictly stronger condition than in the midgame), and you prefer the trey otherwise. (And as the defensive team, you probably want to shift more of your attention to the three-point line than you would during the midgame.) The point of this little example is that your objective is shifted, from efficiency in the midgame, to winning probability in the endgame.
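The two decision rules above can be captured in a couple of lines. This is just a sketch of the arithmetic in the text; the shooting percentages below are made-up numbers chosen to show that the midgame and endgame rules can disagree:

```python
# Shot selection under the two objectives described above.
# x = make probability of the deuce, y = make probability of the trey.

def prefer_deuce_midgame(x, y):
    """Midgame: maximize expected points, so take the deuce iff 2x > 3y."""
    return 2 * x > 3 * y

def prefer_deuce_endgame(x, y, ot_win=0.5):
    """Down two, last shot: a made deuce forces OT (win prob x * ot_win),
    a made trey wins outright (win prob y). Take the deuce iff x * ot_win > y."""
    return x * ot_win > y

# A 60% deuce vs. a 35% trey: the deuce is the better midgame shot
# (1.20 expected points vs. 1.05), but down two with the clock running
# out, the trey wins more often (0.35 vs. 0.30).
print(prefer_deuce_midgame(0.60, 0.35))  # True
print(prefer_deuce_endgame(0.60, 0.35))  # False
```

The crossover region is exactly the text's point: any deuce whose percentage falls between 1.5y and 2y is the right midgame shot and the wrong endgame shot.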

The next question: When does this shift take place?

There's no one right answer, but I think one place to start is one I mentioned in connection with a rule of thumb I came up with for determining when a game is mostly out of reach. (Not to put too fine a point on it, a fellow by the name of Bill James also came up with the same rule.) To first order, I think, that same epoch in the game is where the switch between midgame and endgame happens (or "ought" to happen). After that point, the team that's trailing tries tactics that are not the most efficient (and therefore wouldn't be used during the midgame) but nevertheless maximize one's chances of winning the game; the team that's ahead plays to prevent their opponents from utilizing their preferred endgame tactics.

There's a bit of a catch, though, in that my rule (OK, Bill James's and my rule), strictly speaking, applies only to evenly matched teams. For the most part, that's not a stretch in the NBA, but you could imagine a game between an NBA team and a college team, even a very good college team. If both teams just try to be as efficient as they can, the NBA team will blow out the college team. In order to win, the college team would have to play their endgame practically from the opening jump, by employing some kind of gimmick, such as a non-stop trapping defense. Lest you think this is some kind of merely theoretical possibility, such a ploy has been tried in some circles, to some success.

And it likely has some statistical validity, for inferior teams can generally win only by introducing more chaos into the game (in the non-technical sense), which increases scoring variance. And there's no question gimmicks usually do that. Most of the time, they still won't work, but they'll give you a puncher's chance.

What's the point, in the end? As a kind of pie-in-the-sky proposal, since the objectives in the various phases are different, analyze them differently. Collect or synthesize different statistics for them. And maybe, as a result, you learn something new about why some teams can finish, and others can't.

Thursday, March 11, 2010

Unifying Statistics

As a sometime scientist, I love to unify things—that is, discover that two things that look completely different are actually intimately related at some abstract level. Without unification, science is largely stamp collecting, to paraphrase Ernest Rutherford. (Actually, he said that all science is either physics or stamp collecting, but I like to think that by "physics," he really meant unification, so it's all the same.)

The state of basketball statistics is one of substantial disunion. The box score is a hodgepodge of parameters with little or nothing tying them together. Points, rebounds, assists, steals, blocks, turnovers, fouls, etc.: These all clearly have some role to play in a team's overall goal—to outscore its opponent—but comparing one to another is impossible from those statistics alone. It would be useful if all of these aspects of performance could be put on equal footing. That would enable a proper assessment of the relative importance of the box score statistics.

Maybe, even, it would enable something else: That "equal footing" might just be able to stand on its own two feet as an independent statistic.

This thought grew out of a couple of recent posts I found on ESPN's TrueHoop blog. One was Henry Abbott's take on Kobe Bryant's crunch-time performance, which by subjective standards has been through the roof this year, but certainly (one would think) well above the average in any year, given his long history of hitting game winners. By most objective quantifiers thus far, however, Kobe is human—a good, but by no means great, clutch player. Abbott has a fair point to make against these quantifiers: His pedestrian shooting percentage at the ends of games might not be an indicator of substandard crunch-time shooting, but rather a sign that his skill allows him to fight his way to shots that lesser players would never even be able to take. The same shots that lower his endgame shooting percentage (but which give his team a puncher's chance to win) are ones that never end up in the box score at all for other players.



Abbott's solution to this statistical problem is to find video of any situation where big-time players have the ball in crunch time, whether they hit, miss, or even fail to get a shot off at all, and watch it all. That certainly would give a better visceral idea of how stars perform at the ends of games, but it doesn't quite help in quantifying endgame performance.

The second post was an examination on Hardwood Paroxysm of a new way to view assists. In the box score, all assists are created equal, whether they lead to a highly contested three that just happened to swish through, or to an automatic, wide open dunk. Tom Haberstroh's suggestion is to weight those assists based on the expected scoring from the shot. So an assist to a dunk that scores 60 percent of the time would be worth 1.2, while one to a long deuce that scores 40 percent of the time would be worth 0.8, and one that goes to a wide open trey that scores 35 percent of the time would be worth 1.05. And so on.
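Haberstroh's weighting is just expected points per assisted shot. Here's a minimal sketch of that arithmetic, using the hypothetical shot types and percentages from the paragraph above:

```python
# Weighted assists a la Haberstroh: an assist counts as the expected
# points of the shot it creates, rather than a flat 1.

def assist_weight(shot_value, make_pct):
    """Expected points of the assisted shot."""
    return shot_value * make_pct

for shot, value, pct in [("dunk", 2, 0.60),
                         ("long deuce", 2, 0.40),
                         ("open trey", 3, 0.35)]:
    print(shot, round(assist_weight(value, pct), 2))
```

This reproduces the 1.2 / 0.8 / 1.05 weights in the text.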

My immediate thought on this proposal was that it sort of leaves unsuccessful attempted assists out in the cold. Suppose Chris Paul puts the ball on a dime to David West at the rim ten times throughout the course of a game, and West scores four times on those passes. (We'll assume for the sake of simplicity that he never gets fouled on these.) By the traditional count, CP3 gets 4 assists. By Haberstroh's count, he gets 4 times 1.2, or 4.8 adjusted assists. He gets a boost for having made West's job easier; West just didn't make very many of them. But why should Paul get penalized for West's misses? There was, plausibly, no real difference between the passes that led to scores and the ones that led to misses. Shouldn't they all count the same?

My not-so-immediate thought was that one could unify all this by putting it on a consistent statistical foundation. The foundation? Expected scoring at the beginning of any usage, where a usage is the period of time during which the ball is in a player's possession. Put aside, for the moment, all notions of personal points, assists, rebounds, etc. Define a usage to start when a player gains possession of a ball. He can optionally dribble it for some period of time. That usage ends when he releases the ball, which is either a shot (it goes in or it doesn't, and a miss ends in either a defensive or an offensive rebound), a pass to a teammate, or a turnover. There are some interesting corner cases to deal with, but let's ignore that for the sake of discussion.

The statistic I'm proposing is, what is the expected points scored on this possession when a player starts his usage, and what is the expected points scored on the possession when he ends it? The difference between those two is a measure of his offensive value for that usage.

Example: Chris Paul dribbles the ball up court, with everybody already set in a halfcourt stance. In this scenario, the Hornets score, let's say, 0.8 points per possession on average. (Lower than their typical points per possession because all the high-value transition points are eliminated.) He dribbles around, and locates David West open underneath the basket, and gets the ball to him, whereupon the Hornets' expected scoring at this juncture is 1.5 points. (Not exactly 2.0 because maybe he geeks the dunk, gets fouled, or whatever.) Let's suppose West actually does score the basket. The ledger for this possession is as follows:

Initial expected scoring: 0.8
Increment by Chris Paul: +0.7
Increment by David West: +0.5
Actual score: 2.0

Let's take another, somewhat more complicated case. Jason Williams comes up the floor in semi-transition. The Magic's expected score in this situation is, let's say, 1.1 points per possession. He dribbles around for a few seconds, however, and doesn't locate anything easy, so he pulls the ball back out and passes it to Vince Carter on the left wing with 16 seconds left on the shot clock. Williams hasn't done anything terribly negative with the ball (no turnover), but he hasn't broken anyone down, and in the meantime he's frittered away 8 seconds, and that lowers the expected score for the possession to 0.7 points. Vince shot fakes a few times, then takes it toward the baseline, drawing a few defenders to him, and then passes to Dwight Howard in the lane. Doing so increases the Magic's expected score up to 1.2 points. Howard dribbles left, fakes, goes back to his right, then tosses up a right hand hook that bounces off the rim and is rebounded by the other team. Final score on this possession is, of course, 0.0 points. So the ledger looks like this:

Initial expected scoring: 1.1
Increment by Jason Williams: -0.4
Increment by Vince Carter: +0.5
Increment by Dwight Howard: -1.2
Actual score: 0.0

On average, the initial expected scoring equals the actual score, so the typical player would score an average increment of 0.0. (For instance, suppose that 60 percent of the time, Howard makes that shot and scores an increment of 0.8; then, 40 percent of the time, he misses it and scores an increment of -1.2. Those two balance each other out exactly.) Higher is better, naturally, and lower is worse. This approach dispenses with the coarse categorization of basketball actions into scores, turnovers, assists, rebounds, and non-box-score actions, and assesses every single usage in terms of its contribution to the final score. I think it would be much more representative of everybody's activity. (One thing that is left out: screens.) One could also rate defense this way, to a certain extent, although zone defenses and double teams definitely make things challenging.
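The ledger bookkeeping is simple enough to sketch. This is not anyone's official statistic, just the accounting described above: a possession is an initial expectation plus a checkpoint at the end of each player's usage, with the last checkpoint being the actual score. The names and numbers are the hypothetical Magic possession from the example:

```python
# Per-usage increments: each player's contribution is the change in the
# possession's expected points over his usage.

def usage_increments(initial, usages):
    """usages: list of (player, expected points on the possession when his
    usage ends); the last entry's value is the actual score."""
    ledger, prev = [], initial
    for player, after in usages:
        ledger.append((player, round(after - prev, 2)))
        prev = after
    return ledger

ledger = usage_increments(1.1, [("Jason Williams", 0.7),
                                ("Vince Carter", 1.2),
                                ("Dwight Howard", 0.0)])
for player, inc in ledger:
    print(player, inc)
```

Note that the increments telescope: they necessarily sum to the actual score minus the initial expectation, so nothing is double-counted.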

The drawback is that it's tremendously more work to encode all this information about the game. But diagnostically it might be worth it for teams to pay someone to do it; if you could figure out what a player is doing when his increment is 0.4 lower than average, that'd be very useful information. One benefit to this approach is that it only cares about what happens when the ball changes hands. Whatever a player does throughout his usage can be discarded as far as this statistic is concerned, so that would reduce the burden of encoding information.

The application to crunch-time shooting? I think it's pretty obvious. You've got 3.4 seconds left, down two, inbounding the ball 40 feet from the basket. In this case, you're in the endgame, not the midgame, so your objective is not to maximize scoring, but to maximize chance of winning. (A two-pointer is better than a three-pointer in midgame if it succeeds more than one and a half times as often, but it's only better in a two-point endgame if it succeeds about twice as often.) When you start this possession, your probability of winning is, let's say, 0.15. You get the ball, and you can the trey. Your actual winning probability is 1.0 (you won the game). Your win increment is therefore +0.85. If you had missed it, it'd have been -0.15. So, when the situation looks dire, success is rewarded much more than failure is penalized.

Now, on the other hand, suppose you went for the deuce. If you miss it, the winning probability still goes to 0.0 and the increment is -0.15, but if you make it, the increment is only +0.35 (assuming you have a 50 percent chance of winning in OT). You've improved matters significantly, but you still haven't won the game. By this analysis, the cold-blooded assassin quality that Kobe Bryant supposedly personifies is not only bravado, but potentially sound tactical thinking, and this aspect would be captured by compiling expected win increments.
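The win-increment ledger is the same telescoping idea, applied to win probability instead of expected points. A quick sketch with the hypothetical numbers above (0.15 win probability at the start of the possession, a coin-flip overtime):

```python
# Win increments for the down-two, last-possession scenario above.

def win_increment(p_before, p_after):
    """Change in win probability over a usage (or a shot)."""
    return round(p_after - p_before, 2)

P_START = 0.15
print(win_increment(P_START, 1.0))  # made trey: game won outright
print(win_increment(P_START, 0.0))  # missed shot: game lost
print(win_increment(P_START, 0.5))  # made deuce: 50/50 overtime
```

The asymmetry is the point: the made trey is worth +0.85, the made deuce only +0.35, while either miss costs the same -0.15.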


You could even go so far as to assess the impact on winning the title (much as Hollinger's playoff calculator does). By that metric, LeBron's fadeaway three against Hedo Turkoglu in Game 2 of last season's ECF was an absolute monster. Assuming that the Cavaliers would have been even money against the Lakers in the NBA Finals, that shot (which took the Cavaliers from at best a 0.1 win to a 1.0 win) was worth in the neighborhood of 0.1 to 0.2 of a title, an incredible value for a pre-Finals make. The fact that the Cavaliers did not go on to even make the Finals is immaterial in this valuation, as it couldn't have been known at the time. On the other side of the balance sheet would be Frank Selvy's miss at the end of regulation in Game 7 of the 1962 Finals, which ended up being worth an increment of about -0.2 or -0.3 of a title, as instead of winning the title outright on the shot, the Lakers had to go on to play OT, where they eventually lost.

Friday, January 22, 2010

The Suspension of Belief

You may think I've mistitled that, but no, not really. Suppose I put to you two ways to say a common sentiment:
  1. All that glitters is not gold.
  2. Not all that glitters is gold.
Now, put aside all notions of poetic rhythm or provenance. (Or that the original version in Shakespeare's Merchant of Venice had "glisters" instead of "glitters." The former comes from Dutch, while the latter comes from Norse. In our day, the Norse version has entirely displaced the Dutch version, but in Shakespeare's day, they both had currency. Or at least so Shakespeare would have us believe.) Does either of these seem "righter" to you than the other?

I've put little quizlets like this to various people and they seem to fall mostly into two groups. One group of people can't see anything at all to recommend one over the other. Moreover, when the particular distinguishing feature is pointed out, they either don't see it or can't see why anyone would care. (You might, if you fall into this group, see if you can figure out before reading on what this distinguishing feature is, if you don't already know.)

The second group, of course, sees a logical distinction between the two and what's more, they're irritated that there's a mismatch between intent and wording. What's still more, they're irritated that the first group doesn't acknowledge this. To this group, the above two sentences are logically equivalent to the following:
  1. All glittery things are non-gold.
  2. Some glittery things are non-gold.
A quick glance at the script for Merchant of Venice indicates that Shakespeare chose the first wording ("All that glitters...") but his meaning is clearly the second. Does this bother you?

OK, that's not really all that important, as we all know what Shakespeare meant. Here's another one:
  1. I don't believe we have a coherent plan for the Middle East.
  2. I believe we don't have a coherent plan for the Middle East.
Obviously, when it's presented this baldly, it's clear what the difference between these two is (especially, I hope, in light of the previous example), but I can't count the number of times that people have interpreted #1 (or minor variations thereof) as #2. And honestly, I don't think it's because they can't think logically. I think it's because they're impatient with disbelief.

Nowhere is this more evident than in politics. It's practically a cliché to demand politicians give their position on some issue or another, to the point that it's considered a weakness if they can't immediately spit one out. While I'm all for politicians being prepared for new situations (and as a by-product, for questions from the press), is having a response for all such questions really preferable to being able to suspend belief when the situation warrants? We've seen the dangers that feigned certainty can bring. And it's not as though suspension of belief necessarily means suspension of action. We can act rationally on uncertainty just as well as we can act on strong belief.

As prominent as it is in politics, though, this rejection of uncertainty permeates our whole world, including science, where it has no business. Political truths may last for a generation or two (think about how long the Democratic party has been on the side of civil rights), but scientific truths, once verified, last essentially for eternity, subject only to occasional refinement. Given that, what's the rush to judgment? Why not suspend belief until we know for sure? Impatience with uncertainty is fine as long as it motivates us to reduce it, but not if it forces belief before we're ready.

Monday, January 11, 2010

Cutting Your Losses

I was standing at the vending machine at work today, buying some chips with lots of small coins (nickels and dimes). And as I often do, I carefully inserted the nickels first, then the dimes; if I had used any quarters, they'd have come last.

You may—assuming you've read this far—have wondered why this is. To be fair, having done this for a long time, I wondered myself for a moment. And then I remembered.

See, when I first started doing this, I was in college. I was living in the dorms. The dorms had vending machines, which were balky, much like anything in the dorms. They would, occasionally, find something objectionable about your change. They were even particular about the way you inserted your change; sometimes, it would take six or seven tries for you to get it to accept a specific dime. I would bring extra change just in case, if I had any, but sometimes even that would run out. So there I would be standing, with 45 cents that the machine was refusing to take, and more money back in the dorm room that I could try out on the Keeper of the Fizzies. But in order to get that money, I'd actually have to go back to the dorm room. Away from the vending machine.

I'd run downstairs, get the change, run back upstairs, and hope that in the meantime, no dormitory Grinch had decided to get a 30-cent discount on his Coke.

Because, as it happens, sometimes they would. I'd get back and there would be no credit at all in the vending machine. You might suppose that Whoever It Was would at least leave the credit they had benefited from in change on the side, but noooooo.

That's when this business with inserting change in ascending order of value started. It was a way of cutting my losses. You might think that it would be simpler for me to just push the coin return and withdraw my change before heading downstairs, but in the first place, the coin return lever was balky, like everything else, and in the second place, it had often taken me lots of effort to get those coins in and I was reluctant to relinquish those hard-won gains.

Eventually, I managed to obtain a small dorm fridge and thereafter bought my drinks at the market. But this was before all that. Just the same, I continued my coin-sorting practice even to the present day, where (I daresay) my co-workers are far less likely to stiff me out of a handful of change than my dormmates were.

You know me, always looking for something mathy about the situation, so here's the question: Suppose that I used only n nickels and d dimes (no quarters), that I foolishly brought exact change, and that the vending machine refuses to take exactly one coin, selected uniformly at random from all the coins. On average, how much less money did I place at risk going nickels first than going dimes first?

The answer: The average reduction in risk was equal to the value of the nickels multiplied by the fraction of coins that were dimes.
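If you'd rather check that claim than derive it, a brute-force enumeration will do. This sketch makes the stated assumptions explicit: "at risk" means the credit already accepted when the rejection happens, and the rejected coin is uniform over all n + d coins:

```python
# Verify: (expected risk, dimes first) - (expected risk, nickels first)
#   = (value of the nickels) x (fraction of coins that are dimes)
#   = 5n * d / (n + d)
from fractions import Fraction

def expected_risk(coins):
    """coins: coin values in insertion order. Average, over the uniformly
    chosen rejected coin, of the value inserted before that coin."""
    total, inserted = Fraction(0), 0
    for c in coins:
        total += inserted   # risk if *this* coin is the one rejected
        inserted += c
    return total / len(coins)

def check(n, d):
    nickels_first = expected_risk([5] * n + [10] * d)
    dimes_first = expected_risk([10] * d + [5] * n)
    claimed = Fraction(5 * n * d, n + d)  # nickel value x dime fraction
    return dimes_first - nickels_first == claimed

print(all(check(n, d) for n in range(1, 10) for d in range(1, 10)))  # True
```

Using exact rational arithmetic (`Fraction`) sidesteps any floating-point quibbles; the claimed formula holds exactly for every n and d tested.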

I had thought to try to tie this story to something deeper, but I just can't bring myself to do it.