Friday, December 11, 2009

Square Roots, Lasers, and Mobilization

I promised (threatened) that I would say more about square roots, and so I am. This is me, talking about square roots again. In typical fashion, though, I'm going to start with something else that will seem, for a time, completely unrelated.

Galileo, he of the telescope, the balls rolling down inclined planes (and probably not in actuality from the Tower of Pisa), the sotto voce thumbing of the nose at the Inquisition—Galileo also discovered, or more likely rediscovered, that pendulums mark out roughly even time, no matter how far they swing. It isn't perfectly even time, owing to friction and to the circular track of the pendulum bob (although both of those can be—and were—accounted for, starting with Huygens's employment of cycloid guides). But it's pretty close.

Since the pendulum keeps fairly even time, if it swings through twice as big an arc, it must also be moving twice as fast, in order to keep beating out even time. Now, as it's defined in Newtonian physics, the kinetic energy of the pendulum bob—that is, the energy of the bob due to its motion—goes as the square of its velocity:

KE = ½ mv²

So, twice the arc, twice the velocity, four times the kinetic energy; three times the arc, three times the velocity, nine times the kinetic energy. And so on.

That swinging motion of the pendulum bob is an example of periodic or wave motion, so called because it swings back and forth the way a water wave swings up and down as it passes a buoy. Wave motion is primarily characterized by two parameters: its frequency, which is how often it returns to its starting point; and its amplitude, which is how wide it swings. So the arc through which the pendulum bob swings is essentially its amplitude. (Actually, for historical reasons, the amplitude is defined as half of that arc, from the center point of the swing to either of its extremes, but this won't affect our discussion.) So we can say that the pendulum's energy is proportional to the square of its amplitude.


This turns out to be common to many different kinds of waves—including light waves. Light is a wave. (It's also a particle, in many ways, but we'll ignore that for now.) And being a wave, it has an amplitude, which is the extent to which the light oscillates. What is it that's oscillating, anyway? In the case of water waves, it's water, and in the case of sound, it's the molecules in the air. You can't have water waves without the water, and you can't have sound waves without the air; that's why sound doesn't travel in a vacuum. But light does travel in a vacuum, so what's waving the light, so to speak? Well, the answer is that the light itself is waving, or less opaquely (heh heh), the electromagnetic fields that permeate space are waving.

In any event, like other waves, light waves also carry energy that is proportional to the square of the light's amplitude. If you double the amplitude, you quadruple the energy; triple the amplitude, and the energy goes up nine-fold. And so on.

How would light's amplitude be doubled, though? You might imagine that if you put two flashlights together, the amplitude of the combined light would be twice that of each individual flashlight, and the combined light output—the energy of the two together—would be four times that of each flashlight. But I think, intuitively, we know this to be false: the combination is only twice as bright as each flashlight. And if you measure the light carefully, in a dark room, this turns out to be perfectly true.

What happened? Light waves, like other waves, have a secondary property, called phase. Two waves of the same frequency are said to be in phase if they swing in the same "direction" (in some not altogether well-defined sense); imagine two pendulums swinging in unison, so that when one swings left, the other does, too. They are out of phase if when one swings left, the other swings right, and vice versa. Or, they may be partly in phase, partly out of phase.

When you combine two light waves of the same frequency and the same amplitude, you get for all intents and purposes a single wave that is the two original waves added together. If they're in phase, the peaks get peakier and the valleys get, err, valleyier, and the amplitude of the waves is in fact doubled. On the other hand, if they're out of phase, the peaks of one get cancelled out by the valleys of the other (and vice versa), and the resultant wave has no amplitude at all.

More typically, though, the two waves are partly in phase and partly out of phase, and the resulting wave's amplitude is somewhere in between zero and two times the original. On average, one can show that the amplitude is the original times √2. What's more, if you add three waves together at random phases, the amplitude of the sum is the original times √3. And so on. Aha, the square root!
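If you don't feel like wading through the trigonometry, you can check this numerically. Here's a quick sketch in Python with NumPy (the function name and setup are mine): add n unit waves at random phases, many times over, and compute the root-mean-square amplitude of the sum, which is the precise sense of "on average" at work here.

    import numpy as np

    rng = np.random.default_rng(0)

    def rms_combined_amplitude(n, trials=200_000):
        # n unit-amplitude waves, each with a uniformly random phase.
        phases = rng.uniform(0.0, 2.0 * np.pi, size=(trials, n))
        # Add them as phasors and take the magnitude of the sum.
        amplitude = np.abs(np.exp(1j * phases).sum(axis=1))
        # Root-mean-square amplitude over all the trials.
        return np.sqrt(np.mean(amplitude ** 2))

    for n in (2, 3, 10):
        print(n, round(rms_combined_amplitude(n), 3))  # ~1.414, ~1.732, ~3.162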


And since the energy of the final wave is the square of the amplitude, what comes out has two, three, or whatever times the original energy. Which is, of course, exactly what you'd expect. And good thing, too, because if it came out otherwise, we'd have a violation of the conservation of energy. Clearly, it takes n times as much energy to run n flashlights as it does to run one, and if their combined output were something other than n times the original, we'd have to seriously rethink our physics.

You might wonder if there isn't a way to get the waves to line up properly in phase so that the amplitudes do add up in the normal way, and you get a dramatic ramp up in energy. And there is; it's called a laser. A laser essentially gets n individual photons to line up in phase so that what comes out is a sort of super-photon (or super-wave, equivalently) with n² times the energy of any of the input photons. The physics-saving catch is that it takes more energy to line up, or lase, the light than you get as a result.

Nevertheless, that single photon or wave, coordinated as it is, can do things that you couldn't do with the individual photons separately. You can shine a bunch of flashlights at your eye and nothing will happen, other than a rather annoying afterimage and perhaps a headache. But even a modest laser can be used to reshape your cornea and render your eyeglasses superfluous. Of course, it should go without saying that it's not such a great idea to randomly shine lasers into your eye!

Or out, for that matter.

I see in this a kind of metaphor for human nature, and I hasten to say it's only that; as far as I know, one can't really take this and apply it rigorously in any scientific sense. But I think it's a useful metaphor all the same. I like to say that religion, among other things, is a laser of people. What on earth do I mean by that? A single human being can do a certain amount of work (in physics, work is the energy transferred by a force acting through a distance). What happens if you get two human beings together? Well, if they work against each other—if they're out of phase, in other words—less work gets done. Maybe none, if they spend all their time squabbling. Even if they're not exactly out of phase, if they're not particularly coordinated, their combined output is rather less than you might think, like the drunkard making slow and halting progress homeward because he can't put one foot directly in front of the other.

On the other hand, if they cooperate—if they're in phase—they can do twice the work. In fact, maybe they can get even more done, for there's no arguing that a coordinated combination of two people can do things that each individual person couldn't do, even adding their results together. Two people can erect a wall, for instance, that neither person could individually. Maybe, in some sense, those two people can do what it would take four people, working randomly, to achieve. And perhaps three coordinated people can do what it would take nine randomly working people to. And so on.

It's pretty straightforward to get two or three people to work together, if they're of a mind to. But what about a hundred, or a thousand, or a million? That's where ideologies can be enormously effective; through them, a thousand can achieve what would otherwise require a million. And there may be no ideology better suited for the purpose than religion, although other ideologies—sociological, fiscal, even autocratic—may suffice. That's not to say that all that these various ideologies achieve is beneficial: for every great liberation, there may be a dozen pogroms. But they are part and parcel of a society's capacity for achievement; without them, we get only as far as a drunkard's walk will take us.

Monday, December 7, 2009

Square Roots and Great Comebacks

From the time I learned about them, I've been fascinated (probably to an unseemly degree) by the square root. I remember reading about a method for calculating square roots longhand. There's no point, really; we have calculators to do that for us. (If you have some spare time and you enjoy this sort of thing, see if you can figure out the algorithm from the example at left.)

What use are square roots, anyway, aside from solving math problems about the diameters of circular lawns? (Have you ever seen any of those? They must encircle those conical swimming pools we dealt with in calculus class.) Here's one use: They can tell you how big a lead your favorite basketball team needs to be secure in a win.

A few years ago, I derived a rule for determining when a lead was safe in a basketball game—specifically, an NBA game. (It matters, because the shot clock is different between an NBA game and a WNBA game and an NCAA men's game and an NCAA women's game.) You take the square root of the number of seconds left, and add three. For instance, if there's 3:45 left in the game, that's 225 seconds. Square root of 225 is 15, and you add 3, so an 18-point lead is pretty darned safe with 3:45 left. The "add 3" is for a trey at the buzzer. Go ask the Miami Heat about that 'round about now.
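In code, the rule is a one-liner. A sketch in Python (the function name is mine):

    import math

    def safe_lead(seconds_left):
        # Square root of the seconds remaining, plus 3 for a trey at the buzzer.
        return math.sqrt(seconds_left) + 3

    print(safe_lead(225))  # 3:45 left -> 18.0
    print(safe_lead(450))  # 7:30 left -> about 24 (a figure we'll meet again below)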


Pretty keen, huh? Although—not to put too fine a point on it—well-known sports statistician Bill James also came up with this very same rule. We'll call it independent discovery, at least on my part. I have no idea whether James stole it from me. Give him the benefit of the doubt, though.

But why? Why should this rule work? Why isn't it just the time remaining divided by some rate at which the team that's behind catches up? If a team can make up 15 points in 225 seconds and then cap that with a trey to make up the 18, why can't it make up 33 points in 7:30? Or 63 points in 15:00?

And the sort-of answer to that is, it can. It's just terribly unlikely. Of course, it's already unlikely that a team can make up 15 points in 3:45, but it's still in the realm of possibility. Asking a team to do that twice in a row is just too much. If it was 100 to 1 against doing it once, doing it twice in a row would be 10,000 to 1 against. On the other hand, making up the same 15 points in twice the time is obviously easier. So in twice the time (7:30, natch), you should be able to make up some deficit in between. According to both me and Bill James—and honestly, are you going to go against both of us?—that deficit is 15 times the square root of 2. That's about 21, and if you add the 3 at the end it makes it 24.

Where on earth does this come from? One place is the drunkard's walk, otherwise known as the random walk (but I think "drunkard's walk" is more evocative). In this mathematical scenario, the eponymous drunkard starts off at some landmark—a lamppost, say. Each moment in time, he takes a step, but in a completely random direction. Might be in the same direction as the last step, might be in the opposite direction, might be anything. So after a bunch of steps, he might end up back at the lamppost where he started...or he might be home.

Odds are, though, he'll be at some intermediate distance. How far from the lamppost? Well, the first step is going to take him one step away for sure. We'll represent this by saying that d(1) = 1, where d(t) is the distance of the drunkard from the lamppost at time t. OK, now what about d(2)? Before that second step, he's one step away from the lamppost. His second step might take him two steps away, if he walks in the same direction, or zero steps away, if he walks in the opposite direction (back toward the lamppost). On average, though, he'll walk in some intermediate direction: let's say, perpendicular to his current progress from the lamppost. The Pythagorean theorem says then that

[d(2)]² = [d(1)]² + 1² = 1² + 1² = 1 + 1 = 2

or, in other words, d(2) = √2. We can go further. We've already got two examples where d(t) = √t and we'd like to get more. To do that, we'll use a process called induction. Suppose that you have a value of t for which d(t) = √t; we'll now try to show that d(t + 1) = √(t + 1). Using the same argument as before—that the drunkard walks in some intermediate direction—we get

[d(t + 1)]² = [d(t)]² + 1² = [√t]² + 1² = t + 1

and then we directly get d(t + 1) = √(t + 1). So as long as we can find a t where d(t) = √t, we're set; it's true for all greater values of t. But we already have such a value: t = 1! (And t = 2, for that matter.) It turns out, then, that the drunkard's walk, after time t, takes him a distance √t away from the lamppost.
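If the hand-waving about "intermediate directions" leaves you suspicious, here's a numerical check: a sketch in Python with NumPy, simulating many independent walks and measuring the root-mean-square distance from the lamppost (that, again, is the precise sense of "on average" here).

    import numpy as np

    rng = np.random.default_rng(0)

    def rms_walk_distance(t, walkers=20_000):
        # Each of t steps is a unit vector in a uniformly random direction.
        angles = rng.uniform(0.0, 2.0 * np.pi, size=(walkers, t))
        x = np.cos(angles).sum(axis=1)
        y = np.sin(angles).sum(axis=1)
        # Root-mean-square distance from the starting point.
        return np.sqrt(np.mean(x ** 2 + y ** 2))

    for t in (1, 2, 100, 400):
        print(t, round(rms_walk_distance(t), 2), round(np.sqrt(t), 2))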


Now, a couple of things. First, this isn't anything like a rigorous demonstration of the square root property of the drunkard's walk. You can look that up if you like. But if you work at it a little, it gives you an inkling of the intuition behind it. Secondly, though, and here we're back on track a bit: What has all this got to do with basketball games?

A basketball game is an alternating sequence of possessions. In each possession, the team with the ball is of course trying to score, and the other team is of course trying to prevent it from scoring. When the ball changes hands, the roles are reversed. In each individual possession, the effect on the score is biased: Only the team with the ball can score, usually. But in each pair of possessions, that bias cancels out, since both teams get a chance with the ball. The margin in the game can move in any direction—just like the drunkard's walk.

If the drunkard starts off 50 steps from home, he could conceivably get home in just 50 steps. But it's ridiculously unlikely: Each of those 50 steps would have to be in exactly the right direction. The square root property tells us he'll probably be just a bit over 7 steps from the lamppost; it would take 2500 steps to get him, on average, 50 steps from his starting point. After those 2500 steps, is he guaranteed to be home? Nope. He still has to be walking in the right direction. But it's at least plausible now.

In the same way, a basketball team that's down 18 points could conceivably make that up by scoring six three-pointers in a row while holding their opponents scoreless. If they did that by fouling and their opponents obliged by missing all of their free throws, the whole deficit could be made up in half a minute or so. But that's as unlikely as the drunkard walking 50 steps in exactly the right direction. Instead, a team will make up its deficit in halting fashion, sometimes making up three points, but other times giving up a point, or staying even, in any particular pair of possessions. The drunkard's walk, in other words, and that's why the square root rules great comebacks.

I was going to follow this up with a discussion of sociology and mobilizing people, but this post is getting long (see, I do notice it!) and I'll defer that till next time.

Wednesday, December 2, 2009

Basketball Math Fail

Today's miscreant: The Boston Globe's Celtics Notebook. The offending paragraph reads:
Rasheed Wallace has eight technical fouls in 18 games, which would equate to 36 over a full season. That number is astronomical, of course, especially since the NBA suspends players one game for each technical after the 16th.
First of all, the NBA does no such thing. It does suspend players one game for every other technical, starting with the 16th. (See the NBA Rule Book, Rule 12, Section VII.) But that's not the math fail in this instance. The math fail is figuring that Wallace would get 36 technicals over a full 82-game season, when by their own admission, he wouldn't even play 82 games because of the suspensions for all those technicals.

So how many technicals would he get, if he were to get them at the same rate for the rest of the season, and he didn't miss any games to injury or other reasons besides the suspensions from the technicals?

At the current rate, Wallace would pick up his 16th technical in his 36th game, meaning he would be suspended for the team's 37th game. (We'll assume that Wallace doesn't appeal any of his technicals or suspensions.) He would then pick up four technicals in every nine games he played in thereafter. For the sake of argument, let's say he picks up the even-numbered technicals (16th, 18th, 20th, etc.) in the fifth and ninth game of every cycle of nine games he plays in. Since each of those technicals would carry with it a one-game suspension, these cycles would actually span 11 games for the Celtics.

As a result, Wallace would pick up those even-numbered technicals in the Celtics' 42nd, 47th, 53rd, 58th, 64th, 69th, 75th, and 80th games, in each case being suspended for the next game. That technical in the 80th game—his 32nd—would be his last, since he'd only be able to play in one additional game, and we'll charitably assume that he wouldn't get called for a technical in that one. So he'd get suspended for nine games in all, drawing 32 technicals in 73 games.
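If you'd rather not push those cycles around by hand, the bookkeeping is easy to automate. Here's a sketch in Python of the idealized model I'm using (the function is mine; technicals accrue at a perfectly even 8-per-18-games rate, and Fraction keeps the arithmetic exact):

    from fractions import Fraction

    def project_season(rate=Fraction(8, 18), season=82):
        techs, accrued, played, suspensions_due = 0, Fraction(0), 0, 0
        for _ in range(season):
            if suspensions_due:
                suspensions_due -= 1      # serving a suspension; sits this one out
                continue
            played += 1
            accrued += rate
            while techs < int(accrued):   # accrual crossed a whole number
                techs += 1
                # The 16th technical, and every other one after it
                # (18th, 20th, ...), draws a one-game suspension.
                if techs >= 16 and techs % 2 == 0:
                    suspensions_due += 1
        return techs, played

    print(project_season())  # (32, 73): 32 technicals in 73 games played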

Tuesday, December 1, 2009

Analogies for Better or for Worse

Douglas Hofstadter wrote about the relationship between analogies and intelligence in the September 1981 installment of his Scientific American column series Metamagical Themas, entitled "Analogies and Roles in Human and Machine Thinking." His central point is that being able to see similarities between different situations and to capitalize on those similarities to make predictions is core to the nature of human intelligence (and by extension, to fruitful research on machine intelligence as well). "Being attuned to vague resemblances," he writes, "is the hallmark of intelligence, for better or for worse."

As if to highlight the "worse" side of the ledger, somewhere toward the middle of the column, he discusses the pitfalls of taking analogies too far. Ultimately, situations don't map perfectly onto each other, and the greater the demands placed on any given analogy, the more likely it is to stretch so far that it snaps.

Analogies are particularly useful for teaching purposes. Students seem often to learn something better when it is explained in terms of something they already know. We might learn about electrons orbiting an atomic nucleus by analogy with planets orbiting the Sun, for instance. To the extent that principles in one domain apply to the other, we can understand and explain behaviors in the new, unfamiliar domain in terms of the old, familiar one.

There are dangers to this path to learning, though. The famed Caltech physicist Richard Feynman—surely one of the great physics teachers of all time—was extremely conscientious when it came to teaching by analogy. He avoided analogies that he found misleading or circular. It might be natural to think of electromagnetism as being mediated by imaginary "rubber bands," he said, but in the first place, rubber bands draw things together more the further apart they get, whereas electromagnetism gets weaker with distance, and secondly, rubber bands themselves work through electromagnetic interactions at the molecular level, so any understanding students derived through this analogy must needs be circular.

Care must be taken, too, not to stretch the analogy beyond its limits. The fact is that electrons don't orbit the nucleus in neat circles (or even ellipses) like planets orbiting the Sun. If we study further, we find that although planets can apparently orbit the Sun at any distance whatsoever, electrons are constrained to orbit the nucleus only at specific distances, which we can characterize as those distances which allow an integral number of electron waves to circle the nucleus. If we study still further, we find that electrons don't travel in any kind of orbit at all, but instead can be found at any location around the nucleus according to a probability distribution (or, equivalently, are simultaneously at all different points according to that distribution—at least prior to observation).

The problem is that analogies are so darned appealing. The good ones yield correct answers to our questions so often that we lose track of where the limits of the analogies are, or even that there are any. We simply trust the analogies, often to our detriment. It's tempting to understand the budgetary situation of, say, the United States in relation to our personal budget; after all, there are many similar concepts and relationships: income, expenses, debt, balance, and so forth. It's tempting, but it's often misleading. But because we do understand many things correctly using that analogy, we become overconfident in areas where the analogy was never going to hold water.

My pet peeve in this regard is the rubber sheet analogy for general relativity. Given that general relativity was one of the major developments of 20th-century physics, you'd expect that there'd be significant time spent in explaining it to the lay public. I mean, even people who only vaguely have a notion of what physics is about have heard of Albert Einstein and "warped space."

Gravity is everywhere; we feel its effects all the time. And we've sort of internalized the Newtonian theory of gravity, which is that any two particles exert a gravitational force on each other, no matter how far apart they are; although the degree of force drops off quite rapidly with distance, it never quite shrinks down to zero. We've internalized it so well that we hardly ever wonder how that force is mediated. How does that force get exerted across all that distance? By the Newtonian theory, I wiggle my finger here, and my finger's gravitational influence on the most distant galaxy, however faint, oscillates with the same frequency as my wiggling finger. Newton himself felt this conundrum most keenly, never mind his insistence that he did not "feign hypotheses."

Einstein's general theory of relativity ostensibly resolves all of that. It posits space not simply as a theater in which gravitational interactions take place, but a physical, almost tangible thing that is affected by masses and in turn affects them. The usual term for this is curved space—a term that is justified in a mathematical sense but which is almost certain to mean nothing directly to anyone who isn't already a physicist. I imagine that the most common response is mute incomprehension.

So we explain what we mean by "curved space" by analogy. First of all, we should really be calling it "curved space-time," since in Einstein's theory time and space are interwoven almost irrevocably. With three dimensions of space and one of time—well, that's a lot of dimensions. People don't visualize four dimensions very well. So we abstract two of them away: one of the spatial dimensions, and the one time dimension, leaving two spatial dimensions. Losing one spatial dimension is probably OK, but already there are problems: you've lost the only temporal dimension you had, and it's possible that you might lose something essential there!

But we're pressing on. We lay down an infinite rubber sheet, typically marked with grid lines. We plop down a big heavy ball, like a bowling ball. This is the Sun, we are told. It bends or curves or warps space. Sure enough, the rubber sheet is seen to dimple significantly. Then, we roll a smaller ball around the bowling ball, and because of the warping caused by the bowling ball—err, Sun—the smaller ball (representing the Earth, say) sweeps around in a neat circular or elliptical orbit. Just like the real planets.

This is an enormously popular representation of general relativity; even Carl Sagan's Cosmos, my favorite science documentary series of all time, uses it. And yet, in my opinion, it's fatally flawed. In the first place, it's circular, just like Feynman's rubber bands. We're told that the effect of the Sun's gravity can be interpreted in terms of the Sun's warping of nearby space, by analogy with the warping of the rubber sheet caused by the bowling ball. But what is it that causes the bowling ball to warp the rubber sheet? Gravity itself! We can't rightly claim to understand gravity if gravity is involved in the explanation as well.

Even that would be excusable for pedagogical purposes if the analogy were actually accurate. But it's not. In all of the rubber-sheet depictions of general relativity I've seen, and I've seen quite a few, only one includes a disclaimer that demonstrates what's wrong with it—a little-known primer on relativity written by Lewis Carroll Epstein called, appropriately enough, Relativity Visualized. (I heartily recommend it.) He makes the following point: In space, there is no universally preferred direction up or down; those directions are only understood in reference to some gravitational field. So the rubber sheet analogy, if it's really right, should work just as well if you flip the rubber sheet upside down, so that the warp goes upward (like a volcano) rather than downward. After all, it's not supposed to be the bowling ball itself that makes the other ball go 'round, but the warp. But if you roll the smaller ball toward the volcano, what happens? As any miniature golfer knows, it certainly doesn't orbit the volcano; instead, it either goes into the volcano, or it veers away from it, never to return.

But even that's not the worst of it. The irony of this analogy is that even though it's not a very accurate depiction of general relativity, it's a dead-on match for Newtonian potential energy wells. That's right: This immensely popular analogy, which is supposed to highlight how general relativity differs from Newtonian gravity, is instead a much better illustration of the very theory general relativity was intended to supplant! I was so struck by this that I wrote up an exposition of general relativity for my astronomy Web site, which (on the off chance you've actually read this far) you can find here. In it, you'll find an analogy to general relativity which is hopefully understandable but hits much closer to the mark. (I even asked a physicist!)

But does anyone care? Nooooo, I'm sure we'll continue to see the rubber-sheet analogy trotted out at regular intervals on the Discovery Channel, with no disclaimer regarding its appropriateness.

Thursday, October 22, 2009

Something to Do With Math, Right?

In my last post, I mentioned that scoring differential has been shown to be a better predictor of future wins than even past wins are. What this referred to, specifically, is the so-called Pythagorean expectation (PE), a creation of baseball statistics guru Bill James. It's called that because of the form of the PE formula: If you let RS be runs scored by the team, and RA be runs scored against the team, then a good estimator for the winning percentage—at least in baseball—is

WP = RS² / (RS² + RA²)

So, for instance, if over the course of a season a team scores 800 runs, but only gives up 600, then the PE formula predicts that their winning percentage will be about 800² / (800² + 600²) = 0.640.
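As code, it's barely more than a one-liner. A sketch in Python (the function name is mine):

    def pythagorean(scored, allowed, exponent=2):
        # Bill James's Pythagorean expectation; the exponent is a parameter
        # because we'll want to fiddle with it shortly.
        s, a = scored ** exponent, allowed ** exponent
        return s / (s + a)

    print(pythagorean(800, 600))   # 0.64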

Actually, there's nothing magical about the exponent 2 in this formula; as it turns out, an exponent of about 1.82 matches actual winning percentages better than 2 does. What I'd like to do in this post is say a few words (well, who are we kidding here, more than a few words) about where this exponent comes from, and an interesting correlation.

Baseball, like any sport, can be treated as a combination of strategy, tactics, and random events. The strategy and tactics represent those things that are under the control of the two teams, while the random events are things that are out of their control, such as where the baseball hits the bat, how it bounces off the grass, and so forth. Technically, as I've said before, these aren't actually random, but they happen so quickly that they're essentially random for our purposes; we can't perfectly predict how they'll go. All we can do is assign probabilities: e.g., such-and-such a player will hit it up in the air 57 percent of the time, on the ground 43 percent of the time, stuff like that.

As a result, the outcomes of games aren't perfectly predictable, either; as they say, that's why they play the games. Again, we can assign probabilities—probabilities that a team scores so many runs, or gives up so many runs, or that they win or lose a particular game. The PE formula is an attempt to relate the probability distribution of runs scored and runs given up, to the probability distribution of winning and losing.

The probability distribution can only be specified mathematically, but we can get an inkling of how it works by sketching it out schematically.


In the diagram above, the horizontal axis measures runs given up, and the vertical axis measures runs scored. The diagonal dotted line represents the positions along which the two measures are equal, so if you're above that line, you win the game, and if you're below it, you lose the game.

The red blob depicts the probability distribution of runs scored and given up for a hypothetical team. Each point within the blob represents a possible game outcome. Games in the lower left are pitcher's duels, while those in the upper right are shootouts. Those in the other corners are games in which the team either blew out their opponent or were blown out themselves. Any outcome within the red blob is possible, but they're more likely to be clustered in the center of the blob, where it's a darker red. The particular way in which the games are clustered around that middle is known as the normal or Gaussian distribution. Such a distribution is predicted by something called the central limit theorem, and is also borne out by empirical studies.

From this diagram, we can estimate what the team's winning percentage is: It should be the fraction of all the red ink that shows up above the diagonal dotted line. Since the team scores, on average, a bit more than it gives up, more of the blob is above that line than below it, and their winning percentage should be somewhat above 0.500—say, 0.580, maybe. What Bill James found out was that if you compute the "red ink fraction" for a variety of different values of runs scored and runs given up, the results were essentially the same as those yielded by the formula given above.
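In fact, if you buy the normal-distribution picture, the "red ink fraction" has a closed form: the margin (runs scored minus runs allowed) is itself normally distributed, so the fraction of the blob above the diagonal is just a normal CDF evaluated at the standardized mean margin. A sketch in Python (assuming, as the round blob implicitly does, that runs scored and allowed are independent; the example numbers are invented):

    import math

    def red_ink_fraction(mu_for, mu_against, sd_for, sd_against):
        # P(scored > allowed) when both are normal and independent: the
        # margin is then normal too, and we evaluate its CDF at zero.
        z = (mu_for - mu_against) / math.hypot(sd_for, sd_against)
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    # A team scoring 4.9 runs a game and allowing 4.4, with a standard
    # deviation of 3.0 runs on each side:
    print(red_ink_fraction(4.9, 4.4, 3.0, 3.0))  # ~0.55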

Now, as it so happens, if you try to apply the same formula to, say, basketball, it doesn't work very well at all. Practically any team will end up with a predicted winning percentage between 0.450 and 0.550, and we know very well that isn't so: Usually there's at least one team over 0.750, and often one over 0.800 (Cleveland did that this past season). The reason can be seen if we take a look at the corresponding "red ink" diagram for basketball.


Baseball scores runs, and basketball scores points, but the principle is the same. What isn't the same, however, is the degree of variation in the scores, relative to the total score. Basketball teams show much less variation in the number of points they score than baseball teams do. Basketball teams rarely score twice as much in one game as they do in any other; by comparison, baseball teams are occasionally shut out and occasionally score 10+ runs.

In consequence, a baseball team that scores 10 percent more runs than it gives up will still lose a fair number of games, because the variation in scores is much more than 10 percent a lot of the time. In contrast, a basketball team that scores 10 percent more points than it gives up will win a huge fraction of the time, because the variation in scoring is so much less. As you can see above, the red blob is in approximately the same place in both diagrams, but because the blob is smaller (less variation), practically all of the blob is now above the diagonal line, corresponding to a winning percentage of, oh, let's say 0.850.

This property can be addressed by using James's PE formula, but with a much higher exponent. Estimates vary as to how much higher, but the differences are relatively minor: Dean Oliver suggests using 14, whereas John Hollinger uses 16.5. Either of them will give a good prediction of the winning percentage of the applicable team.

It would be nice not to have to guess at the right exponent, though. So, since there seems to be a pretty obvious correlation between the size of the blob and the size of the exponent, I decided to investigate exactly what that correlation was. It seems likely that someone else has done it before, but a Web search didn't turn up any obvious results, so I'm sharing mine here.

To begin with, there's something else in statistics called the coefficient of variation, which basically gives in this case the size of the blob, relative to how far it is from either axis. In case you're following along on your own paper, it's defined as the ratio of the standard deviation of the distribution to the mean. So, in baseball, the c.v. is relatively large; and in basketball, it's relatively small.

What I did was to figure out, from numerical computations, what the "red ink" fraction was for various c.v.'s and scoring differentials, and to see if a formula of James's basic structure, with the right exponent, would fit those fractions. (My tool of choice was the free and open-source wxmaxima, in case you're interested.) It did, very well. In fact, I found it startling how well it fit, assuming that scoring was normally distributed. In most cases, the right exponent would fit winning percentages to within a tenth of a percent.

For instance, for a c.v. of 0.5, an exponent of 2.26 fit best. The numerical computation showed that a team that scored 20 percent more than it gave up would win 60.1 percent of the time; so did the formula. As the c.v. went down, the exponent went up, just as you would expect. The actual values:

c.v. = 0.5, exp = 2.26
c.v. = 0.3, exp = 3.78
c.v. = 0.2, exp = 5.67
c.v. = 0.1, exp = 11.7

I found these results startling: the product of c.v. and exp is almost constant, at about 1.134. (I propose calling this the Hell relation.) In other words, the right exponent is almost exactly inversely proportional to the c.v. of the scoring distribution. Therefore, we would predict that the c.v. of baseball games is 1.134/1.82, or 0.623; that of basketball would be 0.081 or 0.069, depending on whether you trust Oliver or Hollinger. I've heard that Houston Rockets GM Daryl Morey once determined an exponent of 2.34 for the NFL, which would correspond to a c.v. of 0.485.
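You can replicate the gist of the experiment in a few lines. Here's a sketch in Python of my procedure (not the original wxmaxima code); I've fixed the scoring ratio at a modest 1.05, since the fit is best near even matchups, and both function names are mine:

    import math

    def win_prob(ratio, cv):
        # P(score > allowed) for independent normal scores with a common
        # coefficient of variation cv and a mean scoring ratio mu_s/mu_a.
        z = (ratio - 1.0) / (cv * math.sqrt(ratio ** 2 + 1.0))
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    def implied_exponent(cv, ratio=1.05):
        # Solve w = r^x / (r^x + 1) for the Pythagorean exponent x.
        w = win_prob(ratio, cv)
        return math.log(w / (1.0 - w)) / math.log(ratio)

    for cv in (0.5, 0.3, 0.2, 0.1):
        x = implied_exponent(cv)
        print(f"c.v. = {cv}: exponent ~ {x:.2f}, product ~ {cv * x:.3f}")

The products all come out near 1.13; in fact, if I've done the algebra right, in the limit of evenly matched teams the product tends to 2/√π, or about 1.128, pleasingly close to the 1.134 above.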

Obviously, this is a consequence of the particular scoring model I used, but the normal distribution is broadly applicable to a lot of sports, most of which have games that are long enough to allow normalcy to show up. Given how well the basic structure of James's formula holds up, I suspect the underlying assumptions are fairly valid, although it would be interesting to see that verified.

EDIT: Here's an article from a statistics professor on just this very topic, with a rigorous derivation of the various formulae.

Monday, October 19, 2009

Adjusted Plus or Minus (More or Less)

I spent some time a while back discussing PER and its limitations. Today I'll take a similar look at adjusted plus-minus, or APM.

One of the weaknesses of PER is that it's a rather arbitrary linear combination of basketball statistics. As I pointed out, one can come up with alternate combinations that put any number of players on top of the PER list. In math nerd terms, any player on the convex hull of the statistics space can end up on top, given the right PER formula. With as many dimensions in that space as there are component statistics, that could end up being a lot of players.

And anyway, the bottom line of the game is winning, and there's no clear evidence that maximizing team PER (however you define that) maximizes your chances of winning. (It must be emphasized, by the way, that that's all any statistical approach can do: maximize chances. Basketball may be played on the floor, not on a piece of paper, but the small contingencies that lead to winning or losing are so complex and so numerous that the only thing we can do with them is treat them as essentially random events. Nothing is ever really certain in any practical sense.)

APM is a completely different approach to player assessment that attempts to remedy this weakness. Its purpose is to determine how much a player contributes to his team's scoring margin versus the opponents, which has been shown, to varying degrees of certainty, to be a good predictor of future winning percentage—better even than past winning percentage. It does this by calculating how much the team outscores its opponents with that player on the court. There are a few ways we could do this (just as there are multiple ways to define PER); I'll just be discussing one of them.

As its name implies, APM is an adjusted form of raw plus-minus, which we can call RPM for the moment. The difference between the two can best be illustrated using a simplified example. Suppose some Lakers players (Kobe, Pau, and Lamar) are participating in a two-on-two tournament, with substitutes allowed. Games are 48 minutes long. Let's say that in a particular game, Kobe and Pau open the game and play for 16 minutes, outscoring the opponents by 8. Pau and Lamar play the next 16 minutes, outscoring the opponents by just 2. Finally, Kobe and Lamar close the last 16 minutes, and outscore the opponents by 4. For the sake of simplicity, let's assume for now that the opponents have no sub and play the entire game with the same two players.

During the 32 minutes that Kobe's on the floor, his team outscores the opponents by a total of 12 points. Over a full 48-minute game, that would work out to a RPM of +18 (a 48-minute game is half again as long as Kobe's 32 minutes). Similarly, Pau's 48-minute RPM is +15, and Lamar's is +9.

However, you might ask, for instance, how much of Pau's RPM is due to his own contribution, and how much is due to sharing the court with Kobe? This is the question that APM seeks to answer. It attempts to account for the teammates one plays with, as well as the opponents one plays against (though we're keeping those constant for now).

One might compute the APMs of the three players as follows: Let Kobe's, Pau's, and Lamar's APM be represented by k, p, and l, respectively. From the first 16 minutes, we extrapolate that if Kobe and Pau played the entire game, they'd have outscored the opponents by 24 points. That could mean that both players have APMs of +24, or perhaps Kobe's is +28 and Pau's is +20, or maybe vice versa. There's not enough information to determine for sure. However, at any rate, they add up to 48:

k + p = 48

Similarly, we can write for the other two 16-minute segments

p + l = 12
k + l = 24

I'm not going to go through the gory algebra (I'm assuming you can do that yourself if you've read this far), but these three equations in three variables yield a unique solution: k = +30, p = +18, l = -6. By way of interpretation, if you had two Kobes play against two average players for an entire game, the Kobes would win by 30 points. (Various versions of APM scale this so that you can just add up the APMs to determine the expected final winning margin. There's no significant difference between this and what we derived; they would just differ by a constant factor—the number of players—so that the scaled APMs would be +15, +9, and -3, respectively.)
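Mechanically, it's just a 3-by-3 linear solve. A sketch in Python with NumPy, using the numbers from the example:

    import numpy as np

    # Rows: k + p = 48, p + l = 12, k + l = 24, in the variables (k, p, l).
    A = np.array([[1, 1, 0],
                  [0, 1, 1],
                  [1, 0, 1]], dtype=float)
    b = np.array([48.0, 12.0, 24.0])

    k, p, l = np.linalg.solve(A, b)
    print(k, p, l)   # 30.0, 18.0, -6.0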

Note that nowhere in all of this computing did we say anything about scoring, rebounds, assists, steals, blocks, fouls, etc.—any of the statistics that make up aggregate parameters like PER. APM is entirely agnostic about what makes players valuable to their team; it simply measures that value. In a way, this is useful, because it completely short-circuits any assumptions about what makes players valuable in general; on the other hand, it sure would help if you knew why your player was valuable. APM can't really answer that. It is, in a very real sense, the holistic yin to PER's reductionistic yang.

Incidentally: What happens if the opponents do use different line-ups? Suppose the Lakers are playing the Magic, with Dwight Howard, Vince Carter, and Rashard Lewis. We'd use d, v, and r to represent their APMs, and assuming they played those line-ups in the same 16-minute segments as the Lakers did, we'd write out something like the following equations:

(k + p) - (d + v) = 48
(p + l) - (v + r) = 12
(k + l) - (d + r) = 24

Note that we now have three equations in six variables, which means that the scenario is said to be underdetermined: there won't be a unique solution to the equations, but multiple solutions (an infinite number, in fact). In general, there will be some kind of mathematical mismatch like this: There are as many variables as players, but as many equations as there are matchups, and those usually won't be equal. Since the number of matchups is larger than the number of players, though, you'll typically have overdetermined scenarios: there won't be any exact solutions at all; any combination of numbers will violate one equation or another.

That sounds bad, but in a sense, it's better than being underdetermined, because we can use statistical methods to determine the best near-solution to the equations—"best" in this case defined by how little the equations are violated as a whole. We can justify this by observing that players aren't robots—their performance varies up and down over the course of a game or a season—so some error in the equations is expected. Typically, the statistical method used is some form of linear regression, which is the same method used to identify likely correlations in all manner of scientific studies. In general, such methods work very well indeed.
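Here's a minimal sketch of that regression in Python with NumPy: the three matchups from the example, plus a few more line-up segments whose margins I've simply invented so that the system is genuinely overdetermined.

    import numpy as np

    players = ["kobe", "pau", "lamar", "dwight", "vince", "rashard"]

    # One row per line-up segment: +1 if the player was on the floor for
    # the Lakers, -1 for the Magic, 0 if he sat.  Margins are per-48-minute
    # extrapolations.  The first three rows are the example above; the rest
    # are invented purely for illustration.
    segments = [
        ({"kobe": 1, "pau": 1, "dwight": -1, "vince": -1}, 48.0),
        ({"pau": 1, "lamar": 1, "vince": -1, "rashard": -1}, 12.0),
        ({"kobe": 1, "lamar": 1, "dwight": -1, "rashard": -1}, 24.0),
        ({"kobe": 1, "pau": 1, "vince": -1, "rashard": -1}, 42.0),
        ({"kobe": 1, "lamar": 1, "dwight": -1, "vince": -1}, 18.0),
        ({"pau": 1, "lamar": 1, "dwight": -1, "rashard": -1}, 6.0),
    ]

    A = np.array([[seg.get(name, 0) for name in players]
                  for seg, _ in segments], dtype=float)
    b = np.array([margin for _, margin in segments])

    # Least squares: the APMs that violate the equations as little as
    # possible, in the summed-squared-error sense.
    apm, _, rank, _ = np.linalg.lstsq(A, b, rcond=None)
    print(rank)                                # 5, not 6 (see below)
    print(dict(zip(players, apm.round(1))))

One wrinkle worth noticing: since every segment pits two Lakers against two Magic players, adding the same constant to all six APMs changes none of the margins, so the matrix is rank-deficient no matter how many segments you pile on. lstsq quietly resolves the ambiguity by returning the minimum-norm solution, which is part of why real APM implementations typically anchor the scale with an intercept or a reference player.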

I am, however, going to go off the reservation a little: I'm claiming that it might not work so well for basketball.

The key sticking point is hinted at by that name, linear regression, but it's present even in the deterministic case we worked out when Kobe, Pau, and Lamar were taking out their aggression on some hapless two-man team with a constant line-up. I said, for instance, that if Kobe and Pau both had APMs of +24, then they'd outscore the opponents, over an entire game, by those 24 points. Not so earth-shattering; if they had in fact played the whole game, that's exactly the APM they'd have ended up with.

But then I also suggested that their APMs might be different: Kobe's could be higher and Pau's lower, or the other way around. And most crucially, I suggested that if one was higher, then the other must be lower by the same amount, so that they always add up to 48. In technical terms, we assume that APM combines linearly. That hidden assumption is part and parcel of the APM calculation; it is what allows us to make the determination that although Kobe's APM and Pau's could be any values individually, they must add up to 48. Without the linearity assumption, we can't write any equations at all; we can't compute APM, statistically or otherwise.

If you think about it, though, what justifies this addition of APMs? What makes us think that we can just add players willy-nilly, like numbers? I personally can't think of a thing that justifies that in anything close to a rigorous way. On the contrary, there's every possibility that they don't always add that way. If two players are both offensive powerhouses but defensive milquetoasts, they might both have good APMs because they spend all of their time playing with teammates that cover for their defensive weaknesses. Put them together, though, and since there's only one ball to score with, their collectively miserable defense might make them a net minus. (EDIT: Wayne Winston's version of APM, at the very least, tries to account for this. Look closely at Winston's answer to Question 5 here, and you'll see that his model includes an "interaction" factor that is a function of a pair of players. As a result, you have an affine relation instead of a linear one, and at least some of the first-order issues with linearity are taken care of.)

The linearity assumption is so seductive because it seems natural and jibes with lots of our experience. If I can grade 20 exams per hour, and you can too, then together we can grade 40 exams per hour. But in any endeavor that requires lots of teamwork and collaboration, the assumption becomes more tenuous. Unfortunately, that doesn't make it any less critical to the validity of things like APM. It simply has to be demonstrated before we can have any legitimate confidence in the value of APM; the burden isn't on anyone else to show that the linearity assumption doesn't hold, but on APM proponents to show that it does.

More insidiously, because linearity seems so natural, we are likely to miss its pivotal role in statistical measures like APM. Perhaps someone somewhere has done a study to validate the linearity assumption for APM. But if so, I haven't seen it, and I bet neither have most APM adherents. If you have, please share it!

Thursday, October 1, 2009

Inconsequence (A Jazz Tune)

Something a little different. A test of the video embedding, I guess. (Could it have picked a more objectionable thumbnail?)


An original composition. In my Walter Mitty fantasy world, this is part of a stage musical and is performed twice; the reprise has slightly different lyrics. For my own nefarious purposes, I have Frankensteined the two into one.

Here we are, you and I,
Face to face, eye to eye.
Shouldn't time give a soul
Who while wondering was blundering
A chance to be whole...?

...Hold that thought, just a mo,
Never mind, let it go.
Doesn't matter what we do
From here on, from here on I'll smile
In consequence of you.

This song is Copyright © 2009 by Brian Tung. All rights reserved. Product may have settled during shipping. Do not incinerate. Objects in mirror may be closer than they appear. Operate in a well-ventilated environment. Handle with care. Do not taunt Happy Fun Ball. Contents under pressure. Do not inhale.

Thursday, August 27, 2009

Stardust Memories

When I was ten, my dad took a couple of friends and me to see a movie. My friends and I had the choice of watching Rollercoaster, which was about a terrorist attempting to extort money from amusement parks by blowing up sections of rollercoaster track just as the coaster gets to them, or this new science fiction film that had recently opened and was getting good notices. As you've no doubt guessed, we chose poorly, while my dad went to the other film, which was (as you've probably also guessed) Star Wars. Meanwhile, one of my friends threw up on the car ride home.

I saw Star Wars in the theater four times, which to this date remains the last time I ever saw a film multiple times in the theater. Early in the film, right after the text crawl, but before the rebel ship comes on screen, you're treated to a view of a star field. In fact, here it is (click to enlarge):


When I saw the film again recently, there was something vaguely unsettling and unnatural about the look of the stars in this scene. For the sake of comparison, here's a real star field, with roughly the same level of detail (again, click to enlarge):


What strikes me now (although I was oblivious to it back in 1977, at least consciously) is how much more regular the star field is in the Star Wars frame than it is in the real photograph. There isn't much variation in the stars in the movie frame, with the top fifty or so being about the same brightness; in contrast (no pun intended), there are many more dim stars in the real photograph, and they fade out gradually, suggesting that there are plenty of stars that are in the field of view, but just beyond the limits of detectability, in this photograph at least. And there are, in fact. For some reason, that sense of infinity, which isn't in the movie frame, appeals to me greatly.

You can sort of see the reasoning behind this if you imagine for the moment that all stars are of the same intrinsic brightness, and that the only reason that some appear brighter and some appear dimmer is that they're closer or further away. (Sort of the way that most adults are of about the same height, but appear to be different sizes because they're at different distances.) And because there is more space far away than there is close up, there are more stars that are far away and therefore dim than there are stars that are close up and therefore bright.

Now, as it happens, stars do vary in actual brightness—sometimes dramatically—but the basic explanation still holds, and is supported by actual counts of bright stars versus dim stars. And I think that through long association with the night sky, we gain an appreciation for that kind of aesthetic. Once upon a time, every human on the planet with reasonably good vision had that association. Nowadays, it's less common. But the potential is still there within each of us, and in my case, it expressed itself in, among other things, my preference for the real star image rather than the Star Wars movie frame.

And this set me to wondering whether a sense for this kind of aesthetic could be mechanized in any way. In a very naïve way, it surely could. The way that the star counts vary by brightness follows a fairly well-understood formula, and a star field could easily be scanned for how well it matches that formula. But I think it's a common feeling that that would fall well short of a genuine sense of aesthetics. There would have to be a larger framework for that kind of aesthetic sense.
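Here, for what it's worth, is what that naïve check might rest on: a sketch in Python with NumPy of the toy model from two paragraphs back, with identical stars scattered uniformly through space and apparent brightness falling off as the inverse square of distance.

    import numpy as np

    rng = np.random.default_rng(0)

    # Scatter identical stars uniformly through a ball of radius 1; the
    # cube root makes the distances uniform in volume, not in radius.
    n = 200_000
    distance = rng.uniform(0.0, 1.0, n) ** (1.0 / 3.0)
    flux = 1.0 / distance ** 2     # inverse-square law, with luminosity = 1

    # Counts grow as you go dimmer: the number of stars brighter than f
    # goes as f**-1.5, so each factor of 10 down in brightness yields
    # about 31.6 times as many stars.
    for f in (100.0, 10.0, 1.0):
        print(f, int((flux > f).sum()))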

Could such a framework lie in fractals? Fractals are, generally speaking, patterns that are self-similar; that is, the appearance of the whole at a large scale is repeated in small parts of the pattern at smaller scales. Examples of fractals range from prosaic snowflake patterns:


to the sublime Mandelbrot set:


Fractals have been used to describe natural patterns as varied as the sound of wind through trees and the coastline of Great Britain. And they can be used to describe the appearance of star fields as well. A star field looks quite the same if you zoom in and increase the brightness. The details are different, so in that sense it is not quite like the snowflake fractal or even the Mandelbrot set. But statistically, the close-up shot and the wide-angle shot are essentially identical.

I cannot say exactly what it is about the "fractality" of these patterns that is appealing. And it does seem as though a certain sense of variation (absent in the snowflake, present to an extent in the Mandelbrot set, and rampant in real star fields) is vital to maintaining visual interest. But I can't escape the notion that self-similarity is something that people generally find captivating and inviting, once they recognize it, and is a large part of why looking up at the night sky is such a natural thing to do.

Seventh Night

Last night was Seventh Night (七夕), the seventh night of the seventh month in the lunisolar calendar followed traditionally by the Chinese. Because the Chinese calendar usually starts with the second new moon after the winter solstice, Seventh Night usually falls sometime in August in the western calendar.

Seventh Night is associated in Chinese tradition with the story of the Cowherd and the Weaver Girl. In one common telling of the story, a young cowherd by the name of Niulang (牛郎) came across a fairy girl bathing in a lake—a girl named Zhinü (織女). Fascinated by her beauty, and emboldened by his companion, an ox, he stole her clothes and waited by the side of the lake. When she came out looking for her clothes, Niulang swept her up and took her back home. In time, they were happily married with two children. But when the Goddess of Heaven found out that a fairy girl had married a mere mortal, she grew furious and sent Zhinü into the sky, where she became the bright star Vega, in the constellation of Lyra the Lyre. (Watercolor by Robin Street-Morris, 2007.)

When Niulang discovered that his wife had disappeared, he searched high and low for her, but was unable to find her. Eventually, the ox told Niulang that if he killed him and wore his hide, he would be able to ascend the heavens to find Zhinü. Niulang did as the ox suggested, and took his two children with him to find his wife, becoming as he did the star Altair. Find her he did, but the Goddess of Heaven, angered once more by Niulang's impertinence, drew a river of stars—the Milky Way—forever separating Niulang (the star Altair) from Zhinü. Their two children became Tarazed and Alshain, the two dimmer (but still bright) stars that flank Altair in the constellation of Aquila the Eagle. But apparently the Goddess of Heaven was not entirely heartless, for once a year, on the seventh night of the seventh month, she sends a bridge of magpies (鵲橋) to connect the two lovers, for just one evening. And so Seventh Night is associated with romance (and also, interestingly, with domestic skills).

The celestial setting for the entire tale can be found in the Summer Triangle, which is bounded by three stars: Altair, Vega, and Deneb (in the constellation of Cygnus the Swan, also known as the Northern Cross). The Summer Triangle can be found in the night sky throughout summer and autumn; at this time of year, it passes nearly overhead at about ten in the evening. (Photograph by Bill Rogers of the Sa-sa-na Loft Astronomical Society, 2009; click to enlarge.)


Wednesday, August 26, 2009

How Random is Random?

We all think that we know when something is random. But how random is random?

Part of the aim of mathematics is to unify concepts. It's what makes mathematics more than just a collection of ways to figure things out. As a side effect, though, mathematical definitions tend to be a bit counterintuitive. For example, I think we all know what the difference between a rectangle and a square is: A square has all four sides of equal length, and a rectangle doesn't.

Except that a mathematician says that squares are rectangles, because to a mathematician, it's inefficient and non-unifying to say that a rectangle is a four-sided figure with four right angles, except when all four sides have the same length. It makes more sense, from a mathematical perspective, to make squares a special case of rectangles.

So hopefully it won't come as too much of a surprise if I say that a completely deterministic process, such as flipping a coin that always comes up heads, is still considered a random process to mathematicians who study that sort of thing. So is a coin that comes up heads 90 percent of the time. Or 70 percent. Or—and maybe this is the surprise, now—50 percent. The cheat coin is simply a special case of a random process. To a mathematician, none of these processes is "more random" than the others. They just have different parameters.

What we think of as randomness, mathematicians call entropy. This is related to, but not the same thing as, the thermodynamic entropy that governs the direction of chemical reactions and is supposed to characterize the eventual fate of the universe. (Another post, another time, perhaps.) It turns out that this "information-theoretic" notion of entropy corresponds pretty well to what the rest of us call randomness. For those of you who are even the slightest bit curious, the definition of entropy for a flipped coin is

S = -(p_H lg p_H + p_T lg p_T)

where p_H and p_T are the probabilities for heads and tails, respectively, and lg is logarithm to the base 2. For a 50-50 coin, the entropy is S = 1; for a completely deterministic coin (a two-headed one, for instance), the entropy is S = 0. For something in between—say, one that comes up heads 70 percent of the time—the entropy is something intermediate: in this case, S = 0.88 approximately.
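In code (a Python sketch; the function name is mine):

    import math

    def coin_entropy(p_heads):
        # Shannon entropy, in bits, of a coin that comes up heads with
        # probability p_heads.  By convention, 0 * lg 0 = 0.
        s = 0.0
        for p in (p_heads, 1.0 - p_heads):
            if p > 0.0:
                s -= p * math.log2(p)
        return s

    for p in (0.5, 0.7, 0.9, 1.0):
        print(p, round(coin_entropy(p), 2))   # 1.0, 0.88, 0.47, 0.0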

So, all right, how entropic is a real coin? The answer is that it's probably less entropic—less random, that is—than you think it is, especially if you spin it. A paper by researchers from Stanford University and UC Santa Cruz (via Bruce Schneier, in turn via Coding the Wheel) has seven basic conclusions about coin flips:
  1. If the coin is tossed and caught, it has about a 51 percent chance of landing on the same face it was launched. (If it starts out as heads, for instance, there's a 51 percent chance it will end as heads.)
  2. If the coin is spun, rather than tossed, it can have a much larger than 50 percent chance of ending with the heavier side down. Spun coins can exhibit huge bias (some spun coins will fall tails up 80 percent of the time).
  3. If the coin is tossed and allowed to clatter to the floor, this probably adds randomness.
  4. If the coin is tossed and allowed to clatter to the floor where it spins, as will sometimes happen, the above spinning bias probably comes into play.
  5. A coin will land on its edge about once in 6,000 throws.
  6. The same initial coin-flipping conditions produce the same coin flip result. That is, there's a certain amount of determinism to the coin flip.
  7. A more robust coin toss (more revolutions) decreases the bias.
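
(For what it's worth, plugging that 51 percent figure into the entropy formula above gives S = 0.9997 or so; tossing and catching barely dents the coin's randomness. An 80-20 spun coin, on the other hand, drops to about S = 0.72.)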
Somewhat along the same lines, Ian Stewart, who for a while wrote a column on recreational mathematics for Scientific American, mentioned in one of his columns a study by an amateur mathematician (and professional journalist) named Robert Matthews. Matthews had watched a program in which the producers asked people to toss buttered toast into the air, in a test of Murphy's Law as it applies to buttered toast. Somewhat to their surprise, the toast landed buttered side up about as often as it landed buttered side down.

Matthews decided that was not quite kosher. People, he thought, don't usually toss buttered toast into the air; they accidentally slide it off the plate or table. That ought to be taken into account when analyzing Murphy's Law of Buttered Toast. And when he did take it into account, he found something rather unusual. A process that you might have thought was fairly entropic turned out to be almost wholly deterministic, given some not-so-unusual assumptions about how fast the toast slides off the table. Unless you flick the toast off the table with significant speed, the buttered side lands face down almost all of the time. And it has nothing to do with the butter making that side heavier; it's that the rotation put on the toast as it creeps off the table is just enough to give it a half spin. Since the toast starts out buttered side up (one presumes), it ends up buttered side down. Stewart recommends that if you see the toast beginning to slide off the table and can't catch it, you give it that fast flick, so that it isn't able to make the half flip, and lands buttered side up. You won't save the toast, unless you keep your floor fastidiously clean, but you might save yourself the mess of cleaning up the butter.
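
Matthews' actual analysis involves some honest rigid-body mechanics, but the free-fall leg at the end is easy to sketch. Here's a toy version in Python (emphatically not Matthews' model; the table height and the release spins below are just assumed, plausible-looking numbers) that asks how far the toast rotates on the way down:

    import math

    G = 9.8  # gravitational acceleration, m/s^2

    def lands_butter_down(omega, height, theta0=0.0):
        """Toy model: the toast leaves the table edge spinning at
        omega rad/s, already tipped theta0 rad past horizontal, then
        free-falls a distance height (in meters). Call it butter side
        down if its total rotation, modulo one full turn, lies between
        a quarter turn and three-quarters of a turn."""
        t_fall = math.sqrt(2 * height / G)
        turns = (theta0 + omega * t_fall) / (2 * math.pi)
        return 0.25 < turns % 1.0 < 0.75

    # Assumed release spins (rad/s). Toast creeping slowly off a table
    # has time to pick up several rad/s as it pivots over the edge; a
    # hard flick launches it before it can gain much spin.
    for omega in (2, 4, 6, 8, 10, 12):
        print(omega, lands_butter_down(omega, height=0.75))

From a 75-centimeter table, every spin from about 6 rad/s up through 12 completes between a quarter and three-quarters of a turn and lands butter side down, while the slow spins (the hard-flicked toast) don't get even a quarter of the way around.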

On the other hand, maybe there's another solution.

Friday, August 21, 2009

Lines for Fries (and Fry's)

Thoughts about (and while) waiting in line at the neighborhood McD's. Mmm...fries.

The Most Evil Being mocked the last queueing theory post, but he actually read the whole thing to mock it. Apparently, so did Squishy. I approve, of course. But Squishy noticed, with some trepidation, that I had tagged that post with "queueing theory," indicating a potential for future posts about same. Well, the future is now. That post dealt with queueing theory itself as little as I could manage, which, OK, is still quite a bit, I guess. Fair warning: There's a bit more of it in this one.

If you were so foolhardy as to look in a queueing theory textbook, you'd probably see a representation of a queue as something like this:
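
    [ x  x  x  x ] -----> ( S )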

The thing at the left is the queue, or waiting line; the blocks inside the queue represent customers; and the circle at the right is the server. Nowadays, in the computer world, we think of servers as big honking machines, but in general, a server can be anything that provides a service. Say, an order taker at a fast food restaurant.

That diagram up there represents only one potential way to hook customers up with servers: a single queue with a single server. Lots of places, like McDonald's, or the supermarket, or the bank, have multiple servers available at a given time. How do they connect their customers to their servers? Here are two diagrams representing two options, without any explanation. Before reading on, see if you can figure out what queueing systems they represent, and how the lines you wait in every day are arranged.
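
    [ x x ] ---> ( S )                    .---> ( S )
    [ x x ] ---> ( S )        [ x x x ] --+---> ( S )
    [ x x ] ---> ( S )                    '---> ( S )
            (a)                           (b)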

System (a), on the left, represents a queueing system in which each server (e.g., checkout clerk, bank teller, etc.) has his or her own line. System (b), on the right, represents a system in which all the servers together share a single line. Where I live, in Los Angeles, McDonald's and the supermarket use (a), and the bank and some other fast food places use (b). Your mileage may vary, of course.

OK, that wasn't too hard, probably. Now think about this one: All other things remaining equal, which system is better? And by better, I mean that it shortens the time you have to wait, on average, before getting service. We'll assume, to make things easier, that once you enter a queue or line, you stay in it; you don't give up, and you don't defect to another line. We'll assume, furthermore, that all the servers are equally fast (or slow, depending on your point of view). Would you prefer to wait in (a), or (b)?

Before I answer that, let me first define some terms. All queueing systems have what's called an arrival rate, which is the rate, on average, at which new customers enter the queueing system. All servers have a service rate, which is the rate, on average, at which they can serve customers, assuming they have any customers to serve. One of the things I mentioned in that last queueing theory post was that a system is stable (that is, it doesn't jam up) only if the arrival rate is strictly less than the service rate. With me so far?

All right, one last term: The utilization of a server, or a group of servers, is the arrival rate divided by the service rate. So, pretty obviously, if the utilization of a server or servers is less than one, it's stable, and if it's one or greater, it's unstable—the line or lines get longer and longer. Somewhat less obviously, the utilization of a server is also the fraction of time that it spends actually serving customers, rather than sitting idle (which is why it's called the utilization in the first place).

Suppose Store A is using queueing system (a). It's got, let's say, six servers, each capable of serving one customer a minute. Customers come into the store at a rate of three customers a minute. Since each server gets one-sixth of all the customers, on average, each server's customer arrival rate is half a customer a minute, and each server's utilization is 1/2 divided by 1, or 1/2.

Store B, on the other hand, is using queueing system (b). It also has six servers, each of which also serves one customer a minute. Because they all get fed from the same line, it's convenient to think of them as together serving customers at a rate of six customers per minute. If the arrival rate to Store B is the same, three customers per minute, then the utilization of the six servers, combined, is 3 divided by 6, or again, 1/2. So far, it seems like the two systems are pretty equivalent.

However, Store A has a problem that Store B doesn't. Consider the situation diagrammed below:
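
    [ x x ] ---> ( S )
    [ x x ] ---> ( S )
    [ x x ] ---> ( S )
    [ x x ] ---> ( S )
    [ x x ] ---> ( S )
    [     ] ---> ( S )   (idle)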

Five of the servers are busy, and they even have customers waiting in line behind them. The sixth server, however, sits entirely idle: because we've assumed that customers don't switch lines, it has nobody to serve, even though customers are waiting elsewhere. (Lest you think this is entirely unrealistic, I see it all the time at the supermarket, possibly because the idle server is a few counters away from the busy ones.) This is bound to happen from time to time, since the utilization is less than one. Servers are going to be idle every now and then, and if that happens when some other customers are waiting to be served, Store A is going to be inefficient at those times.

Note that this never happens to Store B. Certainly, servers are going to be idle from time to time, and customers are going to have to wait from time to time. But they never both happen at the same time. Any time a server falls idle, if there's any customer waiting for service, that customer can go straight to that server. As a result, Store B, and queueing system (b), is better for the customers: They wait for a shorter time, on average, than customers at Store A.
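
If you don't quite believe that, it's easy to check with a toy simulation. Here's a sketch in Python (my own quick hack, not anything out of a textbook) that feeds the very same customers to both stores. It assumes Poisson arrivals, exponential service times, and that Store A's customers pick one of the six lines at random and never switch:

    import heapq
    import random

    random.seed(1)

    ARRIVAL_RATE = 3.0   # customers per minute, store-wide
    SERVICE_RATE = 1.0   # customers per minute, per server
    SERVERS = 6
    N = 200_000          # customers to simulate

    # One shared stream of arrival times and service demands, so both
    # stores face exactly the same customers.
    t = 0.0
    arrivals, services = [], []
    for _ in range(N):
        t += random.expovariate(ARRIVAL_RATE)
        arrivals.append(t)
        services.append(random.expovariate(SERVICE_RATE))

    # Store A: each customer picks one of six per-server lines at
    # random and stays put; each line is first-come, first-served.
    line_free = [0.0] * SERVERS
    wait_a = 0.0
    for arr, svc in zip(arrivals, services):
        line = random.randrange(SERVERS)
        start = max(arr, line_free[line])
        wait_a += start - arr
        line_free[line] = start + svc

    # Store B: one shared line; the customer at its head goes to
    # whichever server frees up first.
    free_at = [0.0] * SERVERS
    heapq.heapify(free_at)
    wait_b = 0.0
    for arr, svc in zip(arrivals, services):
        soonest = heapq.heappop(free_at)
        start = max(arr, soonest)
        wait_b += start - arr
        heapq.heappush(free_at, start + svc)

    print(f"Store A (per-server lines): average wait {wait_a / N:.3f} min")
    print(f"Store B (one shared line):  average wait {wait_b / N:.3f} min")

When I run this, Store A's average wait comes out around a minute, and Store B's around two seconds, despite the same arrival rate, the same servers, and the same overall utilization of 1/2.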

What's more, queueing system (b) is fairer, in the sense that customers that arrive first are served first. That doesn't always happen with queueing system (a). In the situation depicted above, if a customer now arrives at that sixth, idle server, it gets served immediately, without having to wait, even though customers that arrived earlier are still waiting in the other lines. So (b) is doubly better than (a).

In light of this, it shouldn't come as any surprise that Fry's Electronics, essentially the store for übergeeks, uses system (b) in every one of its stores I've been in. It even takes advantage of the longer single line (as opposed to an array of shorter lines) by snaking it between and amongst a panoply of impulse buys. One could argue that supermarkets can't really take proper advantage of system (b), because people usually have carts, and these take up a lot of room, which would obstruct other supermarket traffic. (I also haven't considered the effect of the 12-items-or-less express lanes.)

But a place like McDonald's has no such excuse. Even if you make the point that people switch lines when there's nobody waiting at a server (because the service counter is not so large), it's still unfair, in that it's not first-come-first-served. And other fast food places are perfectly willing to arrange a single line for all servers.