Monday, November 28, 2016

Rating the Droughts

Although I live in California, this actually has nothing to do with rainfall.

Earlier this month, the Chicago Cubs ended a century-long drought: they hadn't won the World Series since 1908, a span of 108 years.  (I suppose it's really 107 years without a title, since there's a span of a year even between consecutive titles.)  In so doing, they defeated the Cleveland Indians, a team that has now gone 68 years without a title.  The combined droughts of those two teams were a large part of what made the 2016 World Series matchup so compelling (not to mention the twists and turns of Game 7, one of the all-time great baseball games).

Joining them in Major League Baseball's version of the Final Four were the Los Angeles Dodgers and the Toronto Blue Jays.  The Dodgers have now gone 28 years without winning the title, and the Blue Jays have gone 23 years.  Those seem like long-ish times, although obviously nothing like the waits the Cubs endured and the Indians continue to endure.

Consider, though, that there are currently 30 teams in MLB.  If they each had an equal chance of winning each year (which they obviously don't), you'd expect each one to win one title every 30 years, which also means that the expected wait between titles, for any given team, is 30 years.  So, by that measure, the Dodgers and Blue Jays haven't yet waited as long as they should expect to, the Indians have waited over twice as long as they should have, and the Cubs waited about three-and-a-half times as long as they should have.
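
As a quick check of that arithmetic, here is a minimal Python sketch.  The drought lengths are the ones quoted above; the variable names are just mine for illustration, and the whole thing assumes the (obviously unrealistic) equal-chance model.

```python
# Naive model: each of 30 teams has the same 1-in-30 chance every year,
# so the expected wait between titles is 30 years.
TEAMS = 30
EXPECTED_WAIT = TEAMS  # years; mean of a geometric distribution with p = 1/30

# Years since each team's last title, as quoted in the post.
droughts = {
    "Cubs (before 2016)": 108,
    "Indians": 68,
    "Dodgers": 28,
    "Blue Jays": 23,
}

# Ratio of the actual wait to the expected wait under the naive model.
ratios = {team: years / EXPECTED_WAIT for team, years in droughts.items()}

for team, ratio in ratios.items():
    print(f"{team}: {ratio:.2f} times the expected wait")
```

A ratio below 1 means the team hasn't yet waited as long as the naive model says it should expect to.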

But wait!  That assumes that there have always been 30 teams in MLB, which there certainly haven't.  The major leagues started out with just 16 teams in 1901, which is when modern baseball is reckoned to have started: eight in the National League, and eight in the American League.  There were still 16 teams when the Cubs last won in 1908, and also when the Indians last won in 1948.  In those days, a team should have won the title every 16 years, on average, not every 30.  When assessing the severity of title droughts, years in the early days of baseball should count for nearly twice as much as recent years do.

We can reflect that insight by adding up title expectations per year, rather than simply counting years.  Presently, for instance, each team can expect to win 1/30 of a title each year.  That's an average, of course.  What happens in reality is that 1/30 of the teams win one title, and the other 29/30 of the teams win no title.  But the magic of mathematics is that by adding up those yearly averages, you get a measure of how long you've waited for a title, compared to how long you should expect to wait.  In the early years, you would have added 1/16 of a title per year, and in intermediate years, the value would also be intermediate: more than 1/30, but less than 1/16.

To make things a bit more manageable, let's narrow our focus to those teams that haven't won in the last 50 years (and to give a basis for comparison, we'll depict the situation as it was this fall, before the Cubs won):

Chicago Cubs: No titles in 1909–
Cleveland Indians: No titles in 1949–
Texas Rangers: No titles in 1961–
Houston Astros: No titles in 1962–

Now, let's take a look at the expansion history of baseball, setting aside situations where teams just moved from one town to another:

1901–1960: 16 teams
1961: 18 teams (American League added two teams)
1962–1968: 20 teams (National League added two teams)
1969–1976: 24 teams (Both leagues added two teams)
1977–1992: 26 teams (American League added two teams)
1993–1997: 28 teams (National League added two teams)
1998–2012: 30 teams (Both leagues added one team, but the Milwaukee Brewers moved from AL to NL)
2013–present: 30 teams (No change in total team count, but the Astros moved from NL to AL)

Thus, the Astros have played seven years with title expectations of 1/20, eight years with title expectations of 1/24, 16 years with title expectations of 1/26, five years with title expectations of 1/28, and 18 years (remember, we're looking at the situation before the Cubs won) with title expectations of 1/30.  Add those all up and you get about 2.08; the Astros have waited more than twice as long as they should have.  We might call this the waiting factor.
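
If you'd rather let a computer do the adding, the sum can be sketched in a few lines of Python.  The era table is copied from the expansion history above; the function name and the choice of 2015 as the last season counted are my own conventions for this sketch, and it counts every season in the range, just as the tally in the text does.

```python
# Total number of MLB teams by era, taken from the expansion history above.
# Each entry is (first_year, last_year, team_count); 2015 is the last season
# counted, since the tally is taken before the Cubs' 2016 win.
ERAS = [
    (1901, 1960, 16),
    (1961, 1961, 18),
    (1962, 1968, 20),
    (1969, 1976, 24),
    (1977, 1992, 26),
    (1993, 1997, 28),
    (1998, 2015, 30),
]


def waiting_factor(first_year, last_year=2015):
    """Sum 1/team_count over every season from first_year through last_year."""
    total = 0.0
    for start, end, teams in ERAS:
        # Number of seasons of this era that fall inside the drought.
        seasons = min(end, last_year) - max(start, first_year) + 1
        if seasons > 0:
            total += seasons / teams
    return total


astros = waiting_factor(1962)  # 7/20 + 8/24 + 16/26 + 5/28 + 18/30
print(f"Astros: {astros:.2f}")
```

The same function works for any other drought: just pass in the first season without a title.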

The Rangers are almost in the same boat, but they played a single extra year with a title expectation of 1/18, so their waiting factor is just a little bit higher, at about 2.13.  The Indians have played 12 more years without a title than the Rangers, all with a title expectation of 1/16, so their waiting factor is 2.88.

And the Cubs, those grand old lovable losers, had, as of this October, played an extra 40 years, all with title expectations of 1/16, so their waiting factor was a whopping 5.38.  They had waited, effectively, nearly twice as long as the Indians have, and compared to the average team, over five times as long as they should have.  To put it another way, if you had substituted a merely average team for the Chicago Cubs back in 1908, those alternate-universe Chicagoans would have won an extra five or so World Series.  By comparison, the Yankees won all 27 of their World Series during that time.

Holy cow indeed!

Actually, it's a little more complicated even than that, since (as you can tell from the brief expansion history above) the two leagues have on occasion had different numbers of teams.  The World Series always pits one National League team against one American League team.  If the National League had 12 teams in a given year, each National League team had a 1/12 chance of reaching the Series and a 1/2 chance of winning it, so its title expectation should be 1/24, no matter how many American League teams there were.  If we take that into account, the numbers change ever so slightly:

Astros waiting factor = 2.10
Rangers waiting factor = 2.12
Indians waiting factor = 2.87
Cubs waiting factor = 5.41
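
Here is a rough Python sketch of the league-by-league version.  The per-league team counts are my reading of the expansion history above (for instance, the 1998 entry reflects one new team per league plus the Brewers' move to the NL), and the function names are invented for this example.

```python
# Teams per league by era, derived from the expansion history above.
# Each entry is (first_year, last_year, al_teams, nl_teams).
LEAGUE_ERAS = [
    (1901, 1960, 8, 8),
    (1961, 1961, 10, 8),
    (1962, 1968, 10, 10),
    (1969, 1976, 12, 12),
    (1977, 1992, 14, 12),
    (1993, 1997, 14, 14),
    (1998, 2012, 14, 16),
    (2013, 2015, 15, 15),
]


def league_teams(year, league):
    """Return the number of teams in the given league ("AL" or "NL") that year."""
    for start, end, al_teams, nl_teams in LEAGUE_ERAS:
        if start <= year <= end:
            return al_teams if league == "AL" else nl_teams
    raise ValueError(f"no league data for {year}")


def league_waiting_factor(first_year, league_of, last_year=2015):
    """Sum 1 / (2 * league size) over each season of the drought.

    league_of maps a year to the league the team played in that year.
    """
    return sum(
        1 / (2 * league_teams(year, league_of(year)))
        for year in range(first_year, last_year + 1)
    )


factors = {
    # The Astros played in the NL through 2012 and in the AL from 2013.
    "Astros": league_waiting_factor(1962, lambda y: "NL" if y <= 2012 else "AL"),
    "Rangers": league_waiting_factor(1961, lambda y: "AL"),
    "Indians": league_waiting_factor(1949, lambda y: "AL"),
    "Cubs": league_waiting_factor(1909, lambda y: "NL"),
}

for team, factor in factors.items():
    print(f"{team}: {factor:.2f}")
```

Passing the league as a function of the year is what lets the Astros' switch be handled without any special casing.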

For the Cubs, of course, their waiting factor has reset.  For everyone else, the wait continues.

Wednesday, November 9, 2016

A Few Thoughts on the Election and Exit Polls

Whether you're pleased or dejected this morning, I think there are very few of us who aren't stunned by the result in the general election yesterday.  In particular, polling was way off, even the exit polls, which are supposed to take the pulse of voters as they leave the booth.  How did they get the result so badly wrong?  (Pre-count models showed Clinton with an average of about 300 electoral votes, and winning about 80 percent of the time.)

I'd guess that there are a number of factors (aside from the conspiracy theories, whether Republican or Democratic):
  1. People were embarrassed to admit voting for Trump (i.e., he was viewed as the less respectable candidate), but that shame didn't translate to the actual ballot. That doesn't mean that people voted for Trump on a whim; it just means that they weren't keen on admitting that to someone else, even a pollster they'd never see again.
  2. Exit polling was not done at every location, for obvious reasons. So projections were based on a regression analysis that fits estimates to the sampled locations. That regression assumes, among other things, a certain degree of polarization between demographics. It looks like that polarization was even more extreme than the already significant level that was expected.
  3. Trump was simply a higher-variance candidate than the traditional Republican. A high-variance strategy makes sense in any contest where you're the underdog (as Trump was for most of the campaign): had he played a low-risk strategy, he would almost certainly have lost. Employing a high-risk strategy increases the probability of a blowout loss, but it also increases the probability of a close win, which is what happened. We're seeing this all the time in sports, where endgame strategies by the trailing team are becoming more aggressive. That increase in variance translated to the polls. FiveThirtyEight was very open about this; they pointed out that their model, though predicting a Clinton win, had about three times more variation (by some metric) in it than in past years.
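
The variance argument in point 3 can be illustrated with a toy model in Python.  Everything here is hypothetical: I'm assuming the underdog's final margin is normally distributed, with an expected margin of -3 points, a "blowout" line at -10, and two made-up spreads standing in for the low-risk and high-risk strategies.  None of these numbers come from actual polling.

```python
from statistics import NormalDist

# Hypothetical numbers for illustration only: the underdog's expected final
# margin is -3 points, and the choice of strategy changes only the spread.
EXPECTED_MARGIN = -3.0


def outcome_probabilities(std_dev, blowout=-10.0):
    """Return (P(win), P(blowout loss)) for a normally distributed margin."""
    margin = NormalDist(mu=EXPECTED_MARGIN, sigma=std_dev)
    p_win = 1 - margin.cdf(0)        # margin above zero
    p_blowout = margin.cdf(blowout)  # margin at or below the blowout line
    return p_win, p_blowout


low_risk = outcome_probabilities(std_dev=2.0)
high_risk = outcome_probabilities(std_dev=6.0)

print(f"low risk:  win {low_risk[0]:.3f}, blowout loss {low_risk[1]:.3f}")
print(f"high risk: win {high_risk[0]:.3f}, blowout loss {high_risk[1]:.3f}")
```

With the expected margin held fixed below zero, widening the spread raises both probabilities at once, which is the trade-off described above.
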
I don't think fraud played any significant role in this election. We're seeing real disquiet with the state of the nation. Whether that disquiet has a basis in fact is immaterial as regards the result of the election.

I may have more to say about the election results themselves, but I'll save that for another post. 

[Most of this post was drawn from a Facebook comment.]