Monday, August 3, 2015

Reasons Why I Use the Mutt Mailreader

My favored mailreader is mutt.

The running joke is that I like it because it's conspicuously antediluvian.  Well, I don't dislike it for that reason, but there are better and more accurate reasons why I actually like it.

The first and most important reason is that it has support (after a fashion) for tagging mail messages.  I grew up (so to speak) on the Berkeley mailreader, which stored old messages in an array of files within an archive directory.  Although in a post-Windows world it's the term "directory," not "file," that suggests "folder," those files are the moral equivalent of modern mail folders.

And folders are a distinctly sub-optimal way of organizing mail.  Suppose I have a folder for bills and statements, and a separate folder for medical.  An e-mail receipt for the gas bill goes in the bills-and-statements folder, and an eyeglass prescription goes in the medical folder.  But what happens if I get a medical statement?  Where does that go?  Either I have to pick a single folder to put it in, or I save it in both folders.  The former makes it more difficult to find the message later on, and the latter is more tedious (some mailreaders actively resist attempts to store multiple copies) and causes consistency problems if I later want to edit a message (for example, to make notes).

The proper solution to this problem is to support mail tagging, à la Gmail.  In Gmail, one creates tags, not folders, and any number of tags can be attached to a given message.  One can put both the bills-and-statements tag and the medical tag on a medical-statement e-mail, and then it will show up in a search for either.  More usefully, you can search for both tags together, and then only medical statements (and anything else carrying both tags simultaneously) will show up.  When I started using my Gmail account, I was blown away by how powerful an organizing mechanism tags were.  They basically implement multiple inheritance.  I never wanted to go back to folders for my personal e-mail.  I mean, social networking (including this blog) relies critically on tagging; why shouldn't e-mail?

Work e-mail, alas, was a different matter.  Understandably, my employers wanted people to use the company e-mail address rather than a Gmail address, and the corporate IT infrastructure (at either of the places I worked) didn't support using the Gmail interface.  That is, until I discovered mutt's tagging support.

To be sure, it is support after a fashion: mutt can display the X-Label header field, but scripts have to be added to set the tags yourself (tags aren't very useful if the only way to apply them is to edit them into the e-mail by hand).  There's a certain amount of, ahh, customization needed to make the experience minimally unpleasant, but it's worth it.  The corporate-approved mailreader doesn't support tagging, and I won't (willingly) switch to it until it does.  We recently switched to an Exchange server, which threatened to coerce me into the corporate mailreader, but I found a solution, Davmail, which provides an IMAP interface to an Exchange server and has permitted me to happily continue tagging my e-mail.
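
To give a flavor of the customization involved, here is a minimal sketch of the relevant muttrc settings, assuming a reasonably recent mutt.  The particular format string and keybinding are illustrative choices, not a canonical recipe:

    # Show the X-Label field (the tags), when present, in the message index.
    set index_format="%4C %Z %{%b %d} %-15.15L (%?l?%4l&%4c?) %?y?[%y] ?%s"

    # Expose headers in the editor, so an X-Label: line can be added or
    # edited while composing.
    set edit_headers=yes

    # Limit the index to messages whose X-Label matches a pattern (~y).
    macro index ,y "<limit>~y " "limit view to labeled messages"

Adding or changing the X-Label on a message that has already arrived is the part that takes external scripting (or editing the raw message in place), which is where the "after a fashion" comes in.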

But that's only the most important reason I cleave to mutt.  Among others:
  • It can be used on any dumb text terminal you can think of, as long as it can log into my machine.  I occasionally have to check my mail on some remarkably incapable devices, and mutt will work on all of them.
  • It is blindingly fast, meaning that I can access and search my entire mail archive from years back and expect results back effectively the moment I hit the enter key.
  • It is remarkably configurable.  That's not a bonus for some people, but I like tinkering with my e-mail interface, and this suits me.
  • A somewhat backhanded compliment of mutt is that it prevents me from being exposed to e-mail attacks that depend on code being automatically loaded and executed within the e-mail message.  Well, OK, I do like that, but it's really a way of admitting that mutt can't possibly support the same kind of message display interface that a graphical mailreader can.
Mutt's slogan sums it up nicely: "All mail clients suck. This one just sucks less."

Tuesday, May 7, 2013

Why CPU Utilization Is a Misleading Architectural Specification

MOAR Q-ING TREE PLZ.

Actually, this post only has a little to do with queueing theory.  But I can't help tagging it that way, just 'cause.

Once upon a time, before the Internet, before ARPANet, even before people were born who had never done homework without Google, computer systems were built.  These systems often needed to plow their way through enormous amounts of data (for that era) in a relatively short period, and they needed to be robust.  They could not break down or fall behind if, for instance, all of a sudden, there was a rush in which they had to work twice as fast for a while.

The companies that were under contract to build these systems were therefore compelled to build to a specified requirement.  This requirement often took a form something like, "Under typical conditions, the system shall not exceed 50 percent CPU utilization."  The purpose of this requirement was to ensure that if twice the load did come down the pike, the system would be able to handle it; that is, the system could handle twice the throughput that it experienced under typical conditions, if it needed to.

One might reasonably ask, if the purpose was to ensure that the system could handle twice the load, why not just write the requirement in terms of throughput, using words something like, "The system shall be able to handle twice the throughput as in a typical load of work"?  Well, for one thing, CPU utilization is, in many situations, easier to measure on an ongoing basis.  If you've ever run the system monitor on your computer, you know how easy it is to track how hard your CPU is working, every second of every day.  Whereas, to test how much more throughput your system could handle, you'd actually have to measure how much work your CPU is doing, then run a test to see if it could do twice as much work without falling behind.  A requirement written in terms of CPU utilization would simply be easier to check.

For another thing, at the time these requirements were being written, CPU utilization was an effective proxy for throughput.  That is to say, in the single-core, single-unit, single-everything days, the computer could essentially be treated like a cap-screwing machine on an assembly line.  If your machine could screw caps onto jars in one second, but jars only came down the line every two seconds, then your cap-screwing machine had a utilization of 50 percent.  And, on the basis of that measurement, you knew that if there was a sudden burst of jars coming twice as fast—once per second—your machine could handle it without jars spilling all over the production room floor.
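
For the single-everything case, the arithmetic behind that reasoning fits in a few lines.  Here is a sketch in Python using the cap-screwing numbers from above (the numbers and names are purely illustrative):

    # Back-of-the-envelope arithmetic for the single-core case.
    service_time = 1.0    # seconds the machine needs per jar (work item)
    interarrival = 2.0    # seconds between jars under typical load

    utilization = service_time / interarrival        # 0.5, i.e. 50 percent
    max_throughput = 1.0 / service_time              # 1 jar per second
    typical_throughput = 1.0 / interarrival          # 0.5 jars per second
    headroom = max_throughput / typical_throughput   # 2.0, which is 1 / utilization

    print(f"utilization={utilization:.0%}, headroom={headroom:.1f}x")

Fifty percent utilization and a factor-of-two throughput margin are literally the same number viewed from two directions, which is exactly why the utilization requirement worked.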

In other words, CPU utilization was quite a reasonable way to write requirements to spec out your system—once upon a time.

Since those days, computer systems have undergone significant evolution, so that we now have computers with multiple CPUs, CPUs with multiple cores, cores with multi-threading/hyper-threading.  These developments have clouded the once tidy relationship between CPU utilization and throughput.

Without getting too deep into the technical details, let me give you a flavor of how the relationship can be obscured.  Suppose you have a machine with a single CPU, consisting of two cores.  The machine runs just one single-threaded task.  Because this task has only one thread, it can only run in one core at a time; it cannot split itself to work on both cores at the same time.

Suppose that this task is running so hard that it uses up just exactly all of the one core it is able to use.  Very clearly, if the task is suddenly required to work twice as hard, it will not be able to do so.  The core it is using is already working 100 percent of the time, and the task will fall behind.  All the while, of course, the second core is sitting there idly, with nothing to do except count the clock cycles.

But what does the system report as its CPU utilization?  Why, 50 percent!  After all, on average, its cores are being used half the time.  The fact that one of them is being used all of the time, and the other none of the time, is completely concealed by the aggregate measurement.  Things look just fine, even though the task is already running at its maximum throughput.
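
The same observation, as a small illustrative Python sketch (the numbers are made up; the averaging step is the point):

    # Two cores; a single-threaded task saturates core 0 while core 1 idles.
    per_core_utilization = [1.00, 0.00]   # fraction of time each core is busy

    aggregate = sum(per_core_utilization) / len(per_core_utilization)
    print(f"aggregate CPU utilization: {aggregate:.0%}")   # 50% -- looks fine

    # The headroom for a single-threaded task is limited by its one core.
    busiest = max(per_core_utilization)
    print(f"busiest core: {busiest:.0%}")                  # 100% -- no headroom left

The averaging is where the information gets thrown away.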

In the meantime, while all of these developments were occurring, what was happening with the requirements?  Essentially nothing.  You might expect that at some point, people would latch onto the fact that computing advances were going to affect this once-firm relationship between CPU utilization (the thing they could easily measure) and throughput (the thing that they really wanted).

The problem is that requirements-writing is mind-numbing drudge work, and people will take any reasonable measure to minimize the numbness and the drudge.  Well, one such reasonable measure was to see what the previous system had done for its requirements.  What's more, those responsible for creating the requirements were, in many cases, not computer experts themselves, so unless the requirements were obviously wrong (which these were not), the inclination was to duplicate them.  That would explain the propagation of the old requirement down to newer systems.

At any rate, whatever the explanation, the upshot is that there is often an ever-widening disconnect between the requirement and the property the system is supposed to have.  There are a number of ways to address that, to incrementally improve how well CPU utilization tracks throughput.  There are tools that measure per-core utilization, for instance.  And even though hyper-threading can also obscure the relationship, it can be turned off for the purposes of a test (although doing so then systematically underestimates capacity).  And so on.
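
To make the per-core point concrete, here is one way to get those numbers on a modern system.  This sketch assumes the third-party psutil package is installed, and it is only meant to show that the per-core view exposes the hot core the average conceals:

    import psutil  # third-party package, assumed installed for this sketch

    # Sample utilization over one second, per core.
    per_core = psutil.cpu_percent(interval=1.0, percpu=True)
    aggregate = sum(per_core) / len(per_core)

    print(f"aggregate: {aggregate:.0f}%")
    for core, pct in enumerate(per_core):
        print(f"  core {core}: {pct:.0f}%")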

But all this is beside the point, which is that CPU utilization is not the actual property one cares about.  What one cares about is throughput (and, on larger time scales, scalability).  And although one does not measure maximum throughput capacity on an ongoing basis, one can measure it each time the system is reconfigured.  And one can measure what the current throughput is.  And if the typical throughput is less than half of the maximum throughput—why, that is exactly what you want to know.  It isn't rocket science (although, to be sure, it may be put in service of rocket science).
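
Stated as a check against the property one actually cares about, the requirement boils down to something like the following sketch (invented numbers; max_throughput would come from a load test run whenever the system is reconfigured, typical_throughput from ongoing measurement):

    # Invented numbers for illustration.
    max_throughput = 4000.0      # work items/second sustained in the load test
    typical_throughput = 1500.0  # work items/second observed under typical load

    required_headroom = 2.0      # "handle twice the typical load"
    headroom = max_throughput / typical_throughput

    verdict = "meets" if headroom >= required_headroom else "fails"
    print(f"headroom: {headroom:.1f}x ({verdict} the requirement)")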

<queueingtheory>And you may also want to know that the throughput is being achieved without concomitantly high latency.  This is a consideration of increasing importance as the task's load becomes ever more unpredictable.  Yet another reason why CPU utilization can be misleading.</queueingtheory>