Science Sunday: Bleeding Hearts and the Tribulations of Initializing Variables in C
These are troubled times indeed. The Heartbleed bug is causing wailing and gnashing of teeth among all administrators of servers, and Bruce Schneier, the man who taught Chuck Norris to divide by zero, has declared this an emergency of 11, on a scale of 1-10.
Internet protocols do not have any built-in security measures. Think about that for a minute. The internet is a datagram service. All it actually does is try to get data from a server to a client on request. So our servers have to add their own security measures. An estimated two-thirds of the servers on the internet use OpenSSL, the open source implementation of the Secure Sockets Layer and Transport Layer Security protocols. Without going into detail, OpenSSL works very well at what it is supposed to do, but it has a bug in it. Because people make mistakes when coding, bugs happen. There is research on how to write code that is provably correct, but my impression is that provably correct code is really hard, and I don’t understand it. So most code isn’t provably correct and has bugs. Some bugs are, of course, worse than others. The one in OpenSSL is so very, very bad that an attacker can use it to scrape a server’s memory. Randall Munroe explains this process better than I could. This is usually true.
Why does Heartbleed work? It’s because of the way the C programming language handles uninitialized variables. A variable in a computer program isn’t much like the variable in algebra: a computer’s variable is simply a named piece of memory that has a certain value that the programmer assigns and changes as needed. For example:
Without worrying too much about the structure of the program, just notice that a, b, and c are all variables of type int, which just means they are integers. I assign a and b values, and then c gets a value equal to the sum of a and b. Basic arithmetic. I can verify it by printing the value of c, which is, as I expect it to be, 40. Notice that when I just write int a, b, c; that means I have declared a, b, and c as variables, but they don’t currently have values. This means they are uninitialized; that is, they don’t have initial values. They get values in the next few lines of code, but what happens if I introduce another variable d, and try to add that to c without initializing it? In other words, what is the result of the following code?
Well, when I compile and run it on my machine I get 32792. If you try it on your own machine, you may very well get a different answer. What’s happening is that I am telling my C compiler to use a variable without a value to perform arithmetic. My compiler will not give me an error about this (at most a warning, if I ask for warnings). Nothing actually assigns a value to my variable. The C standard calls reading it undefined behavior, and in practice the variable is just a named piece of memory that still holds whatever was left there by the last thing to use it. This is where things get interesting. Whenever you delete something from your computer, that something does NOT go away. Only the references the computer used to find that memory go away. The memory still holds the value that was just “deleted.” The value in memory does not go away until and unless the computer needs that piece of memory for something else, at which point it overwrites the original value with that something else.
We’re almost ready to talk about Heartbleed. But we need to move on from individual integer variables to collections of variables. Suppose we want to keep a list of integers around in our code. C lets us do this with a built-in data type called an array. An array is a special kind of list. It has a fixed size that is decided at its declaration. Say I have the following code.
“int testarray[5]” is a declaration, similar to int a, except I am saying that my variable is an array of size 5. The “int” part at the beginning means that this array holds integers. Then I give my array 5 values: 5, 3, 2, 1, and 6. Each integer in the array is accessed by an index, a number that describes where it is in the array. We start at 0 (computer scientists always start counting at 0) and go to 4, so testarray at index 0 (written as testarray[0]) is 5, testarray[1] is 3, and so on until testarray[4], which is 6. Knowing what we do about the behavior of uninitialized variables in C, what will the computer do if I ask about testarray at index 5? We can check by printing out an index that shouldn’t exist.
On my computer I get 32767, another random value from my machine’s memory. With a small-scale example using only integers, I accessed some random part of my computer’s memory to see what I could see. Arrays can hold any type of data. So, with that in mind, Heartbleed resulted from a lack of checking for out-of-bounds indexes. There’s a part of the OpenSSL code where there is access to an array index without first checking to see if that array index is smaller than the size of the array. (More precisely, the server trusted a length supplied in the request and copied that many bytes back without checking it against the data it had actually received.) If you see any reports casually mentioning that it’s a bounds checking error (or a failure to check bounds), that is exactly what the reports mean. The code was written in such a way that queries to the server could access nonexistent indices from some array.
One might, at this point, reasonably ask why C doesn’t generate an error for uninitialized variables or nonexistent indices, particularly since this is a characteristic of the C language rather than of all programming languages. The Python interpreter, for example, will give me a nasty error if I try something like this. It is possible C works this way in order to guide us to the more elegant parentheses of the programming language of the gods themselves. Or possibly it is just because C is an older language, and creating a language and compiler is not the easiest thing in the world to do, so our earlier efforts won’t necessarily be the easiest to work with. The noble programmer, however, will be educated about the language being used, compensate for its flaws, assume when coding that everyone is the thief, and take all precautions. Regardless of whether there is an actual good reason, C is the language of operating systems, and most servers are going to be running it or some variant. Since we can’t simply reengineer the internet and all machines connected to it at will, we have to live with this.
Oh, and because Heartbleed exposed a lot of data from servers, possibly including login credentials, if you haven’t already, please change your passwords (and don’t do the correct horse battery staple thing; in this one instance, Randall Munroe is wrong) and plan for the worst. What is the worst thing that could happen based on the data you most wanted to remain private? What can you do to minimize damage from this? When you see one zombie in the garden, it could be an isolated zombie, but the canny adventurer will immediately prepare for an incipient horde of the ravening unquiet dead.