
Science Sunday: Bleeding Hearts and the Tribulations of Initializing Variables in C

These are troubled times indeed.  The Heartbleed bug is causing wailing and gnashing of teeth among all administrators of servers, and Bruce Schneier, the man who taught Chuck Norris to divide by zero, has declared this an emergency of 11, on a scale of 1-10.

Internet protocols do not have any built-in security measures.  Think about that for a minute.  The internet is a datagram service.  All it actually does is try to get data from a server to a client on request.  So our servers have to add their own security measures.  An estimated 2/3 of the servers on the internet use OpenSSL, an open source implementation of the Secure Sockets Layer and Transport Layer Security protocols.  Without going into detail, OpenSSL works very well at what it is supposed to do, but it has a bug in it.  People make mistakes when coding, so bugs happen.  There is research on how to write code that is provably correct, but my impression is that provably correct code is really hard, and I don’t understand it.  So most code isn’t provably correct and has bugs.  Some bugs are, of course, worse than others.  The one in OpenSSL is so very, very bad that an attacker can use it to scrape a server’s memory.  Randall Munroe explains this process better than I could.  This is usually true.

Why does Heartbleed work?  It’s because of the way the C programming language handles uninitialized variables.  A variable in a computer program isn’t much like the variable in algebra: a computer’s variable is simply a named piece of memory that has a certain value that the programmer assigns and changes as needed.  For example:

[Image: variables-usual.  The variables here, circled in red, are a, b, and c.]
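
The screenshot is not reproduced here, so what follows is only a sketch of the kind of program being described.  The values 15 and 25 and the function name add_example are assumptions; the only thing the text pins down is that c comes out as 40.

```c
#include <stdio.h>

/* Hypothetical reconstruction: the values of a and b are assumed.
   The article only says that their sum prints as 40. */
int add_example(void) {
    int a, b, c;         /* declared here, but not yet given values */

    a = 15;              /* assumed value */
    b = 25;              /* assumed value */
    c = a + b;           /* c gets the sum of a and b */

    printf("%d\n", c);   /* prints 40 */
    return c;
}
```

Calling add_example() from main prints 40.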

 

Without worrying too much about the structure of the program, just notice that a, b, and c are all variables of type int, which just means they are integers.  I assign a and b values, and then c gets a value equal to the sum of a and b.  Basic arithmetic.  I can verify it by printing the value of c, which is, as I expect it to be, 40.  Notice that when I just write int a, b, c; I have declared a, b, and c as variables, but they don’t currently have values.  This means they are uninitialized; that is, they don’t have initial values.  They get values in the next few lines of code, but what happens if I introduce another variable d, and try to add that to c without initializing it?  In other words, what is the result of the following code?

[Image: uninitialized]
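
Again the screenshot is missing, so this is a sketch under assumptions rather than the original code.  The function names are hypothetical, and 40 stands in for the sum from the previous example.  The first function is buggy on purpose: it reads d without ever giving it a value.  The second is there only for contrast.

```c
#include <stdio.h>

/* Intentionally buggy: d is declared but never assigned, so reading it is
   undefined behavior in C. The result may differ between machines,
   compilers, and even runs. */
int add_uninitialized(void) {
    int c = 40;    /* stands in for the sum computed earlier */
    int d;         /* declared, never initialized */
    return c + d;  /* reads whatever d happens to contain */
}

/* The same arithmetic with d given a starting value. */
int add_initialized(void) {
    int c = 40;
    int d = 0;     /* initialized at declaration */
    return c + d;  /* always 40 */
}
```

Printing the result of add_initialized() always gives 40.  Printing the result of add_uninitialized() gives no such guarantee.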

Well, when I compile and run it on my machine I get 32792.  If you try it on your own machine, you may very well get a different answer.  What’s happening is that I am telling my C compiler to use a variable without a value to perform arithmetic.  My compiler will not give me an error about this; at most, if I ask for warnings, it will warn me.  Nor does it pick a value for my variable.  The variable is just a name for a piece of memory, so it ends up with whatever happened to be sitting in that memory already.  This is where things get interesting.  Whenever you delete something from your computer, that something does NOT go away.  What goes away are the references the computer used to find that memory.  The memory still holds the value that was just “deleted.”  The value does not go away until the computer needs that piece of memory for something else and overwrites the original value.
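
The point about “deleted” data can be sketched with a toy memory pool.  This is a simplified model, not how an operating system or the C library actually manages memory, and every name in it (pool, toy_alloc, toy_free_all, stale_data_demo) is made up for illustration.  Releasing memory here only forgets where the data was; it never wipes it.

```c
#include <stddef.h>
#include <string.h>

/* Toy model only: one fixed pool and a "next free byte" counter.
   Real allocators are far more elaborate; all names are hypothetical. */
static char pool[64];
static size_t pool_used = 0;

/* Hand out the next n bytes of the pool, or NULL if it is full. */
char *toy_alloc(size_t n) {
    if (n > sizeof pool - pool_used) return NULL;
    char *p = pool + pool_used;
    pool_used += n;
    return p;
}

/* "Delete" everything: forget what was handed out, but do not wipe it. */
void toy_free_all(void) {
    pool_used = 0;
}

/* Store a secret, release it, then look at a freshly allocated block. */
const char *stale_data_demo(void) {
    char *first = toy_alloc(16);
    if (first == NULL) return NULL;
    strcpy(first, "hunter2");      /* 8 bytes including the terminator */
    toy_free_all();                /* the reference is gone, the bytes are not */

    char *second = toy_alloc(16);  /* the same memory, handed out again */
    return second;                 /* still reads "hunter2" */
}
```

The second block was never written to by its new owner, yet it still contains the old secret, because nothing overwrote it in between.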

We’re almost ready to talk about Heartbleed.  But we need to move on from individual integer variables to collections of variables.  Suppose we want to keep a list of integers around in our code.  C lets us do this with a built-in data type called an array.  An array is a special kind of list.  It has a fixed size that is decided at its declaration.  Say I have the following code.

[Image: array]
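
The screenshot is missing here as well.  A sketch of the array code, using the name testarray and the five values given in the text, could look like the following.  Making the array global and wrapping the code in a function called fill_and_print are assumptions made for the sake of a short example.

```c
#include <stdio.h>

/* An array of exactly five ints; the size is fixed at declaration. */
int testarray[5];

/* Give the array the five values from the text, then print each one
   together with its index. The function name is hypothetical. */
void fill_and_print(void) {
    testarray[0] = 5;
    testarray[1] = 3;
    testarray[2] = 2;
    testarray[3] = 1;
    testarray[4] = 6;

    for (int i = 0; i < 5; i++) {   /* valid indices run from 0 to 4 */
        printf("testarray[%d] = %d\n", i, testarray[i]);
    }
}
```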

“int testarray[5]” is a declaration, similar to int a, except I am saying that my variable is an array of size 5.  The “int” part at the beginning means that this array holds integers.  Then I give my array 5 values, 5,3,2,1, and 6.  Each integer in the array is accessed by an index, a number that describes where it is in the array.  We start at 0 (computer scientists always start counting at 0) and go to 4, so testarray at index 0 (written as testarray[0]) is 5, testarray at index 1 is 3, and so on until testarray[4], which is 6.  Knowing what we do about the behavior of uninitialized variables in C, what will the computer do if I ask about testarray at index 5?   We can check by printing out an index that shouldn’t exist.

[Image: outofbounds]
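
In place of the missing screenshot, here is a sketch of an out-of-bounds read.  The function names are hypothetical.  The first function is buggy on purpose, and C does not define what it returns; the second reads the last element that really exists.

```c
#include <stdio.h>

int testarray[5] = {5, 3, 2, 1, 6};

/* Intentionally buggy: valid indices are 0 through 4. Index 5 is past
   the end, so this read is undefined behavior and may return anything. */
int read_index_5(void) {
    int i = 5;
    return testarray[i];
}

/* For contrast: the last index that actually exists. */
int read_index_4(void) {
    return testarray[4];   /* 6 */
}
```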

On my computer I get 32767, another leftover value from my machine’s memory.  With a small-scale example using only integers, I accessed some arbitrary part of my computer’s memory to see what I could see.  Arrays can hold any type of data.  So, with that in mind, Heartbleed resulted from a lack of checking for out-of-bounds indexes.  There’s a part of the OpenSSL code that reads from an array without first checking that what it reads lies within the size of the array.  If you see any reports casually mentioning that it’s a bounds checking error (or a failure to check bounds), that is exactly what the reports mean.  The code was written in such a way that queries to the server could access nonexistent indices of an array and get back whatever memory lay beyond it.
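
To make the missing check concrete, here is a toy model of a heartbeat-style echo.  It is emphatically not OpenSSL’s code: the names (memory, echo_unchecked, echo_checked), the 4-byte payload, and the “secret” sitting next to it are all invented for illustration.  In a heartbeat request the client states how many bytes it wants echoed back; the sketch shows what happens when that number is trusted, and what a bounds check changes.  The unchecked version is clamped to the toy memory so that the demonstration itself stays well defined.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for server memory: a 4-byte payload sent by the client
   sits right next to data the client should never see. */
static const char memory[] = "bird" "SECRET-KEY";
#define PAYLOAD_SIZE 4                    /* bytes the client really sent */
#define MEMORY_SIZE (sizeof memory - 1)   /* 14 bytes, without the '\0' */

/* Vulnerable: trusts the length the client claims. out must have room
   for MEMORY_SIZE + 1 bytes. Returns the number of bytes echoed. */
size_t echo_unchecked(char *out, size_t claimed_len) {
    if (claimed_len > MEMORY_SIZE) claimed_len = MEMORY_SIZE; /* demo guard only */
    memcpy(out, memory, claimed_len);
    out[claimed_len] = '\0';
    return claimed_len;
}

/* Fixed: refuses any request that claims more than was actually sent. */
size_t echo_checked(char *out, size_t claimed_len) {
    if (claimed_len > PAYLOAD_SIZE) {     /* the missing bounds check */
        out[0] = '\0';
        return 0;
    }
    memcpy(out, memory, claimed_len);
    out[claimed_len] = '\0';
    return claimed_len;
}
```

Asking echo_unchecked for 14 bytes returns the payload followed by the secret, while echo_checked answers the same request with nothing and still serves an honest 4-byte request.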

One might, at this point, reasonably ask why C doesn’t generate an error for uninitialized variables or nonexistent indices, particularly since this is a characteristic of the C language rather than of all programming languages.  The Python interpreter, for example, will give me a nasty error if I try something like this.  It is possible C works this way in order to guide us to the more elegant parentheses of the programming language of the gods themselves.  Or possibly it is just because C is an older language, and creating a language and compiler is not the easiest thing in the world to do, so of course our earlier efforts won’t necessarily be the easiest to work with.  The noble programmer, however, will be educated about the language being used, compensate for its flaws, assume while coding that everyone is a thief, and take all precautions.  Regardless of whether there is an actual good reason, C is the language of operating systems, and most servers are going to be running it or some variant.  Since we can’t simply reengineer the internet and all machines connected to it at will, we have to live with this.

Oh, and because Heartbleed exposed a lot of data from servers, possibly including login credentials, if you haven’t already, please change your passwords (and don’t do the correct horse battery staple thing; in this one instance, Randall Munroe is wrong) and plan for the worst.  What is the worst thing that could happen based on the data you most wanted to remain private?  What can you do to minimize damage from this?  When you see one zombie in the garden, it could be an isolated zombie, but the canny adventurer will immediately prepare for an incipient horde of the ravening unquiet dead.


Elizabeth


Elizabeth is a professional belly dancer, a flaky computer scientist, and a returned Peace Corps volunteer. She lives in Georgia (the state of the U.S., not the country) but is nonetheless somehow not a combination of stereotypes from Gone with the Wind and Deliverance. Her personal blog is Coffeefied. Operafied. Fluffified. Beglittered.

2 Comments

  1. April 20, 2014 at 1:35 pm —

    Great introduction to this topic. But unfortunately, the real issue is more complex than that.

    In short, OpenSSL was doing its own memory management: It did not ask the operating system for every little block of memory it needed when it needed it. Instead, it asked for a big block up front, and then manually figured out how to partition it when the program needed a piece of it. This has three effects: Firstly, even if the operating system or the default C memory management scrubbed the memory before handing it out to a program, this would not have stopped Heartbleed. Secondly, even if the operating system did rigorous bounds checks (which most operating systems do, for the most part, with exceptions in the corner cases), this would not have helped either. Thirdly, and most damning, since that big piece of memory you can read from is exclusively used by OpenSSL, the chances of quickly sniffing a private SSL key this way are pretty high.

    Moreover, C compilers, if instructed through -Wuninitialized and -Wmaybe-uninitialized (usually enabled by -Wall, which also adds more checks), nowadays do warn about uninitialized variables. Of course, due to the halting problem, this is never 100% foolproof, and there are bound to be both false positives and false negatives with those checks. Python, incidentally, solves this by not checking the code up front but checking while executing, which, of course, has an impact on performance.

    There are also static analyzers to check for those issues, and a lot of others (but again, limited by the halting problem). Simple ones include Splint and cppcheck. More complex, commercial ones exist too, for example Coverity (which offers gratis checks for FLOSS projects like OpenSSL). I’ve used Coverity for some of my open source projects, and it’s in general a pretty awesome tool. It did not, however, find the Heartbleed bug [1][2].

    Another fine tool to use is valgrind, a dynamic analysis framework. It hooks into your program and analyzes it while it runs. Its memcheck tool, for example, monitors memory allocations and usage, and gives you detailed info on what the program does and where it happens in the source code, plus additional information, such as where a memory block was allocated, when it detects you’re accessing invalid addresses near it. Of course, this slows down code execution considerably, and it only finds problems that happen right now, not theoretical issues you’re not currently looking for.

    [1] http://blog.regehr.org/archives/1125
    [2] http://blog.regehr.org/archives/1128

  2. April 27, 2014 at 9:20 pm —

    This is sort of a minor side point from your article, but I think Bruce Schneier is wrong about Randall Munroe being wrong on the “correct horse battery staple” thing. Schneier says the method won’t work because crackers are onto the trick of combining dictionary words for cracking. But Munroe’s entropy calculations effectively assume that the crackers are already doing that. In fact, the Munroe method makes the conservative assumption that the crackers also already know that this is the method being used to generate passwords, and the exact dictionary that the password-chooser is using, so his calculations should actually be an underestimate of the difficulty of cracking such passwords.

    Of course, it’s important that the random words chosen are in fact random, not just arbitrarily selected by a person; people are pretty bad at generating true randomness.
