{"id":246,"date":"2021-09-29T07:48:28","date_gmt":"2021-09-29T14:48:28","guid":{"rendered":"https:\/\/blog.mozilla.org\/attack-and-defense\/?p=246"},"modified":"2021-10-13T10:19:29","modified_gmt":"2021-10-13T17:19:29","slug":"fixing-a-security-bug-by-changing-a-function-signature","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/attack-and-defense\/2021\/09\/29\/fixing-a-security-bug-by-changing-a-function-signature\/","title":{"rendered":"Fixing a Security Bug by Changing a Function Signature"},"content":{"rendered":"<p><span style=\"display:none\">&nbsp;<\/span><\/p>\n<h3><b>Or: The C Language Itself is a Security Risk, Exhibit #958,738<\/b><\/h3>\n<p>This post is aimed at people who are developers but who do not know C or low-level details about things like sign extension. In other words, if you&#8217;re a seasoned pro and you eat memory safety vulnerabilities for lunch, then this will all be familiar territory for you; our goal here is to dive deep into how integer overflows can happen in real code, and to break the topic down in detail for people who aren&#8217;t as familiar with this aspect of security.<\/p>\n<h1><b>The Bug<\/b><\/h1>\n<p>In July of 2020, I was sent <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=1653371\">Mozilla bug 1653371<\/a> (later assigned <a href=\"https:\/\/cve.mitre.org\/cgi-bin\/cvename.cgi?name=CVE-2020-15667\">CVE-2020-15667<\/a>). The reporter had found a segfault due to heap overflow in the library that parses <a href=\"https:\/\/wiki.mozilla.org\/Software_Update:MAR\">MAR files<\/a><sup id=\"footnote-1-back\"><a href=\"#footnote-1\">1<\/a><\/sup>, which is the custom package format that\u2019s used in the Firefox\/Thunderbird application update system. So that doesn\u2019t sound great. 
(spoiler: it isn&#8217;t as bad as it sounds because that overflow happens after the MAR file has had its signature validated already)<\/p>\n<h1><b>The Fix<\/b><\/h1>\n<p><a href=\"https:\/\/hg.mozilla.org\/mozilla-central\/rev\/b79b6cc78248\">The patch<\/a> I wrote for this bug consists entirely of changing <a href=\"https:\/\/searchfox.org\/mozilla-central\/rev\/dafb74eec8028248324018e8cd32b93808e3fd5c\/modules\/libmar\/src\/mar_read.c#29\">one function signature<\/a> in a C source file from this:<\/p>\n<pre>static int mar_insert_item(MarFile* mar, const char* name, <span style=\"color: #0060df;\">int<\/span> namelen,\r\n                           uint32_t offset, uint32_t length, uint32_t flags)<\/pre>\n<p>to this:<\/p>\n<pre>static int mar_insert_item(MarFile* mar, const char* name, <span style=\"color: #0060df;\">uint32_t<\/span> namelen,\r\n                           uint32_t offset, uint32_t length, uint32_t flags)<\/pre>\n<p>I swear that is the entire patch. All I had to do was change the type of one of this function\u2019s parameters from <code>int<\/code> to <code>uint32_t<\/code>. Can that change really fix a security bug? It can, and it did, and I\u2019ll explain how. We have some background to cover first, though.<\/p>\n<h1><b>Background<\/b><\/h1>\n<p>The problem here comes down to numbers and how computers work with them, so let\u2019s talk a bit about that first<sup id=\"footnote-2-back\"><a href=\"#footnote-2\">2<\/a><\/sup>. Since the bug is in a file written in the C language, our discussion will be from that perspective, but I am going to try to explain things so that you don\u2019t need to know C or much at all about low-level programming in order to understand what happened.<\/p>\n<h2><b>Binary Numbers<\/b><\/h2>\n<p>Any number that your computer is going to work with has to be stored in terms of binary bits. 
The way those work isn\u2019t as complicated as it might seem.<\/p>\n<p>Think about how you write a number in decimal digits using place value. If we want to write the number one thousand, three hundred, and twelve, we need four digits: 1,312. What does each one of those digits mean? Well the rightmost 2 means\u2026 2. But the 1 next to that doesn\u2019t mean 1, it means 10. You take the digit itself and multiply that by 10 to get the value that\u2019s being represented there. And then as you go through the rest of the digits, you go up by another power of 10 for each one. The 3 doesn\u2019t mean either 3 or 30, it means 300, because it\u2019s being multiplied by 100. And the leftmost 1 gets multiplied by 1000.<\/p>\n<p>Guess what? Binary numbers work the same way. The only difference is, since binary only has two different digits, 0 and 1, it doesn\u2019t make any sense to use powers of 10; there\u2019d be loads of numbers we couldn\u2019t write, anything greater than 1 but less than 10 couldn\u2019t be represented. So instead of that, we use powers of 2. Each successive digit isn\u2019t multiplied by 1, 10, 100, 1000, etc., it\u2019s multiplied by 1, 2, 4, 8, etc.<\/p>\n<p>Let\u2019s look at a couple of examples. Here\u2019s the number twelve in binary: 1100. Why? Well, let\u2019s do the same thing we did with our decimal example, multiply each digit. I\u2019ll write out the whole thing this time:<\/p>\n<pre>1100\r\n\u2502\u2502\u2502\u2514\u2500 0 x (2 ^ 0) = 0 x 1 = 0\r\n\u2502\u2502\u2514\u2500\u2500 0 x (2 ^ 1) = 0 x 2 = 0\r\n\u2502\u2514\u2500\u2500\u2500 1 x (2 ^ 2) = 1 x 4 = 4\r\n\u2514\u2500\u2500\u2500\u2500 1 x (2 ^ 3) = 1 x 8 = 8\r\n\r\n0 + 0 + 4 + 8 = 12<\/pre>\n<p>There we go! We got 12. For each digit, we multiply its value by the power of 2 for that place value location (and the multiplication is pretty darn easy, because the only digits are 0 and 1), and then add up all those results. 
That\u2019s it!<\/p>\n<h3><b>Binary Addition<\/b><\/h3>\n<p>Now, what if we need to do some math? That\u2019s pretty much all computers are any good at, after all. Let\u2019s say we want to add something to a binary number.<\/p>\n<p>Well, we know how to do that in decimal: you add up each digit starting from the lowest one and carry over into the next digit if necessary. If you read the last section, you can probably guess what I\u2019m about to say: that\u2019s exactly what you do in binary too. Except again it\u2019s even easier because there\u2019s only two different digits.<\/p>\n<p>Let\u2019s have another simple example, 13 + 12. First we have to write both of those numbers in binary; we already know 12 is 1100, so 13 should just be one more than that, 1101. We\u2019ll add them up the same way we add decimal numbers by hand:<\/p>\n<pre>  1100\r\n+ 1101\r\n------\r\n ?????\r\n\r\nThe first two digits are easy, 0 + 1 = 1, and 0 + 0 = 0.\r\n\r\n  1100\r\n+ 1101\r\n------\r\n ???01<\/pre>\n<p>But now we have 1 + 1. Where do we go with that? There\u2019s no 2. Well, just like in decimal, we have to carry out of that digit; the sum of 1 and 1 in binary is 10 (because that\u2019s just binary for 2), so that means we need to write a 0 in that column and carry the 1.<\/p>\n<pre>  1\r\n  1100\r\n+ 1101\r\n------\r\n ??001<\/pre>\n<p>Only one digit to go. Again, it\u2019s 1 + 1, but now we have a 1 carried over from the previous digit. So really we have to do 1 + 1 + 1, which is 3 but in binary that\u2019s 11. This is the last column now, so we don\u2019t have to worry about carries anymore, we can just write that down:<\/p>\n<pre>  1\r\n  1100\r\n+ 1101\r\n------\r\n 11001<\/pre>\n<p>And we\u2019re done! 1100 + 1101 = 11001. 
And to prove we got the right answer, let\u2019s convert 11001 back to decimal, the same way we did before:<\/p>\n<pre>11001\r\n\u2502\u2502\u2502\u2502\u2514 1 x (2 ^ 0) = 1 x\u00a0 1 =\u00a0 1\r\n\u2502\u2502\u2502\u2514\u2500 0 x (2 ^ 1) = 0 x\u00a0 2 =\u00a0 0\r\n\u2502\u2502\u2514\u2500\u2500 0 x (2 ^ 2) = 0 x\u00a0 4 =\u00a0 0\r\n\u2502\u2514\u2500\u2500\u2500 1 x (2 ^ 3) = 1 x\u00a0 8 =\u00a0 8\r\n\u2514\u2500\u2500\u2500\u2500 1 x (2 ^ 4) = 1 x 16 = 16\r\n\r\n1 + 0 + 0 + 8 + 16 = 25<\/pre>\n<p>So now we <i>know<\/i> we were right; 12 + 13 = 25, and 1100 + 1101 = 11001. That\u2019s how you add numbers in binary.<\/p>\n<h3><b>Signed Integers and Two\u2019s Complement<\/b><\/h3>\n<p>So far we\u2019ve only talked about positive numbers, but that\u2019s not all computers can handle; sometimes you also need negative numbers. But you don\u2019t want <i>every<\/i> number to potentially be negative; a lot of the kinds of things that you need to keep track of in a program just cannot possibly be negative, and sometimes (as we\u2019ll see) allowing certain things to be negative can be actively harmful.<\/p>\n<p>So, computers (and many languages, including C) provide two different kinds of integers that the programmer can select between whenever they need an integer: \u201csigned\u201d or \u201cunsigned\u201d. \u201cSigned\u201d means that the number can be either negative or positive (or zero), and \u201cunsigned\u201d means it can only be positive (or zero)<sup id=\"footnote-3-back\"><a href=\"#footnote-3\">3<\/a><\/sup>.<\/p>\n<p>What we\u2019ve been talking about up to now are unsigned integers, so how do signed integers work? To start with, the first bit of the number isn\u2019t part of the number itself anymore, it\u2019s now the \u201csign bit\u201d. If the sign bit is 0, the number is nonnegative (either zero or positive), and if the sign bit is 1, the number is negative. 
But, when the sign bit is 1, we need a couple of extra steps to convert between binary and decimal. Here\u2019s the procedure.<\/p>\n<ol>\n<li aria-level=\"1\">Discard the sign bit before doing anything else.<\/li>\n<li aria-level=\"1\">Invert all the other bits in the number, meaning make every 1 a 0 and vice versa.<\/li>\n<li aria-level=\"1\">Convert that binary number (the one with the bits flipped) to decimal the usual way.<\/li>\n<li aria-level=\"1\">Add 1 to that result.<\/li>\n<\/ol>\n<p>This operation, with the inversion and the adding 1, is called <a href=\"https:\/\/en.wikipedia.org\/wiki\/Two%27s_complement\">\u201ctwo\u2019s complement\u201d<\/a>, and it\u2019ll get you the value of the negative number. Let\u2019s go through another simple example.<\/p>\n<p>Let\u2019s say we have a signed 8-bit integer and the value is 11010110. What is that in decimal? Well, we see right away that the sign bit is set, so we need to take the two\u2019s complement. First, we need to flip all the bits except the sign bit, so that gets us 0101001. Now we convert that to decimal and add 1.<\/p>\n<pre>0101001\r\n\u2502\u2502\u2502\u2502\u2502\u2502\u2514 1 x (2 ^ 0) = 1 x\u00a0 1 =\u00a0 1\r\n\u2502\u2502\u2502\u2502\u2502\u2514\u2500 0 x (2 ^ 1) = 0 x\u00a0 2 =\u00a0 0\r\n\u2502\u2502\u2502\u2502\u2514\u2500\u2500 0 x (2 ^ 2) = 0 x\u00a0 4 =\u00a0 0\r\n\u2502\u2502\u2502\u2514\u2500\u2500\u2500 1 x (2 ^ 3) = 1 x\u00a0 8 =\u00a0 8\r\n\u2502\u2502\u2514\u2500\u2500\u2500\u2500 0 x (2 ^ 4) = 0 x 16 =\u00a0 0\r\n\u2502\u2514\u2500\u2500\u2500\u2500\u2500 1 x (2 ^ 5) = 1 x 32 = 32\r\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500 0 x (2 ^ 6) = 0 x 64 =\u00a0 0\r\n\r\n1 + 0 + 0 + 8 + 0 + 32 + 0 = 41\r\n\r\n41 + 1 = 42<\/pre>\n<p>Now just remember to add back the negative sign, and we get -42. That\u2019s our number! 11010110 interpreted as a signed integer is -42.<\/p>\n<h4><i>Why?<\/i><\/h4>\n<p>Why do we bother with any of this? 
Why not do something simple like have the sign bit and then just the regular number<sup id=\"footnote-4-back\"><a href=\"#footnote-4\">4<\/a><\/sup>? Well, the two\u2019s complement representation has one huge advantage: you can completely disregard it while doing basic arithmetic. The exact same hardware and logic can do arithmetic on both unsigned numbers and signed two\u2019s complement numbers<sup id=\"footnote-5-back\"><a href=\"#footnote-5\">5<\/a><\/sup>. That means the hardware is simpler, which means it\u2019s smaller, cheaper, and faster. That mattered more in the early days of digital computers, which is why two\u2019s complement caught on as the standard, and it\u2019s still with us today.<\/p>\n<h4><i>Sign Extension<\/i><\/h4>\n<p>There\u2019s one other neat trick two\u2019s complement lets us do that we need to talk about. Integers in computers have a fixed \u201cwidth\u201d, or number of bits that are used to represent them. Wider integers can represent larger (or more negative) numbers, but take up more space in the computer\u2019s memory. So to balance those concerns, languages like C give the programmer access to a few different bit widths to choose from for their integers.<\/p>\n<p>So, what happens if we need to do some arithmetic between integers of different widths, or just pass an integer into a function that expects a wider one? We need a way to make an integer wider. If it\u2019s unsigned, that\u2019s easy; copy over the same value into the lower (right-hand) bits and then fill in the new high bits with 0\u2019s, and you\u2019ll have the same value, just now with more bits.<\/p>\n<p>But what if we need to widen a signed integer? Two\u2019s complement\u2019s here to save the day with a solution called <a href=\"https:\/\/en.wikipedia.org\/wiki\/Sign_extension\">\u201csign extension\u201d<\/a>. 
It turns out all we have to do to make a two\u2019s complement integer wider is copy over the same value into the low bits and then fill in the new high bits with copies of the sign bit. That\u2019s it.<\/p>\n<p>It\u2019s easy to see why that\u2019s correct if we think about how two\u2019s complement works. If the number is positive (the sign bit is 0), then it\u2019s the same as for an unsigned number, we\u2019ll fill in the new space with all zeroes and nothing changes. And if the number is negative (the sign bit is 1), then we\u2019ll fill in the new space with 1 bits, but the two\u2019s complement operation means those bits all get inverted into 0\u2019s when we need to get the number\u2019s value, so <i>still<\/i> nothing changes. These simple, efficient operations are why two\u2019s complement is so neat, despite seeming weird and overcomplicated at first.<\/p>\n<h3><b>Hexadecimal Numbers<\/b><\/h3>\n<p>I\u2019m going to use a few hexadecimal numbers in this article, but don\u2019t worry, I\u2019m not going to try to teach you how to work in a whole different number system yet again. You can think of hexadecimal as a shorthand for binary numbers. Hexadecimal (\u201chex\u201d for short) uses the decimal digits 0-9 and also the letters A-F, for 16 possible digits total. Since each digit can have 16 values, each one can stand in for four binary digits.<\/p>\n<p>Also, hex numbers in C and elsewhere are written starting with 0x. That\u2019s not part of the number, it\u2019s just telling you that the thing after it is written in hex so that you know how to read it.<\/p>\n<p>You don\u2019t need to know how to do any arithmetic directly on hex numbers or anything like that, just see how they convert to binary bits. 
Here\u2019s the conversions of individual hex digits to binary bits:<\/p>\n<pre>Binary\u00a0 Hex\r\n======\u00a0 ===\r\n 0000\u00a0 \u00a0 0\r\n 0001\u00a0 \u00a0 1\r\n 0010\u00a0 \u00a0 2\r\n 0011\u00a0 \u00a0 3\r\n 0100\u00a0 \u00a0 4\r\n 0101\u00a0 \u00a0 5\r\n 0110\u00a0 \u00a0 6\r\n 0111\u00a0 \u00a0 7\r\n 1000\u00a0 \u00a0 8\r\n 1001\u00a0 \u00a0 9\r\n 1010\u00a0 \u00a0 A\r\n 1011\u00a0 \u00a0 B\r\n 1100\u00a0 \u00a0 C\r\n 1101\u00a0 \u00a0 D\r\n 1110\u00a0 \u00a0 E\r\n 1111\u00a0 \u00a0 F<\/pre>\n<h2><b>Implicit Conversions in C<\/b><\/h2>\n<p>In C, unlike some languages, there are a bunch of different types that represent different ways of storing numbers; basically, every kind and size of number that CPU\u2019s can work with has its own type in C. There\u2019s also a \u201cdefault\u201d integer type, which is called int. How many bits are in an int depends on the C compiler you\u2019re using (and on its settings)<sup id=\"footnote-6-back\"><a href=\"#footnote-6\">6<\/a><\/sup>, but it is guaranteed by the language standard to be signed.<\/p>\n<p>Since C has so many different kinds of numbers, it\u2019s common to need to convert between them. It\u2019s so common in fact that the language designers decided to make those conversions mostly automatic. That means that, for instance, this code compiles and runs as you\u2019d probably expect:<\/p>\n<pre><code>#include &lt;math.h&gt; <i>\/\/ to get the declaration for sqrt()<\/i>\r\n\r\nlong long geometric_mean(int a, int b) {\r\n<b>  return<\/b> sqrt(a * b);\r\n}\r\n\r\nint main() {\r\n  int a = 42;\r\n  long b = 13;\r\n  double mean = geometric_mean(a, b);\r\n<b>  return<\/b> mean;\r\n}<\/code><\/pre>\n<p>Even though none of the types in that code match up at all, the compiler just makes everything work for us. Nice of it, eh? 
These automatic \u201cfixes\u201d are called implicit conversions, and <a href=\"https:\/\/en.cppreference.com\/w\/c\/language\/conversion\">the rules for how they work<\/a> are long and not always very intuitive. This is a pretty major gotcha of C programming, because it happens without the programmer even seeing it, you just have to know these things are happening and realize all the implications.<\/p>\n<h1><b>How the Bug Works<\/b><\/h1>\n<p>That should be all the background we need to understand what went wrong here. Now, let\u2019s have another look back at that original, unpatched function declaration:<\/p>\n<pre><code>static int mar_insert_item(MarFile* mar, const char* name, int namelen,\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 uint32_t offset, uint32_t length, uint32_t flags)<\/code><\/pre>\n<p>The first two parameters are an internal data structure and a text string, they aren\u2019t relevant here. But after that we see an int parameter, which is meant to contain the length of the string parameter (in C, strings don\u2019t know their own length, the programmer has to keep track of that if they need it).<\/p>\n<p>A few lines into the mar_insert_item function, we find this call:<\/p>\n<pre><code>memcpy(item-&gt;name, name, namelen + 1);<\/code><\/pre>\n<p>I\u2019ll explain what this line is for before we move on. The mar_insert_item function is part of a procedure that reads the index of all the files contained in the MAR package (it\u2019s kind of like a ZIP file, it can contain a bunch of different files and compress them all, and you can extract the whole thing or just individual files). mar_insert_item is called repeatedly, once for each compressed file, and each call adds one entry to the index that\u2019s being gradually built up. 
This specific line just copies the file\u2019s name into that index entry; memcpy of course is short for \u201cmemory copy\u201d, and its parameters are the destination to copy to (which is the name field of the item we\u2019re adding to our index), the source to copy from (the name string that was passed into mar_insert_item in the first place), and the amount of memory that needs to be copied, in bytes. That last parameter is where everything goes wrong.<\/p>\n<p>What do you think would happen if mar_insert_item is called with namelen set to the highest positive value a 32-bit int can store, which is 0x7fffffff? Well then, in this one line of code, the program does all of these things:<\/p>\n<ol>\n<li aria-level=\"1\">A 1 gets added to <code>namelen<\/code><sup id=\"footnote-7-back\"><a href=\"#footnote-7\">7<\/a><\/sup>. But I just said <code>namelen<\/code> already has the highest positive value it can store, so something has to give. The C language standard doesn\u2019t define what happens in this case, but in practice what you get on most computers is\u2026 the addition just happens anyway. So we get the value 0x80000000. But <code>namelen<\/code> is a signed integer, and that value has its sign bit set! We\u2019ve added 1 to a positive number and it transformed into a negative number. -2,147,483,648 to be precise<sup id=\"footnote-8-back\"><a href=\"#footnote-8\">8<\/a><\/sup>. Computers are weird. And we\u2019re not even done yet.<\/li>\n<li aria-level=\"1\">On a 64-bit system, <code>memcpy<\/code> takes its length as a 64-bit value, so our temporary value has to get extended from 32 bits to 64. That means a sign extension; we take the most significant bit, which is a 1, and copy it into 32 new bits, getting us the value 0xFFFFFFFF80000000. 
Remember, sign extension preserves the two\u2019s complement value, so the decimal version of that number is still -2,147,483,648; it didn\u2019t change during this step.<\/li>\n<li aria-level=\"1\">The length parameter that <code>memcpy<\/code> takes is also supposed to be unsigned, so now that the value has been extended to 64 bits, we take those bits and interpret them as an unsigned number. We no longer have -2,147,483,648, we now have positive 18,446,744,071,562,067,968. As a byte length, that\u2019s about 18 million terabytes<sup id=\"footnote-9-back\"><a href=\"#footnote-9\">9<\/a><\/sup>. Fair to say that\u2019s more bytes than we could have really meant to be copying here.<\/li>\n<li aria-level=\"1\">Finally, <code>memcpy<\/code> is called, and it starts trying to copy from <code>name<\/code> into <code>item-&gt;name<\/code>. But because of that sign extension and unsigned reinterpretation, we can see that it\u2019s going to try to copy waaaaay more bytes than are actually there. So what <code>memcpy<\/code> ends up doing is copying all the bytes that are there (<code>memcpy<\/code> does its best for us even when we feed it junk), and then\u2026 crashing the program.<\/li>\n<\/ol>\n<p>And that\u2019s the bug; the updater crashes right here.<\/p>\n<h2><b>How the Fix Works<\/b><\/h2>\n<p>Now, with all that background, the fix makes perfect sense. Changing the parameter\u2019s type means that the conversion to unsigned happens at the time <code>mar_insert_item<\/code> is called, and at that point the value being passed in is still a positive number, so converting it then is harmless (in fact it\u2019s just nothing, that operation doesn\u2019t do anything at all at that point). And then the + 1 is done to an unsigned number, so it\u2019s harmless too, and there\u2019s no sign extension to ever do because the thing being passed to <code>memcpy<\/code> is no longer signed. 
Everything gets a lot simpler to understand, and simultaneously more correct.<\/p>\n<h1><b>Takeaways<\/b><\/h1>\n<h2><b>Don\u2019t Use C<\/b><\/h2>\n<p>Implicit conversions are a misfeature. What they give you in convenience is more than erased by the potential for invisible bugs. More recently designed languages tend to be more strict about this sort of thing, Rust for instance just <a href=\"https:\/\/doc.rust-lang.org\/rust-by-example\/types\/cast.html\">doesn\u2019t have these kinds of implicit conversions at all<\/a>, but C is from the 1970\u2019s and It Made Sense At The Time\u2122. But in C these things can\u2019t really be avoided, they\u2019re baked into the language. I\u2019d very much recommend using another language for any new programs you work on, for this and a variety of other reasons<sup id=\"footnote-10-back\"><a href=\"#footnote-10\">10<\/a><\/sup>.<\/p>\n<h2><b>Layers of Security<\/b><\/h2>\n<p>This bug wasn\u2019t exploitable in practice, partly because it\u2019s just in an awkward place to exploit, but also because Firefox requires update files to be digitally signed by Mozilla or they won\u2019t be read (beyond the minimum needed to check the signature), much less applied. That means that anybody wanting to attack Firefox users via this bug would also have to compromise Mozilla\u2019s build infrastructure and use it to sign their own malicious MAR file. Having that additional layer of security makes most issues surrounding MAR files much much less concerning.<\/p>\n<h2><b>You Can Do Systems Programming<\/b><\/h2>\n<p>Something I\u2019ve hoped to get across (and I acknowledge this may not be the ideal topic to make this point, but it\u2019s an important point to me) is that low-level (\u201csystems\u201d) programming isn\u2019t magic or really special in any way. 
It\u2019s true there\u2019s a lot going on and there\u2019s lots of little details, but that\u2019s true for any kind of programming, or anything else involving computers at all to be honest. Everything involved here was invented and built by people, and it can all be broken down and understood. And that\u2019s the message I want to sign off with: you can do systems programming. It\u2019s not too hard. It\u2019s not too complicated. It\u2019s not limited to just \u201cexperts\u201d. You are smart and capable and you can do the thing.<\/p>\n<hr \/>\n<footer id=\"footer\">\n<p><small id=\"footnote-1\">1. A fair question to ask here would be why we even have our own package format. There\u2019s a few reasons and you can read the original discussion from back when the format was first introduced if you\u2019re interested, but the main benefit nowadays is that we\u2019re able to locate and validate the package\u2019s signature before really having to parse anything else. In fact, the bug that this post is about doesn\u2019t get hit until after the MAR file has passed signature validation, so it could only be exploited using either a properly signed MAR or a custom build of Firefox\/Thunderbird\/whatever other application that disables MAR signing.\u00a0<a href=\"#footnote-1-back\">\u21a9\ufe0e<\/a><\/small><\/p>\n<p><small id=\"footnote-2\">2. I\u2019m only going to talk about integers, because numbers that have a decimal or fraction part work very differently (and can be implemented a few different ways), and they aren\u2019t relevant here. <a href=\"#footnote-2-back\">\u21a9\ufe0e<\/a><\/small><\/p>\n<p><small id=\"footnote-3\">3. You almost never need numbers that can only be either negative or zero, so neither hardware nor languages generally support those, you\u2019d just have to use a signed integer in that case. <a href=\"#footnote-3-back\">\u21a9\ufe0e<\/a><\/small><\/p>\n<p><small id=\"footnote-4\">4. 
That is a real thing called <a href=\"https:\/\/en.wikipedia.org\/wiki\/Signed_number_representations#Signed_magnitude_representation\">signed magnitude<\/a> and it is used for certain things, but not standard integers in modern computers. <a href=\"#footnote-4-back\">\u21a9\ufe0e<\/a><\/small><\/p>\n<p><small id=\"footnote-5\">5. If you\u2019re curious about the math that explains why this is the case, I\u2019ll direct you to <a href=\"https:\/\/en.wikipedia.org\/wiki\/Two's_complement#Why_it_works\">Wikipedia\u2019s proof<\/a>; I\u2019ve spent enough time in the weeds for one blog post already. <a href=\"#footnote-5-back\">\u21a9\ufe0e<\/a><\/small><\/p>\n<p><small id=\"footnote-6\">6. Theoretically int is meant to be whatever size the computer hardware you\u2019re compiling your program for finds most convenient to work with (its \u201cword size\u201d), so it would be 32 bits on a 32-bit CPU and 64 bits on a 64-bit CPU. In practice though, for backwards compatibility reasons, int is usually 32 bits on all but pretty specialized hardware. It\u2019s best never to depend on int being any particular size and to use the type that specifically represents a particular size if you need to be sure; for instance if you know you need exactly 32 bits, use int32_t, not int. <a href=\"#footnote-6-back\">\u21a9\ufe0e<\/a><\/small><\/p>\n<p><small id=\"footnote-7\">7. If you don\u2019t know C, you might be wondering what the + 1 is even for. It\u2019s a little out of scope for this post, but in short, since as we mentioned earlier C strings don\u2019t keep track of their length, if you don\u2019t store that length off somewhere (and typically you don\u2019t), you need some other way to find where the string ends. That\u2019s done by adding one character made up of all zero bits to the end of the string, called a \u201cnull terminator\u201d, so when you\u2019re reading a string and you encounter a null character, then you know the string is over. 
Most C coding conventions have you leave the terminator out of the length, so whenever you\u2019re doing something that needs to account for the terminator (like copying the string, because then you have to copy the terminator also), you have to add 1 to the length so that you have space for it. C programming is full of fiddly details like this.\u00a0<a href=\"#footnote-7-back\">\u21a9\ufe0e<\/a><\/small><\/p>\n<p><small id=\"footnote-8\">8. This problem shows up so often and is such a common source of security bugs that it gets its own name, <a href=\"https:\/\/en.wikipedia.org\/wiki\/Integer_overflow\">integer overflow<\/a>. On Wikipedia you&#8217;ll find lots of famous examples and different ways to combat the issue.\u00a0<a href=\"#footnote-8-back\">\u21a9\ufe0e<\/a><\/small><\/p>\n<p><small id=\"footnote-9\">9. About 18.4 exabytes, or just under 16 exbibytes. I swear those are real units.\u00a0<a href=\"#footnote-9-back\">\u21a9\ufe0e<\/a><\/small><\/p>\n<p><small id=\"footnote-10\">10. Yes, I acknowledge there are certain circumstances where you really must write things in C, or maybe C++ if you\u2019re lucky. If you have one of those situations, then you already know whatever I could tell you. If you don\u2019t, then don\u2019t use C. 
And don\u2019t @ me.\u00a0<a href=\"#footnote-10-back\">\u21a9\ufe0e<\/a><\/small><\/p>\n<\/footer>\n<style>html { scroll-padding-top: 2.2em; }<\/style>\n","protected":false},"excerpt":{"rendered":"<p>&nbsp; Or: The C Language Itself is a Security Risk, Exhibit #958,738 This post is aimed at people who are developers but who do not know C or low-level details &hellip; <a class=\"go\" href=\"https:\/\/blog.mozilla.org\/attack-and-defense\/2021\/09\/29\/fixing-a-security-bug-by-changing-a-function-signature\/\">Read more<\/a><\/p>\n","protected":false},"author":1871,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[448815,449530],"tags":[],"coauthors":[465194],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/attack-and-defense\/wp-json\/wp\/v2\/posts\/246"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/attack-and-defense\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/attack-and-defense\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/attack-and-defense\/wp-json\/wp\/v2\/users\/1871"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/attack-and-defense\/wp-json\/wp\/v2\/comments?post=246"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/attack-and-defense\/wp-json\/wp\/v2\/posts\/246\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/attack-and-defense\/wp-json\/wp\/v2\/media?parent=246"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/attack-and-defense\/wp-json\/wp\/v2\/categories?post=246"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/attack-and-defense\/wp-json\/wp\/v2\/tags?post=246"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/blog.mozilla.org\/attack-and-defense\/wp-json\/wp\/v2\/coauthors?post=246"}],"curies":[{"name":"wp","href":"https
:\/\/api.w.org\/{rel}","templated":true}]}}