{"id":162,"date":"2013-04-11T16:32:09","date_gmt":"2013-04-11T16:32:09","guid":{"rendered":"http:\/\/blog.mozilla.org\/nfroyd\/?p=162"},"modified":"2013-04-11T16:32:09","modified_gmt":"2013-04-11T16:32:09","slug":"introducing-mozillaendian-h","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/nfroyd\/2013\/04\/11\/introducing-mozillaendian-h\/","title":{"rendered":"introducing mozilla\/Endian.h"},"content":{"rendered":"<p>In the continuing effort <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=796941\">to eliminate usage of <tt>prtypes.h<\/tt> from the tree<\/a>, a significant obstacle is the usage of <code>IS_LITTLE_ENDIAN<\/code> and <code>IS_BIG_ENDIAN<\/code> in various places.\u00a0 These macros are defined if the target platform is little-endian or big-endian, respectively.\u00a0 (All of our tier-1 platforms are little-endian platforms.\u00a0 Pay no attention to the big-endian ARM variants; there be dragons.)\u00a0 If you search for these identifiers, you&#8217;ll find that their uses fall into three broad categories:<\/p>\n<ol>\n<li>Defining byte-swapping macros.\u00a0 Various amounts of attention are paid to using compiler intrinsics for the byte swaps.<\/li>\n<li>Conditionally compiling byte-swapping functionality.<\/li>\n<li>Taking slightly different actions or using different data according to the endianness of the target.<\/li>\n<\/ol>\n<p>Point 1 is bad because we&#8217;re not always using the most efficient code possible to perform the swap.\u00a0 Using functions would be preferable to gain the benefit of type checking and defined argument evaluation.\u00a0 Depending on where you looked in the tree, sometimes the argument was modified in-place and sometimes it was returned as a value, so consistency suffers.\u00a0 And IMHO, stumbling upon:<\/p>\n<pre>SWAP(value);<\/pre>\n<p>in code is not that informative.\u00a0 Am I swapping to little-endian or from big-endian or something else?\u00a0 More explicit names would be 
good.\u00a0 Point 2 is bad because <code>#ifdef<\/code>-ery clutters the code and we may not be compiling the <code>#ifdef<\/code>&#8216;d code all the time, which may lead to bitrot.<\/p>\n<p><tt>mfbt\/Endian.h<\/tt>, which landed last week in <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=798172\">bug 798172<\/a>, is a significant step towards addressing the first two issues above.\u00a0 <tt>Endian.h<\/tt> provides faster, clearer functions for byte-swapping functionality and also enables the byte-swapping to be compiled away depending on the target platform.\u00a0 While it doesn&#8217;t address point 3 directly, it does provide <code>MOZ_LITTLE_ENDIAN<\/code> and <code>MOZ_BIG_ENDIAN<\/code> macros as an alternative to <code>IS_LITTLE_ENDIAN<\/code> and <code>IS_BIG_ENDIAN<\/code>.\u00a0 Since <code>MOZ_LITTLE_ENDIAN<\/code> and <code>MOZ_BIG_ENDIAN<\/code> are always defined, <tt>Endian.h<\/tt> means that previously <code>#ifdef<\/code>&#8216;d code can now be written (where possible) as straight C++ code, making things more readable.\u00a0 And <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=807879#c5\">there are ideas for how to address point 3<\/a> more directly.<\/p>\n<p>Enough talk; what about the bits and bytes?<\/p>\n<p>As previously mentioned, <tt>Endian.h<\/tt> <code>#define<\/code>s <code>MOZ_LITTLE_ENDIAN<\/code> and <code>MOZ_BIG_ENDIAN<\/code>.\u00a0 <code>MOZ_LITTLE_ENDIAN<\/code> is equal to 1 if we&#8217;re targeting a little-endian platform and 0 otherwise.\u00a0 Likewise, <code>MOZ_BIG_ENDIAN<\/code> is equal to 1 if we&#8217;re targeting a big-endian platform and 0 otherwise.\u00a0 The intent is these are legacy macros.\u00a0 You shouldn&#8217;t have to use them in newly written Mozilla code, though they may come in handy for interfacing with external libraries that need endianness information.<\/p>\n<p>The next major piece of functionality is a family of functions that read 16-, 32-, or 64-bit signed or unsigned 
quantities in a given endianness.\u00a0 The intent here is to replace code written like:<\/p>\n<pre>v1 = SWAP(*(uint32_t*)pointer);\r\n*(int64_t*)other_pointer = SWAP(v2);<\/pre>\n<p>only with clearer code that&#8217;s free of aliasing and (mis-)alignment issues:<\/p>\n<pre>v1 = mozilla::BigEndian::readUint32(pointer);\r\nmozilla::BigEndian::writeInt64(other_pointer, v2);<\/pre>\n<p>The other read and write functions are named similarly. And of course there&#8217;s <code>mozilla::LittleEndian::readUint32<\/code> and so forth as well. As a concession to readability (no code uses this yet, so we&#8217;re not sure how useful it is), there&#8217;s also <code>mozilla::NetworkOrder<\/code> which functions exactly the same as <code>mozilla::BigEndian<\/code>.<\/p>\n<p>In an ideal world, <a href=\"http:\/\/commandcenter.blogspot.com\/2012\/04\/byte-order-fallacy.html\">those are all the functions that you&#8217;d need<\/a>.\u00a0 But looking through the code that needed to do byte-swapping, it often seemed that some sort of swap primitive was more convenient than reading or writing in defined endiannesses.\u00a0 Who knows?\u00a0 Maybe when the whole tree has been converted over to <tt>Endian.h<\/tt>, we&#8217;ll find that the swap primitives are completely unnecessary and eliminate them. 
However, in a partially-converted and not-quite-so-ideal world, we have byte swaps all over.<\/p>\n<p>Accordingly, the last major piece of functionality deals with byte swap primitives.\u00a0 But these swap primitives specify the direction in which you&#8217;re swapping, so as to make the code more self-documenting.\u00a0 For instance, maybe you had:<\/p>\n<pre>struct Header {\r\n  uint32_t magic;\r\n  uint32_t total_length;\r\n  uint64_t checksum;\r\n} header;\r\nfread(&amp;header, sizeof(Header), 1, file);\r\nheader.magic = SWAP(header.magic);\r\nheader.total_length = SWAP(header.total_length);\r\nheader.checksum = SWAP64(header.checksum);<\/pre>\n<p>Assuming that the header was stored in little-endian order, you&#8217;d use <tt>Endian.h<\/tt> functions thusly:<\/p>\n<pre>struct Header {\r\n  uint32_t magic;\r\n  uint32_t total_length;\r\n  uint64_t checksum;\r\n} header;\r\nfread(&amp;header, sizeof(Header), 1, file);\r\nheader.magic = mozilla::NativeEndian::swapFromLittleEndian(header.magic);\r\nheader.total_length = mozilla::NativeEndian::swapFromLittleEndian(header.total_length);\r\nheader.checksum = mozilla::NativeEndian::swapFromLittleEndian(header.checksum);<\/pre>\n<p>You could write this using <code>LittleEndian::readUint{32,64}<\/code>. But it&#8217;s a little more straightforward to write it with swaps instead. In a similar fashion, there&#8217;s <code>NativeEndian::swapToLittleEndian<\/code>.<\/p>\n<p>You can replace <tt>LittleEndian<\/tt> with <tt>BigEndian<\/tt> or <tt>NetworkOrder<\/tt> in these single-element swap functions and all the functions below with the obvious change to the behavior.<\/p>\n<p>Single-element swaps solve a lot of problems. 
But the following coding pattern was semi-common:<\/p>\n<pre>void* pointer2;\r\n...\r\nmemcpy(pointer1, pointer2, n_elements * sizeof(*pointer1));\r\n#if defined(IS_BIG_ENDIAN)\r\nfor (size_t i = 0; i &lt; n_elements; i++) {\r\n  pointer1[i] = SWAP(pointer1[i]);\r\n}\r\n#endif<\/pre>\n<p>Again, this could be written with <code>LittleEndian::readUint32<\/code> or similar. But that loses the benefits of <code>memcpy<\/code> on little-endian platforms (which are the common case for us). Depending on the type of <code>pointer2<\/code>, there might be some ugly casting and pointer arithmetic involved too. So <tt>Endian.h<\/tt> also includes &#8220;bulk swap&#8221; primitives:<\/p>\n<pre>mozilla::NativeEndian::copyAndSwapFromLittleEndian(pointer1, pointer2, n_elements);<\/pre>\n<p>which will do a straight <code>memcpy<\/code> on a little-endian platform and whatever copying + swapping is necessary on a big-endian platform. As you might expect by now, there&#8217;s also <code>NativeEndian::copyAndSwapToLittleEndian<\/code>. And since the related but slightly different:<\/p>\n<pre>uint32_t* pointer = new uint32_t[length];\r\n...\r\nfread(pointer, sizeof(*pointer), length, file);\r\n#if defined(IS_BIG_ENDIAN)\r\nfor (size_t i = 0; i &lt; length; ++i) {\r\n  pointer[i] = SWAP(pointer[i]);\r\n}\r\n#endif<\/pre>\n<p>was also semi-common, the functions <code>NativeEndian::swapFromLittleEndianInPlace<\/code> and <code>NativeEndian::swapToLittleEndianInPlace<\/code> were also provided:<\/p>\n<pre>uint32_t* pointer = new uint32_t[length];\r\n...\r\nfread(pointer, sizeof(*pointer), length, file);\r\nmozilla::NativeEndian::swapFromLittleEndianInPlace(pointer, length);<\/pre>\n<p>All the <code>NativeEndian<\/code> functions are actually templates, so they&#8217;ll work with 16-, 32-, or 64-bit signed or unsigned variables. 
They&#8217;ll also byteswap things like <code>wchar_t<\/code> and <code>PRUnichar<\/code>, though compilation will fail if you attempt to byteswap non-integer things like <code>double<\/code>s or pointers.<\/p>\n<p>Let the converting begin!\u00a0 Makoto Kato has already begun <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=857957\">by eliminating <code>NS_SWAP{16,32,64}<\/code> and replacing them with their <tt>Endian.h<\/tt> equivalents<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the continuing effort to eliminate usage of prtypes.h from the tree, a significant obstacle is the usage of IS_LITTLE_ENDIAN and IS_BIG_ENDIAN in various places.\u00a0 These macros are defined if the target platform is little-endian or big-endian, respectively.\u00a0 (All of our tier-1 platforms are little-endian platforms.\u00a0 Pay no attention to the big-endian ARM variants; there [&hellip;]<\/p>\n","protected":false},"author":320,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/posts\/162"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/users\/320"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/comments?post=162"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/posts\/162\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/media?parent=162"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/categories?post=162"},{"taxonomy":"post_tag","embeddable":true,"href":"
https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/tags?post=162"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}