Preventing data leaks by stripping path information in HTTP Referrers

To help prevent third party data leakage while browsing privately, Firefox Private Browsing Mode will remove path information from referrers sent to third parties starting in Firefox 59.

Referrers can leak sensitive data

An example of personal health data being sent to third parties from healthcare.gov. Source: EFF

When you click a link in your browser to navigate to a new site, the new site you visit receives the exact address of the site you came from through the so-called “Referrer value”. For example, if you came to this Mozilla Security Blog from reddit.com, the browser would send blog.mozilla.org this:

Referer: https://www.reddit.com/r/privacy/comments/Preventing_data_leaks_by_stripping_path_information_in_HTTP_Referrers/

This leaks user data, telling websites the exact page you were looking at when you clicked the link. To make things worse, browsers also send a referrer value when requesting sub-resources, like ads or other social media snippets integrated in a modern web site. In other words, embedded content also knows exactly what page you are visiting.

Most sites log this data for operational and statistical purposes. Many sites also log this data to collect as much information about their users as possible.  They can then use that data for a variety of purposes, or even sell that data – e.g., for re-targeting.

While the data above may not be a problem, consider this example:

Referer: https://www.healthcare.gov/see-plans/85601/results/?county=04019&age=40&smoker=1&pregnant=1&zip=85601&state=AZ&income=35000

EFF researchers discovered this leak of personal health data from healthcare.gov to DoubleClick. As indicated, the referrer in this case leaks information about your age, your zip code, whether you are a smoker or not, and potentially even your income. Other companies (link1, link2) have disclosed similar vulnerabilities and leaks.

Private Browsing will strip paths in HTTP referrers

Screenshot: Firefox Private Browsing window

To prevent this type of data leakage when Firefox users are browsing privately, we are changing the way Firefox sends referrers in Private Browsing Mode.

Starting with Firefox 59, Private Browsing will remove path information from referrer values sent to third parties (technically, applying a Referrer Policy of strict-origin-when-cross-origin).

In the previous examples, this setting would remove the path and query string data from the referrer values so that they are stripped down to:

Referer: https://www.reddit.com/

and

Referer: https://www.healthcare.gov/

This change prevents site authors from accidentally leaking user data to third parties when their users choose Private Browsing Mode.  We made this change only after first ensuring that this would have minimal to no effect on web usability.

Other ways of controlling referrers

Vendors and authors continue to propose changes to Referrers to improve web privacy, security, and functionality.

In 2014, the W3C Web Application Security Working Group started its Referrer Policy Recommendation. This policy lets vendors and authors control referrer values. For example, it defines a secure-by-default no-referrer-when-downgrade policy for user agents, which does not send referrers to HTTP resources from an HTTPS page. In both Regular and Private Browsing Mode, if a site specifically sets a more restrictive or more liberal Referrer Policy than the browser default, Firefox will honor the website’s request, since the site author is intentionally changing the value.
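Site authors have several mechanisms for setting such a policy, including the Referrer-Policy response header and a per-request option in the Fetch API. A minimal sketch of the latter (the third-party URL is a hypothetical example):

    // Fetch a cross-origin resource while sending, at most, the page's
    // origin as the referrer.
    async function loadWidget(): Promise<string> {
      const response = await fetch("https://third-party.example/widget.js", {
        referrerPolicy: "strict-origin-when-cross-origin",
      });
      return response.text();
    }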

Users can also change their default referrer options in Firefox. These override both the browser’s default Referrer Policy and the site author’s Referrer Policy, putting the user’s choice first.


January 2018 CA Communication

Mozilla has sent a CA Communication to inform Certificate Authorities (CAs) who have root certificates included in Mozilla’s program about current events related to domain validation for SSL certificates and to remind them of a number of upcoming deadlines. This CA Communication has been emailed to the Primary Point of Contact (POC) and an email alias for each CA in Mozilla’s program, and they have been asked to respond to the following 6 action items:

  1. Disclose Use of Baseline Requirements Methods 3.2.2.4.9 or 3.2.2.4.10 for Domain Validation – Recently discovered vulnerabilities in these methods of domain validation have led Mozilla to require CAs to disclose their use of these methods and to describe how they have mitigated these vulnerabilities.
  2. Disclose Use of Methods 3.2.2.4.1 or 3.2.2.4.5 for Domain Validation – Significant concerns were recently raised about the reliability of these methods that are defined in the Baseline Requirements for the Issuance and Management of Publicly-Trusted Certificates.
  3. Disclose All Non-Technically-Constrained Subordinate CA Certificates – CAs have until April 15, 2018 to disclose all non-technically constrained subordinate CA certificates – including subordinate CA certificates that are constrained via EKU to S/MIME but do not have Name Constraints – as required by version 2.5 of Mozilla’s Root Store Policy.
  4. Complete BR Self Assessment – Mozilla has asked all CAs to complete a Baseline Requirements Self-Assessment by January 31, 2018, or by April 15, 2018 if an extension was requested.
  5. Update CP/CPS to Comply with version 2.5 of Mozilla’s Root Store Policy – In the November 2017 CA Communication, a number of CAs indicated that their CP/CPS does not yet comply with version 2.5 of the Mozilla Root Store Policy. The deadline for compliance has been extended to April 15, 2018.
  6. Reduce SSL Certificate Validity Periods to 825 Days or Less by March 1, 2018 – On March 17, 2017, in ballot 193, the CA/Browser Forum set a deadline of March 1, 2018 after which newly-issued SSL certificates must not have a validity period greater than 825 days, and the re-use of validation information must be limited to 825 days.

The full action items can be read here. Responses to the survey will be automatically and immediately published by the CCADB.

With this CA Communication, we reiterate that participation in Mozilla’s CA Certificate Program is at our sole discretion, and we will take whatever steps are necessary to keep our users safe. Nevertheless, we believe that the best approach to safeguard that security is to work with CAs as partners, to foster open and frank communication, and to be diligent in looking for ways to improve.

Mozilla Security Team

Secure Contexts Everywhere

Since Let’s Encrypt launched, secure contexts have become much more mature. We have witnessed the successful restriction of existing, as well as new, features to secure contexts. The W3C TAG is about to drastically raise the bar for shipping features on insecure contexts. All the building blocks are now in place to quicken the adoption of HTTPS and secure contexts, and to follow through on our intent to deprecate non-secure HTTP.

Requiring secure contexts for all new features

Effective immediately, all new features that are web-exposed are to be restricted to secure contexts. Web-exposed means that the feature is observable from a web page or server, whether through JavaScript, CSS, HTTP, media formats, etc. A feature can be anything from an extension of an existing IDL-defined object, a new CSS property, a new HTTP response header, to bigger features such as WebVR. In contrast, a new CSS color keyword would likely not be restricted to secure contexts. Additionally, to avoid fracturing ecosystems that extend beyond the web, core language features and builtin libraries of JavaScript and WebAssembly will likely not be restricted to secure contexts.

Requiring secure contexts in standards development

Everyone involved in standards development is strongly encouraged to advocate requiring secure contexts for all new features on behalf of Mozilla. Any resulting complication should be raised directly against the Secure Contexts specification.

Exceptions to requiring secure contexts

There is room for exceptions, provided justification is given to the dev.platform mailing list. This can either be inside the “Intent to Implement/Ship” email or a separate dedicated thread. It is up to Mozilla’s Distinguished Engineers to judge the outcome of that thread and ensure the dev.platform mailing list is notified. Expect to be granted an exception if:

  • other browsers already ship the feature insecurely
  • it can be demonstrated that requiring secure contexts results in undue implementation complexity

Secure contexts and legacy features

Features that have already shipped in insecure contexts, but are deemed more problematic than others from a security, privacy, or UX perspective, will be considered on a case-by-case basis. Making those features available exclusively to secure contexts should follow the guidelines for removing features as appropriate.

Developer tools and support

To determine whether features are available, developers can rely on feature detection, e.g., by using the @supports at-rule in CSS. This is recommended over the self.isSecureContext API, as it is a more widely applicable pattern.
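For instance, a script can test for the feature itself and consult self.isSecureContext only to explain its absence. A minimal sketch (service workers stand in here as an example of a secure-context-gated feature):

    // Detect the feature itself rather than inferring availability from
    // the context; this also covers features that simply aren't implemented.
    if ("serviceWorker" in navigator) {
      // The feature is implemented and allowed in this context.
    } else if (!self.isSecureContext) {
      // Likely unavailable because this page is not a secure context.
      console.warn("This feature requires HTTPS (a secure context).");
    }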

Mozilla will provide developer tools to ease the transition to secure contexts and enable testing without an HTTPS server.

Mitigations landing for new class of timing attack

Several recently-published research articles have demonstrated a new class of timing attacks (Meltdown and Spectre) that work on modern CPUs.  Our internal experiments confirm that it is possible to use similar techniques from Web content to read private information between different origins.  The full extent of this class of attack is still under investigation and we are working with security researchers and other browser vendors to fully understand the threat and fixes.  Since this new class of attacks involves measuring precise time intervals, as a partial, short-term, mitigation we are disabling or reducing the precision of several time sources in Firefox.  This includes both explicit sources, like performance.now(), and implicit sources that allow building high-resolution timers, viz., SharedArrayBuffer.

Specifically, in all release channels, starting with 57:

  • The resolution of performance.now() will be reduced to 20µs. (UPDATE: see the MDN documentation for performance.now for up-to-date precision information.)
  • The SharedArrayBuffer feature is being disabled by default.

Furthermore, other timing sources and time-fuzzing techniques are being worked on.
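As a rough sketch of what reduced precision means (this is an illustration, not Firefox’s actual implementation), clamping a timestamp to a 20µs grid could look like this:

    // Round a high-resolution timestamp (milliseconds) down to the nearest
    // multiple of 20µs, removing the extra precision a timing attack needs.
    const RESOLUTION_MS = 0.02; // 20µs expressed in milliseconds
    function clampedNow(): number {
      return Math.floor(performance.now() / RESOLUTION_MS) * RESOLUTION_MS;
    }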

In the longer term, we have started experimenting with techniques to remove the information leak closer to the source, instead of just hiding the leak by disabling timers.  This project requires time to understand, implement and test, but might allow us to consider reenabling SharedArrayBuffer and the other high-resolution timers as these features provide important capabilities to the Web platform.

Update [January 4, 2018]: We have released the two timing-related mitigations described above with Firefox 57.0.4, Beta and Developer Edition 58.0b14, and Nightly 59.0a1 dated “2018-01-04” and later. Firefox 52 ESR does not support SharedArrayBuffer and is less at risk; the performance.now() mitigations will be included in the regularly scheduled Firefox 52.6 ESR release on January 23, 2018.

Blocking Top-Level Navigations to data URLs for Firefox 59

End users rely on the address bar of a web browser to identify what web page they are on. However, most end users are not aware of the concept of a data URL, which can contain a legitimate-looking address string, making end users believe they are browsing a particular web page. In reality, attacker-provided data URLs can show disguised content, tricking end users into providing their credentials. The fact that the majority of end users are not aware that data URLs can encode untrusted content makes them popular amongst scammers for spoofing and particularly for phishing attacks.

To mitigate the risk of Firefox users being tricked into phishing attacks by malicious actors encoding legitimate-looking address strings in a data URL, Firefox 59 will prevent web pages from navigating the top-level window to a data URL, and hence will help prevent the theft of an end user’s credentials. At the same time, Firefox will still allow navigations to data URLs that genuinely result from an end user action.

In more detail, the following cases will be blocked:

  • Web page navigating to a new top-level data URL document using:
    • window.open("data:…");
    • window.location = "data:…"
    • clicking <a href="data:…"> (including ctrl+click, ‘open-link-in-*’, etc.)
  • Web page redirecting to a new top-level data URL document using:
    • 302 redirects to "data:…"
    • meta refresh to "data:…"
  • External applications (e.g., Thunderbird) opening a data URL in the browser
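To make the blocked cases concrete, a script-initiated navigation like the following sketch will be stopped (the data URL content is a hypothetical phishing page):

    // From Firefox 59 on, this top-level navigation to a data: URL is
    // blocked; it could otherwise render a disguised credential prompt.
    window.location.href = "data:text/html,<h1>Please sign in</h1>";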

In contrast, the following cases will be allowed:

  • User explicitly entering/pasting "data:…" into the address bar
  • Opening all plain text data files
  • Opening "data:image/*" in a top-level window, unless it’s "data:image/svg+xml"
  • Opening "data:application/pdf" and "data:application/json"
  • Downloading a data: URL, e.g. ‘save-link-as’ of "data:…"

Starting with Firefox 59, web pages attempting to navigate the top-level window to a data URL will be blocked, and a message explaining the block will be logged to the console.

For the Mozilla Security Team:
Christoph Kerschbaumer

November 2017 CA Communication

Mozilla has sent a CA Communication to inform Certificate Authorities (CAs) who have root certificates included in Mozilla’s program about Mozilla’s expectations regarding version 2.5 of Mozilla’s Root Store Policy, annual CA updates, and actions the CAs need to take. This CA Communication has been emailed to the Primary Point of Contact (POC) and an email alias for each CA in Mozilla’s program, and they have been asked to respond to the following 8 action items:

  1. Review version 2.5 of Mozilla’s Root Store Policy, and update the CA’s CP/CPS documents as needed to become fully compliant.
  2. Confirm understanding that non-technically-constrained intermediate certificates must be disclosed in the Common CA Database (CCADB) within one week of creation, and of new requirements for technical constraints on intermediate certificates issuing S/MIME certificates.
  3. Confirm understanding that annual updates (audits, CP, CPS, test websites) are to be provided via Audit Cases in the CCADB.
  4. Confirm understanding that audit statements that are not in English and do not contain all of the required information will be rejected by Mozilla, and may result in the CA’s root certificate(s) being removed from our program.
  5. Perform a BR Self Assessment and send it to Mozilla. This self assessment must cover the CA Hierarchies (and all of the corresponding CP/CPS documents) that chain up to their CA’s root certificates that are included in Mozilla’s root store and enabled for server authentication (Websites trust bit).
  6. Provide a tested email address for the CA’s Problem Reporting Mechanism.
  7. Follow new developments and effective dates for Certification Authority Authorization (CAA).
  8. Check issuance of certs to .tg domains between October 25 and November 11, 2017.

The full action items can be read here. Responses to the survey will be automatically and immediately published by the CCADB.

With this CA Communication, we reiterate that participation in Mozilla’s CA Certificate Program is at our sole discretion, and we will take whatever steps are necessary to keep our users safe. Nevertheless, we believe that the best approach to safeguard that security is to work with CAs as partners, to foster open and frank communication, and to be diligent in looking for ways to improve.

Mozilla Security Team

Statement on DigiCert’s Proposed Purchase of Symantec’s CA

Mozilla’s Root Store Program has taken the position that trust is not automatically transferable between organizations. This is specifically stated in section 8 of our Root Store Policy v2.5, which details how Mozilla handles transfers of root certificates between organizations. Mozilla has taken an interest in such transfers, and there is the potential for trust adjustments based on the particular circumstances.

The CA DigiCert has announced that it is in negotiations to acquire the CA business of Symantec. This announcement was made following the decision of Mozilla and other root store programs to phase out trust in Symantec’s root certificates, based on a detailed investigation of their old and large CA hierarchies and their behaviour and practices over the past few years. There are no plans to change this phase-out of trust in the roots owned by Symantec.

While Mozilla does not intend to micro-manage any CA, the final arrangements for management, processes, and infrastructure to be used by the combined company are of interest and potential concern to us. It would not be appropriate for a CA to escape root program sanction by restructuring, or by purchasing another CA through M&A and continuing operations under that CA’s name, essentially unchanged. An examination of historical corporate merger and acquisition activity, including deals involving Symantec, shows that it’s possible for an M&A billed as the “purchase of B by A” to end up with name A and yet be mostly managed by the executives of B.

Representatives of DigiCert have sought guidance from us on the type of arrangements which would and would not cause us concern. In a good faith effort to answer that enquiry, we can make the following, non-exhaustive statements of what would cause Mozilla concern.

  • We would be concerned if the combined company continued to operate significant pieces of Symantec’s old infrastructure as part of their day-to-day issuance of publicly-trusted certificates.
  • We would be concerned if Symantec validation and operations personnel continued their roles without retraining in DigiCert methods and culture.
  • We would be concerned if Symantec processes appeared to displace DigiCert processes.
  • We would be concerned if the management of the combined company, particularly that part of it providing technical and policy direction and oversight of the PKI, were to appear as if Symantec were the controlling CA organization in the merger.

We hope that this provides useful guidance about our concerns, and note that our final opinion of the trustworthiness of the resulting entity will depend on the facts and behavior of the resulting organization. Mozilla reserves the right to include or exclude organizations or root certificates from our root store at our sole discretion. However, if the M&A activity moves forward, we hope that the list above will be helpful to DigiCert in planning for a future harmonious working relationship with the Mozilla Root Program.

Gervase Markham
Kathleen Wilson

MWoS: Improving ssh_scan Scalability and Feature Set

Editors Note: This is a guest post by Ashish Gaurav, Harsh Vardhan, and Rishabh Saxena

Maintaining a large number of servers and keeping them secure is a tough job! System administrators rely on tools like Puppet and Ansible to manage system configurations.  However, they often lack the means of independently testing these systems to ensure expectations match reality.

ssh_scan was created in an effort to provide a “simple to configure and use” tool that fills this gap for system administrators and security professionals seeking to validate their ssh configurations against a predefined policy. It aims to provide control over what policies and configurations you self-identify as important.

As CS undergraduates, we had the opportunity to participate in the 2016-2017 edition of Mozilla Winter of Security (MWoS), where we volunteered to improve the scalability and feature set of ssh_scan.

The goal of the project was to improve the existing scanner to make securing your ssh servers easier. It scans ssh servers by initiating a remote unauthenticated connection, enumerates all the attributes of the service, and compares them against a user-defined policy. The Mozilla OpenSSH Security Guide was used as the sane baseline policy recommendation for SSH configuration parameters.

Early Work

Before we started working on the project, ssh_scan was a simple command-line tool. It had limited fingerprinting support and no logging capability. We started by introducing some key features to improve the CLI tool, like adding logging, making it multi-threaded, and extending its dev-ops usability.  However, we really wanted to make the tool more accessible for everyone, so we decided to evolve ssh_scan into a web API.  As soon as the initial CLI tool was leveled up, we moved on to architecture planning for the web API.

The API

Since ssh_scan is written in Ruby, we looked at different Ruby web frameworks to implement the web API.  We finally settled on Sinatra, as it was a lightweight framework which gave us the power and flexibility to adapt and evolve quickly.

We started with providing a REST API around the existing command-line tool so that it could be integrated into the Mozilla Observatory as another module.  Because the Observatory receives a large number of scan requests per day, we had to make our API scale enough to keep pace with that high demand if it was ever to be enabled by default.

High-level Design Overview

Our high-level design centered on a producer/consumer model. We also tried to keep things simple and modular so that it was easy to swap out or upgrade components as needed, using HTTPS as a transport wherever possible. This flexibility was invaluable as we progressed through the project, learned where the bottlenecks were, and upgraded individual sub-components when they showed strain.

In our approach, a user makes a request to the API, which is queued in the database, with a state machine tracking each scan’s progress throughout. A worker then polls the API for work, takes the work off the database queue, performs the scan, and sends the scan results back to the API server, where they are stored in the database. As a starting point, an ssh_scan_api operator can have a single worker process running on the API/DB server.  As workload requirements increase and queues build up (which we can monitor through our stats API route), we simply scale workers horizontally with Docker to pick up the additional load, without disrupting other system components.
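The worker side of that flow can be pictured with the following sketch (the endpoint paths and field names are hypothetical; the real ssh_scan_api is implemented in Ruby):

    // Stand-in for the actual ssh_scan invocation.
    declare function runScan(target: string): Promise<object>;

    // Poll the API for queued scans, run each one, and post the results back.
    async function workerLoop(apiBase: string): Promise<void> {
      while (true) {
        const job = await (await fetch(`${apiBase}/work`)).json();
        if (!job) {
          // Queue is empty: back off briefly, then poll again.
          await new Promise((r) => setTimeout(r, 1000));
          continue;
        }
        const results = await runScan(job.target);
        await fetch(`${apiBase}/results/${job.id}`, {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify(results),
        });
      }
    }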

Challenges

Asynchronous job management was a totally new concept for us before we started this project.  Because of this, it took us some time to settle on the components to efficiently handle our use-cases.  Fortunately, with the help of our mentors, we settled on implementing many things from scratch to start, which gave us more detailed insight into the following:

  • How asynchronous API systems work
  • How to make it scale by identifying and removing the bottlenecks

As the end-to-end scan time is dominated by the scan itself, we achieved scalability by having multiple workers perform scans in parallel.  To prevent abuse of essential functions, we also added authentication requirements around the API.

Current Status of Project

We have already integrated ssh_scan_api as a supporting sub-module of the Mozilla Observatory and it is deployed as a beta here.  However, even as a beta service, we’ve already run over 4,000 free scans of public SSH services, which is far more than we could have ever done with the single-threaded command-line version we started with.  We also expect usage to increase significantly as we raise awareness of this free tool.

Future Plans

We plan to do more performance testing of the API to continue to identify and plan for future scaling needs as demand presents itself. Outcomes of this effort might also include an even more robust work management strategy, as well as performance stressing the API.  The process continues to be iterative and we are solving challenges one step at a time.

Thanks Mozilla!

This project was a great opportunity to help Mozilla in building a more secure and open web and we believe we’ve done that. We’d like to give special thanks to claudijd, pwnbus and kang who supported us as mentors and helped guide us through the project.  Also, a very special thanks to April for doing all the front-end web development to add this as a submodule in the Observatory and helping make this real.

If you would like to contribute to ssh_scan or ssh_scan_api in any way, please reach out to us using GitHub issues on the respective projects, as we’d love your help.

Best,

Ashish Gaurav, Harsh Vardhan, and Rishabh Saxena

Treating data URLs as unique origins for Firefox 57

The data URL scheme provides a mechanism which allows web developers to inline small files directly in an HTML (or also CSS) document. The main benefit of data URLs is that they speed up page load time because the inlining of otherwise external resources reduces the number of HTTP requests a browser has to perform to load data.
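For reference, the general form of the scheme (RFC 2397) is data:[<mediatype>][;base64],<data>; for example:

data:text/plain;base64,SGVsbG8sIFdvcmxkIQ==

is a complete, self-contained resource that decodes to “Hello, World!”.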

Unfortunately, criminals also utilize data URLs to craft attack pages in an attempt to gather usernames, passwords and other confidential information from innocent users. Data URLs are particularly attractive to attackers because they allow them to mount attacks without requiring them to actually host a full website. Instead, scammers embed the entire attack code within the data URL, which previously inherited the security context of the embedding element. In turn, this inheritance model opened the door for Cross-Site-Scripting (XSS) attacks.

Rather than inheriting the origin of the settings object responsible for the navigation, data URLs will be treated as unique origins for Firefox 57. In other words, data URLs loaded inside an iframe are not same-origin with their parent document anymore.

Let’s consider the following example:
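(The original post’s listing is not reproduced here; the following is a minimal sketch of the same scenario, with a data URL iframe that reaches into its embedding page.)

    // The embedding page defines foo() and exposes it on window.
    function foo(): void {
      console.log("foo() was called from the data: URL iframe");
    }
    (window as any).foo = foo;

    // The data: URL iframe inlines a script that calls into its parent.
    const frame = document.createElement("iframe");
    frame.src = "data:text/html,<script>parent.foo();</script>";
    document.body.appendChild(frame);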

In Firefox version 56 and older, the script within the data URL iframe was able to access objects from the embedding context, because data URLs inherited the security context and hence were considered to be same-origin. In the sketch above, the script inside the data URL iframe is able to call the function foo(), even though foo() was defined by the including context, which should be treated as a different security context.

Starting with Firefox 57, data URLs loaded inside an iframe will be considered cross-origin. Not only will that behavior mitigate the risk of XSS, it will also make Firefox standards-compliant and consistent with the behavior of other browsers. In Firefox 57, an attempt to reach content from a different origin (like the call to parent.foo() in the sketch above) will be blocked, and an error message will be logged to the console.

Note that data URLs that do not end up creating a scripting environment, such as those found in img elements, will still be considered same-origin.

For the Mozilla Security Team:
Christoph Kerschbaumer, Ethan Tseng, Henry Chang & Yoshi Huang

Improving AES-GCM Performance

AES-GCM is a NIST-standardised authenticated encryption algorithm (NIST SP 800-38D). Since its standardisation in 2008, its usage has increased to the point where it is the prevalent encryption algorithm used with TLS. At 88%, it is by far the most widely used TLS cipher in Firefox.

Firefox telemetry on symmetric ciphers in TLS

Unfortunately, until now the AES-GCM implementation used in Firefox (provided by NSS) did not take advantage of full hardware acceleration on all platforms; it used a slower software-only implementation on Mac, on 32-bit Linux, and on any device that doesn’t have all of the AVX, PCLMUL, and AES-NI hardware instructions. Based on hardware telemetry information, only 30% of Firefox 55 users get full hardware acceleration (as well as the resulting resistance to side-channel analysis). In this post I describe how I made AES-GCM in NSS, and thus Firefox 56, significantly faster, more side-channel resistant, and more energy efficient on most platforms, using hardware support.

To evaluate the actual impact on Firefox users, I tested the practical speed of our encryption by downloading a large file from a secure site on various hardware configurations. Downloading a file on a mid-2015 MacBook Pro Retina with Firefox 55 spends 17% of CPU time in ssl3_AESGCM, the routine that performs the decryption. On a Windows laptop with an AMD C-70 (which lacks the AES-NI instructions), Firefox CPU usage is 60% and the download speed is capped at 3.5MB/s. This is not merely an academic issue: particularly for battery-operated devices, the difference in energy consumption is noticeable.

Improving GCM performance

Speeding up the GCM multiplication function is the first obvious step to improve AES-GCM performance. When the original AES-GCM code was integrated, a bug was opened to provide an alternative to the textbook implementation of gcm_HashMult. This code is not only slow but also has timing side channels, as you can see in the following excerpt from the binary multiplication algorithm:

    for (ib = 1; ib < b_used; ib++) {
      b_i = *pb++;

      /* Inner product:  Digits of a */
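      /* Timing side channel: whether the multiply-and-add below runs at
         all depends on the secret-derived digit b_i. */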
      if (b_i)
        s_bmul_d_add(MP_DIGITS(a), a_used, b_i, MP_DIGITS(c) + ib);
      else
        MP_DIGIT(c, ib + a_used) = b_i;
    }

We can improve on two fronts here. First, NSS should use the PCLMUL hardware instruction to speed up the ghash multiplication where possible. Second, if PCLMUL is not available, NSS should use a fast constant-time implementation.

Bug 868948 contains several attempts at speeding up the software implementation without introducing timing side channels. Unfortunately, the fastest code that was proposed uses table lookups and is therefore not constant-time (accessing memory locations in the same cache line still leaks timing information). Thanks to Thomas Pornin, I re-implemented the binary multiplication in a way that doesn’t leak any timing information and is still faster than any other proposed C code (see Bug 868948 or openssl/boringssl for other software implementations). Check out Thomas’ excellent write-up for details.
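The core idea behind such a constant-time multiplication is to replace the secret-dependent branch with a mask, so that every bit of the secret operand triggers exactly the same instructions. The following is a simplified sketch of a branch-free 32×32 carryless multiplication, illustrating the masking idea only (it is not the NSS code):

    // Carryless (GF(2)) multiply of two 32-bit values into a 64-bit result.
    // Every iteration executes the same operations; a mask of all-ones or
    // all-zeros replaces the data-dependent branch.
    function bmul32(a: number, b: number): { hi: number; lo: number } {
      let lo = 0;
      let hi = 0;
      for (let i = 0; i < 32; i++) {
        const mask = -((b >>> i) & 1) | 0; // 0xFFFFFFFF if bit i of b is set
        lo ^= (a << i) & mask;
        hi ^= (i === 0 ? 0 : a >>> (32 - i)) & mask;
      }
      return { hi: hi >>> 0, lo: lo >>> 0 };
    }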

If PCLMUL is available on the CPU, using it is the way to go. All modern compilers support intrinsics, which allow us to write “inline assembly” in C that runs on all platforms without having to write assembly code files. A hardware accelerated implementation of the ghash multiplication can be easily implemented with _mm_clmulepi64_si128.

On Mac and Linux, the new 32-bit and 64-bit software ghash functions (faster and constant-time) are used on the respective platforms if PCLMUL or AVX is not available. Since Windows doesn’t support 128-bit integers (outside of registers), NSS falls back to the slower 32-bit ghash code – which is still more than 25% faster than the previous ghash implementation.

Improving AES performance

To speed up AES, NSS needed hardware-accelerated code on Mac as well as on 32-bit Linux and any machine that doesn’t support AVX (or has it disabled). When NSS can’t use the specialised AES code, it falls back to a table-based implementation that is again not constant-time (in addition to being slow). There are currently no plans to rewrite the existing fallback code: AES is impossible to implement efficiently in software without introducing side channels. Implementing AES with intrinsics, on the other hand, is a breeze.

    /* Initial key whitening with the first round key. */
    m = _mm_xor_si128(m, cx->keySchedule[0]);
    /* Nr - 1 full rounds... */
    for (i = 1; i < cx->Nr; ++i) {
      m = _mm_aesenc_si128(m, cx->keySchedule[i]);
    }
    /* ...the final round, then store the encrypted block. */
    m = _mm_aesenclast_si128(m, cx->keySchedule[cx->Nr]);
    _mm_storeu_si128((__m128i *)output, m);

Key expansion is a little bit more involved (for 192- and 256-bit keys), but is written in about 100 lines as well.

Mac sees the biggest improvement here. Previously, only Windows and 64-bit Linux used AES-NI, and now all desktop x86 and x64 platforms use it when available.

Looking at the numbers

To measure the performance gain of the new AES-GCM code, I encrypted a 479MB file with a 128-bit key (the most widely used key size for AES-GCM). Note that these numbers are meant to show a trend; they depend heavily on the machine used and the system load at the time.

Linux measurements are done on an Intel Core i7-4790, Windows measurements on a Surface Pro 2 with an Intel Core i5-4300U, and Mac measurements on a mid-2015 MacBook Pro with an Intel Core i7-4980HQ. For all following graphs, lower is better.

Linux 64 AES-GCM 128 encryption performance improvements

Linux 32 AES-GCM 128 encryption performance improvements

AES-GCM 128 on any 64-bit Linux machine without hardware support for the AES, PCLMUL, or AVX instructions is now at least twice as fast. If the AES and PCLMUL instructions are available, the new code needs only 33% of the time the old code took.

The speed-up for 32-bit Linux is more significant as it didn’t previously have any hardware accelerated code. With full hardware acceleration the new code is more than 5 times faster than before. Even in the worst case – when PCLMUL is not available – the speedup is still more than 50%.

The story is similar on Windows, although NSS already had fast code for 32-bit Windows users.

Windows 64 AES-GCM 128 encryption performance improvements

Windows 32 AES-GCM 128 encryption performance improvements

Performance improvements on Mac (64-bit only) range from 60% in the best case to 44% when AES-NI or PCLMUL is not available.

Mac OSX AES-GCM 128 encryption performance improvements

The numbers in Firefox

NSS 3.32 (Firefox 56) ships with the new accelerated AES-GCM code. It provides significantly reduced CPU usage for most TLS connections, or higher download rates – meaning better energy efficiency, too. NSS 3.32 is more intelligent in detecting the CPU’s capabilities and uses hardware acceleration whenever possible. Assuming that all intrinsics and mathematical operations (other than division) are constant-time on the CPU, the new code doesn’t have any timing side channels.

On the very basic laptop with the AMD C-70, download rates increased from ~3MB/s to ~6MB/s – and this is a device that has no hardware acceleration support.

To see the performance improvement, we can look at the case where AVX is not available (which is the case for about 2/3 of the Firefox population). Assuming that at least AES-NI and PCLMUL are supported by the CPU, we see the CPU usage drop from 15% to 3%.

AES_Decrypt CPU usage with NSS 3.31 without AVX hardware support

AES_Decrypt CPU usage with NSS 3.32 without AVX hardware support

The most immediate effect can be seen on Mac: with NSS 3.31, AES_Decrypt used 9% CPU, while with NSS 3.32 it uses only 4%.

AES_Decrypt CPU usage with NSS 3.31 on Mac OSX

AES_Decrypt CPU usage with NSS 3.32 on Mac OSX

The most significant performance improvements are summarised in the following table, depicting the time in seconds to decrypt a ~500MB file with AES-GCM 128; lower is better.

                        Linux 32-bit    Mac     No AVX support
NSS 3.31 (Firefox 55)   20.3 s          11.5 s  21.3 s
NSS 3.32 (Firefox 56)   3.4 s           4.6 s   3.5 s

These improvements to AES-GCM in NSS make Firefox 56 significantly faster, more side-channel resistant, and more energy efficient on most platforms using hardware support.