Last Thursday, the US Senate voted to renew the USA Freedom Act, which authorizes a variety of forms of national surveillance. As has been reported, this renewal does not include an amendment offered by Sen. Ron Wyden and Sen. Steve Daines that would have explicitly prohibited the warrantless collection of Web browsing history. The legislation is now being considered by the House of Representatives, and today Mozilla and a number of other technology companies sent a letter urging the House to adopt the Wyden-Daines language in its version of the bill. This post fills in the technical background of what all this means.
Despite what you might think from the term “browsing history,” we’re not talking about browsing data stored on your computer. Web browsers like Firefox store, on your computer, a list of the places you’ve gone so that you can go back and find things, and so that they can provide better suggestions when you type in the awesomebar. That’s why you can type ‘f’ in the awesomebar and Firefox might suggest Facebook.
Browsers also store a pile of other information on your computer, such as cookies, passwords, and cached files, that helps improve your browsing experience, and all of it can be used to infer where you have been. This information obviously has privacy implications if you share a computer or if someone gets access to it, and most browsers provide some sort of mode that lets you surf without storing history (Firefox calls this Private Browsing). In any case, while law enforcement can access this information if they have access to your computer, it’s generally subject to the same conditions as any other data on your computer, and those conditions aren’t the topic at hand.
In this context, “web browsing history” refers to data that is stored outside your computer by third parties. It turns out there is quite a lot of this kind of data, generally falling into four broad categories:
- Telecommunications metadata. Typically, as you browse the Internet, your Internet Service Provider (ISP) learns every website you visit. This information leaks via a variety of channels, including DNS lookups, the IP addresses of the sites you connect to, and the TLS Server Name Indication (SNI) field, and ISPs have various policies for how much of this data they log and for how long. Now that most sites use TLS encryption, this data will generally be just the name of the Web site you are going to, not which pages you visit on the site. For instance, if you go to WebMD, the ISP won’t know which page you went to, just that you went to WebMD.
- Web Tracking Data. As is increasingly well known, a giant network of third-party trackers follows you around the Internet. These trackers build up a profile of your browsing history so that they can monetize it in various ways. This data often includes the exact pages that you visit and will be tied to your IP address and other potentially identifying information.
- Web Site Data. Any Web site that you go to is very likely to keep extensive logs of everything you do on the site, including what pages you visit and what links you click. They may also record which outgoing links you click. For instance, when you search, many search engines record not just the search terms but also which results you click on, even when those lead to other sites. In addition, many sites include third-party analytics systems that may themselves record your browsing history or even make a recording of your behavior on the site, including keystrokes and mouse movements, so that it can be replayed later.
- Browser Sync Data. Although the browsing history stored on your computer may not be directly accessible, many browsers offer a “sync” feature that lets you share history, bookmarks, passwords, etc. between browser instances (such as between your phone and your laptop). This information has to be stored on a server somewhere, so it is potentially accessible. Firefox encrypts this data by default, but in some other browsers you need to enable that feature yourself.
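To make the Web tracking category more concrete, here is a minimal Python sketch of how a single third-party tracker embedded on many unrelated sites could stitch visits together into a per-person profile. It assumes, hypothetically, that every time a page loads the tracker, the tracker receives the identifier it previously stored in your browser (for example, in a cookie), your IP address, and the full URL of the embedding page (which a tracker script running on that page can read). The names `TrackerRequest` and `build_profiles`, the cookie IDs, and the `.example` URLs are all illustrative; this is not any real tracker's code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackerRequest:
    """One hit on a hypothetical third-party tracker."""
    cookie_id: str  # identifier the tracker previously stored in the browser
    client_ip: str  # source IP address of the request
    page_url: str   # full URL of the page that embedded the tracker


def build_profiles(requests):
    """Group every (IP, page) pair seen for each cookie ID, in arrival order."""
    profiles = defaultdict(list)
    for req in requests:
        profiles[req.cookie_id].append((req.client_ip, req.page_url))
    return dict(profiles)


# Hypothetical log: the same tracker is embedded on three unrelated sites.
log = [
    TrackerRequest("abc123", "198.51.100.7", "https://news.example/politics/story-1"),
    TrackerRequest("abc123", "198.51.100.7", "https://health.example/conditions/diabetes"),
    TrackerRequest("xyz789", "203.0.113.9", "https://shop.example/cart"),
]

for cookie_id, visits in build_profiles(log).items():
    print(cookie_id)
    for ip, url in visits:
        print("   ", ip, url)
```

Note that nothing here requires the sites to cooperate with each other: because the same tracker is present on all of them, its own logs are enough to reconstruct a page-level history for each identifier.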
So there’s a huge amount of very detailed data about people’s browsing behavior sitting on various servers on the Internet. Because this information is so sensitive, in Mozilla’s products we try to minimize how much of it is collected, with features such as encrypted sync (see above) and Enhanced Tracking Protection. Even so, far too much data about user browsing behavior is still being collected and stored by a variety of parties.
This information isn’t being collected for law enforcement purposes but for a variety of product and commercial reasons. However, the fact that it exists and is being stored means that law enforcement can access it if they follow the right process. The question at hand is what that process actually is: specifically, in the US, which data requires a warrant to access (demanding a showing of ‘probable cause’ plus a lot of procedural safeguards) and which can be accessed with a more lightweight procedure. A more detailed treatment of this topic can be found in a Lawfare piece by Margaret Taylor, but at a high level, the question turns on whether data is viewed as content or as metadata, with content generally requiring a more heavyweight process and a higher level of evidence.
Unfortunately, the line between content and metadata has historically not been very clear in US courts. In some cases the sites you visit (e.g., www.webmd.com) are treated as metadata, in which case that data would not require a warrant. By contrast, the exact page you went to on WebMD would be content and would require a warrant. However, the sites themselves reveal a huge amount of information about you. Consider, for instance, the implications of having Ashley Madison or Stormfront in your browsing history. The Wyden-Daines amendment would have resolved that ambiguity in favor of requiring a warrant for all Web browsing history and search history. If the House reauthorizes USA Freedom without this language, we will be left with this somewhat uncertain situation, one in which, in practice, much of people’s activity on the Internet, including activity they would rather keep secret, may be subject to surveillance without a warrant.
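The site-versus-page distinction above lines up with how an HTTPS URL is exposed on the network, as described under telecommunications metadata. The following Python sketch uses the standard library's `urllib.parse.urlsplit` to separate the part of a URL that an ISP can typically observe from the part carried inside the encrypted request. It assumes ordinary (unencrypted) DNS and an unencrypted SNI field; technologies such as DNS over HTTPS or Encrypted Client Hello change the picture. The helper name `split_visibility` and the WebMD path are hypothetical.

```python
from urllib.parse import urlsplit


def split_visibility(url):
    """Split an HTTPS URL into (typically visible to the ISP, hidden by TLS).

    Hypothetical helper. Assumes plain DNS and an unencrypted SNI field;
    DNS over HTTPS or Encrypted Client Hello would hide the hostname too.
    """
    parts = urlsplit(url)
    if parts.scheme != "https":
        # Over plain HTTP the whole request is visible, so this model does not apply.
        raise ValueError("this sketch only models HTTPS URLs")
    # The hostname appears in the DNS lookup and in the TLS SNI extension.
    visible = parts.hostname
    # The path and query string travel inside the encrypted HTTP request.
    hidden = parts.path or "/"
    if parts.query:
        hidden += "?" + parts.query
    # The fragment (#...) is never sent to the server at all, so it is dropped.
    return visible, hidden


# The path below is made up for illustration.
site, page = split_visibility("https://www.webmd.com/example-article?ref=search")
print("visible to the ISP:", site)
print("hidden by TLS:     ", page)
```

In these terms, the first value is roughly what has sometimes been treated as metadata, while the second is the page-level detail that would be treated as content.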