HTTPS, Mixed Content, and the real web… oh my!

This post was written by Pomax, a software engineer at the Mozilla Foundation
We recently fixed something around Mozilla’s X-Ray Goggles. A long running problem that caused people headaches and the feeling of lost work, while at the same time doing nothing “wrong”, from a technical perspective. This is going to be a story about how modern browsers work, how people use the web, and how those two things… don’t always align.

X-Ray Goggles by Mozilla

So let’s start with X-Ray Goggles: the X-Ray Goggles are a tool made by Mozilla that lets you “remix” web pages after loading them in your browser. You can go to your favourite place on the web, fire up the goggles (similar to how a professional web developer would open up their dev tools), and then change text, styling, images, and whatever else you might want to change, for as long as you want to change things, and then when you’re happy with the result and you want to show your remix to your friends, you can publish that remix so that it has its own URL that you can share.
However, the X-Ray Goggles use a publishing service that hosts all its content over https, because we care about secure communication at Mozilla, and using https is best practice. But in this particular case, it’s also kind of bad: large parts of the web still use http, and even if a website has an https equivalent, people usually visit the http version anyway. Unless those websites force users to the https version of the site (using a redirect message), then site they’ll be on, and the site they’ll be remixing, will use HTTP, and the moment the user publishes their remix with X-Ray Goggles, and they get an https URL back, and they open that URL in their browser….
well, let’s just say “everything looks broken” is not wrong.
But the reason for this is not because Goggles, or even the browser is doing something wrong – ironically, it’s because they’re doing something right, and in so doing, what the user wants to do turns out incompatible with what the technology wants them to do. So let’s look at what’s going on here.

HTTP, the basis upon which browsing is built

If you’re a user of the web, no doubt you’ll have heard about http and https, even if you can’t really say what they technically-precisely mean. In simple terms (but without dumbing it down), HTTP is the language that servers and browsers use to negotiate data transfers. The original intention was for those two to talk about HTML code, so that’s where the h in http comes from (it stands for “hypertext” in both http and html), but we’re mostly ignoring that these days, and HTTP is used by browsers and servers to negotiate transmission of all sorts of files – web pages, stylesheets, javascript source code, raw data, music, video, images, you name it.
However, HTTP is a bit like regular English: you can listen in on it. If you go to a bar and sit yourself with a group of people, you can listen to their conversations. The same goes for HTTP: in order for your browser and the server to talk they rely on a chain of other computers connected to the internet to get messages relayed from one ot the other, and any of those computers can listen in on what the browser and server are saying to each other. In an HTTP setting it gets a little stranger even, because any of those computers could look at what the browser or server are saying, replace what is being said with something else and then forward that on. And you’ll have no way of knowing whether that’s what happened. It’s literally as if the postal service took a letter you sent, opened it, rewrote it, resealed it, and then sent that on. We trust that they won’t, and computers connected to the internet trust that other computers don’t mess with the communication, but… they can. And sometimes they do.
And that’s pretty scary, actually. You don’t want to have to “trust” that your communication isn’t read or tampered with, you want to know that’s the case.

What can we do to fix that?

Well, we can use HTTPS, or “secure HTTP”, instead. Now, I need to be very clear here: the term “secure” in “secure HTTP” refers to secure communication. Rather than talking “in English”, the browser and server agree on a secret language that you could listen to, but you won’t know what’s being said, and so you can’t intercept-and-modify the communication willy-nilly without both parties knowing that their communications are being tampered with. However it does not mean that the data the browser and server agree to receive or send is “safe data”. It only means that both parties can be sure that what one of them receives is what the other intended to send. All we can be sure of is that no one will have been able to see what got sent, and that no one modified it somewhere along the way without us knowing.
However, those are big certainties, so for this reason the internet’s been moving more and more towards preferring HTTPS for everything. But not everyone’s using HTTPS yet, and so we run into something called the “Mixed Content” issue.

Let’s look at an example.

Imagine I run a web page, much like this one, and I run it on HTTP because I am not aware of the security issues, and my page relies on some external images, and some JavaScript for easy navigation, and maybe an embedded podcast audio file. All of those things are linked as http://......, and everything worked fine.
But then I hear about the problems with HTTP and the privacy and security implications sound horrible! So, to make sure my visitors don’t have to worry about whether the page they get from my server is my page, or a modified version of my page, I spring into action, I switch my page over to HTTPS; I get a security certificate, I set everything on my own server up so that it can “talk” in HTTPS, and done!
Except immediately after switching, my web page is completely broken! The page itself loads, but none of the images show up, and the JavaScript doesn’t seem to be working, and that podcast embed is gone! What happened??
This is a classic case of mixed-content blocking. My web page is being served on HTTPS, so it’s indicating that it wants to make sure everything is secure, but the resources I rely on still use HTTP, and now the browser has a problem: it can’t trust those resources, because it can’t trust that they won’t have been inspected or even modified when it requests them, and because the web page that’s asking them to be loaded expressed that it cares about secure communication a great deal, the browser can’t just fetch those insecure elements, things might go wrong, and there’s no way to tell!
So it does the only thing it knows is safe: better safe than sorry, and it flat out refuses to even request them, giving you a warning about “mixed content”.
Normally, that’s great. It lets people who run websites know that they’re relying on potentially insecure third party content in an undeniably clear way, but it gets a bit tricky in two situations:

  1. third party resources that themselves require other third party resources, and
  2. embedding and rehosting

The first is things like your web page using a comment thread service: your web page includes a bit of JavaScript from something like www.WeDoCommentsForYou.com and then that JavaScript then loads content from that site’s comment database, for instance comments.WeDoCommentsForYou.com. If we have a page that uses HTTPS, running on https://ourpage.org then we can certainly make sure that we load the comment system from https://www.WeDoCommentsForYou.com, but we don’t control the protocol for the URL that the JavaScript we got back uses. If “WeDoCommentsForYou” wrote their script poorly, and they try to load their comments over http://, then too bad, the browser will block that. Sure, it’s a thing that “WeDoCommentsForYou” should fix, but until they do your users can’t comment, and that’s super annoying.
The second issue is kind of like the first, but is about entire web pages. Say you want to embed a page; for instance, you’re transcluding an entire wiki page into another wiki page. If the page you’re embedding is http and the page it’s embedded on is https, too bad, that’s not going to work. Or, and that brings us to what I really want to talk about, if you remix a page on http, with http resources, and host that remix on a site that uses https, then that’s not going to work either…

Back to the X-Ray Goggles

And that’s the problem we were hitting with X-Ray Goggles, too.
While the browser is doing the same kind of user protection that it does for any other website, in this particular case it’s actually a big problem: if a user remixed an HTTP website, then knowing what we know now, obviously that’s not going to work if we try to view it using HTTPS. But that also means that instead of a cool tool that people can use to start learning about how web pages work “on the inside”, the result of which they can share with their friends, they have a tool that lets them look at the insides of a web page and then when they try to share their learning, everything breaks.
That’s not cool.
And so the solution to this problem is based on first meeting the expectations of people, and then educating them on what those expectations actually mean.

Give me HTTPS, unless I started on HTTP

There are quite a few solutions to the mixed-content problem, and some are better than others. There are some that are downright not nice to other people on the web (like making a full copy of someone’s website and then hosting that on Mozilla’s servers. That’s not okay), or may open people up exploits (like running a proxy server, which runs on HTTPS and can fetch HTTP resources, then send them on as if they were on HTTPS, effectively lying about the security of the communication), so the solution we settled on is, really, the simplest one:
If you remix an http://... website, we will give you a URL that starts with http://, and if you remix an https:// website, we will give you a URL that starts with https://.... However, we also want you to understand what’s going on with the whole “http vs https” thing, so when you visit a remix that starts with http:// the remix notice bar at the top of the page also contains a link to the https:// version –same page, just served using HTTPS instead of HTTP– so that you can see exactly how bad things get if you can’t control which protocol gets used for resources on a page.

Security vs Usability

Security is everybody’s responsibility, and explaining the risks on the web that are inherent to the technology we use every day is always worth doing. But that doesn’t mean we need to lock everything down so “you can’t use it, the end, go home, stop using HTTP”. That’s not how the real world works.
So we want you to be able to remix your favourite sites, even if they’re HTTP, and have a learning/teaching opportunity there around security. Yes, things will look bad when you try to load an HTTP site on HTTPS, but there’s a reason for that, and it’s important to talk about it.
And it’s equally important to talk about it without making you lose an hour or more of working on your awesome remix.