Deploying Tor relays

On November 11, 2014, Mozilla announced the Polaris Privacy Initiative. One key part of the initiative is our support for the Tor network through the deployment of Tor middle relay nodes. On January 15, 2015, our first proof of concept (POC) went live.

TL;DR: here are our Tor relays: https://globe.torproject.org/#/search/query=mozilla

When we started this POC, the requirements we had were:

  • the Tor nodes should run on dedicated hardware
  • the nodes should be logically and physically separated from our production infrastructure
  • the nodes should use low-cost, commodity hardware
  • the nodes should be operational within 3 weeks

Hardware and Infrastructure

  • We chose to reuse our spare and decommissioned hardware. That included a pair of Juniper EX4200 switches and three HP SL170zG6 servers (48GB RAM, 2 × Xeon L5640, 2 × 1Gbps NIC).
  • We dedicated one of our existing IP transit providers to the project (2 × 10Gbps).

[Figure: current Tor physical infrastructure diagram]

The current design is fully redundant. This lets us perform maintenance, or lose a node, without losing all of our traffic. The worst case scenario is a 50% loss of capacity.

The design also allows us to easily add more servers in the event we need more capacity, with no anticipated impact.

Building and Learning

There is a large body of knowledge available on building Tor nodes. I read mailing list archives, blog posts, and tutorials, and I had exchanges with people already running large relays. There are still data points Mozilla needs to understand before our experiment is complete. This section is a quick rundown of some of those data points.

  • A single organization shouldn’t be running more than 10Gbps of traffic for a middle relay (and 5Gbps for an exit node).

This seems to be more of a gut feeling from existing operators than a proven value (let me know if I’m wrong), but it makes sense. We do have available transit and capacity. Understanding throughput and resource utilization is a key criterion for us.

Important Note: An operator running relays must use the “MyFamily” option in torrc.  This ensures a user doesn’t bounce through several of your servers.
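As a minimal sketch of how a “MyFamily” line could be generated for a group of relays: the helper name and the fingerprints below are hypothetical placeholders, not our real ones, and the sketch assumes every relay in the family gets the same line listing all members.

```python
import re

# A relay identity fingerprint is 40 hexadecimal digits.
FINGERPRINT_RE = re.compile(r"^[0-9A-F]{40}$")


def my_family_line(fingerprints):
    """Build a torrc MyFamily line from relay identity fingerprints.

    Accepts fingerprints with or without a leading "$", with or without
    spaces, in any letter case. Duplicates are removed.
    """
    cleaned = set()
    for fp in fingerprints:
        fp = fp.strip().lstrip("$").replace(" ", "").upper()
        if not FINGERPRINT_RE.match(fp):
            raise ValueError(f"not a 40-hex-digit fingerprint: {fp!r}")
        cleaned.add(fp)
    if len(cleaned) < 2:
        raise ValueError("a family needs at least two distinct relays")
    return "MyFamily " + ",".join("$" + fp for fp in sorted(cleaned))


if __name__ == "__main__":
    # Placeholder fingerprints, for illustration only.
    example = ["A" * 40, "b" * 40, "$" + "A" * 40]
    print(my_family_line(example))
```

The resulting line would go into the torrc of every instance in the family, so it is a natural candidate for a configuration management template.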

  • Slow ramp up

A new Tor instance (identified by its private/public key pair) will take time (up to 2 months) to use all its available bandwidth. This is explained in this blog post: The lifecycle of a new relay. We will be updating our blog posts and are curious how closely our nodes mirror the lifecycle.

  • A Tor process (instance) can only push about 400Mbps.

This is based on mailing list discussions, as we haven’t reached that bandwidth yet. We run several instances per physical server.

  • A single public IP can only be shared by 2 Tor instances

This is a security feature that prevents a single person from running a large number of fake nodes, as explained in this research paper. This feature is documented in the Tor protocol specification.
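Taken together, the per-process throughput and the per-IP limit give a rough sizing rule. The sketch below is back-of-the-envelope arithmetic only: it assumes the figures quoted in this post (about 400Mbps per instance, 2 instances per public IP), which are rules of thumb rather than measured values, and the function name is made up for illustration.

```python
def plan_instances(target_mbps, per_instance_mbps=400, instances_per_ip=2):
    """Estimate how many Tor instances and public IPs a target needs.

    All inputs are integers. Ceiling division is done with integer
    arithmetic to avoid floating-point rounding surprises.
    """
    if target_mbps <= 0 or per_instance_mbps <= 0 or instances_per_ip <= 0:
        raise ValueError("all arguments must be positive")
    instances = -(-target_mbps // per_instance_mbps)
    public_ips = -(-instances // instances_per_ip)
    return instances, public_ips


if __name__ == "__main__":
    # The 10Gbps ceiling mentioned above for a middle relay operator.
    instances, ips = plan_instances(10_000)
    print(f"{instances} instances on {ips} public IPs")
```

Under these assumptions, filling 10Gbps would take 25 instances spread over 13 public IPs, which is why we run several instances per physical server.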

  • Listen on well known ports like 80 or 443

This helps people behind strict firewalls access Tor. Don’t worry about running the process as root (needed to listen on ports < 1024): as long as you have the “User” option in torrc, Tor will drop its privileges after binding to the ports.
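Here is a small sketch of how such a torrc fragment could be rendered. It is not our actual template: the nickname is a placeholder, the function is hypothetical, and “debian-tor” is assumed as the unprivileged account because that is what the Debian package uses. The sketch only emits the “User” line when the port is privileged, since that is the case where Tor has to start as root.

```python
import re

# Relay nicknames are 1 to 19 alphanumeric characters.
NICKNAME_RE = re.compile(r"^[A-Za-z0-9]{1,19}$")


def render_torrc(nickname, or_port=443, run_as_user="debian-tor"):
    """Render a minimal torrc fragment for a non-exit relay."""
    if not NICKNAME_RE.match(nickname):
        raise ValueError(f"invalid nickname: {nickname!r}")
    if not 1 <= or_port <= 65535:
        raise ValueError(f"invalid port: {or_port}")
    lines = [
        f"Nickname {nickname}",
        f"ORPort {or_port}",
        "ExitPolicy reject *:*",  # relay traffic only, never exit
    ]
    if or_port < 1024:
        # Started as root to bind the privileged port; Tor then
        # switches to this account.
        lines.append(f"User {run_as_user}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_torrc("ExampleRelay1"), end="")
```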

Automation

We decided to use Ansible for configuration management.  A few things motivated us to make that choice.

  • There was an existing ansible-tor role very close to what we needed to accomplish (and here is our pull request with our fixes and additions).
  • Some of our teams are using Ansible in production and we (Network Engineering) are considering it.
  • Ansible does not require a heavy client/server infrastructure which should make it more accessible to other operators.

And look! Mozilla’s Ansible configuration is available on GitHub!

Security

The security team helped us a lot throughout this project. Together we drew up a list of requirements, such as:

  • strict firewall filtering
  • hardening the operating system (disable unneeded services, good SSH configuration, automatic updates)
  • hardening the management plane of the network devices
  • implementing edge filtering to make sure only authorized systems can connect to the “network management plane”

The jumphost is the only place from which the infrastructure can be administered. Systems don’t accept management connections from anywhere else.

It is important to note that many of the security requirements align nicely with what is considered good practice in general system and network administration. Take enabling NTP or centralized syslog, for example: both are equally important for services to run smoothly, for troubleshooting, and for incident response. The same goes for the principle “make sure the security of the network devices is at least as good as that of the systems”.

We’ve also implemented a periodic security check on these systems. All of them are scanned from the inside for missing security updates and from the outside for open ports.
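The outside check can be approximated with a few lines of Python. This is only an illustrative sketch, not the tool we actually run: the function names, the port list and the allowlist are assumptions, and a real check would more likely rely on a dedicated scanner.

```python
import socket


def open_ports(host, ports, timeout=1.0):
    """Return the TCP ports on host that accept a connection."""
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success and an error code otherwise.
            if sock.connect_ex((host, port)) == 0:
                found.append(port)
    return found


def unexpected_ports(host, ports, allowed, timeout=1.0):
    """Return open ports that are not in the allowlist."""
    return sorted(set(open_ports(host, ports, timeout)) - set(allowed))


if __name__ == "__main__":
    # Hypothetical policy: only the relay ports may be reachable.
    extra = unexpected_ports("127.0.0.1", [22, 80, 443], allowed={80, 443},
                             timeout=0.5)
    print("unexpected open ports:", extra)
```

Run periodically from a host outside the network, a non-empty result would mean that the firewall filtering no longer matches the policy.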

Metrics

One of the questions we’re asking ourselves is: how do we figure out whether we’re running an efficient relay (in terms of cost, participation in the Tor network, hardware efficiency, etc.)? Which metrics should we use, and how?

Looking around, it seems there is no “good answer”. We’re graphing everything we can about bandwidth and server utilization using Observium. The Tor network already has a project to collect relay statistics, called Tor Metrics. Thanks to it, tools like Globe and others can exist.

Future

Note that we have only just started these relays, and they are far from running at their maximum bandwidth (for the reasons listed above). We will share more information about performance and scaling down the road.

Depending on the results of the POC, we may move the nodes to a managed part of our infrastructure. As long as their private keys stay the same, their reputation will follow them wherever they go, with no new ramp-up period.

On the technical side, there are a lot of possible things to do, like adding IPv6 connectivity. We’re also reviewing opportunities to automate more parts of the deployment (like iptables, logs, etc.).

Links

Here are a few links that you might find interesting:

[blog] IPredator – building a Tor server
[mailing list] [tor-dev] Scaling tor for a global population
[mailing list] How to Run High Capacity Tor Relays
[wiki] tor – archwiki
[blog] Run A Tor-Relay On Ubuntu Trusty
[mailing list] [tor-relays] Someone broke the tor-relay speed record?
[tor website] Configuring a Tor relay on Debian/Ubuntu
[wiki] tor exit full setup

Thanks

Of course, none of that would have been possible without the help of Van, Michal (who wrote the part about security) and Opsec, Javaun, James, Moritz and the people of #tor!

31 responses

  1. Ben wrote on :

    Your first resources link is missing. This is what you meant to include:

    https://ipredator.se/guide/torserver

    1. Arzhel wrote on :

      Indeed, thanks!

  2. Daenney wrote on :

    “only unauthorized systems can connect to the “network management plane””

    I really hope you meant ‘authorised’ instead.

    1. Arzhel wrote on :

      Of course 🙂 Fixed.

    2. Popey Gilbert wrote on :

      Nice catch =)

  3. Ben Hearsum wrote on :

    This is very cool! One thing I’m curious about is why the nodes have so much RAM. Do they need that much to be an effective node, or is it just because that’s what you had?

    1. Arzhel wrote on :

      That’s because we decommissioned some servers with this amount of RAM from another project and decided to use them here.
      It’s also convenient to be sure that RAM doesn’t become a bottleneck, and it lets us monitor its usage more effectively.

  4. Wesley wrote on :

    “implementing edge filtering to make sure only unauthorized systems can connect to the “network management plane””

    I think you mean authorized systems.

    Great read btw

    1. Arzhel wrote on :

      Thanks a lot! Fixed.

  5. David wrote on :

    Great news! Any plans to run exit nodes in the future?

    1. Arzhel wrote on :

      There is a lot more legal involvement for exit nodes. I’m only on the technical side here.

  6. Nah Vik wrote on :

    Is it possible to turn individuals’ browsers into Tor relays without dedicated hardware?

    1. Arzhel wrote on :

      The 2 most important things for a Tor relay are bandwidth and stability.
      It’s not an issue to run a relay from a home connection if you have at least 250kB/s both ways ( from https://www.torproject.org/docs/faq.html.en#RunningATorRelay )
      But if you run it the same way as you use a browser, it might get shut down quite often, cutting the connections of users going through your node.
      Also, I think the Tor relay process is a standalone tool, separate from the browser itself.

      1. Nah Vik wrote on :

        Thanks for the input Arzhel!

  7. quadhead wrote on :

    Great move by Mozilla!

    It might reduce your guard probability when you have so many relays in one datacentre though (AS2828). I don’t know where the exact limits are…

    1. Arzhel wrote on :

      By default, all the relays in the same IPv4 /16 are considered part of the same “Family”. This is to make sure a user doesn’t go through 2 nodes operated by the same company.
      I haven’t found any documentation saying that it might reduce our guard probability. Please let me know if you have any pointers.
      Right now only the oldest process (~1 week older than the others) has the guard flag. We will see how it evolves.

  8. Tom Schilling wrote on :

    I am using Firefox as my only browser, but the above is quite Chinese for an old bucket of 69. So please advise whether you will include or use an add-on with your Tor system.
    If so, please let us know through the normal update channel.

    Many thanks and waiting for your system

    Kind regards

    Tom Schilling

    1. Arzhel wrote on :

      This blogpost doesn’t impact how the normal Firefox works. If you want a browser with Tor included, you can use the Tor browser ( https://www.torproject.org/projects/torbrowser.html.en )

  9. Pamputt wrote on :

    Thanks a lot for providing such bandwidth to the Tor network. However, from the webpage of one of the servers (https://globe.torproject.org/#/relay/95AC12EEFD2F89DBE4185E6B5B29ED0CAA5FFFE2), it seems that the servers are not exit nodes. That is really a pity. Do you plan to change this policy?

    Another thing: why did you place all your servers in the US? From Snowden’s revelations, we know that the NSA tries to break the Tor network, and having all this bandwidth in the US may help them analyze part of the Tor traffic, even if the content is encrypted.

    Anyway, it is really a good news. Continue like that 😀

    1. Arzhel wrote on :

      This infra is a proof of concept made to test Tor relays. If everything is successful, we will move them to a more final location.

  10. ffsfgfsgsfsfgdfsfsffsg wrote on :

    That’s good news! I guess Mozilla did provide only middle nodes in order to avoid legal trouble? But anyway more bandwidth is still useful on all levels of the network.

    >A Tor process (instance) can only push about 400Mbps.

    According to websites displaying the Tor directory ( https://collector.torproject.org/#references ), the fastest Tor exit node, “IPredator”, has a measured bandwidth of 888 Mb/s (= 111 MB/s) on one instance.

    1. Arzhel wrote on :

      Indeed, they also have a pretty powerful configuration:
      https://ipredator.se/guide/torserver
      Usually the bottleneck is the CPU and they seem to have done well tuning their server.

  11. Sebastian Urbach wrote on :

    Hi,

    The ramp-up time is 3 months, not 2, at least with the current version.

  12. Krat wrote on :

    This is fantastic news. Thank you _so_ much for this, Mozilla. You are setting an example for other tech companies: let it be known that the sheeple need this, even though they don’t realize it. The mass media ignores the substance of the revelations; so the sheeple aren’t aware. But for the technocrats and lovers of freedom and Western Democratic values, this is revolutionary and needed.

    It must, to protect our democratic values, be made technically and practically infeasible for our new overlords to continue to violate what Americans enshrined as imperative values in their 1st, 3rd, 4th, 5th Amendments in the Bill of Rights.

  13. Gergely Imreh wrote on :

    The first link at the end of the article, “[blog] IPredator – building a Tor server” should point to https://ipredator.se/guide/torserver

    Cheers!

  14. Alexandru wrote on :

    “A Tor process (instance) can only push about 400Mbps.”
    Why is there a limit to only 400mbps when you said that the network infrastructure has 2x10gbps ?
    Can someone provide more details about this please?

  15. fkenme wrote on :

    Thank you mozilla. This is why mozilla always has and will have my support. Their commitment to privacy.

  16. chris wrote on :

    Great to hear you’re doing this. I’ve been a big advocate of Tor for years and have set up my company’s servers to also donate otherwise unutilized bandwidth to the Tor network. It would be great if more entities of Mozilla’s nature did this. Unfortunately, it does seem like a lot of potentially good candidates fear government pushback for running an exit node, or even just a relay. That should tell us just how free the “free world” really is.

    I think what people need to consider is that even if they feel they can’t operate an exit node due to government oppression, even just a relay can have a major impact. I think it is also important to point out that while Tor is primarily geared toward providing access to the larger www, it’s the onion nodes that are most important for those resisting oppressive governments, both in the West and elsewhere. These governments seemingly have the upper hand right now, but that could theoretically change. Hopefully we’ll have the equivalent of the Tails project for servers in a few years, and will by that time have improved the onion code such that it’s not so easily attacked.

  17. Christian Dietrich wrote on :

    You will not be able to push nearly 400 Mbit per Tor process, not with a 5-year-old CPU. I was able to push up to 800 Mbit with 2 Tor processes on an Intel i5-3570K, which offers around 44% more single-core performance according to PassMark.

  18. Tor not tor wrote on :

    Great work, thanks!

    However, the fact you cannot seem to spell “Tor” correctly is pretty worrisome… (Can you please fix your blog post?)

  19. Hugo wrote on :

    Well done Mozilla.