Privacy analysis of FLoC
In a previous post, I wrote about a new set of technologies called “Privacy Preserving Advertising”, which are intended to allow for advertising without compromising privacy. This post discusses one of those proposals, Federated Learning of Cohorts (FLoC), which Chrome is currently testing. The idea behind FLoC is to make it possible to target ads based on the interests of users without revealing their browsing history to advertisers. We have conducted a detailed analysis of FLoC’s privacy properties. This post provides a summary of our findings.
In the current web, trackers (and hence advertisers) associate a cookie with each user. Whenever a user visits a website that has an embedded tracker, the tracker gets the cookie and can thus build up a list of the sites that a user visits. Advertisers can use the information gained from tracking browsing history to target ads that are potentially relevant to a given user’s interests. The obvious problem here is that it involves advertisers learning everywhere you go.
FLoC replaces this cookie with a new “cohort” identifier which represents not a single user but a group of users with similar interests. Advertisers can then build a list of the sites that all the users in a cohort visit, but not the history of any individual user. If the interests of users in a cohort are truly similar, this cohort identifier can be used for ad targeting. Google has run an experiment with FLoC; from that they’ve stated that FLoC provides 95% of the per-dollar conversion rate when compared to interest-based ad targeting using tracking cookies.
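To make the difference concrete, here is a minimal sketch of what a tracker can assemble under each model. The visit log, the `build_profiles` helper, and the identifiers are all hypothetical and exist only for illustration; they are not part of any FLoC API.

```python
from collections import defaultdict

def build_profiles(visits, key):
    """Group visited sites by whichever identifier the tracker receives."""
    profiles = defaultdict(set)
    for visit in visits:
        profiles[visit[key]].add(visit["site"])
    return dict(profiles)

# Hypothetical visit log: two users who happen to share one cohort.
visits = [
    {"cookie": "u1", "cohort": 1234, "site": "cars.example"},
    {"cookie": "u1", "cohort": 1234, "site": "news.example"},
    {"cookie": "u2", "cohort": 1234, "site": "cars.example"},
    {"cookie": "u2", "cohort": 1234, "site": "travel.example"},
]

# With per-user cookies, the tracker gets one browsing history per user.
by_cookie = build_profiles(visits, "cookie")

# With a cohort ID, the tracker only sees the pooled history of the group.
by_cohort = build_profiles(visits, "cohort")

print(by_cookie)
print(by_cohort)
```

In the cohort view, the tracker can no longer tell which member of the group visited `news.example` and which visited `travel.example`, as long as it has no other way to tell the members apart.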
Our analysis shows several privacy issues that we believe need to be addressed:
Cohort IDs can be used for tracking
Although any given cohort is going to be relatively large (the exact size is still under discussion, but these groups will probably consist of thousands of users), that doesn’t mean that they cannot be used for tracking. Because only a few thousand people will share a given cohort ID, if trackers have any significant amount of additional information, they can narrow down the set of users very quickly. There are a number of possible ways this could happen:
Not all browsers are the same. For instance, some people use Chrome and some use Firefox; some people are on Windows and others are on Mac; some people speak English and others speak French. Each piece of user-specific variation can be used to distinguish between users. When combined with a FLoC cohort that only has a few thousand users, a relatively small amount of information is required to identify an individual person or at least narrow the FLoC cohort down to a few people. Let’s give an example using some numbers that are plausible. Imagine you have a fingerprinting technique which divides people up into about 8000 groups (each group here is somewhat bigger than a ZIP code). This isn’t enough to identify people individually, but if it’s combined with FLoC using cohort sizes of about 10000, then the number of people in each fingerprinting group/FLoC cohort pair is going to be very small, potentially as small as one. Though there might be larger groups that can’t be identified this way, that is not the same as having a system that is free from individual targeting.
People’s interests aren’t constant and neither are their FLoC IDs. Currently, FLoC IDs seem to be recomputed every week or so. This means that if a tracker is able to use other information to link up user visits over time, they can use the combination of FLoC IDs in week 1, week 2, etc. to distinguish individual users. This is a particular concern because it works even with modern anti-tracking mechanisms such as Firefox’s Total Cookie Protection (TCP). TCP is intended to prevent trackers from correlating visits across sites but not multiple visits to one site. FLoC restores cross-site tracking even if users have TCP enabled.
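Both effects can be illustrated with a rough sketch. The first part reproduces the arithmetic from the fingerprinting example, assuming fingerprint groups are evenly sized and independent of cohorts. The second part is a toy simulation of the multiple-visits case: it assumes cohort assignments are uniformly random and independent from week to week, which real FLoC IDs are not, so it overstates how quickly the candidate set shrinks. The function name and all parameters are illustrative.

```python
import random

# Part 1: fingerprinting combined with a cohort (numbers from the text).
COHORT_SIZE = 10_000        # users sharing one FLoC cohort ID
FINGERPRINT_GROUPS = 8_000  # groups the fingerprinting technique can distinguish

# Assuming even, independent groups, average users per (group, cohort) pair.
expected_users_per_pair = COHORT_SIZE / FINGERPRINT_GROUPS
print(expected_users_per_pair)  # 1.25

# Part 2: linking one user's cohort IDs across several weeks.
def candidates_after_weeks(population, num_cohorts, weeks, seed=0):
    """Count users whose weekly cohort IDs all match the target's so far."""
    rng = random.Random(seed)
    target = 0  # the user the tracker is trying to single out
    candidates = set(range(population))
    history = []
    for _ in range(weeks):
        # Assumption: every remaining candidate gets a fresh random cohort.
        cohort_of = {user: rng.randrange(num_cohorts) for user in candidates}
        target_cohort = cohort_of[target]
        candidates = {u for u in candidates if cohort_of[u] == target_cohort}
        history.append(len(candidates))
    return history

# 100,000 users in 10 cohorts gives roughly 10,000 users per cohort.
history = candidates_after_weeks(population=100_000, num_cohorts=10, weeks=4)
print(history)
```

Under these simplified assumptions, each extra week divides the candidate set by roughly the number of cohorts, so a handful of observations is enough to leave very few users who match the whole sequence.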
FLoC leaks more information than you want
FLoC also undermines restrictive cookie policies such as TCP: because FLoC IDs are the same across all sites, they become a shared key to which trackers can associate data from external sources. For example, it’s possible for a tracker with a significant amount of first-party interest data to operate a service which just answers questions about the interests of a given FLoC ID, e.g., “Do people who have this cohort ID like cars?” All a site needs to do is call the FLoC APIs to get the cohort ID and then use it to look up information in the service. In addition, the ID can be combined with fingerprinting data to ask “Do people who live in France, have Macs, run Firefox, and have this ID like cars?” The end result here is that any site will be able to learn a lot about you with far less effort than they would need to expend today.
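A lookup service of this kind could be very small. The sketch below is purely hypothetical: the `CohortInterestService` class, its methods, and the sample records are invented for illustration and do not describe any real tracker or any FLoC API.

```python
class CohortInterestService:
    """Hypothetical service answering interest questions about a cohort ID."""

    def __init__(self):
        # Each record: (cohort_id, fingerprint attributes, interests).
        self._records = []

    def record(self, cohort_id, interests, attributes=()):
        """Store first-party interest data observed for one user."""
        self._records.append(
            (cohort_id, frozenset(attributes), frozenset(interests))
        )

    def interest_rate(self, cohort_id, interest, attributes=()):
        """Fraction of matching users with the interest, or None if no data."""
        required = frozenset(attributes)
        matches = [
            interests
            for cid, attrs, interests in self._records
            if cid == cohort_id and required <= attrs
        ]
        if not matches:
            return None
        return sum(interest in interests for interests in matches) / len(matches)

service = CohortInterestService()
service.record(1234, {"cars"}, attributes={"fr", "mac"})
service.record(1234, {"cooking"}, attributes={"en", "windows"})
service.record(1234, {"cars", "travel"}, attributes={"fr", "windows"})

# "Do people who have this cohort ID like cars?"
print(service.interest_rate(1234, "cars"))
# The same question, narrowed with a fingerprinting attribute.
print(service.interest_rate(1234, "cars", attributes={"fr"}))
```

The querying site needs no tracking infrastructure of its own: the cohort ID, optionally narrowed by a few fingerprinting attributes, is the only key it has to supply.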
FLoC’s countermeasures are insufficient
Google has proposed several mechanisms to address these issues.
First, sites have the option of whether or not to participate in FLoC. In the current experiment that Chrome is conducting, sites are included in the FLoC computation if they do ads-type things: either “load ads-related resources” or call the FLoC APIs. It’s not clear what the eventual inclusion criteria will be, but it seems likely that any site which includes advertising will be included in the computation by default. Sites can also opt out of FLoC entirely using the Permissions-Policy HTTP header, but it seems likely that many sites will not do so.
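For sites that do want to opt out, the header is short. The sketch below shows a minimal WSGI application that sends it on every response; the `interest-cohort=()` value is the one documented for Chrome's FLoC trial, while the application itself (`opt_out_app`) is a hypothetical example rather than a recommended deployment.

```python
def opt_out_app(environ, start_response):
    """Minimal WSGI app that opts its pages out of FLoC cohort computation."""
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        # An empty allowlist disables the interest-cohort feature for this page.
        ("Permissions-Policy", "interest-cohort=()"),
    ]
    start_response("200 OK", headers)
    return [b"This page is excluded from FLoC.\n"]
```

To try it locally, the app can be served with the standard library, for example `wsgiref.simple_server.make_server("", 8000, opt_out_app).serve_forever()`. In practice the same header would more likely be set once in the web server or CDN configuration.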
Second, Google itself will suppress FLoC cohorts which it thinks are too closely correlated with “sensitive” topics. Google provides the details in this whitepaper, but the basic idea is that they will look to see if the users in a given cohort are significantly more likely to visit a set of sites associated with sensitive categories, and if so they will just return an empty cohort ID for that cohort. Similarly, they say they will remove sites which they think are sensitive from the FLoC computation. These defenses seem like they are going to be very difficult to execute in practice for several reasons: (1) the list of sensitive categories may be incomplete or people may not agree on what categories are sensitive, (2) there may be other sites which correlate to sensitive sites but are not themselves sensitive, and (3) clever trackers may be able to learn sensitive information despite these controls. For instance, it might be the case that English-speaking users with FLoC ID X are no more likely to visit sensitive site type A, but French-speaking users are.
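The last point can be shown with made-up numbers. The sketch below uses a deliberately simplified check (flag a group if its visit rate exceeds 1.5 times the baseline), which is a stand-in and not the method described in Google's whitepaper. The baseline rate, the threshold, and the cohort composition are all hypothetical.

```python
BASELINE_RATE = 0.02  # hypothetical share of all users visiting site type A
FLAG_RATIO = 1.5      # hypothetical threshold: flag above 1.5x the baseline

def is_flagged(visitors, users):
    """Simplified sensitivity check on a group of users."""
    return visitors / users > FLAG_RATIO * BASELINE_RATE

# Hypothetical make-up of cohort X, split by browser language.
cohort_x = {
    "en": {"users": 9_500, "visitors": 190},  # 2% visit rate, same as baseline
    "fr": {"users": 500, "visitors": 50},     # 10% visit rate
}

total_users = sum(group["users"] for group in cohort_x.values())
total_visitors = sum(group["visitors"] for group in cohort_x.values())

# A check on the whole cohort sees a 2.4% rate and lets the cohort through.
whole_cohort_flagged = is_flagged(total_visitors, total_users)

# A tracker that also knows the browser language sees a different picture.
per_language_flagged = {
    language: is_flagged(group["visitors"], group["users"])
    for language, group in cohort_x.items()
}

print(whole_cohort_flagged)
print(per_language_flagged)
```

The cohort as a whole passes the check, yet a tracker that can separate French-speaking members learns that they are five times more likely than the baseline to visit the sensitive category.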
While these mitigations seem useful, they seem to mostly be improvements at the margins, and don’t address the basic issues described above, which we believe require further study by the community.
FLoC is premised on a compelling idea: enable ad targeting without exposing users to risk. But the current design has a number of privacy properties that could create significant risks if it were to be widely deployed in its current form. It is possible that these properties can be fixed or mitigated, and we suggest a number of potential avenues in our analysis. Further work on FLoC should focus on addressing these issues.
For more on this:
Building a more privacy-preserving ads-based ecosystem
Mozilla responds to the UK CMA consultation on Google’s commitments on the Chrome Privacy Sandbox
Privacy analysis of SWAN.community and Unified ID 2.0