Date: Sun, 16 Jul 2017 01:00:40 -0400
From: Roger Dingledine
Subject: Re: [tor-research-safety] Request for feedback on measuring popularity of Facebook onion site front page

Here are some thoughts that are hopefully useful. I encourage other safety board people to jump in if they have responses or other perspectives.

A) Here's an attack that your published data could enable. Let's say there is another page somewhere in onion land that looks just like the Facebook frontpage, from your classifier's perspective. In that case you're going to be counting, and publishing, what you think are Facebook frontpage visits, but if somebody knows the ground truth for the Facebook visits (and at least Facebook does), then they can subtract out the ground truth and learn the popularity of that other page.

Counterintuitively, the more "colliding" pages there are -- or rather, the more colliding pages there are that are sufficiently popular -- the less scary things get, since you're publishing the popularity of "Facebook plus everything that looks like Facebook", and if that second part of the number covers a broad variety of pages, it's not so scary to publish it.

I guess you can look through your training traces to see if there are traces that look very similar, to get a handle on whether there are zero or some or many. But even if you find zero: (1) are your training traces just the front pages of other onion sites? In that case you'll be missing many internal pages, and maybe there is a colliding internal page. (2) You don't have a comprehensive onion list (whatever that even means), so you can't assess closeness for the pages you don't know about. And (3) remember dynamic pages -- like a duckduckgo search for a particular sensitive word that you didn't think to search for. (I don't think that particular example will produce a collision with the Facebook frontpage, but maybe there's some similar example that would.)

So, principle 1: publishing the summed popularity of a small number of sites is inherently more revealing than publishing the summed popularity of a broader set of sites, because the narrower sum pins down each individual site more precisely. And principle 1b: if an external party has a popularity count for a subset of your sum, they can get more precision than you originally intended.

Unless the differential privacy that you talked about handles this case? I have been assuming it focuses on making it hard to reconstruct exactly how many hits a given relay saw, but maybe it does more magic than that?

B) You mention picking Facebook in particular, and I assume you'll be naming them in your paper. Have you asked them if they're ok with this? Getting consent when possible would go a long way to making your approach as safe as it can be. I can imagine that Facebook would say it's ok, while I could imagine that a particular SecureDrop deployment might ask you to please not do it. In particular, two good contacts for Facebook would be Alec Muffett and .

The service side is of course only half of the equation: in an ideal world it would be best to get consent from all the clients too. But since your experiment's approach aggregates all the clients, yet singles out the service, I think it's much more important to think about consent from the service side for this case.

So, principle 2: the more you're going to single out and then name a particular entity, the more important it is for you to get consent from that entity before doing so.
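
(To make the subtraction concern from point A and principle 1b concrete, here is a minimal Python sketch. The visit counts are invented for illustration, and the noise line is only a stand-in for whatever noise a system like PrivCount actually adds; none of these numbers come from a real measurement.)

    import random

    # Invented daily visit counts, purely for illustration.
    facebook_frontpage = 120_000   # ground truth -- Facebook knows this number
    colliding_page = 3_500         # another page the classifier can't tell apart

    # The published statistic is one aggregate for "everything that looks
    # like the Facebook frontpage", plus some noise (stand-in for the
    # differential-privacy noise the measurement system would add).
    noise = random.gauss(0, 100)
    published = facebook_frontpage + colliding_page + noise

    # Anyone who already knows the Facebook ground truth can subtract it out,
    # leaving the colliding page's popularity, obscured only by the noise:
    leaked_estimate = published - facebook_frontpage
    print(round(leaked_estimate))   # roughly 3,500

Whether the differential privacy mechanism protects against this depends on what its noise is calibrated to hide, which is exactly the open question at the end of point A.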
C) In fact, it would probably be good in your paper to specify *why* the safety board thought that doing this measurement was ok -- that you got consent, and that's why you were comfortable naming them and publishing a measurement just for them.

I think mentioning it in the paper is important because, if this general "popularity measurement" attack works, I can totally imagine somebody wanting to do a follow-on paper measuring individual popularity for a big pile of other onion services: first because it would seem cool to do it in bulk (the exact opposite of the reason why you decided not to do it in bulk), and second because eventually people will realize that measuring popularity is a key stepping stone to building a Bayesian prior, which could make website fingerprinting attacks work better than the default "assume a uniform prior" that I guess a lot of them use now.

So, principle 3: explicitly say in your paper which lines you didn't want to cross, so readers can know which follow-on activities are things you wouldn't have wanted to do (and so future PCs reviewing future follow-on papers have something concrete to point to when they're expressing concern about methodology).

D) I actually expect your Facebook popularity measurement to show that it's not very popular right now. That's because, at least last I checked, all of the automated apps and stuff use facebook.com as their destination. The Facebook folks have talked about putting the onion address by default in their various mobile apps when the app detects that Orbot is running, but as far as I know they haven't rolled that out yet.

So (1) doing a measurement now will allow you to do another measurement later once Facebook has made some changes, and you'll have a baseline for comparison; and (2) if you coordinate better with the Facebook people, you can learn the state and expected timing of their "onion address by default" roll-out -- or heck, you might learn other confounding factors that Facebook can explain for you.

E) If you're planning to use PrivCount, does that mean you are running more than one relay to do this measurement? I think yes, because "and more than 10 data collectors"?

For the last person who asked us about running many relays for measurements, we suggested that they label their relays in the ContactInfo field, and put up a little page explaining what their research is and why it's useful (I think the text you sent us would do fine). Here is their page as an example: http://tor.ccs.neu.edu/ I think that step would be wise here too.

Hope those are helpful! Let me know if you have any questions.

--Roger
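
(To illustrate the Bayesian-prior point in C, here is a toy Python sketch with invented numbers -- it is not from the email or from any actual fingerprinting system. It shows how a measured popularity prior can flip a classifier's decision relative to the default uniform prior.)

    # Toy fingerprinting decision: the classifier produces a likelihood for
    # each candidate site given one observed traffic trace (invented values).
    likelihood = {"site_A": 0.30, "site_B": 0.25, "site_C": 0.45}

    def posterior(prior):
        joint = {site: prior[site] * likelihood[site] for site in likelihood}
        total = sum(joint.values())
        return {site: p / total for site, p in joint.items()}

    # Default: assume every candidate site is equally likely a priori.
    uniform_prior = {"site_A": 1/3, "site_B": 1/3, "site_C": 1/3}

    # With a measured popularity prior (say site_A gets 90% of visits):
    popularity_prior = {"site_A": 0.90, "site_B": 0.05, "site_C": 0.05}

    print(posterior(uniform_prior))     # site_C has the highest posterior
    print(posterior(popularity_prior))  # site_A wins despite a weaker likelihood

With the uniform prior the best guess is simply the site whose trace matches best; with the popularity prior, a popular site can win even when its trace matches less well, which is why per-site popularity numbers are a useful stepping stone for an attacker.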