Date: Thu, 18 Aug 2016 10:47:45 -0700
From: David Fifield <david@bamsoftware.com>
Subject: Tor Research Safety Board: default bridge reachability

We're seeking comments on a continuation of our research on the blocking
of default Tor Browser bridges. What we've done so far on this subject
is covered in our FOCI 2016 paper, "Censors' Delay in Blocking
Circumvention Proxies":
https://www.bamsoftware.com/proxy-probe/

The short summary of what we want to do is to greatly expand our
measurement locations, by using existing platforms such as ICLab, OONI,
or RIPE Atlas. We want to start doing traceroutes in addition to TCP
reachability. We want to control how new bridges are introduced, in
order to test specific hypotheses, such as whether there is a difference
in detection between stable and alpha.


== What are you trying to learn, and why is that useful for the world?
   That is, what are the hoped-for benefits of your experiment?

1. Where the default bridges are blocked, globally. We know that China
   (eventually) blocks them, and Iran (currently) does not; but we don't
   know the situation anywhere else.
2. In places where the default bridges get blocked, the dynamics of
   blocking, such as how long it takes, its granularity (IP only or
   IP/port), and whether blocks are eventually removed.
3. How bridge addresses are discovered (e.g. through traffic analysis,
   tickets, or source code), and how they are extracted (e.g. manually
   or through automated parsing).

The overarching, abstract benefit of the experiment is a better
understanding of censorship, leading to the development of better
informed circumvention.

The latest bridge users' guide
(https://blog.torproject.org/blog/breaking-through-censorship-barriers-even-when-tor-blocked)
recommends that users in China use meek, because obfs4 is blocked. This
research would let us know whether to expand that advice beyond China.

By comparing reachability timelines across many censors, we may find
evidence for or against censors sharing a common data source. For
example, if two countries block a set of bridges at the same moment, it
is probably because there is something in common in their detection.
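
As an illustration of the timeline comparison, here is a minimal sketch
of how blocking times could be extracted from probe rows and compared
across sites. It assumes rows have already been parsed into dicts with
"date" (float), "site", "host", "port", and "success" (bool) fields. The
function names, the three-consecutive-failures rule, and the one-hour
window are our own assumptions for illustration, not part of the
existing analysis.

```python
from collections import defaultdict
from itertools import combinations

def first_block_times(rows, min_failures=3):
    """Map (site, host, port) to the start time of the first run of at
    least min_failures consecutive failures that follows a success."""
    result = {}
    state = defaultdict(lambda: {"seen_ok": False, "run_start": None, "run_len": 0})
    for row in sorted(rows, key=lambda r: r["date"]):
        key = (row["site"], row["host"], row["port"])
        st = state[key]
        if row["success"]:
            # A success ends any failure run in progress.
            st["seen_ok"] = True
            st["run_start"], st["run_len"] = None, 0
        elif st["seen_ok"]:
            # Only count failures after the destination was seen working,
            # so bridges that were never reachable are not called blocked.
            if st["run_len"] == 0:
                st["run_start"] = row["date"]
            st["run_len"] += 1
            if st["run_len"] >= min_failures and key not in result:
                result[key] = st["run_start"]
    return result

def near_simultaneous(blocks, window=3600.0):
    """List (destination, site_a, site_b) where two sites first saw the
    same destination blocked within `window` seconds of each other."""
    by_dest = defaultdict(list)
    for (site, host, port), t in blocks.items():
        by_dest[(host, port)].append((site, t))
    pairs = []
    for dest in sorted(by_dest):
        for (site_a, t_a), (site_b, t_b) in combinations(sorted(by_dest[dest]), 2):
            if abs(t_a - t_b) <= window:
                pairs.append((dest, site_a, site_b))
    return pairs
```

A pair reported by near_simultaneous() is only a hint of a shared data
source; with probes every 20 minutes, the resolution of the comparison
is no better than the probe interval.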

We may uncover specific operational weaknesses of censors that can be
exploited. To choose an invented but plausible scenario, maybe a censor
only does black-box testing of new bundles on the day of release: in
that case, the browser could avoid connecting to a subset of bridges
until after a certain date.

If we are able to publish reachability data online on a frequently
updated basis, someone could use it to build a Weather-like service that
notifies operators of default bridges when their bridge stops running.
This happened a few times already: some of the default bridges stopped
running because of lost iptables rules after a reboot, and we were the
first to notice, only because we were looking at the graphs every once
in a while. (This would not always be possible using only CollecTor
data, because for example the bridge might be running, but its obfs4
port closed because of a firewall misconfiguration.)


== What exactly is your plan? That is, what are the steps of your
   experiment, what will you collect, how will you keep it safe, and so
   on.

So far, we have only run from a handful of VPSes, never more than 4 at a
time. We only had visibility into the U.S., China, and Iran. We
carefully watched for the introduction of new obfs4 bridges (in some
cases being privately informed in advance), and added them to a probe
list, which got probed every 20 minutes by a cron job on the VPSes.

We want to greatly expand our probe sites, by using existing measurement
platforms such as ICLab, OONI, or RIPE Atlas. We hope to be able to
measure from dozens or hundreds of diverse locations. We have already
talked to ICLab and they are willing to probe our destinations from
their endpoints, which mostly consist of commercial VPNs in various
countries. The probes will consist of periodic TCP connections to Tor
Browser default obfs4 bridges (released and not-yet-released) and
control destinations. We want to start doing traceroutes as well.

We expect that the TCP reachability data we collect will be similar to
what we have collected so far. It looks like this:
	date,site,host,port,elapsed,success,errno,errmsg
	1449892115.2,bauxite,178.209.52.110,443,10.0101830959,False,None,timed out
	1449901202.36,eecs-login,192.30.252.130,443,0.0761489868164,True,,
	1450858800.18,eecs-login,109.105.109.165,24215,0.189998865128,False,146,[Errno 146] Connection refused
For traceroutes we will collect hop information (perhaps with some hops
obscured; see the risks in the next section). We expect to be able to
publish everything we collect on an immediate and ongoing basis.
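
For readers who want a concrete picture of the TCP probe, here is a
minimal sketch that produces rows in the column order shown above. It is
not the script we actually run; the function names and the 10-second
default timeout are assumptions for illustration, and the exact
rendering of the errno and errmsg fields may differ from the sample
rows.

```python
import csv
import socket
import sys
import time

def probe(site, host, port, timeout=10.0):
    """Attempt one TCP connection and return a row in the order
    date,site,host,port,elapsed,success,errno,errmsg."""
    start = time.time()
    success, err_no, err_msg = True, "", ""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
        sock.close()
    except OSError as e:
        # Timeouts, refusals, and resolution errors are all OSError
        # subclasses in Python 3; errno is None for a timeout.
        success, err_no, err_msg = False, e.errno, str(e)
    elapsed = time.time() - start
    return [start, site, host, port, elapsed, success, err_no, err_msg]

def write_rows(rows, out=sys.stdout):
    """Write probe rows as CSV, one line per probe."""
    writer = csv.writer(out)
    for row in rows:
        writer.writerow(row)
```

A cron job could call probe() once for each entry in the probe list
(bridges and control destinations alike) and append the output of
write_rows() to a log file.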

We also want to test some specific hypotheses by controlling the
circumstances of bridge release. Here are specific experiments we have
thought of (see corresponding risks in the next section):
a. Rotating bridge ports with every release. Since the GFW blocks based
   on IP/port, we can try just changing the port number of each bridge
   in every release (using iptables forwarding for example).
b. Putting different subsets of bridges in stable and alpha releases. We
   saw that Orbot-only bridges did not get blocked; we wonder if
   stable-only or alpha-only bridges also will not get blocked.
c. Leaving a bridge commented out in bridge_prefs.js. This may help us
   distinguish between black-box testing and manual source code review.


== What attacks or risks might be introduced or assisted because of your
   actions or your data sets, and how well do you resolve each of them?

The main risk is potentially enabling censors to discover new bridge
addresses early, by monitoring our probe sites. Even though "default
bridges" are conceptually broken, they do in fact work for many people,
and we wouldn't want to reduce their utility.

In our research so far, we've identified a number of ways that censors
can discover new bridges: by watching the bug tracker, by reading source
code, or by inspecting releases. Whenever possible, we want to start
monitoring new bridges even before they enter the bug tracker. If a
censor discovers one of our probe sites (which would not be hard to do),
then they could watch for new addresses being connected to and add them
to a blocklist. An adversary keeping netflow records could identify
probe sites retroactively: download Tor Browser and get the new bridges,
then find the clients that made the earliest connections to those
addresses.

We mitigate this risk partially by only testing default bridges, not
secret BridgeDB bridges. That way, even if a censor discovers them, it
doesn't affect users of secret bridges. Also, we suspect that, because
default bridges are, in theory, easily discoverable, adding another
potential discovery mechanism of medium difficulty does not greatly
increase the risk of their being blocked.

If early blocking of bridges as a result of our experiment becomes a
problem, we can adjust the protocol, for example not to monitor bridges
in advance of their ticket being filed.

The data we have published so far do not include the IP addresses of the
probe locations. The people who supplied us with the probe locations
asked us not to reveal them. Traceroute will make it harder to conceal
the source of probes in our published data. We can, for example, omit
the first few hops in each trace, but we don't know the best practices
along these lines. The potential harm to probe site operators is
probably less when we use existing measurement platforms rather than
VPSes acquired through personal contacts.
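
To make the hop-omission idea concrete, here is a minimal sketch of
redacting the leading hops of a trace before publication. The hop
representation (dicts with "ttl", "addr", and "rtt_ms" keys), the
function name, and the choice to blank RTTs along with addresses are all
assumptions for illustration. We do not claim that hiding a fixed number
of hops is sufficient to conceal a probe site.

```python
def redact_leading_hops(hops, n_hidden=3, placeholder="*"):
    """Return a copy of a trace in which the first n_hidden hops have
    their address replaced and their RTT removed. TTLs are kept so the
    path length is still visible."""
    redacted = []
    for index, hop in enumerate(hops):
        hop = dict(hop)  # copy, so the unredacted trace is left intact
        if index < n_hidden:
            hop["addr"] = placeholder
            hop["rtt_ms"] = None
        redacted.append(hop)
    return redacted
```

The unredacted traces would stay private; only the output of
redact_leading_hops() would be published.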

Our results may be contaminated by other experiments being run from the
same source address. The measurement platforms we propose to use are
already running various other experiments, so they may be treated
differently by firewalls. The most likely wrong outcome is that we
falsely detect a bridge being blocked, when it is really the client
address being blocked (because it is a VPN node, for example). The risk
goes in the other direction as well: our experiment might affect others
running on the same endpoint.
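
One way to reduce the false-detection risk is to interpret each bridge
probe alongside the control destinations probed from the same site in
the same round. The following is a minimal sketch under that assumption;
the function names and verdict labels are hypothetical. Controls cannot
rule out a firewall that treats the client address differently for only
some destinations, so the verdicts are deliberately worded as
possibilities.

```python
def classify_site(bridge_ok, controls_ok):
    """Classify one site's view of one bridge in one probe round.
    controls_ok is a list of booleans, one per control destination."""
    if bridge_ok:
        return "reachable"
    if controls_ok and all(controls_ok):
        # The site's connectivity looks fine, so the failure is
        # specific to the bridge (blocked, or the bridge is down).
        return "bridge-specific failure"
    # Controls failed or are missing: the site itself may be the problem.
    return "inconclusive"

def classify_round(round_results):
    """round_results maps site -> (bridge_ok, [control_ok, ...]).
    Returns (per-site verdicts, overall summary)."""
    per_site = {site: classify_site(b, c) for site, (b, c) in round_results.items()}
    verdicts = set(per_site.values())
    if "reachable" in verdicts:
        if "bridge-specific failure" in verdicts:
            summary = "possibly blocked at some sites"
        else:
            summary = "reachable"
    elif "bridge-specific failure" in verdicts:
        # No site could reach the bridge, though controls worked somewhere.
        summary = "possibly down"
    else:
        summary = "inconclusive"
    return per_site, summary
```

A "possibly down" summary is the kind of signal a Weather-like
notification service could act on, while "possibly blocked at some
sites" is the case of interest for the blocking timelines.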

Here are the risks related to testing the specific bridge-blocking
hypotheses enumerated in the previous section:
a. The risk in rotating bridge ports is that eventually the censor
   catches on to the pattern and develops more sophisticated, automated
   blocking. If the censor doesn't react, it means we have better
   reachability; but if it does, we lose what small window of
   post-release reachability we have.
b. The risk in segregating bridge addresses across stable and alpha is
   that a network observer can tell which release a user is running by
   observing what addresses they connect to. This may, for example,
   enable them to target an exploit that only works on a specific
   version.
c. The risk in playing games like commenting out bridge lines is slight:
   a commented-out bridge may get blocked even before it has had any
   real users.


== Walk us through why the benefits from item 1 outweigh the remaining
   risks from item 3: why is this plan worthwhile despite the remaining
   risks?

The main risk, bridge discovery by censors, has low potential harm, and
can be mitigated if necessary by changing when we start monitoring
bridges, or even ceasing the experiment altogether. The risk of our
measurements is probably less than that of even having default bridges
in the first place, because our probes are not connected to any
real-world circumventor.

The risks associated with our specific bridge-blocking hypotheses are
variable, and we would appreciate discussion on them. The one we planned
to try first is the commenting-out one, because it seems to have the
best risk/reward tradeoff.

Incidentally, OONI already has a bridge_reachability nettest that is
similar to what we have proposed:
https://ooni.torproject.org/nettest/tor-bridge-reachability/
However, their bridge list is not up to date,
https://gitweb.torproject.org/ooni-probe.git/tree/var/example_inputs/bridges.txt?id=v1.6.1
and a perusal of http://measurements.ooni.torproject.org/ shows that the
test is not being run regularly.