%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Overview - The problem we're trying to solve. - Background: anonymity systems - Background: Internet routing - Some things we learned - Questions we still need to answer %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% The big picture Alice wants to transact with Bob on the Internet (fetch a web page, send an email) without letting anybody link them together. Alice wants to be safe from somebody watching her or somebody watching Bob. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% A trusted proxy isn't good enough These proxies are trust and performance bottlenecks. Add a constraint: want to be safe from a compromised middle node too. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% So: distributed trust By using multiple hops, no single node can link Alice to Bob. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Tor and Mixmaster Two widely deployed anonymity networks (thousands of users each). Tor is for TCP streams (low-latency). Mixmaster is for email (high-latency). %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Two major attack classes Follow-the-transaction attack: try to learn each hop of a transaction and follow it from beginning to end. Endpoint attack: use statistics to match a transaction coming into the network to a transaction leaving the network. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Three major defense classes Batching and pooling: delay messages to get a large batch, and mix them together to hinder the adversary from linking Alice's message to Alice. Padding: senders provide decoy traffic as well as normal traffic. Dispersal: reduce the chance that the adversary sees enough of the network to complete his attack. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Dispersal approaches Grow the network so a given adversary sees less. Arrange the topology so messages can enter or exit at many places (e.g. cascade vs free route). Location arbitrage: spread each transaction over multiple jurisdictions. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Many different families of locations E.g. Areas controlled by a single country, state, company, ... E.g. Nodes running the same operating system or class of software. We focus here on the family of locations that are ISPs. More correctly, \emph{autonomous systems} (ASes). %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% AS-level paths The key insight is that while we typically think of a connection as going from Alice to Node1, actually it traverses many different ASes for that single hop. Paths based on policies, not just shortest-path %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Need to passively estimate paths Can't pull down all routing tables; can't traceroute. Used Oregon RouteViews Project data to learn adjacencies. Mao et al's [24] estimation technique is >80\% right. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Location independence metric What is the chance that some AS is on both the path from Alice to the mix-net, and also the path from the mix-net to Bob? %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Question one Is considering IP prefix good enough? Tarzan, Morphmix, etc recommend this. Not the same. In practice, we see several cases of same-AS nodes with different prefixes Of the 5 pairs in Mixmaster in the same AS, three have different class A prefixes! %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Question two (1) How much can one AS attack inside the network? This lets him follow a transaction (easier than doing stats). Also means we're not getting the full protection of n hops that we thought we were. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Question two (2) Top two AS-level between-node observers in the US, to both Tor and Mixmaster: Level 3 and Abovenet Together they watch over half the links in the Tor network. Choosing paths without replacement helps: a 4-hop Tor path can be observed by a single AS with prob .10, compared to .16 if replacement is allowed. Don't forget that forward paths may be different from reverse paths. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Question three How much can one AS attack the endpoints? Remember that it's sufficient to look at endpoints of the network, both for low-latency or high-latency networks. Endpoints can be first and last node, but they can also be Alice and Bob. We picked some reasonable sounding Alices and Bobs, mostly in the US. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Endpoint attacks are an issue Given random entry and exit points, a single AS will often be able to win 10\% to 30\% of the time. It's possible to reduce this to almost 0. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Node placement With US-based Alices and Bobs, adding a far-flung node (e.g. in Asia) *hurts* us, not helps. Best node placement for protection against the AS-level adversary is in ASes that have the most links to other ASes: tier-1 ISPs. A given transaction is safest when Alice, Bob, or both are in tier-1 ISPs. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Future work (1) Consider a more diverse set of Alices and Bobs. How to get routing table to Alice? Are there practical approximations that still work ok? How sensitive is this metric to adding or subtracting a few nodes? What about repeated web fetches, using different entry and exit points each time -- how quickly does Alice's location independence degrade? Correlation between speed/reliability of network and its location independence? %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Future work (2) Do this analysis for different location metrics, such as countries. Caching at exit nodes (when feasible) changes the equation. Akamai? Different routing; also dangerous observer. Do we *hurt* anonymity by restricting path choices, against larger adversaries who can take advantage of knowing our algorithm?