BRAIN-DUMP ON END-TO-END CORRELATION

Passive end-to-end correlation attacks all follow the same general format: you're watching some traffic enter and leave an anonymity system, and you want to link traffic streams coming in to traffic streams going out. Let A_i be all data about the streams entering, and B_j be all data about the streams exiting. You compute feature-set functions f_in(A_i), f_out(B_j) on each stream, and then a similarity function c on each pair. The pairs giving high values of c probably represent the same stream as it passes through the network. (A rough code sketch of this framework appears below, after the questions about existing attacks against Tor.)

If, instead of observing exit streams at the same time as entry streams, you pre-compute expected f_out(B_j) for a large number of possible target responders and then later observe A_i, you're doing a fingerprinting attack. The math is the same.

The usual way to transform this into an active attack is to introduce patterns into one or more target A_i or B_j, so that the corresponding B_j/A_i will not resemble the background traffic.

There are many experiments, all following the same general design: launch a correlation or fingerprinting attack; see how fast it works. The results are pretty uniformly "fast".

Here are some independent variables and experiment parameters:
 - What pattern does the traffic on the streams follow?
 - How many simultaneous streams are there? How many are under observation?
 - What transformation does the network apply to the streams? This can include delay, padding, aggregation, etc.
 - What data can the attacker measure about streams, and to what accuracy?
 - What method does the attacker use for f_in, f_out, and c?
 - If a fingerprinting attack, how many sites has the attacker fingerprinted, how often do they change, and by how much?

Here are some methodological options:
 - Do you use an actual deployed anonymity system, an ersatz relaying system, or just a simulation?
 - Do you use live traffic, simulated traffic, or replayed traffic?
 - For network effects, do you use the real internet, a network with simulated internet-like delays, or do you just do everything on one computer?

Here are some possible dependent variables:
 - How long, on average, does it take the attacker to correctly identify N (or N%) of target streams?
 - How much traffic, on average, does it take the attacker to correctly identify N (or N%) of target streams?

Here are some other considerations:
 - How linkable are different B_j streams coming from the same user? If the user is connecting to a website that uses a "user id" cookie or other identifier, an intersection attack should work very well, even if we somehow beat end-to-end correlation.

***

I want answers to the following questions.

Evaluation of existing attack methods:
 - In the domains where they work, do any end-to-end attack methods work better than any others? Does this difference matter in practice?
 - How much information, in practice, does the attacker need to win? For example, is microsecond timing info really helpful, or can an attacker do just as well with access to the more loosely synchronized, less accurate clocks you'd find in an existing network of off-the-shelf IDS systems?

Evaluation of existing attack methods against Tor:
 - Against a Tor-like system, do any of the end-to-end attack methods fail to work? What works best?
 - To what extent, if any, does Tor's existing cell padding confuse end-to-end attacks?
 - To what extent, if any, does Tor's existing stream multiplexing over circuits, and circuit multiplexing over connections, confuse these attacks?
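As a concrete reference point for the questions above and below, here is a minimal sketch of the f_in/f_out/c framework described at the top. The specific choices are assumptions made only for illustration, not taken from any particular published attack: a stream is a list of (timestamp, bytes) observations, f_in and f_out bin bytes into fixed-width time windows, and c is Pearson correlation. Real attacks differ in both their features and their similarity functions.

    # Minimal sketch of the passive correlation framework (f_in, f_out, c).
    # Assumptions, not from any specific published attack: a stream is a
    # list of (timestamp_seconds, n_bytes) observations; f_in and f_out
    # bin bytes into fixed-width time windows; c is Pearson correlation.

    from math import sqrt

    WINDOW = 1.0  # seconds per bin; an arbitrary choice for illustration

    def binned_counts(stream, start, end, window=WINDOW):
        """f_in / f_out: turn (time, bytes) records into bytes-per-window."""
        n_bins = int((end - start) / window) + 1
        bins = [0.0] * n_bins
        for t, n_bytes in stream:
            if start <= t <= end:
                bins[int((t - start) / window)] += n_bytes
        return bins

    def pearson(x, y):
        """c: similarity score between two feature vectors."""
        n = min(len(x), len(y))
        if n == 0:
            return 0.0
        x, y = x[:n], y[:n]
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sqrt(sum((a - mx) ** 2 for a in x))
        sy = sqrt(sum((b - my) ** 2 for b in y))
        return cov / (sx * sy) if sx and sy else 0.0

    def best_matches(entry_streams, exit_streams, start, end):
        """For each entering stream A_i, pick the exiting stream B_j with
        the highest c(f_in(A_i), f_out(B_j))."""
        exit_features = {j: binned_counts(s, start, end)
                         for j, s in exit_streams.items()}
        matches = {}
        for i, s in entry_streams.items():
            f_in = binned_counts(s, start, end)
            matches[i] = max(exit_features,
                             key=lambda j: pearson(f_in, exit_features[j]))
        return matches

A fingerprinting attack would reuse the same similarity step, but compare f_in(A_i) against a precomputed table of expected f_out vectors rather than against live exit observations; an active variant would first inject a distinctive pattern into the target stream so that it stands out from background traffic.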
Evaluation of Tor modifications against end-to-end attacks:
 - Do any of these help?
    - Long-range padding cells
    - Revised cell-packing methods
    - Mid-latency design
    - Mixed-latency design (alpha-mixing)

Evaluation of experimental reliability:
 - How dependent are the above results on user behavior models, number of users, etc.?

***

Rough experimental design

First, we'll need data. This will consist of some traffic patterns, and of the result of transmitting those traffic patterns through Tor or a Tor-like network.

For the traffic, I'd like to consider a few patterns:
 * To model the report-aggregation case, we can use a simulated model, or traces from an actual series of reports.
 * Lots of people have done work with HTTP: we can ask for an existing corpus of data there (which will be an inaccurate reflection of HTTP in the wild today, but which will at least be repeatable), or we can solicit a few volunteers to have their browsing behavior recorded, or we can come up with some kind of dopey simulation.
 * I bet IRC, IM, and SSH follow pretty noticeable patterns: if we record them, we'll have a pretty good profile for such patterns in general.
 * We should consider the above patterns in isolation and in combination (as they'd exist on a real network).

For the network, we've got a few options:
 * We could simulate something. This seems of limited use, given how differently real low-latency anonymity networks behave from simulated ones, but it would at least be quite repeatable.
 * We could rig up a private Tor network on a LAN, and maybe add some artificial delay or congestion on some links. This would be more realistic, and still somewhat repeatable.
 * As above, but we could put the private Tor network on hosts distributed around the internet. We could even encourage people who don't mind having their traffic captured to use it, in order to generate background traffic.
 * We could add a few private entries and exits to the live Tor network, and capture traffic on them.
 * We could just use the actual live Tor network. This would be the most realistic approach, but I don't believe it's possible to do this without risking exposure of actual user traffic. Thus, we must utterly reject this approach on ethical grounds.
   (We _could_ use the live Tor network to spot-check the realism of our experimental network by comparing their delay characteristics on streams generated by us. Since we would only be observing our own streams, there would be no risk of compromising real users' traffic.)

I think the third option above presents the best compromise between repeatability and realism. Also, it'll be far easier later to experiment with protocol modifications if we're on a separate, private Tor network.

The experiment doesn't need to last too long: getting a few hours with each configuration would be fine to start. (After all, it will be a publishable result if any of the attack methods we're considering turns out to take longer than a few minutes to succeed.)

To evaluate how much our deviations from reality are hurting us, we should make sure to measure along axes relating to those deviations: size of network, network congestion, etc.

Getting the data will probably take longer than analyzing it, and it's likely that in analyzing it, we'll think of more experiments we want to do. Therefore, I'm not going to discuss analysis in too much detail right now. (A rough sketch of how a single run might be scored follows below.)
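To make the dependent variables above concrete, here is a rough sketch of how one run might be scored once we have captures from the private network. Everything here is a placeholder I'm assuming for illustration (the function name, the 95% target, the 30-second step); it reuses the best_matches() sketch from earlier and relies on us knowing the ground-truth pairings because we generated the streams ourselves. It is not a finished analysis plan.

    # Rough sketch of scoring one run: how long must the attacker watch
    # before N% of target streams are correctly identified? Assumes
    # ground_truth maps each entry-stream id to its true exit-stream id
    # (known because we generated the streams ourselves), and reuses
    # best_matches() from the earlier sketch.

    def time_to_identify(entry_streams, exit_streams, ground_truth, start,
                         max_duration, target_fraction=0.95, step=30.0):
        """Return the shortest observation time (seconds) after which at
        least target_fraction of the target streams are matched correctly,
        or None if that never happens within max_duration."""
        elapsed = step
        while elapsed <= max_duration:
            matches = best_matches(entry_streams, exit_streams,
                                   start, start + elapsed)
            correct = sum(1 for i, j in matches.items()
                          if ground_truth.get(i) == j)
            if correct >= target_fraction * len(ground_truth):
                return elapsed
            elapsed += step
        return None

The "how much traffic" dependent variable is the same loop keyed on cumulative bytes observed rather than elapsed seconds.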