BRAIN-DUMP ON END-TO-END CORRELATION

Passive end-to-end correlation attacks all follow the same general format: you're watching some traffic enter and leave an anonymity system, and you want to link traffic streams coming in to traffic streams going out. Let A_i be all data about the streams entering, and B_j be all data about the streams exiting. You compute feature-set functions f_in(A_i), f_out(B_j) on each stream, and then a similarity function c on each pair. The pairs giving high values of c probably represent the same stream as it passes through the network. (A rough code sketch of this framework appears below, after the questions about existing attacks against Tor.)

If, instead of observing exit streams at the same time as entry streams, you pre-compute expected f_out(B_j) for a large number of possible target responders and then later observe A_i, you're doing a fingerprinting attack. The math is the same.

The usual way to transform this into an active attack is to introduce patterns into one or more target A_i or B_j, so that the corresponding B_j/A_i will not resemble the background traffic.

There are many experiments, all following the same general design: launch a correlation or fingerprinting attack; see how fast it works. The results are pretty uniformly "fast".

Here are some independent variables and experiment parameters:
 - What pattern does the traffic on the streams follow?
 - How many simultaneous streams are there? How many are under observation?
 - What transformation does the network apply to the streams? This can include delay, padding, aggregation, etc.
 - What data can the attacker measure about streams, and to what accuracy?
 - What method does the attacker use for f_in, f_out, and c?
 - If a fingerprinting attack, how many sites has the attacker fingerprinted, how often do they change, and by how much?

Here are some methodological options:
 - Do you use an actual deployed anonymity system, an ersatz relaying system, or just a simulation?
 - Do you use live traffic, simulated traffic, or replayed traffic?
 - For network effects, do you use the real internet, a network with simulated internet-like delays, or do you just do everything on one computer?

Here are some possible dependent variables:
 - How long, on average, does it take the attacker to correctly identify N (or N%) of target streams?
 - How much traffic, on average, does it take the attacker to correctly identify N (or N%) of target streams?

Here are some other considerations:
 - How linkable are different B_j streams coming from the same user? If the user is connecting to a website that uses a "user id" cookie or other identifier, an intersection attack should work very well, even if we somehow beat end-to-end correlation.

***

I want answers to the following questions.

Evaluation of existing attack methods:
 - In the domains where they work, do any end-to-end attack methods work better than any others? Does this difference matter in practice?
 - How much information, in practice, does the attacker need to win? For example, is microsecond timing info really helpful, or can an attacker do just as well with access to the more loosely synchronized, less accurate clocks you'd find in an existing network of off-the-shelf IDS systems?

Evaluation of existing attack methods against Tor:
 - Against a Tor-like system, do any of the end-to-end attack methods fail to work? What works best?
 - To what extent, if any, does Tor's existing cell padding confuse end-to-end attacks?
 - To what extent, if any, does Tor's existing stream multiplexing over circuits, and circuit multiplexing over connections, confuse these attacks?
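As a concrete reference point for the questions above and below, here is a minimal sketch of the f_in/f_out/c framework described at the top. The specific choices are assumptions made only for illustration, not taken from any particular published attack: a stream is a list of (timestamp, bytes) observations, f_in and f_out bin bytes into fixed-width time windows, and c is Pearson correlation. Real attacks differ in both their features and their similarity functions.

    # Minimal sketch of the passive correlation framework (f_in, f_out, c).
    # Assumptions, not from any specific published attack: a stream is a
    # list of (timestamp_seconds, n_bytes) observations; f_in and f_out
    # bin bytes into fixed-width time windows; c is Pearson correlation.

    from math import sqrt

    WINDOW = 1.0  # seconds per bin; an arbitrary choice for illustration

    def binned_counts(stream, start, end, window=WINDOW):
        """f_in / f_out: turn (time, bytes) records into bytes-per-window."""
        n_bins = int((end - start) / window) + 1
        bins = [0.0] * n_bins
        for t, n_bytes in stream:
            if start <= t <= end:
                bins[int((t - start) / window)] += n_bytes
        return bins

    def pearson(x, y):
        """c: similarity score between two feature vectors."""
        n = min(len(x), len(y))
        if n == 0:
            return 0.0
        x, y = x[:n], y[:n]
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sqrt(sum((a - mx) ** 2 for a in x))
        sy = sqrt(sum((b - my) ** 2 for b in y))
        return cov / (sx * sy) if sx and sy else 0.0

    def best_matches(entry_streams, exit_streams, start, end):
        """For each entering stream A_i, pick the exiting stream B_j with
        the highest c(f_in(A_i), f_out(B_j))."""
        exit_features = {j: binned_counts(s, start, end)
                         for j, s in exit_streams.items()}
        matches = {}
        for i, s in entry_streams.items():
            f_in = binned_counts(s, start, end)
            matches[i] = max(exit_features,
                             key=lambda j: pearson(f_in, exit_features[j]))
        return matches

A fingerprinting attack would reuse the same similarity step, but compare f_in(A_i) against a precomputed table of expected f_out vectors rather than against live exit observations; an active variant would first inject a distinctive pattern into the target stream so that it stands out from background traffic.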
Evaluation of Tor modifications against end-to-end attacks:
 - Do any of these help?
    - Long-range padding cells
    - Revised cell-packing methods
    - Mid-latency design
    - Mixed-latency design (alpha-mixing)

Evaluation of experimental reliability:
 - How dependent are the above results on user behavior models, number of users, etc.?

***

Rough experimental design

First, we'll need data. This will consist of some traffic patterns, and of the result of transmitting those traffic patterns through Tor or a Tor-like network.

For the traffic, I'd like to consider a few patterns:
 * To model the report-aggregation case, we can use a simulated model, or traces from an actual series of reports.
 * Lots of people have done work with HTTP: we can ask for an existing corpus of data there (which will be an inaccurate reflection of HTTP in the wild today, but which will at least be repeatable), or we can solicit a few volunteers to have their browsing behavior recorded, or we can come up with some kind of dopey simulation.
 * I bet IRC, IM, and SSH follow pretty noticeable patterns: if we record them, we'll have a pretty good profile for such patterns in general.
 * We should consider the above patterns in isolation and in combination (as they'd exist on a real network).

For the network, we've got a few options:
 * We could simulate something. This seems of limited use, given how differently real low-latency anonymity networks behave from simulated ones, but it would at least be quite repeatable.
 * We could rig up a private Tor network on a LAN, and maybe add some artificial delay or congestion on some links. This would be more realistic, and still somewhat repeatable.
 * As above, but we could put the private Tor network on hosts distributed around the internet. We could even encourage people who don't mind having their traffic captured to use it, in order to generate background traffic.
 * We could add a few private entries and exits to the live Tor network, and capture traffic on them.
 * We could just use the actual live Tor network. This would be the most realistic approach, but I don't believe it's possible to do this without risking exposure of actual user traffic. Thus, we must utterly reject this approach on ethical grounds.
   (We _could_ use the live Tor network to spot-check the realism of our experimental network by comparing their delay characteristics on streams generated by us. Since we would only be observing our own streams, there would be no risk of compromising real users' traffic.)

I think the third option above presents the best compromise between repeatability and realism. Also, it'll be far easier later to experiment with protocol modifications if we're on a separate, private Tor network.

The experiment doesn't need to last too long: getting a few hours with each configuration would be fine to start. (After all, it will be a publishable result if any of the attack methods we're considering turns out to take longer than a few minutes to succeed.)

To evaluate how much our deviations from reality are hurting us, we should make sure to measure along axes relating to those deviations: size of network, network congestion, etc.

Getting the data will probably take longer than analyzing it, and it's likely that in analyzing it, we'll think of more experiments we want to do. Therefore, I'm not going to discuss analysis in too much detail right now. (A rough sketch of how a single run might be scored follows below.)
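To make the dependent variables above concrete, here is a rough sketch of how one run might be scored once we have captures from the private network. Everything here is a placeholder I'm assuming for illustration (the function name, the 95% target, the 30-second step); it reuses the best_matches() sketch from earlier and relies on us knowing the ground-truth pairings because we generated the streams ourselves. It is not a finished analysis plan.

    # Rough sketch of scoring one run: how long must the attacker watch
    # before N% of target streams are correctly identified? Assumes
    # ground_truth maps each entry-stream id to its true exit-stream id
    # (known because we generated the streams ourselves), and reuses
    # best_matches() from the earlier sketch.

    def time_to_identify(entry_streams, exit_streams, ground_truth, start,
                         max_duration, target_fraction=0.95, step=30.0):
        """Return the shortest observation time (seconds) after which at
        least target_fraction of the target streams are matched correctly,
        or None if that never happens within max_duration."""
        elapsed = step
        while elapsed <= max_duration:
            matches = best_matches(entry_streams, exit_streams,
                                   start, start + elapsed)
            correct = sum(1 for i, j in matches.items()
                          if ground_truth.get(i) == j)
            if correct >= target_fraction * len(ground_truth):
                return elapsed
            elapsed += step
        return None

The "how much traffic" dependent variable is the same loop keyed on cumulative bytes observed rather than elapsed seconds.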