\section{Attacks on the Communications Channel}
\label{sec:attacks-comm}

\footnote{This section was written by Michael Freedman.}
Adversaries operating on the communications channel may seek to weaken
one of the five types of communication anonymity: sender-anonymity,
receiver-anonymity, unlinkability between sender and receiver,
node-anonymity, and carrier-anonymity.  We consider both passive and
active adversaries, attacking nodes within the communications channel,
internal links between nodes within the channel, and endpoint links
from the channel to users (i.e., the sender or receiver, which are
both servnet nodes in the case of Free Haven).

% How to fit these ideas in?
% "closed" system -- one in which distinctness of users can be verified 
%    because the number of users is fixed and known in advance
% "open" systm -- number of users not known, and users can have 
%    many identities

\subsection{Communications Nodes}
\label{sec:attack-comm-node}

The following attacks assume an active adversary that controls one or
more nodes within the communications channel:

\begin{itemize}
\item {\bf Denial of Service Attack:}
An ``evil'' node within the communications channel can selectively drop
messages/packets that it receives at will.  

{\it Prevention:} There is basically nothing a system can do to stop a
node from behaving in such a manner;  however, users can occasionally
``ping'' various nodes to determine response time.  If a node drops a
sufficient number of packets and cannot differentiate between ping and
data packets, users will come to realize that the node is not reliable
and will stop using it. 

Many sources maintain statistical information on the reliability of
mixnet nodes, especially Cypherpunk (Type I) and Mixmaster (Type II)
remailers \cite{mixmaster}.  The most common of these networks use only a
small number -- a dozen or two -- public remailers that are known.
Naive denial of service attacks would be noticed.


\item {\bf Traceroute Collusion Attack:}
A corrupt coalition of nodes within the system collude in order to trace
certain messages through the communications channel. An ``evil'' 
node receives a message, knowing the IP address of both the last-hop and
next-hop.  Given the ability to collude with a sufficient number of
nodes that have received the same message, the path through the
communications channel can be traced, as well as ultimately finding the
message sender or receiver.

{\it Prevention:} An ideal anonymous communication system will be
distributed, as any corrupted central system risks the exposure of
both sender and receiver.  For a distributed system, a route traversing
$k$ nodes preserves anonymity given a maximum of $k-1$ adversaries along
this path.  This protection also requires that adversaries cannot track
a message across one hop, requiring both that the message changes across
every node and adversaries cannot perform effective traffic analysis.

\item {\bf Cut-the-Channel Collusion Attack:}
Similar to the traceroute attack, adversaries need to control a majority
of nodes or bottlenecks within the communications channel.  If these
evil nodes drop packets and perform denial of service on a large scale,
an adversary can watch which connections remain, recognizing more easily
the normal communications path used between two users. 

{\it Prevention:}  Similar to other collusion attacks, a distributed
system with many independent operators reduces the possibility of a
large number of colluding node adversaries.  System users can recognize
the widespread failure of nodes and stop using the communication
channel.  Obviously, this presents a system-wide denial of service
attack, affecting the users' overall trust in the system.

\item {\bf Traffic Mangling Attack:}
To perform a traffic mangling attack, an adversary requires control of
nodes at the edges of the communications channel.
When an entry node receives a message from a sender, it mangles the
message such that transmission or routing will occur properly, but an
exit node on the other edge of the communications channel can
recognize the mangled message.  The colluding nodes can communicate
outside of the normal channel, and establish sender-receiver
linkability and IP correlation.  Therefore, this attack does not
require a large control over the system, such as the traceroute
collusion attack, to link communicating agents.

{\it Prevention:} There are two main defenses against this type of
attack.  First, message packets should be encrypted or encoded in such
a way that any change by a node -- the packet mangling itself -- will
be detected by other nodes, and the packet discarded.  This defense
relies on the assumption that not all nodes along the message's path
are compromised.  Many existing systems (e.g., Onion Routing, Freedom)
use symmetric key link-layer encryption to counter a traffic mangling 
attack.  Second, strong partial anonymity is a defense
mechanism against this attack.  Namely, if an ``evil'' node cannot
determine whether it exists at the true edge of the communications
channel (i.e., it cannot tell if the agent from which the message
arrives is the initial sender), the node can only reveal some
$k$-anonymous set of possible senders or receivers.

\end{itemize}

Form-based proxy systems (Anonymizer, LPWA) cannot really be analyzed in
terms of these various forms of attack.  Indeed, these systems basically
rely on a single trusted third party: the proxy itself.  If an adversary
manages to take control of the proxy, both sender-anonymity and
receiver-anonymity are lost, and linkability between the two is also
established. Carrier- and node-anonymity are only relevant to
distributed systems.  

\subsection{Communications Channel Links}


This section describes a number of attacks that an adversary may perform
on links within the communications channel.  The adversary's goal is to
determine a message's sender or receiver, or to provide linkability
between the two agents.  Many of these attacks hold the greatest risk
to systems attempting to achieve full $n$-anonymity, where $n$ is the
number of system users.  In the case of systems that provide partial
anonymity, an adversary may gain information from traffic analysis or a
similar attack. Yet, the adversary only reveals a $k$-anonymous set
($k < n$), describing an probabilistic distribution of anonymity, where
$k$ identities are below entropy threshold and thus ``exposed.''  Let us
consider the following attacks:

\begin{itemize}
\item {\bf Computational Attack:}
An adversary can sniff a link within the communications channel, and
thus be able to read anything that is sent over that link.
Presumably, both the data and transmission path are encrypted.  To
ensure the anonymity of sender and receiver agents, an adversary
should not be able to easily determine this transmission path.
Various levels of anonymity result from the security of the path's
encryption and encoding.  Obviously, this attack can also be performed
at nodes within the channel or at its edges.

{\it Prevention:} For naming schemes which rely on computationally-secure
reply blocks or pseudonyms, an adversary with sufficient computing power
can eventually decrypt the name and determine agent identity.  If the
communications channel relies on partial anonymity, successfully
decrypting the name will only reveal a $k$-anonymous set of
possibilities. 

\item {\bf Message Coding Attack:}
An adversary can trace or link a message that does not change its coding
during transmission.  If links can be passively monitored, the listener
can determine the path that a message takes through the communications
channel. Obviously, this attack can also be performed by colluding nodes
within the channel, but this attack does not require that much
penetration into the system. 

{\it Prevention:}  An end-to-end encoding or encryption scheme is used
by the sender, such that the message changes across each link in the
communications channel.  In a distributed system, a link-to-link scheme 
is not sufficient to prevent collusion attacks, as messages need to be
different at the edges of the channel to protect sender/receiver
anonymity. 


\item {\bf Message Volume Attack:}
An adversary can analyze traffic across a link and examine packet
size.  If a message of the same or similar length is detected traveling
through various communications links, a global observer can determine
the transmission path and ultimate sender/receiver.  

{\it Prevention:}  All messages in the system should be of the same size
or of a random size.  Messages should therefore be padded with random
bits, or concatenated with other messages and padded to the specified
size. 


\item {\bf Traceroute Replay Attack:}
Within an anonymous communications channel, messages are transmitted to
a given reply block or pseudonym of the receiver.  An attacker listening
on the link can record messages that pass by, and then attempt to
forward traceroute the channel by flooding the system with the replayed
message.  Similarly, if some reply path or nym is given for the sender,
the adversary can try to reverse traceroute the message by flooding the
sender. The attacker can then perform traffic analysis to detect which
links within the channel, or edges of it, see a rise in traffic.  This
rise in traffic suggests the message's route.

{\it Prevention:} To stop forward traceroute replay attacks, a nonce
should be included in each message, and nodes should not resend a
message that has already been sent.  Nodes would be required to store a
``graveyard'' of used nonces to lookup.  Also, an attacker must not be
able to change the included nonce.   The system can protect against
reverse traceroute replay attacks by only including a way to reach the
sender if a reply is necessary, as well as making this information only
available to the receiver.


\item {\bf Intersection Attack:}
An adversary may trace user identities by examining usage patterns over a long 
period.  Similar to distinguishing characteristics of a speaker, users
may manifest distinguishable behavior.  For example, they may exhibit
typical on-line/off-line periods, utilize similar resources over time,
contact the same destinations or Internet sites regularly, and so on. If
any distinguishing characteristics are transmitted (i.e., pseudonyms,
Cookies), an adversary can link past and future communications to the
specific user with greater certainty.  

{\it Prevention:} We cannot determine any protection against this type
of attack.  Others have questioned if this problem is even solvable
\cite{anon}.


\item {\bf Sniping and Cut-the-Internet-Backbone Attacks:}
This attack is similar to the cut-the-channel collusion attack between
nodes.  An adversary has the ability to snipe specific links with the
system for selective denial of service, or bring down large segments of
the Internet to destroy inter-node links within the communications
channel.  The attacker can then see which connections of interest remain
and perform traffic analysis.

{\it Prevention:}  This attack is beyond the resources of many
individuals and organizations.  A sniping attack might help an
adversary perform traffic analysis to some degree;  if the attacker has
the ability to cut any link at will, they can eventually expose senders
and receivers.  However, national intelligence agencies are probably the
only organizations able and willing to ``Cut-the-Internet-Backbone.''
\end{itemize}


\subsection{Communications Channel Edges}


This section describes a number of attacks that both active or passive
adversaries may perform on edges of the communications channel.  The
edges of the channel correspond to links between a communicating agent
(i.e., sender, receiver) and the immediate node within the channel to
which it communicates.

\begin{itemize}
\item {\bf Timing Attack:}
An adversary can attempt to link a specific message between two parties
by watching endpoint send and receive actions.  Given the would-be
transmission time of this message, the attacker can consider the
correctness of these two endpoints.

{\it Prevention:}  A message can only be protected from a timing attack
by hiding the message with others.  The linkability between sender and
receiver can obviously be established if only one message is transmitted
within a certain time period.  This protection is established by
introducing latency, adding dummy messages, or reordering packets. 

Unless system load is extremely low, a timing attack is likely to expose
linkability only with real-time services.  Many systems -- such as the
original Chaumian mix -- are based on {\tt sendmail}, already having
quite high and variable transmission time.  However, systems designed to
allow {\tt telnet}, Web browsing, IRC, and other such services risk
linkability from timing attack.

Adding a variable delay decreases the ability to perform timing
attacks given a reasonable system load.  However, the system has a
higher latency than necessary, and the system is still open to traffic
analysis attacks, especially if the number of messages across the
channel remains small. 

Dummy messages can be transmitted instead to remove this unnecessary
latency, adding load to the system.  If adversaries are given only a
possible transmission duration, this solution increases the difficulty 
of correlating times to specific messages.  Dummy messages are an
end-to-end solution, whereas link-level garbage can be recognized and
thereafter ignored by an adversary that watches the endpoints.  

Lastly, communication channel nodes can reorder packets when they are
received.  Nodes store $n$ packets.  Upon receiving more packets, the
node chooses some $k$ packets at random from this $(n+k)$ pool and sends
them out.


\item {\bf Trickle Attack:}
An adversary has complete active control of all the edges of the
communications channel.  The adversary stops all incoming messages
from entering the channel except one.  The next incoming message is
not released until the first message is detected along some edge
exiting the channel.  As only one message is transmitted through the
channel at once, the global observer can establish linkability between
sender and receiver.

Alternatively, an adversary can achieve active control over all the
links of some internal node within the channel.  This type of attack is
more easily attained than system-wide control, and allows attacks as
described in section \ref{sec:attack-comm-node}.
% More detail:  Dogan Kesdogan et al.  ``Stop and Go MIXes'' 
% 1998 Info Hiding Workshop

{\it Prevention:} We cannot determine any protection against this attack
for communications channels which provide an explicit mapping of names
to sender/receiver agents.  For systems which provide partial anonymity
even after exposure, the ``edge'' of the communications channel only
reveals a $k$-anonymous set of possible receivers.  


\item {\bf Identification Flooding Attack:}
An adversary can flood with the system with identifiable packets.  A
message can only remain hidden within the context of other known
messages.  During normal operation, each system user would send only one
message during each time interval, thus producing an independent set of
anonymous messages.  However, if an attacker floods the system and fills
a node's reorder buffer with $n$ packets, it removes the node's defense
against timing attacks and allows a certain message to be more
identifiable.


{\it Prevention:}  A flooding attack is very difficult to defend against
for practical Internet systems.  Ideally, the system would be able to
establish a unique, anonymous identity for each of the $n$ users that
send messages during one transmission interval.  Performing adequate
authentication of message senders while maintaining anonymity is a
difficult problem.  Possible solutions include the use of pseudonyms for
partial anonymous systems, requiring that adversaries cannot control a
significant number of pseudonyms.  Similarly, some form of blind
signature scheme can be used for anonymous authentication.


\item {\bf Traffic Flooding Attack:}
Similar to the identification flooding attack, an adversary sends a
large number of messages into the system to greatly increase traffic
along certain paths.  The adversary then proceeds to measure a rise in
traffic along the communications channel's edges or along internal
links.  This form of traffic analysis suggests the route taken to reach
a specific reply block or pseudonym receiver.

{\it Prevention:}  The system should ensure that an equal number of
packets are sent between each link during some time interval, by either
introducing dummy packets or latency.  This protection is similar to
that used for timing attacks.  However, maintaining steady traffic along
the edges of a communications channel is more difficult, given the
possibility of very bursty traffic.  The system can add dummy packets to
the endpoint, but would then require client-side filtering.

\item {\bf Pseudonymity Marking Attack:}
An adversary masquerades as a normal user and distributes unique names
to other distinct agents in the system.  These unique names correspond
to different reply blocks or pseudonyms, such that the last hop of the
transmission path is different for each name.  The adversary can then
correlate the last hop of the received message to a specific sender.
%even if the sender wants true anonymity (i.e., no method of reply). 
Linking a sender agent to an individual user or IP address remains a
separate problem, unless this can be determined during name
distribution.

{\it Prevention:} This attack is only viable against a communications
channel in which the receiver has an explicit transmission path, such as
with remailer reply blocks, as opposed to multi-cast or random-walk
functionality.  Secondly, a sender can defend against this attack by
getting the receiver's name from a third party -- such as another
trusted agent or some specified meeting-place for name distribution.

\item {\bf Persistent Identity Attack:}
The persistent identity attack is not an attack per se, but rather an
inherent anonymity weakness with any persistent naming infrastructure.
If an adversary manages to disclose user information with any of the
attacks we have enumerated, the adversary can correlate any future use
of this name with the exposed individual.

{\it Prevention:} System agents -- senders and receivers -- should use
dynamic naming to provide forward anonymity, or rely on a partial
anonymity scheme such that disclosure of agents only reveals a
$k$-anonymous set of possibilities.

\item {\bf The Need for ``Recipient-Hiding" Public Key Encryption:}
We call a public key cryptosystem \emph{recipient-hiding} if it is
infeasible to determine, given a ciphertext, the public key used to create
that ciphertext. The recipient-hiding property is \emph{not} implied
by the standard definition of semantic security (even with respect to
adaptive chosen ciphertext attack). Moreover, it is not even achieved in
common practical constructions. This has implications for mixnets which use
reply blocks that are separate from the body of the message. 

To see that semantic security and recipient hiding are independent,
consider any semantically secure cryptosystem $C$. Construct the
cryptosystem $C'$ which is just like $C$ in every way, except that it
appends the public key used to encrypt to every ciphertext. All messages
produced by $C'$ with the same public key are indistinguishable from each
other if the messages produced by $C$ are, and so $C'$ is semantically
secure -- but it is the very opposite of recipient hiding. 

In practice, mail programs such as PGP tend to include the recipient's
identity in their header information. Even if headers are stripped, David
Hopwood has pointed out in the case of RSA that because different RSA
public keys have different moduli, a stream of ciphertext taken modulo
the ``wrong" modulus will tend to have a distribution markedly different
from the same stream taken modulo the ``right'' modulus. This allows an
adversary to search through a set of possible public keys to find the one
which is the best fit for any ciphertext, even if OAEP or similar padding
is used. 

In mixnets which provide reply blocks, the reply block is often treated as
opaque(for example Babel \cite{babel}) and prepended to the message to be
sent. This means that the message is available for inspection by each
intermediate hop with no processing at each hop; for this reason the
message is often encrypted with the public key of the recipient.  The
point of a reply block is to provide a chain of mix nodes between
the sender and the recipient, in which intermediate nodes are supposed to
know neither the sender nor the receiver. If the recipient can be
identified by simply inspecting the message, then every single intermediate
node knows the destination, and knows approximately where it is in the
reply block. This may leak an undesirable amount of information about the
mixnet. 

{\it Prevention:} 
Rivest suggested that randomized cryptosystems (such as the
Goldwasser-Micali cryptosystem) might possess this
property. Independently, Lysyanskaya and Wagner proposed a version of
ElGamal in which all parties share the same modulus as a concrete
example. There is a formal definition of recipient-hiding proposed on
sci.crypt by Hopwood  \cite{hopwood}, with application to showing that a
variant of Bellare, Abdalla, and Rogaway's DHAES scheme \cite{DHAES}
achieves the recipient-hiding property, along with a variant of RSA in
which moduli are  generated to be close together. It seems that
recipient-hiding cryptosystems may not be that hard to construct, once
the requirement is recognized. The problem is that because previous
systems were not designed with anonymity in mind, commonly deployed
cryptosystems may not be recipient-hiding. 

\end{itemize}