\documentstyle[twoside,fancyheadings,doublespace,fullpage,epsf,amssymbols]{article}
\setstretch{1.0}

\headheight 32pt
\lhead{Defining Anonymity in Anonymous Publishing Systems}
\rhead{Roger Dingledine, Michael Freedman}

\begin{document}
\pagestyle{fancy}
\pagenumbering{arabic}

\section{Introduction}

Many anonymous publication systems claim `anonymity'
without specifying a precise definition.
%  Indeed, we present an extensive related works section:
%most of these works claim an anonymous network or some other
%catchphrase involving anonymity,
Indeed, they often fail to specify
which protections users and operators receive from their system and
which they do not.  These
protections depend on both the publication system itself and
the communications channel that it uses.

%These protections are a function of both the design of the publication
%system -- fulfilling the requirements of publishing, storage, and
%retrieval -- and the communications medium which it utilizes.

While the anonymity requirements of communications channels have
previously been considered in depth,\footnote{Ian Goldberg, David Wagner,
and Eric Brewer. Privacy-enhancing technologies for the Internet. {\it
Proceedings of IEEE COMPCON '97};
Oliver Berthold, Hannes Federrath, and Marit K\"ohntopp. Project
``Anonymity and Unobservability in the Internet.''  Workshop on
Freedom and Privacy by Design / CFP2000.}
this paper addresses anonymity at a higher level: that of the publication systems themselves.

% This paper is not an enumeration of design goals which the Free
% Haven project meets, but rather an enumeration of design goals for
% the ideal anonymous publication system.  More precisely, we
% enumerate a set of design goals for the ideal anonymous publication
% system. 

%The current Free Haven project design does not achieve all of the
%attributes of anonymity that we might want to provide. This paper is
%not an enumeration of design goals which Free Haven meets, but rather
%an enumeration of design goals for the ideal anonymous publication system.

We begin by describing what levels of anonymity a speaker might expect
to achieve.  Then we list some agents in an anonymous publication system
and address anonymity for each of these agents separately; similarly,
we describe and consider the agents within an anonymous communications
channel.  Following this, we present some intuitive notions of types or degrees of
anonymity, addressing the dependence between the anonymity of publication
agents and that of communication agents.  Finally, we define models of adversaries
to help us reason about how anonymous a system as a whole might
be.

Using the 16 characteristics that we enumerate and define for an ideal
anonymous publication system, we attempt to assess real-world projects
against these new definitions and models.  We find that our
current Free Haven design achieves 6 of the 16 attributes, compared to 5
for the Publius system, 3 for Adam Back's Eternity-over-Usenet
proposal, and 2 each for Freenet and Gnutella.  Because many of these
systems are modular enough that their communications channels can be
replaced with alternatives that provide better
anonymity, it is important to analyze the publication/control side
separately from the communications channel.  This analysis leads to some
useful insights.

\section{Defining Privacy}

Privacy is the ability of a speaker to control the dissemination of information about
himself.  The level of anonymity obtained in a given scenario is a
result of the choices made about privacy.

In particular, the speaker may control the dissemination of information
about his own speech along several different dimensions.
{\em Linkability} describes the presence of distinguishing
characteristics, known as a pseudonym, which allow utterances to be
linked to a speaker.  The speaker controls which other
parties can be {\em readers} of his speech, and controls the {\em
persistence} of his speech after publication.  The speaker also
controls {\em content leaks} (how much information is leaked by
the content of his speech) and {\em channel leaks} (how much
partial information is leaked by the communications channel itself).
Furthermore, the speaker controls the ability of readers to {\em
reply} to his utterances, whether the reply is private, and
the persistence of the ability to reply.

%\begin{enumerate}
%\item {\bf Linkability:} The speaker controls the ability of readers to
%link his utterances.  The distinguishing characteristics which provide this
%linkability are called a pseudonym.  
%If there are distinguishing characteristics that
%provide this linkability, these characteristics are called a pseudonym.
%\item {\bf Readers:} The speaker controls which other parties will be
%able to read his speech.
%\item {\bf Persistence:} The speaker controls how long his speech persists
%after the publication.
% (perhaps the ability to reply expires a week after the publication).
%\item {\bf Content Leaks:} The speaker controls how much partial
%information is leaked based on the content of his speech. For instance,
%the speaker may choose to reveal certain personal credentials.
%the speaker may choose to reveal certain information such as credentials
%for being authorized to read the New York Times on the web.
%\item {\bf Channel Leaks:} The speaker controls how much partial
%information is leaked based on the communications channel he chooses
%to use.
%\item {\bf Replies:} The speaker controls whether readers can reply to
%his utterances.  Further parameters include whether the reply is private,
%and persistence of the ability to reply.
%\end{enumerate}

\section{Agents and Operations}

The dimensions above describe some freedoms which a given {\em speaker} may have
available to him. A number of these freedoms have analogs for
other parties in a publishing system.

\subsection{Publication Agents}

In general, an anonymous publication system has several agents,
including the publisher, the reader, and the server.  The system might
support a number of operations, such as
(at a minimum) inserting a document into the system and later retrieving
it.  Agents at the publication level of the system
deal largely with the first four privacy issues: linkability,
readers, persistence, and content leaks.  We address
each of these agents and operations separately, to build an intuition
for which acts allow the dissemination of information.

\begin{itemize}
\item {\bf Publisher-anonymity:}
While author-anonymity addresses the original author of the document
itself, publisher-anonymity addresses the agent that originally introduces
the document into the system.
\item {\bf Reader-anonymity:}
Reader-anonymity means that readers requesting a document should not
have to identify themselves to anyone.
\item {\bf Server-anonymity:}
Server-anonymity means that the location of the document should not
be known or knowable.
\item {\bf Document-anonymity:}
Document-anonymity means that the server does not know the contents
of the document that it is storing or helping to store.
\item {\bf Query-anonymity:}
A system has query-anonymity if a server cannot determine specifically which
of its documents is being requested by a given reader. For instance, this
might be achieved by a variant of Private Information Retrieval.
%Query-anonymity refers to the notion that over the course of a given
%document query or request, the `identity' of the document itself is not
%revealed to the server.  More formally, when a server currently storing
%$n$ shares answers a request from a reader, that server has no way of
%distinguishing which of the $n$ shares the reader was requesting.

% Hrmm...  ``the server''   - the ``requesting'' server needs to know...
% we're more talking about the current ``storing'' server'' or an
% intermediate server (latter is issue with comm).
\end{itemize}
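
Query-anonymity can be stated more formally.  The following is only a
sketch, in notation introduced here for illustration rather than taken
from any particular system.  Suppose a server stores $n$ documents, let
$I \in \{1,\ldots,n\}$ be the index of the document that a reader wants,
and let $V$ be everything the server observes while answering the
request.  An information-theoretic form of query-anonymity asks that the
server's view be distributed identically whichever document is requested:
\[
  \Pr[V = v \mid I = i] \;=\; \Pr[V = v \mid I = j]
  \qquad \mbox{for all views $v$ and all $i,j \in \{1,\ldots,n\}$.}
\]
A computational form would ask only that no efficient server can tell
the two distributions apart.

As one illustration of how a Private Information Retrieval scheme can
meet this condition, consider the classic two-server construction, under
the simplifying assumptions (ours) that two servers which do not collude
hold identical copies of $n$ one-bit documents $x_1,\ldots,x_n$.  To
fetch $x_i$, the reader picks a uniformly random subset
$S \subseteq \{1,\ldots,n\}$, sends $S$ to the first server, and sends
$S \bigtriangleup \{i\}$ (that is, $S$ with the membership of $i$
flipped) to the second.  Each server returns the exclusive-or of the
bits it was asked for:
\[
  a_1 = \bigoplus_{k \in S} x_k, \qquad
  a_2 = \bigoplus_{k \in S \bigtriangleup \{i\}} x_k, \qquad
  a_1 \oplus a_2 = x_i .
\]
Every bit other than $x_i$ appears in both answers or in neither, so it
cancels.  Each server on its own sees a uniformly random subset whatever
the value of $i$, so neither learns which document was requested; the
guarantee fails if the two servers compare the subsets they received.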

\subsection{Communication Agents}

To insert documents into remote servers and to retrieve documents from
them, publication agents make use of a communications channel.  The
agents on this channel include the message sender, the message receiver,
and nodes within the channel.  These agents must support some generalized {\tt
send} primitive for point-to-point and/or multicast functionality.
The design of this channel dictates the extent of protection provided
against channel leaks.  The choice of channel also dictates how
message replies are performed.  We consider several
anonymity properties of communication.

\begin{itemize}
\item {\bf Sender-anonymity:}
Sender-anonymity means that the identity of the party who sends a
message remains hidden, while making no claim as to the anonymity of
the recipient.
% and the message itself.
\item {\bf Receiver-anonymity:}
Receiver-anonymity means that the identity of a message's recipient is
hidden. 
%\item {\bf Endpoints-anonymity:}
\item {\bf Unlinkability of sender and receiver:}
This means that, although a sender and receiver might be known
to have taken part in {\em some} communication, they should not be
identifiable as communicating with each other.
%  This is also known as
%unlinkability of sender and receiver.
% I made up ``Endpoints'' - what do you think? 
\item {\bf Node-anonymity:}
Node-anonymity means that a server should not be identifiable
as a participant within the communications channel.
%  Similarly, the presence of
%communication to and from a node should not be perceived.
% Is the second issue actually the same?
\item {\bf Carrier-anonymity:}
Carrier-anonymity means that a node should not be identifiable as a
carrier involved with communicating a message between some sender and
recipient. 
%\item {\bf Query-anonymity:}
%Query-anonymity refers to the concept that a carrier involved in some
%communication cannot determine the `identity' of the document which it
%transmits, in a notion similar to publication query-anonymity.
\end{itemize}
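
The first two of these properties can be phrased probabilistically.
Again this is only a sketch, and the notation is ours rather than part
of the definitions above.  Let $V$ denote what an observer of the
channel sees, and for a given message let $X$ be its sender and $Y$ its
recipient.  In their strongest form, sender-anonymity and
receiver-anonymity ask that observing the channel teach nothing about
$X$ or about $Y$, respectively:
\[
  \Pr[X = s \mid V = v] = \Pr[X = s],
  \qquad
  \Pr[Y = r \mid V = v] = \Pr[Y = r].
\]
Unlinkability of sender and receiver is weaker in one respect: the
observer may learn that $s$ sent some message and that $r$ received some
message, provided that the observation makes the pair $(s,r)$ no more
likely to have communicated with each other than those two facts alone
would imply.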

\section{System and Channel Characteristics}

Apart from the supported operations, there are a number of other issues
about anonymity that we address and characterize, including the notions
of linkability and pseudonymity; partial vs. full anonymity;
computational vs. information-theoretic anonymity; and perfect forward
anonymity.  
%\begin{itemize}
%\item Linkability: Anonymity vs. Pseudonymity
%%\item Characteristics of Publishing and Communication Channels
%\item Partial Anonymity vs. Full Anonymity
%\item Computational vs. Information-Theoretic Anonymity
%\item Perfect Forward Anonymity
%%\item What does it mean to `break' an anonymous system?
%%\item Modeling the Adversary
%%\item Application to real-world projects
%\end{itemize}

Given our requirements for an `anonymous' publication system and its
corresponding `anonymous' communications channel, we examine how
real-world projects fulfill these definitions.
%Modeling the problem
%in terms of a two-tiered system allows us to discuss how a system
%`breaks' anonymity and which key requirements are not fulfilled.  
Modeling the problem as a two-tiered system lets us
describe attacks on the system more clearly, as well as define what
it means to `break' the anonymity of the system.

Finally, note that the difficulty of designing a good anonymous
publication system is not limited to the difficulty of providing strong
anonymity for the system.  Indeed, there are a number of other qualities
which the designer might desire, such as high availability or robustness
of data; special operations such as the ability to expire or modify
published data; or speed or efficiency of implementation or operation.
Often the level of anonymity a system provides is the direct result of
a tradeoff with the level of availability or flexibility of the system.
By defining characteristics of anonymity which the ideal anonymous
publication system might achieve, we hope to provide a basis for better
system design decisions.

\end{document}