% Author:  Mike Freedman (mfreed@mit.edu)

\chapter{Communication Module and Protocols}
\label{chap:impl-comm}

\footnote{This chapter was originally written by Michael Freedman.}
This section details the design and implementation of a communications
module for Free Haven, serving as an interface between the anonymous
publication system and its corresponding anonymous communications
channel(s).  

\section{Module Design}

An anonymous communications channel is used to carry messages between
servnet nodes.  In order to provide an abstraction for the specifics of
our communication channel, we provide a communication module that
interfaces with both the outer anonymous channel and ``inner'' {\it
haven} module, which provides the actual document publishing, storage,
and retrieval functionality.

%In our design summary in section \ref{sec:req-comm}, we enumerated a
There are a number of required operations for the communications module
-- sending messages, broadcasting
messages, naming nodes for message delivery, adding nodes, and removing
nodes -- and desired goals -- low latency, delivery- and routing-robust,
resistant to attack, and decentralized.  While many of these
operations and goals are inherently part of the communications channel,
the module should provide a flexible and robust interface layer between
the two subsystems.  

\subsection{Supported Module Operations}

Specifically, the {\it comm} module must support the following operations
or characteristics:

\begin{itemize}
\item The {\it comm} module should send data to {\it haven} when it
becomes available from the communications channel.
\item The {\it comm} module should send data to the communications
channel when it becomes available from the {\it haven} module.
\item Messages can arrive simultaneously from the communications channel.
\item Messages can arrive from the {\it haven} module simultaneously with
messages arriving from the communications channel.
\item Messages on a socket can arrive in a delayed manner.
\item The {\it haven} module can {\it fail} unexpectedly, but {\it comm}
should still continue normal operation and reconnect to {\it haven}
when it becomes available again.
\end{itemize}

\subsection{Module Data Structures}

The communications module has several associated data structures:

\begin{itemize}
\item {\bf Incoming Feeder Queue:}  List of $\{socket, file desc,
filename\}$ tuples that correspond to feeder programs currently being 
processed by the communications module.
\item {\bf Haven Message Queue:} List of messages that have arrived
from the communications channel, awaiting transmission to the haven
module. 
\item {\bf Node Database:}  Database of information stored for
all known servnet nodes.\\
{\bf Key:}  Hash of public key\\
{\bf Value:}
\begin{itemize}
\item {\bf Public key}
\item {\bf Waiting message queue:} List of filenames queued for each
node in the database, corresponding to outgoing messages from the haven
module.   
\item {\bf Length of queue}
\item {\bf Cost:} Total ``weight'' of waiting messages in queue
\item {\bf Communications channel type:} Mixmaster, ZKS, etc.
\item {\bf Address / routing info:} Reply blocks, pseudonyms, etc.
\item {\bf Statistics:} Timeout and availability considerations
\end{itemize}
\end{itemize}


\subsection{System Modularity}

The Free Haven design stresses modularity between various pieces of the
system -- The {\it haven} module, the {\it comm} module, the {\it trust}
module, the {\it crypto} module, the {\it ui}, and the actual
communications channel -- providing a strong separation of function.  
While the initial proof-of-concept implementation of Free Haven will
assume a 1:1 {\it haven} to {\it comm} module relation, with both
processes running on the same machine, this relationship is not necessarily fixed.
Indeed, with only slight modification to the actual socket code, our
design allows for several {\it haven} process to share a single network
{\it comm} process interfacing with different communications channels.
In order to support this scenario, we should not assume a secure connection across
the {\it comm}--{\it haven} socket.  Instead, all data across this
socket uses both public-key encryption for security and base-64 encoding
within data blocks to ensure proper user-formatting of information.

\section{Module Implementation}

In Chapter \ref{chap:anon} we discussed the anonymity offered by
various anonymous channels; in Chapter \ref{chap:future} we will suggest some other possible
channel designs.  The current implementation, however, utilizes the
Mixmaster remailer \cite{mixmaster}.  Mixmaster was chosen because of
its simple command-line interface, strong anonymity offered through
Chaumian mixing, protection from traffic analysis and other attacks
due to its high latency, message padding, and packet reordering and
buffering within nodes.  Furthermore, Mixmaster is freely available,
making it quite suitable for an open project such as Free Haven.

The code base allows for the easy incorporation of different
mixnets and other anonymous communications channels, by adding
simple switching logic to the {\tt do\_transmit} function and
specifying the proper mixnet type and address in the node database.
Indeed, the only primitive that we require is a generalized {\tt send}
function.   Therefore, a servnet node can actually specify 
the communications channel on which it wishes to communicate. 

The {\it comm} module implementation also provides database
flexibility.  We provide an abstraction layer on top of the Node DB,
separating any database-specific operations from the user.  The GPL-released 
{\tt gdbm} is currently used in the initial implementation; a 
relational database of greater functionality is being considered for
further development. 

\subsection{Implementation Pseudocode}

The communications module provides an ``always-on'' module to handle
incoming and outgoing messages from a servnet node.  The basic
operation of the module is as follows:

\begin{itemize}
\item Create a non-blocking listening tcp socket for incoming
communications for feeder programs.  These {\it feeders} are system
processes that will pass messages from the communications channel to the
Free Haven communications module.
\item Loop on the following control structure:
\begin{enumerate}
\item Connect to haven socket.  If this socket is not available,
continue processing available information and try again upon next
iteration. 
\item If data available on {\it haven} socket, process the outgoing
message.
\begin{enumerate}
\item If message is of type {\tt broadcast}, enumerate the list of non-dormant
servnet nodes and {\tt transmit} the message to each node.
\item If message is of type {\tt transmit}, enqueue the message within
the node's waiting message queue for later processing.
\item If message is of type {\tt introduce}, add the corresponding
information about a new node to the node database.
\end{enumerate}
\item If new {\it feeder} attempts to connect to non-blocking incoming socket,
enqueue the new {\it feeder}.
\item For all {\it feeder} sockets on which data is available:
\begin{enumerate}
\item Read all the information from the socket into the {\it feeder's}
corresponding file.
\item If the {\it feeder} has reached EOF, place the file into
the haven message queue and close the {\it feeder}.
\end{enumerate}
\item Send incoming messages to the haven module, extracting from haven
message queue.
\item Transmit waiting messages for a node, concatenating and padding to
reach a total packet size up to a random or statically-assigned length. 
Any messages that cannot fit within this buffer remain in the waiting 
messages queue for later transmission.
\end{enumerate}
\end{itemize}


\begin{table}
\begin{center}
\begin{tabular}{|| l ||}
\hline \hline
{\bf /* Communications Handling Functions */}\\
void handle\_ports(int incoming\_port, int listen\_socket, int haven\_port);\\
void create\_feeder\_entry(int new\_feeder\_socket);\\
void process\_feeder\_entry(struct feeder\_t *feeder, struct feeder\_t *prev\_feeder);\\
void enqueue\_haven\_message(char *filename);\\
void process\_freehaven\_message(char *filename);\\
void process\_internal\_message(char *filename);\\
void send\_file\_to\_haven(char *filename);\\
void exit\_comm();\\

\hline
{\bf /* Transaction Functions */}\\
int transmit(struct tag\_t *tag\_list);\\
int do\_transmit(char *PK);\\
int broadcast(struct tag\_t *tag\_list);\\

\hline
{\bf /* Node Database Functions */}\\
void initialize\_nodedb(void);\\
datum construct\_gdbm\_entry(struct nodedb\_entry\_t *entry);\\
struct nodedb\_entry\_t reconstruct\_nodedb\_entry(void *noded\_value);\\
int node\_change\_PK(char *hPK, char *newPK);\\
int node\_add\_new\_node(char *hPK, char *PK, char *address, char *mixnet, int statistics);\\
struct message\_queue\_t* node\_get\_message\_queue(char *hPK);\\
char* node\_get\_PK(char *hPK);\\
char* node\_get\_mixnet(char *hPK);\\
char* node\_get\_address(char *hPK);\\
char* node\_get\_busiest\_PK();\\
int node\_add\_waiting\_msg(char *hPK, char *filename);\\
int node\_get\_waiting\_msgs(char *hPK);\\
int node\_set\_statistics(char *hPK, int statistics);\\
int node\_get\_statistics(char *hPK);\\
void free\_node\_msgs(struct nodedb\_entry\_t nodedb\_entry);\\
void close\_nodedb();\\

\hline \hline
\end{tabular}
\label{tab:comm-api}
\caption{Communications Module API}
\end{center}
\end{table}

\newpage
\subsection{Design Discussion}

The communications module itself requires access to the crypto module.
Communications between the {\it haven} and {\it comm} modules should be
ASCII-armored with base-64 encoding, as the messages use standard XML
format with data delimited by begin $<tag>$ and end
$</tag>$.   Using base-64 encoding stops an user from placing
$<$ or $>$ symbols into the internal data block, which might confuse
message parsing.  Furthermore, communications between the {\it haven}
and {\it comm} are encrypted, to provide for the option of distributed
module processes and multiple {\it haven} processes per {\it comm}
process.

The communications module also needs to perform any necessary message
encryption or encoding necessary, based on the type of communications
channel used.  For the current Mixmaster implementation, the {\it comm}
process performs layered encryption of the message data.  This layered
encryption method -- or ``onion routing'' -- is the standard Chaumian
mix-net technique to ensure anonymity. First, the sender chooses a route
through the mixnet:  Source S $\rightarrow$ Node A $\rightarrow$ Node B
$\rightarrow$ Node C $\rightarrow$ Destination D pseudonym. Second, the
sender signs the message with its private key $pk_S$ and possibly
encrypts via the destination's public key $PK_D$.  Then, the sender
encrypts the message and next-hop information along the mixnet in
reverse:  encrypt with $PK_C$, then encrypt with $PK_B$, and finally
encrypt with $PK_A$. Therefore, only the proper node in the mixnet can
decrypt the top layer of the message, exposing next-hop information and
the proper onion message to relay.  In other words, the encryption
layers are unwound (or peeled like an onion) during mixnet transmission,
before the message is relayed to its destination.  The destination
reply-block or pseudonym specifies a path to an explicit servnet node.
Once the message arrives at the destination servnet node, the node
decrypts the message using $pk_D$ and verifies the sender's signature. 


\subsection{Optimizations}

The communications module performs several efficiency and anonymity
optimizations.  First, haven message queues are built up within the {\it
comm} process, and iteratively dequeued and sent to {\it haven},
protecting against bursty transfers from the communications channel.

Second, broadcasts can be requested by the haven module by merely
calling a {\tt broadcast} primitive, as opposed to querying for a list
of available nodes, extracting each node's route and encrypting the
message according to that route, then performing the {\tt transmit}
operation on each message.  A single broadcast call decreases the
quantity of socket-level requests from $O(n)$ to $O(1)$, where $n$ is
the number of available nodes.  

Third, messages are queued during {\tt transmit} for each node.  One
node is chosen at random (while attempting to ensure semi-fairness among
nodes) to transmit its messages along to anonymous communications
channel.  Messages are concatenated and padded to a specific or random
length to protect against message volume attacks.  The act of enqueueing
and concatenating messages reduces the number of messages to a specific
node, possibly adding some protection against traffic analysis that
seeks sender/receiver linkability.  Furthermore, the random choice of a
node during that time iteration adds some latency to haven outgoing
messages, especially under high load.  This technique may also protect
against some form of traffic analysis, especially in light of
multi-step Free Haven protocols, such as trading or buddy `squawking'.