Commit bfb283bf authored by Misbah Mubarak

(1) adding sample codes example and (2) updating codes best practices document with the CODES example
parent f39b6eb2
......@@ -239,6 +239,17 @@ model. This can help simplify reverse computation by breaking complex
operations into smaller, easier to understand (and reverse) event units with
deterministic ordering.
Adding reference to storage server example:
In the simple storage server example following this section, there are multiple
LP types, i.e., a storage server LP and a network LP. The storage server LP initiates
data transmission to, and receives data from, its neighboring storage server LP. It also keeps
track of the amount of data sent and received in bytes. The job of data transmission
is delegated to the network LP, which simply transports the data to the destination storage
server LP. The network LP is unaware of the total amount of data sent by a particular
server. At the same time, the storage server LP is unaware of the networking protocol
used by the network LP for transporting the messages.
TODO: reference example, for now see how the LPs are organized in Triton
model.
......@@ -315,6 +326,8 @@ statistics, but for scalability and data management reasons those results
should be aggregated into a single file rather than producing a separate
file per LP.
TODO: look at ross/IO code and determine how it relates to this.
\subsection{codes-workload generator}
TODO: fill this in. codes-workload is an abstraction layer for feeding I/O
......@@ -333,6 +346,12 @@ global end timestamp for ROSS. The assumption is that CODES models will
normally run to a completion condition rather than until simulation time
runs out, see later section for more information on this approach.
\subsection{ross/IO}
TODO: fill this in. This is the I/O library included with ROSS, based on
phasta I/O library. What are the use cases and how do you use it. Does it
deprecate the lp-io tool?
\section{CODES: reproducibility and model safety}
TODO: fill this in. These are things that aren't required for modularity,
......@@ -406,6 +425,7 @@ application workloads), when a fault has been repaired, etc. Talk about how
to handle this cleanly.
\subsection{Kicking off a simulation}
\label{sec_kickoff}
TODO: fill this in. Each LP needs to send an event to itself at the
beginning of the simulation (explain why). We usually skew these with
......@@ -458,6 +478,229 @@ section(s).
\end{enumerate}
\section{CODES Example Model}
Outline
- The Basics:
\subsection{The Basics}
This is a simple CODES example to demonstrate the concepts described above. In
the example scenario, we have a certain number of storage servers, where each
server has a network interface card (NIC) associated with it. The servers exchange
messages with their neighboring servers via their NICs. When the neighboring
server receives the message, it sends another message to the sending server in
response. This process continues until a specific threshold is reached.
Paragraph 2: John, can you please complete the following paragraph? I've added
some notes.
Note: In this paragraph, we start off with the technical aspects of the simulation.
Here we describe the LP types, i.e., server LPs and NIC LPs. We then talk about
the local completion messages and the remote messages sent by server
LPs. A local completion message is sent to the sending server when the message
leaves the sender's NIC (posting a message send at a server does not necessarily
mean that the message has been sent, which is why we have a local completion
message). A remote message is the data sent to the destination server LP from
the sending LP. In this example, the remote message triggers another request
message from the receiving server LP (refer to Listing \ref{snippet1}).
\begin{figure}
\begin{lstlisting}[caption=Server state and event message struct, label=snippet1]
struct svr_state
{
int msg_sent_count; /* requests sent */
int msg_recvd_count; /* requests recvd */
int local_recvd_count; /* number of local messages received */
tw_stime start_ts; /* time that we started sending requests */
};
struct svr_msg
{
enum svr_event svr_event_type;
tw_lpid src; /* source of this request or ack */
int incremented_flag; /* helper for reverse computation */
};
\end{lstlisting}
\end{figure}
\subsection{CODES mapping}
The CODES mapping API transparently maps LP types to MPI ranks (a.k.a. ROSS PEs). The
LP types and counts can be specified through a configuration file. In this section, we
explain the configuration file of the simple CODES example, followed by how the
CODES mapping API can be used in the example program. Figure \ref{snippet2}
shows the LP mapping config file for the CODES example. Multiple LP types are
specified in a single LP group (there can also be multiple LP groups in a config file).
There is 1 server LP and 1 modelnet\_simplenet LP in a group, and this
combination is repeated 16 times (repetitions="16"). ROSS will assign the LPs to
the PEs (a PE is the ROSS abstraction for an MPI rank) by placing 1 server LP
then 1 modelnet\_simplenet LP, a total of 16 times. This configuration is useful
if there is heavy communication between the server and
modelnet\_simplenet LP types, in which case ROSS will place them on the same PE
so that the communication between server and modelnet\_simplenet LPs does not involve remote
messages.
\begin{figure}
\begin{lstlisting}[caption=example configuration file for CODES LP mapping, label=snippet2]
LPGROUPS
{
MODELNET_GRP
{
repetitions="16";
server="1";
modelnet_simplenet="1";
}
}
PARAMS
{
packet_size="512";
message_size="256";
modelnet="simplenet";
net_startup_ns="1.5";
net_bw_mbps="20000";
}
\end{lstlisting}
\end{figure}
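To make the layout implied by this configuration concrete, the following is a minimal standalone sketch. It does not call the CODES mapping API; the struct and helper names are hypothetical. It assumes that LPs are numbered in declaration order within each repetition (server first, then modelnet\_simplenet), which is consistent with the example source stepping from one server to the next with an offset of 2.

```c
#include <assert.h>

/* Hypothetical description of one LP group, mirroring MODELNET_GRP. */
struct lp_group {
    int repetitions;      /* repetitions="16" */
    int servers_per_rep;  /* server="1" */
    int nics_per_rep;     /* modelnet_simplenet="1" */
};

/* Total number of LPs contributed by the group. */
static int group_total_lps(const struct lp_group *g)
{
    return g->repetitions * (g->servers_per_rep + g->nics_per_rep);
}

/* Number of server LPs, analogous to num_servers in the example. */
static int group_num_servers(const struct lp_group *g)
{
    return g->repetitions * g->servers_per_rep;
}

/* ASSUMPTION: within a repetition the server LPs come first, so the
 * idx-th server of repetition rep sits at the start of that repetition. */
static int server_gid(const struct lp_group *g, int rep, int idx)
{
    return rep * (g->servers_per_rep + g->nics_per_rep) + idx;
}

/* Global ID of the next server in the ring, wrapping at the end.
 * Only meaningful when there is exactly one server per repetition. */
static int next_server_gid(const struct lp_group *g, int gid)
{
    int stride = g->servers_per_rep + g->nics_per_rep;
    return (gid + stride) % group_total_lps(g);
}
```

With the values from the configuration file (16 repetitions, 1 server, 1 modelnet\_simplenet), this gives 32 LPs in total, 16 of them servers, with servers on even global IDs under the stated assumption.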
After the ROSS initialization calls (tw\_init), the configuration
file can be loaded in the example program using the calls in Figure
\ref{snippet3}. Each LP type must register itself using `lp\_type\_register'
before codes-mapping is set up. Figure \ref{snippet4} shows an example of how
the server LP registers itself.
\begin{figure}
\begin{lstlisting}[caption=CODES mapping function calls in example program, label=snippet3]
int main(int argc, char **argv)
{
.....
/* ROSS initialization function calls */
tw_opt_add(app_opt);
tw_init(&argc, &argv);
/* loading the config file of codes-mapping */
configuration_load(argv[2], MPI_COMM_WORLD, &config);
/* Setup the model-net parameters specified in the config file */
net_id=model_net_set_params();
/* register the server LP type (the model-net LP type is registered internally in model_net_set_params()) */
svr_add_lp_type();
/* Now setup codes mapping */
codes_mapping_setup();
/* query codes mapping API */
num_servers = codes_mapping_get_group_reps("MODELNET_GRP") * codes_mapping_get_lp_count("MODELNET_GRP", "server");
.....
}
\end{lstlisting}
\end{figure}
\begin{figure}
\begin{lstlisting}[caption=Registering an LP type, label=snippet4]
static void svr_add_lp_type()
{
lp_type_register("server", svr_get_lp_type());
}
\end{lstlisting}
\end{figure}
The CODES mapping API provides ways to query information such as the number of LPs of
a particular LP type, the group to which an LP type belongs, and the repetitions in the
group (for details, see the codes-base/codes/codes-mapping.h file). Figure
\ref{snippet3} shows how to set up the CODES mapping API in our CODES example and
compute basic information by querying the number of servers in a particular
group.
\subsection{Event Handlers}
In this example, we have two LP types, i.e., a server LP and a model-net LP.
Since the servers only send messages to and receive messages from each other, the server LP state
maintains a count of the number of remote messages it has sent and received as
well as the number of local completion messages.
For the server event message, we have four message types: KICKOFF, REQ, ACK and
LOCAL. With a KICKOFF event, each LP sends a message to itself (the simulation
begins from here). To avoid event ties, we add a small amount of noise using the random
number generator (see Section \ref{sec_kickoff}). The server LP state data structure
and server message data structure are given in Figure \ref{snippet1}, and the event
handler that dispatches on the message type is outlined in Figure \ref{snippet5}. The `REQ'
message is sent by a server to its neighboring server and, when received, the
neighboring server sends back a message of type `ACK'.
TODO: Add magic numbers in the example file to demonstrate the magic number best
practice.
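Until magic numbers are added to the example file, the idea can be sketched in isolation. The following is a minimal sketch, assuming the practice means stamping every message with a constant that is unique to the LP type and checking it when the message is received. The constant SVR\_MAGIC, the struct svr\_msg\_checked, and both helpers are hypothetical and are not part of the example source.

```c
#include <assert.h>

/* Hypothetical magic value; any constant unique to the LP type works.
 * 0x5356524D spells "SVRM" in ASCII. */
#define SVR_MAGIC 0x5356524Du

enum svr_event { KICKOFF, REQ, ACK, LOCAL };

/* Hypothetical variant of svr_msg that carries a magic number. */
struct svr_msg_checked {
    unsigned int magic;            /* must equal SVR_MAGIC */
    enum svr_event svr_event_type;
};

/* Fill in a message on the sending side. */
static void svr_msg_stamp(struct svr_msg_checked *m, enum svr_event type)
{
    m->magic = SVR_MAGIC;
    m->svr_event_type = type;
}

/* Returns 1 if the message carries the server magic number, else 0.
 * A handler would typically assert on this before its switch statement. */
static int svr_msg_is_valid(const struct svr_msg_checked *m)
{
    return m->magic == SVR_MAGIC;
}
```

A check of this kind catches events that were delivered to the wrong LP type or built from uninitialized memory, which would otherwise be interpreted as a valid server message.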
\begin{figure}
\begin{lstlisting}[caption=Event handler of the server LP type., label=snippet5]
static void svr_event(svr_state * ns, tw_bf * b, svr_msg * m, tw_lp * lp)
{
switch (m->svr_event_type)
{
case REQ:
...
case ACK:
...
case KICKOFF:
...
case LOCAL:
...
default:
printf("\n Invalid message type %d ", m->svr_event_type);
assert(0);
break;
}
}
\end{lstlisting}
\end{figure}
\subsection{Model-net API}
Model-net is an abstraction layer that allows models to send messages
across components using different network transports. It provides a
consistent API that can send messages across either torus, dragonfly, or
simplenet network models without changing the higher level model code.
In the CODES example, we use simple-net as the underlying plug-in for
model-net. The simple-net parameters are specified by the user in the config
file (see Figure \ref{snippet2}). A call to `model\_net\_set\_params' sets up
the model-net parameters as given in the config file.
model-net assumes that the caller already knows what LP it wants to deliver the
message to and how large the simulated message is. It carries two types of
events: (1) a remote event to be delivered to a higher level model LP (in the
example, the model-net LPs carry the remote event to the server LPs) and (2) a
local event to be delivered to the caller once the message has been transmitted
from the node (in the example, a local completion message is delivered to the
server LP once the model-net LP sends the message). Figure \ref{snippet6} shows
how the server LP sends messages to the neighboring server using the model-net
LP.
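The split between the remote event and the local event can be illustrated with a toy stand-in for the network layer. This is a sketch only: it is not the model-net implementation, it ignores simulated time entirely, and every name in it (toy\_net, toy\_event, toy\_net\_send) is hypothetical. It shows just the bookkeeping: one send schedules a remote event for the destination and, if a local message is supplied, a local event for the sender.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_EVENTS 8   /* capacity of the toy event queue */
#define TOY_MAX_MSG    64  /* largest message the toy queue can hold */

/* One scheduled event: who receives it and a copy of its payload. */
struct toy_event {
    int dest_lp;
    size_t size;
    unsigned char data[TOY_MAX_MSG];
};

/* Toy stand-in for the network layer: a flat list of pending events. */
struct toy_net {
    struct toy_event queue[TOY_MAX_EVENTS];
    int count;
};

/* Copy a message into the queue; returns 0 on success, -1 if it does not fit. */
static int toy_enqueue(struct toy_net *net, int dest_lp,
                       const void *msg, size_t size)
{
    if (net->count >= TOY_MAX_EVENTS || size > TOY_MAX_MSG)
        return -1;
    struct toy_event *e = &net->queue[net->count++];
    e->dest_lp = dest_lp;
    e->size = size;
    memcpy(e->data, msg, size);
    return 0;
}

/* Schedule the remote event for dest_lp and, if local_msg is non-NULL,
 * a local completion event for sender_lp. Returns the number of events
 * scheduled, or -1 if the queue is full or a message is too large. */
static int toy_net_send(struct toy_net *net, int sender_lp, int dest_lp,
                        const void *remote_msg, size_t remote_size,
                        const void *local_msg, size_t local_size)
{
    int scheduled = 0;
    if (toy_enqueue(net, dest_lp, remote_msg, remote_size) != 0)
        return -1;
    scheduled++;
    if (local_msg != NULL && local_size > 0) {
        if (toy_enqueue(net, sender_lp, local_msg, local_size) != 0)
            return -1;
        scheduled++;
    }
    return scheduled;
}
```

In the sketch the payloads are copied into the queue, so the caller keeps ownership of its buffers. Whether the real API copies its arguments should be checked against the model-net header rather than inferred from this toy.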
\begin{figure}
\begin{lstlisting}[caption=Example code snippet showing data transfer through model-net API, label=snippet6]
static void handle_kickoff_event(svr_state * ns,
tw_bf * b,
svr_msg * m,
tw_lp * lp)
{
......
/* record when transfers started on this server */
ns->start_ts = tw_now(lp);
/* each server sends a request to the next highest server */
int dest_id = (lp->gid + offset)%(num_servers*2);
/* model-net needs to know about (1) higher-level destination LP which is a neighboring server in this case
* (2) struct and size of remote message and (3) struct and size of local message (a local message can be null) */
model_net_event(net_id, "test", dest_id, PAYLOAD_SZ, sizeof(svr_msg), (const void*)m_remote, sizeof(svr_msg), (const void*)m_local, lp);
ns->msg_sent_count++;
.....
}
\end{lstlisting}
\end{figure}
\subsection{Reverse computation}
John: Can you please fill in the details about the reverse handlers?
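Pending the full write-up, the pattern used by the ack handlers in the example source can be sketched on its own. The forward handler only increments the sent count while the threshold has not been reached, so it records its decision in the message; the reverse handler reads that flag to undo exactly what was done. The names below (toy\_state, toy\_msg, TOY\_NUM\_REQS) are hypothetical simplifications of svr\_state, svr\_msg and NUM\_REQS, and the network call and its rollback are omitted.

```c
#include <assert.h>

#define TOY_NUM_REQS 3  /* hypothetical threshold, plays the role of NUM_REQS */

/* Reduced LP state: only the counter touched by the ack handler. */
struct toy_state {
    int msg_sent_count;
};

/* Reduced message: only the helper flag used for reverse computation. */
struct toy_msg {
    int incremented_flag;
};

/* Forward handler: send another request if below the threshold and
 * remember in the message whether the counter was incremented. */
static void toy_ack_forward(struct toy_state *s, struct toy_msg *m)
{
    if (s->msg_sent_count < TOY_NUM_REQS) {
        s->msg_sent_count++;
        m->incremented_flag = 1;
    } else {
        m->incremented_flag = 0;
    }
}

/* Reverse handler: undo the increment only if the forward handler did it. */
static void toy_ack_reverse(struct toy_state *s, struct toy_msg *m)
{
    if (m->incremented_flag)
        s->msg_sent_count--;
}
```

Without the flag, the reverse handler could not tell whether the forward handler took the sending branch, because the state at rollback time no longer reveals which branch was taken once the counter sits at the threshold.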
\section{TODO}
\begin{itemize}
......
all: check example.o example
CFLAGS= -Wall
CC=mpicc
check: check-env
@echo CODESBASE: $(CODESBASE)
@echo CODESNET: $(CODESNET)
@echo ROSS: $(ROSS)
example.o: example.c
$(CC) -c $(CFLAGS) -I$(ROSS)/include -I$(CODESBASE)/include -I$(CODESNET)/include example.c
example: example.o
$(CC) $(CFLAGS) example.o -o example -L$(ROSS)/lib -lROSS -lm -L$(CODESBASE)/lib -lcodes-base -L$(CODESNET)/lib -lcodes-net
check-env:
ifndef CODESBASE
$(error CODESBASE is undefined, see README.txt)
endif
ifndef CODESNET
$(error CODESNET is undefined, see README.txt)
endif
ifndef ROSS
$(error ROSS is undefined, see README.txt)
endif
/*
* Copyright (C) 2013 University of Chicago.
* See COPYRIGHT notice in top-level directory.
*
*/
/* SUMMARY:
*
* This is a sample code to demonstrate CODES usage and best practices. It
* sets up a number of servers, each of which is paired up with a simplenet LP
* to serve as the NIC. Each server exchanges a sequence of requests and acks
* with one peer and measures the throughput in terms of payload bytes (ack
* size) moved per second.
*/
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include <ross.h>
#include "codes/lp-io.h"
#include "codes/codes.h"
#include "codes/codes_mapping.h"
#include "codes/configuration.h"
#include "codes/model-net.h"
#include "codes/lp-type-lookup.h"
#define NUM_REQS 500 /* number of requests sent by each server */
#define PAYLOAD_SZ 2048 /* size of simulated data payload, bytes */
/* model-net ID, can be either simple-net, dragonfly or torus */
static int net_id = 0;
static int num_servers = 0;
static int offset = 2;
typedef struct svr_msg svr_msg;
typedef struct svr_state svr_state;
/* types of events that will constitute triton requests */
enum svr_event
{
KICKOFF, /* initial event */
REQ, /* request event */
ACK, /* ack event */
LOCAL /* local event */
};
struct svr_state
{
int msg_sent_count; /* requests sent */
int msg_recvd_count; /* requests recvd */
int local_recvd_count; /* number of local messages received */
tw_stime start_ts; /* time that we started sending requests */
};
struct svr_msg
{
enum svr_event svr_event_type;
tw_lpid src; /* source of this request or ack */
int incremented_flag; /* helper for reverse computation */
};
static void svr_init(
svr_state * ns,
tw_lp * lp);
static void svr_event(
svr_state * ns,
tw_bf * b,
svr_msg * m,
tw_lp * lp);
static void svr_rev_event(
svr_state * ns,
tw_bf * b,
svr_msg * m,
tw_lp * lp);
static void svr_finalize(
svr_state * ns,
tw_lp * lp);
tw_lptype svr_lp = {
(init_f) svr_init,
(event_f) svr_event,
(revent_f) svr_rev_event,
(final_f) svr_finalize,
(map_f) codes_mapping,
sizeof(svr_state),
};
extern const tw_lptype* svr_get_lp_type();
static void svr_add_lp_type();
static tw_stime ns_to_s(tw_stime ns);
static tw_stime s_to_ns(tw_stime s);
static void handle_kickoff_event(
svr_state * ns,
tw_bf * b,
svr_msg * m,
tw_lp * lp);
static void handle_ack_event(
svr_state * ns,
tw_bf * b,
svr_msg * m,
tw_lp * lp);
static void handle_req_event(
svr_state * ns,
tw_bf * b,
svr_msg * m,
tw_lp * lp);
static void handle_local_event(
svr_state * ns,
tw_bf * b,
svr_msg * m,
tw_lp * lp);
static void handle_local_rev_event(
svr_state * ns,
tw_bf * b,
svr_msg * m,
tw_lp * lp);
static void handle_kickoff_rev_event(
svr_state * ns,
tw_bf * b,
svr_msg * m,
tw_lp * lp);
static void handle_ack_rev_event(
svr_state * ns,
tw_bf * b,
svr_msg * m,
tw_lp * lp);
static void handle_req_rev_event(
svr_state * ns,
tw_bf * b,
svr_msg * m,
tw_lp * lp);
const tw_optdef app_opt [] =
{
TWOPT_GROUP("Model net test case" ),
TWOPT_END()
};
int main(
int argc,
char **argv)
{
int nprocs;
int rank;
g_tw_ts_end = s_to_ns(60*60*24*365); /* one year, in nsecs */
/* ROSS initialization function calls */
tw_opt_add(app_opt);
tw_init(&argc, &argv);
if(argc < 3)
{
printf("\n Usage: mpirun <args> --sync=2/3 mapping_file_name.conf (optional --nkp) ");
MPI_Finalize();
return 0;
}
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
/* loading the config file of codes-mapping */
configuration_load(argv[2], MPI_COMM_WORLD, &config);
/* Setup the model-net parameters specified in the config file */
net_id=model_net_set_params();
/* register the server LP type (the model-net LP type is registered internally in model_net_set_params()) */
svr_add_lp_type();
/*Now setup codes mapping */
codes_mapping_setup();
/*query codes mapping API*/
num_servers = codes_mapping_get_group_reps("MODELNET_GRP") * codes_mapping_get_lp_count("MODELNET_GRP", "server");
if(net_id != SIMPLENET)
{
printf("\n The test works with simple-net configuration only! ");
MPI_Finalize();
return 0;
}
tw_run();
model_net_report_stats(net_id);
tw_end();
return 0;
}
const tw_lptype* svr_get_lp_type()
{
return(&svr_lp);
}
static void svr_add_lp_type()
{
lp_type_register("server", svr_get_lp_type());
}
static void svr_init(
svr_state * ns,
tw_lp * lp)
{
tw_event *e;
svr_msg *m;
tw_stime kickoff_time;
memset(ns, 0, sizeof(*ns));
/* each server sends a dummy event to itself that will kick off the real
* simulation
*/
/* skew each kickoff event slightly to help avoid event ties later on */
kickoff_time = g_tw_lookahead + tw_rand_unif(lp->rng);
e = codes_event_new(lp->gid, kickoff_time, lp);
m = tw_event_data(e);
m->svr_event_type = KICKOFF;
tw_event_send(e);
return;
}
static void svr_event(
svr_state * ns,
tw_bf * b,
svr_msg * m,
tw_lp * lp)
{
switch (m->svr_event_type)
{
case REQ:
handle_req_event(ns, b, m, lp);
break;
case ACK:
handle_ack_event(ns, b, m, lp);
break;
case KICKOFF:
handle_kickoff_event(ns, b, m, lp);
break;
case LOCAL:
handle_local_event(ns, b, m, lp);
break;
default:
printf("\n Invalid message type %d ", m->svr_event_type);
assert(0);
break;
}
}
static void svr_rev_event(
svr_state * ns,
tw_bf * b,
svr_msg * m,
tw_lp * lp)
{
switch (m->svr_event_type)
{
case REQ:
handle_req_rev_event(ns, b, m, lp);
break;
case ACK:
handle_ack_rev_event(ns, b, m, lp);
break;
case KICKOFF:
handle_kickoff_rev_event(ns, b, m, lp);
break;
case LOCAL:
handle_local_rev_event(ns, b, m, lp);
break;
default:
assert(0);
break;
}
return;
}
static void svr_finalize(
svr_state * ns,
tw_lp * lp)
{
printf("server %llu recvd %d bytes in %f seconds, %f MiB/s sent_count %d recvd_count %d local_count %d \n", (unsigned long long)lp->gid, PAYLOAD_SZ*ns->msg_recvd_count, ns_to_s((tw_now(lp)-ns->start_ts)),
((double)(PAYLOAD_SZ*NUM_REQS)/(double)(1024*1024)/ns_to_s(tw_now(lp)-ns->start_ts)), ns->msg_sent_count, ns->msg_recvd_count, ns->local_recvd_count);
return;
}
/* convert ns to seconds */
static tw_stime ns_to_s(tw_stime ns)
{
return(ns / (1000.0 * 1000.0 * 1000.0));
}
/* convert seconds to ns */
static tw_stime s_to_ns(tw_stime s)
{
return(s * (1000.0 * 1000.0 * 1000.0));
}
/* handle initial event */
static void handle_kickoff_event(
svr_state * ns,
tw_bf * b,
svr_msg * m,
tw_lp * lp)
{
svr_msg * m_local = malloc(sizeof(svr_msg));
svr_msg * m_remote = malloc(sizeof(svr_msg));
/* we allocate a local message and a remote message both */
m_local->svr_event_type = LOCAL;
m_local->src = lp->gid;
memcpy(m_remote, m_local, sizeof(svr_msg));
m_remote->svr_event_type = REQ;
/* record when transfers started on this server */
ns->start_ts = tw_now(lp);
/* each server sends a request to the next highest server */
int dest_id = (lp->gid + offset)%(num_servers*2);
/*model-net needs to know about (1) higher-level destination LP which is a neighboring server in this case
* (2) struct and size of remote message and (3) struct and size of local message (a local message can be null) */
model_net_event(net_id, "test", dest_id, PAYLOAD_SZ, sizeof(svr_msg), (const void*)m_remote, sizeof(svr_msg), (const void*)m_local, lp);
ns->msg_sent_count++;
}
static void handle_local_event(
svr_state * ns,
tw_bf * b,
svr_msg * m,
tw_lp * lp)
{
ns->local_recvd_count++;
}
static void handle_local_rev_event(
svr_state * ns,
tw_bf * b,
svr_msg * m,
tw_lp * lp)
{
ns->local_recvd_count--;
}
/* reverse handler for req event */
static void handle_req_rev_event(
svr_state * ns,
tw_bf * b,
svr_msg * m,
tw_lp * lp)
{
ns->msg_recvd_count--;
model_net_event_rc(net_id, lp, PAYLOAD_SZ);
return;
}
/* reverse handler for kickoff */
static void handle_kickoff_rev_event(
svr_state * ns,
tw_bf * b,
svr_msg * m,
tw_lp * lp)
{
ns->msg_sent_count--;
model_net_event_rc(net_id, lp, PAYLOAD_SZ);
return;
}
/* reverse handler for ack*/
static void handle_ack_rev_event(
svr_state * ns,
tw_bf * b,
svr_msg * m,
tw_lp * lp)
{
if(m->incremented_flag)
{
model_net_event_rc(net_id, lp, PAYLOAD_SZ);
ns->msg_sent_count--;
}
return;
}
/* handle receiving ack */
static void handle_ack_event(
svr_state * ns,
tw_bf * b,
svr_msg * m,
tw_lp * lp)
{
svr_msg * m_local = malloc(sizeof(svr_msg));
svr_msg * m_remote = malloc(sizeof(svr_msg));
m_local->svr_event_type = LOCAL;
m_local->src = lp->gid;
memcpy(m_remote, m_local, sizeof(svr_msg));
m_remote->svr_event_type = REQ;
/* safety check that this ack came from the server we sent our request to */
assert(m->src == (lp->gid + offset)%(num_servers*2));
if(ns->msg_sent_count < NUM_REQS)
{
/* send another request */
model_net_event(net_id, "test", m->src, PAYLOAD_SZ, sizeof(svr_msg), (const void*)m_remote, sizeof(svr_msg), (const void*)m_local, lp);
ns->msg_sent_count++;
m->incremented_flag = 1;
}
else
{
/* threshold count reached, stop sending messages */
m->incremented_flag = 0;
}
return;
}
/* handle receiving request */
static void handle_req_event(
svr_state * ns,
tw_bf * b,
svr_msg * m,
tw_lp * lp)
{
svr_msg * m_local = malloc(sizeof(svr_msg));
svr_msg * m_remote = malloc(sizeof(svr_msg));
m_local->svr_event_type = LOCAL;
m_local->src = lp->gid;
memcpy(m_remote, m_local, sizeof(svr_msg));
m_remote->svr_event_type = ACK;
/* safety check that this request got to the right server */