Commit 41431554 authored by Misbah Mubarak

(i) more text on how to run dragonfly with network traces (ii) some stat reporting fixes with dragonfly (iii) modifying config file error handling for dragonfly/torus (iv) modified dragonfly reverse handler for out of order packet handling, now it deletes the hash entry that it created in forward mode
parent 83c55775
......@@ -14,9 +14,8 @@ PARAMS
modelnet_order=( "dragonfly" );
# scheduler options
modelnet_scheduler="fcfs";
chunk_size="512";
chunk_size="256";
# modelnet_scheduler="round-robin";
num_vcs="1";
num_routers="4";
local_vc_size="16384";
global_vc_size="32768";
......
......@@ -20,6 +20,5 @@ PARAMS
dim_length="4,4,2";
link_bandwidth="10.0";
buffer_size="8192";
num_vc="1";
chunk_size="512";
chunk_size="256";
}
......@@ -69,11 +69,22 @@ PARAMS
....
}
The first section, MODELNET_GRP, specifies the number of LPs and the layout of LPs. In the above
case, there are 264 repetitions of 4 server LPs, 4 dragonfly network node LPs and 1 dragonfly
router LP, which makes a total of 264 routers, 1056 network nodes and 1056 servers in the network.
The second section, PARAMS, uses 'num_routers' to lay out the dragonfly topology and set up the
connections between routers, network nodes and the servers.
The first section, MODELNET_GRP, specifies the number of LPs and the layout of
LPs. In the above case, there are 264 repetitions of 4 server LPs, 4 dragonfly
network node LPs and 1 dragonfly router LP, which makes a total of 264 routers,
1056 network nodes and 1056 servers in the network. The second section, PARAMS,
uses 'num_routers' to lay out the dragonfly topology and set up the
connections between routers, network nodes and the servers. The num_routers
value in PARAMS controls the topology layout and must be consistent with the
modelnet_dragonfly and dragonfly_router counts in the first section
(MODELNET_GRP). Following Kim and Dally's a=2p=2h configuration (as given above
in Section 1), num_routers=8 means that each dragonfly group has a=8 routers
and each router has p=4 terminals and h=4 global channels. This makes the total
number of dragonfly groups g = a * h + 1 = 8 * 4 + 1 = 33 and the number of
network nodes N = p * a * g = 4 * 8 * 33 = 1056. A correct dragonfly config
file must specify the same number of network nodes and routers in the
MODELNET_GRP section. If there is a mismatch between num_routers in PARAMS and
the MODELNET_GRP values, an error message is displayed.
Some other dragonfly specific parameters in the PARAMS section are
......@@ -109,7 +120,68 @@ ROSS optimistic mode:
mpirun -np 8 tests/modelnet-test --sync=3 -- tests/conf/modelnet-test-dragonfly.conf
4- Performance optimization tips for ROSS dragonfly model
4- Running dragonfly model with DUMPI application traces
- codes-base needs to be configured with DUMPI. See
codes-base/doc/GETTING_STARTED on how to configure codes-base with DUMPI
5- Performance optimization tips for ROSS dragonfly model
- For large-scale dragonfly runs, the model runs significantly faster in optimistic mode than in conservative mode.
- For running large-scale synthetic traffic workloads, see
codes-net/src/models/network-workloads/README_synthetic.txt
- Design forward network traces are available at:
http://portal.nersc.gov/project/CAL/designforward.htm
For illustration purposes, we use the AMG network trace with 27 MPI processes
available for download at:
http://portal.nersc.gov/project/CAL/doe-miniapps-mpi-traces/AMG/df_AMG_n27_dumpi.tar.gz
- Note on trace reading - the input file prefix to the dumpi workload generator
should be everything up to the rank number. E.g., if the dumpi files are of the
form "dumpi-YYYY.MM.DD.HH.MM.SS-XXXX.bin", then the input should be
"dumpi-YYYY.MM.DD.HH.MM.SS-"
- Example dragonfly model config file with network traces can be found at:
src/models/network-workloads/conf/modelnet-mpi-test-dragonfly.conf
The routing algorithm can be set to adaptive, nonminimal, minimal or
prog-adaptive.
- Running CODES dragonfly model with AMG 27 rank trace in optimistic mode:
mpirun -np 4 ./src/models/network-workloads/model-net-mpi-replay --sync=3
--batch=2 --disable_compute=1 --workload_type="dumpi"
--workload_file=../../df_traces/AMG/df_AMG_n27_dumpi/dumpi-2014.03.03.14.55.00-
--num_net_traces=27 --
../src/models/network-workloads/conf/modelnet-mpi-test-dragonfly.conf
[batch is a ROSS-specific parameter that specifies the number of iterations the
simulation must process before checking the top event scheduling loop for
anti-messages. A smaller batch size comes with fewer rollbacks. The GVT
synchronization is done after every batch*gvt-interval epochs (gvt-interval
is 16 by default).
num_net_traces is the number of MPI processes to be simulated from the trace
file. With the torus and dragonfly networks, the number of simulated network
nodes may not exactly match the number of MPI processes. This is because the
number of simulated network nodes grows in fixed increments: in a dragonfly,
the number of routers defines the network size; in a torus, the number of
dimensions and the dimension lengths define the number of network nodes.
Because of this mismatch, we must ensure that the number of network nodes in
the config file is equal to or greater than the number of MPI processes to be
simulated from the trace.
disable_compute is an optional parameter which, if set, makes the
simulation disregard the compute times from the MPI traces.]
- Running CODES dragonfly model with AMG application trace, 27 ranks in serial
mode:
./src/models/network-workloads/model-net-mpi-replay --sync=1
--workload_type="dumpi"
--workload_file=../../df_traces/AMG/df_AMG_n27_dumpi/dumpi-2014.03.03.14.55.00-
--num_net_traces=27 --disable_compute=1 --
../src/models/network-workloads/conf/modelnet-mpi-test-dragonfly.conf
......@@ -85,7 +85,7 @@ mpirun -np 4 ./tests/modelnet-test --sync=3 -- tests/conf/modelnet-test-torus.co
- Running CODES torus model with AMG 27 rank trace in optimistic mode:
mpirun -np 4 ./src/models/network-workloads/model-net-mpi-replay --sync=3
--batch=2 --workload_type="dumpi" --num_net_traces=27
--batch=2 --workload_type="dumpi" --num_net_traces=27 --disable_compute=1
--workload_file=../../df_traces/AMG/df_AMG_n27_dumpi/dumpi-2014.03.03.14.55.00-
-- ../src/models/network-workloads/conf/modelnet-mpi-test-torus.conf
......@@ -102,14 +102,17 @@ mpirun -np 4 ./tests/modelnet-test --sync=3 -- tests/conf/modelnet-test-torus.co
of routers in a dragonfly define the network size and the number of
dimensions, dimension length defines the network nodes in the torus. Due to
this mismatch, we must ensure that the network nodes in the config file are
equal to or greater than the MPI processes to be simulated from the trace.]
equal to or greater than the MPI processes to be simulated from the trace.
disable_compute is an optional parameter which if set, will make the
simulation disregard the compute times from the MPI traces. ]
- Running CODES torus model with AMG application trace, 27 ranks in serial
mode:
./src/models/network-workloads/model-net-mpi-replay --sync=1
--workload_type=dumpi
--workload_type=dumpi --disable_compute=1
--workload_file=../../df_traces/AMG/df_AMG_n27_dumpi/dumpi-2014.03.03.14.55.00-
--num_net_traces=27 --
../src/models/network-workloads/conf/modelnet-mpi-test-torus.conf
......
......@@ -10,6 +10,7 @@
#include <ross.h>
#define DEBUG_LP 892
#include "codes/jenkins-hash.h"
#include "codes/codes_mapping.h"
#include "codes/codes.h"
......@@ -35,8 +36,8 @@
#define DFLY_HASH_TABLE_SIZE 262144
// debugging parameters
#define TRACK 2
#define TRACK_PKT 45543
#define TRACK -1
#define TRACK_PKT -1
#define TRACK_MSG -1
#define PRINT_ROUTER_TABLE 1
#define DEBUG 0
......@@ -305,7 +306,7 @@ static int dragonfly_rank_hash_compare(
void *key, struct qhash_head *link)
{
struct dfly_hash_key *message_key = (struct dfly_hash_key *)key;
struct dfly_qhash_entry *tmp;
struct dfly_qhash_entry *tmp = NULL;
tmp = qhash_entry(link, struct dfly_qhash_entry, hash_link);
......@@ -317,10 +318,14 @@ static int dragonfly_rank_hash_compare(
}
static int dragonfly_hash_func(void *k, int table_size)
{
struct dfly_hash_key *tmp = (struct dfly_hash_key *)k;
uint32_t pc = 0, pb = 0;
bj_hashlittle2(tmp, sizeof(*tmp), &pc, &pb);
return (int)(pc % (uint32_t)(table_size - 1));
struct dfly_hash_key *tmp = (struct dfly_hash_key *)k;
//uint32_t pc = 0, pb = 0;
//bj_hashlittle2(tmp, sizeof(*tmp), &pc, &pb);
uint64_t key = (~tmp->message_id) + (tmp->message_id << 18);
key = key * 21;
key = ~key ^ (tmp->sender_id >> 4);
key = key * tmp->sender_id;
return (int)(key & (table_size - 1));
}
/* convert GiB/s and bytes to ns */
......@@ -433,9 +438,9 @@ static void dragonfly_read_config(const char * anno, dragonfly_param *params){
// shorthand
dragonfly_param *p = params;
configuration_get_value_int(&config, "PARAMS", "num_routers", anno,
int rc = configuration_get_value_int(&config, "PARAMS", "num_routers", anno,
&p->num_routers);
if(p->num_routers <= 0) {
if(rc) {
p->num_routers = 4;
fprintf(stderr, "Number of routers not specified, setting to %d\n",
p->num_routers);
......@@ -443,44 +448,44 @@ static void dragonfly_read_config(const char * anno, dragonfly_param *params){
p->num_vcs = 3;
configuration_get_value_int(&config, "PARAMS", "local_vc_size", anno, &p->local_vc_size);
if(!p->local_vc_size) {
rc = configuration_get_value_int(&config, "PARAMS", "local_vc_size", anno, &p->local_vc_size);
if(rc) {
p->local_vc_size = 1024;
fprintf(stderr, "Buffer size of local channels not specified, setting to %d\n", p->local_vc_size);
}
configuration_get_value_int(&config, "PARAMS", "global_vc_size", anno, &p->global_vc_size);
if(!p->global_vc_size) {
rc = configuration_get_value_int(&config, "PARAMS", "global_vc_size", anno, &p->global_vc_size);
if(rc) {
p->global_vc_size = 2048;
fprintf(stderr, "Buffer size of global channels not specified, setting to %d\n", p->global_vc_size);
}
configuration_get_value_int(&config, "PARAMS", "cn_vc_size", anno, &p->cn_vc_size);
if(!p->cn_vc_size) {
rc = configuration_get_value_int(&config, "PARAMS", "cn_vc_size", anno, &p->cn_vc_size);
if(rc) {
p->cn_vc_size = 1024;
fprintf(stderr, "Buffer size of compute node channels not specified, setting to %d\n", p->cn_vc_size);
}
configuration_get_value_int(&config, "PARAMS", "chunk_size", anno, &p->chunk_size);
if(!p->chunk_size) {
rc = configuration_get_value_int(&config, "PARAMS", "chunk_size", anno, &p->chunk_size);
if(rc) {
p->chunk_size = 512;
fprintf(stderr, "Chunk size for packets is not specified, setting to %d\n", p->chunk_size);
}
configuration_get_value_double(&config, "PARAMS", "local_bandwidth", anno, &p->local_bandwidth);
if(!p->local_bandwidth) {
rc = configuration_get_value_double(&config, "PARAMS", "local_bandwidth", anno, &p->local_bandwidth);
if(rc) {
p->local_bandwidth = 5.25;
fprintf(stderr, "Bandwidth of local channels not specified, setting to %lf\n", p->local_bandwidth);
}
configuration_get_value_double(&config, "PARAMS", "global_bandwidth", anno, &p->global_bandwidth);
if(!p->global_bandwidth) {
rc = configuration_get_value_double(&config, "PARAMS", "global_bandwidth", anno, &p->global_bandwidth);
if(rc) {
p->global_bandwidth = 4.7;
fprintf(stderr, "Bandwidth of global channels not specified, setting to %lf\n", p->global_bandwidth);
}
configuration_get_value_double(&config, "PARAMS", "cn_bandwidth", anno, &p->cn_bandwidth);
if(!p->cn_bandwidth) {
rc = configuration_get_value_double(&config, "PARAMS", "cn_bandwidth", anno, &p->cn_bandwidth);
if(rc) {
p->cn_bandwidth = 5.25;
fprintf(stderr, "Bandwidth of compute node channels not specified, setting to %lf\n", p->cn_bandwidth);
}
......@@ -575,9 +580,10 @@ static void dragonfly_report_stats()
{
printf(" Average number of hops traversed %f average chunk latency %lf us maximum chunk latency %lf us avg message size %lf bytes finished messages %ld finished chunks %ld \n", (float)avg_hops/total_finished_chunks, avg_time/(total_finished_chunks*1000), max_time/1000, (float)final_msg_sz/total_finished_msgs, total_finished_msgs, total_finished_chunks);
if(routing == ADAPTIVE || routing == PROG_ADAPTIVE)
printf("\n ADAPTIVE ROUTING STATS: %d chunks routed minimally %d chunks routed non-minimally completed packets %lld ", total_minimal_packets, total_nonmin_packets, total_finished_chunks);
printf("\n ADAPTIVE ROUTING STATS: %d chunks routed minimally %d chunks routed non-minimally completed packets %lld \n",
total_minimal_packets, total_nonmin_packets, total_finished_chunks);
printf("\n Total packets generated %ld finished %ld ", total_gen, total_fin);
printf("\n Total packets generated %ld finished %ld \n", total_gen, total_fin);
}
return;
}
......@@ -1161,11 +1167,11 @@ void packet_send_rc(terminal_state * s, tw_bf * bf, terminal_message * msg,
if(bf->c4) {
s->in_send_loop = 1;
}
/*if(bf->c5)
if(bf->c5)
{
codes_local_latency_reverse(lp);
s->issueIdle = 1;
}*/
}
return;
}
/* sends the packet from the current dragonfly compute node to the attached router */
......@@ -1260,11 +1266,11 @@ void packet_send(terminal_state * s, tw_bf * bf, terminal_message * msg,
bf->c4 = 1;
s->in_send_loop = 0;
}
/*if(s->issueIdle) {
if(s->issueIdle) {
bf->c5 = 1;
s->issueIdle = 0;
model_net_method_idle_event(codes_local_latency(lp), 0, lp);
}*/
}
return;
}
......@@ -1315,15 +1321,13 @@ void packet_arrive_rc(terminal_state * s, tw_bf * bf, terminal_message * msg, tw
if(bf->c7)
{
if(hash_link)
printf("\n Num chunks %d ", tmp->num_chunks);
//assert(!hash_link);
assert(!hash_link);
N_finished_msgs--;
s->finished_msgs--;
total_msg_sz -= msg->total_size;
s->total_msg_size -= msg->total_size;
struct dfly_qhash_entry * d_entry_pop = rc_stack_pop(s->st);
struct dfly_qhash_entry * d_entry_pop = rc_stack_pop(s->st);
qhash_add(s->rank_tbl, &key, &(d_entry_pop->hash_link));
s->rank_tbl_pop++;
......@@ -1337,6 +1341,13 @@ void packet_arrive_rc(terminal_state * s, tw_bf * bf, terminal_message * msg, tw
assert(tmp);
tmp->num_chunks--;
if(bf->c5)
{
assert(hash_link);
qhash_del(hash_link);
free(tmp->remote_event_data);
free(tmp);
}
return;
}
void send_remote_event(terminal_state * s, terminal_message * msg, tw_lp * lp, tw_bf * bf, char * event_data, int remote_event_size)
......@@ -1395,7 +1406,7 @@ void packet_arrive(terminal_state * s, tw_bf * bf, terminal_message * msg,
if(tmp)
{
if(tmp->num_chunks >= total_chunks)
if(tmp->num_chunks >= total_chunks || tmp->num_chunks == 0)
{
tw_output(lp, "\n invalid number of chunks %d for LP %ld ", tmp->num_chunks, lp->gid);
tw_lp_suspend(lp, 0, 0);
......@@ -1503,7 +1514,6 @@ void packet_arrive(terminal_state * s, tw_bf * bf, terminal_message * msg,
assert(tmp);
tmp->num_chunks++;
/* if its the last chunk of the packet then handle the remote event data */
if(msg->chunk_id == num_chunks - 1)
{
......@@ -1531,7 +1541,6 @@ void packet_arrive(terminal_state * s, tw_bf * bf, terminal_message * msg,
* callee*/
assert(tmp->num_chunks <= total_chunks);
if(tmp->num_chunks == total_chunks)
{
bf->c7 = 1;
......
......@@ -282,46 +282,41 @@ static void torus_read_config(
// shorthand
torus_param *p = params;
configuration_get_value_int(&config, "PARAMS", "n_dims", anno, &p->n_dims);
if(!p->n_dims) {
int rc = configuration_get_value_int(&config, "PARAMS", "n_dims", anno, &p->n_dims);
if(rc) {
p->n_dims = 4; /* a 4-D torus */
fprintf(stderr,
"Warning: Number of dimensions not specified, setting to %d\n",
p->n_dims);
}
configuration_get_value_double(&config, "PARAMS", "link_bandwidth", anno,
rc = configuration_get_value_double(&config, "PARAMS", "link_bandwidth", anno,
&p->link_bandwidth);
if(!p->link_bandwidth) {
if(rc) {
p->link_bandwidth = 2.0; /*default bg/q configuration */
fprintf(stderr, "Link bandwidth not specified, setting to %lf\n",
p->link_bandwidth);
}
configuration_get_value_int(&config, "PARAMS", "buffer_size", anno, &p->buffer_size);
if(!p->buffer_size) {
rc = configuration_get_value_int(&config, "PARAMS", "buffer_size", anno, &p->buffer_size);
if(rc) {
p->buffer_size = 2048;
fprintf(stderr, "Buffer size not specified, setting to %d\n",
p->buffer_size);
}
configuration_get_value_int(&config, "PARAMS", "chunk_size", anno, &p->chunk_size);
if(!p->chunk_size) {
p->chunk_size = 32;
rc = configuration_get_value_int(&config, "PARAMS", "chunk_size", anno, &p->chunk_size);
if(rc) {
p->chunk_size = 128;
fprintf(stderr, "Warning: Chunk size not specified, setting to %d\n",
p->chunk_size);
}
configuration_get_value_int(&config, "PARAMS", "num_vc", anno, &p->num_vc);
if(!p->num_vc) {
/* by default, we have one for taking packets,
* another for taking credit*/
p->num_vc = 1;
fprintf(stderr, "Warning: num_vc not specified, setting to %d\n",
p->num_vc);
}
int rc = configuration_get_value(&config, "PARAMS", "dim_length", anno,
rc = configuration_get_value(&config, "PARAMS", "dim_length", anno,
dim_length_str, MAX_NAME_LENGTH);
if (rc == 0){
tw_error(TW_LOC, "couldn't read PARAMS:dim_length");
......