Commit 592f7e82 authored by Misbah Mubarak

Updating instructions for running DUMPI network traces on CODES

parent 9e8d29e4
@@ -4,10 +4,12 @@
node. This traffic pattern is uniformly distributed throughout the
network and gives a better performance with minimal routing as compared
to non-minimal or adaptive routing.
- Nearest group traffic: with minimal routing, it sends traffic to the single global channel connecting two groups (it congests the network when using minimal routing).
This pattern performs better with non-minimal and adaptive routing
algorithms.
- Nearest neighbor traffic: it sends traffic to the next node, potentially connected to the same router.
- Nearest group traffic: with minimal routing, it sends traffic to the
single global channel connecting two groups (it congests the network when
using minimal routing). This pattern performs better with non-minimal
and adaptive routing algorithms.
- Nearest neighbor traffic: it sends traffic to the next node, potentially
connected to the same router.
SAMPLING:
- The modelnet_enable_sampling function takes a sampling interval "t" and
@@ -48,7 +50,8 @@ cause congestion in the network).
num_msgs: number of messages generated per terminal. Each message has a size of
2048 bytes. By default, 20 messages per terminal are generated.
traffic: 1 for uniform random traffic, 2 for nearest group traffic and 3 for nearest neighbor traffic.
traffic: 1 for uniform random traffic, 2 for nearest group traffic and 3 for
nearest neighbor traffic.
lp-io-dir: generates network traffic information on dragonfly terminals and
routers. Here is information on individual files:
......
@@ -6,10 +6,10 @@
../configure --enable-test --disable-shared --prefix=/home/mubarm/dumpi/dumpi/install CC=mpicc CXX=mpicxx
2- Configure codes-base with DUMPI. Make sure the CC environment variable
2- Configure codes with DUMPI. Make sure the CC environment variable
refers to an MPI compiler
./configure --with-ross=/path/to/ross/install --with-dumpi=/path/to/dumpi/install
./configure PKG_CONFIG_PATH=$PATH --with-dumpi=/path/to/dumpi/install
--prefix=/path/to/codes-base/install CC=mpicc
3- Build codes-base (See codes-base INSTALL for instructions on building codes-base with dumpi)
@@ -22,53 +22,42 @@
http://portal.nersc.gov/project/CAL/designforward.htm
----------------- RUNNING CODES NETWORK WORKLOAD TEST PROGRAM -----------------------
6- Download and untar the DUMPI AMG application trace for 27 MPI ranks using the following download link:
----------------- RUNNING CODES MPI SIMULATION LAYER -----------------------
6- Download and untar the DUMPI AMG application trace for 1728 MPI ranks using the following download link:
wget http://portal.nersc.gov/project/CAL/doe-miniapps-mpi-traces/AMG/df_AMG_n27_dumpi.tar.gz
wget http://portal.nersc.gov/project/CAL/doe-miniapps-mpi-traces/AMG/df_AMG_n1728_dumpi.tar.gz
7- Run the test program for codes-nw-workload using:
8- Configure model-net config file (For this example config file is available at
src/network-workloads/conf/modelnet-mpi-test-dfly-amg-1728.conf)
mpirun -np 4 ./src/models/mpi-trace-replay/model-net-dumpi-traces-dump --sync=3 --workload_type=dumpi --workload_file=/home/mubarm/df_traces/df_AMG_n27_dumpi/dumpi-2014.03.03.14.55.00- -- ../src/models/mpi-trace-replay/conf/modelnet-mpi-test.conf
The program shows the number of sends, receives, collectives and wait operations in the DUMPI trace log.
Note: If using a different DUMPI trace file, make sure to update the modelnet-mpi-test.conf file in the config directory.
----------------- RUNNING MODEL-NET WITH CODES NW WORKLOADS -----------------------------
8- Configure model-net using its config file (Example .conf files available at src/models/mpi-trace-replay/)
Make sure the number of nw-lp and model-net LP are the same in the config file.
9- Run the DUMPI trace replay simulation on top of model-net using:
(/dumpi-2014-04-05.22.12.17.37- is the prefix of the DUMPI trace files.
We omit the trailing 4-digit suffix of the DUMPI trace file names).
9- From the main source directory of codes-net, run the DUMPI trace replay simulation on top of
model-net using the command below (/dumpi-2014-04-05.22.12.17.37- is the common prefix of all the
DUMPI trace files. We omit the trailing 4-digit suffix of the DUMPI trace file names).
./src/models/mpi-trace-replay/model-net-mpi-wrklds --sync=1 --workload_file=/path/to/dumpi/trace/directory/dumpi-2014-04-05.22.12.17.37- - --workload_type="dumpi" -- src/models/mpi-trace-replay/conf/modelnet-mpi-test.conf
./src/network-workloads//model-net-mpi-replay --sync=1
--num_net_traces=1728 --workload_file=/path/to/dumpi/trace/directory/dumpi-2014.03.03.15.09.03-
--workload_type="dumpi" --lp-io-dir=amg-1728-trace --lp-io-use-suffix=1
-- ../src/network-workloads/conf/modelnet-mpi-test-dfly-amg-1728.conf
The simulation runs in ROSS serial, conservative and optimistic modes.
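The prefix passed to --workload_file can be derived from any one trace file rather than typed by hand. The following is a minimal sketch, not part of CODES: trace_prefix is a hypothetical helper, and it assumes the trace files are named <prefix>NNNN.bin (for example dumpi-2014.03.03.15.09.03-0000.bin), which may not match every DUMPI installation.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with CODES): print the --workload_file
# prefix for a given trace file.
# Assumption: trace files are named <prefix>NNNN.bin, where NNNN is a
# 4-digit number.
trace_prefix() {
    path=${1%.bin}                                # drop the assumed .bin extension
    printf '%s\n' "${path%[0-9][0-9][0-9][0-9]}"  # drop the trailing 4 digits
}
```

For example, trace_prefix /path/to/dumpi/trace/directory/dumpi-2014.03.03.15.09.03-0000.bin prints the directory path followed by dumpi-2014.03.03.15.09.03-, which is the form expected by --workload_file.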
10- Some example runs with small-scale traces
(i) AMG 8 MPI tasks http://portal.nersc.gov/project/CAL/designforward.htm#AMG
** Torus network model
mpirun -np 4 ./src/models/mpi-trace-replay/model-net-mpi-wrklds --sync=3 --extramem=962144 --workload_file=/home/mubarm/dumpi/df_AMG_n27_dumpi/dumpi-2014.03.03.14.12.46- --workload_type="dumpi" --batch=2 --gvt-interval=2 --num_net_traces=27 -- tests/conf/modelnet-mpi-test-torus.conf
** Simplenet network model
Note: Dragonfly and torus networks may have more nodes in the network than the number of network traces (some network nodes only pass messages and do not load any traces). That is why the --num_net_traces argument is used to specify the exact number of traces available in the DUMPI directory when there is a mismatch between the number of network nodes and traces.
mpirun -np 8 ./src/models/mpi-trace-replay/model-net-mpi-wrklds --sync=3 --workload_file=/home/mubarm/dumpi/df_AMG_n27_dumpi/dumpi-2014.03.03.14.12.46- --workload_type="dumpi" --batch=2 --gvt-interval=2 -- tests/conf/modelnet-mpi-test.conf
10- Running the simulation in optimistic mode
mpirun -np 4 ./src/network-workloads//model-net-mpi-replay
--batch=32 --gvt-interval=128 --sync=3
--num_net_traces=13824 --workload_type=dumpi --lp-io-dir=amg_1728-trace
--lp-io-use-suffix=1
--workload_file=/projects/radix-io/mubarak/df_traces/directory/dumpi-2014.03.03.15.09.03-
-- src/network-workloads//conf/modelnet-mpi-test-dfly-amg-1728.conf
** Dragonfly network model
mpirun -np 8 ./src/models/mpi-trace-replay/model-net-mpi-wrklds --sync=3 --extramem=2962144 --workload_file=/home/mubarm/dumpi/df_AMG_n27_dumpi/dumpi-2014.03.03.14.12.46- --workload_type="dumpi" --batch=2 --gvt-interval=2 --num_net_traces=27 -- src/models/mpi-trace-replay//conf/modelnet-mpi-test-dragonfly.conf
Note: Dragonfly and torus networks may have more nodes in the network than the number of network traces (some network nodes only pass messages and do not load any traces). That is why the --num_net_traces argument is used to specify the exact number of traces available in the DUMPI directory when there is a mismatch between the number of network nodes and traces.
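One way to obtain the value for --num_net_traces is to count the trace files that share the workload prefix. The sketch below is an illustration only: count_traces is a hypothetical helper, and it assumes one file per trace named <prefix>NNNN.bin, so the pattern may need adjusting for a different naming scheme.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with CODES): count the trace files that
# start with the given prefix.
# Assumption: one file per trace, named <prefix>NNNN.bin.
count_traces() {
    n=0
    for f in "$1"[0-9][0-9][0-9][0-9].bin; do
        # If nothing matches, the pattern is left unexpanded and -e fails.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    printf '%s\n' "$n"
}
```

The result can then be substituted on the command line, e.g. --num_net_traces=$(count_traces /path/to/dumpi/trace/directory/dumpi-2014.03.03.15.09.03-).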
---------------- Running Test Program (needs update) --------------------------
11- Run the test program for codes-nw-workload using:
(ii) Crystal router 10 MPI tasks http://portal.nersc.gov/project/CAL/designforward.htm#CrystalRouter
mpirun -np 4 ./src/models/mpi-trace-replay/model-net-dumpi-traces-dump --sync=3 --workload_type=dumpi --workload_file=/home/mubarm/df_traces/df_AMG_n27_dumpi/dumpi-2014.03.03.14.55.00- -- ../src/models/mpi-trace-replay/conf/modelnet-mpi-test.conf
** Simple-net network model
mpirun -np 10 ./src/models/mpi-trace-replay/model-net-mpi-wrklds --sync=3 --extramem=185536 --workload_file=/home/mubarm/dumpi/cry_router/dumpi--2014.04.23.12.08.27- --workload_type="dumpi" -- src/models/mpi-trace-replay/conf/modelnet-mpi-test-cry-router.conf
The program shows the number of sends, receives, collectives and wait operations in the DUMPI trace log.
(iii) MiniFE 18 MPI tasks http://portal.nersc.gov/project/CAL/designforward.htm#MiniFE
Note: If using a different DUMPI trace file, make sure to update the modelnet-mpi-test.conf file in the config directory.
** Simple-net network model
mpirun -np 18 ./src/models/mpi-trace-replay/model-net-mpi-wrklds --sync=3 --extramem=6185536 --workload_file=/home/mubarm/dumpi/dumpi_data_18/dumpi-2014.04.22.12.17.37- --workload_type="dumpi" -- src/models/mpi-trace-replay/conf/modelnet-mpi-test-mini-fe.conf
@@ -24,5 +24,5 @@ PARAMS
global_bandwidth="4.7";
cn_bandwidth="5.25";
message_size="560";
routing="minimal";
routing="adaptive";
}