diff --git a/src/network-workloads/README_synthetic.txt b/src/network-workloads/README_synthetic.txt
index 0b9d5426c2e8860fc32ec846e598288a4dd918be..eacec5e534b979ace3095d73ef3df97823f59d8a 100644
--- a/src/network-workloads/README_synthetic.txt
+++ b/src/network-workloads/README_synthetic.txt
@@ -4,10 +4,12 @@ node. This traffic pattern is uniformly distributed throughout
  the network and gives a better performance with minimal routing as compared to non-minimal or adaptive routing.
-  - Nearest group traffic: with minimal routing, it sends traffic to the single global channel connecting two groups (it congests the network when using minimal routing).
-    This pattern performs better with non-minimal and adaptive routing
-    algorithms.
-  - Nearest neighbor traffic: it sends traffic to the next node, potentially connected to the same router.
+  - Nearest group traffic: with minimal routing, it sends traffic to the
+    single global channel connecting two groups (it congests the network when
+    using minimal routing). This pattern performs better with non-minimal
+    and adaptive routing algorithms.
+  - Nearest neighbor traffic: it sends traffic to the next node, potentially
+    connected to the same router.

 SAMPLING:
   - The modelnet_enable_sampling function takes a sampling interval "t" and
@@ -48,7 +50,8 @@ cause congestion in the network).

 num_msgs: number of messages generated per terminal. Each message has a size of
 2048 bytes. By default, 20 messages per terminal are generated.

-traffic: 1 for uniform random traffic, 2 for nearest group traffic and 3 for nearest neighbor traffic.
+traffic: 1 for uniform random traffic, 2 for nearest group traffic and 3 for
+nearest neighbor traffic.

 lp-io-dir: generates network traffic information on dragonfly terminals and routers.
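The three synthetic traffic patterns differ only in how each terminal picks a destination. The bash function below is a hypothetical sketch of that choice, not the CODES source: it assumes terminals are numbered 0..NUM_NODES-1 and laid out contiguously, NODES_PER_GROUP terminals per dragonfly group, and the name pick_dest and its arguments are made up for illustration. Only the meaning of the traffic values (1, 2, 3) comes from the README.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not the CODES source): pick a destination terminal
# for each synthetic traffic pattern. Assumes terminals are numbered
# 0..NUM_NODES-1 and laid out contiguously, NODES_PER_GROUP per group.
#
# pick_dest TRAFFIC SRC NUM_NODES NODES_PER_GROUP
#   TRAFFIC: 1 = uniform random, 2 = nearest group, 3 = nearest neighbor
pick_dest() {
  local traffic=$1 src=$2 num_nodes=$3 nodes_per_group=$4
  case "$traffic" in
    1) # Uniform random: any terminal except the sender. Adding an offset
       # in 1..NUM_NODES-1 guarantees dest != src. $RANDOM tops out at
       # 32767, so this sketch only suits small networks.
       [ "$num_nodes" -ge 2 ] || return 1
       echo $(( (src + 1 + RANDOM % (num_nodes - 1)) % num_nodes )) ;;
    2) # Nearest group: the same slot in the next group, so all traffic
       # leaving a group targets one neighboring group.
       echo $(( (src + nodes_per_group) % num_nodes )) ;;
    3) # Nearest neighbor: the next terminal, often on the same router.
       echo $(( (src + 1) % num_nodes )) ;;
    *) echo "unknown traffic type: $traffic" >&2
       return 1 ;;
  esac
}
```

With 16 terminals in groups of 4, `pick_dest 2 5 16 4` prints 9 and `pick_dest 3 15 16 4` prints 0. Under minimal routing, pattern 2 funnels every message from a group onto the one global channel toward the next group, which is why the README recommends non-minimal or adaptive routing for it.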
diff --git a/src/network-workloads/README_traces.txt b/src/network-workloads/README_traces.txt
index 9b0a40f845b58bea7853101bb03da2534ca1936b..41d1c80cce324f3fb237406410cfbbb87fdeebdc 100644
--- a/src/network-workloads/README_traces.txt
+++ b/src/network-workloads/README_traces.txt
@@ -6,10 +6,10 @@
  ../configure --enable-test --disable-shared --prefix=/home/mubarm/dumpi/dumpi/install CC=mpicc CXX=mpicxx

-2- Configure codes-base with DUMPI. Make sure the CC environment variable
+2- Configure codes with DUMPI. Make sure the CC environment variable
 refers to a MPI compiler

- ./configure --with-ross=/path/to/ross/install --with-dumpi=/path/to/dumpi/install
+ ./configure PKG_CONFIG_PATH=/path/to/ross/install/lib/pkgconfig --with-dumpi=/path/to/dumpi/install
  --prefix=/path/to/codes-base/install CC=mpicc

 3- Build codes-base (See codes-base INSTALL for instructions on building codes-base with dumpi)
@@ -22,53 +22,42 @@ http://portal.nersc.gov/project/CAL/designforward.htm
------------------- RUNNING CODES NETWORK WORKLOAD TEST PROGRAM -----------------------
-6- Download and untar the DUMPI AMG application trace for 27 MPI ranks using the following download link:
+----------------- RUNNING CODES MPI SIMULATION LAYER -----------------------
+6- Download and untar the DUMPI AMG application trace for 1728 MPI ranks using the following download link:

-wget http://portal.nersc.gov/project/CAL/doe-miniapps-mpi-traces/AMG/df_AMG_n27_dumpi.tar.gz
+wget http://portal.nersc.gov/project/CAL/doe-miniapps-mpi-traces/AMG/df_AMG_n1728_dumpi.tar.gz

-7- Run the test program for codes-nw-workload using.
+7- Configure the model-net config file (for this example, the config file is
+src/network-workloads/conf/modelnet-mpi-test-dfly-amg-1728.conf).

-mpirun -np 4 ./src/models/mpi-trace-replay/model-net-dumpi-traces-dump --sync=3 --workload_type=dumpi --workload_file=/home/mubarm/df_traces/df_AMG_n27_dumpi/dumpi-2014.03.03.14.55.00- -- ../src/models/mpi-trace-replay/conf/modelnet-mpi-test.conf
-
-The program shows the number of sends, receives, collectives and wait operations in the DUMPI trace log.
-
-Note: If using a different DUMPI trace file, make sure to update the modelnet-mpi-test.conf file in the config directory.
-
------------------- RUNNING MODEL-NET WITH CODES NW WORKLOADS -----------------------------
-8- Configure model-net using its config file (Example .conf files available at src/models/mpi-trace-replay/)
-    Make sure the number of nw-lp and model-net LP are the same in the config file.
+8- Run the DUMPI trace replay simulation on top of model-net using:
+   (dumpi-2014.03.03.15.09.03- is the prefix shared by all DUMPI trace files;
+   the 4-digit rank number at the end of each trace file name is omitted).

-9- From the main source directory of codes-net, run the DUMPI trace replay simulation on top of
-    model-net using (/dumpi-2014-04-05.22.12.17.37- is the prefix of the all DUMPI trace files.
-    We skip the last 4 digit prefix of the DUMPI trace file names).
-
-    ./src/models/mpi-trace-replay/model-net-mpi-wrklds --sync=1 --workload_file=/path/to/dumpi/trace/directory/dumpi-2014-04-05.22.12.17.37-
-    --workload_type="dumpi" -- src/models/mpi-trace-replay/conf/modelnet-mpi-test.conf
+    ./src/network-workloads/model-net-mpi-replay --sync=1
+    --num_net_traces=1728 --workload_file=/path/to/dumpi/trace/directory/dumpi-2014.03.03.15.09.03-
+    --workload_type="dumpi" --lp-io-dir=amg-1728-trace --lp-io-use-suffix=1
+    -- ../src/network-workloads/conf/modelnet-mpi-test-dfly-amg-1728.conf

 The simulation runs in ROSS serial, conservative and optimistic modes.
-10- Some example runs with small-scale traces
-
-(i) AMG 8 MPI tasks http://portal.nersc.gov/project/CAL/designforward.htm#AMG
-
-    ** Torus network model
-    mpirun -np 4 ./src/models/mpi-trace-replay/model-net-mpi-wrklds --sync=3 --extramem=962144 --workload_file=/home/mubarm/dumpi/df_AMG_n27_dumpi/dumpi-2014.03.03.14.12.46- --workload_type="dumpi" --batch=2 --gvt-interval=2 --num_net_traces=27 -- tests/conf/modelnet-mpi-test-torus.conf
-
-    ** Simplenet network model
+    Note: Dragonfly and torus networks may have more nodes than there are network
+    traces; the extra nodes only forward messages and do not load a trace. In that
+    case, use the --num_net_traces argument to specify the exact number of traces
+    available in the DUMPI directory.

-    mpirun -np 8 ./src/models/mpi-trace-replay/model-net-mpi-wrklds --sync=3 --workload_file=/home/mubarm/dumpi/df_AMG_n27_dumpi/dumpi-2014.03.03.14.12.46- --workload_type="dumpi" --batch=2 --gvt-interval=2 -- tests/conf/modelnet-mpi-test.conf
+9- Running the simulation in optimistic mode
+
+    mpirun -np 4 ./src/network-workloads/model-net-mpi-replay
+    --batch=32 --gvt-interval=128 --sync=3
+    --num_net_traces=1728 --workload_type=dumpi --lp-io-dir=amg_1728-trace
+    --lp-io-use-suffix=1
+    --workload_file=/path/to/dumpi/trace/directory/dumpi-2014.03.03.15.09.03-
+    -- ../src/network-workloads/conf/modelnet-mpi-test-dfly-amg-1728.conf

-    ** Dragonfly network model
-    mpirun -np 8 ./src/models/mpi-trace-replay/model-net-mpi-wrklds --sync=3 --extramem=2962144 --workload_file=/home/mubarm/dumpi/df_AMG_n27_dumpi/dumpi-2014.03.03.14.12.46- --workload_type="dumpi" --batch=2 --gvt-interval=2 --num_net_traces=27 -- src/models/mpi-trace-replay//conf/modelnet-mpi-test-dragonfly.conf
-
-    Note: Dragonfly and torus networks may have more number of nodes in the network than the number network traces (Some network nodes will only pass messages and they will not end up loading the traces). Thats why --num_net_traces argument is used to specify exact number of traces available in the DUMPI directory if there is a mis-match between number of network nodes and traces.
+---------------- Running Test Program (needs update) --------------------------
+10- Run the test program for codes-nw-workload using:

-(ii) Crystal router 10 MPI tasks http://portal.nersc.gov/project/CAL/designforward.htm#CrystalRouter
+mpirun -np 4 ./src/models/mpi-trace-replay/model-net-dumpi-traces-dump --sync=3 --workload_type=dumpi --workload_file=/home/mubarm/df_traces/df_AMG_n27_dumpi/dumpi-2014.03.03.14.55.00- -- ../src/models/mpi-trace-replay/conf/modelnet-mpi-test.conf

-    ** Simple-net network model
-    mpirun -np 10 ./src/models/mpi-trace-replay/model-net-mpi-wrklds --sync=3 --extramem=185536 --workload_file=/home/mubarm/dumpi/cry_router/dumpi--2014.04.23.12.08.27- --workload_type="dumpi" -- src/models/mpi-trace-replay/conf/modelnet-mpi-test-cry-router.conf
+The program shows the number of sends, receives, collectives and wait operations in the DUMPI trace log.

-(iii) MiniFE 18 MPI tasks http://portal.nersc.gov/project/CAL/designforward.htm#MiniFE
+Note: If using a different DUMPI trace file, make sure to update the modelnet-mpi-test.conf file in the config directory.
-** Simple-net network model
-    mpirun -np 18 ./src/models/mpi-trace-replay/model-net-mpi-wrklds --sync=3 --extramem=6185536 --workload_file=/home/mubarm/dumpi/dumpi_data_18/dumpi-2014.04.22.12.17.37- --workload_type="dumpi" -- src/models/mpi-trace-replay/conf/modelnet-mpi-test-mini-fe.conf
diff --git a/src/network-workloads/conf/modelnet-mpi-test-dfly-amg-1728.conf b/src/network-workloads/conf/modelnet-mpi-test-dfly-amg-1728.conf
index b906f39de7f5c3c600609a242478e7871ed3762e..b5dad38cc734fba03ef8fb60604fb01c74aea81c 100644
--- a/src/network-workloads/conf/modelnet-mpi-test-dfly-amg-1728.conf
+++ b/src/network-workloads/conf/modelnet-mpi-test-dfly-amg-1728.conf
@@ -24,5 +24,5 @@ PARAMS
        global_bandwidth="4.7";
        cn_bandwidth="5.25";
        message_size="560";
-       routing="minimal";
+       routing="adaptive";
 }
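The trace replay commands in README_traces.txt take the trace file prefix through --workload_file and the number of available traces through --num_net_traces. The bash helpers below are a minimal sketch for deriving both from a trace directory. They assume each per-rank trace file is named <prefix>NNNN.bin with a 4-digit rank number (for example dumpi-2014.03.03.15.09.03-0000.bin); that naming is an assumption, and the helper names trace_prefix and count_traces are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: derive the --workload_file prefix and the --num_net_traces value
# from a directory of DUMPI traces. ASSUMPTION: each per-rank trace file is
# named <prefix>NNNN.bin, where NNNN is the 4-digit rank number, e.g.
# dumpi-2014.03.03.15.09.03-0000.bin.

# trace_prefix FILE: print FILE without its .bin extension and 4-digit rank.
trace_prefix() {
  local f=${1%.bin}   # strip the .bin extension
  echo "${f%????}"    # strip the last 4 characters (the rank number)
}

# count_traces DIR: print how many per-rank .bin trace files DIR contains.
count_traces() {
  local n=0 f
  for f in "$1"/*[0-9][0-9][0-9][0-9].bin; do
    # An unmatched glob stays literal, so check that the file exists.
    if [ -e "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}
```

For example, `trace_prefix /path/to/dumpi/trace/directory/dumpi-2014.03.03.15.09.03-0000.bin` prints `/path/to/dumpi/trace/directory/dumpi-2014.03.03.15.09.03-`, the form passed to --workload_file, and `count_traces /path/to/dumpi/trace/directory` gives the value for --num_net_traces. If a trace set uses a rank-number width other than 4 digits, the suffix stripping and the glob both need adjusting.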