Commit 1ea1f8d4 authored by Nikhil

Add support for static routing.

Documentation in src/networks/model-net/doc/README.fattree.txt

This commit is joint work by Jens Domke and Nikhil Jain.

Jens did all the ground work for dumping the topology and reading the routing
tables. He also wrote the scripts that perform the intermediate steps that
generate the routing tables using the topology.

Nikhil integrated Jens's changes into CODES by breaking them into a 3-step
process, as described in src/networks/model-net/doc/README.fattree.txt.

Change-Id: I1478fb2ab83b9ded0673d8f897e09a423868795a
parent fa409abd
*** README file for fattree network model ***
Each repetition represents a leaf level switch, nodes connected to it, and
higher level switches that may be needed to construct the fat-tree.
modelnet_fattree = radix of switch/2
fattree_switch = number of levels in the fattree (2 or 3)
Supported PARAMS:
packet_size, chunk_size (ideally kept same)
modelnet_scheduler - NIC message scheduler
modelnet_order=( "fattree" );
router_delay : delay caused by switches in ns
num_levels : number of levels in the fattree (same as fattree_switch)
switch_count : number of leaf level switches (same as repetitions)
switch_radix : radix of the switches
vc_size : size of switch VCs in bytes
cn_vc_size : size of VC between NIC and switch in bytes
link_bandwidth, cn_bandwidth : in GB/s
routing : {adaptive, static}
If static routing is chosen, two more PARAMS must be provided:
routing_folder : folder that contains the LFT files generated using the method described below.
dot_file : name used for dot file generation in the method described below.
(dump_topo should be set to 0, or left unset, during simulations)
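The consistency rules listed above can be sketched as a small check. This is a hypothetical helper, not part of CODES: the function name, the argument layout, and the example values (radix 36, 324 leaf switches) are assumptions for illustration, and the values are assumed to be already parsed into Python ints and strings. Defaults that CODES may apply to unset PARAMS are not modeled.

```python
def check_fattree_config(repetitions, modelnet_fattree, fattree_switch, params):
    """Return a list of problems found; an empty list means the values agree."""
    problems = []
    # modelnet_fattree = radix of switch / 2
    if modelnet_fattree != params["switch_radix"] // 2:
        problems.append("modelnet_fattree should equal switch_radix / 2")
    # fattree_switch and num_levels both give the number of levels (2 or 3)
    if fattree_switch != params["num_levels"]:
        problems.append("fattree_switch should equal num_levels")
    if params["num_levels"] not in (2, 3):
        problems.append("num_levels should be 2 or 3")
    # switch_count is the number of leaf level switches (same as repetitions)
    if repetitions != params["switch_count"]:
        problems.append("switch_count should equal repetitions")
    routing = params.get("routing")
    if routing is not None and routing not in ("adaptive", "static"):
        problems.append("routing should be adaptive or static")
    # static routing needs two more PARAMS
    if routing == "static":
        for key in ("routing_folder", "dot_file"):
            if not params.get(key):
                problems.append(key + " is required for static routing")
    return problems


# Hypothetical example values, chosen only to exercise the checks.
example = {"switch_radix": 36, "num_levels": 3, "switch_count": 324,
           "routing": "static"}
for problem in check_fattree_config(324, 18, 3, example):
    print(problem)
```

With the example values, only the two missing static-routing PARAMS are reported.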
To generate static routing tables, first do an "empty" run to dump the topology
of the fat-tree by setting the following PARAMS:
routing : static
routing_folder : folder to which topology files should be written
dot_file : prefix used for creating topology files inside the folder
dump_topo : 1
When dump_topo is set, the simulator dumps the topology inside the folder
specified by routing_folder and exits. Next, follow these steps created by Jens
to generate the routing tables stored as LFT files:
(replace $P_PATH with the path to the patch files)
1. Install the fall-in-place toolchain (patch files can be found in the src/util/patches folder of CODES):
wget http://htor.inf.ethz.ch/sec/fts.tgz
tar xzf fts.tgz
cd fault_tolerance_simulation/
rm 0001-*.patch 0002-*.patch 0003-*.patch 0004-*.patch 0005-*.patch
tar xzf $P_PATH/sar.patches.tgz
wget http://downloads.openfabrics.org/management/opensm-3.3.20.tar.gz
mv opensm-3.3.20.tar.gz opensm.tar.gz
wget http://downloads.openfabrics.org/ibutils/ibutils-1.5.7-0.2.gbd7e502.tar.gz
mv ibutils-1.5.7-0.2.gbd7e502.tar.gz ibutils.tar.gz
wget http://downloads.openfabrics.org/management/infiniband-diags-1.6.7.tar.gz
mv infiniband-diags-1.6.7.tar.gz infiniband-diags.tar.gz
wget https://www.openfabrics.org/downloads/management/libibmad-1.3.12.tar.gz
mv libibmad-1.3.12.tar.gz libibmad.tar.gz
wget https://www.openfabrics.org/downloads/management/libibumad-1.3.10.2.tar.gz
mv libibumad-1.3.10.2.tar.gz libibumad.tar.gz
patch -p1 < $P_PATH/fts.patch
./simulate.py -s
2. Add the LFT creation scripts to the fall-in-place toolchain:
cd $HOME/simulation/scripts
patch -p1 < $P_PATH/lft.patch
chmod +x post_process_*
chmod +x create_static_lft.sh
3. Choose the routing algorithm that OpenSM should use, then generate the LFT files
(possible options: updn, dnup, ftree, lash, dor, torus-2QoS, dfsssp, sssp)
export OSM_ROUTING="ftree"
~/simulation/scripts/create_static_lft.sh routing_folder dot_file
(routing_folder and dot_file must be the same values that were used in the run that dumped the topology)
The routing tables, stored as LFT files, should now be in routing_folder.
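To sanity-check the result, the LFT files can be read back. The sketch below assumes the layout written by post_process_lfts.py in this commit: one file per switch named 0x<16 hex digits>.lft, each line holding a destination GUID in hex and an output port in decimal. The function name read_lft_folder is hypothetical, and how CODES itself loads these files is not shown here.

```python
import os


def read_lft_folder(routing_folder):
    """Return {switch_guid: {destination_guid: port}} for all *.lft files."""
    tables = {}
    for name in sorted(os.listdir(routing_folder)):
        if not name.endswith(".lft"):
            continue  # skip topo.net, ibdiagnet.fdbs and other intermediates
        # The file name (minus the extension) is the switch GUID in hex.
        switch_guid = int(name[:-len(".lft")], 16)
        table = {}
        with open(os.path.join(routing_folder, name)) as lft_file:
            for line in lft_file:
                fields = line.split()
                if len(fields) != 2:
                    continue  # ignore blank or malformed lines
                table[int(fields[0], 16)] = int(fields[1])
        tables[switch_guid] = table
    return tables
```

An empty dictionary returned for routing_folder would suggest that create_static_lft.sh did not produce any tables.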
diff -Nur scripts.orig/create_static_lft.sh scripts/create_static_lft.sh
--- scripts.orig/create_static_lft.sh 1969-12-31 16:00:00.000000000 -0800
+++ scripts/create_static_lft.sh 2016-08-16 11:08:06.058810000 -0700
@@ -0,0 +1,47 @@
+#!/bin/bash
+
+if [ "$1" != "" ]; then
+ SIM_DIR="`readlink -f $1`"
+ DOT_FILE="${SIM_DIR}/$2"
+elif [ -z ${WRITE_TOPOLOGY_DOT_FILE} ]; then
+ echo "ERR: env variable WRITE_TOPOLOGY_DOT_FILE not specified"
+ exit 1
+else
+ SIM_DIR="`readlink -f ${CODES_SIM_IO_DIR}`"
+ DOT_FILE="${SIM_DIR}/${WRITE_TOPOLOGY_DOT_FILE}"
+fi
+
+if [ -f "${DOT_FILE}.dot" ]; then
+ echo "dot file already exists."
+else
+ $HOME/simulation/scripts/post_process_dot.sh ${DOT_FILE}
+ if [ "x$?" != "x0" ]; then exit -1; fi
+fi
+
+echo "running createIBNet.py"
+$HOME/simulation/scripts/createIBNet.py -t DOT -i ${DOT_FILE}.dot -o ${SIM_DIR}/topo.net
+if [ "x$?" != "x0" ]; then exit -1; fi
+
+rm -rf ${SIM_DIR}/ofedout/
+if [ -z ${OSM_ROUTING} ]; then
+ echo 'ERR: routing must be specified via `export OSM_ROUTING=...`'
+ echo ' (available options: updn, dnup, ftree, lash, dor, torus-2QoS,'
+ echo ' dfsssp, sssp)'
+ exit -1;
+fi
+echo "running simulate.py"
+$HOME/simulation/scripts/simulate.py -n ${SIM_DIR} -r ${OSM_ROUTING} -p exchange
+if [ "x$?" != "x0" ]; then exit -1; fi
+
+mv ${SIM_DIR}/ofedout/ibdiagnet.fdbs ${SIM_DIR}/
+mv ${SIM_DIR}/ofedout/opensm-subnet.lst ${SIM_DIR}/
+echo "running post_process_lfts.py"
+$HOME/simulation/scripts/post_process_lfts.py ${SIM_DIR}/ibdiagnet.fdbs ${SIM_DIR}/opensm-subnet.lst ${SIM_DIR}/
+if [ "x$?" != "x0" ]; then exit -1; fi
+echo "Done with script"
+
+#if [ -z ${KEEP_INTERMEDIATE} ]; then
+# rm -rf ./checkConnectivity.log ./log.txt ./ofedout/ ./${WRITE_TOPOLOGY_DOT_FILE}.dot ./topo.* ./ibdiagnet.fdbs ./opensm-subnet.lst
+#fi
+
+exit 0
diff -Nur scripts.orig/get_static_lft_for_codes.sh scripts/get_static_lft_for_codes.sh
--- scripts.orig/get_static_lft_for_codes.sh 1969-12-31 16:00:00.000000000 -0800
+++ scripts/get_static_lft_for_codes.sh 2016-08-16 11:08:06.058810000 -0700
@@ -0,0 +1,36 @@
+#!/bin/bash
+
+if [ -z ${WRITE_TOPOLOGY_DOT_FILE} ]; then
+ echo "ERR: env variable WRITE_TOPOLOGY_DOT_FILE not specified"
+ exit 1
+else
+ SIM_DIR="`readlink -f ${CODES_SIM_IO_DIR}`"
+ DOT_FILE="${SIM_DIR}/${WRITE_TOPOLOGY_DOT_FILE}"
+fi
+
+$HOME/simulation/scripts/post_process_dot.sh ${DOT_FILE}
+if [ "x$?" != "x0" ]; then exit -1; fi
+
+$HOME/simulation/scripts/createIBNet.py -t DOT -i ${DOT_FILE}.dot -o ${SIM_DIR}/topo.net
+if [ "x$?" != "x0" ]; then exit -1; fi
+
+rm -rf ${SIM_DIR}/ofedout/
+if [ -z ${OSM_ROUTING} ]; then
+ echo 'ERR: routing must be specified via `export OSM_ROUTING=...`'
+ echo ' (available options: updn, dnup, ftree, lash, dor, torus-2QoS,'
+ echo ' dfsssp, sssp)'
+ exit -1;
+fi
+$HOME/simulation/scripts/simulate.py -n ${SIM_DIR} -r ${OSM_ROUTING} -p exchange
+if [ "x$?" != "x0" ]; then exit -1; fi
+
+mv ${SIM_DIR}/ofedout/ibdiagnet.fdbs ${SIM_DIR}/
+mv ${SIM_DIR}/ofedout/opensm-subnet.lst ${SIM_DIR}/
+$HOME/simulation/scripts/post_process_lfts.py ${SIM_DIR}/ibdiagnet.fdbs ${SIM_DIR}/opensm-subnet.lst ${SIM_DIR}/
+if [ "x$?" != "x0" ]; then exit -1; fi
+
+#if [ -z ${KEEP_INTERMEDIATE} ]; then
+# rm -rf ./checkConnectivity.log ./log.txt ./ofedout/ ./${WRITE_TOPOLOGY_DOT_FILE}.dot ./topo.* ./ibdiagnet.fdbs ./opensm-subnet.lst
+#fi
+
+exit 0
diff -Nur scripts.orig/post_process_dot.sh scripts/post_process_dot.sh
--- scripts.orig/post_process_dot.sh 1969-12-31 16:00:00.000000000 -0800
+++ scripts/post_process_dot.sh 2016-08-15 14:31:22.976049000 -0700
@@ -0,0 +1,24 @@
+#!/bin/bash
+
+if [ -z ${1} ]; then
+ echo "ERR: input missing; need path to temporary dot files"
+ exit 1
+fi
+
+PATH_TO_DOT="${1}"
+
+rm -f ${PATH_TO_DOT}.dot
+echo 'digraph {' >> ${PATH_TO_DOT}.dot
+
+# first get all node defs
+cat ${PATH_TO_DOT}.dot.* | grep -v '\->\|\-\-' | sort >> ${PATH_TO_DOT}.dot
+# then get all edges/links of the graph
+cat ${PATH_TO_DOT}.dot.* | grep '\->\|\-\-' | sort >> ${PATH_TO_DOT}.dot
+
+echo '}' >> ${PATH_TO_DOT}.dot
+
+# cleanup (we don't want old partial dot files laying around when downsizing
+# the number of mpi ranks)
+rm -f ${PATH_TO_DOT}.dot.*
+
+exit 0
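The merge performed by post_process_dot.sh can be sketched in Python as follows. This is an illustrative equivalent, not part of the patch: the function name merge_dot_parts is hypothetical, it returns the merged text instead of writing ${PATH_TO_DOT}.dot, and it does not delete the partial files. Python's sorted() orders by code point, which may differ from sort under some locales.

```python
import glob


def merge_dot_parts(path_to_dot):
    """Merge <path_to_dot>.dot.* into one digraph: node lines first, then edges."""
    lines = []
    for part in sorted(glob.glob(glob.escape(path_to_dot) + ".dot.*")):
        with open(part) as part_file:
            lines.extend(line.rstrip("\n") for line in part_file)

    def is_edge(line):
        # Same test as grep '\->\|\-\-': the line contains "->" or "--".
        return "->" in line or "--" in line

    nodes = sorted(line for line in lines if not is_edge(line))
    edges = sorted(line for line in lines if is_edge(line))
    return "\n".join(["digraph {"] + nodes + edges + ["}"]) + "\n"
```

Listing all node definitions before any edge mirrors the two grep passes in the script.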
diff -Nur scripts.orig/post_process_lfts.py scripts/post_process_lfts.py
--- scripts.orig/post_process_lfts.py 1969-12-31 16:00:00.000000000 -0800
+++ scripts/post_process_lfts.py 2016-08-15 15:41:29.189179000 -0700
@@ -0,0 +1,62 @@
+#!/usr/bin/env python
+
+import os, re, sys
+
+try:
+    path, filename = os.path.split(os.path.normpath(sys.argv[1]))
+    if path == '': path = os.getcwd()
+    fdbsFile = os.path.join(path, filename)
+
+    path, filename = os.path.split(os.path.normpath(sys.argv[2]))
+    if path == '': path = os.getcwd()
+    lstFile = os.path.join(path, filename)
+
+    outdir = os.path.normpath(sys.argv[3])
+except IndexError:
+    sys.exit('Usage: post_process_lfts.py ./ibdiagnet.fdbs ./opensm-subnet.lst <outdir>')
+
+if not os.path.exists(fdbsFile) or not os.path.exists(lstFile):
+ sys.exit('ERR: file %s or %s does not exist' % (fdbsFile, lstFile))
+
+lid_to_guid_map = {}
+p = re.compile('{\s+([a-zA-Z0-9_-]+)\s+Ports:(\w+)\s+SystemGUID:(\w+)\s+NodeGUID:(\w+)\s+PortGUID:(\w+)\s+VenID:(\w+)\s+DevID:(\w+)\s+Rev:(\w+)\s+{(.+)}\s+LID:(\w+)\s+PN:(\w+)\s+}\s+{\s+([a-zA-Z0-9_-]+)\s+Ports:(\w+)\s+SystemGUID:(\w+)\s+NodeGUID:(\w+)\s+PortGUID:(\w+)\s+VenID:(\w+)\s+DevID:(\w+)\s+Rev:(\w+)\s+{(.+)}\s+LID:(\w+)\s+PN:(\w+)\s+}\s+.+')
+for line in open(lstFile, 'r'):
+    if p.match(line):
+        m = p.match(line)
+        node1, ports1, sguid1, nguid1, pguid1, vid1, did1, rev1, name1, lid1, pn1 = \
+            m.group(1), int(m.group(2),16), m.group(3), m.group(4), int(m.group(5),16), \
+            m.group(6), m.group(7), m.group(8), m.group(9), int(m.group(10),16), \
+            int(m.group(11),16)
+        node2, ports2, sguid2, nguid2, pguid2, vid2, did2, rev2, name2, lid2, pn2 = \
+            m.group(12), int(m.group(13),16), m.group(14), m.group(15), int(m.group(16),16), \
+            m.group(17), m.group(18), m.group(19), m.group(20), int(m.group(21),16), \
+            int(m.group(22),16)
+        nguid1, nguid2 = nguid1.lower(), nguid2.lower()
+
+        # for some strange reason osm is adding +1 to the caguid to get
+        # the port guid, even if we have a single port hca
+        if name1.find('H') == 0:
+            lid_to_guid_map[lid1] = pguid1 - 1
+        else:
+            lid_to_guid_map[lid1] = pguid1
+        if name2.find('H') == 0:
+            lid_to_guid_map[lid2] = pguid2 - 1
+        else:
+            lid_to_guid_map[lid2] = pguid2
+
+sw = re.compile('.*Switch\s*0x(\w+)')
+lft = re.compile('^\s*0x(\w+)\s*:\s*(\d+)')
+out = open('/dev/null', 'w')
+for line in open(fdbsFile, 'r'):
+    if sw.match(line):
+        m = sw.match(line)
+        sw_guid = int(m.group(1),16)
+        out.close()
+        out = open(os.path.join(outdir, '0x%016x.lft' % sw_guid), 'w+')
+    elif lft.match(line):
+        m = lft.match(line)
+        lid, port = int(m.group(1),16), int(m.group(2))
+        out.write("0x%016x %d\n" % (lid_to_guid_map[lid], port))
+out.close()
+
+sys.exit(0)
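The second loop of post_process_lfts.py can be tried in isolation with the sketch below. It reuses the script's two regular expressions but collects entries in memory instead of writing .lft files. The function name group_lft_entries and the sample lines are assumptions: the lines are made up to match the regexes and are not real ibdiagnet output.

```python
import re

# Same patterns as in post_process_lfts.py, written as raw strings.
SWITCH_RE = re.compile(r'.*Switch\s*0x(\w+)')
ENTRY_RE = re.compile(r'^\s*0x(\w+)\s*:\s*(\d+)')


def group_lft_entries(fdbs_lines, lid_to_guid):
    """Return {switch_guid: [(destination_guid, port), ...]} in file order."""
    tables = {}
    current = None
    for line in fdbs_lines:
        m = SWITCH_RE.match(line)
        if m:
            # A switch header starts a new table.
            current = tables.setdefault(int(m.group(1), 16), [])
            continue
        m = ENTRY_RE.match(line)
        if m and current is not None:
            # LID is hex, port is decimal; translate the LID to a GUID.
            lid, port = int(m.group(1), 16), int(m.group(2))
            current.append((lid_to_guid[lid], port))
    return tables


# Made-up input shaped to match the patterns above.
sample = ["Switch 0x000000000000000a", "0x0001 : 003", "0x0002 : 001"]
print(group_lft_entries(sample, {1: 0x10, 2: 0x11}))
```

Entries that appear before any switch header are skipped here, whereas the script sends them to /dev/null.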