Commit 179e3cc7 authored by Noah Wolfe's avatar Noah Wolfe

Merge remote-tracking branch 'fattree-remote/static_routing' into fattree-merge

 - Pulling in Jens/Nikhil Fat Tree static routing from remote codes-fattree repo
parents 2088171c 1ea1f8d4
*** README file for fattree network model ***
Each repetition represents a leaf level switch, nodes connected to it, and
higher level switches that may be needed to construct the fat-tree.
modelnet_fattree = radix of switch/2
fattree_switch = number of levels in the fattree (2 or 3)
Supported PARAMS:
packet_size, chunk_size (ideally kept same)
modelnet_scheduler - NIC message scheduler
modelnet_order=( "fattree" );
router_delay : delay caused by switched in ns
num_levels : number of levels in the fattree (same as fattree_switch)
switch_count : number of leaf level switches (same as repetitions)
switch_radix : radix of the switches
vc_size : size of switch VCs in bytes
cn_vc_size : size of VC between NIC and switch in bytes
link_bandwidth, cn_bandwidth : in GB/s
routing : {adaptive, static}
If static routing is chosen, two more PARAMS must be provided:
routing_folder : folder that contain lft files generated using method described below.
dot_file : name used for dotfile generation in the method described below.
(dump_topo should be set to 0 or not set when during simulations)
To generate static routing tables, first do an "empty" run to dump the topology
of the fat-tree by setting the following PARAMS:
routing : static
routing_folder : folder to which topology files should be written
dot_file : prefix used for creating topology files inside the folder
dump_topo : 1
When dump_topo is set, the simulator dumps the topology inside the folder
specified by routing_folder and exits. Next, follow these steps created by Jens
to generate the routing tables stored as LFT files:
(you should replace $P_PATH with your path)
1. Install fall-in-place toolchain: (patch files can be found in src/util/patches folder of CODES):
wget http://htor.inf.ethz.ch/sec/fts.tgz
tar xzf fts.tgz
cd fault_tolerance_simulation/
rm 0001-*.patch 0002-*.patch 0003-*.patch 0004-*.patch 0005-*.patch
tar xzf $P_PATH/sar.patches.tgz
wget http://downloads.openfabrics.org/management/opensm-3.3.20.tar.gz
mv opensm-3.3.20.tar.gz opensm.tar.gz
wget http://downloads.openfabrics.org/ibutils/ibutils-1.5.7-0.2.gbd7e502.tar.gz
mv ibutils-1.5.7-0.2.gbd7e502.tar.gz ibutils.tar.gz
wget http://downloads.openfabrics.org/management/infiniband-diags-1.6.7.tar.gz
mv infiniband-diags-1.6.7.tar.gz infiniband-diags.tar.gz
wget https://www.openfabrics.org/downloads/management/libibmad-1.3.12.tar.gz
mv libibmad-1.3.12.tar.gz libibmad.tar.gz
wget https://www.openfabrics.org/downloads/management/libibumad-1.3.10.2.tar.gz
mv libibumad-1.3.10.2.tar.gz libibumad.tar.gz
patch -p1 < $P_PATH/fts.patch
./simuate.py -s
2. Add LFT creating scripts to the fall-in-place toolchain.
cd $HOME/simulation/scripts
patch -p1 < $P_PATH/lft.patch
chmod +x post_process_*
chmod +x create_static_lft.sh
3. Choose a routing algorithm which should be used by OpenSM
(possible options: updn, dnup, ftree, lash, dor, torus-2QoS, dfsssp, sssp)
export OSM_ROUTING="ftree"
~/simulation/scripts/create_static_lft.sh routing_folder dot_file
(here routing_folder and dot_file should be same as the one used during the run used to dump the topology)
Now, the routing table stored as LFT files should be in the routing_folder.
This diff is collapsed.
This diff is collapsed.
diff -Nur scripts.orig/create_static_lft.sh scripts/create_static_lft.sh
--- scripts.orig/create_static_lft.sh 1969-12-31 16:00:00.000000000 -0800
+++ scripts/create_static_lft.sh 2016-08-16 11:08:06.058810000 -0700
@@ -0,0 +1,47 @@
+#!/bin/bash
+
+if [ $1 != "" ]; then
+ SIM_DIR="`readlink -f $1`"
+ DOT_FILE="${SIM_DIR}/$2"
+elif [ -z ${WRITE_TOPOLOGY_DOT_FILE} ]; then
+ echo "ERR: env variable WRITE_TOPOLOGY_DOT_FILE not specified"
+ exit 1
+else
+ SIM_DIR="`readlink -f ${CODES_SIM_IO_DIR}`"
+ DOT_FILE="${SIM_DIR}/${WRITE_TOPOLOGY_DOT_FILE}"
+fi
+
+if [ -f "${DOT_FILE}.dot" ]; then
+ echo "dot file already exists."
+else
+ $HOME/simulation/scripts/post_process_dot.sh ${DOT_FILE}
+ if [ "x$?" != "x0" ]; then exit -1; fi
+fi
+
+echo "running createIBNet.py"
+$HOME/simulation/scripts/createIBNet.py -t DOT -i ${DOT_FILE}.dot -o ${SIM_DIR}/topo.net
+if [ "x$?" != "x0" ]; then exit -1; fi
+
+rm -rf ${SIM_DIR}/ofedout/
+if [ -z ${OSM_ROUTING} ]; then
+ echo 'ERR: routing must be specified via `export OSM_ROUTING=...`'
+ echo ' (available options: updn, dnup, ftree, lash, dor, torus-2QoS,'
+ echo ' dfsssp, sssp)'
+ exit -1;
+fi
+echo "running simulate.py"
+$HOME/simulation/scripts/simulate.py -n ${SIM_DIR} -r ${OSM_ROUTING} -p exchange
+if [ "x$?" != "x0" ]; then exit -1; fi
+
+mv ${SIM_DIR}/ofedout/ibdiagnet.fdbs ${SIM_DIR}/
+mv ${SIM_DIR}/ofedout/opensm-subnet.lst ${SIM_DIR}/
+echo "running post_process_lfts.py"
+$HOME/simulation/scripts/post_process_lfts.py ${SIM_DIR}/ibdiagnet.fdbs ${SIM_DIR}/opensm-subnet.lst ${SIM_DIR}/
+if [ "x$?" != "x0" ]; then exit -1; fi
+echo "Done with script"
+
+#if [ -z ${KEEP_INTERMEDIATE} ]; then
+# rm -rf ./checkConnectivity.log ./log.txt ./ofedout/ ./${WRITE_TOPOLOGY_DOT_FILE}.dot ./topo.* ./ibdiagnet.fdbs ./opensm-subnet.lst
+#fi
+
+exit 0
diff -Nur scripts.orig/get_static_lft_for_codes.sh scripts/get_static_lft_for_codes.sh
--- scripts.orig/get_static_lft_for_codes.sh 1969-12-31 16:00:00.000000000 -0800
+++ scripts/get_static_lft_for_codes.sh 2016-08-16 11:08:06.058810000 -0700
@@ -0,0 +1,36 @@
+#!/bin/bash
+
+if [ -z ${WRITE_TOPOLOGY_DOT_FILE} ]; then
+ echo "ERR: env variable WRITE_TOPOLOGY_DOT_FILE not specified"
+ exit 1
+else
+ SIM_DIR="`readlink -f ${CODES_SIM_IO_DIR}`"
+ DOT_FILE="${SIM_DIR}/${WRITE_TOPOLOGY_DOT_FILE}"
+fi
+
+$HOME/simulation/scripts/post_process_dot.sh ${DOT_FILE}
+if [ "x$?" != "x0" ]; then exit -1; fi
+
+$HOME/simulation/scripts/createIBNet.py -t DOT -i ${DOT_FILE}.dot -o ${SIM_DIR}/topo.net
+if [ "x$?" != "x0" ]; then exit -1; fi
+
+rm -rf ${SIM_DIR}/ofedout/
+if [ -z ${OSM_ROUTING} ]; then
+ echo 'ERR: routing must be specified via `export OSM_ROUTING=...`'
+ echo ' (available options: updn, dnup, ftree, lash, dor, torus-2QoS,'
+ echo ' dfsssp, sssp)'
+ exit -1;
+fi
+$HOME/simulation/scripts/simulate.py -n ${SIM_DIR} -r ${OSM_ROUTING} -p exchange
+if [ "x$?" != "x0" ]; then exit -1; fi
+
+mv ${SIM_DIR}/ofedout/ibdiagnet.fdbs ${SIM_DIR}/
+mv ${SIM_DIR}/ofedout/opensm-subnet.lst ${SIM_DIR}/
+$HOME/simulation/scripts/post_process_lfts.py ${SIM_DIR}/ibdiagnet.fdbs ${SIM_DIR}/opensm-subnet.lst ${SIM_DIR}/
+if [ "x$?" != "x0" ]; then exit -1; fi
+
+#if [ -z ${KEEP_INTERMEDIATE} ]; then
+# rm -rf ./checkConnectivity.log ./log.txt ./ofedout/ ./${WRITE_TOPOLOGY_DOT_FILE}.dot ./topo.* ./ibdiagnet.fdbs ./opensm-subnet.lst
+#fi
+
+exit 0
diff -Nur scripts.orig/post_process_dot.sh scripts/post_process_dot.sh
--- scripts.orig/post_process_dot.sh 1969-12-31 16:00:00.000000000 -0800
+++ scripts/post_process_dot.sh 2016-08-15 14:31:22.976049000 -0700
@@ -0,0 +1,24 @@
+#!/bin/bash
+
+if [ -z ${1} ]; then
+ echo "ERR: input missing; need path to temporary dot files"
+ exit 1
+fi
+
+PATH_TO_DOT="${1}"
+
+rm -f ${PATH_TO_DOT}.dot
+echo 'digraph {' >> ${PATH_TO_DOT}.dot
+
+# first get all node defs
+cat ${PATH_TO_DOT}.dot.* | grep -v '\->\|\-\-' | sort >> ${PATH_TO_DOT}.dot
+# then get all edges/links of the graph
+cat ${PATH_TO_DOT}.dot.* | grep '\->\|\-\-' | sort >> ${PATH_TO_DOT}.dot
+
+echo '}' >> ${PATH_TO_DOT}.dot
+
+# cleanup (we don't want old partial dot files laying around when downsizing
+# the number of mpi ranks)
+rm -f ${PATH_TO_DOT}.dot.*
+
+exit 0
diff -Nur scripts.orig/post_process_lfts.py scripts/post_process_lfts.py
--- scripts.orig/post_process_lfts.py 1969-12-31 16:00:00.000000000 -0800
+++ scripts/post_process_lfts.py 2016-08-15 15:41:29.189179000 -0700
@@ -0,0 +1,62 @@
+#!/usr/bin/env python
+
+import os, re, sys
+
+try:
+ path, filename = os.path.split(os.path.normpath(sys.argv[1]))
+ if path == '': path = os.getcwd()
+ fdbsFile = os.path.join(path, filename)
+
+ path, filename = os.path.split(os.path.normpath(sys.argv[2]))
+ if path == '': path = os.getcwd()
+ lstFile = os.path.join(path, filename)
+
+ outdir = os.path.normpath(sys.argv[3])
+except:
+ sys.exit('Usage: post_process_lfts.py ./ibdiagnet.fdbs ./opensm-subnet.lst')
+
+if not os.path.exists(fdbsFile) or not os.path.exists(lstFile):
+ sys.exit('ERR: file %s or %s does not exist' % (fdbsFile, lstFile))
+
+lid_to_guid_map = {}
+p = re.compile('{\s+([a-zA-Z0-9_-]+)\s+Ports:(\w+)\s+SystemGUID:(\w+)\s+NodeGUID:(\w+)\s+PortGUID:(\w+)\s+VenID:(\w+)\s+DevID:(\w+)\s+Rev:(\w+)\s+{(.+)}\s+LID:(\w+)\s+PN:(\w+)\s+}\s+{\s+([a-zA-Z0-9_-]+)\s+Ports:(\w+)\s+SystemGUID:(\w+)\s+NodeGUID:(\w+)\s+PortGUID:(\w+)\s+VenID:(\w+)\s+DevID:(\w+)\s+Rev:(\w+)\s+{(.+)}\s+LID:(\w+)\s+PN:(\w+)\s+}\s+.+')
+for line in open(lstFile, 'r'):
+ if p.match(line):
+ m = p.match(line)
+ node1, ports1, sguid1, nguid1, pguid1, vid1, did1, rev1, name1, lid1, pn1 = \
+ m.group(1), int(m.group(2),16), m.group(3), m.group(4), int(m.group(5),16), \
+ m.group(6), m.group(7), m.group(8), m.group(9), int(m.group(10),16), \
+ int(m.group(11),16)
+ node2, ports2, sguid2, nguid2, pguid2, vid2, did2, rev2, name2, lid2, pn2 = \
+ m.group(12), int(m.group(13),16), m.group(14), m.group(15), int(m.group(16),16), \
+ m.group(17), m.group(18), m.group(19), m.group(20), int(m.group(21),16), \
+ int(m.group(22),16)
+ nguid1, nguid2 = nguid1.lower(), nguid2.lower()
+
+ # for some strange reason osm is adding +1 to the caguid to get
+ # the port guid, even if we have a single port hca
+ if name1.find('H') == 0:
+ lid_to_guid_map[lid1] = pguid1 - 1
+ else:
+ lid_to_guid_map[lid1] = pguid1
+ if name2.find('H') == 0:
+ lid_to_guid_map[lid2] = pguid2 - 1
+ else:
+ lid_to_guid_map[lid2] = pguid2
+
+sw = re.compile('.*Switch\s*0x(\w+)')
+lft = re.compile('^\s*0x(\w+)\s*:\s*(\d+)')
+out = open('/dev/null', 'r')
+for line in open(fdbsFile, 'r'):
+ if sw.match(line):
+ m = sw.match(line)
+ sw_guid = int(m.group(1),16)
+ out.close()
+ out = open(os.path.join(outdir, '0x%016x.lft' % sw_guid), 'w+')
+ elif lft.match(line):
+ m = lft.match(line)
+ lid, port = int(m.group(1),16), int(m.group(2))
+ out.write("0x%016x %d\n" % (lid_to_guid_map[lid], port))
+out.close()
+
+sys.exit(0)
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment