Commit c8cf3cc6 authored by Caitlin Ross

Merge branch 'master' into ross-updates

parents eca27707 aab92d45
......@@ -6,7 +6,8 @@ Contributors to date (in chronological order, with current affiliations)
- Shane Snyder, Argonne National Labs
- Jonathan P. Jenkins
- Noah Wolfe, RPI
- Nikhil Jain, Lawrence Livermore Labs
- Nikhil Jain, Nvidia
- Giorgis Georgakoudis, Lawrence Livermore Labs
- Matthieu Dorier, Argonne National Labs
- Caitlin Ross, RPI
- Xu Yang, Amazon
......
## README for using ROSS instrumentation with CODES
For details about the ROSS instrumentation, see the [ROSS Instrumentation blog post](http://carothersc.github.io/ROSS/instrumentation/instrumentation.html)
For details about the ROSS instrumentation, see the [ROSS Instrumentation blog post](http://ross-org.github.io/instrumentation/instrumentation.html)
on the ROSS webpage.
......@@ -8,7 +8,7 @@ There are currently 4 types of instrumentation: GVT-based, real time sampling, v
See the ROSS documentation for more info on the specific options or use `--help` with your model.
To collect data about the simulation engine, no changes are needed to model code for any of the instrumentation modes.
Some additions to the model code are needed in order to turn on any model-level data collection.
See the "Model-level data sampling" section on [ROSS Instrumentation blog post](http://carothersc.github.io/ROSS/instrumentation/instrumentation.html).
See the "Model-level data sampling" section on [ROSS Instrumentation blog post](http://ross-org.github.io/instrumentation/instrumentation.html).
Here we describe CODES specific details.
### Register Instrumentation Callback Functions
......@@ -144,4 +144,3 @@ If you're using any of the following CODES models, you don't have to add anythin
- slimfly router and terminal LPs (slimfly.c)
- fat tree switch and terminal LPs (fat-tree.c)
- model-net-base-lp (model-net-lp.c)
......@@ -2,7 +2,7 @@
# Process this file with autoconf to produce a configure script.
AC_PREREQ([2.67])
AC_INIT([codes], [1.0.0], [http://trac.mcs.anl.gov/projects/codes/newticket],[],[http://www.mcs.anl.gov/projects/codes/])
AC_INIT([codes], [1.1.0], [http://trac.mcs.anl.gov/projects/codes/newticket],[],[http://www.mcs.anl.gov/projects/codes/])
LT_INIT
......
......@@ -2,12 +2,12 @@ NOTE: see bottom of this file for suggested configurations on particular ANL
machines.
0 - Checkout, build, and install the trunk version of ROSS
(https://github.com/carothersc/ROSS). At the time of
(https://github.com/ross-org/ROSS). At the time of
release (0.6.0), ROSS's latest commit hash was 10d7a06b2d, so this revision is
"safe" in the unlikely case incompatible changes come along in the future. If
working from the CODES master branches, use the ROSS master branch.
git clone http://github.com/carothersc/ROSS.git
git clone http://github.com/ross-org/ROSS.git
# if using 0.5.2 release: git checkout d3bdc07
cd ROSS
mkdir build
......@@ -22,7 +22,7 @@ working from the CODES master branches, use the ROSS master branch.
ROSS/install/ directory>
For more details on installing ROSS, go to
https://github.com/carothersc/ROSS/blob/master/README.md .
https://github.com/ross-org/ROSS/blob/master/README.md .
If using ccmake to configure, don't forget to set CMAKE_C_COMPILER and
CMAKE_CXX_COMPILER to mpicc/mpicxx
......
......@@ -22,7 +22,7 @@ https://lists.mcs.anl.gov/mailman/listinfo/codes-ross-users
== ROSS
* main site, repository, etc.: https://github.com/carothersc/ROSS
* main site, repository, etc.: https://github.com/ross-org/ROSS
* both the site and repository contain good documentation as well - refer to
it for an in-depth introduction and overview of ROSS proper
......@@ -202,6 +202,15 @@ should be everything up to the rank number. E.g., if the dumpi files are of the
form "dumpi-YYYY.MM.DD.HH.MM.SS-XXXX.bin", then the input should be
"dumpi-YYYY.MM.DD.HH.MM.SS-"
=== Quality of Service
Two models (dragonfly-dally.C and dragonfly-plus.C) now support traffic
differentiation and prioritization. The models support quality of service by
directing network traffic onto separate classes of virtual channels. Additional
documentation on using traffic classes can be found on the wiki:
https://xgitlab.cels.anl.gov/codes/codes/wikis/Quality-of-Service
=== Workload generator helpers
The codes-jobmap API (codes/codes-jobmap.h) specifies mechanisms to initialize
......
......@@ -28,7 +28,7 @@ https://xgitlab.cels.anl.gov/codes/codes/wikis/Using-ROSS-Instrumentation-with-C
Compatible with ROSS version that enables statistics collection of simulation
performance. For details see:
http://carothersc.github.io/ROSS/instrumentation/instrumentation.html
http://ross-org.github.io/instrumentation/instrumentation.html
Online workload replay functionality has been added that allows SWM workloads
to be simulated in situ on the network models. WIP to integrate Conceptual
......
......@@ -188,7 +188,7 @@ easily shared and reused. It also includes a few tips to help avoid common
simulation bugs.
For more information, ROSS has a bunch of documentation available in their
repository/wiki - see \url{https://github.com/carothersc/ROSS}.
repository/wiki - see \url{https://github.com/ross-org/ROSS}.
\end{abstract}
\section{CODES: modularizing models}
......
......@@ -44,7 +44,7 @@ Notes on how to release a new version of CODES
4. Upload the release tarball
- Our release directory is at ftp.mcs.anl.gov/pub/CODES/releases . There's no
web interface, so you have to get onto an MCS workstation and copy the
release in that way (the ftp server is mounted at /homes/ftp).
release in that way (the ftp server is mounted at /mcs/ftp.mcs.anl.gov).
5. Update website
- Project wordpress: http://www.mcs.anl.gov/projects/codes/ (you need
......
......@@ -18,6 +18,7 @@ argobots_libs=@ARGOBOTS_LIBS@
argobots_cflags=@ARGOBOTS_CFLAGS@
swm_libs=@SWM_LIBS@
swm_cflags=@SWM_CFLAGS@
swm_datarootdir=@SWM_DATAROOTDIR@
Name: codes-base
Description: Base functionality for CODES storage simulation
......@@ -25,4 +26,4 @@ Version: @PACKAGE_VERSION@
URL: http://trac.mcs.anl.gov/projects/CODES
Requires:
Libs: -L${libdir} -lcodes ${ross_libs} ${argobots_libs} ${swm_libs} ${darshan_libs} ${dumpi_libs} ${cortex_libs}
Cflags: -I${includedir} ${swm_datarootdir} ${ross_cflags} ${darshan_cflags} ${swm_cflags} ${argobots_cflags} ${dumpi_cflags} ${cortex_cflags}
Cflags: -I${includedir} -I${swm_datarootdir} ${ross_cflags} ${darshan_cflags} ${swm_cflags} ${argobots_cflags} ${dumpi_cflags} ${cortex_cflags}
......@@ -11,15 +11,8 @@ import sys
from enum import Enum
import struct
import numpy as np
argv = sys.argv
import random
import os
import copy
class RandomError(Exception):
def __init__(self, message):
self.message = message
super().__init__(message)
argv = sys.argv
class Loudness(Enum):
DEBUG = 0 #prints all output
......@@ -32,6 +25,7 @@ global DRYRUN
global LOUDNESS
global SHOW_ADJACENCY
global NO_OUTPUT_FILE
global TRUE_RANDOM
LOUDNESS = Loudness.STANDARD
DRYRUN = 0
......@@ -165,30 +159,33 @@ class DragonflyPlusNetwork(object):
def generateGlobalGroupConnections(self):
log("Dragonfly Plus Network: Generating Global Group Connections", Loudness.STANDARD)
for group in self.groups:
other_groups = group.getOtherGroupIDsStartingAfterMe(self.num_groups)
for ogid in other_groups:
og = self.groups[ogid]
gcb = GroupConnectionBundle(group, og, self.num_global_links_between_groups)
group.groupConnBundles[ogid] = gcb
pair_set = set()
for i in range(self.num_groups):
for j in range(self.num_groups):
if i != j:
pair_set.add(frozenset((i,j)))
group_copy = copy.deepcopy(self.groups)
for pair in pair_set:
groups = [-1,-1]
for i,group_id in enumerate(pair):
groups[i] = group_id
if groups[0] == -1 or groups[1] == -1:
raise Exception("DragonflyPlusNetwork: Bad Generation of Group Pairs")
group1 = self.groups[groups[0]]
group2 = self.groups[groups[1]]
for i in range(self.num_global_links_between_groups):
the_group_connection = GroupConnection(group1,group2)
group1.addGlobalConnection(the_group_connection)
group2.addGlobalConnection(the_group_connection)
tries = 0
success = False
while success is False:
try:
for group in self.groups:
group.assignGlobalConnectionsToRouters()
success = True
except RandomError:
tries += 1
self.groups = copy.deepcopy(group_copy)
if tries%20 == 0:
log("Failed after %d tries, trying again..."%tries, Loudness.STANDARD)
else:
log("Failed after %d tries, trying again..."%tries, Loudness.LOUD)
group.assignRoutersToGlobalConnections()
for group in self.groups:
group.bakeGlobalConnections()
def getNumGlobalConnsPerSpine(self):
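The rewritten `generateGlobalGroupConnections` in the hunk above builds a set of unordered group pairs with `frozenset` and then creates `num_global_links_between_groups` connections for each pair. A minimal standalone sketch of that pairing step follows. `unordered_group_pairs` and `plan_global_links` are hypothetical names used only for illustration, not functions in the script, and `itertools.combinations` is shown as an equivalent way to enumerate the same pairs.

```python
from itertools import combinations


def unordered_group_pairs(num_groups):
    """Enumerate every unordered pair of distinct group IDs, as the diff does."""
    pair_set = set()
    for i in range(num_groups):
        for j in range(num_groups):
            if i != j:
                # frozenset((i, j)) == frozenset((j, i)), so each pair is stored once
                pair_set.add(frozenset((i, j)))
    return pair_set


def plan_global_links(num_groups, links_between_groups):
    """Return one (group_a, group_b) tuple per global link to be created."""
    links = []
    for pair in sorted(tuple(sorted(p)) for p in unordered_group_pairs(num_groups)):
        links.extend([pair] * links_between_groups)
    return links


pairs = unordered_group_pairs(5)
links = plan_global_links(5, 2)
# combinations() yields the same unordered pairs without the nested loop
same_as_combinations = pairs == {frozenset(c) for c in combinations(range(5), 2)}
```

With `n` groups this gives `n * (n - 1) / 2` pairs, so each group ends up with `(n - 1) * links_between_groups` global connections before any router is chosen.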
......@@ -279,19 +276,15 @@ class DragonflyPlusNetwork(object):
if glob is not glob_conns:
failed = True
if failed:
raise Exception("DragonflyPlusNetwork: Failed Verification: Fairness")
log("Verifying Dragonfly Nature...", Loudness.STANDARD)
for g in self.groups:
for gcb in g.groupConnBundles.values():
if gcb.assigned_num_gc_between != self.num_global_links_between_groups:
raise Exception("DragonflyPlusNetwork: Invalid number of connections between groups")
if failed:
raise Exception("DragonflyPlusNetwork: Failed Verification: Fairness")
for g in self.groups:
if len(set(g.groupConns)) != self.num_global_links_pg:
raise Exception("DragonflyPlusNetwork: Not Enough Group Connections for Group %d (%d != %d)"%(g.group_id,len(set(g.groupConns)), self.num_global_links_pg))
raise Exception("DragonflyPlusNetwork: Not Enough Group Connections")
log("Verifying Inter Group Connection Uniformity...", Loudness.STANDARD)
num_gc_between_0_1 = len(self.groups[0].getConnectionsToGroup(1))
......@@ -299,7 +292,18 @@ class DragonflyPlusNetwork(object):
other_groups = g.getOtherGroupIDsStartingAfterMe(self.num_groups)
for other_group_id in other_groups:
if len(g.getConnectionsToGroup(other_group_id)) != num_gc_between_0_1:
raise Exception("DragonflyPlusNetwork: Failed Verification: InterGroup Connection Uniformity")
raise Exception("DragonflyPlusNetwork: Failed Verification: InterGroup Connection Uniformity: %d != %d" % (len(g.getConnectionsToGroup(other_group_id)), num_gc_between_0_1))
log("Verifying Number of Links Generated...", Loudness.STANDARD)
link_sum = 0
for row in A:
for item in row:
link_sum += item
if link_sum != (self.router_radix * ((self.num_leaf_pg + self.num_spine_pg)*self.num_groups)) - (self.num_leaf_pg * self.num_hosts_per_leaf * self.num_groups): #number of links per router - number of terminals (because those links weren't generated)
raise Exception("DragonflyPlusNetwork: Failed Verification: Number of links generated doesn't match expected")
def commitConnection(self,conn, connType):
if connType is ConnType.LOCAL:
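The link-count check added in the hunk above compares the sum of the adjacency matrix against the total number of router ports minus the ports that face terminals. Below is a small sketch of that arithmetic, assuming every router port is in use. `expected_link_sum` is a hypothetical helper, and the parameter values are illustrative only, not taken from any configuration in this repository.

```python
def expected_link_sum(router_radix, num_leaf_pg, num_spine_pg,
                      num_groups, num_hosts_per_leaf):
    """Router-to-router port count: all ports minus terminal-facing ports."""
    total_routers = (num_leaf_pg + num_spine_pg) * num_groups
    total_ports = router_radix * total_routers
    # Terminal links are not generated by the script, so their ports are excluded
    terminal_ports = num_leaf_pg * num_hosts_per_leaf * num_groups
    return total_ports - terminal_ports


# Illustrative values only: radix 4, 2 leaves and 2 spines per group,
# 5 groups, 2 hosts per leaf
total = expected_link_sum(4, 2, 2, 5, 2)
```

For these values the function returns 4 * 20 - 20 = 60.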
......@@ -365,13 +369,20 @@ class Group(object):
self.network = network
self.groupConns = []
self.groupConnBundles = {}
def addRouter(self,router):
self.group_routers.append(router)
self.used_radix += router.inter_radix
def addGlobalConnection(self, group_conn):
if group_conn.src_group == self:
other_group_id = group_conn.dest_group.group_id
else:
other_group_id = group_conn.src_group.group_id
log("Group %d -> Group %d" % (self.group_id, other_group_id), Loudness.LOUD)
self.groupConns.append(group_conn)
def getSpineRouters(self):
return [r for r in self.group_routers if r.routerType is RouterType.SPINE]
......@@ -379,45 +390,13 @@ class Group(object):
def getLeafRouters(self):
return [r for r in self.group_routers if r.routerType is RouterType.LEAF]
def getRoutersWithOpenPorts(self,routerType, connType):
if routerType is RouterType.SPINE:
if connType is ConnType.GLOBAL:
return [r for r in self.group_routers if (r.routerType is RouterType.SPINE) if (len(r.global_connections) < r.inter_radix) ]
else:
return [r for r in self.group_routers if (r.routerType is RouterType.SPINE) if (len(r.local_connections) < r.intra_radix) ]
else:
if connType is ConnType.GLOBAL:
return [r for r in self.group_routers if (r.routerType is RouterType.LEAF) if (len(r.global_connections) < r.inter_radix) ]
else:
return [r for r in self.group_routers if (r.routerType is RouterType.LEAF) if (len(r.local_connections) < r.intra_radix) ]
def getRandomOpenRouter(self, routerType, connType, exceptions=None):
avail_routers = self.getRoutersWithOpenPorts(routerType, connType)
if exceptions == None:
if (len(avail_routers) == 0):
raise RandomError("Randomized Dead End (exceptions == none)!")
rand_sel = random.randint(0,len(avail_routers)-1)
return avail_routers[rand_sel]
else:
avail_routers_set = set(avail_routers)
exception_set = set(exceptions)
remaining_routers = list(avail_routers_set - exception_set)
if (len(remaining_routers) == 0):
raise RandomError("Randomized Dead End! (exceptions == something)")
rand_sel = random.randint(0, len(remaining_routers) -1 )
return remaining_routers[rand_sel]
def getOtherGroupIDsStartingAfterMe(self,num_groups):
my_group_id = self.group_id
all_group_ids = [i for i in range(num_groups) if i != my_group_id]
return np.roll(all_group_ids, -1*my_group_id)
def getConnectionsToGroup(self,other_group_id):
return [conn for conn in self.groupConns if conn.dest_router.group_id == other_group_id]
return [conn for conn in self.groupConns if conn.dest_group.group_id == other_group_id or conn.src_group.group_id == other_group_id]
def generateLocalConnections(self):
log("Group %d: generating local connections" % self.group_id, Loudness.LOUD)
......@@ -429,22 +408,33 @@ class Group(object):
for lrtr in leaf_routers:
srtr.connectTo(lrtr, ConnType.LOCAL)
def assignGlobalConnectionsToRouters(self):
def assignRoutersToGlobalConnections(self):
log("Group %d: assigning global connections" % self.group_id, Loudness.LOUD)
for gcb in self.groupConnBundles.values():
for i in range(gcb.num_gc_between):
if (gcb.assigned_num_gc_between < gcb.num_gc_between):
src_rtr = gcb.src_group.getRandomOpenRouter(RouterType.SPINE, ConnType.GLOBAL)
dest_rtr = gcb.dest_group.getRandomOpenRouter(RouterType.SPINE, ConnType.GLOBAL, exceptions=src_rtr.getRoutersIConnectTo(ConnType.GLOBAL)) #TODO this exceptions prevents parallel connections from being valid
my_spine_routers = self.getSpineRouters()
(src_conn, dest_conn ) = src_rtr.connectTo(dest_rtr, ConnType.GLOBAL)
self.groupConns.append(src_conn)
gcb.dest_group.groupConns.append(dest_conn)
if TRUE_RANDOM:
random.shuffle(self.groupConns)
group_conns_used = 0
while (group_conns_used < len(self.groupConns)):
my_spine_routers = random.sample(my_spine_routers, len(my_spine_routers))
for router in my_spine_routers:
router.num_unbaked_global_connections += 1
self.groupConns[group_conns_used].setEndpoint(router)
group_conns_used += 1
def bakeGlobalConnections(self):
log("Group %d: baking global connections" % self.group_id, Loudness.LOUD)
for i, group_conn in enumerate(self.groupConns):
if group_conn.routers[0].group_id == self.group_id:
group_conn.routers[0].connectToOneWay(group_conn.routers[1], ConnType.GLOBAL)
elif group_conn.routers[1].group_id == self.group_id:
group_conn.routers[1].connectToOneWay(group_conn.routers[0], ConnType.GLOBAL)
else:
raise Exception("BakeGlobalConnections: Something went wrong...")
dest_gcb = gcb.dest_group.groupConnBundles[self.group_id] #the group connection bundle from dest to src
gcb.assignConnection(src_conn)
dest_gcb.assignConnection(dest_conn)
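The new `assignRoutersToGlobalConnections` hands a group's connections to its spine routers in repeated shuffled passes. The sketch below illustrates that distribution on plain lists. `assign_round_robin` is a hypothetical function, and the bounds check inside the inner loop is an addition made here: the loop in the diff appears to rely on the connection count being a multiple of the spine count.

```python
import random


def assign_round_robin(spine_ids, num_conns, rng=None):
    """Map each connection index to a spine router, one shuffled pass at a time."""
    if rng is None:
        rng = random.Random()
    assignment = []
    while len(assignment) < num_conns:
        # random.sample with k == len(...) returns a shuffled copy
        for router in rng.sample(spine_ids, len(spine_ids)):
            if len(assignment) == num_conns:
                break  # added guard: stop mid-pass when connections run out
            assignment.append(router)
    return assignment


assignment = assign_round_robin([10, 11, 12, 13], 10, random.Random(0))
counts = {r: assignment.count(r) for r in [10, 11, 12, 13]}
```

Because every pass visits each spine once, no router receives more than one connection beyond any other, which is what keeps the later fairness verification satisfiable.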
......@@ -458,10 +448,20 @@ class Router(object):
self.routerType = routerType
self.local_connections = []
self.global_connections = []
self.num_unbaked_global_connections = 0
self.network = network
log("New Router: GID: %d LID: %d Group %d" % (self.gid, self.local_id, self.group_id), Loudness.DEBUG)
# local_spinal_id = self.local_id - self.network.num_leaf_pg #what is my ID in terms of num spine
# other_groups = self.network.groups[group_id].getOtherGroupIDsStartingAfterMe(self.network.num_groups)
# other_groups_i_connect_to_std = [g for i,g in enumerate(other_groups) if i]
# other_groups_i_connect_to_start_index_std = (local_spinal_id * self.network.num_spine_pg) % len(other_groups)
# other_groups_i_connect_to_start_index_addtl_gc = (other_groups_i_connect_to_start_index_std - (self.network.num_global_links_per_spine - self.network.num_spine_pg)) % len(other_groups)
def connectTo(self, other_rtr, connType):
if connType is ConnType.GLOBAL:
assert(self.routerType == RouterType.SPINE)
......@@ -472,8 +472,6 @@ class Router(object):
self.addConnection(conn, connType)
other_rtr.addConnection(oconn, connType)
return (conn, oconn)
def connectToOneWay(self, other_rtr, connType): #connects without connecting backward - for use if you know your loop will double count
if connType is ConnType.GLOBAL:
assert(self.routerType == RouterType.SPINE)
......@@ -498,25 +496,6 @@ class Router(object):
self.network.commitConnection(conn, conntype)
def getRoutersIConnectTo(self, connType):
other_routers = []
if connType is ConnType.GLOBAL:
other_routers.extend([conn.dest_router for conn in self.global_connections])
if connType is ConnType.LOCAL:
other_routers.extend([conn.dest_router for conn in self.local_connections])
return other_routers
def getGroupsIConnectTo(self):
other_groups = set()
for gc in self.global_connections:
if gc.dest_group not in other_groups:
other_groups.add(gc.dest_group)
return other_groups
def __hash__(self):
return self.gid
class Connection(object):
def __init__(self, src_router, dest_router, connType, shifted_by=0):
......@@ -543,29 +522,29 @@ class Connection(object):
else:
raise KeyError("Connection: Invalid __getitem__() key")
class GroupConnectionBundle(object):
def __init__(self, src_group, dest_group, num_gc_between):
for i in range(num_gc_between):
log("Group %d -> Group %d" % (src_group.group_id, dest_group.group_id), Loudness.LOUD)
class GroupConnection(object):
def __init__(self, src_group, dest_group):
self.src_group = src_group
self.dest_group = dest_group
self.num_gc_between = num_gc_between
self.assigned_num_gc_between = 0
self.assigned_conns = []
self.routers = []
def assignConnection(self, src_conn):
self.assigned_num_gc_between += 1
self.assigned_conns.append(src_conn)
if (len(self.assigned_conns) > self.num_gc_between):
raise Exception("GroupConnectionBundle: assigning too many connections!")
def setEndpoint(self, rtr):
if len(self.routers) == 2:
raise Exception("GroupConnection: Can't supply more than 2 endpoints to a group connection")
self.routers.append(rtr)
def parseOptionArguments():
global DRYRUN
global LOUDNESS
global SHOW_ADJACENCY
global NO_OUTPUT_FILE
global TRUE_RANDOM
if "--true-random" in argv:
TRUE_RANDOM = True
else:
TRUE_RANDOM = False
if "--debug" in argv:
LOUDNESS = Loudness.DEBUG
......@@ -652,5 +631,37 @@ def mainV3():
print(A.astype(int))
# def mainV2():
# if(len(argv) < 8):
# raise Exception("Correct usage: python %s <num_groups> <num_spine_pg> <num_leaf_pg> <router_radix> <terminals-per-leaf> <intra-file> <inter-file>" % sys.argv[0])
# num_groups = int(argv[1])
# num_spine_pg = int(argv[2])
# num_leaf_pg = int(argv[3])
# router_radix = int(argv[4])
# term_per_leaf = int(argv[5])
# intra_filename = argv[6]
# inter_filename = argv[7]
# parseOptionArguments()
# dfp_network = DragonflyPlusNetwork(num_groups, num_spine_pg, num_leaf_pg, router_radix, num_hosts_per_leaf=term_per_leaf)
# if not DRYRUN:
# dfp_network.writeIntraconnectionFile(intra_filename)
# dfp_network.writeInterconnectionFile(inter_filename)
# if LOUDNESS is not Loudness.QUIET:
# print("\nNOTE: THIS STILL CAN'T DO THE MED-LARGE TOPOLOGY RIGHT\n")
# print(dfp_network.getSummary())
# if SHOW_ADJACENCY == 1:
# print("\nPrinting Adjacency Matrix:")
# np.set_printoptions(linewidth=400,threshold=10000,edgeitems=200)
# A = dfp_network.getAdjacencyMatrix(AdjacencyType.ALL_CONNS)
# print(A.astype(int))
if __name__ == '__main__':
mainV3()
......@@ -4,7 +4,7 @@
# In hindsight this was a lot more complicated than I intended. It was looking to solve a complex problem that turned out to be invalid from the beginning.
### USAGE ###
# Correct usage: python3 script.py <num_groups> <num_spine_pg> <num_leaf_pg> <router_radix> <num_terminal_per_leaf> <intra-file> <inter-file>
# Correct usage: python3 dragonfly-plus-topo-gen-v2.py <router_radix> <num_gc_between_groups> <intra-file> <inter-file>
### ###
import sys
......@@ -573,37 +573,37 @@ def mainV3():
print(A.astype(int))
def mainV2():
if(len(argv) < 8):
raise Exception("Correct usage: python %s <num_groups> <num_spine_pg> <num_leaf_pg> <router_radix> <terminals-per-leaf> <intra-file> <inter-file>" % sys.argv[0])
# def mainV2():
# if(len(argv) < 8):
# raise Exception("Correct usage: python %s <num_groups> <num_spine_pg> <num_leaf_pg> <router_radix> <terminals-per-leaf> <intra-file> <inter-file>" % sys.argv[0])
num_groups = int(argv[1])
num_spine_pg = int(argv[2])
num_leaf_pg = int(argv[3])
router_radix = int(argv[4])
term_per_leaf = int(argv[5])
intra_filename = argv[6]
inter_filename = argv[7]
# num_groups = int(argv[1])
# num_spine_pg = int(argv[2])
# num_leaf_pg = int(argv[3])
# router_radix = int(argv[4])
# term_per_leaf = int(argv[5])
# intra_filename = argv[6]
# inter_filename = argv[7]
parseOptionArguments()
# parseOptionArguments()
dfp_network = DragonflyPlusNetwork(num_groups, num_spine_pg, num_leaf_pg, router_radix, num_hosts_per_leaf=term_per_leaf)
# dfp_network = DragonflyPlusNetwork(num_groups, num_spine_pg, num_leaf_pg, router_radix, num_hosts_per_leaf=term_per_leaf)
if not DRYRUN:
dfp_network.writeIntraconnectionFile(intra_filename)
dfp_network.writeInterconnectionFile(inter_filename)
# if not DRYRUN:
# dfp_network.writeIntraconnectionFile(intra_filename)
# dfp_network.writeInterconnectionFile(inter_filename)
if LOUDNESS is not Loudness.QUIET:
print("\nNOTE: THIS STILL CAN'T DO THE MED-LARGE TOPOLOGY RIGHT\n")
# if LOUDNESS is not Loudness.QUIET:
# print("\nNOTE: THIS STILL CAN'T DO THE MED-LARGE TOPOLOGY RIGHT\n")
print(dfp_network.getSummary())
# print(dfp_network.getSummary())
if SHOW_ADJACENCY == 1:
print("\nPrinting Adjacency Matrix:")
# if SHOW_ADJACENCY == 1:
# print("\nPrinting Adjacency Matrix:")
np.set_printoptions(linewidth=400,threshold=10000,edgeitems=200)
A = dfp_network.getAdjacencyMatrix(AdjacencyType.ALL_CONNS)
print(A.astype(int))
# np.set_printoptions(linewidth=400,threshold=10000,edgeitems=200)
# A = dfp_network.getAdjacencyMatrix(AdjacencyType.ALL_CONNS)
# print(A.astype(int))
if __name__ == '__main__':
mainV3()
......@@ -41,7 +41,7 @@ PARAMS
# bandwidth in GiB/s for compute node-router channels
cn_bandwidth="16.0";
# ROSS message size
message_size="656";
message_size="736";
# number of compute nodes connected to router, dictated by dragonfly config
# file
num_cns_per_router="2";
......
LPGROUPS
{
MODELNET_GRP
{
repetitions="1040";
# name of this lp changes according to the model
nw-lp="8";
# these lp names will be the same for dragonfly-custom model
modelnet_dragonfly_dally="8";
modelnet_dragonfly_dally_router="1";
}
}
PARAMS
{
# packet size in the network
packet_size="4096";
modelnet_order=( "dragonfly_dally","dragonfly_dally_router" );
# scheduler options
modelnet_scheduler="fcfs";
# chunk size in the network (when chunk size = packet size, packets will not be
# divided into chunks)
chunk_size="4096";
# modelnet_scheduler="round-robin";
num_router_rows="1";
# intra-group columns for routers
num_router_cols="16";
# number of groups in the network
num_groups="65";
# buffer size in bytes for local virtual channels
local_vc_size="16384";
#buffer size in bytes for global virtual channels
global_vc_size="16384";
#buffer size in bytes for compute node virtual channels
cn_vc_size="32768";
#bandwidth in GiB/s for local channels
local_bandwidth="2.0";
# bandwidth in GiB/s for global channels
global_bandwidth="2.0";
# bandwidth in GiB/s for compute node-router channels
cn_bandwidth="2.0";
# Number of row channels
num_row_chans="1";
# Number of column channels
num_col_chans="1";
# ROSS message size
message_size="656";
# number of compute nodes connected to router, dictated by dragonfly config
# file
num_cns_per_router="8";
# number of global channels per router
num_global_channels="8";
# network config file for intra-group connections
intra-group-connections="../src/network-workloads/conf/dragonfly-dally/dfdally_8k_intra";
# network config file for inter-group connections
inter-group-connections="../src/network-workloads/conf/dragonfly-dally/dfdally_8k_inter";
# routing protocol to be used
routing="prog-adaptive";
adaptive_threshold="131072";
minimal-bias="1";
df-dally-vc = "1";
}
......@@ -51,9 +51,9 @@ PARAMS
# number of global channels per router
num_global_channels="8";
# network config file for intra-group connections
intra-group-connections="../src/network-workloads/conf/dragonfly-dally/dfdally_8k_intra";
intra-group-connections="@abs_srcdir@/dfdally_8k_intra";
# network config file for inter-group connections
inter-group-connections="../src/network-workloads/conf/dragonfly-dally/dfdally_8k_inter";
inter-group-connections="@abs_srcdir@/dfdally_8k_inter";
# routing protocol to be used
routing="prog-adaptive";
adaptive_threshold="131072";
......
LPGROUPS
{
MODELNET_GRP
{
repetitions="1056";
# name of this lp changes according to the model
nw-lp="8";
# these lp names will be the same for dragonfly-custom model
modelnet_dragonfly_plus="8";
modelnet_dragonfly_plus_router="1";
}
}
PARAMS
{
# packet size in the network
packet_size="4096";
# order of LPs, mapping for modelnet grp
modelnet_order=( "dragonfly_plus","dragonfly_plus_router" );
# scheduler options
modelnet_scheduler="fcfs";
# chunk size in the network (when chunk size = packet size, packets will not be divided into chunks)
chunk_size="4096";
# number of spine routers per group
num_router_spine="16";
# number of leaf routers per group
num_router_leaf="16";
# number of links connecting between group levels per router
num_level_chans="1";
# number of groups in the network
num_groups="33";
# buffer size in bytes for local virtual channels
local_vc_size="32768";
# buffer size in bytes for global virtual channels
global_vc_size="32768";
# buffer size in bytes for compute node virtual channels
cn_vc_size="32768";
# bandwidth in GiB/s for local channels
local_bandwidth="25.0";
# bandwidth in GiB/s for global channels