The network layout for the custom dragonfly model is similar to Cray XC
systems. Each group has X number of routers arranged in rows and columns.
There are all-to-all connections among routers within the same row and same
column. If a packet is sent to a router that is within the same group but
lies on a different row and column, it will be first sent to an intermediate
router that has a direct connection to the destination router.
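
The intra-group rule above can be sketched as follows. This is only an illustration: the `(row, col)` labelling and the function name `intermediate_candidates` are assumptions made for the example and are not taken from the model's source code.

```python
def intermediate_candidates(src, dst):
    """Return routers that could relay a packet from src to dst in one group.

    src and dst are (row, col) tuples (an assumed labelling).
    An empty list means no relay is needed: the routers are identical or
    share a row or column, so they are directly connected.
    """
    (sr, sc), (dr, dc) = src, dst
    if sr == dr or sc == dc:
        return []
    # One candidate lies in the source's row and the destination's column,
    # the other in the source's column and the destination's row. Both are
    # directly connected to the source and to the destination.
    return [(sr, dc), (dr, sc)]
```

For example, a packet from router `(0, 0)` to router `(2, 3)` could be relayed through `(0, 3)` or `(2, 0)`. Which of the two the model picks is not stated here.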
There can be multiple global channels between two groups as specified by the
network configuration files. Currently, Edison has 12 global channels
connecting any two groups in the network.
For more details on the Cray dragonfly topology, see "Cray cascade: a
scalable HPC system based on a Dragonfly network" by Greg Faanes et al.
in Supercomputing 2012.
Edison Configuration: The network configuration file from the Cray Edison
system at NERSC can be used to set up the dragonfly network topology. The
configuration has 5,760 nodes, 1,440 routers and 15 groups. The instructions
on how to generate the network configuration files for Edison can be found at:
Custom Configuration: The model can be configured with an arbitrary number of
groups and number of routers within a group. A configuration build tool can be found at:
./connections_general_patched g r c can cir cic intra-file inter-file
Arguments:

* **g**: number of groups in the network
* **r**: number of router rows within a group
* **c**: number of router columns within a group
* **can**: connections across groups (number of redundant channels between groups)
* **cir**: connections in row (number of channels between two routers in same row)
* **cic**: connections in column (number of channels between two routers in same column)
* **intra-file**: output files for intra-group connections
* **inter-file**: output file for inter-group connections
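
As a rough illustration of the argument order, the command line can be assembled like this. The numeric values and the output file names are hypothetical and chosen only for the example; `build_command` is a helper invented here, not part of the tool.

```python
import shlex


def build_command(g, r, c, can, cir, cic, intra_file, inter_file):
    """Assemble the argument vector in the order listed above."""
    return ["./connections_general_patched",
            str(g), str(r), str(c), str(can), str(cir), str(cic),
            intra_file, inter_file]


# Hypothetical values: 4 groups, each a 2x4 router grid, 2 channels between
# groups, 1 channel per row link and per column link.
cmd = build_command(g=4, r=2, c=4, can=2, cir=1, cic=1,
                    intra_file="intra.bin", inter_file="inter.bin")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the tool has been built.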
The scripts and code for translating existing topologies and generating
Cray-style dragonfly topologies have been contributed by Nikhil Jain, Abhinav
Bhatele and Peer-Timo Bremer from LLNL.
For details on the Cray XC dragonfly network topology, see the following paper:
Greg Faanes, Abdulla Bataineh, Duncan Roweth, Tom Court, Edwin Froese, Bob Alverson, Tim Johnson, Joe Kopnick, Mike Higgins, and James Reinhard. 2012. Cray cascade: a scalable HPC system based on a Dragonfly network. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '12). IEEE Computer Society Press, Los Alamitos, CA, USA, Article 103, 9 pages.
Minimal: Within a group, a minimal route traverses at most three routers
(the source router, an intermediate router if the source and destination
routers do not share a row or column, and the destination router). Across
groups, a minimal route chooses the shortest possible connection to the
destination group. A global minimal route traverses a maximum of 6 router
hops (including source and destination).
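
One way to account for these bounds is sketched below. It assumes the same hypothetical `(row, col)` labelling of routers within a group, and it reads the 6-hop limit as three routers in the source group plus three in the destination group. That reading is an interpretation of the text above, not something taken from the model's code.

```python
def intra_group_routers(src, dst):
    """Count routers on a minimal intra-group route, endpoints included."""
    if src == dst:
        return 1
    (sr, sc), (dr, dc) = src, dst
    if sr == dr or sc == dc:
        return 2  # direct row or column link
    return 3      # one intermediate router is needed


# Worst case inside one group: source and destination differ in both
# row and column.
WORST_INTRA = intra_group_routers((0, 0), (1, 1))

# Assumed worst case across groups: a worst-case path to the router holding
# the global channel, then a worst-case path inside the destination group.
MAX_GLOBAL_MINIMAL = 2 * WORST_INTRA
```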
Non-Minimal: Within a group, a non-minimal route directs packets to a
randomly selected router first. A global non-minimal route first sends the
packet to a randomly selected intermediate router in the network. A
global non-minimal route can traverse up to 11 router hops (including source
and destination).
Local adaptive: Local adaptive routing takes a non-minimal route within a
group if it detects congestion on the minimal route (queues on minimal port
are used to detect congestion).
Global Adaptive: Global adaptive routing takes a global non-minimal route if
it detects congestion on the minimal route (queues on minimal port are used
to detect congestion).
Progressive Adaptive: Progressive adaptive routing re-evaluates the decision
to take a minimal or a non-minimal route as long as the packet stays in the
source group. Once a non-minimal route is chosen at some point in the source
group, the decision is final and is not re-evaluated.
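
The adaptive schemes above can be sketched roughly as follows. The simple threshold comparison is an assumption made for illustration; the text only says that queues on the minimal port are used to detect congestion, not how. All function and parameter names are hypothetical.

```python
def choose_route(minimal_queue_len, threshold):
    """Return "non-minimal" when the minimal port looks congested.

    The threshold test is an assumed stand-in for the model's own
    congestion check.
    """
    return "non-minimal" if minimal_queue_len > threshold else "minimal"


def progressive_adaptive(queue_lens_in_source_group, threshold):
    """Re-evaluate the choice at each router visited in the source group.

    queue_lens_in_source_group lists the minimal-port queue occupancy seen
    at successive routers. Once non-minimal is chosen, it stays chosen.
    """
    decisions = []
    committed = False
    for qlen in queue_lens_in_source_group:
        if not committed and choose_route(qlen, threshold) == "non-minimal":
            committed = True
        decisions.append("non-minimal" if committed else "minimal")
    return decisions
```

With a threshold of 3, queue lengths of 1, 5 and 0 at successive routers give a minimal decision at the first router and non-minimal from the second onward, even though the third router is uncongested.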
Synthetic Traffic Patterns:
[With a custom dragonfly network having 6,400 network nodes, 1,600 routers and
20 groups. Each group has 80 routers arranged in a 20x4 matrix]
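
The headline totals of a configuration can be cross-checked with a few lines of arithmetic. This is a small sketch: `describe` is a helper invented for the example, and it assumes routers are spread evenly across groups and nodes evenly across routers.

```python
def describe(nodes, routers, groups):
    """Derive (routers per group, nodes per router) from the totals."""
    routers_per_group, rem_r = divmod(routers, groups)
    nodes_per_router, rem_n = divmod(nodes, routers)
    if rem_r or rem_n:
        raise ValueError("totals do not divide evenly")
    return routers_per_group, nodes_per_router


# Custom configuration described above.
custom = describe(nodes=6400, routers=1600, groups=20)

# Edison configuration described earlier in this document.
edison = describe(nodes=5760, routers=1440, groups=15)
```

For the custom network this gives 80 routers per group, which matches the 20x4 matrix, and 4 nodes per router. The Edison totals work out to 96 routers per group and 4 nodes per router.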