...

Contributors to date (in chronological order, with current affiliations)

- Ning Liu, IBM
- Jason Cope, DDN
- Philip Carns, Argonne National Laboratory
- Misbah Mubarak, Argonne National Laboratory
- Shane Snyder, Argonne National Laboratory
- Jonathan P. Jenkins
- Noah Wolfe, RPI
- Nikhil Jain, Lawrence Livermore National Laboratory
- Matthieu Dorier, Argonne National Laboratory
- Caitlin Ross, RPI
- Xu Yang, Amazon

Contributors to date (with affiliations at time of contribution)

- Philip Carns, Argonne National Laboratory
- Misbah Mubarak, Argonne National Laboratory
- Shane Snyder, Argonne National Laboratory
- Jonathan P. Jenkins, Argonne National Laboratory
- Noah Wolfe, Rensselaer Polytechnic Institute
- Nikhil Jain, Lawrence Livermore National Laboratory
- Jens Domke, Univ. of Dresden
- Giorgis Georgakoudis, Lawrence Livermore National Laboratory
- Matthieu Dorier, Argonne National Laboratory
- Caitlin Ross, Rensselaer Polytechnic Institute
- Xu Yang, Illinois Institute of Technology
- Jens Domke, Tokyo Institute of Technology
- Xin Wang, Illinois Institute of Technology
- Neil McGlohon, Rensselaer Polytechnic Institute
- Elsa Gonsiorowski, Rensselaer Polytechnic Institute
- Justin M. Wozniak, Argonne National Laboratory
- Robert B. Ross, Argonne National Laboratory
- Lee Savoie, Univ. of Arizona
- Ning Liu, Rensselaer Polytechnic Institute
- Jason Cope, Argonne National Laboratory

Contributions:

Misbah Mubarak (ANL)
- Introduced the 1-D dragonfly and enhanced torus network models.
- Added quality of service in the dragonfly and megafly network models.
- Added the MPI simulation layer to simulate MPI operations.
- Updated and merged the burst buffer storage model with the 2-D dragonfly.
- Added and validated the 2-D dragonfly network model.
- Added multiple workload sources, including MPI communication, Scalable Workload Models, and DUMPI communication traces.
- Added online simulation capability with Argobots and SWMs.
- Instrumented the network models to report time-stepped series statistics.
- Bug fixes for network, storage, and workload models in CODES.

Neil McGlohon (RPI)
- Introduced the Dragonfly Plus/Megafly network model.
- Merged the 1-D dragonfly and 2-D dragonfly network models.
- Updated adaptive routing in the megafly and 1-D dragonfly network models.
- Extended the slim fly network model's dual-rail mode to an arbitrary number of rails (pending).

Nikhil Jain, Abhinav Bhatele (LLNL)
- Improvements in credit-based flow control of the CODES dragonfly and torus network models.

...

Jens Domke (U. of Dresden)
- Static routing in the fat tree network model, including ground work for dumping the topology and reading the routing tables.

John Jenkins
- Introduced storage models in a separate codes-storage repo.
- Enhanced the codes-mapping APIs to map advanced combinations on PEs.
- Bug fixes in network models.
- Bug fixes in the MPI simulation layer.

Xu Yang (IIT)
- Added support for running multiple application workloads with the CODES MPI simulation layer, along with supporting scripts and utilities.

...

Noah Wolfe (RPI)
- Added a fat tree network model that supports full and pruned fat tree networks.
- Added a multi-rail implementation for the fat tree networks (pending).
- Added a dual-rail implementation for slim fly networks (pending).
- Bug reporter for CODES network models.

Caitlin Ross (RPI)

...
COPYRIGHT

The following is a notice of limited availability of the code, and disclaimer, which must be included in the prologue of the code and in all source listings of the code.

Copyright Notice

(C) 2013 University of Chicago

Permission is hereby granted to use, reproduce, prepare derivative works, and to redistribute to others. This software was authored by:

Mathematics and Computer Science Division
Argonne National Laboratory, Argonne IL 60439

(and)

Computer Science Department
Rensselaer Polytechnic Institute, Troy NY 12180

GOVERNMENT LICENSE

Portions of this material resulted from work developed under a U.S. Government Contract and are subject to the following license: the Government is granted for itself and others acting on its behalf a paid-up, nonexclusive, irrevocable worldwide license in this computer software to reproduce, prepare derivative works, and perform publicly and display publicly.

DISCLAIMER

This computer code material was prepared, in part, as an account of work sponsored by an agency of the United States Government. Neither the United States, nor the University of Chicago, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights.
************** Copyright © 2019, UChicago Argonne, LLC ***************

All Rights Reserved

Software Name: CO-Design of Exascale Storage and Network Architectures (CODES)

By: Argonne National Laboratory, Rensselaer Polytechnic Institute, Lawrence Livermore National Laboratory, and Illinois Institute of Technology

OPEN SOURCE LICENSE

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

******************************************************************************************************

DISCLAIMER

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

***************************************************************************************************
# **IMPORTANT NOTICE**

## THE CODES PROJECT HAS BEEN MOVED

https://github.com/codes-org/codes

As of June 21, 2019, the location of the repo has been moved to GitHub as part of the codebase's shift to its open source BSD license. All issues will be moved in some form to the new location, followed by merge requests (if possible). I will do my best to keep the master branches of the two repositories up to date with each other. That being said, if there is any conflict, the GitHub repo should be taken as the truth.

Thank you,
Neil McGlohon

https://github.com/codes-org/codes

# CODES Discrete-event Simulation Framework

https://xgitlab.cels.anl.gov/codes/codes/wikis/home

Discrete event driven simulation of HPC system architectures and subsystems has emerged as a productive and cost-effective means of evaluating potential HPC designs, along with capabilities for executing simulations of extreme-scale systems. The goal of the CODES project is to use highly parallel simulation to explore the design of exascale storage/network architectures and distributed data-intensive science facilities.

Our simulations build upon the Rensselaer Optimistic Simulation System (ROSS), a discrete event simulation framework that allows simulations to be run in parallel, decreasing the run time of massive simulations to hours. We are using ROSS to explore topics including large-scale storage systems, I/O workloads, HPC network fabrics, distributed science systems, and data-intensive computation environments.

...
## README for using ROSS instrumentation with CODES

For details about the ROSS instrumentation, see the [ROSS Instrumentation blog post](http://ross-org.github.io/instrumentation/instrumentation.html) on the ROSS webpage. There are currently 4 types of instrumentation: GVT-based, real time sampling, virtual time sampling, and event tracing. See the ROSS documentation for more info on the specific options, or use --help with your model. To collect data about the simulation engine, no changes are needed to model code for any of the instrumentation modes. Some additions to the model code are needed in order to turn on any model-level data collection. See the "Model-level data sampling" section of the [ROSS Instrumentation blog post](http://ross-org.github.io/instrumentation/instrumentation.html). Here we describe CODES-specific details.

### Register Instrumentation Callback Functions

...

The second pointer is for the data to be sampled at the GVT or real time sampling point. In this case the LPs have different function pointers, since we want to collect different types of data for the two LP types. For the terminal, I set the appropriate size of the data to be collected, but for the router, the size of the data is dependent on the radix for the dragonfly configuration being used, which isn't known until runtime.

*Note*: You can only reuse the function for event tracing for LPs that use the same type of message struct. For example, the dragonfly terminal and router LPs both use the terminal_message struct, so they can use the same functions for event tracing. However, the model-net base LP uses the model_net_wrap_msg struct, so it gets its own event collection function and st_trace_type struct, in order to read the event type correctly from the model.

In the ROSS instrumentation documentation, there are two methods provided for letting ROSS know about these st_model_types structs. In CODES, this step is a little different, as codes_mapping_setup() calls tw_lp_settype(). Instead, you add a function to return this struct for each of your LP types:

```C
static const st_model_types *dragonfly_get_model_types(void)
```

...

```C
static void router_register_model_types(st_model_types *base_type)
```

At this point, there are two different steps to follow depending on whether the model is one of the model-net models or not.

##### Model-net Models

In the model_net_method struct, two fields have been added: mn_model_stat_register and mn_get_model_stat_types. You need to set these to the functions described above. For example:

```C
st_model_types svr_model_types[] = { ... };

static void svr_register_model_types()
{
    st_model_type_register("nw-lp", &svr_model_types[0]);
}

int main(int argc, char **argv)
{
    // ... some setup removed for brevity
    model_net_register();
    svr_add_lp_type();
    if (g_st_ev_trace || g_st_model_stats)
        svr_register_model_types();
    codes_mapping_setup();
    // ...
}
```

g_st_ev_trace is a ROSS flag for determining if event tracing is turned on, and g_st_model_stats determines if the GVT-based or real time instrumentation modes are collecting model-level data as well.

### CODES LPs that currently have event type collection implemented:

...

If you're using any of the following CODES models, you don't have to add anything:

- slimfly router and terminal LPs (slimfly.c)
- fat tree switch and terminal LPs (fat-tree.c)
- model-net-base-lp (model-net-lp.c)
... @@ -255,22 +255,34 @@ void ConnectionManager::add_connection(int dest_gid, ConnectionType type)

```C++
    switch (type)
    {
        case CONN_LOCAL:
            if (intraGroupConnections.size() < _max_intra_ports) {
                conn.port = this->get_used_ports_for(CONN_LOCAL);
                intraGroupConnections[conn.dest_lid].push_back(conn);
                _used_intra_ports++;
            }
            else
                tw_error(TW_LOC, "Attempting to add too many local connections per router - exceeding configuration value: %d", _max_intra_ports);
            break;
        case CONN_GLOBAL:
            if (globalConnections.size() < _max_inter_ports) {
                conn.port = _max_intra_ports + this->get_used_ports_for(CONN_GLOBAL);
                globalConnections[conn.dest_gid].push_back(conn);
                _used_inter_ports++;
            }
            else
                tw_error(TW_LOC, "Attempting to add too many global connections per router - exceeding configuration value: %d", _max_inter_ports);
            break;
        case CONN_TERMINAL:
            if (terminalConnections.size() < _max_terminal_ports) {
                conn.port = _max_intra_ports + _max_inter_ports + this->get_used_ports_for(CONN_TERMINAL);
                conn.dest_group_id = _source_group;
                terminalConnections[conn.dest_gid].push_back(conn);
                _used_terminal_ports++;
            }
            else
                tw_error(TW_LOC, "Attempting to add too many terminal connections per router - exceeding configuration value: %d", _max_terminal_ports);
            break;
        default:
...
```

... @@ -534,18 +546,19 @@ void ConnectionManager::print_connections()

```C++
    int group_id = it->second.dest_group_id;
    int id, gid;
    if (get_port_type(port_num) == CONN_LOCAL) {
        id = it->second.dest_lid;
        gid = it->second.dest_gid;
        printf(" %d -> (%d,%d) : %d - LOCAL\n", port_num, id, gid, group_id);
    }
    else if (get_port_type(port_num) == CONN_GLOBAL) {
        id = it->second.dest_gid;
        printf(" %d -> %d : %d - GLOBAL\n", port_num, id, group_id);
    }
    else if (get_port_type(port_num) == CONN_TERMINAL) {
        id = it->second.dest_gid;
        printf(" %d -> %d : %d - TERMINAL\n", port_num, id, group_id);
    }
    ports_printed++;
...
```
... @@ -24,6 +24,7 @@ extern "C" {
#include "net/dragonfly.h"
#include "net/dragonfly-custom.h"
#include "net/dragonfly-plus.h"
#include "net/dragonfly-dally.h"
#include "net/slimfly.h"
#include "net/fattree.h"
#include "net/loggp.h"
... @@ -133,6 +134,7 @@ typedef struct model_net_wrap_msg {
    terminal_message m_dfly;               // dragonfly
    terminal_custom_message m_custom_dfly; // dragonfly-custom
    terminal_plus_message m_dfly_plus;     // dragonfly plus
    terminal_dally_message m_dally_dfly;   // dragonfly dally
    slim_terminal_message m_slim;          // slimfly
    fattree_message m_fat;                 // fattree
    loggp_message m_loggp;                 // loggp
... ...
... @@ -76,6 +76,8 @@ typedef struct mn_stats mn_stats;
    X(EXPRESS_MESH_ROUTER, "modelnet_express_mesh_router", "express_mesh_router", &express_mesh_router_method)\
    X(DRAGONFLY_PLUS, "modelnet_dragonfly_plus", "dragonfly_plus", &dragonfly_plus_method)\
    X(DRAGONFLY_PLUS_ROUTER, "modelnet_dragonfly_plus_router", "dragonfly_plus_router", &dragonfly_plus_router_method)\
    X(DRAGONFLY_DALLY, "modelnet_dragonfly_dally", "dragonfly_dally", &dragonfly_dally_method)\
    X(DRAGONFLY_DALLY_ROUTER, "modelnet_dragonfly_dally_router", "dragonfly_dally_router", &dragonfly_dally_router_method)\
    X(MAX_NETS, NULL, NULL, NULL)

#define X(a,b,c,d) a,
... ...
/*
 * Copyright (C) 2014 University of Chicago.
 * See COPYRIGHT notice in top-level directory.
 */

#ifndef DRAGONFLY_DALLY_H
#define DRAGONFLY_DALLY_H

#ifdef __cplusplus
extern "C" {
#endif

#include

typedef struct terminal_dally_message terminal_dally_message;

/* this message is used for both dragonfly compute nodes and routers */
struct terminal_dally_message
{
    /* magic number */
    int magic;
    /* flit travel start time */
    tw_stime travel_start_time;
    /* packet ID of the flit */
    unsigned long long packet_ID;
    /* event type of the flit */
    short type;
    /* category: comes from codes */
    char category[CATEGORY_NAME_MAX];
    /* store category hash in the event */
    uint32_t category_hash;
    /* final destination LP ID; this comes from codes and can be a server or any other LP type */
    tw_lpid final_dest_gid;
    /* sending LP ID from CODES, can be a server or any other LP type */
    tw_lpid sender_lp;
    tw_lpid sender_mn_lp; /* source modelnet id */
    /* destination terminal ID of the dragonfly */
    tw_lpid dest_terminal_id;
    /* source terminal ID of the dragonfly */
    unsigned int src_terminal_id;
    /* message originating router id. MM: Can we calculate it through
     * sender_mn_lp?? */
    unsigned int origin_router_id;
    /* number of hops traversed by the packet */
    short my_N_hop;
    short my_l_hop, my_g_hop;
    short saved_channel;
    short saved_vc;

    int next_stop;
    short nonmin_done;
    /* Intermediate LP ID from which this message is coming */
    unsigned int intm_lp_id;
    /* last hop of the message, can be a terminal, local router or global router */
    short last_hop;
    /* For routing */
    int intm_rtr_id;
    int saved_src_dest;
    int saved_src_chan;

    uint32_t chunk_id;
    uint32_t packet_size;
    uint32_t message_id;
    uint32_t total_size;

    int remote_event_size_bytes;
    int local_event_size_bytes;

    /* for buffer message */
    short vc_index;
    int output_chan;
    model_net_event_return event_rc;
    int is_pull;
    uint32_t pull_size;
    int path_type;

    /* for reverse computation */
    short num_rngs;
    short num_cll;

    int qos_index;
    short last_saved_qos;
    short qos_reset1;
    short qos_reset2;

    tw_stime saved_available_time;
    tw_stime saved_avg_time;
    tw_stime saved_rcv_time;
    tw_stime saved_busy_time;
    tw_stime saved_total_time;
    tw_stime saved_sample_time;
    tw_stime msg_start_time;
    tw_stime saved_busy_time_ross;
    tw_stime saved_fin_chunks_ross;
};

#ifdef __cplusplus
}
#endif

#endif /* end of include guard: DRAGONFLY_DALLY_H */

/*
 * Local variables:
 * c-indent-level: 4
 * c-basic-offset: 4
 * End:
 *
 * vim: ft=c ts=8 sts=4 sw=4 expandtab
 */
... @@ -83,6 +83,21 @@ struct terminal_plus_message
    int is_pull;
    uint32_t pull_size;

    /* for counting reverse calls */
    short num_rngs;
    short num_cll;

    /* qos related attributes */
    short last_saved_qos;
    short qos_reset1;
    short qos_reset2;

    /* new qos rc - These are calloced in forward events, free'd in RC or commit_f */
    /* note: dynamic memory here is OK since it's only accessed by the LP that
     * alloced it in the first place. */
    short rc_is_qos_set;
    unsigned long long * rc_qos_data;
    int * rc_qos_status;

    /* for reverse computation */
    int path_type;
    tw_stime saved_available_time;
... ...
... @@ -2,7 +2,7 @@
# Process this file with autoconf to produce a configure script.
AC_PREREQ([2.67])
AC_INIT([codes], [1.1.0], [http://trac.mcs.anl.gov/projects/codes/newticket],[],[http://www.mcs.anl.gov/projects/codes/])
LT_INIT
... @@ -230,4 +230,7 @@
AC_CONFIG_FILES([Makefile])
AC_OUTPUT([maint/codes.pc])
AC_OUTPUT([src/network-workloads/conf/dragonfly-custom/modelnet-test-dragonfly-1728-nodes.conf])
AC_OUTPUT([src/network-workloads/conf/dragonfly-plus/modelnet-test-dragonfly-plus.conf])
AC_OUTPUT([src/network-workloads/conf/dragonfly-dally/modelnet-test-dragonfly-dally.conf])
...

NOTE: see bottom of this file for suggested configurations on particular ANL machines.

0 - Checkout, build, and install the trunk version of ROSS (https://github.com/ross-org/ROSS). At the time of release (0.6.0), ROSS's latest commit hash was 10d7a06b2d, so this revision is "safe" in the unlikely case incompatible changes come along in the future. If working from the CODES master branches, use the ROSS master branch.

    git clone http://github.com/ross-org/ROSS.git
    # if using 0.5.2 release: git checkout d3bdc07
    cd ROSS
    mkdir build
    ...

    <ROSS/install/ directory>

For more details on installing ROSS, go to https://github.com/ross-org/ROSS/blob/master/README.md. If using ccmake to configure, don't forget to set CMAKE_C_COMPILER and CMAKE_CXX_COMPILER to mpicc/mpicxx.

... ...
... @@ -16,7 +16,7 @@
per compute node or one multi-port NIC per node.

Adding a generic template for building new network models. For the simplest case, only 2 functions and preamble changes should suffice to add a new network. Updated the Express Mesh network model to serve as an example. For details, see

Darshan workload generator has been updated to use Darshan version 3.x.

... @@ -28,11 +28,11 @@
https://xgitlab.cels.anl.gov/codes/codes/wikis/Using-ROSS-Instrumentation-with-CODES

Compatible with the ROSS version that enables statistics collection of simulation performance. For details see: http://ross-org.github.io/instrumentation/instrumentation.html

Online workload replay functionality has been added that allows SWM workloads to be simulated in situ on the network models. WIP to integrate the Conceptual domain-specific language for network communication.

Multiple traffic patterns were added to the background traffic generation, including stencil, all-to-all, and random permutation.

... @@ -93,7 +93,7 @@
Background network communication using a uniform random workload can now be generated. The traffic generation gets automatically shut off when the main workload finishes. Collectives can now be translated into point-to-point using the CoRTex library. Performance of MPI_AllReduce is reported when the debug_cols option is enabled.

... ...
... @@ -121,7 +121,7 @@ xleftmargin=6ex
% IEEEtran.cls handling of captions and this will result in nonIEEE style
% figure/table captions. To prevent this problem, be sure and preload
% caption.sty with its "caption=false" package option. This will preserve
% IEEEtran.cls handling of captions. Version 1.3 (2005/06/28) and later
% (recommended due to many improvements over 1.2) of subfig.sty supports
% the caption=false option directly:
%\usepackage[caption=false,font=footnotesize]{subfig}
... @@ -188,7 +188,7 @@
easily shared and reused. It also includes a few tips to help avoid common
simulation bugs. For more information, ROSS has a bunch of documentation available in their
repository/wiki - see \url{https://github.com/ross-org/ROSS}.
\end{abstract}

\section{CODES: modularizing models}
... @@ -394,7 +394,7 @@
action upon the completion of them. More generally, the problem is: an event
issuance (an ack to the client) is based on the completion of more than one
asynchronous/parallel events (local write on primary server, forwarding write
to replica server). Further complicating the matter for storage simulations, there
can be any number of outstanding requests, each waiting on multiple events.

In ROSS's sequential and conservative parallel modes, the necessary state can
easily be stored in the LP as a queue of statuses for each set of events,
... @@ -488,7 +488,7 @@
Most core ROSS examples are designed to intentionally hit the end timestamp
for the simulation (i.e. they are modeling a continuous, steady state system).
This isn't necessarily true for other models. Quite simply, set g\_tw\_ts\_end
to an arbitrarily large number when running simulations
that have a well-defined end-point in terms of events processed.

Within the LP finalize function, do not call tw\_now. The time returned may
not be consistent in the case of an optimistic simulation.
... @@ -515,7 +515,7 @@ section(s).
\item generating multiple concurrent events makes rollback more difficult
\end{enumerate}
\item use dummy events to work around "event-less" advancement of simulation time
\item add a small amount of time "noise" to events to prevent ties
... ...
... @@ -44,7 +44,7 @@ Notes on how to release a new version of CODES

4. Upload the release tarball
- Our release directory is at ftp.mcs.anl.gov/pub/CODES/releases. There's no web interface, so you have to get onto an MCS workstation and copy the release in that way (the ftp server is mounted at /mcs/ftp.mcs.anl.gov).

5. Update website
- Project wordpress: http://www.mcs.anl.gov/projects/codes/ (you need
... ...
... @@ -246,7 +246,7 @@ int main(

    /* calculate the number of servers in this simulation,
     * ignoring annotations */
    num_servers = codes_mapping_get_lp_count(group_name, 0, "nw-lp", NULL, 1);

    /* for this example, we read from a separate configuration group for
     * server message parameters. Since they are constant for all LPs,
... @@ -273,7 +273,7 @@ static void svr_add_lp_type()
{
    /* lp_type_register should be called exactly once per process per
     * LP type */
    lp_type_register("nw-lp", svr_get_lp_type());
}

static void svr_init(
... ...
... @@ -3,14 +3,14 @@
# of application- and codes-specific key-value pairs.
LPGROUPS
{
    # in our simulation, we simply have a set of servers (nw-lp), each with
    # point-to-point access to each other
    SERVERS
    {
        # required: number of times to repeat the following key-value pairs
        repetitions="16";
        # application-specific: parsed in main
        nw-lp="1";
        # model-net-specific field defining the network backend. In this example,
        # each server has one NIC, and each server is point-to-point connected
        modelnet_simplenet="1";
... ...
... @@ -18,6 +18,7 @@
argobots_libs=@ARGOBOTS_LIBS@
argobots_cflags=@ARGOBOTS_CFLAGS@
swm_libs=@SWM_LIBS@
swm_cflags=@SWM_CFLAGS@
swm_datarootdir=@SWM_DATAROOTDIR@

Name: codes-base
Description: Base functionality for CODES storage simulation
... @@ -25,4 +26,4 @@
Version: @PACKAGE_VERSION@
URL: http://trac.mcs.anl.gov/projects/CODES
Requires:
Libs: -L${libdir} -lcodes ${ross_libs} ${argobots_libs} ${swm_libs} ${darshan_libs} ${dumpi_libs} ${cortex_libs}
Cflags: -I${includedir} -I${swm_datarootdir} ${ross_cflags} ${darshan_cflags} ${swm_cflags} ${argobots_cflags} ${dumpi_cflags} ${cortex_cflags}

CONT 3456 rand 1024 2 1 1024 128 512 512
CONT 3456 1024 128 rand 1008 512 496

... @@ -4,7 +4,7 @@
# In hindsight this was a lot more complicated than I intended. It was looking to solve a complex problem that turned out to be invalid from the beginning.
### USAGE ###
# Correct usage: python3 dragonfly-plus-topo-gen-v2.py
### ###
import sys
...
@@ -573,37 +573,37 @@ def mainV3():
    print(A.astype(int))

# def mainV2():
#     if(len(argv) < 8):
#         raise Exception("Correct usage: python %s " % sys.argv[0])
#     num_groups = int(argv[1])
#     num_spine_pg = int(argv[2])
#     num_leaf_pg = int(argv[3])
#     router_radix = int(argv[4])
#     term_per_leaf = int(argv[5])
#     intra_filename = argv[6]
#     inter_filename = argv[7]
#     parseOptionArguments()
#     dfp_network = DragonflyPlusNetwork(num_groups, num_spine_pg, num_leaf_pg, router_radix, num_hosts_per_leaf=term_per_leaf)
#     if not DRYRUN:
#         dfp_network.writeIntraconnectionFile(intra_filename)
#         dfp_network.writeInterconnectionFile(inter_filename)
#     if LOUDNESS is not Loudness.QUIET:
#         print("\nNOTE: THIS STILL CAN'T DO THE MED-LARGE TOPOLOGY RIGHT\n")
#         print(dfp_network.getSummary())
#     if SHOW_ADJACENCY == 1:
#         print("\nPrinting Adjacency Matrix:")
#         np.set_printoptions(linewidth=400,threshold=10000,edgeitems=200)
#         A = dfp_network.getAdjacencyMatrix(AdjacencyType.ALL_CONNS)
#         print(A.astype(int))

if __name__ == '__main__':
    mainV3()

... @@ -3,14 +3,14 @@
# of application- and codes-specific key-value pairs.
 LPGROUPS
 {
-    # in our simulation, we simply have a set of servers, each with
+    # in our simulation, we simply have a set of servers (nw-lp), each with
     # point-to-point access to each other
     SERVERS
     {
        # required: number of times to repeat the following key-value pairs
        repetitions="C_NUM_SERVERS";
        # application-specific: parsed in main
-       server="1";
+       nw-lp="1";
        # model-net-specific field defining the network backend. In this example,
        # each server has one NIC, and each server are point-to-point connected
        modelnet_simplenet="1";

@@ -94,8 +94,9 @@ nobase_include_HEADERS = \
 	codes/connection-manager.h \
 	codes/net/common-net.h \
 	codes/net/dragonfly.h \
-	codes/net/dragonfly-plus.h \
 	codes/net/dragonfly-custom.h \
+	codes/net/dragonfly-dally.h \
+	codes/net/dragonfly-plus.h \
 	codes/net/slimfly.h \
 	codes/net/fattree.h \
 	codes/net/loggp.h \
@@ -164,6 +165,7 @@ src_libcodes_la_SOURCES = \
 	src/networks/model-net/dragonfly.c \
 	src/networks/model-net/dragonfly-custom.C \
 	src/networks/model-net/dragonfly-plus.C \
+	src/networks/model-net/dragonfly-dally.C \
 	src/networks/model-net/slimfly.c \
 	src/networks/model-net/fattree.c \
 	src/networks/model-net/loggp.c \
@@ -198,6 +200,7 @@
 bin_PROGRAMS += src/network-workloads/model-net-synthetic-custom-dfly
 bin_PROGRAMS += src/network-workloads/model-net-synthetic-slimfly
 bin_PROGRAMS += src/network-workloads/model-net-synthetic-fattree
 bin_PROGRAMS += src/network-workloads/model-net-synthetic-dfly-plus
+bin_PROGRAMS += src/network-workloads/model-net-synthetic-dally-dfly
 
 src_workload_codes_workload_dump_SOURCES = \
@@ -212,6 +215,7 @@
 src_network_workloads_model_net_mpi_replay_CFLAGS = $(AM_CFLAGS)
 src_network_workloads_model_net_synthetic_SOURCES = src/network-workloads/model-net-synthetic.c
 src_network_workloads_model_net_synthetic_custom_dfly_SOURCES = src/network-workloads/model-net-synthetic-custom-dfly.c
 src_network_workloads_model_net_synthetic_dfly_plus_SOURCES = src/network-workloads/model-net-synthetic-dfly-plus.c
+src_network_workloads_model_net_synthetic_dally_dfly_SOURCES = src/network-workloads/model-net-synthetic-dally-dfly.c
 src_networks_model_net_topology_test_SOURCES = src/networks/model-net/topology-test.c
 
 #bin_PROGRAMS += src/network-workload/codes-nw-test
@@ -41,7 +41,7 @@ PARAMS
 # bandwidth in GiB/s for compute node-router channels
 cn_bandwidth="16.0";
 # ROSS message size
-message_size="640";
+message_size="736";
 # number of compute nodes connected to router, dictated by dragonfly config
 # file
 num_cns_per_router="2";
@@ -2,61 +2,61 @@
 LPGROUPS
 {
    MODELNET_GRP
    {
-      repetitions="12";
+      repetitions="1040";
       # name of this lp changes according to the model
-      nw-lp="4";
+      nw-lp="8";
       # these lp names will be the same for dragonfly-custom model
-      modelnet_dragonfly_plus="4";
-      modelnet_dragonfly_plus_router="2";
+      modelnet_dragonfly_dally="8";
+      modelnet_dragonfly_dally_router="1";
    }
 }
 PARAMS
 {
    # packet size in the network
-   packet_size="1024";
-   modelnet_order=( "dragonfly_plus","dragonfly_plus_router" );
+   packet_size="4096";
+   modelnet_order=( "dragonfly_dally","dragonfly_dally_router" );
    # scheduler options
    modelnet_scheduler="fcfs";
    # chunk size in the network (when chunk size = packet size, packets will not be
    # divided into chunks)
-   chunk_size="1024";
+   chunk_size="4096";
    # modelnet_scheduler="round-robin";
-   # number of routers within each group
-   # each router row corresponds to a chassis in Cray systems
-   num_router_spine="4";
-   # each router column corresponds to a slot in a chassis
-   num_router_leaf="4";
-   # number of links connecting between group levels per router
-   num_level_chans="1";
-   # number of groups in the network
-   num_groups="3";
-   # predefined threshold (T) deciding when to reassign packet to a lower priority queue
-   queue_threshold="50";
+   num_router_rows="1";
+   # intra-group columns for routers
+   num_router_cols="16";
+   # number of groups in the network
+   num_groups="65";
    # buffer size in bytes for local virtual channels
-   local_vc_size="8192";
+   local_vc_size="16384";
    #buffer size in bytes for global virtual channels
    global_vc_size="16384";
    #buffer size in bytes for compute node virtual channels
-   cn_vc_size="8192";
+   cn_vc_size="32768";
    #bandwidth in GiB/s for local channels
-   local_bandwidth="5.25";
+   local_bandwidth="2.0";
    # bandwidth in GiB/s for global channels
-   global_bandwidth="1.5";
+   global_bandwidth="2.0";
    # bandwidth in GiB/s for compute node-router channels
-   cn_bandwidth="8.0";
+   cn_bandwidth="2.0";
+   # Number of row channels
+   num_row_chans="1";
+   # Number of column channels
+   num_col_chans="1";
    # ROSS message size
-   message_size="608";
+   message_size="656";
    # number of compute nodes connected to router, dictated by dragonfly config
    # file
-   num_cns_per_router="4";
+   num_cns_per_router="8";
    # number of global channels per router
-   num_global_connections="4";
+   num_global_channels="8";
    # network config file for intra-group connections
-   intra-group-connections="../src/network-workloads/conf/dragonfly-plus/neil-intra";
+   intra-group-connections="../src/network-workloads/conf/dragonfly-dally/dfdally_8k_intra";
    # network config file for inter-group connections
-   inter-group-connections="../src/network-workloads/conf/dragonfly-plus/neil-inter";
+   inter-group-connections="../src/network-workloads/conf/dragonfly-dally/dfdally_8k_inter";
    # routing protocol to be used
-   routing="minimal";
+   routing="prog-adaptive";
+   adaptive_threshold="131072";
+   minimal-bias="1";
+   df-dally-vc = "1";
 }
LPGROUPS
{
   MODELNET_GRP
   {
      repetitions="1040";
      # name of this lp changes according to the model
      nw-lp="8";
      # these lp names will be the same for dragonfly-custom model
      modelnet_dragonfly_dally="8";
      modelnet_dragonfly_dally_router="1";
   }
}
PARAMS
{
   # packet size in the network
   packet_size="4096";
   modelnet_order=( "dragonfly_dally","dragonfly_dally_router" );
   # scheduler options
   modelnet_scheduler="fcfs";
   # chunk size in the network (when chunk size = packet size, packets will not be
   # divided into chunks)
   chunk_size="4096";
   # modelnet_scheduler="round-robin";
   num_router_rows="1";
   # intra-group columns for routers
   num_router_cols="16";
   # number of groups in the network
   num_groups="65";
   # buffer size in bytes for local virtual channels
   local_vc_size="16384";
   #buffer size in bytes for global virtual channels
   global_vc_size="16384";
   #buffer size in bytes for compute node virtual channels
   cn_vc_size="32768";
   #bandwidth in GiB/s for local channels
   local_bandwidth="2.0";
   # bandwidth in GiB/s for global channels
   global_bandwidth="2.0";
   # bandwidth in GiB/s for compute node-router channels
   cn_bandwidth="2.0";
   # Number of row channels
   num_row_chans="1";
   # Number of column channels
   num_col_chans="1";
   # ROSS message size
   message_size="656";
   # number of compute nodes connected to router, dictated by dragonfly config
   # file
   num_cns_per_router="8";
   # number of global channels per router
   num_global_channels="8";
   # network config file for intra-group connections
   intra-group-connections="@abs_srcdir@/dfdally_8k_intra";
   # network config file for inter-group connections
   inter-group-connections="@abs_srcdir@/dfdally_8k_inter";
   # routing protocol to be used
   routing="prog-adaptive";
   adaptive_threshold="131072";
   minimal-bias="1";
   df-dally-vc = "1";
}
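The LP counts in the dragonfly-dally configuration above have to agree with the topology parameters: one router LP per repetition and `nw-lp` terminals per repetition must add up to the same totals implied by `num_groups`, `num_router_cols`, and `num_cns_per_router`. A quick consistency check (illustrative Python, not part of the repository):

```python
# Consistency check for the dfdally_8k configuration above (illustrative only).
num_groups = 65          # num_groups
routers_per_group = 16   # num_router_cols (with num_router_rows = "1")
cns_per_router = 8       # num_cns_per_router

repetitions = 1040       # MODELNET_GRP repetitions
nw_lp_per_rep = 8        # nw-lp (and modelnet_dragonfly_dally) per repetition

# One router LP per repetition must cover every router in the network.
total_routers = num_groups * routers_per_group
assert total_routers == repetitions  # modelnet_dragonfly_dally_router="1"

# Each router serves 8 compute nodes; each repetition carries 8 terminal LPs.
total_cns = num_groups * routers_per_group * cns_per_router
assert total_cns == repetitions * nw_lp_per_rep

print(total_routers, total_cns)  # 1040 8320
```

So this config models an 8320-node (≈8k) dally-style dragonfly, matching the `dfdally_8k_*` connection-file names.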
@@ -47,9 +47,9 @@ PARAMS
 # number of global channels per router
 num_global_connections="16";
 # network config file for intra-group connections
-intra-group-connections="/gpfs/u/home/SPNR/SPNRmcgl/barn/dfp-df-comparisons/experiments/network_configs/dfp_8k_intra";
+intra-group-connections="../src/network-workloads/conf/dragonfly-plus/dfp_8k_intra";
 # network config file for inter-group connections
-inter-group-connections="/gpfs/u/home/SPNR/SPNRmcgl/barn/dfp-df-comparisons/experiments/network_configs/dfp_8k_inter";
+inter-group-connections="../src/network-workloads/conf/dragonfly-plus/dfp_8k_inter";
 # routing protocol to be used - 'minimal', 'non-minimal-spine', 'non-minimal-leaf', 'prog-adaptive'
 routing="prog-adaptive";
 # route scoring protocol to be used - options are 'alpha', 'beta', or 'delta' - 'gamma' has been deprecated
LPGROUPS
{
   MODELNET_GRP
   {
      repetitions="1056";
      # name of this lp changes according to the model
      nw-lp="8";
      # these lp names will be the same for dragonfly-custom model
      modelnet_dragonfly_plus="8";
      modelnet_dragonfly_plus_router="1";
   }
}
PARAMS
{
   # packet size in the network
   packet_size="4096";
   # order of LPs, mapping for modelnet grp
   modelnet_order=( "dragonfly_plus","dragonfly_plus_router" );
   # scheduler options
   modelnet_scheduler="fcfs";
   # chunk size in the network (when chunk size = packet size, packets will not be divided into chunks)
   chunk_size="4096";
   # number of spine routers per group
   num_router_spine="16";
   # number of leaf routers per group
   num_router_leaf="16";
   # number of links connecting between group levels per router
   num_level_chans="1";
   # number of groups in the network
   num_groups="33";
   # buffer size in bytes for local virtual channels
   local_vc_size="32768";
   # buffer size in bytes for global virtual channels
   global_vc_size="32768";
   # buffer size in bytes for compute node virtual channels
   cn_vc_size="32768";
   # bandwidth in GiB/s for local channels
   local_bandwidth="25.0";
   # bandwidth in GiB/s for global channels
   global_bandwidth="25.0";
   # bandwidth in GiB/s for compute node-router channels
   cn_bandwidth="25.0";
   # ROSS message size
   message_size="640";
   # number of compute nodes connected to router, dictated by dragonfly config file
   num_cns_per_router="16";
   # number of global channels per router
   num_global_connections="16";
   # network config file for intra-group connections
   intra-group-connections="@abs_srcdir@/dfp_8k_intra";
   # network config file for inter-group connections
   inter-group-connections="@abs_srcdir@/dfp_8k_inter";
   # routing protocol to be used - 'minimal', 'non-minimal-spine', 'non-minimal-leaf', 'prog-adaptive'
   routing="prog-adaptive";
   # route scoring protocol to be used - options are 'alpha', 'beta', or 'delta' - 'gamma' has been deprecated
   route_scoring_metric="delta";
   # minimal route threshold before considering non-minimal paths
   adaptive_threshold="131072"; #1/16 of 32768
}
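The same kind of LP-count sanity check applies to the dragonfly-plus configuration above, where routers split into spine and leaf tiers and terminals attach only to leaf routers. An illustrative Python check (not part of the repository):

```python
# Consistency check for the dfp_8k configuration above (illustrative only).
num_groups = 33          # num_groups
spine_per_group = 16     # num_router_spine
leaf_per_group = 16      # num_router_leaf
cns_per_leaf = 16        # num_cns_per_router

repetitions = 1056       # MODELNET_GRP repetitions
nw_lp_per_rep = 8        # nw-lp (and modelnet_dragonfly_plus) per repetition

# One router LP per repetition must cover every spine and leaf router.
total_routers = num_groups * (spine_per_group + leaf_per_group)
assert total_routers == repetitions  # modelnet_dragonfly_plus_router="1"

# Terminals attach only to leaf routers.
total_terminals = num_groups * leaf_per_group * cns_per_leaf
assert total_terminals == repetitions * nw_lp_per_rep

print(total_routers, total_terminals)  # 1056 8448
```

This makes it an 8448-terminal (≈8k) dragonfly-plus, consistent with the `dfp_8k_*` connection-file names.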
@@ -3,7 +3,7 @@ LPGROUPS
 MODELNET_GRP
 {
    repetitions="198";
-   server="384";
+   nw-lp="384";
    modelnet_fattree="24";
    fattree_switch="6";
 }
@@ -3,7 +3,7 @@ LPGROUPS
 MODELNET_GRP
 {
    repetitions="252";
-   server="288";
+   nw-lp="288";
    modelnet_fattree="18";
    fattree_switch="6";
 }
@@ -23,6 +23,6 @@ PARAMS
 local_bandwidth="5.25";
 global_bandwidth="4.7";
 cn_bandwidth="5.25";
-message_size="608";
+message_size="656";
 routing="adaptive";
 }
@@ -23,6 +23,6 @@ PARAMS
 local_bandwidth="5.25";
 global_bandwidth="4.7";
 cn_bandwidth="5.25";
-message_size="640";
+message_size="736";
 routing="adaptive";
 }
@@ -12,7 +12,7 @@ PARAMS
 {
 ft_type="0";
 packet_size="512";
-message_size="592";
+message_size="736";
 chunk_size="512";
 modelnet_scheduler="fcfs";
 #modelnet_scheduler="round-robin";
@@ -31,6 +31,6 @@ PARAMS
 cn_bandwidth="9.0";
 router_delay="0";
 link_delay="0";
-message_size="640";
+message_size="736";
 routing="minimal";
 }
@@ -10,7 +10,7 @@ LPGROUPS
 PARAMS
 {
 packet_size="512";
-message_size="640";
+message_size="736";
 modelnet_order=( "torus" );
 # scheduler options
 modelnet_scheduler="fcfs";
@@ -3,7 +3,7 @@ LPGROUPS
 MODELNET_GRP
 {
    repetitions="264";
-   server="4";
+   nw-lp="4";
    modelnet_dragonfly="4";
    modelnet_dragonfly_router="1";
 }
@@ -3,7 +3,7 @@ LPGROUPS
 MODELNET_GRP
 {
    repetitions="198"; # repetitions = Ne = total # of edge switches. For type0 Ne = Np*Ns = ceil(N/Ns*(k/2))*(k/2) = ceil(N/(k/2)^2)*(k/2)
-   server="18";
+   nw-lp="18";
    modelnet_fattree="18";
    fattree_switch="3";
 }
@@ -3,7 +3,7 @@ LPGROUPS
 MODELNET_GRP
 {
    repetitions="32"; # repetitions = Ne = total # of edge switches. For type0 Ne = Np*Ns = ceil(N/Ns*(k/2))*(k/2) = ceil(N/(k/2)^2)*(k/2)
-   server="4";
+   nw-lp="4";
    modelnet_fattree="4";
    fattree_switch="3";
 }
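The comment in the two fat tree hunks above gives the edge-switch count as Ne = ceil(N/(k/2)^2)*(k/2). A small sketch of that formula, checked against both configs, with N taken as repetitions × nw-lp and the switch radix k inferred from k/2 terminals per edge switch (the radix values are an assumption, not stated in the configs):

```python
import math

# Edge-switch count formula from the fat tree config comments above:
#   Ne = ceil(N / (k/2)^2) * (k/2)
# where N is the number of end nodes and k the switch radix.
def edge_switches(n_nodes: int, radix: int) -> int:
    half = radix // 2
    return math.ceil(n_nodes / half**2) * half

# Checked against the two configs above, reading N = repetitions * nw-lp
# and assuming radix k = 2 * (nw-lp per edge switch).
assert edge_switches(198 * 18, 36) == 198   # 3564 nodes, radix-36 switches
assert edge_switches(32 * 4, 8) == 32       # 128 nodes, radix-8 switches
```

Both configs satisfy the formula exactly, which is what the `repetitions` values encode.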
@@ -3,7 +3,7 @@ LPGROUPS
 MODELNET_GRP
 {
    repetitions="50";
-   server="3";
+   nw-lp=