Commit 7606e0ef authored by Caitlin Ross

making changes based on some recent instrumentation API updates

parent 075727c7
## README for using ROSS instrumentation with CODES
For details about the ROSS instrumentation, see the [ROSS Instrumentation blog post](http://carothersc.github.io/ROSS/feature/instrumentation.html)
For details about the ROSS instrumentation, see the [ROSS Instrumentation blog post](http://carothersc.github.io/ROSS/instrumentation/instrumentation.html)
on the ROSS webpage.
There are currently 3 types of instrumentation: GVT-based, real time, and event tracing. See the ROSS documentation for more info on
the specific options or use `--help` with your model. To collect data about the simulation engine, no changes are needed to model code
for any of the instrumentation modes. Some additions to the model code is needed in order to turn on any model-level data collection.
See the "Model-level data sampling" section on [ROSS Instrumentation blog post](http://carothersc.github.io/ROSS/feature/instrumentation.html).
There are currently 4 types of instrumentation: GVT-based, real time sampling, virtual time sampling, and event tracing.
See the ROSS documentation for more info on the specific options or use `--help` with your model.
To collect data about the simulation engine, no changes are needed to model code for any of the instrumentation modes.
Some additions to the model code are needed in order to turn on any model-level data collection.
See the "Model-level data sampling" section of the [ROSS Instrumentation blog post](http://carothersc.github.io/ROSS/instrumentation/instrumentation.html).
Here we describe CODES-specific details.
### Register Instrumentation Callback Functions
......@@ -17,15 +18,11 @@ The examples here are based on the dragonfly router and terminal LPs (`src/netwo
As described in the ROSS Vis documentation, we need to create a `st_model_types` struct with the pointer and size information.
```C
st_model_types dragonfly_model_types[] = {
{(rbev_trace_f) dragonfly_event_collect,
sizeof(int),
(ev_trace_f) dragonfly_event_collect,
{(ev_trace_f) dragonfly_event_collect,
sizeof(int),
(model_stat_f) dragonfly_model_stat_collect,
sizeof(tw_lpid) + sizeof(long) * 2 + sizeof(double) + sizeof(tw_stime) * 2},
{(rbev_trace_f) dragonfly_event_collect,
sizeof(int),
(ev_trace_f) dragonfly_event_collect,
{(ev_trace_f) dragonfly_event_collect,
sizeof(int),
(model_stat_f) dfly_router_model_stat_collect,
0}, // updated in router_setup()
......@@ -33,20 +30,17 @@ st_model_types dragonfly_model_types[] = {
}
```
`dragonfly_model_types[0]` holds the function pointers for the terminal LP and `dragonfly_model_types[1]` holds those for the router LP.
For the first two function pointers for each LP, we use the same `dragonfly_event_collec()` because right now we just collect the event type, so
it's the same for both of these LPs. You can change these if you want to use different functions for different LP types or if you want a different
function for the full event tracing than that used for the rollback event trace (`rbev_trace_f` is for the event tracing of rollback triggering events only,
while `ev_trace_f` is for the full event tracing).
The number following each function pointer is the size of the data that will be saved when the function is called.
The third pointer is for the data to be sampled at the GVT or real time sampling points.
For the first function pointer for each LP type, we use the same `dragonfly_event_collect()` because right now we just collect the event type, so it's the same for both of these LP types.
You can change these if you want to use different functions for different LP types.
The number following that function pointer is the size of the data that will be saved when the function is called.
The second pointer is for the data to be sampled at the GVT or real time sampling points.
In this case the LPs have different function pointers since we want to collect different types of data for the two LP types.
For the terminal, I set the appropriate size of the data to be collected, but for the router, the size of the data is dependent on the radix for the
dragonfly configuration being used, which isn't known until runtime.
For the terminal, I set the appropriate size of the data to be collected, but for the router, the size of the data is dependent on the radix for the dragonfly configuration being used, which isn't known until runtime.
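As a rough sketch of the "size set at runtime" idea, the snippet below leaves the router entry's size at 0 in the static table and fills it in from a setup function once the radix is known. The struct, the `mstat_sz` field name, the function names, and the per-port size formula are all stand-ins chosen for illustration; they are not taken from the ROSS headers or from `router_setup()`.

```C
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the size field of an st_model_types entry. */
typedef struct {
    size_t mstat_sz;   /* bytes written by the model-stat callback */
} example_model_type;

example_model_type example_types[2] = {
    { sizeof(unsigned long) + sizeof(long) * 2 },  /* terminal: known at compile time */
    { 0 },                                         /* router: filled in at setup */
};

/* Illustrative formula only: one ID plus one double and one long per port. */
size_t example_router_stat_size(int radix)
{
    assert(radix > 0);
    return sizeof(unsigned long) + (sizeof(double) + sizeof(long)) * (size_t) radix;
}

/* Called once the configuration (and therefore the radix) has been read. */
void example_router_setup(int radix)
{
    example_types[1].mstat_sz = example_router_stat_size(radix);
}
```

The point is only that the size must be updated before any data is collected, and that it must match the number of bytes the collection function actually writes.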
*Note*: You can only reuse the function for event tracing for LPs that use the same type of message struct.
For example, the dragonfly terminal and router LPs both use the `terminal_message` struct, so they can
use the same functions for event tracing. However the model net base LP uses the `model_net_wrap_msg` struct, so it gets its own event collection function and
`st_trace_type` struct, in order to read the event type correctly from the model.
use the same functions for event tracing.
However, the model net base LP uses the `model_net_wrap_msg` struct, so it gets its own event collection function and `st_trace_type` struct in order to read the event type correctly from the model.
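To make the message-struct dependence concrete, here is a minimal sketch of an event collection callback that follows the parameter list used by the `*_event_collect` functions in this commit. The `example_message` struct and its `event_type` field are hypothetical, and `tw_lp` is declared as an opaque stand-in so the snippet compiles without the ROSS headers.

```C
#include <assert.h>
#include <string.h>

/* Stand-ins: in a real model these come from the ROSS and CODES headers. */
typedef struct tw_lp tw_lp;      /* opaque here */
typedef struct {
    int event_type;              /* hypothetical field name */
} example_message;

/* Copies the event type into the trace buffer. The callback should write
 * exactly the number of bytes registered next to it in the st_model_types
 * entry (sizeof(int) in the entries above). */
void example_event_collect(example_message *m, tw_lp *lp, char *buffer, int *collect_flag)
{
    (void) lp;            /* unused in this sketch */
    (void) collect_flag;  /* left untouched in this sketch */
    assert(m != NULL && buffer != NULL);

    int type = m->event_type;
    memcpy(buffer, &type, sizeof(type));
}
```

Because the function reads a field from the message, it is only valid for LP types whose events use that message struct, which is why LPs with a different message type need their own function.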
In the ROSS instrumentation documentation, there are two methods provided for letting ROSS know about these `st_model_types` structs.
In CODES, this step is a little different, as `codes_mapping_setup()` calls `tw_lp_settype()`.
......@@ -106,9 +100,7 @@ Using the synthetic workload LP for dragonfly as an example (`src/network-worklo
In the main function, you call the register function *before* calling `codes_mapping_setup()`.
```C
st_model_types svr_model_types[] = {
{(rbev_trace_f) svr_event_collect,
sizeof(int),
(ev_trace_f) svr_event_collect,
{(ev_trace_f) svr_event_collect,
sizeof(int),
(model_stat_f) svr_model_stat_collect,
0}, // at the moment, we're not actually collecting any data about this LP
......@@ -143,10 +135,15 @@ modes are collecting model-level data as well.
### CODES LPs that currently have event type collection implemented:
If you're using any of the following CODES models, you don't have to add anything, unless you want to change the data that's being collected.
- nw-lp (model-net-mpi-replay.c)
- original dragonfly router and terminal LPs (dragonfly.c)
- dfly server LP (model-net-synthetic.c)
- model-net-base-lp (model-net-lp.c)
- custom dfly server LP (model-net-synthetic-custom-dfly.c)
- fat tree server LP (model-net-synthetic-fattree.c)
- slimfly server LP (model-net-synthetic-slimfly.c)
- original dragonfly router and terminal LPs (dragonfly.c)
- dragonfly custom router and terminal LPs (dragonfly-custom.C)
- slimfly router and terminal LPs (slimfly.c)
- fat tree switch and terminal LPs (fat-tree.c)
- model-net-base-lp (model-net-lp.c)
The fat-tree terminal and switch LPs (fattree.c) are only partially implemented at the moment. They need two `model_net_method` structs to be fully implemented,
but currently both terminal and switch LPs use the same `fattree_method` struct.
......@@ -2554,8 +2554,6 @@ static void nw_add_lp_type()
}
/* setup for the ROSS event tracing
* can have a different function for rbev_trace_f and ev_trace_f
* but right now it is set to the same function for both
*/
void nw_lp_event_collect(nw_message *m, tw_lp *lp, char *buffer, int *collect_flag)
{
......@@ -2580,16 +2578,14 @@ void nw_lp_model_stat_collect(nw_state *s, tw_lp *lp, char *buffer)
}
st_model_types nw_lp_model_types[] = {
{(rbev_trace_f) nw_lp_event_collect,
sizeof(int),
(ev_trace_f) nw_lp_event_collect,
{(ev_trace_f) nw_lp_event_collect,
sizeof(int),
(model_stat_f) nw_lp_model_stat_collect,
0,
NULL,
NULL,
0},
{NULL, 0, NULL, 0, NULL, 0, NULL, NULL, 0}
{NULL, 0, NULL, 0, NULL, NULL, 0}
};
static const st_model_types *nw_lp_get_model_stat_types(void)
......
......@@ -108,8 +108,6 @@ tw_lptype svr_lp = {
};
/* setup for the ROSS event tracing
* can have a different function for rbev_trace_f and ev_trace_f
* but right now it is set to the same function for both
*/
void custom_svr_event_collect(svr_msg *m, tw_lp *lp, char *buffer, int *collect_flag)
{
......@@ -132,16 +130,14 @@ void custom_svr_model_stat_collect(svr_state *s, tw_lp *lp, char *buffer)
}
st_model_types custom_svr_model_types[] = {
{(rbev_trace_f) custom_svr_event_collect,
sizeof(int),
(ev_trace_f) custom_svr_event_collect,
{(ev_trace_f) custom_svr_event_collect,
sizeof(int),
(model_stat_f) custom_svr_model_stat_collect,
0,
NULL,
NULL,
0},
{NULL, 0, NULL, 0, NULL, 0, NULL, NULL, 0}
{NULL, 0, NULL, 0, NULL, NULL, 0}
};
static const st_model_types *custom_svr_get_model_stat_types(void)
......
......@@ -123,8 +123,6 @@ tw_lptype svr_lp = {
};
/* setup for the ROSS event tracing
* can have a different function for rbev_trace_f and ev_trace_f
* but right now it is set to the same function for both
*/
void ft_svr_event_collect(svr_msg *m, tw_lp *lp, char *buffer, int *collect_flag)
{
......@@ -150,16 +148,14 @@ void ft_svr_model_stat_collect(svr_state *s, tw_lp *lp, char *buffer)
}
st_model_types ft_svr_model_types[] = {
{(rbev_trace_f) ft_svr_event_collect,
sizeof(int),
(ev_trace_f) ft_svr_event_collect,
{(ev_trace_f) ft_svr_event_collect,
sizeof(int),
(model_stat_f) ft_svr_model_stat_collect,
0,
NULL,
NULL,
0},
{NULL, 0, NULL, 0, NULL, 0, NULL, NULL, 0}
NULL,
NULL,
0},
{NULL, 0, NULL, 0, NULL, NULL, 0}
};
static const st_model_types *ft_svr_get_model_stat_types(void)
......
......@@ -121,8 +121,6 @@ tw_lptype svr_lp = {
};
/* setup for the ROSS event tracing
* can have a different function for rbev_trace_f and ev_trace_f
* but right now it is set to the same function for both
*/
void svr_event_collect(svr_msg *m, tw_lp *lp, char *buffer, int *collect_flag)
{
......@@ -145,16 +143,14 @@ void svr_model_stat_collect(svr_state *s, tw_lp *lp, char *buffer)
}
st_model_types svr_model_types[] = {
{(rbev_trace_f) svr_event_collect,
sizeof(int),
(ev_trace_f) svr_event_collect,
{(ev_trace_f) svr_event_collect,
sizeof(int),
(model_stat_f) svr_model_stat_collect,
0,
NULL,
NULL,
0},
{NULL, 0, NULL, 0, NULL, 0, NULL, NULL, 0}
{NULL, 0, NULL, 0, NULL, NULL, 0}
};
static const st_model_types *svr_get_model_stat_types(void)
......
......@@ -108,8 +108,6 @@ tw_lptype svr_lp = {
};
/* setup for the ROSS event tracing
* can have a different function for rbev_trace_f and ev_trace_f
* but right now it is set to the same function for both
*/
void svr_event_collect(svr_msg *m, tw_lp *lp, char *buffer, int *collect_flag)
{
......@@ -132,16 +130,14 @@ void svr_model_stat_collect(svr_state *s, tw_lp *lp, char *buffer)
}
st_model_types svr_model_types[] = {
{(rbev_trace_f) svr_event_collect,
sizeof(int),
(ev_trace_f) svr_event_collect,
{(ev_trace_f) svr_event_collect,
sizeof(int),
(model_stat_f) svr_model_stat_collect,
0,
NULL,
NULL,
0},
{NULL, 0, NULL, 0, NULL, 0, NULL, NULL, 0}
{NULL, 0, NULL, 0, NULL, NULL, 0}
};
static const st_model_types *svr_get_model_stat_types(void)
......
......@@ -382,25 +382,21 @@ static void ross_custom_dragonfly_sample_fn(terminal_state * s, tw_bf * bf, tw_l
static void ross_custom_dragonfly_sample_rc_fn(terminal_state * s, tw_bf * bf, tw_lp * lp, struct dfly_cn_sample *sample);
st_model_types custom_dragonfly_model_types[] = {
{(rbev_trace_f) custom_dragonfly_event_collect,
sizeof(int),
(ev_trace_f) custom_dragonfly_event_collect,
{(ev_trace_f) custom_dragonfly_event_collect,
sizeof(int),
(model_stat_f) custom_dragonfly_model_stat_collect,
sizeof(tw_lpid) + sizeof(long) * 2 + sizeof(double) + sizeof(tw_stime) *2,
(sample_event_f) ross_custom_dragonfly_sample_fn,
(sample_revent_f) ross_custom_dragonfly_sample_rc_fn,
sizeof(struct dfly_cn_sample) } ,
{(rbev_trace_f) custom_dragonfly_event_collect,
sizeof(int),
(ev_trace_f) custom_dragonfly_event_collect,
{(ev_trace_f) custom_dragonfly_event_collect,
sizeof(int),
(model_stat_f) custom_dfly_router_model_stat_collect,
0, //updated in router_custom_setup() since it's based on the radix
(sample_event_f) ross_custom_dragonfly_rsample_fn,
(sample_revent_f) ross_custom_dragonfly_rsample_rc_fn,
0 } , //updated in router_custom_setup() since it's based on the radix
{NULL, 0, NULL, 0, NULL, 0, NULL, NULL, 0}
{NULL, 0, NULL, 0, NULL, NULL, 0}
};
/* End of ROSS model stats collection */
......
......@@ -381,25 +381,21 @@ static void ross_dragonfly_sample_fn(terminal_state * s, tw_bf * bf, tw_lp * lp,
static void ross_dragonfly_sample_rc_fn(terminal_state * s, tw_bf * bf, tw_lp * lp, struct dfly_cn_sample *sample);
st_model_types dragonfly_model_types[] = {
{(rbev_trace_f) dragonfly_event_collect,
sizeof(int),
(ev_trace_f) dragonfly_event_collect,
{(ev_trace_f) dragonfly_event_collect,
sizeof(int),
(model_stat_f) dragonfly_model_stat_collect,
sizeof(tw_lpid) + sizeof(long) * 2 + sizeof(double) + sizeof(tw_stime) *2,
(sample_event_f) ross_dragonfly_sample_fn,
(sample_revent_f) ross_dragonfly_sample_rc_fn,
sizeof(struct dfly_cn_sample) } ,
{(rbev_trace_f) dragonfly_event_collect,
sizeof(int),
(ev_trace_f) dragonfly_event_collect,
{(ev_trace_f) dragonfly_event_collect,
sizeof(int),
(model_stat_f) dfly_router_model_stat_collect,
0, //updated in router_setup() since it's based on the radix
(sample_event_f) ross_dragonfly_rsample_fn,
(sample_revent_f) ross_dragonfly_rsample_rc_fn,
0 } , //updated in router_setup() since it's based on the radix
{NULL, 0, NULL, 0, NULL, 0, NULL, NULL, 0}
{NULL, 0, NULL, 0, NULL, NULL, 0}
};
/* End of ROSS model stats collection */
......
......@@ -319,25 +319,21 @@ static void ross_fattree_ssample_fn(switch_state * s, tw_bf * bf, tw_lp * lp, st
static void ross_fattree_ssample_rc_fn(switch_state * s, tw_bf * bf, tw_lp * lp, struct fattree_switch_sample *sample);
st_model_types fattree_model_types[] = {
{(rbev_trace_f) fattree_event_collect,
sizeof(int),
(ev_trace_f) fattree_event_collect,
{(ev_trace_f) fattree_event_collect,
sizeof(int),
(model_stat_f) fattree_model_stat_collect,
0, // update when changing fattree_model_stat_collect
(sample_event_f) ross_fattree_sample_fn,
(sample_revent_f) ross_fattree_sample_rc_fn,
sizeof(struct fattree_cn_sample) } ,
{(rbev_trace_f) fattree_event_collect,
sizeof(int),
(ev_trace_f) fattree_event_collect,
{(ev_trace_f) fattree_event_collect,
sizeof(int),
(model_stat_f) fattree_model_stat_collect,
0, // update when changing fattree_model_stat_collect
(sample_event_f) ross_fattree_ssample_fn,
(sample_revent_f) ross_fattree_ssample_rc_fn,
0 } , // updated in switch_init()
{NULL, 0, NULL, 0, NULL, 0, NULL, NULL, 0}
{NULL, 0, NULL, 0, NULL, NULL, 0}
};
/* End of ROSS model stats collection */
......
......@@ -122,8 +122,6 @@ tw_lptype model_net_base_lp = {
};
/* setup for the ROSS event tracing
* can have a different function for rbev_trace_f and ev_trace_f
* but right now it is set to the same function for both
*/
void mn_event_collect(model_net_wrap_msg *m, tw_lp *lp, char *buffer, int *collect_flag)
{
......@@ -148,10 +146,7 @@ void mn_event_collect(model_net_wrap_msg *m, tw_lp *lp, char *buffer, int *colle
sub_msg = ((char*)m)+msg_offsets[((model_net_base_state*)lp->cur_state)->net_id];
if (((model_net_base_state*)lp->cur_state)->sub_model_type)
{
if (g_st_ev_trace == RB_TRACE || g_st_ev_trace == COMMIT_TRACE)
(((model_net_base_state*)lp->cur_state)->sub_model_type->rbev_trace)(sub_msg, lp, buffer, collect_flag);
else if (g_st_ev_trace == FULL_TRACE)
(((model_net_base_state*)lp->cur_state)->sub_model_type->ev_trace)(sub_msg, lp, buffer, collect_flag);
(((model_net_base_state*)lp->cur_state)->sub_model_type->ev_trace)(sub_msg, lp, buffer, collect_flag);
}
break;
default: // this shouldn't happen, but can help detect an issue
......@@ -183,8 +178,6 @@ void mn_sample_rc_event(model_net_base_state *s, tw_bf * bf, tw_lp * lp, void *s
st_model_types mn_model_types[MAX_NETS];
st_model_types mn_model_base_type = {
(rbev_trace_f) mn_event_collect,
sizeof(int),
(ev_trace_f) mn_event_collect,
sizeof(int),
(model_stat_f) mn_model_stat_collect,
......
......@@ -307,25 +307,21 @@ static void ross_slimfly_rsample_fn(router_state * s, tw_bf * bf, tw_lp * lp, st
static void ross_slimfly_rsample_rc_fn(router_state * s, tw_bf * bf, tw_lp * lp, struct slimfly_router_sample *sample);
st_model_types slimfly_model_types[] = {
{(rbev_trace_f) slimfly_event_collect,
sizeof(int),
(ev_trace_f) slimfly_event_collect,
{(ev_trace_f) slimfly_event_collect,
sizeof(int),
(model_stat_f) slimfly_model_stat_collect,
0, // update this when changing slimfly_model_stat_collect
(sample_event_f) ross_slimfly_sample_fn,
(sample_revent_f) ross_slimfly_sample_rc_fn,
sizeof(struct slimfly_cn_sample) } ,
{(rbev_trace_f) slimfly_event_collect,
sizeof(int),
(ev_trace_f) slimfly_event_collect,
{(ev_trace_f) slimfly_event_collect,
sizeof(int),
(model_stat_f) slimfly_model_stat_collect,
0, // update this when changing slimfly_model_stat_collect
(sample_event_f) ross_slimfly_rsample_fn,
(sample_revent_f) ross_slimfly_rsample_rc_fn,
0 } , //updated in slim_router_setup() since it's based on the radix
{NULL, 0, NULL, 0, NULL, 0, NULL, NULL, 0}
{NULL, 0, NULL, 0, NULL, NULL, 0}
};
/* End of ROSS model stats collection */
......