`margo_forward_timed` and long-tail latency
Description
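In ssg, most of the RPCs use the `margo_forward_timed` routine with a timeout of 2 seconds (2000 msecs). In some situations, 2 seconds is not long enough. However, in those same situations plain `margo_forward` completes about ten times faster: the slowest observe takes roughly 0.2 seconds instead of running into the 2-second timeout.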
Scenario
- OLCF Summit
- 8 nodes for SSG provider, one process per node
- 32 nodes for SSG "observers" (clients), 32 processes per node
- The 'ssg-bench' tests: the launch/observe tests from ssg, with some extra timing information added (https://xgitlab.cels.anl.gov/sds/ssg-bench)
My modified 'ssg-bench' does the following:
- each MPI process records how long it takes to initialize the software stack (`MPI_Init()`, `margo_init()`, and `ssg_init()`); how long it takes to load the ssg serialized group state file; and how long it takes to observe the ssg group
- rank 0 collects all the timings
- rank 0 reports a five-bin histogram of observe times
- rank 0 reports the average, minimum, and maximum times for the initialize, load, and observe steps
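For readers who want to reproduce the bin edges in the output, here is a minimal sketch of an equal-width five-bin histogram over the collected times. It is not the actual ssg-bench code: the names `histogram5` and `NBINS` are made up for illustration, and the equal-width binning between the observed minimum and maximum is an assumption inferred from the bin boundaries that the benchmark prints.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NBINS 5 /* hypothetical name; the report uses five bins */

/* Count times[0..n-1] into NBINS equal-width bins spanning [min, max].
 * Stores the minimum in *lo_out and returns the bin width.
 * Illustrative only; the real ssg-bench binning may differ. */
double histogram5(const double *times, size_t n,
                  double *lo_out, size_t counts[NBINS])
{
    assert(times != NULL && lo_out != NULL && n > 0);

    double lo = times[0], hi = times[0];
    for (size_t i = 1; i < n; i++) {
        if (times[i] < lo) lo = times[i];
        if (times[i] > hi) hi = times[i];
    }

    double width = (hi - lo) / NBINS;
    for (size_t b = 0; b < NBINS; b++)
        counts[b] = 0;

    for (size_t i = 0; i < n; i++) {
        /* If all samples are equal, width is 0 and everything goes to bin 0. */
        size_t b = (width > 0.0) ? (size_t)((times[i] - lo) / width) : 0;
        if (b >= NBINS)
            b = NBINS - 1; /* the maximum belongs to the last bin */
        counts[b]++;
    }

    *lo_out = lo;
    return width;
}

/* Print one "low-high : count" line per bin, as in the report. */
void print_histogram5(double lo, double width, const size_t counts[NBINS])
{
    for (size_t b = 0; b < NBINS; b++)
        printf("%f-%f : %zu\n", lo + b * width, lo + (b + 1) * width,
               counts[b]);
}
```

On rank 0 this would be called once on the gathered observe times, for example after an `MPI_Gather` of one `double` per process, followed by `print_histogram5`.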
With `margo_forward_timed`, the 1024 clients show the following distribution of observe times (in seconds):
0.006624-0.420983 : 1020
0.420983-0.835342 : 0
0.835342-1.249700 : 0
1.249700-1.664059 : 0
1.664059-2.078418 : 4
1024 : init average (min max): 6.495028 ( 4.839191 7.676488 )
1024 : load average (min max): 0.004141 ( 0.000073 0.030747 )
1024 : observe average (min max): 0.123265 ( 0.006624 2.078418 )
But with `margo_forward`, the same experiment completes much more quickly:
0.007187-0.047266 : 270
0.047266-0.087346 : 121
0.087346-0.127425 : 248
0.127425-0.167504 : 276
0.167504-0.207583 : 109
1024 : init average (min max): 6.133601 ( 4.772916 7.190876 )
1024 : load average (min max): 0.003372 ( 0.000062 0.021013 )
1024 : observe average (min max): 0.099808 ( 0.007187 0.207583 )