ofi+psm2 protocol low RPC throughput
I am currently investigating significantly reduced RPC throughput on the latest master branch when the `ofi+psm2` protocol is used between two nodes. Other protocols, e.g., `ofi+tcp`, are unaffected. Interestingly, the issue also appears when `auto_sm` is used on one node with `ofi+psm2`, but it does not appear when `auto_sm` is used with `ofi+sockets`, for example. Bisecting across Margo commits, the issue first appears in one of the three commits in which the initialization functions were refactored: 0f115844, 143099c4, 313a5c92.
To get a grasp of the performance difference, I quickly ran a few tests with a small MPI program that somewhat resembles GekkoFS' RPC layer, evaluating raw RPC throughput and `bulk_buffer` transfer throughput between two nodes.
We are using two nodes with a 100 Gbit Omni-Path network, running either native `ofi+psm2` or TCP over Omni-Path with `ofi+sockets`. Mercury version: v2.0.0; Argobots version: v1.0.1; Margo versions from before and after the aforementioned commits, called BEFORE and AFTER below. The Margo server uses 8
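A minimal sketch of the server setup, assuming the 8 refers to dedicated RPC handler execution streams (that reading, plus the use of a dedicated progress thread, are assumptions on my part):

```c
#include <margo.h>

int main(void)
{
    /* Server mode, one dedicated progress thread, and (assumed) 8 execution
     * streams running the RPC handlers. */
    margo_instance_id mid = margo_init("ofi+psm2", MARGO_SERVER_MODE, 1, 8);
    if (mid == MARGO_INSTANCE_NULL)
        return -1;

    /* ... MARGO_REGISTER() calls for the RPCs go here ... */

    margo_wait_for_finalize(mid);
    return 0;
}
```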
Raw RPC throughput
Sending 1.6 million RPCs over 16 processes.
| protocol | RPCs/sec (BEFORE) | RPCs/sec (AFTER) |
| --- | --- | --- |
|  | ~90K | ~70 (lower RPC volume) |
|  | 100K | ~100 (lower RPC volume) |
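For reference, the rate measurement boils down to the loop below (a simplified sketch; the no-op RPC, its registration, and the per-process share of the 1.6 million RPCs are placeholders rather than our exact benchmark code):

```c
#include <mpi.h>
#include <margo.h>

/* Each MPI process forwards its share of the RPCs; the per-process
 * rates are summed into an aggregate RPCs/sec figure on rank 0. */
static double run_rpc_bench(margo_instance_id mid, hg_addr_t server,
                            hg_id_t rpc_id, long n_rpcs)
{
    double start = MPI_Wtime();
    for (long i = 0; i < n_rpcs; i++) {
        hg_handle_t h;
        margo_create(mid, server, rpc_id, &h);
        margo_forward(h, NULL); /* no-op RPC: no input, wait for response */
        margo_destroy(h);
    }
    double local_rate = (double)n_rpcs / (MPI_Wtime() - start);

    double total_rate = 0.0;
    MPI_Reduce(&local_rate, &total_rate, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);
    return total_rate; /* valid on rank 0 */
}
```

Here `rpc_id` would come from a client-side `MARGO_REGISTER(mid, "noop_rpc", void, void, NULL)`.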
RPC bulk_buffer transfer throughput
Moving ~100 GiB over 16 processes, with a 64 MiB bulk buffer for each RPC; a sketch of the server-side transfer path follows the table.
| protocol | MiB/sec (BEFORE) | MiB/sec (AFTER) |
| --- | --- | --- |
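Per RPC, the server-side transfer looks roughly like this (a sketch under assumptions: `transfer_in_t`, `transfer_rpc`, and the pull direction into a server-side buffer are illustrative, not GekkoFS' actual code):

```c
#include <stdlib.h>
#include <margo.h>

/* Hypothetical input type: the client ships a bulk handle to its 64 MiB buffer. */
MERCURY_GEN_PROC(transfer_in_t, ((hg_bulk_t)(bulk)))

static void transfer_rpc(hg_handle_t h)
{
    margo_instance_id mid = margo_hg_handle_get_instance(h);
    const struct hg_info *info = margo_get_info(h);

    transfer_in_t in;
    margo_get_input(h, &in);

    /* Expose a local 64 MiB buffer and pull the client's data into it. */
    hg_size_t size = 64 * 1024 * 1024;
    void *buf = malloc(size);
    hg_bulk_t local;
    margo_bulk_create(mid, 1, &buf, &size, HG_BULK_WRITE_ONLY, &local);
    margo_bulk_transfer(mid, HG_BULK_PULL, info->addr, in.bulk, 0,
                        local, 0, size);

    margo_respond(h, NULL); /* void response */
    margo_bulk_free(local);
    free(buf);
    margo_free_input(h, &in);
    margo_destroy(h);
}
DEFINE_MARGO_RPC_HANDLER(transfer_rpc)
```

On the server this would be registered with `MARGO_REGISTER(mid, "transfer_rpc", transfer_in_t, void, transfer_rpc)`.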
My guess is that the reduced data throughput is simply a side effect of the overall reduced RPC throughput.