sporadic "no RPC callback registered" error in bake-p2p-bw

    Process 0 of 2 is on carns-x1
    Process 1 of 2 is on carns-x1
    # HG -- Error -- /tmp/pcarns/spack-stage/mercury-master-5rpkovxednyhwtxxkbp63ujnxtszyhqx/spack-src/src/mercury.c:429
    # hg_core_rpc_cb(): No RPC callback registered
    # HG -- Error -- /tmp/pcarns/spack-stage/mercury-master-5rpkovxednyhwtxxkbp63ujnxtszyhqx/spack-src/src/mercury_core.c:2724
    # hg_core_process(): Error while executing RPC callback
    # HG -- Warning -- /tmp/pcarns/spack-stage/mercury-master-5rpkovxednyhwtxxkbp63ujnxtszyhqx/spack-src/src/mercury_core_header.c:299
    # hg_core_header_response_verify(): Response return code: HG_INVALID_PARAM
    bake-p2p-bw: ../perf-regression/bake-p2p-bw.c:206: main: Assertion `ret == 0 && num_targets == 1' failed.

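The problem is likely that the benchmark relies on the MPI rank to choose the client/server role in some places and on the SSG rank in others. SSG ranks aren't necessarily ordered the same as MPI ranks; the ordering depends on the address strings being used. When the two orderings disagree, the process acting as "client" can end up sending its RPC to a process that never registered the handler, which would explain why the error only shows up sporadically.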
Possible fixes:
- use the SSG rank exclusively when choosing the client/server role
- refactor so that the "client" is just an observer, and only the "server" is an SSG member
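
To illustrate the first option, here is a minimal sketch of how the two orderings can disagree and how taking the role from a single rank source avoids it. It assumes, purely for illustration, that a process's group rank is its position in the lexicographically sorted list of member addresses. That ordering rule, the `group_rank_of` and `role_for_rank` helpers, and the example addresses in the note below are hypothetical; none of this is SSG or benchmark code.

```c
#include <assert.h>
#include <string.h>

enum role { ROLE_SERVER, ROLE_CLIENT };

/*
 * Hypothetical stand-in for a group rank lookup. ASSUMPTION: the group rank
 * is the position of this process's address in the sorted address list,
 * which (for distinct addresses) equals the number of addresses that
 * compare lower than it.
 * addrs[] is indexed by MPI rank; self is the caller's MPI rank.
 */
int group_rank_of(const char *const *addrs, int n, int self)
{
    assert(self >= 0 && self < n);

    int rank = 0;
    for (int i = 0; i < n; i++) {
        if (strcmp(addrs[i], addrs[self]) < 0)
            rank++;
    }
    return rank;
}

/* One place that maps a rank to a role: rank 0 serves, everyone else is a client. */
enum role role_for_rank(int rank)
{
    return rank == 0 ? ROLE_SERVER : ROLE_CLIENT;
}
```

With made-up addresses `{"na+sm://4242/0", "na+sm://4100/0"}` indexed by MPI rank, MPI rank 0 gets group rank 1 under this assumed ordering, so `role_for_rank(mpi_rank)` and `role_for_rank(group_rank)` give different answers for the same process. Calling `role_for_rank` with the group rank everywhere, and never with the MPI rank, removes that mismatch.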