too many observers can crash ssg
If I start 32 ssg members on one node like this:
mpiexec -np 32 ./tests/ssg-launch-group -s 360 -f group.ssg sockets mpi &
and then observe that group with a small number of processes, everything works:
./ssg-observe-group sockets build/group.ssg
If I try to observe with 64 processes, though, I get errors like these:
SWIM dping ack recv error -- group 15324806640328145610 not found
SWIM dping req recv error -- group 15324806640328145610 not found
# HG -- Error -- /tmp/robl/spack-stage/spack-stage-mercury-2.0.0rc1-qeng5ccan7pe4mgpwopu4cpaw6ftfcbz/spack-src/src/mercury_core.c:3748
# HG_Core_registered_data(): Could not find RPC ID in function map
# HG -- Error -- /tmp/robl/spack-stage/spack-stage-mercury-2.0.0rc1-qeng5ccan7pe4mgpwopu4cpaw6ftfcbz/spack-src/src/mercury.c:1368
# HG_Registered_data(): Could not get registered data
On Slack, @ssnyder mentioned that the 'recv error' messages are spam, but they sure seem to indicate that something bad is about to happen.
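For anyone trying to reproduce this, here is a rough sketch of one way to start many observers at once. The report above does not say how the 64 observer processes were launched, so the approach (plain background jobs from a shell) and the helper name launch_observers are assumptions made for illustration, not part of ssg. The helper only counts nonzero exit statuses; the errors shown above may appear in the output without changing the exit status, so the logs still need to be read.

```shell
#!/bin/sh
# Hypothetical helper (not part of ssg): start N concurrent copies of a
# command, wait for all of them, and report how many exited nonzero.
launch_observers() {
    n=$1
    shift
    pids=""
    i=0
    # Launch N copies of the given command in the background.
    while [ "$i" -lt "$n" ]; do
        "$@" &
        pids="$pids $!"
        i=$((i + 1))
    done
    # Collect each exit status and count the failures.
    failed=0
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed of $n observers failed"
    [ "$failed" -eq 0 ]
}
```

With the group from the first command still running, the failing case would be invoked as: launch_observers 64 ./ssg-observe-group sockets build/group.ssg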