SSG segfault in simple example
I have the following code: proc1 creates a group, stores the group id in a file, sleeps (using margo_thread_sleep) for 10 seconds, then destroys the group and terminates. proc2 loads the stored id from the file, joins the group, sleeps, then leaves the group. The code is available here: https://xgitlab.cels.anl.gov/sds/mochi-doc/tree/8cbbf08428e5c15395e490ddd10ea76b6c39f27a/code/ssg/06_join_leave
If proc2 leaves before the 10 seconds in proc1 have passed, I see the correct output in proc1 (I see the new member joining and leaving), but proc2 prints this message:
SWIM dping ack recv error -- group 13614397414369239985 not found
which I don't think is normal.
If proc1 leaves before proc2 (i.e. if I set proc2 to wait more than 10 seconds), proc2 crashes with a segfault when proc1 leaves. This is the gdb backtrace of proc2:
(gdb) bt
#0  HG_Cancel (handle=0x37f00001fa0) at /tmp/mdorier/spack-stage/spack-stage-mercury-2.0.0a1-srnpfevisrpna2yleheliup5geik6dyv/spack-src/src/mercury.c:2205
#1  0x00007f189ead4aab in ABTD_thread_func_wrapper_thread () from /projects/spack/var/spack/environments/hepnos/.spack-env/view/lib/libabt.so.0
#2  0x00007f189ead5121 in make_fcontext () from /projects/spack/var/spack/environments/hepnos/.spack-env/view/lib/libabt.so.0
#3  0x0000000000000000 in ?? ()