Handle cache not thread-safe
The new caching mechanism for RPC handles is not thread-safe. If multiple execution streams (ES) call margo_create concurrently, the program segfaults. We need to fix this urgently, since the code is on the master branch and can affect users (starting with my Plasma database).
I fixed this in the fix-multithread-cache branch by adding an ABT_mutex handle_cache_mtx field to margo_instance and locking every access to the cache. This seems to solve the problem, but it may not be the best solution. In particular, in massively multithreaded programs it would probably be better to be able to disable handle caching altogether (right now I see an overhead when caching is enabled, not an improvement).
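For reviewers, here is a minimal sketch of the locking pattern described above. It is not the code from the branch: handle_t, handle_cache_t, CACHE_SIZE and the cache_* functions are hypothetical names, and a pthread mutex stands in for the ABT_mutex so that the sketch compiles without Argobots. The point is only that every read or write of the free list happens while the mutex is held.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define CACHE_SIZE 4 /* hypothetical capacity, for illustration only */

/* Hypothetical stand-in for a cached RPC handle. */
typedef struct handle {
    int id;
    struct handle *next;
} handle_t;

/* Hypothetical stand-in for the cache-related fields of margo_instance. */
typedef struct {
    handle_t slots[CACHE_SIZE];
    handle_t *free_list;  /* pre-allocated handles not currently in use */
    pthread_mutex_t mtx;  /* plays the role of handle_cache_mtx */
} handle_cache_t;

static void cache_init(handle_cache_t *c)
{
    pthread_mutex_init(&c->mtx, NULL);
    c->free_list = NULL;
    for (int i = 0; i < CACHE_SIZE; i++) {
        c->slots[i].id = i;
        c->slots[i].next = c->free_list;
        c->free_list = &c->slots[i];
    }
}

/* Pop a handle from the cache; returns NULL when the cache is empty,
 * in which case the caller would fall back to creating a fresh handle. */
static handle_t *cache_get(handle_cache_t *c)
{
    pthread_mutex_lock(&c->mtx);
    handle_t *h = c->free_list;
    if (h != NULL)
        c->free_list = h->next;
    pthread_mutex_unlock(&c->mtx);
    if (h != NULL)
        h->next = NULL;
    return h;
}

/* Return a handle to the cache. */
static void cache_put(handle_cache_t *c, handle_t *h)
{
    assert(h != NULL);
    pthread_mutex_lock(&c->mtx);
    h->next = c->free_list;
    c->free_list = h;
    pthread_mutex_unlock(&c->mtx);
}

static void cache_finalize(handle_cache_t *c)
{
    pthread_mutex_destroy(&c->mtx);
}
```

A single lock like this serializes all callers, which is consistent with the overhead mentioned above when many ES create handles at the same time.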
- Check and merge my fix;
- Maybe add a margo_disable_handle_caching(mid, HG_TRUE/HG_FALSE) function so that caching can be turned on or off at run time, depending on whether we are in a section with many concurrent margo_create calls? This is tricky, though: with a margo_create followed by margo_disable_handle_caching followed by margo_destroy, either margo_destroy would still have to look into the cache, or the cache would keep an entry that should have been removed. Maybe the simpler solution would be a --disable-handle-caching option in configure?
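To make the create / disable / destroy ordering problem concrete, here is one possible way around it, as a sketch only: each handle records at creation time whether it came from the cache, and destroy looks at that flag instead of the current toggle. All names (instance_t, handle_t, set_handle_caching, handle_create, handle_destroy) are hypothetical and are not Margo functions. The counters stand in for the real cache bookkeeping, and the sketch ignores thread safety: a real toggle would need to be read atomically or under the cache lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical handle: remembers where it was allocated from. */
typedef struct {
    bool from_cache; /* recorded once, at creation time */
} handle_t;

/* Hypothetical stand-in for the relevant fields of margo_instance. */
typedef struct {
    bool caching_enabled; /* run-time toggle */
    int cached_in_use;    /* handles currently lent out by the cache */
    int uncached_in_use;  /* handles created while caching was off */
} instance_t;

static void set_handle_caching(instance_t *inst, bool enabled)
{
    inst->caching_enabled = enabled;
}

static handle_t *handle_create(instance_t *inst)
{
    handle_t *h = malloc(sizeof *h);
    if (h == NULL)
        return NULL;
    /* The decision is taken here and stored in the handle itself. */
    h->from_cache = inst->caching_enabled;
    if (h->from_cache)
        inst->cached_in_use++;
    else
        inst->uncached_in_use++;
    return h;
}

static void handle_destroy(instance_t *inst, handle_t *h)
{
    /* Use the flag stored in the handle, not the current toggle, so a
     * toggle change between create and destroy cannot leave a stale
     * cache entry behind or force a cache lookup. */
    if (h->from_cache)
        inst->cached_in_use--;
    else
        inst->uncached_in_use--;
    free(h);
}
```

The cost is one extra field per handle. A configure-time --disable-handle-caching option avoids the question entirely, at the price of not being switchable per section of the program.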