WIP: On C++ Instrumentation / branch: new-cpp-mod-experimental
Just a few notes on how to instrument C++ libraries with Darshan. Architectural changes to Darshan are not required, but a number of macros guarding against C/C++ mismatches are needed, and a number of C++ runtime routines have to be linked into the final shared object. Finally, wrapper generation is slightly less convenient because of C++ name mangling when relying on standard mechanisms.
Preparing the Instrumentation Wrapper
Avoid polluting the shared object with unnecessary symbols (e.g. from OMPI_..)
To keep the C++ MPI routines out of the intermediate and final shared objects, set the following define before including mpi.h: #define OMPI_SKIP_MPICXX
Forward Declaration/Map and Fail Helpers
Slightly diverging from the existing FORWARD_DECL/MAP_OR_FAIL macro complex, function-local C++ static variables are used to store the pointers to the original functions after lookup, making the forward declaration unnecessary.
For consistency across modules, it might still be preferable to maintain the practice and provide C++ equivalent macros.
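As a rough illustration of the static-variable approach, the following sketch resolves the original function once and caches the pointer in a function-local static. The helper lookup_or_fail and the wrapper call_real_atoi are hypothetical names chosen for this note and are not part of Darshan; atoi merely stands in for a function to be wrapped.

```cpp
#include <dlfcn.h>   // dlsym, RTLD_NEXT (g++ defines _GNU_SOURCE by default on Linux)
#include <cstdio>
#include <cstdlib>

// Hypothetical helper (not part of Darshan): resolve the next definition of
// `name` in library search order, or abort if the lookup fails.
template <typename Fn>
Fn lookup_or_fail(const char* name) {
    void* sym = dlsym(RTLD_NEXT, name);
    if (sym == nullptr) {
        std::fprintf(stderr, "failed to map symbol: %s\n", name);
        std::abort();
    }
    return reinterpret_cast<Fn>(sym);
}

// Wrapper-style function. The function-local static is initialized on the
// first call only (thread-safe since C++11), so no forward declaration of a
// global function pointer is needed.
int call_real_atoi(const char* s) {
    using atoi_fn = int (*)(const char*);
    static const atoi_fn real_atoi = lookup_or_fail<atoi_fn>("atoi");
    return real_atoi(s);
}
```

In a real wrapper, the function would carry the name and signature of the intercepted symbol and add the instrumentation around the call to the cached pointer. On glibc versions older than 2.34, linking may additionally require -ldl.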
Intercepting C++ Mangled Names
Instrumenting C++ functions is slightly less convenient than instrumenting C functions, because C++ has to mangle function names to provide features such as function overloading and namespaces. Unfortunately, there is no standard mechanism to generate, at compile time, the exact mangled name as it will show up in the shared object.
For LD_PRELOAD to work, it is sufficient to compile a regular C++ source file into a shared library whose functions match the signatures and scopes of the symbols to intercept. While knowledge of the mangled name is not necessary to mask arbitrary shared library symbols, wrapping/calling the original symbol through a function pointer obtained via dlsym(RTLD_NEXT, "...mangledname...") requires the fully qualified (thus mangled) symbol name.
To avoid adding dependencies that mimic what the mangler does, it is currently recommended to simply look up the mangled names, e.g., with the nm command-line utility. Fortunately, the mangling rules are stable and mostly consistent across the GCC and LLVM toolchains:
- https://github.com/gcc-mirror/gcc/blob/master/gcc/cp/mangle.c
- https://github.com/itanium-cxx-abi/cxx-abi
- https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling
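A mangled name copied from nm output can be sanity-checked before it is hard-coded into a dlsym call. The sketch below uses abi::__cxa_demangle from <cxxabi.h>, which both GCC and Clang provide for the Itanium C++ ABI. The function demangle and the symbol demo::add(int, int) are hypothetical and serve only as an illustration.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (Itanium C++ ABI, GCC and Clang)
#include <cstdlib>
#include <string>

// Returns the human-readable form of a mangled symbol name, or an empty
// string if the name cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with
    // malloc, so it has to be released with free.
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return std::string();
    }
    std::string result(out);
    std::free(out);
    return result;
}
```

For example, demangle("_ZN4demo3addEii") yields "demo::add(int, int)", the signature the wrapper has to match. On the command line, nm -C or c++filt gives the same translation.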
Minor changes to Darshan
Export C if C++ for Darshan Core API
To prevent spurious inclusion of mangled Darshan core functions, the following files should be extended by adding a conditional extern "C" around the function declarations.
darshan-runtime/darshan.h
darshan-runtime/darshan-common.h
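The conditional guard is the usual __cplusplus idiom. A minimal sketch follows, in which example_core_call is a hypothetical stand-in for a Darshan core function and not an actual declaration from these headers:

```cpp
/* Header part: give the declarations C linkage when compiled as C++. */
#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical declaration standing in for a Darshan core function. */
int example_core_call(int module_id);

#ifdef __cplusplus
}
#endif

/* Definition part: in Darshan this would live in a C source file. Because
   the declaration above has C linkage, the symbol is emitted unmangled as
   "example_core_call" even when this is compiled by a C++ compiler. */
int example_core_call(int module_id) {
    return module_id + 1;
}
```

With the guard in place, a module compiled as C++ references the same unmangled symbol that the C-compiled core exports, so the linker can resolve it.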
Build process
To create the right symbols, it is easiest to compile the module with a C++ compiler. The resulting object can easily be linked into an otherwise C-based shared library, with the minor exception that -lstdc++ has to be included in the final linking stage to provide C++ runtime functionality.
lib/darshan-abcxyz.o: lib/darshan-abcxyz.c darshan.h darshan-common.h $(DARSHAN_LOG_FORMAT) $(srcdir)/../darshan-abcxyz-log-format.h | lib
	$(CXX) $(CFLAGS) -fpermissive -c $< -o $@

lib/darshan-abcxyz.po: lib/darshan-abcxyz.c darshan.h darshan-dynamic.h darshan-common.h $(DARSHAN_LOG_FORMAT) $(srcdir)/../darshan-abcxyz-log-format.h | lib
	$(CXX) $(CFLAGS_SHARED) -fpermissive -c $< -o $@

# ...

lib/libdarshan.so: lib/darshan-core-init-finalize.po lib/darshan-core.po lib/darshan-common.po $(DARSHAN_DYNAMIC_MOD_OBJS) lib/lookup3.po lib/lookup8.po
	$(CC) $(CFLAGS_SHARED) $(LDFLAGS) -o $@ $^ -lpthread -lrt -lz -ldl -lstdc++
Note that the final shared object can still be linked with the regular C compiler.