WIP: darshan instrumentation for apps that do not use MPI
This branch of Darshan now supports instrumenting serial applications that don't use MPI! This is still a work in progress, but it currently works and produces new insights, so I'd like to review this code with the Darshan wizards now and make any changes to style, approach, nomenclature, etc. before hardening the code.
Most of the work was factoring MPI out of darshan-core and all of the modules' finalization functions so that instead of calling `PMPI_Reduce()` and the like, `darshan_mpi_reduce()` is called, and Darshan either passes directly to `PMPI_Reduce()` if MPI is being used or a stub function otherwise.
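The dispatch pattern described above might look roughly like the sketch below. It is an illustration, not the code in this branch: `sketch_reduce_sum()` and `sketch_using_mpi` are hypothetical names, `HAVE_MPI` is an assumed build macro, and the serial fallback assumes that a reduction over a single process is simply a copy of the send buffer.

```c
#include <stdint.h>
#include <string.h>
#ifdef HAVE_MPI              /* assumed build macro, not Darshan's */
#include <mpi.h>
#endif

/* Hypothetical runtime flag: nonzero once MPI has been initialized. */
static int sketch_using_mpi = 0;

/*
 * Hypothetical wrapper: sum `count` int64_t values onto rank 0.
 * With MPI active it forwards to PMPI_Reduce(); otherwise the lone
 * process is its own root, so the result is a copy of the input.
 * Returns 0 on success in the serial path.
 */
int sketch_reduce_sum(const int64_t *sendbuf, int64_t *recvbuf, int count)
{
#ifdef HAVE_MPI
    if (sketch_using_mpi)
        return PMPI_Reduce(sendbuf, recvbuf, count, MPI_INT64_T,
                           MPI_SUM, 0, MPI_COMM_WORLD);
#endif
    (void)sketch_using_mpi;  /* silence unused warning in serial builds */
    memcpy(recvbuf, sendbuf, (size_t)count * sizeof(int64_t));
    return 0;
}
```

Callers use the wrapper the same way in both modes, which is what lets the module finalization code stay free of direct `PMPI_*` calls.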
At present one must export `DARSHAN_NOMPI` to make Darshan spin up/down before entering and after leaving `main()`; this allows a single `libdarshan.so` to function with both MPI applications (in which case Darshan behaves identically to how it always has) and non-MPI applications (in which case Darshan is initialized/shutdown twice, but idempotently).
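As a rough sketch of how startup and shutdown around `main()` can be gated on an environment variable and kept idempotent: the function and variable names below are hypothetical and the bookkeeping is deliberately trivial. Only the `DARSHAN_NOMPI` variable name comes from this branch, and the `constructor`/`destructor` attributes are GCC/Clang extensions.

```c
#include <stdlib.h>

/* Hypothetical state; Darshan's real bookkeeping is not shown here. */
static int sketch_initialized = 0;
static int sketch_finalized = 0;
static int sketch_init_calls = 0;      /* times real setup ran */
static int sketch_shutdown_calls = 0;  /* times real teardown ran */

/* Idempotent: a second call (e.g. from an MPI_Init wrapper) is a no-op. */
void sketch_initialize(void)
{
    if (sketch_initialized)
        return;
    sketch_initialized = 1;
    sketch_init_calls++;
}

/* Idempotent: runs at most once, and only after initialization. */
void sketch_shutdown(void)
{
    if (!sketch_initialized || sketch_finalized)
        return;
    sketch_finalized = 1;
    sketch_shutdown_calls++;
}

/* Runs before main() when the library is loaded. */
__attribute__((constructor)) static void sketch_ctor(void)
{
    if (getenv("DARSHAN_NOMPI") != NULL)
        sketch_initialize();
}

/* Runs after main() returns or exit() is called. */
__attribute__((destructor)) static void sketch_dtor(void)
{
    if (getenv("DARSHAN_NOMPI") != NULL)
        sketch_shutdown();
}
```

Because both entry points guard on their own flags, it does not matter whether the constructor, an MPI wrapper, or both end up calling them.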
So to actually test this out,
- Build Darshan as normal
- LD_PRELOAD `libdarshan.so` and define `DARSHAN_NOMPI=1` in the runtime environment to enable glibc hooks for non-MPI apps
    (haswell)glock@cori03:~/src/git/darshan-dev/darshan-runtime$ make
    cc -DDARSHAN_CONFIG_H=\"darshan-runtime-config.h\" -I . -I ../ -I . -I./../ -g -O2 -D_LARGEFILE64_SOURCE -DDARSHAN_LUSTRE -c lib/darshan-core-init-finalize.c -o lib/darshan-core-init-finalize.o
    cc -DDARSHAN_CONFIG_H=\"darshan-runtime-config.h\" -I . -I ../ -I . -I./../ -g -O2 -D_LARGEFILE64_SOURCE -DDARSHAN_LUSTRE -c lib/darshan-core.c -o lib/darshan-core.o
    ...

    (haswell)glock@cori11:~/src/git/darshan-dev/darshan-runtime$ LD_PRELOAD=$PWD/lib/libdarshan.so DARSHAN_NOMPI=1 cp -v lib/libdarshan.so DELETEME
    'lib/libdarshan.so' -> 'DELETEME'

    (haswell)glock@cori11:~/src/git/darshan-dev/darshan-runtime$ ls -lrt *.darshan
    -r-------- 1 glock glock 1399 Nov 26 14:20 glock_cp_id51423_11-26-51639-2202167712509143483_1.darshan

    (haswell)glock@cori11:~/src/git/darshan-dev/darshan-runtime$ darshan-parser glock_cp_id51423_11-26-51639-2202167712509143483_1.darshan
    ...
    POSIX -1 10398549382890127017 POSIX_BYTES_READ 0 /global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME /global/u2 gpfs
    POSIX -1 10398549382890127017 POSIX_BYTES_WRITTEN 443088 /global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME /global/u2 gpfs
    POSIX -1 10398549382890127017 POSIX_MAX_BYTE_READ 0 /global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME /global/u2 gpfs
    POSIX -1 10398549382890127017 POSIX_MAX_BYTE_WRITTEN 443087 /global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME /global/u2 gpfs
    ...
    POSIX -1 10398549382890127017 POSIX_ACCESS1_ACCESS 131072 /global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME /global/u2 gpfs
    POSIX -1 10398549382890127017 POSIX_ACCESS2_ACCESS 49872 /global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME /global/u2 gpfs
    POSIX -1 10398549382890127017 POSIX_ACCESS3_ACCESS 0 /global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME /global/u2 gpfs
    POSIX -1 10398549382890127017 POSIX_ACCESS4_ACCESS 0 /global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME /global/u2 gpfs
    POSIX -1 10398549382890127017 POSIX_ACCESS1_COUNT 3 /global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME /global/u2 gpfs
    POSIX -1 10398549382890127017 POSIX_ACCESS2_COUNT 1 /global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME /global/u2 gpfs
I envisioned implementing this in two phases:
Phase 1: Still must link against MPI (as in the Cray case) but can force non-MPI mode using an env variable
- (done) provide serial+mpi abstraction to replace bare `PMPI_*` calls in Darshan
- (done) make double-initialization of Darshan impossible so enabling serial mode on an MPI app simply ignores MPI profiling
- (done) allows us to use a single compiled library on both MPI and non-MPI applications
- (done) can implement a Darshan-specific subset of MPI functionality when MPI is not initialized
- (done) use GNU C's `atexit()`, and signal handling (I actually opted to forego `atexit()` and instead use both glibc constructor and destructor for consistent behavior)
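To illustrate what a Darshan-specific subset of MPI functionality could look like when MPI is not initialized, here is a minimal sketch. All names are hypothetical and the selection of calls is an assumption; the only idea taken from the list above is that a serial process can behave as rank 0 of a one-process job.

```c
#include <stddef.h>

/* Hypothetical serial stand-ins; each returns 0 on success. */

/* A serial process is always rank 0. */
int sketch_comm_rank(int *rank)
{
    *rank = 0;
    return 0;
}

/* A serial process is a job of size 1. */
int sketch_comm_size(int *size)
{
    *size = 1;
    return 0;
}

/* Nothing to synchronize with, so a barrier returns immediately. */
int sketch_barrier(void)
{
    return 0;
}

/* The root already holds the data; only root 0 exists in serial mode. */
int sketch_bcast(void *buf, size_t nbytes, int root)
{
    (void)buf;
    (void)nbytes;
    return root == 0 ? 0 : -1;
}
```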
Phase 2: Serial-only version of Darshan to avoid pulling in all of MPI when a non-MPI application is run
- must create a separate serial-mode build of Darshan that does not link against MPI at all
- must create a complete MPI stub library; can we get Sandia/Steve Plimpton to relicense the LAMMPS one to Argonne under non-GPL terms? Otherwise we can just clean-room our own; much of the hard work is already done, as the LAMMPS stub library did not have MPI-IO.
- figure out the correct environmental integration to ensure this version is linked in the absence of MPI but the full MPI version is included when appropriate.
Phase 2 may only be necessary for static linking since Phase 1 happily preloads in front of non-MPI applications and still works fine.
I've confirmed the following works:
- non-MPI applications (F77, C)
- `DARSHAN_NOMPI` with an actual MPI application (each MPI process generates its own independent Darshan log as if it were a serial application in this case)
- regular MPI app without `DARSHAN_NOMPI`
but there are a number of open issues/questions:
- move stubs into their own .c/.h? what to call it? darshan-mpi.h is already pulled out, but I couldn't see how `make dist` is done to include it in the source package
- verify that OpenMPI works
- why is `POSIX_MMAP` -1 for a test job?
- verify that Python works (can do this with some of pytokio's I/O-heavy tools)
- verify that `DARSHAN_NOMPI` with HDF5 actually works
- verify OpenMP/pthreads and thread safety (should work)
- static linking
- does DXT still work?
There are also two known issues:
- `MPI_Type_*` cannot be cleanly wrapped for both MPI and non-MPI because there is always a risk that derived types' `MPI_Datatype` values will collide with an MPI implementation's representation of built-in types. Darshan works around this by instead wrapping the entire process of deriving a datatype and then performing the collective so that the serial version does not need to call `MPI_Type_*` at all. The only proper solution I can think of would be to make a fully MPI-independent Darshan so that Darshan's MPI stubs own the representation of both builtin `MPI_Datatype`s and derived ones.
- Profiling a do-nothing command like `python --version` creates a Darshan log that causes `darshan-parser` to throw an assertion (`darshan-parser: darshan-logutils.c:1367: darshan_log_libz_read: Assertion 'state->dz.size > 0' failed.`). This might be because the log is completely empty, but I wasn't sure.
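The workaround in the first known issue (wrapping the whole derive-a-type-then-reduce sequence) can be sketched as follows. This is a hedged illustration rather than the branch's code: `struct sketch_record`, `sketch_reduce_records()`, `sketch_mpi_active`, and the `HAVE_MPI` macro are all hypothetical. The point is that the caller never handles an `MPI_Datatype`; the type is derived, used, and freed inside the wrapper, so the serial path needs no `MPI_Type_*` stub.

```c
#include <stdint.h>
#include <string.h>
#ifdef HAVE_MPI              /* assumed build macro, not Darshan's */
#include <mpi.h>
#endif

/* Hypothetical per-file record; not a real Darshan structure. */
struct sketch_record {
    int64_t bytes_read;
    int64_t bytes_written;
};

/* Hypothetical runtime flag: nonzero once MPI has been initialized. */
static int sketch_mpi_active = 0;

#ifdef HAVE_MPI
/* User-defined reduction operator: element-wise sum of records. */
static void sketch_sum_op(void *in, void *inout, int *len, MPI_Datatype *type)
{
    struct sketch_record *a = in, *b = inout;
    (void)type;
    for (int i = 0; i < *len; i++) {
        b[i].bytes_read += a[i].bytes_read;
        b[i].bytes_written += a[i].bytes_written;
    }
}
#endif

/* Reduce `count` records onto rank 0; serial mode just copies them. */
int sketch_reduce_records(const struct sketch_record *send,
                          struct sketch_record *recv, int count)
{
#ifdef HAVE_MPI
    if (sketch_mpi_active) {
        MPI_Datatype rec_type;
        MPI_Op rec_op;
        int ret;
        /* Derive the type, reduce, and free it, all in one place. */
        PMPI_Type_contiguous((int)sizeof(struct sketch_record),
                             MPI_BYTE, &rec_type);
        PMPI_Type_commit(&rec_type);
        PMPI_Op_create(sketch_sum_op, 1, &rec_op);
        ret = PMPI_Reduce(send, recv, count, rec_type, rec_op,
                          0, MPI_COMM_WORLD);
        PMPI_Op_free(&rec_op);
        PMPI_Type_free(&rec_type);
        return ret;
    }
#endif
    (void)sketch_mpi_active;  /* silence unused warning in serial builds */
    memcpy(recv, send, (size_t)count * sizeof(*send));
    return 0;
}
```

The cost of this design is one wrapper per record type and collective combination, which is the trade-off the fully MPI-independent approach would remove.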
PS: this will resolve #173