WIP: darshan instrumentation for apps that do not use MPI
This branch of Darshan now supports instrumenting serial applications that don't use MPI! It is still a work in progress, but it already works and produces new insights, so I'd like to review the code with the Darshan wizards now and make any changes to style, approach, nomenclature, etc. before hardening it.
Most of the work was factoring MPI out of darshan-core and all of the modules' finalization functions so that instead of calling PMPI_Reduce() and the like, darshan_mpi_reduce() is called, and Darshan either passes directly to PMPI_Reduce() if MPI is being used or a stub function otherwise.
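To make the dispatch idea concrete for reviewers, here is a minimal sketch of such a wrapper. It is not the code in this branch: every `sketch_*` name, the `SKETCH_WITH_MPI` macro, and the two-value type enum are hypothetical, and the serial path assumes a single process, in which case a reduction simply yields the input.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#ifdef SKETCH_WITH_MPI
#include <mpi.h>
#endif

/* Hypothetical element types for this sketch; not Darshan's or MPI's own. */
typedef enum { SKETCH_INT64, SKETCH_DOUBLE } sketch_type_t;

/* Hypothetical flag: set to nonzero once the application is known to use MPI. */
int sketch_using_mpi = 0;

static size_t sketch_type_size(sketch_type_t type)
{
    return (type == SKETCH_INT64) ? sizeof(int64_t) : sizeof(double);
}

/* Sum-reduce `count` elements onto rank `root`. */
int sketch_mpi_reduce(const void *sendbuf, void *recvbuf, int count,
                      sketch_type_t type, int root)
{
    assert(count >= 0);
#ifdef SKETCH_WITH_MPI
    if (sketch_using_mpi) {
        /* MPI path: hand the call straight to the profiling interface. */
        MPI_Datatype mpi_type =
            (type == SKETCH_INT64) ? MPI_INT64_T : MPI_DOUBLE;
        return PMPI_Reduce(sendbuf, recvbuf, count, mpi_type, MPI_SUM,
                           root, MPI_COMM_WORLD);
    }
#endif
    /* Serial stub: with one process, the reduced result equals the input. */
    (void)root;
    memmove(recvbuf, sendbuf, (size_t)count * sketch_type_size(type));
    return 0;
}
```

Callers never need to know which path ran, which is what lets the finalization code stay the same in both modes.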
At present one must export DARSHAN_NOMPI to make Darshan spin up/down before entering and after leaving main(); this allows a single libdarshan.so to function with both MPI applications (in which case Darshan behaves identically to how it always has) and non-MPI applications (in which case Darshan is initialized/shutdown twice, but idempotently).
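The spin up/down mechanism can be sketched roughly as follows. This is an illustration rather than the branch's implementation: the `sketch_*` names and counters are hypothetical, and treating any set value of DARSHAN_NOMPI as "enabled" is an assumption. The constructor and destructor attributes are GCC/Clang extensions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical lifecycle states for this sketch. */
enum { SKETCH_UNUSED, SKETCH_ACTIVE, SKETCH_FINISHED };

static int sketch_state = SKETCH_UNUSED;
int sketch_effective_inits = 0;      /* times initialization did real work */
int sketch_effective_shutdowns = 0;  /* times shutdown did real work */

/* Assumption: any value of DARSHAN_NOMPI in the environment enables the hooks. */
static int sketch_nompi_requested(void)
{
    return getenv("DARSHAN_NOMPI") != NULL;
}

/* Idempotent: a second call is a no-op. */
void sketch_initialize(void)
{
    if (sketch_state != SKETCH_UNUSED)
        return;
    sketch_state = SKETCH_ACTIVE;
    sketch_effective_inits++;
}

/* Idempotent: only the first call after initialization does anything. */
void sketch_shutdown(void)
{
    if (sketch_state != SKETCH_ACTIVE)
        return;
    sketch_state = SKETCH_FINISHED;
    sketch_effective_shutdowns++;
}

/* Runs before main() when the library is loaded. */
__attribute__((constructor)) static void sketch_on_load(void)
{
    if (sketch_nompi_requested())
        sketch_initialize();
}

/* Runs after main() returns or exit() is called. */
__attribute__((destructor)) static void sketch_on_unload(void)
{
    if (sketch_nompi_requested())
        sketch_shutdown();
}
```

Because both entry points are guarded by the state variable, a later call from an MPI_Init/MPI_Finalize wrapper would fall through harmlessly.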
So to actually test this out,
- Build Darshan as normal
- LD_PRELOAD libdarshan.so and define DARSHAN_NOMPI=1 in the runtime environment to enable glibc hooks for non-MPI apps
For example,
(haswell)glock@cori03:~/src/git/darshan-dev/darshan-runtime$ make
cc -DDARSHAN_CONFIG_H=\"darshan-runtime-config.h\" -I . -I ../ -I . -I./../ -g -O2 -D_LARGEFILE64_SOURCE -DDARSHAN_LUSTRE -c lib/darshan-core-init-finalize.c -o lib/darshan-core-init-finalize.o
cc -DDARSHAN_CONFIG_H=\"darshan-runtime-config.h\" -I . -I ../ -I . -I./../ -g -O2 -D_LARGEFILE64_SOURCE -DDARSHAN_LUSTRE -c lib/darshan-core.c -o lib/darshan-core.o
...
(haswell)glock@cori11:~/src/git/darshan-dev/darshan-runtime$ LD_PRELOAD=$PWD/lib/libdarshan.so DARSHAN_NOMPI=1 cp -v lib/libdarshan.so DELETEME
'lib/libdarshan.so' -> 'DELETEME'
(haswell)glock@cori11:~/src/git/darshan-dev/darshan-runtime$ ls -lrt *.darshan
-r-------- 1 glock glock 1399 Nov 26 14:20 glock_cp_id51423_11-26-51639-2202167712509143483_1.darshan
(haswell)glock@cori11:~/src/git/darshan-dev/darshan-runtime$ darshan-parser glock_cp_id51423_11-26-51639-2202167712509143483_1.darshan
...
POSIX -1 10398549382890127017 POSIX_BYTES_READ 0 /global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME /global/u2 gpfs
POSIX -1 10398549382890127017 POSIX_BYTES_WRITTEN 443088 /global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME /global/u2 gpfs
POSIX -1 10398549382890127017 POSIX_MAX_BYTE_READ 0 /global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME /global/u2 gpfs
POSIX -1 10398549382890127017 POSIX_MAX_BYTE_WRITTEN 443087 /global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME /global/u2 gpfs
...
POSIX -1 10398549382890127017 POSIX_ACCESS1_ACCESS 131072 /global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME /global/u2 gpfs
POSIX -1 10398549382890127017 POSIX_ACCESS2_ACCESS 49872 /global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME /global/u2 gpfs
POSIX -1 10398549382890127017 POSIX_ACCESS3_ACCESS 0 /global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME /global/u2 gpfs
POSIX -1 10398549382890127017 POSIX_ACCESS4_ACCESS 0 /global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME /global/u2 gpfs
POSIX -1 10398549382890127017 POSIX_ACCESS1_COUNT 3 /global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME /global/u2 gpfs
POSIX -1 10398549382890127017 POSIX_ACCESS2_COUNT 1 /global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME /global/u2 gpfs
I envisioned implementing this in two phases:
Phase 1: Still must link against MPI (as in the Cray case) but can force non-MPI mode using an env variable
- (done) provide serial+mpi abstraction to replace bare PMPI_* calls in Darshan
- (done) make double-initialization of Darshan impossible so enabling serial mode on an MPI app simply ignores MPI profiling
- (done) allows us to use a single compiled library on both MPI and non-MPI applications
- (done) can implement a Darshan-specific subset of MPI functionality when MPI is not initialized
- (done) use GNU C's __attribute__((constructor)), atexit(), and signal handling (I actually opted to forgo atexit() and instead use both glibc constructor and destructor for consistent behavior)
Phase 2: Serial-only version of Darshan to avoid pulling in all of MPI when a non-MPI application is run
- must create a separate serial-mode build of Darshan that does not link against MPI at all
- must create a complete MPI stub library; can we get Sandia/Steve Plimpton to relicense the LAMMPS one to Argonne under non-GPL terms? Otherwise we can just clean-room our own; much of the hard work is already done, as the LAMMPS stub library did not have MPI-IO.
- figure out the correct environmental integration to ensure this version is linked in the absence of MPI but the full MPI version is included when appropriate.
Phase 2 may only be necessary for static linking since Phase 1 happily preloads in front of non-MPI applications and still works fine.
I've confirmed the following works:
- non-MPI applications (F77, C)
- enabling DARSHAN_NOMPI with an actual MPI application (each MPI process generates its own independent Darshan log as if it were a serial application in this case)
- regular MPI app without DARSHAN_NOMPI
but there are a number of open issues/questions:
- move stubs into their own .c/.h? what to call it? darshan-mpi.h is already pulled out, but I couldn't see how make dist is done to include it in the source package
- verify DARSHAN_INTERNAL_TIMING still works
- verify that OpenMPI works
- why is POSIX_MMAP -1 for a test job?
- verify that Python works (can do this with some of pytokio's I/O-heavy tools)
- verify that DARSHAN_NOMPI with HDF5 actually works
- verify OpenMP/pthreads and thread safety (should work)
- static linking
- does DXT still work?
There are also two known issues:
- MPI_Type_* cannot be cleanly wrapped for both MPI and non-MPI because there is always a risk that derived types' MPI_Datatype values will collide with an MPI implementation's representation of built-in types. Darshan works around this by instead wrapping the entire process of deriving a datatype and then performing the collective so that the serial version does not need to call MPI_Type_* at all. The only proper solution I can think of would be to make a fully MPI-independent Darshan so that Darshan's MPI stubs own the representation of both built-in MPI_Datatypes and derived ones.
- Profiling a do-nothing command like python --version creates a Darshan log that causes darshan-parser to fail an assertion (darshan-parser: darshan-logutils.c:1367: darshan_log_libz_read: Assertion 'state->dz.size > 0' failed.). This might be because the log is completely empty, but I wasn't sure.
PS: this will resolve #173 (closed)