Closed
Opened Nov 25, 2018 by Glenn K. Lockwood@glock
WIP: darshan instrumentation for apps that do not use MPI

This branch of Darshan supports instrumenting serial applications that don't use MPI! It is still a work in progress, but it already works and produces new insights, so I'd like to review the code with the Darshan wizards now and make any changes to style, approach, nomenclature, etc. before hardening it.

Most of the work was factoring MPI out of darshan-core and all of the modules' finalization functions. Instead of calling PMPI_Reduce() and the like, they now call darshan_mpi_reduce(), which either passes directly through to PMPI_Reduce() if MPI is being used or falls back to a stub function otherwise.
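
The dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the code in this branch: sketch_reduce(), sketch_set_backend(), and the simplified reduce_fn signature are hypothetical names, and the MPI path is modeled as a function pointer so the example builds without mpi.h. In an MPI build that pointer would presumably be a thin wrapper around PMPI_Reduce().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified reduction signature; a real wrapper would also
 * carry the MPI datatype, operation, root rank, and communicator. */
typedef int (*reduce_fn)(const void *sendbuf, void *recvbuf,
                         size_t count, size_t elem_size);

/* NULL means "MPI is not in use". An MPI build would install a wrapper
 * around PMPI_Reduce() here (assumption, not shown). */
static reduce_fn mpi_backend = NULL;

void sketch_set_backend(reduce_fn fn)
{
    mpi_backend = fn;
}

/* Serial stub: with a single process, reducing is just copying the local
 * contribution into the result buffer. */
static int serial_reduce(const void *sendbuf, void *recvbuf,
                         size_t count, size_t elem_size)
{
    assert(sendbuf != NULL && recvbuf != NULL);
    memcpy(recvbuf, sendbuf, count * elem_size);
    return 0;
}

/* Callers use this one entry point whether or not MPI is active. */
int sketch_reduce(const void *sendbuf, void *recvbuf,
                  size_t count, size_t elem_size)
{
    if (mpi_backend != NULL)
        return mpi_backend(sendbuf, recvbuf, count, elem_size);
    return serial_reduce(sendbuf, recvbuf, count, elem_size);
}
```

The point of routing everything through one entry point is that module finalization code never needs to know which mode it is running in.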

At present one must export DARSHAN_NOMPI to make Darshan spin up before entering main() and spin down after leaving it. This allows a single libdarshan.so to function with both MPI applications (in which case Darshan behaves identically to how it always has) and non-MPI applications (in which case Darshan is initialized and shut down twice, but idempotently).
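
As a rough illustration of what "idempotently" means here, the guard below lets startup and shutdown be called any number of times while only the first call of each does work. All names (sketch_initialize(), sketch_shutdown(), the state enum and counters) are hypothetical, and treating any value of DARSHAN_NOMPI as "enabled" is an assumption; the branch's actual logic may differ.

```c
#include <stdlib.h>

/* Hypothetical lifecycle states; the real library keeps richer state. */
enum sketch_state { SKETCH_NOT_STARTED, SKETCH_RUNNING, SKETCH_FINISHED };

static enum sketch_state sketch_state = SKETCH_NOT_STARTED;
static int sketch_init_count = 0;      /* how many times startup did work  */
static int sketch_shutdown_count = 0;  /* how many times shutdown did work */

/* Nonzero when DARSHAN_NOMPI is present in the environment.
 * Assumption: any value, even an empty one, counts as "set". */
int sketch_nompi_requested(void)
{
    return getenv("DARSHAN_NOMPI") != NULL;
}

/* Safe to call repeatedly: only the first call starts anything. */
void sketch_initialize(void)
{
    if (sketch_state != SKETCH_NOT_STARTED)
        return;
    sketch_state = SKETCH_RUNNING;
    sketch_init_count++;
}

/* Safe to call repeatedly: only the first call after startup does work. */
void sketch_shutdown(void)
{
    if (sketch_state != SKETCH_RUNNING)
        return;
    sketch_state = SKETCH_FINISHED;
    sketch_shutdown_count++;
}
```

With a guard like this, it does not matter whether the pre-main() hook or an MPI_Init() wrapper reaches the startup path first; the second caller simply returns.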

So to actually test this out,

  1. Build Darshan as normal
  2. LD_PRELOAD libdarshan.so and define DARSHAN_NOMPI=1 in the runtime environment to enable glibc hooks for non-MPI apps

For example,

(haswell)glock@cori03:~/src/git/darshan-dev/darshan-runtime$ make
cc -DDARSHAN_CONFIG_H=\"darshan-runtime-config.h\" -I . -I ../ -I . -I./../ -g -O2  -D_LARGEFILE64_SOURCE -DDARSHAN_LUSTRE -c lib/darshan-core-init-finalize.c -o lib/darshan-core-init-finalize.o
cc -DDARSHAN_CONFIG_H=\"darshan-runtime-config.h\" -I . -I ../ -I . -I./../ -g -O2  -D_LARGEFILE64_SOURCE -DDARSHAN_LUSTRE -c lib/darshan-core.c -o lib/darshan-core.o
...

(haswell)glock@cori11:~/src/git/darshan-dev/darshan-runtime$ LD_PRELOAD=$PWD/lib/libdarshan.so DARSHAN_NOMPI=1 cp -v lib/libdarshan.so DELETEME
'lib/libdarshan.so' -> 'DELETEME'

(haswell)glock@cori11:~/src/git/darshan-dev/darshan-runtime$ ls -lrt *.darshan
-r-------- 1 glock glock 1399 Nov 26 14:20 glock_cp_id51423_11-26-51639-2202167712509143483_1.darshan

(haswell)glock@cori11:~/src/git/darshan-dev/darshan-runtime$ darshan-parser glock_cp_id51423_11-26-51639-2202167712509143483_1.darshan
...
POSIX	-1	10398549382890127017	POSIX_BYTES_READ	0	/global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME	/global/u2	gpfs
POSIX	-1	10398549382890127017	POSIX_BYTES_WRITTEN	443088	/global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME	/global/u2	gpfs
POSIX	-1	10398549382890127017	POSIX_MAX_BYTE_READ	0	/global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME	/global/u2	gpfs
POSIX	-1	10398549382890127017	POSIX_MAX_BYTE_WRITTEN	443087	/global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME	/global/u2	gpfs
...
POSIX	-1	10398549382890127017	POSIX_ACCESS1_ACCESS	131072	/global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME	/global/u2	gpfs
POSIX	-1	10398549382890127017	POSIX_ACCESS2_ACCESS	49872	/global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME	/global/u2	gpfs
POSIX	-1	10398549382890127017	POSIX_ACCESS3_ACCESS	0	/global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME	/global/u2	gpfs
POSIX	-1	10398549382890127017	POSIX_ACCESS4_ACCESS	0	/global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME	/global/u2	gpfs
POSIX	-1	10398549382890127017	POSIX_ACCESS1_COUNT	3	/global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME	/global/u2	gpfs
POSIX	-1	10398549382890127017	POSIX_ACCESS2_COUNT	1	/global/u2/g/glock/src/git/darshan-dev/darshan-runtime/DELETEME	/global/u2	gpfs

I envisioned implementing this in two phases:

Phase 1: Still must link against MPI (as in the Cray case) but can force non-MPI mode using an env variable

  • (done) provide serial+mpi abstraction to replace bare PMPI_* calls in Darshan
  • (done) make double-initialization of Darshan impossible so enabling serial mode on an MPI app simply ignores MPI profiling
  • (done) allows us to use a single compiled library on both MPI and non-MPI applications
  • (done) can implement a Darshan-specific subset of MPI functionality when MPI is not initialized
  • (done) use GNU C's __attribute__((constructor)), atexit(), and signal handling (I actually opted to forgo atexit() and instead use both the GNU C constructor and destructor attributes for consistent behavior)
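
For readers unfamiliar with the constructor/destructor mechanism mentioned in the last item, here is a minimal sketch of how such hooks look with GCC or Clang. The function and counter names are hypothetical and the hook bodies are placeholders; this is not the code in this branch.

```c
/* Hypothetical counters so the effect of the hooks is observable. */
static int sketch_startup_calls = 0;
static int sketch_teardown_calls = 0;

/* Runs before main() when built with GCC or Clang. In a preloaded
 * library this is where non-MPI startup would be triggered. */
__attribute__((constructor))
static void sketch_on_load(void)
{
    sketch_startup_calls++;
}

/* Runs after main() returns or exit() is called. It does not run if the
 * process calls _exit() or dies from an unhandled signal, which is why
 * signal handling is listed alongside it. */
__attribute__((destructor))
static void sketch_on_unload(void)
{
    sketch_teardown_calls++;
}
```

Pairing a destructor with the constructor, rather than registering an atexit() handler, keeps startup and teardown symmetric and tied to the library itself.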

Phase 2: Serial-only version of Darshan to avoid pulling in all of MPI when a non-MPI application is run

  • must create a separate serial-mode build of Darshan that does not link against MPI at all
  • must create a complete MPI stub library; can we get Sandia/Steve Plimpton to relicense the LAMMPS one to Argonne under non-GPL terms? Otherwise we can just clean-room our own; much of the hard work is already done, as the LAMMPS stub library did not have MPI-IO.
  • figure out the correct environmental integration to ensure this version is linked in the absence of MPI but the full MPI version is included when appropriate.

Phase 2 may only be necessary for static linking since Phase 1 happily preloads in front of non-MPI applications and still works fine.

I've confirmed the following works:

  • non-MPI applications (F77, C)
  • enabling DARSHAN_NOMPI with an actual MPI application (each MPI process generates its own independent Darshan log as if it were a serial application in this case)
  • regular MPI app without DARSHAN_NOMPI

but there are a number of open issues/questions:

  • move stubs into their own .c/.h? what to call it? darshan-mpi.h is already pulled out, but I couldn't see how make dist is done to include it in the source package
  • verify DARSHAN_INTERNAL_TIMING still works
  • verify that OpenMPI works
  • why is POSIX_MMAP -1 for a test job?
  • verify that Python works (can do this with some of pytokio's I/O-heavy tools)
  • verify that DARSHAN_NOMPI with HDF5 actually works
  • verify OpenMP/pthreads and thread safety (should work)
  • static linking
  • does DXT still work?

There are also two known issues:

  • MPI_Type_* cannot be cleanly wrapped for both MPI and non-MPI because there is always a risk that derived types' MPI_Datatype values will collide with an MPI implementation's representation of built-in types. Darshan works around this by instead wrapping the entire process of deriving a datatype and then performing the collective so that the serial version does not need to call MPI_Type_* at all. The only proper solution I can think of would be to make a fully MPI-independent Darshan so that Darshan's MPI stubs own the representation of both builtin MPI_Datatypes and derived ones.
  • Profiling a do-nothing command like python --version creates a Darshan log that causes darshan-parser to throw an assertion (darshan-parser: darshan-logutils.c:1367: darshan_log_libz_read: Assertion 'state->dz.size > 0' failed.). This might be because the log is completely empty, but I wasn't sure.

PS: this will resolve #173 (closed)

Edited Nov 26, 2018 by Glenn K. Lockwood

Reference: darshan/darshan!27