Darshan-runtime installation and usage

== Introduction

This document describes darshan-runtime, which is the instrumentation
portion of the Darshan characterization tool.  It should be installed on the
system where you intend to collect I/O characterization information.
Darshan instruments applications via either compile time wrappers for static
executables or dynamic library preloading for dynamic executables.  An
application that has been instrumented with Darshan will produce a single
log file each time it is executed.  This log summarizes the I/O access patterns
used by the application.

The darshan-runtime instrumentation only instruments MPI applications (the
application must at least call `MPI_Init()` and `MPI_Finalize()`).  However,
it captures both MPI-IO and POSIX file access.  It also captures limited
information about HDF5 and PnetCDF access.

This document provides generic installation instructions, but "recipes" for
several common HPC systems are provided at the end of the document as well.

== Requirements

* MPI C compiler
* zlib development headers and library

== Compilation and installation

.Configure and build example
tar -xvzf darshan-<version-number>.tar.gz
cd darshan-<version-number>/darshan-runtime
./configure --with-mem-align=8 --with-log-path=/darshan-logs --with-jobid-env=PBS_JOBID CC=mpicc
make install

.Explanation of configure arguments:
* `--with-mem-align` (mandatory): This value is system-dependent and will be
used by Darshan to determine if the buffer for a read or write operation is
aligned in memory.
* `--with-log-path` (this, or `--with-log-path-by-env`, is mandatory): This
specifies the parent directory for the directory tree where Darshan logs
will be placed.
* `--with-jobid-env` (mandatory): This specifies the environment variable that
Darshan should check to determine the jobid of a job.  Common values are
`PBS_JOBID` or `COBALT_JOBID`.  If you are not using a scheduler (or your
scheduler does not advertise the job ID) then you can specify `NONE` here.
Darshan will fall back to using the pid of the rank 0 process if the
specified environment variable is not set.
* `CC=`: specifies the MPI C compiler to use for compilation.
* `--with-log-path-by-env`: specifies an environment variable to use to
determine the log path at run time.
* `--with-log-hints=`: specifies hints to use when writing the Darshan log
file.  See `./configure --help` for details.
* `--with-zlib=`: specifies an alternate location for the zlib development
header and library
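
The job ID fallback described for `--with-jobid-env` can be pictured with a short shell sketch.  This is an illustration only, not Darshan's implementation: `resolve_jobid` is a hypothetical helper, and its second argument stands in for the pid of the rank 0 process.  It also assumes that `NONE` always selects the pid fallback.

```shell
# Illustration only: mimic the job ID lookup described above.
# resolve_jobid is a hypothetical helper, not part of Darshan.
#   $1 = variable name given to --with-jobid-env (or NONE)
#   $2 = fallback value, standing in for the pid of the rank 0 process
resolve_jobid() {
    var_name="$1"
    fallback="$2"
    value=""
    if [ "$var_name" != "NONE" ]; then
        # Indirect lookup: read the variable whose name is in $var_name.
        eval "value=\${$var_name:-}"
    fi
    if [ -n "$value" ]; then
        printf '%s\n' "$value"
    else
        printf '%s\n' "$fallback"
    fi
}
```

For instance, `resolve_jobid PBS_JOBID "$$"` prints the scheduler's job ID inside a PBS job and the current shell's pid elsewhere.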

=== Cross compilation

On some systems (notably the IBM BlueGene series) the login nodes do not
have the same architecture or runtime environment as the compute nodes.  In
this case, you must configure darshan-runtime to be built using a cross
compiler.  The following arguments show an example for the BG/P system:

--host=powerpc-bgp-linux CC=/bgsys/drivers/ppcfloor/comm/default/bin/mpicc 

== Environment preparation

Once darshan-runtime has been installed, you must still prepare a location
to store Darshan log files and configure an instrumentation method.

=== Log directory

This step can be safely skipped if you configured darshan-runtime using the
`--with-log-path-by-env` option.  A more typical configuration, however, is
to provide a static directory hierarchy in which to gather Darshan log
files.

The `darshan-mk-log-dirs.pl` utility will configure the path specified at
configure time to include subdirectories organized by year, month, and day
in which log files will be placed.  The last subdirectories will have sticky
permissions to enable multiple users to write to the same directory.  If the
log directory is shared system-wide across many users then the script should
be run
as root.
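
As a rough picture of the resulting layout, the sketch below builds a year/month/day tree with the sticky bit set on the day directories.  It is not the `darshan-mk-log-dirs.pl` script itself: `make_log_dirs` is a hypothetical helper, and the unpadded directory names, the fixed 31 days per month, and the `1777` mode are assumptions made for illustration.

```shell
# Illustration only: approximate the year/month/day log hierarchy.
# make_log_dirs is a hypothetical helper, not part of Darshan.
#   $1 = log root (the --with-log-path value), $2 = year
make_log_dirs() {
    root="$1"
    year="$2"
    month=1
    while [ "$month" -le 12 ]; do
        day=1
        while [ "$day" -le 31 ]; do
            mkdir -p "$root/$year/$month/$day"
            # Sticky bit plus world-writable (assumed mode): many users
            # can write logs, but only owners can remove their own files.
            chmod 1777 "$root/$year/$month/$day"
            day=$((day + 1))
        done
        month=$((month + 1))
    done
}
```

Calling `make_log_dirs /darshan-logs 2012` would then produce paths such as `/darshan-logs/2012/3/14`.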

=== Instrumentation method

The instrumentation method to use depends on whether the executables
produced by your MPI compiler are statically or dynamically linked.  If you
are unsure, you can check by running `ldd <executable_name>` on an example
executable.  Dynamically-linked executables will produce a list of shared
libraries when this command is executed.
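
The `ldd` check can be wrapped in a small shell function when several executables need to be classified.  This is a minimal sketch: `is_dynamic` is a hypothetical helper, and it assumes that the local `ldd` exits with a non-zero status when the file is not a dynamic executable, which may not hold for every `ldd` implementation.

```shell
# Hypothetical helper: succeed if ldd can list shared libraries for "$1".
# Assumes ldd returns non-zero for files that are not dynamic executables.
is_dynamic() {
    ldd "$1" >/dev/null 2>&1
}

# Print which instrumentation method applies to the given executable.
suggest_method() {
    if is_dynamic "$1"; then
        printf '%s: dynamic, use LD_PRELOAD\n' "$1"
    else
        printf '%s: not dynamic, use compile-time wrappers\n' "$1"
    fi
}
```

Running `suggest_method ./my_app` on an example build shows which of the following two sections is relevant.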

Most MPI compilers allow you to toggle dynamic or static linking via options
such as `-dynamic` or `-static`.  Please check your MPI compiler man page
for details if you intend to force one mode or the other.

== Instrumenting statically-linked applications

Statically linked executables must be instrumented at compile time.  The
simplest way to do this is to generate an MPI compiler script (e.g. `mpicc`)
that includes the link options and libraries needed by Darshan.  Once this
is done, Darshan instrumentation is transparent; you simply compile
applications using the darshan-enabled MPI compiler scripts.

For MPICH-based MPI libraries, such as MPICH1, MPICH2, or MVAPICH, these
wrapper scripts can be generated automatically.  The following example
illustrates how to produce wrappers for C, C++, and Fortran compilers:

darshan-gen-cc.pl `which mpicc` --output mpicc.darshan
darshan-gen-cxx.pl `which mpicxx` --output mpicxx.darshan
darshan-gen-fortran.pl `which mpif77` --output mpif77.darshan
darshan-gen-fortran.pl `which mpif90` --output mpif90.darshan

For other MPI Libraries you must manually modify the MPI compiler scripts to
add the necessary link options and libraries.  Please see the
`darshan-gen-*` scripts for examples or contact the Darshan users mailing
list for help.

== Instrumenting dynamically-linked applications

For dynamically-linked executables, Darshan relies on the `LD_PRELOAD`
environment variable to insert instrumentation at run time.  The application
can be compiled using the normal, unmodified MPI compiler.

To use this mechanism, set the `LD_PRELOAD` environment variable to the full
path to the Darshan shared library, as in this example:

export LD_PRELOAD=/home/carns/darshan-install/lib/libdarshan.so

You can then run your application as usual.  Some environments may require a
special `mpirun` or `mpiexec` command line argument to propagate the
environment variable to all processes.  Other environments may require a
scheduler submission option to control this behavior.  Please check your
local site documentation for details.
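
As an example of what such launcher arguments can look like, the sketch below prints (without running) a launch command that forwards `LD_PRELOAD` to all processes.  `darshan_launch_cmd` is a hypothetical helper.  The `-genv` option of MPICH's `mpiexec` and the `-x` option of Open MPI's `mpirun` are used on the assumption that your installation provides them; other launchers and schedulers will differ.

```shell
# Hypothetical helper: print a launch command that propagates LD_PRELOAD.
#   $1 = launcher flavor (mpich, openmpi, or anything else)
#   $2 = full path to the Darshan shared library
#   remaining arguments = launcher options and application command line
darshan_launch_cmd() {
    flavor="$1"
    lib="$2"
    shift 2
    case "$flavor" in
        # MPICH-style mpiexec: -genv NAME VALUE sets it for all ranks.
        mpich)   printf 'mpiexec -genv LD_PRELOAD %s %s\n' "$lib" "$*" ;;
        # Open MPI mpirun: -x NAME=VALUE exports it to all ranks.
        openmpi) printf 'mpirun -x LD_PRELOAD=%s %s\n' "$lib" "$*" ;;
        # Fallback: set the variable for this one command only.
        *)       printf 'LD_PRELOAD=%s %s\n' "$lib" "$*" ;;
    esac
}
```

For example, `darshan_launch_cmd mpich /home/carns/darshan-install/lib/libdarshan.so -n 4 ./my_app` prints a command line that can be reviewed before it is pasted into a job script.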

=== Instrumenting dynamically-linked Fortran applications

Please follow the general steps outlined in the previous section.  For
Fortran applications compiled with MPICH you may have to take the additional
step of adding `libfmpich.so` to your `LD_PRELOAD` environment variable.  For
example:

export LD_PRELOAD=libfmpich.so:/home/carns/darshan-install/lib/libdarshan.so

== Darshan installation recipes

The following recipes provide examples for some prominent HPC systems.
These are intended to be used as a starting point for installation on such
systems, although you will most likely have to adjust paths and options to
reflect the specifics of your system.

=== IBM Blue Gene/P

The IBM Blue Gene/P series produces static executables by default, uses a
different architecture for login and compute nodes, and uses an MPI
environment based on MPICH.

The following example shows how to configure Darshan on a BG/P system:

./configure --with-mem-align=16 \
 --with-log-path=/home/carns/working/darshan/releases/logs \
 --prefix=/home/carns/working/darshan/install --with-jobid-env=COBALT_JOBID \
 --with-zlib=/soft/apps/zlib-1.2.3/ \
 --host=powerpc-bgp-linux CC=/bgsys/drivers/ppcfloor/comm/default/bin/mpicc 

The memory alignment is set to 16 not because that is the proper alignment
for the BG/P CPU architecture, but because that is the optimal alignment for
the network transport used between compute nodes and I/O nodes in the
system.  The jobid environment variable is set to `COBALT_JOBID` in this
case for use with the Cobalt scheduler, but other BG/P systems may use
different schedulers.  The `--with-zlib` argument is used to point to a
version of zlib that has been compiled for use on the compute nodes rather
than the login node.  The `--host` argument is used to force cross-compilation
of Darshan.  The `CC` variable is set to point to a stock MPI compiler.
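
To make the effect of `--with-mem-align` concrete, a buffer counts as aligned when its address is an exact multiple of the configured value.  The sketch below shows that arithmetic in shell.  `is_aligned` is a hypothetical helper and the addresses are made-up decimal numbers; Darshan performs its own check internally on the real buffer addresses.

```shell
# Illustration only: the alignment test as plain integer arithmetic.
# is_aligned is a hypothetical helper, not part of Darshan.
#   $1 = buffer address (decimal), $2 = alignment in bytes
is_aligned() {
    [ $(( $1 % $2 )) -eq 0 ]
}

# The same address can be aligned for one setting and not for another.
for align in 8 16; do
    if is_aligned 4104 "$align"; then
        printf 'address 4104 is aligned to %s bytes\n' "$align"
    else
        printf 'address 4104 is NOT aligned to %s bytes\n' "$align"
    fi
done
```

An address such as 4104 passes the check with an alignment of 8 but fails it with the value of 16 used in this recipe, so the choice of value changes what Darshan reports as unaligned access.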

Once Darshan has been installed, use the `darshan-gen-*.pl` scripts as
described earlier in this document to produce darshan-enabled MPI compilers.
This method has been widely used and tested with both the GNU and IBM XL
compilers.

=== Cray XE (or similar)

The Cray environment produces static executables by default, uses a similar
architecture for login and compute nodes, and uses its own unique compiler
script system.

The following example shows how to configure Darshan on a Cray system:

module swap PrgEnv-pgi PrgEnv-gnu
./configure --with-mem-align=8 \
 --with-log-path=/lustre/beagle/carns/darshan-logs \
 --prefix=/home/carns/working/darshan/releases/install-darshan-2.2.0-pre1 \
 --with-jobid-env=PBS_JOBID CC=cc

Before compiling Darshan you must modify your environment to use the GNU
compilers rather than the default PGI or Cray compilers.  Please see your
site documentation for details.

The job ID is set to `PBS_JOBID` for use with a Torque or PBS based scheduler.
The `CC` variable is configured to point to the standard MPI compiler.

The darshan-runtime package does not provide any scripts or wrappers to use
for instrumenting static executables in the Cray environment.  It may be
possible to do this manually.  However, you _can_ instrument dynamic
executables using `LD_PRELOAD`.  To do this, compile your application with
the `-dynamic` compiler option and follow the instructions for dynamic
executables listed earlier in this document.  This method has been tested
with PGI and GNU compilers and is likely to work with other compiler
combinations as well.

Note that some Cray systems may require additional environment variables or
modules to be set in order to run dynamic executables on a compute node.
Please see your site documentation for details.

=== Linux clusters using Intel MPI 

Most Intel MPI installations produce dynamic executables by default.  To
configure Darshan in this environment you can use the following example:

./configure --with-mem-align=8 --with-log-path=/darshan-logs --with-jobid-env=PBS_JOBID CC=mpicc

There is nothing unusual in this configuration except that you should use
the underlying GNU compilers rather than the Intel ICC compilers to compile
Darshan itself.

You can use the `LD_PRELOAD` method described earlier in this document to
instrument executables compiled with the Intel MPI compiler scripts.  This
method has been briefly tested using both GNU and Intel compilers.

Darshan is only known to work with C and C++ executables generated by the
Intel MPI suite.  Darshan will not produce instrumentation for Fortran
executables.  For more details please check this Intel forum discussion:


=== Linux clusters using MPICH or OpenMPI

Follow the generic instructions provided at the top of this document.  The
only modification is to make sure that the `CC` used for compilation is
based on a GNU compiler.  Once Darshan has been installed, it should be
capable of instrumenting executables built with GNU, Intel, and PGI
compilers.
