Darshan-runtime installation and usage
======================================

== Introduction

This document describes darshan-runtime, which is the instrumentation
portion of the Darshan characterization tool.  It should be installed on the
system where you intend to collect I/O characterization information.

Darshan instruments applications via either compile time wrappers for static
executables or dynamic library preloading for dynamic executables.  An
application that has been instrumented with Darshan will produce a single
log file each time it is executed.  This log summarizes the I/O access patterns
used by the application.

The darshan-runtime instrumentation only instruments MPI applications (the
application must at least call `MPI_Init()` and `MPI_Finalize()`).  However,
it captures both MPI-IO and POSIX file access.  It also captures limited
information about HDF5 and PnetCDF access.

This document provides generic installation instructions, but "recipes" for
several common HPC systems are provided at the end of the document as well.

More information about Darshan can be found at the 
http://www.mcs.anl.gov/darshan[Darshan web site].

== Requirements

* MPI C compiler
* zlib development headers and library

== Compilation

.Configure and build example
----
tar -xvzf darshan-<version-number>.tar.gz
cd darshan-<version-number>/darshan-runtime
./configure --with-mem-align=8 --with-log-path=/darshan-logs --with-jobid-env=PBS_JOBID CC=mpicc
make
make install
----

.Explanation of configure arguments:
* `--with-mem-align=` (mandatory): This value is system-dependent and will be
used by Darshan to determine if the buffer for a read or write operation is
aligned in memory.
* `--with-jobid-env=` (mandatory): this specifies the environment variable that
Darshan should check to determine the jobid of a job.  Common values are
`PBS_JOBID` or `COBALT_JOBID`.  If you are not using a scheduler (or your
scheduler does not advertise the job ID) then you can specify `NONE` here.
Darshan will fall back to using the pid of the rank 0 process if the
specified environment variable is not set.
* `--with-log-path=` (this, or `--with-log-path-by-env`, is mandatory): This
specifies the parent directory for the directory tree where darshan logs
will be placed.
* `--with-log-path-by-env=`: specifies an environment variable to use to
determine the log path at run time.
* `--with-log-hints=`: specifies hints to use when writing the Darshan log
file.  See `./configure --help` for details.
* `--with-zlib=`: specifies an alternate location for the zlib development
header and library.
* `CC=`: specifies the MPI C compiler to use for compilation.
* `--disable-cuserid`: disables use of cuserid() at runtime.
* `--disable-ld-preload`: disables building of the Darshan LD_PRELOAD library.
* `--disable-bgq-mod`: disables building of the BG/Q module (default checks
and only builds if BG/Q environment detected).
* `--enable-group-readable-logs`: sets darshan log file permissions to allow
group read access.

=== Cross compilation

On some systems (notably the IBM Blue Gene series), the login nodes do not
have the same architecture or runtime environment as the compute nodes.  In
this case, you must configure darshan-runtime to be built using a cross
compiler.  The following configure arguments show an example for the BG/P system:

----
--host=powerpc-bgp-linux CC=/bgsys/drivers/ppcfloor/comm/default/bin/mpicc 
----

== Environment preparation

Once darshan-runtime has been installed, you must prepare a location
in which to store the Darshan log files and configure an instrumentation method.

=== Log directory

This step can be safely skipped if you configured darshan-runtime using the
`--with-log-path-by-env` option.  A more typical configuration uses a static
directory hierarchy for Darshan log
files.

The `darshan-mk-log-dirs.pl` utility will configure the path specified at
configure time to include
subdirectories organized by year, month, and day in which log files will be
placed. The deepest subdirectories will have sticky permissions to enable
multiple users to write to the same directory.  If the log directory is
shared system-wide across many users then the following script should be run
as root.
 
----
darshan-mk-log-dirs.pl
----

.A note about log directory permissions
[NOTE]
====
All log files written by Darshan have permissions set to only allow
read access by the owner of the file.  You can modify this behavior,
however, by specifying the --enable-group-readable-logs option at
configure time.  One notable deployment scenario would be to configure
Darshan and the log directories to allow all logs to be readable by both the
end user and a Darshan administrators group.  This can be done with the
following steps (a shell sketch follows this note):

* set the --enable-group-readable-logs option at configure time
* create the log directories with darshan-mk-log-dirs.pl
* recursively set the group ownership of the log directories to the Darshan
administrators group
* recursively set the setgid bit on the log directories
====
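
The following shell sketch illustrates these steps.  The log path
(/darshan-logs) and group name (darshan-admin) are hypothetical; substitute
the values used on your system:

----
# create the year/month/day directory tree under the configured log path
darshan-mk-log-dirs.pl
# hand the tree over to the Darshan administrators group
chgrp -R darshan-admin /darshan-logs
# set the setgid bit so new subdirectories inherit the group ownership
find /darshan-logs -type d -exec chmod g+s {} \;
----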


=== Instrumentation method

The instrumentation method to use depends on whether the executables
produced by your MPI compiler are statically or dynamically linked.  If you
are unsure, you can check by running `ldd <executable_name>` on an example
executable.  Dynamically-linked executables will produce a list of shared
libraries when this command is executed.
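
For example (the output shown in comments is illustrative; the exact list of
libraries will vary by system):

----
ldd ./my_app
# a dynamically-linked executable prints its shared libraries, e.g.:
#   libmpi.so.12 => /usr/lib/libmpi.so.12 (0x00002aaaab000000)
# a statically-linked executable instead reports "not a dynamic executable"
----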

Most MPI compilers allow you to toggle dynamic or static linking via options
such as `-dynamic` or `-static`.  Please check your MPI compiler man page
for details if you intend to force one mode or the other.

== Instrumenting statically-linked applications

Statically linked executables must be instrumented at compile time.
The simplest methods to do this are to either generate a customized
MPI compiler script (e.g. `mpicc`) that includes the link options and
libraries needed by Darshan, or to use the profiling configuration
hooks provided by existing MPI compiler scripts.  Once this is done, Darshan
instrumentation is transparent; you simply compile applications using
the darshan-enabled MPI compiler scripts.
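
For example, once a customized wrapper such as the `mpicc.darshan` script
generated later in this document is in place, no changes to the application
source or makefiles are required:

----
mpicc.darshan -o my_app my_app.c
----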

[[static-prof]]
=== Using a profile configuration

The MPICH MPI implementation supports the specification of a profiling library
configuration that can be used to insert Darshan instrumentation without
modifying the existing MPI compiler script.  Example profiling configuration
files are installed with Darshan 2.3.1 and later.  You can enable a profiling
configuration using environment variables or command line arguments to the
compiler scripts:

Example for MPICH 3.1.1 or newer:
----
export MPICC_PROFILE=$DARSHAN_PREFIX/share/mpi-profile/darshan-cc
export MPICXX_PROFILE=$DARSHAN_PREFIX/share/mpi-profile/darshan-cxx
export MPIFORT_PROFILE=$DARSHAN_PREFIX/share/mpi-profile/darshan-f
----

Example for MPICH 3.1 or earlier:
----
export MPICC_PROFILE=$DARSHAN_PREFIX/share/mpi-profile/darshan-cc
export MPICXX_PROFILE=$DARSHAN_PREFIX/share/mpi-profile/darshan-cxx
export MPICF77_PROFILE=$DARSHAN_PREFIX/share/mpi-profile/darshan-f
export MPICF90_PROFILE=$DARSHAN_PREFIX/share/mpi-profile/darshan-f
----

Examples for command line use:
----
mpicc -profile=$DARSHAN_PREFIX/share/mpi-profile/darshan-cc <args>
mpicxx -profile=$DARSHAN_PREFIX/share/mpi-profile/darshan-cxx <args>
mpif77 -profile=$DARSHAN_PREFIX/share/mpi-profile/darshan-f <args>
mpif90 -profile=$DARSHAN_PREFIX/share/mpi-profile/darshan-f <args>
----

[[static-wrapper]]
=== Using customized compiler wrapper scripts

For MPICH-based MPI libraries, such as MPICH1, MPICH2, or MVAPICH,
custom wrapper scripts can be generated to automatically include Darshan
instrumentation.  The following example illustrates how to produce
wrappers for C, C++, and Fortran compilers:

----
darshan-gen-cc.pl `which mpicc` --output mpicc.darshan
darshan-gen-cxx.pl `which mpicxx` --output mpicxx.darshan
darshan-gen-fortran.pl `which mpif77` --output mpif77.darshan
darshan-gen-fortran.pl `which mpif90` --output mpif90.darshan
----

=== Other configurations

Please see the Cray recipe in this document for instructions on
instrumenting statically-linked applications on that platform.

For other MPI libraries, you must manually modify the MPI compiler scripts to
add the necessary link options and libraries.  Please see the
`darshan-gen-*` scripts for examples or contact the Darshan users mailing
list for help.
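
As a starting point for such modifications, and assuming your `mpicc` is a
shell script (as in MPICH-derived implementations), you can generate a
wrapper and compare it against the original to see what Darshan adds:

----
darshan-gen-cc.pl `which mpicc` --output mpicc.darshan
diff `which mpicc` mpicc.darshan
----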

== Instrumenting dynamically-linked applications

For dynamically-linked executables, Darshan relies on the `LD_PRELOAD`
environment variable to insert instrumentation at run time.  The executables
should be compiled using the normal, unmodified MPI compiler.

To use this mechanism, set the `LD_PRELOAD` environment variable to the full
path to the Darshan shared library, as in this example:

----
export LD_PRELOAD=/home/carns/darshan-install/lib/libdarshan.so
----

You can then run your application as usual.  Some environments may require a
special `mpirun` or `mpiexec` command line argument to propagate the
environment variable to all processes.  Other environments may require a
scheduler submission option to control this behavior.  Please check your
local site documentation for details.
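
As an illustration, MPICH's Hydra process manager and Open MPI's `mpirun`
both offer such options (the library path is an example only):

----
# MPICH (Hydra):
mpiexec -n 4 -genv LD_PRELOAD /home/carns/darshan-install/lib/libdarshan.so ./my_app
# Open MPI:
mpirun -n 4 -x LD_PRELOAD=/home/carns/darshan-install/lib/libdarshan.so ./my_app
----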

=== Instrumenting dynamically-linked Fortran applications

Please follow the general steps outlined in the previous section.  For
Fortran applications compiled with MPICH, you may have to take the
additional step of adding `libfmpich.so` to your `LD_PRELOAD` environment
variable. For example:

----
export LD_PRELOAD=/path/to/mpi/used/by/executable/lib/libfmpich.so:/home/carns/darshan-install/lib/libdarshan.so
----

[NOTE]
The full path to the libfmpich.so library can be omitted if the rpath
points to the correct path.  Be careful to check the rpath of the
darshan library and the executable before using this configuration, however.
They may provide conflicting paths.  Ideally the rpath to the MPI library
would *not* be set by the darshan library, but would instead be specified
exclusively by the executable itself.  You can check the rpath of the
darshan library by running `objdump -x
/home/carns/darshan-install/lib/libdarshan.so |grep RPATH`.

== Darshan installation recipes

The following recipes provide examples for prominent HPC systems.
These are intended to be used as a starting point.  You will most likely
have to adjust paths and options to reflect the specifics of your system.

=== IBM Blue Gene (BG/P or BG/Q)

IBM Blue Gene systems produce static executables by default, use a
different architecture for login and compute nodes, and use an MPI
environment based on MPICH.

The following example shows how to configure Darshan on a BG/P system:

----
./configure --with-mem-align=16 \
 --with-log-path=/home/carns/working/darshan/releases/logs \
 --prefix=/home/carns/working/darshan/install --with-jobid-env=COBALT_JOBID \
 --with-zlib=/soft/apps/zlib-1.2.3/ \
 --host=powerpc-bgp-linux CC=/bgsys/drivers/ppcfloor/comm/default/bin/mpicc 
----

.Rationale
[NOTE]
====
The memory alignment is set to 16 not because that is the proper alignment
for the BG/P CPU architecture, but because that is the optimal alignment for
the network transport used between compute nodes and I/O nodes in the
system.  The jobid environment variable is set to `COBALT_JOBID` in this
case for use with the Cobalt scheduler, but other BG/P systems may use
different schedulers.  The `--with-zlib` argument is used to point to a
version of zlib that has been compiled for use on the compute nodes rather
than the login node.  The `--host` argument is used to force cross-compilation
of Darshan.  The `CC` variable is set to point to a stock MPI compiler.
====

Once Darshan has been installed, you can use one of the static
instrumentation methods described earlier in this document.  If you
use the profiling configuration file method, then please note that the
Darshan installation includes profiling configuration files that have been
adapted specifically for the Blue Gene environment.  Set the following
environment variables to enable them, and then use your normal compiler
scripts.  This method is compatible with both GNU and IBM compilers.

Blue Gene profiling configuration example:
----
export MPICC_PROFILE=$DARSHAN_PREFIX/share/mpi-profile/darshan-bg-cc
export MPICXX_PROFILE=$DARSHAN_PREFIX/share/mpi-profile/darshan-bg-cxx
export MPICF77_PROFILE=$DARSHAN_PREFIX/share/mpi-profile/darshan-bg-f
export MPICF90_PROFILE=$DARSHAN_PREFIX/share/mpi-profile/darshan-bg-f
----

=== Cray platforms (XE, XC, or similar)

The Cray programming environment produces static executables by default,
which means that Darshan instrumentation must be inserted at compile
time.  This can be accomplished by loading a software module that sets
appropriate environment variables to modify the Cray compiler script link
behavior.  This section describes how to compile and install Darshan,
as well as how to use a software module to enable and disable Darshan
instrumentation.

==== Building and installing Darshan

Please set your environment to use the GNU programming environment before
configuring or compiling Darshan.  Although Darshan can be built with a
variety of compilers, the GNU compilers are recommended because they will
produce a Darshan library that is interoperable with the widest range
of compilers and linkers.  On most Cray systems you can enable the GNU
programming environment with a command similar to "module swap PrgEnv-pgi
PrgEnv-gnu".  Please see your site documentation for information about
how to switch programming environments.

The following example shows how to configure and build Darshan on a Cray
system using the GNU programming environment.  Adjust the
--with-log-path and --prefix arguments to point to the desired log file path
and installation path, respectively.

----
module swap PrgEnv-pgi PrgEnv-gnu
./configure --with-mem-align=8 \
 --with-log-path=/shared-file-system/darshan-logs \
 --prefix=/soft/darshan-2.2.3 \
 --with-jobid-env=PBS_JOBID --disable-cuserid CC=cc
make install
module swap PrgEnv-gnu PrgEnv-pgi
----

.Rationale
[NOTE]
====
The job ID is set to `PBS_JOBID` for use with a Torque or PBS based scheduler.
The `CC` variable is configured to point to the standard MPI compiler.

The --disable-cuserid argument is used to prevent Darshan from attempting to
use the cuserid() function to retrieve the user name associated with a job.
Darshan automatically falls back to other methods if this function fails,
but on some Cray environments (notably the Beagle XE6 system as of March 2012)
the cuserid() call triggers a segmentation fault.  With this option set,
Darshan will typically use the LOGNAME environment variable to determine a
userid.
====

As in any Darshan installation, the darshan-mk-log-dirs.pl script can then be 
used to create the appropriate directory hierarchy for storing Darshan log 
files in the --with-log-path directory.

Note that Darshan is not currently capable of detecting the stripe size
(and therefore the Darshan FILE_ALIGNMENT value) on Lustre file systems.
If a Lustre file system is detected, then Darshan assumes an optimal
file alignment of 1 MiB.

==== Enabling Darshan instrumentation 

Darshan will automatically install example software module files in the
following locations (depending on how you specified the --prefix option in
the previous section):

----
/soft/darshan-2.2.3/share/craype-1.x/modulefiles/darshan
/soft/darshan-2.2.3/share/craype-2.x/modulefiles/darshan
----

Select the one that is appropriate for your Cray programming environment
(see the version number of the craype module in `module list`).
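
For example (module commands typically write to stderr, hence the
redirection):

----
module list 2>&1 | grep craype
----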

If you are using the Cray Programming Environment version 1.x, then you
must modify the corresponding modulefile before using it.  Please see
the comments at the end of the file and choose an environment variable
method that is appropriate for your system.  If this is not done, then
the compiler may fail to link some applications when the Darshan module
is loaded.

If you are using the Cray Programming Environment version 2.x then you can
likely use the modulefile as is.  Note that it pulls most of its
configuration from the lib/pkgconfig/darshan-runtime.pc file installed with
Darshan.
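
If you want to see what link information the module will pull in, you can
query that pkg-config file directly (the install prefix below is
illustrative):

----
PKG_CONFIG_PATH=/soft/darshan-2.2.3/lib/pkgconfig pkg-config --libs darshan-runtime
----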

The modulefile that you select can be copied to a system location, or the
install location can be added to your local module path with the following
command:

----
module use /soft/darshan-2.2.3/share/craype-<VERSION>/modulefiles
----

From this point, Darshan instrumentation can be enabled for all future
application compilations by running "module load darshan".
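
For example:

----
module load darshan
cc -o my_app my_app.c    # Darshan link options are added automatically
----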

=== Linux clusters using Intel MPI 

Most Intel MPI installations produce dynamic executables by default.  To
configure Darshan in this environment you can use the following example:

----
./configure --with-mem-align=8 --with-log-path=/darshan-logs --with-jobid-env=PBS_JOBID CC=mpicc
----

.Rationale
[NOTE]
====
There is nothing unusual in this configuration except that you should use
the underlying GNU compilers rather than the Intel ICC compilers to compile
Darshan itself.
====

You can use the `LD_PRELOAD` method described earlier in this document to
instrument executables compiled with the Intel MPI compiler scripts.  This
method has been briefly tested using both GNU and Intel compilers.

.Caveat
[NOTE]
====
Darshan is only known to work with C and C++ executables generated by the
Intel MPI suite.  Darshan will not produce instrumentation for Fortran
executables.  For more details please check this Intel forum discussion:

http://software.intel.com/en-us/forums/showthread.php?t=103447&o=a&s=lr
====

=== Linux clusters using MPICH 

Follow the generic instructions provided at the top of this document.  For MPICH versions 3.1 and
later, MPICH uses shared libraries by default, so you may need to consider the dynamic linking
instrumentation approach.  

The static linking method can be used if MPICH is configured to use static
linking by default, or if you are using a version prior to 3.1.
The only modification is to make sure that the `CC` used for compilation is
based on a GNU compiler.  Once Darshan has been installed, it should be
capable of instrumenting executables built with GNU, Intel, and PGI
compilers.
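
To confirm which underlying compiler your MPICH `mpicc` wraps, you can use
its `-show` option, which prints the compilation command without executing
it:

----
mpicc -show
----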

[NOTE]
Darshan is not capable of instrumenting Fortran applications built with MPICH versions 3.1.1, 3.1.2,
or 3.1.3 due to a library symbol name compatibility issue.  Consider using a newer version of
MPICH if you wish to instrument Fortran applications.  Please see
http://trac.mpich.org/projects/mpich/ticket/2209 for more details.

[NOTE]
MPICH versions 3.1, 3.1.1, 3.1.2, and 3.1.3 may produce link-time errors when building static
executables (i.e. using the -static option) if MPICH is built with shared library support.
Please see http://trac.mpich.org/projects/mpich/ticket/2190 for more details.  The workaround if you
wish to use static linking is to configure MPICH with `--enable-shared=no --enable-static=yes` to
force it to use static MPI libraries with correct dependencies.

=== Linux clusters using Open MPI

Follow the generic instructions provided at the top of this document for
compilation, and make sure that the `CC` used for compilation is based on a
GNU compiler.

Open MPI typically produces dynamically linked executables by default, which
means that you should use the `LD_PRELOAD` method to instrument executables
that have been built with Open MPI.  Darshan is only compatible with Open
MPI 1.6.4 and newer.  For more details on why Darshan is not compatible with
older versions of Open MPI, please refer to the following mailing list discussion:

http://www.open-mpi.org/community/lists/devel/2013/01/11907.php

== Runtime environment variables

The Darshan library honors the following environment variables to modify
behavior at runtime:

* DARSHAN_DISABLE: disables Darshan instrumentation
* DARSHAN_INTERNAL_TIMING: enables internal instrumentation that will print the time required to start up and shut down Darshan to stderr at run time.
* DARSHAN_LOGHINTS: specifies the MPI-IO hints to use when storing the Darshan output file.  The format is a semicolon-delimited list of key=value pairs, for example: hint1=value1;hint2=value2
* DARSHAN_MEMALIGN: specifies a value for system memory alignment
* DARSHAN_JOBID: specifies the name of the environment variable to use for the job identifier, such as PBS_JOBID
* DARSHAN_DISABLE_SHARED_REDUCTION: disables the step in Darshan aggregation in which files that were accessed by all ranks are collapsed into a single cumulative file record at rank 0.  This option retains more per-process information at the expense of creating larger log files. Note that it is up to individual instrumentation module implementations whether this environment variable is actually honored.
* DARSHAN_LOGPATH: specifies the path to write Darshan log files to. Note that this directory needs to be formatted using the darshan-mk-log-dirs script.
* DARSHAN_LOGFILE: specifies the path (directory + Darshan log file name) to write the output Darshan log to. This overrides the default Darshan behavior of automatically generating a log file name and adding it to a log file directory formatted using darshan-mk-log-dirs script.
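
As an illustrative example, a job script might combine several of these
variables as follows (all paths and file names here are hypothetical):

----
export LD_PRELOAD=/home/carns/darshan-install/lib/libdarshan.so
export DARSHAN_LOGFILE=/tmp/my_app_run1.darshan.gz
export DARSHAN_DISABLE_SHARED_REDUCTION=1
mpiexec -n 4 ./my_app
----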