Darshan-util installation and usage
===================================

== Introduction

This document describes darshan-util, a collection of tools for
parsing and summarizing log files produced by Darshan instrumentation.
The darshan-util package can be installed and used on any system
regardless of where the logs were originally generated.  Darshan log
files are platform-independent.

More information about Darshan can be found at the 
http://www.mcs.anl.gov/darshan[Darshan web site].

== Requirements

Darshan-util has only been tested in Linux environments, but will likely
work in other Unix-like environments as well.  

.Hard requirements
* C compiler
* zlib development headers and library (zlib-dev or similar)

.Optional requirements
* libbz2 development headers and library (libbz2-dev or similar)
* Perl
* pdflatex
* gnuplot 4.2 or later
* epstopdf

== Compilation and installation

.Configure and build example
----
tar -xvzf darshan-<version-number>.tar.gz
cd darshan-<version-number>/darshan-util
./configure
make
make install
----

.CC variable
[NOTE]
The darshan-util package is intended to be used on a login node or
workstation.  For most use cases this means that you should
either leave CC at its default setting or specify a local compiler.  This is
in contrast to the darshan-runtime documentation, which suggests setting CC
to mpicc because the runtime library will be used in the compute node
environment.

You can specify `--prefix` to install darshan-util in a specific location
(such as in your home directory for non-root installations).  See
`./configure --help` for additional optional arguments, including how to
specify alternative paths for zlib and libbz2 development libraries.
darshan-util also supports VPATH or "out-of-tree" builds if you prefer that
method of compilation.

The `--enable-shared` argument to configure can be used to enable
compilation of a shared version of the darshan-util library.

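For example, a non-root installation into a home directory (installation
path hypothetical) that also builds the shared library might be configured
as follows:

----
./configure --prefix=$HOME/darshan-util --enable-shared
----
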
== Analyzing log files

Each time a darshan-instrumented application is executed, it will generate a
single log file summarizing the I/O activity from that application.  See the
darshan-runtime documentation for more details, but the log file for a given
application will likely be found in a centralized directory, with the path
and log file name in the following format:

----
<YEAR>/<MONTH>/<DAY>/<USERNAME>_<BINARY_NAME>_<JOB_ID>_<DATE>_<UNIQUE_ID>_<TIMING>.darshan.gz
----

This is a binary format file that summarizes I/O activity. As of version
2.0.0 of Darshan, this file is portable and does not have to be analyzed on
the same system that executed the job. 

=== darshan-job-summary.pl

You can generate a graphical summary of the I/O activity for a job by using
the `darshan-job-summary.pl` tool, as in the following example:

----
darshan-job-summary.pl carns_my-app_id114525_7-27-58921_19.darshan.gz
----

This utility requires Perl, pdflatex, epstopdf, and gnuplot in order to
generate its summary.  By default, the output is written to a multi-page
pdf file based on the name of the input file (in this case it would
produce a `carns_my-app_id114525_7-27-58921_19.pdf` output file).
You can also manually specify the name of the output file using the
`--output` argument.

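For example, assuming `--output` takes the destination file name as its
argument:

----
darshan-job-summary.pl --output my-app-summary.pdf carns_my-app_id114525_7-27-58921_19.darshan.gz
----
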
An example of the output produced by darshan-job-summary.pl can be found at
http://www.mcs.anl.gov/research/projects/darshan/files/2012/06/pcarns_mpi-io-test_id3406_6-7-47644-13333843235489639491_1.pdf
.

=== darshan-summary-per-file.sh

This utility is similar to darshan-job-summary.pl, except that it produces a
separate pdf summary for every file accessed by an application.  It can be
executed as follows:

----
darshan-summary-per-file.sh carns_my-app_id114525_7-27-58921_19.darshan.gz output-dir
----

The second argument is the name of a directory (to be created) that will
contain the collection of pdf files.  Note that this utility is probably
not appropriate if your application opens a large number of files.

If you would like to produce a summary for a single
specific file, then you can run the following command to produce a quick
list of the files opened by an application and the amount of time spent
performing I/O to each of them:

----
darshan-parser --file-list carns_my-app_id114525_7-27-58921_19.darshan.gz
----

Once you have identified a specific file of interest, then you can produce a
summary for that specific file with the following commands:

----
darshan-convert --file HASH carns_my-app_id114525_7-27-58921_19.darshan.gz interesting_file.darshan.gz
darshan-job-summary.pl interesting_file.darshan.gz
----

The "HASH" argument is the hash of a file name as listed in the
+darshan-parser --file-list+ output.  The +interesting_file.darshan.gz+ file
produced by darshan-convert is like a normal Darshan log file, but it will
only contain instrumentation for the specified file.

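For instance, with a hypothetical hash value of 17028232952633024488 taken
from the +darshan-parser --file-list+ output, the first command would read:

----
darshan-convert --file 17028232952633024488 carns_my-app_id114525_7-27-58921_19.darshan.gz interesting_file.darshan.gz
----
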
=== darshan-parser

In order to obtain a full, human-readable dump of all information
contained in a log file, you can use the `darshan-parser` command
line utility.  It does not require any additional command line tools.
The following example essentially converts the contents of the log file
into a fully expanded text file:

----
darshan-parser carns_my-app_id114525_7-27-58921_19.darshan.gz > ~/job-characterization.txt
----

The format of this output is described in the following section.

=== Guide to darshan-parser output

The beginning of the output from darshan-parser displays a summary of
overall information about the job. The following table defines the meaning
of each line:

[cols="25%,75%",options="header"]
|====
|output line | description
| "# darshan log version" | internal version number of the Darshan log file
| "# size of file statistics" | uncompressed size of each file record in the binary log file
| "# size of job statistics" |  uncompressed size of the overall job statistics in the binary log file
| "# exe" | name of the executable that generated the log file
| "# uid" | user id that the job ran as
| "# jobid" | job id from the scheduler
| "# start_time" | start time of the job, in seconds since the epoch
| "# start_time_asci" | start time of the job, in human readable format
| "# end_time" | end time of the job, in seconds since the epoch
| "# end_time_asci" | end time of the job, in human readable format
| "# nprocs" | number of MPI processes
| "# run time" | run time of the job in seconds
|====

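As a purely illustrative (hypothetical) example, the beginning of the
output might look like this:

----
# darshan log version: 2.05
# size of file statistics: 1328
# size of job statistics: 1080
# exe: /home/carns/bin/my-app
# uid: 1234
# jobid: 114525
# start_time: 1343673229
# start_time_asci: Mon Jul 30 18:33:49 2012
# end_time: 1343673248
# end_time_asci: Mon Jul 30 18:34:08 2012
# nprocs: 16
# run time: 19
----
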
==== Table of mounted file systems

The next portion of the output shows a table of all general purpose file
systems that were mounted while the job was running. Each line uses the
following format:

----
<device> <mount point> <fs type>
----

The device field is the device ID as reported by the stat() system call.
Note that this device ID may change if the node is rebooted or the file
system is remounted.

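An illustrative (hypothetical) entry for a Lustre scratch file system might
look like:

----
2049 /scratch lustre
----
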
==== Format of I/O characterization fields

The remainder of the output will show characteristics for each file that was
opened by the application. Each line uses the following format:

----
<rank> <file name hash> <counter name> <counter value> <file name suffix> <mount point> <fs type>
----

The `<rank>` column indicates the rank of the process that opened the file. A
rank value of -1 indicates that all processes opened the same file. In that
case, the value of the counter represents an aggregate across all processes. The
`<file name hash>` is a 64 bit hash of the file path/name that was opened. It
is used as a way to uniquely differentiate each file. The `<counter name>` is
the name of the statistic that the line is reporting, while the `<counter
value>` is the value of that statistic. A value of -1 indicates that Darshan
was unable to collect statistics for that particular counter, and the value
should be ignored.  The `<file name suffix>` shows the last 11 characters of
the file name. The `<mount point>` is the mount point of the
file system that this file belongs to. The `<fs type>` is the type of file
system.

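For example, a single (hypothetical) line reporting the number of POSIX
read operations that rank 0 issued against a file on a Lustre scratch file
system might look like:

----
0 14734109647742566553 CP_POSIX_READS 16 .../output.dat /scratch lustre
----
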
==== I/O characterization fields

The following table shows a list of integer statistics that are available
for each file, along with a description of each.
Unless otherwise noted, counters include all variants of the call in
question, such as `read()`, `pread()`, and `readv()` for CP_POSIX_READS.


[cols="40%,60%",options="header"]
|====
| output line | description
| CP_POSIX_READS |  Count of POSIX read operations
| CP_POSIX_WRITES | Count of POSIX write operations
| CP_POSIX_OPENS |  Count of how many times the file was opened
| CP_POSIX_SEEKS |  Count of POSIX seek operations
| CP_POSIX_STATS |  Count of POSIX stat operations
| CP_POSIX_MMAPS |  Count of POSIX mmap operations
| CP_POSIX_FREADS | Count of stream read operations
| CP_POSIX_FWRITES |    Count of stream write operations
| CP_POSIX_FOPENS | Count of stream open operations
| CP_POSIX_FSEEKS | Count of stream seek operations
| CP_POSIX_FSYNCS | Count of fsync operations
| CP_POSIX_FDSYNCS |    Count of fdatasync operations
| CP_INDEP_OPENS |  Count of non-collective MPI opens
| CP_COLL_OPENS |   Count of collective MPI opens
| CP_INDEP_READS |  Count of non-collective MPI reads
| CP_INDEP_WRITES | Count of non-collective MPI writes
| CP_COLL_READS |   Count of collective MPI reads
| CP_COLL_WRITES |  Count of collective MPI writes
| CP_SPLIT_READS |  Count of MPI split collective reads
| CP_SPLIT_WRITES | Count of MPI split collective writes
| CP_NB_READS | Count of MPI non-blocking reads
| CP_NB_WRITES |    Count of MPI non-blocking writes
| CP_SYNCS |    Count of MPI file syncs
| CP_INDEP_NC_OPENS |   Count of independent Parallel NetCDF opens
| CP_COLL_NC_OPENS |    Count of collective Parallel NetCDF opens
| CP_HDF5_OPENS |   Count of HDF5 opens
| CP_COMBINER_* |   Count of each type of MPI datatype (both in memory and in file)
| CP_HINTS |    Count of MPI file hints used
| CP_VIEWS |    Count of MPI file views used
| CP_MODE | Mode that the file was last opened in
| CP_BYTES_READ |   Total number of bytes that were read from the file
| CP_BYTES_WRITTEN |    Total number of bytes written to the file
| CP_MAX_BYTE_READ |    Highest offset in the file that was read
| CP_MAX_BYTE_WRITTEN | Highest offset in the file that was written
| CP_CONSEC_READS | Number of consecutive reads (that were immediately adjacent to the previous access)
| CP_CONSEC_WRITES |    Number of consecutive writes (that were immediately adjacent to the previous access)
| CP_SEQ_READS |    Number of sequential reads (at a higher offset than where the previous access left off)
| CP_SEQ_WRITES |   Number of sequential writes (at a higher offset than where the previous access left off)
| CP_RW_SWITCHES |  Number of times that access toggled between read and write in consecutive operations
| CP_MEM_NOT_ALIGNED |  Number of times that a read or write was not aligned in memory
| CP_MEM_ALIGNMENT |    Memory alignment value (chosen at compile time)
| CP_FILE_NOT_ALIGNED | Number of times that a read or write was not aligned in file
| CP_FILE_ALIGNMENT |   File alignment value.  This value is detected at
runtime on most file systems.  On Lustre, however, Darshan assumes a default
value of 1 MiB for optimal file alignment.
| CP_MAX_READ_TIME_SIZE |   Size of the slowest POSIX read operation
| CP_MAX_WRITE_TIME_SIZE |  Size of the slowest POSIX write operation
| CP_SIZE_READ_* |  Histogram of read access sizes at POSIX level
| CP_SIZE_READ_AGG_* |  Histogram of total size of read accesses at MPI level, even if access is noncontiguous
| CP_EXTENT_READ_* |    Histogram of read extents
| CP_SIZE_WRITE_* | Histogram of write access sizes at POSIX level
| CP_SIZE_WRITE_AGG_* | Histogram of total size of write accesses at MPI level, even if access is noncontiguous
| CP_EXTENT_WRITE_* |   Histogram of write extents
| CP_STRIDE[1-4]_STRIDE |   Size of 4 most common stride patterns
| CP_STRIDE[1-4]_COUNT |    Count of 4 most common stride patterns
| CP_ACCESS[1-4]_ACCESS |   4 most common access sizes
| CP_ACCESS[1-4]_COUNT |    Count of 4 most common access sizes
| CP_DEVICE |   File system identifier; correlates with the mount table shown earlier.  In Darshan 2.2.5 and earlier, this is the device ID reported by stat(); in Darshan 2.2.6 and later, it is an opaque identifier generated by Darshan.
| CP_SIZE_AT_OPEN | Size of file at first open time
| CP_FASTEST_RANK | The MPI rank of the process with the smallest time spent in I/O
| CP_FASTEST_RANK_BYTES |   The number of bytes transferred by the process with the smallest time spent in I/O
| CP_SLOWEST_RANK | The MPI rank of the process with the largest time spent in I/O
| CP_SLOWEST_RANK_BYTES |   The number of bytes transferred by the process with the largest time spent in I/O
|====

The following is a list of floating point statistics that are available for
each file:

[cols="40%,60%",options="header"]
|====
| output line |  description
| CP_F_OPEN_TIMESTAMP | Timestamp of first time that the file was opened
| CP_F_CLOSE_TIMESTAMP |    Timestamp of the last time that the file was closed
| CP_F_READ_START_TIMESTAMP |   Timestamp that the first read operation began
| CP_F_READ_END_TIMESTAMP | Timestamp that the last read operation ended
| CP_F_WRITE_START_TIMESTAMP |  Timestamp that the first write operation began
| CP_F_WRITE_END_TIMESTAMP |    Timestamp that the last write operation ended
| CP_F_POSIX_READ_TIME |    Cumulative time spent reading at the POSIX level
| CP_F_POSIX_WRITE_TIME |   Cumulative time spent in write, fsync, and fdatasync at the POSIX level
| CP_F_POSIX_META_TIME |    Cumulative time spent in open, close, stat, and seek at the POSIX level
| CP_F_MPI_META_TIME |  Cumulative time spent in open and close at the MPI-IO level
| CP_F_MPI_READ_TIME |  Cumulative time spent reading at the MPI-IO level
| CP_F_MPI_WRITE_TIME | Cumulative time spent in write and sync at the MPI-IO level
| CP_F_MAX_READ_TIME |  Duration of the slowest individual POSIX read operation
| CP_F_MAX_WRITE_TIME | Duration of the slowest individual POSIX write operation
| CP_F_FASTEST_RANK_TIME |  The I/O time of the rank that spent the smallest amount of time in I/O.  If the file was accessed using MPI-IO, this combines the MPI meta, read, and write time.  If the file was not accessed with MPI-IO, it combines the POSIX meta, read, and write time.
| CP_F_SLOWEST_RANK_TIME |  The I/O time of the rank that spent the largest amount of time in I/O
| CP_F_VARIANCE_RANK_TIME | The population variance for I/O time of all the ranks
| CP_F_VARIANCE_RANK_BYTES |    The population variance for bytes transferred of all the ranks
|====

==== Additional summary output

===== Performance

Use the `--perf` option to get performance approximations using four
different computations.

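For example:

----
darshan-parser --perf carns_my-app_id114525_7-27-58921_19.darshan.gz
----
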
.Example output
----
# performance
# -----------
# total_bytes: 134217728
#
# I/O timing for unique files (seconds):
# ...........................
# unique files: slowest_rank_io_time: 0.000000
# unique files: slowest_rank_meta_time: 0.000000
# unique files: slowest_rank: 0
#
# I/O timing for shared files (seconds):
# (multiple estimates shown; time_by_slowest is generally the most accurate)
# ...........................
# shared files: time_by_cumul_io_only: 0.042264
# shared files: time_by_cumul_meta_only: 0.000325
# shared files: time_by_open: 0.064986
# shared files: time_by_open_lastio: 0.064966
# shared files: time_by_slowest: 0.057998
#
# Aggregate performance, including both shared and unique files (MiB/s):
# (multiple estimates shown; agg_perf_by_slowest is generally the most
# accurate)
# ...........................
# agg_perf_by_cumul: 3028.570529
# agg_perf_by_open: 1969.648064
# agg_perf_by_open_lastio: 1970.255248
# agg_perf_by_slowest: 2206.983935
----

===== Files
Use the `--file` option to get totals based on file usage.
The first column is the count of files for that type, the second column is
the number of bytes for that type, and the third column is the maximum
offset accessed.

* total: All files
* read_only: Files that were only read from
* write_only: Files that were only written to
* read_write: Files that were both read from and written to
* unique: Files that were opened on only one rank
* shared: Files that were opened by more than one rank

.Example output
----
# files
# -----
# total: 1542 236572244952 154157611
# read_only: 3 133998651 122805519
# write_only: 1539 236438246301 154157611
# read_write: 0 0 0
# unique: 2 11193132 11193063
# shared: 1540 236561051820 154157611
----

===== Totals

Use the `--total` option to get all statistics as an aggregate total.
Statistics that are meaningful to aggregate are summed.  Others are
reported as a minimum or maximum where that makes sense, and data that
cannot be meaningfully aggregated is zeroed.

.Example output
----
total_CP_INDEP_OPENS: 0
total_CP_COLL_OPENS: 196608
total_CP_INDEP_READS: 0
total_CP_INDEP_WRITES: 0
total_CP_COLL_READS: 0
total_CP_COLL_WRITES: 0
total_CP_SPLIT_READS: 0
total_CP_SPLIT_WRITES: 1179648
total_CP_NB_READS: 0
total_CP_NB_WRITES: 0
total_CP_SYNCS: 0
total_CP_POSIX_READS: 983045
total_CP_POSIX_WRITES: 33795
total_CP_POSIX_OPENS: 230918
...
----

===== File list

Use the `--file-list` option to produce a list of files opened by the
application along with estimates of the amount of time spent accessing each
file.

.Example output
----
# Per-file summary of I/O activity.
# <hash>: hash of file name
# <suffix>: last 15 characters of file name
# <type>: MPI or POSIX
# <nprocs>: number of processes that opened the file
# <slowest>: (estimated) time in seconds consumed in IO by slowest process
# <avg>: average time in seconds consumed in IO per process

# <hash>    <suffix>    <type>  <nprocs>    <slowest>   <avg>
17028232952633024488    amples/boom.dat MPI 2   0.000363    0.012262
----

===== Detailed file list

The `--file-list-detailed` option is the same as `--file-list` except that
it produces many columns of output containing statistics broken down by
file.  This option is mainly useful for automated analysis.

=== Other command line utilities

The darshan-util package includes a number of other utilities that can be
summarized briefly as follows:

* darshan-convert: converts an existing log file to the newest log format.
If the output file has a .bz2 extension, then it will be re-compressed in
bz2 format rather than gz format.  It also has command line options for
anonymizing personal data, adding metadata annotation to the log header, and
restricting the output to a specific instrumented file.
* darshan-diff: compares two darshan log files and shows counters that
differ.
* darshan-analyzer: walks an entire directory tree of Darshan log files and
produces a summary of the types of access methods used in those log files.
* darshan-logutils*: this is a library rather than an executable, but it
provides a C interface for opening and parsing Darshan log files.  This is
the recommended method for writing custom utilities, as darshan-logutils
provides a relatively stable interface across different versions of Darshan
and different log formats.  A minimal usage sketch is shown below.
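
As a rough sketch of how this interface can be used (this assumes the
darshan-logutils API distributed with Darshan 2.x; consult the
darshan-logutils.h header in your installation for the exact types and
signatures), the following program opens a log, reads the job header, and
prints one counter from each file record:

----
/* Illustrative sketch only: assumes the Darshan 2.x darshan-logutils
 * interface.  Build roughly as:
 *   gcc example.c -I <prefix>/include -L <prefix>/lib -ldarshan-util -lz
 */
#include <stdio.h>
#include <inttypes.h>
#include "darshan-logutils.h"

int main(int argc, char **argv)
{
    darshan_fd fd;
    struct darshan_job job;
    struct darshan_file file;
    int ret;

    if(argc != 2)
    {
        fprintf(stderr, "Usage: %s <logfile>\n", argv[0]);
        return(1);
    }

    /* open the (possibly compressed) log file */
    fd = darshan_log_open(argv[1], "r");
    if(!fd)
        return(1);

    /* read the overall job statistics (uid, nprocs, start time, etc.) */
    if(darshan_log_getjob(fd, &job) < 0)
    {
        darshan_log_close(fd);
        return(1);
    }
    printf("# nprocs: %" PRId64 "\n", job.nprocs);

    /* iterate over every per-file record in the log; getfile returns 1
     * for each record read and 0 at end of file */
    while((ret = darshan_log_getfile(fd, &job, &file)) == 1)
    {
        printf("%" PRId64 " %" PRIu64 " CP_BYTES_READ %" PRId64 "\n",
               file.rank, file.hash, file.counters[CP_BYTES_READ]);
    }

    darshan_log_close(fd);
    return(ret < 0 ? 1 : 0);
}
----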