Darshan-util installation and usage
===================================

== Introduction

This document describes darshan-util, a collection of tools for
parsing and summarizing log files produced by Darshan instrumentation.
The darshan-util package can be installed and used on any system
regardless of where the logs were originally generated.  Darshan log
files are platform-independent.

More information about Darshan can be found at the 
http://www.mcs.anl.gov/darshan[Darshan web site].

== Requirements

Darshan-util has only been tested in Linux environments, but will likely
work in other Unix-like environments as well.  

.Hard requirements
* C compiler
* zlib development headers and library (zlib-dev or similar)

.Optional requirements
* libbz2 development headers and library (libbz2-dev or similar)
* Perl
* pdflatex
* gnuplot 4.2 or later
* epstopdf

== Compilation and installation

.Configure and build example
----
tar -xvzf darshan-<version-number>.tar.gz
cd darshan-<version-number>/darshan-util
./configure
make
make install
----

The darshan-util package is intended to be used on a login node or
workstation.  For most use cases this means that you should
either leave CC at its default setting or specify a local compiler.  This is
in contrast to the darshan-runtime documentation, which suggests setting CC
to mpicc because the runtime library will be used in the compute node
environment.

You can specify `--prefix` to install darshan-util in a specific location
(such as in your home directory for non-root installations).  See
`./configure --help` for additional optional arguments, including how to
specify alternative paths for zlib and libbz2 development libraries.
darshan-util also supports VPATH or "out-of-tree" builds if you prefer that
method of compilation.

The `--enable-shared` argument to configure can be used to enable
compilation of a shared version of the darshan-util library.
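
As a concrete sketch, a non-root build that installs under your home
directory and also builds the shared library (the install path is
illustrative; consult `./configure --help` for the exact option names for
alternative zlib or libbz2 locations):

----
./configure --prefix=$HOME/darshan-util --enable-shared
make
make install
----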

== Analyzing log files

Each time a darshan-instrumented application is executed, it will generate a
single log file summarizing the I/O activity from that application.  See the
darshan-runtime documentation for more details, but the log file for a given
application will likely be found in a centralized directory, with the path
and log file name in the following format:

----
<YEAR>/<MONTH>/<DAY>/<USERNAME>_<BINARY_NAME>_<JOB_ID>_<DATE>_<UNIQUE_ID>_<TIMING>.darshan
----

This is a binary format file that summarizes I/O activity. As of version
2.0.0 of Darshan, this file is portable and does not have to be analyzed on
the same system that executed the job. Also, note that Darshan logs generated
with Darshan versions preceding version 3.0 will have the extension `darshan.gz`
(or `darshan.bz2` if compressed using bzip2 format). These logs are not compatible
with Darshan 3.0 utilities, and thus must be analyzed using an appropriate version
(2.x) of the darshan-util package.

=== darshan-job-summary.pl

You can generate a graphical summary of the I/O activity for a job by using
the `darshan-job-summary.pl` tool, as in the following example:

----
darshan-job-summary.pl carns_my-app_id114525_7-27-58921_19.darshan.gz
----

This utility requires Perl, pdflatex, epstopdf, and gnuplot in order to
generate its summary.  By default, the output is written to a multi-page
pdf file based on the name of the input file (in this case it would
produce a `carns_my-app_id114525_7-27-58921_19.pdf` output file).
You can also manually specify the name of the output file using the
`--output` argument.
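
For example, a sketch of an invocation that directs the summary to a chosen
file name (the output path is illustrative):

----
darshan-job-summary.pl --output ~/my-app-summary.pdf \
    carns_my-app_id114525_7-27-58921_19.darshan.gz
----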

An example of the output produced by darshan-job-summary.pl can be found
link:http://www.mcs.anl.gov/research/projects/darshan/docs/ssnyder_ior-hdf5_id3655016_9-23-29011-12333993518351519212_1.darshan.pdf[HERE].

*NOTE*: The darshan-job-summary tool depends on a few LaTeX packages that may not
be available by default on all systems, including: lastpage, subfigure, and
threeparttable. These packages can be found and installed using your system's package
manager. For instance, the packages can be installed on Debian or Ubuntu systems as
follows: `apt-get install texlive-latex-extra`

=== darshan-summary-per-file.sh

This utility is similar to darshan-job-summary.pl, except that it produces a
separate pdf summary for every file accessed by an application.  It can be
executed as follows:

----
darshan-summary-per-file.sh carns_my-app_id114525_7-27-58921_19.darshan.gz output-dir
----

The second argument is the name of a directory (to be created) that will
contain the collection of pdf files.  Note that this utility probably
is not appropriate if your application opens a large number of files.

If you would like to produce a summary for a single
specific file, then you can run the following command to produce a quick
list of the files opened by an application and the amount of time spent
performing I/O to each of them:

----
darshan-parser --file-list carns_my-app_id114525_7-27-58921_19.darshan.gz
----

Once you have identified a specific file of interest, then you can produce a
summary for that specific file with the following commands:

----
darshan-convert --file HASH carns_my-app_id114525_7-27-58921_19.darshan.gz interesting_file.darshan.gz
darshan-job-summary.pl interesting_file.darshan.gz
----

The "HASH" argument is the hash of a file name as listed in the
+darshan-parser --file-list+ output.  The +interesting_file.darshan.gz+ file
produced by darshan-convert is like a normal Darshan log file, but it will only contain instrumentation for the
specified file.
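
For instance, a quick way to find the hash for a particular file is to search
the file list for its name (`ior.dat` here is just an illustrative name):

----
darshan-parser --file-list carns_my-app_id114525_7-27-58921_19.darshan.gz \
    | grep ior.dat
----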

=== darshan-parser

You can use the `darshan-parser` command line utility to obtain a
complete, human-readable, text-format dump of all information contained
in a log file.  The following example converts the contents of the
log file into a fully expanded text file:

----
darshan-parser carns_my-app_id114525_7-27-58921_19.darshan.gz > ~/job-characterization.txt
----

The format of this output is described in the following section.

=== Guide to darshan-parser output

The beginning of the output from darshan-parser displays a summary of
overall information about the job. Additional job-level summary information
can also be produced using the `--perf`, `--file`, `--file-list`,
`--file-list-detailed`, or `--total` command line options.  See the
<<addsummary,Additional summary output>> section for more information about
those options.

The following table defines the meaning
of each line in the default header section of the output:

[cols="25%,75%",options="header"]
|====
|output line | description
| "# darshan log version" | internal version number of the Darshan log file
| "# exe" | name of the executable that generated the log file
| "# uid" | user id that the job ran as
| "# jobid" | job id from the scheduler
| "# start_time" | start time of the job, in seconds since the epoch
| "# start_time_asci" | start time of the job, in human readable format
| "# end_time" | end time of the job, in seconds since the epoch
| "# end_time_asci" | end time of the job, in human readable format
| "# nprocs" | number of MPI processes
| "# run time" | run time of the job in seconds
|====

==== Log file region sizes

The next portion of the parser output displays the size of each region
contained within the given log file. Each log file will contain the
following regions:

* header - constant-sized uncompressed header providing data on how to properly access the log
* job data - job-level metadata (e.g., start/end time and exe name) for the log
* record table - a table mapping Darshan record identifiers to full file name paths
* module data - each module (e.g., POSIX, MPI-IO, etc.) stores its I/O characterization data in distinct regions of the log

All regions of the log file are compressed (in libz or bzip2 format), except the header.

==== Table of mounted file systems

The next portion of the output shows a table of all general purpose file
systems that were mounted while the job was running. Each line uses the
following format:

----
<mount point> <fs type>
----
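
For example (illustrative entries):

----
/scratch lustre
/home nfs
----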

==== Format of I/O characterization fields

The remainder of the output will show characteristics for each file that was
opened by the application. Each line uses the following format:

----
<module> <rank> <record id> <counter name> <counter value> <file name> <mount point> <fs type>
----

The `<module>` column specifies the module responsible for recording this piece
of I/O characterization data. The `<rank>` column indicates the rank of the process
that opened the file. A rank value of -1 indicates that all processes opened the
same file. In that case, the value of the counter represents an aggregate across all
processes. The `<record id>` is a 64 bit hash of the file path/name that was opened.
It is used as a way to uniquely differentiate each file. The `<counter name>` is
the name of the statistic that the line is reporting, while the `<counter value>` is
the value of that statistic. A value of -1 indicates that Darshan was unable to
collect statistics for that particular counter, and the value should be ignored.
The `<file name>` field shows the complete file name the record corresponds to. The
`<mount point>` is the mount point of the file system that this file belongs to and
`<fs type>` is the type of that file system.

==== I/O characterization fields

The following tables show a list of integer statistics that are available for each of
Darshan's current instrumentation modules, along with a description of each. Unless
otherwise noted, counters include all variants of the call in question, such as
`read()`, `pread()`, and `readv()` for POSIX_READS.

.POSIX module
[cols="40%,60%",options="header"]
|====
| counter name | description
| POSIX_OPENS | Count of how many times the file was opened (INCLUDING `fileno` and `dup` operations)
| POSIX_FILENOS | Count of POSIX fileno operations
| POSIX_DUPS | Count of POSIX dup operations
| POSIX_READS | Count of POSIX read operations
| POSIX_WRITES | Count of POSIX write operations
| POSIX_SEEKS | Count of POSIX seek operations
| POSIX_STATS | Count of POSIX stat operations
| POSIX_MMAPS | Count of POSIX mmap operations
| POSIX_FSYNCS | Count of POSIX fsync operations
| POSIX_FDSYNCS | Count of POSIX fdatasync operations
| POSIX_RENAME_SOURCES | Number of times this file was the source of a rename operation
| POSIX_RENAME_TARGETS | Number of times this file was the target of a rename operation
| POSIX_RENAMED_FROM | If this file was a rename target, the Darshan record ID of the first rename source
| POSIX_MODE | Mode that the file was last opened in
| POSIX_BYTES_READ | Total number of bytes that were read from the file
| POSIX_BYTES_WRITTEN | Total number of bytes written to the file
| POSIX_MAX_BYTE_READ | Highest offset in the file that was read
| POSIX_MAX_BYTE_WRITTEN | Highest offset in the file that was written
| POSIX_CONSEC_READS | Number of consecutive reads (that were immediately adjacent to the previous access)
| POSIX_CONSEC_WRITES | Number of consecutive writes (that were immediately adjacent to the previous access)
| POSIX_SEQ_READS | Number of sequential reads (at a higher offset than where the previous access left off)
| POSIX_SEQ_WRITES | Number of sequential writes (at a higher offset than where the previous access left off)
| POSIX_RW_SWITCHES | Number of times that access toggled between read and write in consecutive operations
| POSIX_MEM_NOT_ALIGNED | Number of times that a read or write was not aligned in memory
| POSIX_MEM_ALIGNMENT | Memory alignment value (chosen at compile time)
| POSIX_FILE_NOT_ALIGNED | Number of times that a read or write was not aligned in file
| POSIX_FILE_ALIGNMENT | File alignment value.  This value is detected at
runtime on most file systems.  On Lustre, however, Darshan assumes a default
value of 1 MiB for optimal file alignment.
| POSIX_MAX_READ_TIME_SIZE | Size of the slowest POSIX read operation
| POSIX_MAX_WRITE_TIME_SIZE | Size of the slowest POSIX write operation
| POSIX_SIZE_READ_* | Histogram of read access sizes at POSIX level
| POSIX_SIZE_WRITE_* | Histogram of write access sizes at POSIX level
| POSIX_STRIDE[1-4]_STRIDE | Size of 4 most common stride patterns
| POSIX_STRIDE[1-4]_COUNT | Count of 4 most common stride patterns
| POSIX_ACCESS[1-4]_ACCESS | 4 most common POSIX access sizes
| POSIX_ACCESS[1-4]_COUNT | Count of 4 most common POSIX access sizes
| POSIX_FASTEST_RANK | The MPI rank with smallest time spent in POSIX I/O
| POSIX_FASTEST_RANK_BYTES | The number of bytes transferred by the rank with smallest time spent in POSIX I/O
| POSIX_SLOWEST_RANK | The MPI rank with largest time spent in POSIX I/O
| POSIX_SLOWEST_RANK_BYTES | The number of bytes transferred by the rank with the largest time spent in POSIX I/O
| POSIX_F_*_START_TIMESTAMP | Timestamp that the first POSIX file open/read/write/close operation began
| POSIX_F_*_END_TIMESTAMP | Timestamp that the last POSIX file open/read/write/close operation ended
| POSIX_F_READ_TIME | Cumulative time spent reading at the POSIX level
| POSIX_F_WRITE_TIME | Cumulative time spent in write, fsync, and fdatasync at the POSIX level
| POSIX_F_META_TIME | Cumulative time spent in open, close, stat, and seek at the POSIX level
| POSIX_F_MAX_READ_TIME | Duration of the slowest individual POSIX read operation
| POSIX_F_MAX_WRITE_TIME | Duration of the slowest individual POSIX write operation
| POSIX_F_FASTEST_RANK_TIME | The time of the rank which had the smallest amount of time spent in POSIX I/O (cumulative read, write, and meta times)
| POSIX_F_SLOWEST_RANK_TIME | The time of the rank which had the largest amount of time spent in POSIX I/O
| POSIX_F_VARIANCE_RANK_TIME | The population variance for POSIX I/O time of all the ranks
| POSIX_F_VARIANCE_RANK_BYTES | The population variance for bytes transferred of all the ranks
|====

.MPI-IO module
[cols="40%,60%",options="header"]
|====
| counter name | description
| MPIIO_INDEP_OPENS | Count of non-collective MPI opens
| MPIIO_COLL_OPENS | Count of collective MPI opens
| MPIIO_INDEP_READS | Count of non-collective MPI reads
| MPIIO_INDEP_WRITES | Count of non-collective MPI writes
| MPIIO_COLL_READS | Count of collective MPI reads
| MPIIO_COLL_WRITES | Count of collective MPI writes
| MPIIO_SPLIT_READS | Count of MPI split collective reads
| MPIIO_SPLIT_WRITES | Count of MPI split collective writes
| MPIIO_NB_READS | Count of MPI non-blocking reads
| MPIIO_NB_WRITES | Count of MPI non-blocking writes
| MPIIO_SYNCS | Count of MPI file syncs
| MPIIO_HINTS | Count of MPI file hints used
| MPIIO_VIEWS | Count of MPI file views used
| MPIIO_MODE | MPI mode that the file was last opened in
| MPIIO_BYTES_READ | Total number of bytes that were read from the file at MPI level
| MPIIO_BYTES_WRITTEN | Total number of bytes written to the file at MPI level
| MPIIO_RW_SWITCHES | Number of times that access toggled between read and write in consecutive MPI operations
| MPIIO_MAX_READ_TIME_SIZE | Size of the slowest MPI read operation
| MPIIO_MAX_WRITE_TIME_SIZE | Size of the slowest MPI write operation
| MPIIO_SIZE_READ_AGG_* | Histogram of total size of read accesses at MPI level, even if access is noncontiguous
| MPIIO_SIZE_WRITE_AGG_* | Histogram of total size of write accesses at MPI level, even if access is noncontiguous
| MPIIO_ACCESS[1-4]_ACCESS | 4 most common MPI aggregate access sizes
| MPIIO_ACCESS[1-4]_COUNT | Count of 4 most common MPI aggregate access sizes
| MPIIO_FASTEST_RANK | The MPI rank with smallest time spent in MPI I/O
| MPIIO_FASTEST_RANK_BYTES | The number of bytes transferred by the rank with smallest time spent in MPI I/O
| MPIIO_SLOWEST_RANK | The MPI rank with largest time spent in MPI I/O
| MPIIO_SLOWEST_RANK_BYTES | The number of bytes transferred by the rank with the largest time spent in MPI I/O
| MPIIO_F_OPEN_TIMESTAMP | Timestamp of first time that the file was opened at MPI level
| MPIIO_F_READ_START_TIMESTAMP | Timestamp that the first MPI read operation began
| MPIIO_F_WRITE_START_TIMESTAMP | Timestamp that the first MPI write operation began
| MPIIO_F_READ_END_TIMESTAMP | Timestamp that the last MPI read operation ended
| MPIIO_F_WRITE_END_TIMESTAMP | Timestamp that the last MPI write operation ended
| MPIIO_F_CLOSE_TIMESTAMP | Timestamp of the last time that the file was closed at MPI level
| MPIIO_F_READ_TIME | Cumulative time spent reading at MPI level
| MPIIO_F_WRITE_TIME | Cumulative time spent in write and sync operations at MPI level
| MPIIO_F_META_TIME | Cumulative time spent in open and close operations at MPI level
| MPIIO_F_MAX_READ_TIME | Duration of the slowest individual MPI read operation
| MPIIO_F_MAX_WRITE_TIME | Duration of the slowest individual MPI write operation
| MPIIO_F_FASTEST_RANK_TIME | The time of the rank which had the smallest amount of time spent in MPI I/O (cumulative read, write, and meta times)
| MPIIO_F_SLOWEST_RANK_TIME | The time of the rank which had the largest amount of time spent in MPI I/O
| MPIIO_F_VARIANCE_RANK_TIME | The population variance for MPI I/O time of all the ranks
| MPIIO_F_VARIANCE_RANK_BYTES | The population variance for bytes transferred of all the ranks at MPI level
|====


.STDIO module
[cols="40%,60%",options="header"]
|====
| counter name | description
| STDIO_OPENS | Count of stdio file open operations (INCLUDING `fdopen` operations)
| STDIO_FDOPENS | Count of stdio fdopen operations
| STDIO_READS | Count of stdio read operations
| STDIO_WRITES | Count of stdio write operations
| STDIO_SEEKS | Count of stdio seek operations
| STDIO_FLUSHES | Count of stdio flush operations
| STDIO_BYTES_WRITTEN | Total number of bytes written to the file using stdio operations
| STDIO_BYTES_READ | Total number of bytes read from the file using stdio operations
| STDIO_MAX_BYTE_READ | Highest offset in the file that was read
| STDIO_MAX_BYTE_WRITTEN | Highest offset in the file that was written
| STDIO_FASTEST_RANK | The MPI rank with the smallest time spent in stdio operations
| STDIO_FASTEST_RANK_BYTES | The number of bytes transferred by the rank with the smallest time spent in stdio operations
| STDIO_SLOWEST_RANK | The MPI rank with the largest time spent in stdio operations
| STDIO_SLOWEST_RANK_BYTES | The number of bytes transferred by the rank with the largest time spent in stdio operations
| STDIO_F_META_TIME | Cumulative time spent in stdio open/close/seek operations
| STDIO_F_WRITE_TIME | Cumulative time spent in stdio write operations
| STDIO_F_READ_TIME | Cumulative time spent in stdio read operations
| STDIO_F_*_START_TIMESTAMP | Timestamp that the first stdio file open/read/write/close operation began
| STDIO_F_*_END_TIMESTAMP | Timestamp that the last stdio file open/read/write/close operation ended
| STDIO_F_FASTEST_RANK_TIME | The time of the rank which had the smallest time spent in stdio I/O (cumulative read, write, and meta times)
| STDIO_F_SLOWEST_RANK_TIME | The time of the rank which had the largest time spent in stdio I/O
| STDIO_F_VARIANCE_RANK_TIME | The population variance for stdio I/O time of all the ranks
| STDIO_F_VARIANCE_RANK_BYTES | The population variance for bytes transferred of all the ranks
|====

.HDF5 module
[cols="40%,60%",options="header"]
|====
| counter name | description
| HDF5_OPENS | Count of HDF5 opens
| HDF5_F_OPEN_TIMESTAMP | Timestamp of first time that the file was opened at HDF5 level
| HDF5_F_CLOSE_TIMESTAMP | Timestamp of the last time that the file was closed at HDF5 level
|====

.PnetCDF module
[cols="40%,60%",options="header"]
|====
| counter name | description
| PNETCDF_INDEP_OPENS | Count of PnetCDF independent opens
| PNETCDF_COLL_OPENS | Count of PnetCDF collective opens
| PNETCDF_F_OPEN_TIMESTAMP | Timestamp of first time that the file was opened at PnetCDF level
| PNETCDF_F_CLOSE_TIMESTAMP | Timestamp of the last time that the file was closed at PnetCDF level
|====

===== Additional modules 

.BG/Q module (if enabled on BG/Q systems)
[cols="40%,60%",options="header"]
|====
| counter name | description
| BGQ_CSJOBID | Control system job ID
| BGQ_NNODES | Total number of BG/Q compute nodes
| BGQ_RANKSPERNODE | Number of MPI ranks per compute node
| BGQ_DDRPERNODE | Size of compute node DDR in MiB
| BGQ_INODES | Total number of BG/Q I/O nodes
| BGQ_ANODES | Dimension of A torus
| BGQ_BNODES | Dimension of B torus
| BGQ_CNODES | Dimension of C torus
| BGQ_DNODES | Dimension of D torus
| BGQ_ENODES | Dimension of E torus
| BGQ_TORUSENABLED | Bitfield indicating enabled torus dimensions
| BGQ_F_TIMESTAMP | Timestamp of when BG/Q data was collected
|====


.Lustre module (if enabled, for Lustre file systems)
[cols="40%,60%",options="header"]
|====
| counter name | description
| LUSTRE_OSTS | number of OSTs (object storage targets) for the file system
| LUSTRE_MDTS | number of MDTs (metadata targets) for the file system
| LUSTRE_STRIPE_OFFSET | OST id offset specified at file creation time
| LUSTRE_STRIPE_SIZE | stripe size for the file in bytes
| LUSTRE_STRIPE_WIDTH | number of OSTs over which the file is striped
| LUSTRE_OST_ID_* | indices of OSTs over which the file is striped
|====

[[addsummary]]
==== Additional summary output

The following sections describe additional parser options that provide
summary I/O characterization data for the given log.

*NOTE*: These options are currently only supported by the POSIX, MPI-IO, and stdio modules.

===== Performance

Job performance information can be generated using the `--perf` command-line option.
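
For instance, reusing the log file name from the earlier examples:

----
darshan-parser --perf carns_my-app_id114525_7-27-58921_19.darshan.gz
----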

.Example output
----
# performance
# -----------
# total_bytes: 134217728
#
# I/O timing for unique files (seconds):
# ...........................
# unique files: slowest_rank_io_time: 0.000000
# unique files: slowest_rank_meta_only_time: 0.000000
# unique files: slowest_rank: 0
#
# I/O timing for shared files (seconds):
# (multiple estimates shown; time_by_slowest is generally the most accurate)
# ...........................
# shared files: time_by_cumul_io_only: 0.042264
# shared files: time_by_cumul_meta_only: 0.000325
# shared files: time_by_open: 0.064986
# shared files: time_by_open_lastio: 0.064966
# shared files: time_by_slowest: 0.057998
#
# Aggregate performance, including both shared and unique files (MiB/s):
# (multiple estimates shown; agg_perf_by_slowest is generally the most
# accurate)
# ...........................
# agg_perf_by_cumul: 3028.570529
# agg_perf_by_open: 1969.648064
# agg_perf_by_open_lastio: 1970.255248
# agg_perf_by_slowest: 2206.983935
----

The `total_bytes` line shows the total number of bytes transferred
(read/written) by the job.  That is followed by three sections:

.I/O timing for unique files

This section reports information about any files that were *not* opened
by every rank in the job.  This includes independent files (opened by
1 process) and partially shared files (opened by a proper subset of
the job's processes). The I/O time for this category of file access
is reported based on the *slowest* rank of all processes that performed this
type of file access.

* unique files: slowest_rank_io_time: total I/O time for unique files
  (including both metadata + data transfer time)
* unique files: slowest_rank_meta_only_time: metadata time for unique files
* unique files: slowest_rank: the rank of the slowest process

.I/O timing for shared files

This section reports information about files that were globally shared (i.e.
opened by every rank in the job).  This section estimates performance for
globally shared files using four different methods.  The `time_by_slowest`
estimate is generally the most accurate, but it may not be available in some
older Darshan log files.

* shared files: time_by_cumul_*: adds the cumulative time across all
  processes and divides by the number of processes (inaccurate when there is
  high variance among processes).
** shared files: time_by_cumul_io_only: includes metadata AND data transfer
   time for globally shared files
** shared files: time_by_cumul_meta_only: metadata time for globally shared
   files
* shared files: time_by_open: difference between timestamp of open and
  close (inaccurate if file is left open without I/O activity)
* shared files: time_by_open_lastio: difference between timestamp of open
  and the timestamp of last I/O (similar to above but fixes case where file is
  left open after I/O is complete)
* shared files: time_by_slowest: measures time according to which rank was
  the slowest to perform both metadata operations and data transfer for each
  shared file. (most accurate but requires newer log version)

.Aggregate performance

Aggregate performance is calculated by dividing the total bytes transferred
by the I/O time (shared and unique files combined) computed using each of the
four methods described in the previous section.  Note that total bytes are
reported in bytes, while aggregate performance is reported in MiB/s
(1 MiB/s = 1024*1024 bytes per second).

===== Files
Use the `--file` option to get totals based on file usage.
Each line of output has three columns: the count of files in that category,
the total number of bytes accessed for that category, and the maximum offset
accessed.

* total: All files
* read_only: Files that were only read from
* write_only: Files that were only written to
* read_write: Files that were both read and written
* unique: Files that were opened on only one rank
* shared: Files that were opened by more than one rank
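
For instance:

----
darshan-parser --file carns_my-app_id114525_7-27-58921_19.darshan.gz
----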

.Example output
----
# <file_type> <file_count> <total_bytes> <max_byte_offset>
# total: 5 4371499438884 4364699616485
# read_only: 2 4370100334589 4364699616485
# write_only: 1 1399104295 1399104295
# read_write: 0 0 0
# unique: 0 0 0
# shared: 5 4371499438884 4364699616485
----

===== Totals

Use the `--total` option to get all statistics as an aggregate total rather
than broken down per file.  Each field is either summed across files and
processes (for values such as number of opens), set to global minimums and
maximums (for values such as open time and close time), or zeroed out (for
statistics that are nonsensical in aggregate).
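
For instance:

----
darshan-parser --total carns_my-app_id114525_7-27-58921_19.darshan.gz
----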

.Example output
----
total_POSIX_OPENS: 1024
total_POSIX_READS: 0
total_POSIX_WRITES: 16384
total_POSIX_SEEKS: 16384
total_POSIX_STATS: 1024
total_POSIX_MMAPS: 0
total_POSIX_FOPENS: 0
total_POSIX_FREADS: 0
total_POSIX_FWRITES: 0
total_POSIX_BYTES_READ: 0
total_POSIX_BYTES_WRITTEN: 68719476736
total_POSIX_MAX_BYTE_READ: 0
total_POSIX_MAX_BYTE_WRITTEN: 67108863
...
----

===== File list

Use the `--file-list` option to produce a list of files opened by the
application along with estimates of the amount of time spent accessing each
file.

.Example output
----
# Per-file summary of I/O activity.
# -----
# <record_id>: darshan record id for this file
# <file_name>: full file name
# <nprocs>: number of processes that opened the file
# <slowest>: (estimated) time in seconds consumed in IO by slowest process
# <avg>: average time in seconds consumed in IO per process

# <record_id>   <file_name> <nprocs>    <slowest>   <avg>
5041708885572677970 /projects/SSSPPg/snyder/ior/ior.dat 1024    16.342061   1.705930
----

This data could be post-processed to compute more in-depth statistics: for
example, counting the total number of MPI files and POSIX files used in a
job, categorizing files as unique (opened by 1 process), partially shared
(opened by a proper subset of processes), or globally shared (opened by all
processes), or ranking files according to how much time was spent performing
I/O in each one.
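
As a simple sketch, the following pipeline lists the ten files with the
largest estimated I/O time on their slowest process by sorting the data lines
on the <slowest> column (the log file name is reused from earlier examples):

----
darshan-parser --file-list carns_my-app_id114525_7-27-58921_19.darshan.gz \
    | grep -v '^#' | sort -rn -k4 | head
----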

===== Detailed file list

The `--file-list-detailed` option is the same as `--file-list`, except that
it produces many more columns of output containing statistics broken down by
file.
This option is mainly useful for more detailed automated analysis.

=== darshan-dxt-parser

The `darshan-dxt-parser` utility can be used to parse DXT traces out of Darshan
log files, assuming the corresponding application was executed with the DXT
modules enabled. The following example parses all DXT trace information out
of a Darshan log file and stores it in a text file:

----
darshan-dxt-parser shane_ior_id25016_1-31-38066-13864742673678115131_1.darshan > ~/ior-trace.txt
----

=== Guide to darshan-dxt-parser output

The preamble to `darshan-dxt-parser` output is identical to that of the traditional
`darshan-parser` utility, which is described above.

`darshan-dxt-parser` displays detailed trace information contained within a Darshan log
that was generated with DXT instrumentation enabled. Trace data is captured from both
POSIX and MPI-IO interfaces. Example output is given below:

.Example output
----
# ***************************************************
# DXT_POSIX module data
# ***************************************************

# DXT, file_id: 16457598720760448348, file_name: /tmp/test/testFile
# DXT, rank: 0, hostname: shane-thinkpad
# DXT, write_count: 4, read_count: 4
# DXT, mnt_pt: /, fs_type: ext4
# Module    Rank  Wt/Rd  Segment          Offset       Length    Start(s)      End(s)
 X_POSIX       0  write        0               0       262144      0.0029      0.0032
 X_POSIX       0  write        1          262144       262144      0.0032      0.0035
 X_POSIX       0  write        2          524288       262144      0.0035      0.0038
 X_POSIX       0  write        3          786432       262144      0.0038      0.0040
 X_POSIX       0   read        0               0       262144      0.0048      0.0048
 X_POSIX       0   read        1          262144       262144      0.0049      0.0049
 X_POSIX       0   read        2          524288       262144      0.0049      0.0050
 X_POSIX       0   read        3          786432       262144      0.0050      0.0051

# ***************************************************
# DXT_MPIIO module data
# ***************************************************

# DXT, file_id: 16457598720760448348, file_name: /tmp/test/testFile
# DXT, rank: 0, hostname: shane-thinkpad
# DXT, write_count: 4, read_count: 4
# DXT, mnt_pt: /, fs_type: ext4
# Module    Rank  Wt/Rd  Segment       Length    Start(s)      End(s)
 X_MPIIO       0  write        0       262144      0.0029      0.0032
 X_MPIIO       0  write        1       262144      0.0032      0.0035
 X_MPIIO       0  write        2       262144      0.0035      0.0038
 X_MPIIO       0  write        3       262144      0.0038      0.0040
 X_MPIIO       0   read        0       262144      0.0048      0.0049
 X_MPIIO       0   read        1       262144      0.0049      0.0049
 X_MPIIO       0   read        2       262144      0.0049      0.0050
 X_MPIIO       0   read        3       262144      0.0050      0.0051
----

===== DXT POSIX module

This module provides details on each read or write access at the POSIX layer.
The trace output is organized first by file then by process rank. So, for each
file accessed by the application, DXT will provide each process's I/O trace
segments in separate blocks, ordered by increasing process rank. Within each
file/rank block, I/O trace segments are ordered chronologically.

Before providing details on each I/O operation, DXT provides a short preamble
for each file/rank trace block with the following information: the Darshan
identifier for the file (equivalent to the identifiers used by Darshan's
traditional modules), the full file path, the MPI rank that the current block
of trace data belongs to, the hostname associated with this process, the
number of individual POSIX read and write operations issued by this process,
and the mount point and file system type of the traced file.

The output format for each individual I/O operation segment is:

----
# Module    Rank  Wt/Rd  Segment          Offset       Length    Start(s)      End(s)
----

* Module: corresponding DXT module (DXT_POSIX or DXT_MPIIO)
* Rank: process rank responsible for I/O operation
* Wt/Rd: whether the operation was a write or read
* Segment: The operation number for this segment (first operation is segment 0)
* Offset: file offset the I/O operation occurred at
* Length: length of the I/O operation in bytes
* Start: timestamp of the start of the operation (w.r.t. application start time)
* End: timestamp of the end of the operation (w.r.t. application start time)

===== DXT MPI-IO module

If the MPI-IO interface is used by an application, this module provides details on
each read or write access at the MPI-IO layer. This data is often useful in
understanding how MPI-IO read or write operations map to underlying POSIX read
or write operations issued to the traced file.

The output format for the DXT MPI-IO module is essentially identical to the DXT
POSIX module, except that the offset of file operations is not tracked.

=== Other darshan-util utilities

The darshan-util package includes a number of other utilities that can be
summarized briefly as follows:

* darshan-convert: converts an existing log file to the newest log format.
If the `--bzip2` flag is given, then the output file will be re-compressed in
bzip2 format rather than libz format.  It also has command line options for
anonymizing personal data, adding metadata annotations to the log header, and
restricting the output to a specific instrumented file (see the example at
the end of this section).
* darshan-diff: provides a text diff of two Darshan log files, comparing both
job-level metadata and module data records between the files.
* darshan-analyzer: walks an entire directory tree of Darshan log files and
produces a summary of the types of access methods used in those log files.
* darshan-logutils*: this is a library rather than an executable, but it
provides a C interface for opening and parsing Darshan log files.  This is
the recommended method for writing custom utilities, as darshan-logutils
provides a relatively stable interface across different versions of Darshan
and different log formats.
* dxt_analyzer: plots the read or write activity of a job using data obtained
from Darshan's DXT modules (if DXT is enabled).
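
As an example of darshan-convert usage (referenced above), a minimal sketch
that re-compresses an existing log in bzip2 format, reusing the log file name
from earlier examples:

----
darshan-convert --bzip2 carns_my-app_id114525_7-27-58921_19.darshan.gz \
    carns_my-app_id114525_7-27-58921_19.darshan.bz2
----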