:data-uri:

Darshan modularization branch development notes
===============================================

== Introduction

Darshan is a lightweight toolkit for characterizing the I/O performance of instrumented
HPC applications.

Starting with version 3.0.0, the Darshan runtime environment and log file format have
been redesigned such that new "instrumentation modules" can be added without breaking
existing tools. Developers are given a framework to implement arbitrary instrumentation
modules, which are responsible for gathering I/O data from a specific system component
(which could be from an I/O library, platform-specific data, etc.). Darshan can then
manage these modules at runtime and create a valid Darshan log regardless of how many
or what types of modules are used.

== Checking out and building the modularization branch

The Darshan source code is available at the following GitLab project page:
https://xgitlab.cels.anl.gov/darshan/darshan. This page also provides issue tracking,
so users can browse known issues with the code or report new ones.

The following commands can be used to clone the Darshan source code and check out
the modularization branch:

----
git clone git@xgitlab.cels.anl.gov:darshan/darshan.git
cd darshan
git checkout dev-modular
----

For details on configuring, building, and using the Darshan runtime and utility
repositories, consult the documentation from previous versions
(http://www.mcs.anl.gov/research/projects/darshan/docs/darshan-runtime.html[darshan-runtime] and
http://www.mcs.anl.gov/research/projects/darshan/docs/darshan-util.html[darshan-util]) -- the
necessary steps for building these repositories should not have changed in the new version of
Darshan.

== Darshan dev-modular overview

The Darshan source tree is organized into two primary components:

* *darshan-runtime*: Darshan runtime framework necessary for instrumenting MPI
applications and generating I/O characterization logs.

* *darshan-util*: Darshan utilities for analyzing the contents of a given Darshan
I/O characterization log.

The following subsections provide detailed overviews of each of these components to
give a better understanding of the architecture of the modularized version of Darshan.
In link:darshan-modularization.html#_adding_new_instrumentation_modules[Section 4], we
outline the necessary steps for integrating new instrumentation modules into
Darshan.

=== Darshan-runtime

The primary responsibilities of the darshan-runtime component are:

* intercepting I/O functions of interest from a target application;

* extracting statistics, timing information, and other data characterizing the application's I/O workload;

* compressing I/O characterization data and corresponding metadata;

* logging the compressed I/O characterization to file for future evaluation.

The first two responsibilities are the burden of module developers, while the last two are handled
automatically by Darshan.

In general, instrumentation modules are composed of:

* wrapper functions for intercepting I/O functions;

* internal functions for initializing and maintaining internal data structures and module-specific
  I/O characterization data;

* a set of functions for interfacing with the Darshan runtime environment.

A block diagram illustrating the interaction of an example POSIX instrumentation module and the
Darshan runtime environment is given below in Figure 1.

.Darshan runtime environment
image::darshan-dev-modular-runtime.png[align="center"]

As shown in Figure 1, the Darshan runtime environment is just a library (libdarshan) which
intercepts and instruments functions of interest made by an application to existing system
libraries. Two primary components of this library are `darshan-core` and `darshan-common`.
`darshan-core` is the central component which manages the initialization/shutdown of Darshan,
coordinates with active instrumentation modules, and writes I/O characterization logs to disk,
among other things. `darshan-core` intercepts `MPI_Init()` to initialize key internal data
structures and intercepts `MPI_Finalize()` to initiate Darshan's shutdown process. `darshan-common`
simply provides module developers with functionality that is likely to be reused across modules
to minimize development and maintenance costs. Instrumentation modules must utilize `darshan-core`
to register themselves and corresponding I/O records with Darshan so they can be added to the
output I/O characterization. While not shown in Figure 1, numerous modules can be registered
with Darshan at any given time and Darshan is capable of correlating records between these
modules.

In the next three subsections, we describe instrumentation modules, the `darshan-core` component,
and the `darshan-common` component in more detail.

==== Instrumentation modules

The new modularized version of Darshan allows for the generation of I/O characterizations
composed from numerous instrumentation modules, where an instrumentation module is simply a
Darshan component responsible for capturing I/O data from some arbitrary source. For example,
distinct instrumentation modules may be defined for different I/O interfaces or to gather
system-specific I/O parameters from a given computing system. Each instrumentation module
interfaces with the `darshan-core` component to coordinate its initialization and shutdown
and to provide output I/O characterization data to be written to log.

In general, there are two different methods an instrumentation module can use to initialize
itself: static initialization at Darshan startup time or dynamic initialization within
intercepted function calls during application execution. The initialization process should
initialize module-specific data structures and register the module with the `darshan-core`
component so it is included in the output I/O characterization.

The static initialization approach is useful for modules that do not have function calls
that can be intercepted and instead can just grab all I/O characterization data at Darshan
startup or shutdown time. A module can be statically initialized at Darshan startup time
by adding its initialization routine to the `mod_static_init_fns` list at the top of the
`lib/darshan-core.c` source file.
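
A minimal sketch of what such a list might look like, assuming it is a `NULL`-terminated array of initialization function pointers. Every `example_` name below is hypothetical, as is the loop that walks the list; consult `lib/darshan-core.c` for the actual declaration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counter so the effect of initialization can be observed. */
int example_sysinfo_init_calls = 0;

/* Hypothetical static-initialization routine for a system-specific module.
 * A real routine would set up module state and register with darshan-core. */
void example_sysinfo_module_init(void)
{
    example_sysinfo_init_calls++;
}

/* Assumed shape of the list: function pointers terminated by NULL. */
void (*example_static_init_fns[])(void) = {
    example_sysinfo_module_init,
    NULL
};

/* Walk the list once at startup, calling each routine in order.
 * Returns the number of routines invoked. */
int example_run_static_inits(void (**init_fns)(void))
{
    int count = 0;

    assert(init_fns != NULL);
    for (size_t i = 0; init_fns[i] != NULL; i++) {
        init_fns[i]();
        count++;
    }
    return count;
}
```

Guarding an entry in such a list with a preprocessor symbol set by a configure option would be one way to keep a system-specific module out of builds for other systems.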

*NOTE*: Modules may wish to add a corresponding configure option to disable the module
from attempting to gather I/O data. The ability to disable a module using a configure
option is especially necessary for system-specific modules which cannot be built or
used on other systems.

Most instrumentation modules can just bootstrap themselves within wrapper functions during
normal application execution. Each of Darshan's current I/O library instrumentation modules
(POSIX, MPI-IO, HDF5, PnetCDF) follows this approach. Each wrapper function should just include
logic to initialize data structures and register with `darshan-core` if this initialization
has not already occurred. Darshan intercepts function calls of interest by inserting these
wrappers at compile time for statically linked executables (e.g., using the linker's
`--wrap` mechanism) and at runtime for dynamically linked executables (using LD_PRELOAD).

*NOTE*: Modules should not perform any I/O or communication within wrapper functions. Darshan records
I/O data independently on each application process, then merges the data from all processes when the
job is shutting down. This defers expensive I/O and communication operations to the shutdown process,
minimizing Darshan's impact on application I/O performance.
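
The following sketch illustrates the bootstrapping pattern and the "no I/O in wrappers" rule under simplifying assumptions. All `example_` names are hypothetical, and the wrapper calls `fwrite()` directly; with the linker's `--wrap` option the same role would be played by a `__wrap_fwrite` function calling `__real_fwrite`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical module state; all names here are illustrative. */
static int example_mod_initialized = 0;
int example_mod_init_calls = 0;
int64_t example_write_calls = 0;
int64_t example_bytes_written = 0;

/* One-time setup. A real module would also register with darshan-core here. */
static void example_mod_init(void)
{
    example_mod_init_calls++;
    example_mod_initialized = 1;
}

/* Hypothetical wrapper around fwrite(). */
size_t example_wrapped_fwrite(const void *ptr, size_t size, size_t nmemb,
                              FILE *stream)
{
    assert(stream != NULL);

    /* Bootstrap the module on first use only. */
    if (!example_mod_initialized)
        example_mod_init();

    size_t ret = fwrite(ptr, size, nmemb, stream);

    /* Update in-memory counters only: no extra I/O or communication here. */
    example_write_calls++;
    example_bytes_written += (int64_t)(ret * size);
    return ret;
}
```

The sketch is not thread-safe; a real module would need to protect both the initialization check and the counter updates.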

When the instrumented application terminates and Darshan begins its shutdown procedure, it requires
a way to interface with any active modules that have data to contribute to the output I/O characterization.
Darshan requires that module developers implement the following functions to allow the Darshan runtime
environment to coordinate with modules while shutting down:

[source,c]
struct darshan_module_funcs
{
    void (*begin_shutdown)(void);
    void (*get_output_data)(
        MPI_Comm mod_comm,
        darshan_record_id *shared_recs,
        int shared_rec_count,
        void** mod_buf,
        int* mod_buf_sz
    );
    void (*shutdown)(void);
};

`begin_shutdown()`

This function informs the module that Darshan is about to begin shutting down. It should disable
all wrappers to prevent the module from making future updates to internal data structures, primarily
to ensure data consistency and avoid other race conditions.

`get_output_data()`

This function is responsible for packing all module I/O data into a single buffer to be written
to the output I/O characterization. This function can be used to run collective MPI operations on
module data; for instance, Darshan typically tries to reduce file records which are shared across
all application processes into a single data record (more details on the shared file reduction
mechanism are given in link:darshan-modularization.html#_shared_record_reductions[Section 5]).

* _mod_comm_ is the MPI communicator to use for collective communication

* _shared_recs_ is a list of Darshan record identifiers that are shared across all application
processes

* _shared_rec_count_ is the size of the shared record list

* _mod_buf_ is a pointer to the buffer of this module's I/O characterization data

* _mod_buf_sz_ is the size of the module's output buffer

`shutdown()`

This function is a signal from Darshan that it is safe to shut down. It should clean up and free
all internal data structures.
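
As a rough illustration, a module that performs no shared-record reduction might implement the three functions as sketched here. The `MPI_Comm` and `darshan_record_id` typedefs are stand-ins so the sketch compiles without `mpi.h` or `darshan.h` (the 64-bit width of `darshan_record_id` is an assumption), and every `example_` name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins so the sketch compiles without MPI or Darshan headers.
 * A real module gets these from mpi.h and darshan.h. */
typedef int MPI_Comm;
typedef uint64_t darshan_record_id;   /* assumed to be a 64-bit integer */

struct darshan_module_funcs
{
    void (*begin_shutdown)(void);
    void (*get_output_data)(MPI_Comm mod_comm, darshan_record_id *shared_recs,
                            int shared_rec_count, void **mod_buf,
                            int *mod_buf_sz);
    void (*shutdown)(void);
};

/* Hypothetical per-record data kept by the example module. */
struct example_record
{
    darshan_record_id id;
    int64_t op_count;
};

struct example_record *example_recs = NULL;
int example_rec_count = 0;
int example_enabled = 1;   /* wrappers check this before updating records */

static void example_begin_shutdown(void)
{
    /* Stop wrappers from touching records from here on. */
    example_enabled = 0;
}

static void example_get_output_data(MPI_Comm mod_comm,
    darshan_record_id *shared_recs, int shared_rec_count,
    void **mod_buf, int *mod_buf_sz)
{
    /* This sketch performs no reduction, so these are unused. */
    (void)mod_comm;
    (void)shared_recs;
    (void)shared_rec_count;

    assert(mod_buf != NULL && mod_buf_sz != NULL);

    /* Records already live in one contiguous array; hand it over as-is. */
    *mod_buf = example_recs;
    *mod_buf_sz = (int)(example_rec_count * sizeof(struct example_record));
}

static void example_shutdown(void)
{
    free(example_recs);
    example_recs = NULL;
    example_rec_count = 0;
}

/* The table a module would hand to darshan-core when registering itself. */
struct darshan_module_funcs example_mod_fns = {
    .begin_shutdown = example_begin_shutdown,
    .get_output_data = example_get_output_data,
    .shutdown = example_shutdown
};
```

Note that the buffer returned from `get_output_data()` is only released in `shutdown()`, which matches the ordering described above.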

==== darshan-core

Within darshan-runtime, the darshan-core component manages the initialization and shutdown of the
Darshan environment, provides an interface for modules to register themselves and their data
records with Darshan, and manages the compressing and the writing of the resultant I/O
characterization. As illustrated in Figure 1, the darshan-core runtime environment intercepts
`MPI_Init` and `MPI_Finalize` routines to initialize and shut down the Darshan runtime environment,
respectively.

Each of the functions provided by `darshan-core` to interface with instrumentation modules is
described in detail below.

[source,c]
void darshan_core_register_module(
    darshan_module_id mod_id,
    struct darshan_module_funcs *funcs,
    int *my_rank,
    int *mod_mem_limit,
    int *sys_mem_alignment);

The `darshan_core_register_module` function registers Darshan instrumentation modules with the
`darshan-core` runtime environment. This function needs to be called once for any module that
will contribute data to Darshan's final I/O characterization. 

* _mod_id_ is a unique identifier for the given module, which is defined in the Darshan log
format header file (`darshan-log-format.h`).

* _funcs_ is the structure of function pointers (as described above in the previous section) that
a module developer must provide to interface with the darshan-core runtime. 

* _my_rank_ is a pointer to an integer to store the calling process's application MPI rank in

* _mod_mem_limit_ is a pointer to an integer which will store the amount of memory Darshan
allows this module to use at runtime. Darshan's default module memory limit is currently set to
2 MiB, but the user can choose a different value at configure time (using the `--with-mod-mem`
configure option) or at runtime (using the DARSHAN_MODMEM environment variable). Note that Darshan
does not allocate any memory for modules; it just informs a module how much memory it can use.

* _sys_mem_alignment_ is a pointer to an integer which will store the system memory alignment value
Darshan was configured with. This parameter may be set to `NULL` if a module is not concerned with the
memory alignment value.
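
Because Darshan only reports the limit and leaves allocation to the module, a module has to turn the value stored in _mod_mem_limit_ into its own record budget. One possible way to do that is sketched below; the record layout and the `example_max_records` helper are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layout; real modules define theirs in a log format
 * header. */
struct example_record
{
    uint64_t id;
    int64_t counters[7];
};

/* Number of whole records that fit in the memory limit reported through
 * mod_mem_limit. */
int example_max_records(int mod_mem_limit, size_t rec_size)
{
    assert(rec_size > 0);

    if (mod_mem_limit <= 0)
        return 0;
    return (int)((size_t)mod_mem_limit / rec_size);
}
```

With the default 2 MiB limit and the 64-byte record above, this budget works out to 32768 records.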

[source,c]
void darshan_core_unregister_module(
    darshan_module_id mod_id);

The `darshan_core_unregister_module` function disassociates the given module from the
`darshan-core` runtime. Consequently, Darshan does not interface with the given module at
shutdown time and will not log any I/O data from the module. This function should only be used
if a module registers itself with darshan-core but later decides it does not want to contribute
any I/O data.

* _mod_id_ is the unique identifier for the module being unregistered.

[source,c]
void darshan_core_register_record(
    void *name,
    int len,
    darshan_module_id mod_id,
    int printable_flag,
    int mod_limit_flag,
    darshan_record_id *rec_id,
    int *file_alignment);

The `darshan_core_register_record` function registers some data record with the darshan-core
runtime. This record could reference a POSIX file or perhaps an object identifier for an
object storage system, for instance.  A unique identifier for the given record name is
generated by Darshan, which should then be used by the module for referencing the corresponding
record.  This allows multiple modules to refer to a specific data record in a consistent manner
and also provides a mechanism for mapping these records back to important metadata stored by
darshan-core. It is safe (and likely necessary) to call this function many times for the same
record -- darshan-core will just set the corresponding record identifier if the record has
been previously registered.

* _name_ is just the name of the data record, which could be a file path, object ID, etc.

* _len_ is the size of the input record name. For string record names, this would just be the
string length, but for nonprintable record names (e.g., an integer object identifier), this
is the size of the record name type.

* _mod_id_ is the identifier for the module attempting to register this record.

* _printable_flag_ indicates whether the input record name is a printable ASCII string.

* _mod_limit_flag_ indicates whether the calling module is out of memory to instrument new
records or not. If this flag is set, darshan-core will not create new records and instead just
search existing records for one corresponding to input _name_. 

* _rec_id_ is a pointer to a variable which will store the unique record identifier generated
by Darshan.

* _file_alignment_ is a pointer to an integer which will store the file alignment (block size)
of the underlying storage system. This parameter may be set to `NULL` if it is not applicable to a
given module.

[source,c]
void darshan_core_unregister_record(
    darshan_record_id rec_id,
    darshan_module_id mod_id);

The `darshan_core_unregister_record` function disassociates the given module identifier from the
given record identifier. If no other modules are associated with the given record identifier, then
Darshan removes all internal references to the record. This function should only be used if a
module registers a record with darshan-core, but later decides not to store the record internally.

* _rec_id_ is the record identifier we want to unregister.

* _mod_id_ is the module identifier that is unregistering _rec_id_.

[source,c]
double darshan_core_wtime(void);

The `darshan_core_wtime` function simply returns a floating point number of seconds since
Darshan was initialized. This functionality can be used to time the duration of application
I/O calls or to store timestamps of when functions of interest were called.
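
The sketch below shows how a wrapper might use such a timer to record both a timestamp and a cumulative duration. `example_wtime()` is a stand-in built on C11 `timespec_get()` so the code compiles without Darshan; a real module would call `darshan_core_wtime()` instead. The `example_timers` structure is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Stand-in for darshan_core_wtime(): seconds since the first call.
 * It reads the wall clock, which is adequate for a sketch. */
double example_wtime(void)
{
    static struct timespec start;
    static int started = 0;
    struct timespec now;

    timespec_get(&now, TIME_UTC);
    if (!started) {
        start = now;
        started = 1;
    }
    return (double)(now.tv_sec - start.tv_sec) +
           (double)(now.tv_nsec - start.tv_nsec) / 1e9;
}

/* Hypothetical timing counters for one record. */
struct example_timers
{
    double first_write_start;   /* timestamp of the first write */
    double write_time;          /* cumulative seconds spent in writes */
    int have_first;
};

/* Time a single fwrite() call and fold the result into the counters. */
size_t example_timed_fwrite(struct example_timers *t, const void *ptr,
                            size_t size, size_t nmemb, FILE *stream)
{
    assert(t != NULL && stream != NULL);

    double tm1 = example_wtime();
    size_t ret = fwrite(ptr, size, nmemb, stream);
    double tm2 = example_wtime();

    if (!t->have_first) {
        t->first_write_start = tm1;
        t->have_first = 1;
    }
    t->write_time += tm2 - tm1;
    return ret;
}
```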

==== darshan-common

`darshan-common` is a utility component of darshan-runtime, providing module developers with
general functions that are likely to be reused across multiple modules. These functions are
distinct from darshan-core functions since they do not require access to internal Darshan
state.

[source,c]
char* darshan_clean_file_path(
    const char* path);

The `darshan_clean_file_path` function just cleans up the input path string, converting
relative paths to absolute paths and suppressing any potential noise within the string.
The address of the new string is returned and should be freed by the user.

* _path_ is the input path string to be cleaned up.

`darshan-common` also currently includes functions for maintaining counters that store
common I/O values (such as common I/O access sizes or strides used by an application),
as well as functions for calculating the variance of a given counter across all processes.
As more modules are contributed, it is likely that more functionality can be refactored out
of module implementations and maintained in darshan-common, facilitating code reuse and
simplifying maintenance.

=== Darshan-util

The darshan-util component is composed of a log parsing library (libdarshan-util) and a
corresponding set of utility programs that can parse and analyze Darshan I/O characterization
logs using this library. The log parsing library includes a generic interface (see
`darshan-logutils.h`) for retrieving specific portions of a given log file. Specifically,
this interface allows utilities to retrieve a log's header metadata, job details, record
identifier mapping, and any module-specific data contained within the log.

Module developers may wish to define additional interfaces for parsing module-specific data
that can then be integrated into the log parsing library. This extended functionality can be
implemented in terms of the generic functions offered by darshan-logutils and by module-specific
formatting information.

==== darshan-logutils

Here we define each function in the `darshan-logutils` interface, which can be used to create
new log utilities and to implement module-specific interfaces into log files.

[source,c]
darshan_fd darshan_log_open(const char *name);

Opens Darshan log file stored at path `name`. The log file must already exist and is opened
for reading only. As part of the open routine, the log file header is read to set internal
file descriptor data structures. Returns a Darshan file descriptor on success or `NULL` on error.

[source,c]
darshan_fd darshan_log_create(const char *name, enum darshan_comp_type comp_type, int partial_flag);

Creates a new Darshan log file for writing only at path `name`. `comp_type` denotes the underlying
compression type used on the log file (currently either libz or bzip2) and `partial_flag`
denotes whether the log is storing partial data (that is, not all possible application file records
were tracked by Darshan). Returns a Darshan file descriptor on success or `NULL` on error.

[source,c]
int darshan_log_getjob(darshan_fd fd, struct darshan_job *job);
int darshan_log_putjob(darshan_fd fd, struct darshan_job *job);

Reads/writes `job` structure from/to the log file referenced by descriptor `fd`. The `darshan_job`
structure is defined in `darshan-log-format.h`. Returns `0` on success, `-1` on failure.

[source,c]
int darshan_log_getexe(darshan_fd fd, char *buf);
int darshan_log_putexe(darshan_fd fd, char *buf);

Reads/writes the corresponding executable string (exe name and command line arguments)
from/to the Darshan log referenced by `fd`. Returns `0` on success, `-1` on failure.

[source,c]
int darshan_log_getmounts(darshan_fd fd, char*** mnt_pts, char*** fs_types, int* count);
int darshan_log_putmounts(darshan_fd fd, char** mnt_pts, char** fs_types, int count);

Reads/writes mounted file system information for the Darshan log referenced by `fd`. `mnt_pts` points
to an array of strings storing mount points, `fs_types` points to an array of strings storing file
system types (e.g., ext4, nfs, etc.), and `count` points to an integer storing the total number
of mounted file systems recorded by Darshan. Returns `0` on success, `-1` on failure.

[source,c]
int darshan_log_gethash(darshan_fd fd, struct darshan_record_ref **hash);
int darshan_log_puthash(darshan_fd fd, struct darshan_record_ref *hash);

Reads/writes the hash table of Darshan record identifiers to full names for all records
contained in the Darshan log referenced by `fd`. `hash` is a pointer to the hash table (of type
`struct darshan_record_ref *`, which should be initialized to `NULL` for reading). This hash table
is defined by the `uthash` hash table implementation and includes corresponding macros for
searching, iterating, and deleting records from the hash. For detailed documentation on using this
hash table, consult `uthash` documentation in `darshan-util/uthash-1.9.2/doc/txt/userguide.txt`.
The `darshan-parser` utility (for parsing module information out of a Darshan log) provides an
example of how this hash table may be used. Returns `0` on success, `-1` on failure.

[source,c]
int darshan_log_getmod(darshan_fd fd, darshan_module_id mod_id, void *mod_buf, int mod_buf_sz);
int darshan_log_putmod(darshan_fd fd, darshan_module_id mod_id, void *mod_buf, int mod_buf_sz);

Reads/writes a chunk of (uncompressed) module data for the module identified by `mod_id` from/to
the Darshan log referenced by `fd`. `mod_buf_sz` specifies the number of uncompressed bytes to
read/write from/to the file and store in `mod_buf`. The `darshan_log_getmod` routine can be
repeatedly called to retrieve chunks of uncompressed data from a specific module region of the
log file given by `fd`. The `darshan_log_putmod` routine just continually appends data to a
specific module region in the log file given by `fd`. This function returns the number of bytes
read/written on success, `-1` on failure.
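
The repeated-read pattern can be sketched as follows. To keep the example self-contained, `example_getmod()` is an in-memory stand-in for `darshan_log_getmod()` and does not share its signature; treating a return value of `0` as the end of the module region is an assumption of this sketch, and the record layout is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size module record. */
struct example_record
{
    uint64_t id;
    int64_t op_count;
};

/* In-memory stand-in for one module region of a log file. */
struct example_region
{
    const unsigned char *data;
    int size;
    int pos;
};

/* Stand-in for darshan_log_getmod(): copies up to buf_sz bytes and returns
 * the number of bytes copied, or -1 on a bad request. Returning 0 at the end
 * of the region is an assumption of this sketch. */
int example_getmod(struct example_region *r, void *buf, int buf_sz)
{
    assert(r != NULL && buf != NULL);

    if (buf_sz < 0)
        return -1;
    int n = r->size - r->pos;
    if (n > buf_sz)
        n = buf_sz;
    memcpy(buf, r->data + r->pos, (size_t)n);
    r->pos += n;
    return n;
}

/* Read one record at a time until the region is exhausted.
 * Returns the number of complete records read, or -1 on error. */
int example_sum_ops(struct example_region *r, int64_t *total_ops)
{
    struct example_record rec;
    int nrecs = 0;
    int ret;

    *total_ops = 0;
    while ((ret = example_getmod(r, &rec, (int)sizeof(rec))) == (int)sizeof(rec)) {
        *total_ops += rec.op_count;
        nrecs++;
    }
    /* Anything other than a clean end of region is treated as an error. */
    return (ret == 0) ? nrecs : -1;
}
```

Reading whole records per call keeps the loop simple; a utility could equally read larger chunks and walk the records inside each chunk.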

*NOTE*: Darshan uses a reader-makes-right conversion strategy to rectify endianness issues
between the machine a log was generated on and a machine analyzing the log. Accordingly,
module-specific log utility functions will need to check the `swap_flag` variable of the Darshan
file descriptor to determine if byte swapping is necessary. 32-bit and 64-bit byte swapping
macros (DARSHAN_BSWAP32/DARSHAN_BSWAP64) are provided in `darshan-logutils.h`.

[source,c]
void darshan_log_close(darshan_fd fd);

Closes the Darshan file descriptor `fd`. This routine *must* be called for newly created log files,
as it flushes pending writes and writes a corresponding log file header before closing.

*NOTE*: For newly created Darshan log files, care must be taken to write log file data in the
correct order, since the log file write routines simply append data to the log file.
The correct order for writing all log file data to file is: (1) job data, (2) exe string, (3)
mount data, (4) record id -> file name map, (5) each module's data, in increasing order of
module identifiers.

== Adding new instrumentation modules

In this section we outline each step necessary for adding a module to Darshan. To assist module
developers, we have provided the example "NULL" module as part of the Darshan source tree
(`darshan-null-log-format.h`, `darshan-runtime/lib/darshan-null.c`, and
`darshan-util/darshan-null-logutils.*`). This example can be used as a minimal stubbed out module
implementation that is heavily annotated to further clarify how modules interact with Darshan
and to provide best practices to future module developers. For full-fledged module implementation
examples, developers are encouraged to examine the POSIX and MPI-IO modules.

=== Log format headers

The following modifications to Darshan log format headers are required for defining
the module's record structure:

* Add a module identifier to the `DARSHAN_MODULE_IDS` macro at the top of the `darshan-log-format.h`
header. In this macro, the first field is a corresponding enum value that can be used to
identify the module, the second field is a string name for the module, the third field is the
current version number of the given module's log format, and the fourth field is a corresponding
pointer to a Darshan log utility implementation for this module (which can be set to `NULL`
until the module has its own log utility implementation). 

* Add a top-level header that defines an I/O data record structure for the module. Consider
the "NULL" module and POSIX module log format headers for examples (`darshan-null-log-format.h`
and `darshan-posix-log-format.h`, respectively).

These log format headers are defined at the top level of the Darshan source tree, since both the
darshan-runtime and darshan-util repositories depend on their definitions.
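
To illustrate how a four-field identifier macro of this kind can drive several definitions at once, here is a sketch of the general "X-macro" technique together with a possible record structure. `EXAMPLE_MODULE_IDS`, its entries, and the record layout are all hypothetical; the real macro and record definitions live in `darshan-log-format.h` and the module-specific headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical X-macro in the spirit of DARSHAN_MODULE_IDS. Each entry gives
 * an enum value, a string name, a log format version, and a log utility
 * pointer (NULL until one exists). The entries are illustrative only. */
#define EXAMPLE_MODULE_IDS \
    X(EXAMPLE_NULL_MOD,  "NULL",  1, NULL) \
    X(EXAMPLE_POSIX_MOD, "POSIX", 1, NULL) \
    X(EXAMPLE_NEW_MOD,   "NEW",   1, NULL)

/* Expand the first field into an enum of module identifiers. */
#define X(id, name, ver, logutil) id,
typedef enum
{
    EXAMPLE_MODULE_IDS
    EXAMPLE_MAX_MODS
} example_module_id;
#undef X

/* Expand the second field into a parallel table of names. */
#define X(id, name, ver, logutil) name,
static const char * const example_module_names[] = { EXAMPLE_MODULE_IDS };
#undef X

const char *example_module_name(example_module_id id)
{
    assert((int)id >= 0 && (int)id < EXAMPLE_MAX_MODS);
    return example_module_names[id];
}

/* Hypothetical record structure for the new module: an identifier, the rank
 * that produced the record, and a fixed set of integer counters. */
#define EXAMPLE_NEW_NUM_COUNTERS 4
struct example_new_record
{
    uint64_t rec_id;
    int64_t rank;
    int64_t counters[EXAMPLE_NEW_NUM_COUNTERS];
};
```

Keeping every field fixed-width makes the record layout independent of the platform that wrote the log.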

=== Darshan-runtime

==== Build modifications

The following modifications to the darshan-runtime build system are necessary to integrate
new instrumentation modules:

* Necessary linker flags for inserting this module's wrapper functions need to be added to a
module-specific file which is used when linking applications with Darshan. For an example,
consider `darshan-runtime/darshan-posix-ld-opts`, the required linker options for the POSIX
module. The base linker options file for Darshan (`darshan-runtime/darshan-base-ld-opts.in`)
must also be updated to point to the new module-specific linker options file.

* Targets must be added to `Makefile.in` to build static and shared objects for the module's
source files, which will be stored in the `darshan-runtime/lib/` directory. The prerequisites
to building static and dynamic versions of `libdarshan` must be updated to include these objects,
as well.
    - If the module defines a linker options file, a rule must also be added to install this
      file with libdarshan.

==== Instrumentation module implementation

In addition to the development notes from above and the exemplar "NULL" and POSIX modules, we
provide the following notes to assist module developers:

* Modules only need to include the `darshan.h` header to interface with darshan-core.

* The file record identifier given when registering a record with darshan-core can be used
to store the record structure in a hash table or some other structure.
    - The `darshan_core_register_record` function is really more like a lookup function. It
    may be called multiple times for the same record -- if the record already exists, the function
    simply returns its record ID.
    - It may be necessary to maintain a separate hash table for other handles which the module
    may use to refer to a given record. For instance, the POSIX module may need to look up a
    file record based on a given file descriptor, rather than a path name.

=== Darshan-util

==== Build modifications

The following modifications to the darshan-util build system are necessary to integrate new
instrumentation modules:

* Update `Makefile.in` with new targets necessary for building module-specific logutil source.
    - Make sure to add the module's logutil implementation objects as a prerequisite for
building `libdarshan-util`. 
    - Make sure to update `all`, `clean`, and `install` rules to reference updates.

==== Module-specific logutils and utilities

For a straightforward reference implementation of module-specific log utility functions,
consider the implementations for the NULL module (`darshan-util/darshan-null-logutils.*`)
and the POSIX module (`darshan-util/darshan-posix-logutils.*`). These module-specific log
utility implementations are built on top of the `darshan_log_getmod()` and `darshan_log_putmod()`
functions, and are used to read/write complete module records from/to file.

Also, consider the `darshan-parser` source code for an example of a utility which can leverage
`libdarshan-util` for analyzing the contents of a Darshan I/O characterization log with data
from arbitrary instrumentation modules.

== Shared record reductions

Since Darshan prefers to aggregate data records which are shared across all processes into a single
data record, module developers should consider implementing this functionality eventually, though it
is not strictly required. 

Module developers should implement the shared record reduction mechanism within the module's
`get_output_data()` function, as it provides an MPI communicator for the module to use for
collective communication and a list of record identifiers which are shared globally by the
module (as described in link:darshan-modularization.html#_darshan_runtime[Section 3.1]).

In general, implementing a shared record reduction involves the following steps:

* reorganizing shared records into a contiguous region in the buffer of module records

* allocating a record buffer to store the reduction output on application rank 0

* creating an MPI reduction operation using the `MPI_Op_create()` function (see more
http://www.mpich.org/static/docs/v3.1/www3/MPI_Op_create.html[here])

* reducing all shared records using the created MPI reduction operation and the send
and receive buffers described above
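
The core of these steps is the user-defined reduction function. The sketch below shows one that sums a counter and keeps the earliest and latest timestamps; it is not the POSIX module's operator. The record layout is hypothetical, and `MPI_Datatype` is a stand-in typedef so the function compiles without `mpi.h`. The closing comment outlines how the function would be used with MPI.

```c
#include <assert.h>
#include <stdint.h>

typedef int MPI_Datatype;   /* stand-in so the sketch compiles without mpi.h */

/* Hypothetical module record. */
struct example_record
{
    uint64_t id;
    int64_t rank;          /* -1 marks a record shared by all ranks here */
    int64_t op_count;
    double first_op_time;
    double last_op_time;
};

/* User-defined reduction with the argument list MPI_Op_create() expects:
 * combine each record in invec into the matching record in inoutvec. */
void example_record_reduce(void *invec, void *inoutvec, int *len,
                           MPI_Datatype *datatype)
{
    struct example_record *in = invec;
    struct example_record *inout = inoutvec;
    (void)datatype;

    for (int i = 0; i < *len; i++) {
        /* Both buffers must list the shared records in the same order. */
        assert(in[i].id == inout[i].id);

        inout[i].rank = -1;
        inout[i].op_count += in[i].op_count;
        if (in[i].first_op_time < inout[i].first_op_time)
            inout[i].first_op_time = in[i].first_op_time;
        if (in[i].last_op_time > inout[i].last_op_time)
            inout[i].last_op_time = in[i].last_op_time;
    }
}

/* With MPI available, the remaining steps would look roughly like:
 *   MPI_Op_create(example_record_reduce, 1, &red_op);
 *   MPI_Reduce(shared_region, red_buf, shared_rec_count, rec_type,
 *              red_op, 0, mod_comm);
 *   MPI_Op_free(&red_op);
 * where rec_type is an MPI datatype describing one record (for example,
 * built with MPI_Type_contiguous over MPI_BYTE). */
```

The operator is commutative, which is why the outline passes `1` as the second argument to `MPI_Op_create()`.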

For a more in-depth example of how to use the shared record reduction mechanism, consider
the implementations of this in the POSIX or MPI-IO modules.

== Other resources

* https://xgitlab.cels.anl.gov/darshan/darshan[Darshan GitLab page]
* http://www.mcs.anl.gov/research/projects/darshan/[Darshan project website]
* http://www.mcs.anl.gov/research/projects/darshan/docs/darshan-runtime.html[darshan-runtime documentation]
* http://www.mcs.anl.gov/research/projects/darshan/docs/darshan-util.html[darshan-util documentation]