:data-uri:

Modularized I/O characterization using Darshan 3.x
==================================================

== Introduction

Darshan is a lightweight toolkit for characterizing the I/O performance of instrumented
HPC applications.

Starting with version 3.0.0, the Darshan runtime environment and log file format have
been redesigned such that new "instrumentation modules" can be added without breaking
existing tools. Developers are given a framework to implement arbitrary instrumentation
modules, which are responsible for gathering I/O data from a specific system component
(which could be from an I/O library, platform-specific data, etc.). Darshan can then
manage these modules at runtime and create a valid Darshan log regardless of how many
or what types of modules are used.

== Overview of Darshan's modularized architecture

The Darshan source tree is organized into two primary components:

* *darshan-runtime*: Darshan runtime framework necessary for instrumenting MPI
applications and generating I/O characterization logs.

* *darshan-util*: Darshan utilities for analyzing the contents of a given Darshan
I/O characterization log.

The following subsections provide detailed overviews of each of these components to
give a better understanding of the architecture of the modularized version of Darshan.
In link:darshan-modularization.html#_adding_new_instrumentation_modules[Section 4], we
outline the steps necessary for integrating new instrumentation modules into
Darshan.

=== Darshan-runtime

The primary responsibilities of the darshan-runtime component are:

* intercepting I/O functions of interest from a target application;

* extracting statistics, timing information, and other data characterizing the application's I/O workload;

* compressing I/O characterization data and corresponding metadata;

* logging the compressed I/O characterization to file for future evaluation.

The first two responsibilities are the burden of module developers, while the last two are handled
automatically by Darshan.

In general, instrumentation modules are composed of:

* wrapper functions for intercepting I/O functions;

* internal functions for initializing and maintaining internal data structures and module-specific
  I/O characterization data;

* a set of functions for interfacing with the Darshan runtime environment.

A block diagram illustrating the interaction of an example POSIX instrumentation module and the
Darshan runtime environment is given below in Figure 1.

.Darshan runtime environment
image::darshan-dev-modular-runtime.png[align="center"]

As shown in Figure 1, the Darshan runtime environment is just a library (libdarshan) which
intercepts and instruments functions of interest made by an application to existing system
libraries. Two primary components of this library are `darshan-core` and `darshan-common`.
`darshan-core` is the central component which manages the initialization/shutdown of Darshan,
coordinates with active instrumentation modules, and writes I/O characterization logs to disk,
among other things. `darshan-core` intercepts `MPI_Init()` to initialize key internal data
structures and intercepts `MPI_Finalize()` to initiate Darshan's shutdown process. `darshan-common`
simply provides module developers with functionality that is likely to be reused across modules
to minimize development and maintenance costs. Instrumentation modules must utilize `darshan-core`
to register themselves and corresponding I/O records with Darshan so they can be added to the
output I/O characterization. While not shown in Figure 1, numerous modules can be registered
with Darshan at any given time and Darshan is capable of correlating records between these
modules.

In the next three subsections, we describe instrumentation modules, the `darshan-core` component,
and the `darshan-common` component in more detail.

==== Instrumentation modules

The new modularized version of Darshan allows for the generation of I/O characterizations
composed from numerous instrumentation modules, where an instrumentation module is simply a
Darshan component responsible for capturing I/O data from some arbitrary source. For example,
distinct instrumentation modules may be defined for different I/O interfaces or to gather
system-specific I/O parameters from a given computing system. Each instrumentation module
interfaces with the `darshan-core` component to coordinate its initialization and shutdown
and to provide output I/O characterization data to be written to log.

In general, there are two different methods an instrumentation module can use to initialize
itself: static initialization at Darshan startup time or dynamic initialization within
intercepted function calls during application execution. The initialization process should
initialize module-specific data structures and register the module with the `darshan-core`
component so it is included in the output I/O characterization.

The static initialization approach is useful for modules that do not have function calls
that can be intercepted and instead can just grab all I/O characterization data at Darshan
startup or shutdown time. A module can be statically initialized at Darshan startup time
by adding its initialization routine to the `mod_static_init_fns` array at the top of the
`lib/darshan-core.c` source file.

*NOTE*: Modules may wish to add a corresponding configure option to disable the module
from attempting to gather I/O data. The ability to disable a module using a configure
option is especially necessary for system-specific modules which cannot be built or
used on other systems.

Most instrumentation modules can just bootstrap themselves within wrapper functions during
normal application execution. Each of Darshan's current I/O library instrumentation modules
(POSIX, MPI-IO, stdio, HDF5, PnetCDF) follows this approach. Each wrapper function should just include
logic to initialize data structures and register with `darshan-core` if this initialization
has not already occurred. Darshan intercepts function calls of interest by inserting these
wrappers at compile time for statically linked executables (e.g., using the linker's
`--wrap` mechanism) and at runtime for dynamically linked executables (using LD_PRELOAD).
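
The lazy-initialization pattern described above can be sketched as follows. This is only an illustration, not Darshan's code: `example_module_init`, `example_real_open`, and `example_wrapped_open` are hypothetical names, and the stand-in for the underlying I/O call simply returns a fixed descriptor. In an actual module, the initialization routine would register with `darshan-core`, and the wrapper would be bound to the real symbol by the linker's `--wrap` option or by LD_PRELOAD.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical module state; a real module also keeps its records here. */
int example_initialized = 0;
int example_init_calls = 0;
int example_open_calls = 0;

/* Stand-in for module initialization. A real module would allocate its
 * data structures and register with darshan-core at this point. */
void example_module_init(void)
{
    example_init_calls++;
    example_initialized = 1;
}

/* Stand-in for the underlying library call (e.g. the real open()). */
int example_real_open(const char *path)
{
    return (path != NULL) ? 3 : -1;
}

/* Wrapper: initialize on first use, forward the call, update counters.
 * Nothing here performs I/O or communication beyond the forwarded call. */
int example_wrapped_open(const char *path)
{
    int ret;

    if (!example_initialized)
        example_module_init();
    assert(example_initialized);

    ret = example_real_open(path);
    if (ret >= 0)
        example_open_calls++;

    return ret;
}
```

Because the initialization check sits at the top of every wrapper, the module registers itself the first time the application touches the instrumented interface, and later calls pay only for a single branch.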

*NOTE*: Modules should not perform any I/O or communication within wrapper functions. Darshan records
I/O data independently on each application process, then merges the data from all processes when the
job is shutting down. This defers expensive I/O and communication operations to the shutdown process,
minimizing Darshan's impact on application I/O performance.

When the instrumented application terminates and Darshan begins its shutdown procedure, it requires
a way to interface with any active modules that have data to contribute to the output I/O characterization.
The following function is implemented by each module to finalize (and perhaps reorganize) module records
before returning the record memory back to darshan-core to be compressed and written to file.

[source,c]
typedef void (*darshan_module_shutdown)(
    MPI_Comm mod_comm,
    darshan_record_id *shared_recs,
    int shared_rec_count,
    void** mod_buf,
    int* mod_buf_sz
);

This function can be used to run collective MPI operations on module data; for instance, Darshan
typically tries to reduce file records which are shared across all application processes into a
single data record (more details on the shared file reduction mechanism are given in
link:darshan-modularization.html#_shared_record_reductions[Section 5]). This function also serves
as a final opportunity for modules to clean up and free any allocated data structures.

* _mod_comm_ is the MPI communicator to use for collective communication

* _shared_recs_ is a list of Darshan record identifiers that are shared across all application
processes

* _shared_rec_count_ is the size of the shared record list

* _mod_buf_ is a pointer to the buffer address of the module's contiguous set of data records

* _mod_buf_sz_ is a pointer to a variable storing the aggregate size of the module's records. On
input, the pointed-to value indicates the aggregate size of the module's registered records; on
output, the value may be updated if, for instance, certain records are discarded.
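
As a rough sketch of how a shutdown function might use the in/out size argument, the example below discards records that saw no activity and reports the reduced size. It assumes stand-in types so that it compiles without MPI or Darshan headers: `example_comm` plays the role of `MPI_Comm`, `example_record_id` plays the role of `darshan_record_id`, and `struct example_record` is a hypothetical module record.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins so the sketch compiles without MPI or Darshan headers. */
typedef int example_comm;             /* plays the role of MPI_Comm */
typedef uint64_t example_record_id;   /* plays the role of darshan_record_id */

/* Hypothetical fixed-size module record. */
struct example_record {
    example_record_id id;
    int64_t ops;   /* number of operations observed for this record */
};

/* Shutdown sketch: drop records that saw no operations, keep the rest
 * contiguous at the front of the buffer, and report the new size. */
void example_module_shutdown(example_comm mod_comm,
                             example_record_id *shared_recs,
                             int shared_rec_count,
                             void **mod_buf,
                             int *mod_buf_sz)
{
    struct example_record *recs = (struct example_record *)*mod_buf;
    int count = *mod_buf_sz / (int)sizeof(*recs);
    int kept = 0;
    int i;

    /* Unused here; a real module would use these arguments for the
     * shared record reduction. */
    (void)mod_comm;
    (void)shared_recs;
    (void)shared_rec_count;

    for (i = 0; i < count; i++) {
        if (recs[i].ops > 0) {
            if (kept != i)
                recs[kept] = recs[i];
            kept++;
        }
    }

    assert(kept <= count);
    *mod_buf_sz = kept * (int)sizeof(*recs);
}
```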

==== darshan-core

Within darshan-runtime, the darshan-core component manages the initialization and shutdown of the
Darshan environment, provides an interface for modules to register themselves and their data
records with Darshan, and manages the compressing and the writing of the resultant I/O
characterization. As illustrated in Figure 1, the darshan-core runtime environment intercepts
`MPI_Init` and `MPI_Finalize` routines to initialize and shut down the Darshan runtime environment,
respectively.

Each of the functions provided by `darshan-core` to interface with instrumentation modules is
described in detail below.

[source,c]
void darshan_core_register_module(
    darshan_module_id mod_id,
    darshan_module_shutdown mod_shutdown_func,
    int *inout_mod_buf_size,
    int *rank,
    int *sys_mem_alignment);

The `darshan_core_register_module` function registers Darshan instrumentation modules with the
`darshan-core` runtime environment. This function needs to be called once for any module that
will contribute data to Darshan's final I/O characterization. 

* _mod_id_ is a unique identifier for the given module, which is defined in the Darshan log
format header file (`darshan-log-format.h`).

* _mod_shutdown_func_ is the function pointer to the module shutdown function described in the
previous section.

* _inout_mod_buf_size_ is an input/output argument that stores the amount of module memory
being requested when calling the function and the amount of memory actually reserved by
darshan-core when returning.

* _rank_ is a pointer to an integer to store the calling process's application MPI rank in.
`NULL` may be passed in to ignore this value.

* _sys_mem_alignment_ is a pointer to an integer which will store the system memory alignment value
Darshan was configured with. `NULL` may be passed in to ignore this value.

[source,c]
void darshan_core_unregister_module(
    darshan_module_id mod_id);

The `darshan_core_unregister_module` function disassociates the given module from the
`darshan-core` runtime. Consequently, Darshan does not interface with the given module at
shutdown time and will not log any I/O data from the module. This function should only be used
if a module registers itself with darshan-core but later decides it does not want to contribute
any I/O data. Note that, in the current implementation, Darshan does not have the ability to
reclaim the record memory allocated to the calling module to assign to other modules.

* _mod_id_ is the unique identifier for the module being unregistered.

[source,c]
darshan_record_id darshan_core_gen_record_id(
    const char *name);

The `darshan_core_gen_record_id` function simply generates a unique record identifier for a
given record name. This function is generally called to convert a name string to a unique record
identifier that is needed to register a data record with darshan-core. The generation of IDs
is consistent, such that modules which reference records with the same names will store these
records using the same unique IDs, simplifying the correlation of these records for analysis.

* _name_ is the name of the corresponding data record (often this is just a file name).
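
To illustrate why consistent ID generation makes records easy to correlate, the sketch below derives a 64-bit identifier from a name string with the FNV-1a hash. FNV-1a is a swapped-in choice for illustration only, since this document does not describe the hash Darshan actually uses. `example_gen_record_id` and `example_record_id` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t example_record_id;   /* stand-in for darshan_record_id */

/* FNV-1a over the bytes of the name: the same name always yields the
 * same identifier, regardless of which module asks for it. */
example_record_id example_gen_record_id(const char *name)
{
    example_record_id hash = 0xcbf29ce484222325ULL;   /* FNV offset basis */
    const unsigned char *p = (const unsigned char *)name;

    assert(name != NULL);
    for (; *p != '\0'; p++) {
        hash ^= (example_record_id)*p;
        hash *= 0x100000001b3ULL;                      /* FNV prime */
    }
    return hash;
}
```

Two modules that both pass `"/scratch/out.dat"` to such a function obtain the same identifier, so an analysis tool can match their records without comparing name strings.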

[source,c]
void *darshan_core_register_record(
    darshan_record_id rec_id,
    const char *name,
    darshan_module_id mod_id,
    int rec_len,
    struct darshan_fs_info *fs_info);

The `darshan_core_register_record` function registers a data record with the darshan-core
runtime, allocating memory for the record so that it is persisted in the output log file.
This record could reference a POSIX file or perhaps an object identifier for an
object storage system, for instance. This function should only be called once for each
record being tracked by a module to avoid duplicating record memory. This function returns
the address which the record should be stored at or `NULL` if there is insufficient
memory for storing the record.

* _rec_id_ is a unique integer identifier for this record (generally generated using the
`darshan_core_gen_record_id` function).

* _name_ is the string name of the data record, which could be a file path, object ID, etc.
If given, darshan-core will associate the given name with the record identifier and store
this mapping in the log file so it can be retrieved for analysis. `NULL` may be passed in
to generate an anonymous (unnamed) record.

* _mod_id_ is the identifier for the module attempting to register this record.

* _rec_len_ is the length of the record.

* _fs_info_ is a pointer to a structure of relevant info for the file system associated
with the given record -- this structure is defined in the `darshan.h` header. Note that this
functionality only works for record names that are absolute file paths, since we determine
the file system by matching the file path to the list of mount points Darshan is aware of.
`NULL` may be passed in to ignore this value.

[source,c]
double darshan_core_wtime(void);

The `darshan_core_wtime` function simply returns a floating point number of seconds since
Darshan was initialized. This functionality can be used to time the duration of application
I/O calls or to store timestamps of when functions of interest were called.

[source,c]
int darshan_core_excluded_path(
    const char *path);

The `darshan_core_excluded_path` function checks to see if a given file path is in Darshan's
list of excluded file paths (i.e., paths that we don't instrument I/O to/from, such as /etc,
/dev, /usr, etc.).

* _path_ is the absolute file path we are checking.
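
A check of this kind can be pictured as a prefix match against a fixed list. The sketch below is an assumption about the general approach, not Darshan's implementation: `example_exclusions` and `example_excluded_path` are hypothetical, and the prefixes simply mirror the examples named above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical exclusion list; the prefixes mirror the examples in the text. */
static const char *example_exclusions[] = { "/etc/", "/dev/", "/usr/", NULL };

/* Returns 1 if path starts with an excluded prefix, 0 otherwise. */
int example_excluded_path(const char *path)
{
    int i;

    assert(path != NULL);
    for (i = 0; example_exclusions[i] != NULL; i++) {
        size_t len = strlen(example_exclusions[i]);
        if (strncmp(path, example_exclusions[i], len) == 0)
            return 1;
    }
    return 0;
}
```

Keeping the trailing slash in each prefix prevents a path such as `/usrdata/x` from being mistaken for something under `/usr`.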

==== darshan-common

`darshan-common` is a utility component of darshan-runtime, providing module developers with
general functions that are likely to be reused across multiple modules. These functions are
distinct from darshan-core functions since they do not require access to internal Darshan
state.

[source,c]
char* darshan_clean_file_path(
    const char* path);

The `darshan_clean_file_path` function just cleans up the input path string, converting
relative paths to absolute paths and suppressing any potential noise within the string.
The address of the new string is returned and should be freed by the user.

* _path_ is the input path string to be cleaned up.

`darshan-common` also currently includes functions for maintaining counters that store
common I/O values (such as common I/O access sizes or strides used by an application),
as well as functions for calculating the variance of a given counter across all processes.
As more modules are contributed, it is likely that more functionality can be refactored out
of module implementations and maintained in darshan-common, facilitating code reuse and
simplifying maintenance.

=== Darshan-util

The darshan-util component is composed of a helper library for accessing log file data
records (`libdarshan-util`) and a set of utilities that use this library to analyze
application I/O behavior. `libdarshan-util` includes a generic interface (`darshan-logutils`)
for retrieving specific components of a given log file. Specifically, this interface allows
utilities to retrieve a log's header metadata, job details, record ID to name mapping, and
any module-specific data contained within the log.

`libdarshan-util` additionally includes the definition of a generic module interface (`darshan-mod-logutils`)
that may be implemented by modules to provide a consistent way for Darshan utilities to interact
with module data stored in log files. This interface is necessary since each module has records
of varying size and format, so module-specific code is needed to interact with the records in a
generic manner. This interface is used by the `darshan-parser` utility, for instance, to extract
data records from all modules contained in a log file and to print these records in a consistent
format that is amenable to further analysis by other tools.

==== darshan-logutils

Here we define each function in the `darshan-logutils` interface, which can be used to create
new log utilities and to implement module-specific interfaces into log files.

[source,c]
darshan_fd darshan_log_open(const char *name);

Opens Darshan log file stored at path `name`. The log file must already exist and is opened
for reading only. As part of the open routine, the log file header is read to set internal
file descriptor data structures. Returns a Darshan file descriptor on success or `NULL` on error.

[source,c]
darshan_fd darshan_log_create(const char *name, enum darshan_comp_type comp_type, int partial_flag);

Creates a new Darshan log file for writing only at path `name`. `comp_type` denotes the underlying
compression type used on the log file (currently either libz or bzip2) and `partial_flag`
denotes whether the log is storing partial data (that is, not all possible application file records
were tracked by Darshan). Returns a Darshan file descriptor on success or `NULL` on error.

[source,c]
int darshan_log_get_job(darshan_fd fd, struct darshan_job *job);
int darshan_log_put_job(darshan_fd fd, struct darshan_job *job);

Reads/writes `job` structure from/to the log file referenced by descriptor `fd`. The `darshan_job`
structure is defined in `darshan-log-format.h`. Returns `0` on success, `-1` on failure.

[source,c]
int darshan_log_get_exe(darshan_fd fd, char *buf);
int darshan_log_put_exe(darshan_fd fd, char *buf);

Reads/writes the corresponding executable string (exe name and command line arguments)
from/to the Darshan log referenced by `fd`. Returns `0` on success, `-1` on failure.

[source,c]
int darshan_log_get_mounts(darshan_fd fd, char*** mnt_pts, char*** fs_types, int* count);
int darshan_log_put_mounts(darshan_fd fd, char** mnt_pts, char** fs_types, int count);

Reads/writes mounted file system information for the Darshan log referenced by `fd`. `mnt_pts` points
to an array of strings storing mount points, `fs_types` points to an array of strings storing file
system types (e.g., ext4, nfs, etc.), and `count` points to an integer storing the total number
of mounted file systems recorded by Darshan. Returns `0` on success, `-1` on failure.

[source,c]
int darshan_log_get_namehash(darshan_fd fd, struct darshan_name_record_ref **hash);
int darshan_log_put_namehash(darshan_fd fd, struct darshan_name_record_ref *hash);

Reads/writes the hash table of Darshan record identifiers to full names for all records
contained in the Darshan log referenced by `fd`. `hash` is a pointer to the hash table (of type
`struct darshan_name_record_ref *`), which should be initialized to `NULL` for reading. This hash table
is defined by the `uthash` hash table implementation and includes corresponding macros for
searching, iterating, and deleting records from the hash. For detailed documentation on using this
hash table, consult `uthash` documentation in `darshan-util/uthash-1.9.2/doc/txt/userguide.txt`.
The `darshan-parser` utility (for parsing module information out of a Darshan log) provides an
example of how this hash table may be used. Returns `0` on success, `-1` on failure.

[source,c]
int darshan_log_get_mod(darshan_fd fd, darshan_module_id mod_id, void *mod_buf, int mod_buf_sz);
int darshan_log_put_mod(darshan_fd fd, darshan_module_id mod_id, void *mod_buf, int mod_buf_sz, int ver);

Reads/writes a chunk of (uncompressed) module data for the module identified by `mod_id` from/to
the Darshan log referenced by `fd`. `mod_buf` is the buffer to read data into or write data from,
and `mod_buf_sz` is the corresponding size of the buffer. The `darshan_log_get_mod` routine can be
repeatedly called to retrieve chunks of uncompressed data from a specific module region of the
log file given by `fd`. The `darshan_log_put_mod` routine just continually appends data to a
specific module region in the log file given by `fd` and accepts an additional `ver` parameter
indicating the version number for the module data records being written. These functions return
the number of bytes read/written on success, `-1` on failure.

*NOTE*: Darshan uses a "reader makes right" conversion strategy to rectify endianness issues
between the machine a log was generated on and a machine analyzing the log. Accordingly,
module-specific log utility functions will need to check the `swap_flag` variable of the Darshan
file descriptor to determine if byte swapping is necessary. 32-bit and 64-bit byte swapping
macros (DARSHAN_BSWAP32/DARSHAN_BSWAP64) are provided in `darshan-logutils.h`.
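
The "reader makes right" check can be sketched as follows. Everything in this example is hypothetical: `example_bswap64` is a hand-written swap rather than Darshan's macro, `struct example_swap_record` is an invented record layout, and `swap_flag` is passed as a plain integer instead of being read from a Darshan file descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit byte swap; Darshan supplies its own macros for this. */
uint64_t example_bswap64(uint64_t x)
{
    x = ((x & 0x00000000FFFFFFFFULL) << 32) | ((x & 0xFFFFFFFF00000000ULL) >> 32);
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x & 0xFFFF0000FFFF0000ULL) >> 16);
    x = ((x & 0x00FF00FF00FF00FFULL) << 8)  | ((x & 0xFF00FF00FF00FF00ULL) >> 8);
    return x;
}

/* Hypothetical record with two 64-bit fields. */
struct example_swap_record {
    uint64_t id;
    uint64_t bytes_written;
};

/* Swap every field only when the log's byte order differs from the
 * host's, as signalled by swap_flag. */
void example_fix_record(struct example_swap_record *rec, int swap_flag)
{
    assert(rec != NULL);
    if (swap_flag) {
        rec->id = example_bswap64(rec->id);
        rec->bytes_written = example_bswap64(rec->bytes_written);
    }
}
```

Each field has to be swapped individually because a record may mix 32-bit and 64-bit values, which is why the conversion lives in module-specific code rather than in the generic log reader.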

[source,c]
void darshan_log_close(darshan_fd fd);

Closes Darshan file descriptor `fd`. This routine *must* be called for newly created log files,
as it flushes pending writes and writes a corresponding log file header before closing.

*NOTE*: For newly created Darshan log files, care must be taken to write log file data in the
correct order, since the log file write routines simply append data to the log file.
The correct order for writing all log file data to file is: (1) job data, (2) exe string, (3)
mount data, (4) record id -> file name map, (5) each module's data, in increasing order of
module identifiers.

==== darshan-mod-logutils

The `darshan-mod-logutils` interface provides a convenient way to implement new log functionality
across all Darshan instrumentation modules, which can greatly simplify the development
of new Darshan log utilities. These functions are defined in the `darshan_mod_logutil_funcs` structure
in `darshan-logutils.h` -- instrumentation modules simply provide their own implementation of each
function, then utilities can leverage this functionality using the `mod_logutils` array defined in
`darshan-logutils.c`. Descriptions of some of the currently implemented functions are provided below.

[source,c]
int log_get_record(darshan_fd fd, void **buf);
int log_put_record(darshan_fd fd, void *buf);

Reads/writes the module record stored in `buf` to the log referenced by `fd`. Notice that a
size parameter is not needed since the utilities calling this interface will likely not know
the record size -- the module-specific log utility code can determine the corresponding size
before reading/writing the record from/to file.

*NOTE*: `log_get_record` takes a pointer to a buffer address rather than just the buffer address.
If the pointed to address is equal to `NULL`, then record memory should be allocated instead. This
functionality helps optimize memory usage, since utilities often don't know the size of records
being accessed but still must provide a buffer to read them into.
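
The allocate-if-`NULL` convention can be sketched as below. This is a hypothetical illustration rather than a real module implementation: `struct example_rec`, `example_stored`, and `example_log_get_record` are invented names, the record comes from a constant instead of a log file, and the `0`/`-1` return values are an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical module record. */
struct example_rec {
    long id;
    long counter;
};

/* Stand-in for data that would be read from the log file. */
static const struct example_rec example_stored = { 42, 7 };

/* Returns 0 on success, -1 on failure. If *buf is NULL the function
 * allocates the record and the caller must free() it; otherwise the
 * caller's buffer is filled in place. */
int example_log_get_record(void **buf)
{
    assert(buf != NULL);
    if (*buf == NULL) {
        *buf = malloc(sizeof(struct example_rec));
        if (*buf == NULL)
            return -1;
    }
    memcpy(*buf, &example_stored, sizeof(struct example_rec));
    return 0;
}
```

A utility that processes many records can pass the same pointer on every call, so the buffer is allocated once on the first read and reused afterwards.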

[source,c]
void log_print_record(void *rec, char *name, char *mnt_pt, char *fs_type);

Prints all data associated with the record pointed to by `rec`. `name` holds the corresponding name
string for this record. `mnt_pt` and `fs_type` hold the corresponding mount point path and file
system type strings associated with the record (only valid for records with names that are absolute
file paths).

[source,c]
void log_print_description(int ver);

Prints a description of the data stored within records for this module (with version number `ver`).

== Adding new instrumentation modules

In this section we outline each step necessary for adding a module to Darshan. To assist module
developers, we have provided the example "NULL" module as part of the Darshan source tree
(`darshan-null-log-format.h`, `darshan-runtime/lib/darshan-null.c`, and
`darshan-util/darshan-null-logutils.*`). This example can be used as a minimal stubbed out module
implementation that is heavily annotated to further clarify how modules interact with Darshan
and to provide best practices to future module developers. For full-fledged module implementation
examples, developers are encouraged to examine the POSIX and MPI-IO modules.

=== Log format headers

The following modifications to Darshan log format headers are required for defining
the module's record structure:

* Add a module identifier to the `DARSHAN_MODULE_IDS` macro at the top of the `darshan-log-format.h`
header. In this macro, the first field is a corresponding enum value that can be used to
identify the module, the second field is a string name for the module, the third field is the
current version number of the given module's log format, and the fourth field is a corresponding
pointer to a Darshan log utility implementation for this module (which can be set to `NULL`
until the module has its own log utility implementation). 

* Add a top-level header that defines an I/O data record structure for the module. Consider
the "NULL" module and POSIX module log format headers for examples (`darshan-null-log-format.h`
and `darshan-posix-log-format.h`, respectively).

These log format headers are defined at the top level of the Darshan source tree, since both the
darshan-runtime and darshan-util repositories depend on their definitions.

=== Darshan-runtime

==== Build modifications

The following modifications to the darshan-runtime build system are necessary to integrate
new instrumentation modules:

* Necessary linker flags for inserting this module's wrapper functions need to be added to a
module-specific file which is used when linking applications with Darshan. For an example,
consider `darshan-runtime/share/ld-opts/darshan-posix-ld-opts`, the required linker options for the POSIX
module. The base linker options file for Darshan (`darshan-runtime/share/ld-opts/darshan-base-ld-opts.in`)
must also be updated to point to the new module-specific linker options file.

* Targets must be added to `Makefile.in` to build static and shared objects for the module's
source files, which will be stored in the `darshan-runtime/lib/` directory. The prerequisites
to building static and dynamic versions of `libdarshan` must be updated to include these objects,
as well.
    - If the module defines a linker options file, a rule must also be added to install this
      file with libdarshan.

==== Instrumentation module implementation

In addition to the development notes from above and the exemplar "NULL" and POSIX modules, we
provide the following notes to assist module developers:

* Modules only need to include the `darshan.h` header to interface with darshan-core.

* The file record identifier given when registering a record with darshan-core should be used
to store the record structure in a hash table or some other structure.
    - Subsequent calls that need to modify this record can then use the corresponding record
    identifier to look up the record in this local hash table.
    - It may be necessary to maintain a separate hash table for other handles which the module
    may use to refer to a given record. For instance, the POSIX module may need to look up a
    file record based on a given file descriptor, rather than a path name.
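
The two lookup paths described above can be sketched with a small fixed-size table keyed by record identifier and a second array keyed by file descriptor. All names and sizes here are hypothetical, the table uses simple linear probing rather than a general-purpose hash table library, and the table owns the record memory for simplicity, whereas a real module would store its records at the address returned by `darshan_core_register_record`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_MAX_RECS 64   /* hypothetical capacity, a power of two */
#define EXAMPLE_MAX_FDS  64   /* hypothetical descriptor limit */

struct example_file_rec {
    uint64_t id;      /* record identifier; 0 marks an empty slot here */
    int64_t  opens;   /* hypothetical counter */
};

struct example_file_rec example_by_id[EXAMPLE_MAX_RECS];
struct example_file_rec *example_by_fd[EXAMPLE_MAX_FDS];

/* Find the record for id, creating it if absent. Returns NULL when full. */
struct example_file_rec *example_lookup_id(uint64_t id)
{
    size_t start = (size_t)(id & (EXAMPLE_MAX_RECS - 1));
    size_t i;

    assert(id != 0);
    for (i = 0; i < EXAMPLE_MAX_RECS; i++) {
        struct example_file_rec *slot =
            &example_by_id[(start + i) & (EXAMPLE_MAX_RECS - 1)];
        if (slot->id == id)
            return slot;
        if (slot->id == 0) {
            slot->id = id;
            return slot;
        }
    }
    return NULL;
}

/* Remember which record an open file descriptor refers to. */
void example_bind_fd(int fd, struct example_file_rec *rec)
{
    if (fd >= 0 && fd < EXAMPLE_MAX_FDS)
        example_by_fd[fd] = rec;
}

/* Look a record up by descriptor, e.g. from a write() wrapper. */
struct example_file_rec *example_lookup_fd(int fd)
{
    if (fd < 0 || fd >= EXAMPLE_MAX_FDS)
        return NULL;
    return example_by_fd[fd];
}
```

An `open()` wrapper would call `example_lookup_id` with the identifier generated from the path and then `example_bind_fd` with the returned descriptor, so that later `read()` and `write()` wrappers, which see only the descriptor, can still reach the same record.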

=== Darshan-util

==== Build modifications

The following modifications to the darshan-util build system are necessary to integrate new
instrumentation modules:

* Update `Makefile.in` with new targets necessary for building module-specific logutil source.
    - Make sure to add the module's logutil implementation objects as a prerequisite for
building `libdarshan-util`. 
    - Make sure to update `all`, `clean`, and `install` rules to reference updates.

==== Module-specific logutils and utilities

For a straightforward reference implementation of module-specific log utility functions,
consider the implementations for the NULL module (`darshan-util/darshan-null-logutils.*`)
and the POSIX module (`darshan-util/darshan-posix-logutils.*`). These module-specific log
utility implementations are built on top of the `darshan_log_get_mod()` and `darshan_log_put_mod()`
functions, and are used to read/write complete module records from/to file.

Also, consider the `darshan-parser` source code for an example of a utility which can leverage
`libdarshan-util` for analyzing the contents of a Darshan I/O characterization log with data
from arbitrary instrumentation modules.

== Shared record reductions

Since Darshan prefers to aggregate data records which are shared across all processes into a single
data record, module developers should consider implementing this functionality eventually, though it
is not strictly required. 

Module developers should implement the shared record reduction mechanism within the module's
`darshan_module_shutdown()` function, as it provides an MPI communicator for the module to use
for collective communication and a list of record identifiers which are shared globally by the
module (as described in link:darshan-modularization.html#_darshan_runtime[Section 3.1]).

In general, implementing a shared record reduction involves the following steps:

* reorganizing shared records into a contiguous region in the buffer of module records

* allocating a record buffer to store the reduction output on application rank 0

* creating an MPI reduction operation using the `MPI_Op_create()` function (see more
http://www.mpich.org/static/docs/v3.1/www3/MPI_Op_create.html[here])

* reducing all shared records using the created MPI reduction operation and the send
and receive buffers described above
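
The core of the steps above is the user-defined reduction function. The sketch below shows one possible shape for it, under stated assumptions: `struct example_shared_rec` and `example_record_reduce` are hypothetical, and the last parameter is declared as `void *` so the example compiles without MPI headers, whereas an MPI user function takes an `MPI_Datatype *` in that position.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shared-file record. */
struct example_shared_rec {
    uint64_t id;
    int64_t  bytes_written;    /* summed across ranks */
    double   max_write_time;   /* maximum across ranks */
};

/* Combines *len records from invec into inoutvec, element by element.
 * The argument list follows the shape of an MPI user function, except
 * that the last parameter would be an MPI_Datatype pointer under MPI. */
void example_record_reduce(void *invec, void *inoutvec, int *len, void *datatype)
{
    struct example_shared_rec *in = (struct example_shared_rec *)invec;
    struct example_shared_rec *inout = (struct example_shared_rec *)inoutvec;
    int i;

    (void)datatype;
    for (i = 0; i < *len; i++) {
        /* Both buffers must hold the shared records in the same order. */
        assert(in[i].id == inout[i].id);
        inout[i].bytes_written += in[i].bytes_written;
        if (in[i].max_write_time > inout[i].max_write_time)
            inout[i].max_write_time = in[i].max_write_time;
    }
}
```

With MPI available, a function of this shape would be passed to `MPI_Op_create()`, and the resulting operation would be used in a reduction with rank 0 as the root. The ordering requirement is the reason shared records are first reorganized into a contiguous region of the module's buffer.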

For a more in-depth example of how to use the shared record reduction mechanism, consider
the implementations of this in the POSIX or MPI-IO modules.

== Other resources

* https://xgitlab.cels.anl.gov/darshan/darshan[Darshan GitLab page]
* http://www.mcs.anl.gov/research/projects/darshan/[Darshan project website]
* http://www.mcs.anl.gov/research/projects/darshan/docs/darshan-runtime.html[darshan-runtime documentation]
* http://www.mcs.anl.gov/research/projects/darshan/docs/darshan-util.html[darshan-util documentation]