Commit 45111402 authored by Shane Snyder

updated dev-modular docs

parent 5ecf4013
:sectlinks:
Darshan modularization branch development notes
===============================================
@@ -47,8 +49,6 @@ applications and generating I/O characterization logs.
* *darshan-util*: Darshan utilities for analyzing the contents of a given Darshan
I/O characterization log.

The following subsections provide an overview of each of these components with specific
attention to how new instrumentation modules may be integrated into Darshan.
@@ -64,21 +64,32 @@ The primary responsibilities of the darshan-runtime component are:
* logging the compressed I/O characterization to file for future evaluation

The first two responsibilities are the burden of module developers, while the last two are handled
automatically by Darshan.

In general, instrumentation modules are composed of:

* wrapper functions for intercepting I/O functions;
* internal functions for initializing and maintaining internal data structures and module-specific
I/O characterization data;
* a set of functions for interfacing with the Darshan runtime environment, including an optional
reduction operation to condense I/O data shared on all processes into a single data record.
==== Instrumentation modules

The wrapper functions used to intercept I/O function calls of interest are central to the design of
any Darshan instrumentation module. These wrappers are used to extract pertinent I/O data from
the function call and persist this data in some state structure maintained by the module. Modules
must bootstrap themselves by initializing internal data structures within wrapper functions. The
wrappers are inserted at compile time for statically linked executables (e.g., using the linker's
`--wrap` mechanism) and at runtime for dynamically linked executables (using `LD_PRELOAD`).

*NOTE*: Modules should not perform any I/O or communication within wrapper functions. Darshan records
I/O data independently on each application process, then merges the data from all processes when the
job is shutting down. This defers expensive I/O and communication operations to the shutdown process,
minimizing Darshan's impact on application I/O performance.
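The bootstrapping rule above (initialize module state from inside the wrappers, and do no I/O or communication on the module's own behalf) can be illustrated with a small sketch. Every name here (`example_runtime`, `example_lazy_init`, `example_real_write`, `example_wrapped_write`) is hypothetical and not part of Darshan; the "real" function is a stub so the sketch compiles on its own, whereas an actual module would forward to the function resolved through `--wrap` or `LD_PRELOAD`. A real module would likely also need to guard this state with a lock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical module state; these names are illustrative, not Darshan's. */
struct example_runtime {
    int64_t write_calls;    /* number of intercepted write calls */
    int64_t bytes_written;  /* total bytes reported by those calls */
};

static struct example_runtime example_state;
static int example_initialized = 0;
static int example_init_count = 0;  /* counts how often init actually ran */

/* Bootstrap the module the first time any wrapper is entered. */
static void example_lazy_init(void)
{
    if (example_initialized)
        return;
    memset(&example_state, 0, sizeof(example_state));
    example_initialized = 1;
    example_init_count++;
}

/* Stand-in for the real I/O function that a wrapper would forward to. */
static long example_real_write(const void *buf, size_t count)
{
    (void)buf;
    return (long)count;
}

/* Wrapper: initialize on demand, forward the call, then update counters.
 * The module itself performs no I/O or communication here. */
long example_wrapped_write(const void *buf, size_t count)
{
    long ret;

    assert(buf != NULL);
    example_lazy_init();
    ret = example_real_write(buf, count);
    if (ret >= 0) {
        example_state.write_calls++;
        example_state.bytes_written += ret;
    }
    return ret;
}
```

Calling `example_wrapped_write()` repeatedly runs the initialization once and only touches in-memory counters afterwards, which is the property the NOTE above asks for.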
When the instrumented application terminates and Darshan begins its shutdown procedure, it requires
a way to interface with any active modules that have data to contribute to the output I/O characterization.
@@ -88,15 +99,15 @@ environment to coordinate with modules while shutting down:
[source,c]
struct darshan_module_funcs
{
    void (*begin_shutdown)(void);
    void (*setup_reduction)(
        darshan_record_id *shared_recs,
        int *shared_rec_count,
        void **send_buf,
        void **recv_buf,
        int *rec_size
    );
    void (*record_reduction_op)(
        void* a,
        void* b,
        int *len,
@@ -109,55 +120,42 @@ struct darshan_module_funcs
    void (*shutdown)(void);
};
`begin_shutdown()`

This function informs the module that Darshan is about to begin shutting down. It should disable
all wrappers to prevent the module from making future updates to internal data structures, primarily
to ensure data consistency and avoid other race conditions. This function also serves as a final
opportunity for a module to modify internal data structures prior to a possible reduction of shared
data.

`setup_reduction()`

An optional feature provided to instrumentation modules is the ability to run reduction operations
on I/O data records which are shared across all application processes (e.g., data records for a
shared file). This reduction is done to minimize the size of the resulting I/O characterization,
by aggregating shared records into a single data record.

This function allows modules to set up internal data structures to run a reduction operation
on data records which are shared across all application processes. Module developers can bypass
the shared record reduction mechanism by setting the `setup_reduction` function pointer equal to `NULL`.
This is helpful in initial prototyping of a module, or in the case where a module would not maintain
I/O data which is shared across all processes.

The shared record reduction mechanism is described in detail
link:darshan-modularization.html#_shared_record_reductions[here].

`record_reduction_op()`

This function implements the actual shared record reduction operation. Module developers can bypass
the shared record reduction mechanism by setting the `record_reduction_op` pointer equal to `NULL`.

The shared record reduction mechanism is described in detail
link:darshan-modularization.html#_shared_record_reductions[here].

`get_output_data()`

This function is responsible for passing back a single buffer storing all data this module is
contributing to the output I/O characterization.

* _buf_ is a pointer to the address of the buffer this module is contributing to the I/O
characterization.
@@ -173,14 +171,15 @@ all internal data structures.
Within darshan-runtime, the darshan-core component manages the initialization and shutdown of the
Darshan environment, provides instrumentation module developers an interface for registering modules
with Darshan, and manages the compressing and the writing of the resultant I/O characterization.
Each of the functions defined by this interface is explained in detail below.

[source,c]
void darshan_core_register_module(
    darshan_module_id mod_id,
    struct darshan_module_funcs *funcs,
    int *mod_mem_limit,
    int *sys_mem_alignment);

The `darshan_core_register_module` function registers Darshan instrumentation modules with the
darshan-core runtime environment. This function needs to be called at least once for any module
@@ -192,11 +191,15 @@ format header file (darshan-log-format.h).
* _funcs_ is the structure of function pointers (as described above) that a module developer must
provide to interface with the darshan-core runtime.
* _mod_mem_limit_ is a pointer to an integer which will store the amount of memory Darshan
allows this module to use at runtime. Currently, darshan-core will hardcode this value to 2 MiB,
but in the future this may be changed to optimize Darshan's memory footprint. Note that Darshan
does not allocate any memory for modules; it just informs a module how much memory it can use.
* _sys_mem_alignment_ is a pointer to an integer which will store the system memory alignment value
Darshan was configured with. This parameter may be set to `NULL` if a module is not concerned with the
memory alignment value.
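Because Darshan only reports the memory limit and leaves allocation to the module, a module has to turn that limit into a record budget itself. The sketch below shows one way to do that arithmetic; `example_mem_record` and `example_max_records` are hypothetical names, and the record layout is an assumption made only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-record structure a module might track at runtime. */
struct example_mem_record {
    uint64_t id;
    int64_t counters[4];
};

/* Number of whole records of size rec_size that fit within the limit
 * reported through mod_mem_limit (a byte count). */
size_t example_max_records(int mod_mem_limit, size_t rec_size)
{
    assert(rec_size > 0);
    if (mod_mem_limit <= 0)
        return 0;
    return (size_t)mod_mem_limit / rec_size;
}
```

With the 2 MiB value mentioned above and a 40-byte record, this budget works out to 52428 records; the module would stop creating new records once that count is reached.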
[source,c]
void darshan_core_unregister_module(
    darshan_module_id mod_id);
@@ -215,7 +218,8 @@ void darshan_core_register_record(
    int len,
    int printable_flag,
    darshan_module_id mod_id,
    darshan_record_id *rec_id,
    int *file_alignment);

The `darshan_core_register_record` function registers some data record with the darshan-core
runtime. This record could reference a POSIX file or perhaps an object identifier for an
@@ -240,12 +244,16 @@ is the size of the record name type.
* _rec_id_ is a pointer to a variable which will store the unique record identifier generated
by Darshan.
* _file_alignment_ is a pointer to an integer which will store the file alignment (block size)
of the underlying storage system. This parameter may be set to `NULL` if it is not applicable to a
given module.

[source,c]
void darshan_core_unregister_record(
    darshan_record_id rec_id,
    darshan_module_id mod_id);

The `darshan_core_unregister_record` function disassociates the given module identifier from the
given record identifier. If no other modules are associated with the given record identifier, then
Darshan removes all internal references to the record. This function should only be used if a
module registers a record with darshan-core, but later decides not to store the record internally.
@@ -367,7 +375,13 @@ Close Darshan file descriptor `fd`. Returns `0` on success, `-1` on failure.
== Adding new instrumentation modules

In this section we outline each step necessary for adding a module to Darshan. To assist module
developers, we have provided the example "NULL" module (`darshan-runtime/lib/darshan-null.c`)
as part of the darshan-runtime source. This example can be used as a minimal stubbed-out module
implementation. It is also heavily annotated to document more specific functionality provided
by Darshan to module developers. For a full-fledged implementation of a module, developers
can examine the POSIX module (`darshan-runtime/lib/darshan-posix.c`), which wraps and instruments
a number of POSIX I/O functions.
=== Log format headers

@@ -377,8 +391,12 @@ the module's record structure:
* Add module identifier to darshan_module_id enum and add module string name to the
darshan_module_name array in `darshan-log-format.h`.
* Add a top-level header that defines an I/O data record structure for the module. Consider
the "NULL" module and POSIX module log format headers for examples (`darshan-null-log-format.h`
and `darshan-posix-log-format.h`, respectively).

These log format headers are defined at the top level of the Darshan source tree, since both the
darshan-runtime and darshan-util repositories depend on them.
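As a rough illustration of what such a record structure might contain, the sketch below defines a counter index enum and a record that stores the record identifier, a rank, and an array of counters. All names (`example_record_id`, `example_counter_index`, `example_file_record`, `example_record_init`) are hypothetical, and the layout is an assumption rather than a copy of either shipped header; consult `darshan-null-log-format.h` and `darshan-posix-log-format.h` for the real definitions. Fixed-width integer types are used on the assumption that the same structure is read back by darshan-util, possibly on a different machine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for darshan_record_id so the sketch compiles on its own. */
typedef uint64_t example_record_id;

/* Indices into the counter array; the last entry gives the array length. */
enum example_counter_index {
    EXAMPLE_OPENS = 0,
    EXAMPLE_WRITES,
    EXAMPLE_BYTES_WRITTEN,
    EXAMPLE_NUM_INDICES  /* must remain last */
};

/* Hypothetical I/O data record written to the log for each file. */
struct example_file_record {
    example_record_id rec_id;               /* identifier of the record */
    int64_t rank;                           /* process that owns the record */
    int64_t counters[EXAMPLE_NUM_INDICES];  /* integer I/O statistics */
};

/* Zero a record and stamp it with its identifier and owning rank. */
void example_record_init(struct example_file_record *rec,
                         example_record_id id, int64_t rank)
{
    assert(rec != NULL);
    memset(rec, 0, sizeof(*rec));
    rec->rec_id = id;
    rec->rank = rank;
}
```

Keeping the counters in an enum-indexed array means a new statistic only requires a new enum entry before `EXAMPLE_NUM_INDICES`.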
=== Darshan-runtime

@@ -387,17 +405,22 @@ log header for the POSIX instrumentation module is given in `darshan-posix-log-f
The following modifications to the darshan-runtime build system are necessary to integrate
new instrumentation modules:

* Necessary linker flags for wrapping this module's functions need to be added to a
module-specific file which is used when linking applications with Darshan. For an example,
consider `darshan-runtime/darshan-posix-ld-opts`, the required linker options for the POSIX
module. The base linker options file for Darshan (`darshan-runtime/darshan-base-ld-opts.in`)
must also be updated to point to the new module-specific linker options file.
* Targets must be added to `Makefile.in` to build static and shared objects for the module's
source files, which will be stored in the `darshan-runtime/lib/` directory. The prerequisites
to building static and dynamic versions of `libdarshan` must be updated to include these objects,
as well.
- If the module defines a linker options file, a target must also be added to install this
file with libdarshan.
==== Instrumentation module implementation

In addition to the development notes from above and the exemplar "NULL" and POSIX modules, we
provide the following notes to assist module developers:

* Modules only need to include the `darshan.h` header to interface with darshan-core.
@@ -437,6 +460,70 @@ the POSIX module, consider files `darshan-posix-logutils.c` and `darshan-posix-l
Also, the `darshan-posix-parser` source provides a simple example of a utility which can leverage
libdarshan-util for analyzing the contents of a given Darshan I/O characterization log.
== Shared record reductions

Since Darshan prefers to aggregate data records which are shared across all processes into a single
data record, module developers should consider implementing this functionality eventually, though it
is not strictly required.

As mentioned previously, module developers must provide implementations for the `setup_reduction()`
and `record_reduction_op()` functions in the `darshan_module_funcs` structure to leverage Darshan's
shared record reduction mechanism. These functions are described in detail as follows:
[source,c]
void (*setup_reduction)(
    darshan_record_id *shared_recs,
    int *shared_rec_count,
    void **send_buf,
    void **recv_buf,
    int *rec_size
);

This function is used to prepare a module for performing a reduction operation. In general, this
just involves providing the input buffers to the reduction, and (on rank 0 only) providing output
buffer space to store the result of the reduction.

* _shared_recs_ is a set of Darshan record identifiers which are associated with this module.
These are the records which need to be reduced into single shared data records.
* _shared_rec_count_ is a pointer to an integer storing the number of shared records that will
be reduced by this module. When the function is called, this variable points to the number
of shared records detected by Darshan, but the module can decide not to reduce any number
of these records. Upon completion of the function, this variable should point to the number
of shared records to perform reductions on (i.e., the size of the input and output buffers).
* _send_buf_ is a pointer to the address of the send buffer used for performing the reduction
operation. Upon completion, this variable should point to a buffer containing `*shared_rec_count`
records that will be reduced.
* _recv_buf_ is a pointer to the address of the receive buffer used for performing the reduction
operation. Upon completion, this variable should point to a buffer large enough to hold
`*shared_rec_count` reduced records. This variable is only valid on the root process (rank 0). This
buffer address needs to be stored with module state, as it will be needed when retrieving
the final output buffers from this module.
* _rec_size_ is a pointer to an integer which will store the size of the record structure being
reduced for this module.
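The parameter contract above can be sketched as follows. This is a minimal illustration under several assumptions, not the POSIX module's implementation: `darshan_record_id` is typedef'd locally as a 64-bit integer so the code compiles without Darshan headers, and `example_shared_record`, `example_recs`, `example_rank`, and `example_saved_recv_buf` are hypothetical module state. Error handling is simplified; in a real MPI job every process would have to agree on the final record count.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: local stand-in so the sketch builds without Darshan headers. */
typedef uint64_t darshan_record_id;

/* Hypothetical record type and module state. */
struct example_shared_record {
    darshan_record_id id;
    int64_t bytes;
};

#define EXAMPLE_MAX_RECS 8
static struct example_shared_record example_recs[EXAMPLE_MAX_RECS];
static int example_rec_count = 0;
static int example_rank = 0;                 /* this process's rank */
static void *example_saved_recv_buf = NULL;  /* kept for get_output_data() */

void example_setup_reduction(darshan_record_id *shared_recs,
                             int *shared_rec_count,
                             void **send_buf, void **recv_buf, int *rec_size)
{
    struct example_shared_record *out;
    int i, j, n = 0;

    assert(shared_recs && shared_rec_count && send_buf && recv_buf && rec_size);

    *send_buf = NULL;
    *recv_buf = NULL;
    *rec_size = (int)sizeof(struct example_shared_record);
    if (*shared_rec_count <= 0) {
        *shared_rec_count = 0;
        return;
    }

    /* Gather the shared records into one contiguous send buffer. */
    out = calloc((size_t)*shared_rec_count, sizeof(*out));
    if (out == NULL) {
        *shared_rec_count = 0;
        return;
    }
    for (i = 0; i < *shared_rec_count; i++) {
        for (j = 0; j < example_rec_count; j++) {
            if (example_recs[j].id == shared_recs[i]) {
                out[n++] = example_recs[j];
                break;
            }
        }
    }
    if (n == 0) {
        free(out);
        *shared_rec_count = 0;
        return;
    }

    /* Only rank 0 provides space for the reduced result, and remembers it. */
    if (example_rank == 0) {
        example_saved_recv_buf = calloc((size_t)n, sizeof(*out));
        if (example_saved_recv_buf == NULL) {
            free(out);
            *shared_rec_count = 0;
            return;
        }
        *recv_buf = example_saved_recv_buf;
    }

    *send_buf = out;
    *shared_rec_count = n;
}
```

The send buffer is ordered to match `shared_recs`, so element `i` on every process refers to the same record when the reduction operator is applied.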
[source,c]
void (*record_reduction_op)(
    void* a,
    void* b,
    int *len,
    MPI_Datatype *datatype
);

This is the function which performs the actual shared record reduction operation. The prototype
of this function matches that of the user function provided to the `MPI_Op_create` function. Refer
to the http://www.mpich.org/static/docs/v3.1/www3/MPI_Op_create.html[documentation] for further
details.
Note that a module will likely need to clean up its internal state after a reduction to get
all data records into a contiguous buffer, as Darshan requires. This can be done within the
`get_output_data()` function.

Module developers can examine the POSIX module for a comprehensive (and monolithic) implementation
of the shared record reduction functionality.
== Other resources

* http://www.mcs.anl.gov/research/projects/darshan/[Darshan website]
......