From 89437264c961eb7ba3b8dccb63d1c6c4c55786dd Mon Sep 17 00:00:00 2001 From: Shane Snyder Date: Fri, 30 Sep 2016 11:31:29 -0500 Subject: [PATCH] update documentation on instrumentation modules --- doc/darshan-modularization.txt | 306 ++++++++++++++++----------------- 1 file changed, 153 insertions(+), 153 deletions(-) diff --git a/doc/darshan-modularization.txt b/doc/darshan-modularization.txt index 023a496..2261768 100644 --- a/doc/darshan-modularization.txt +++ b/doc/darshan-modularization.txt @@ -1,7 +1,7 @@ :data-uri: -Darshan modularization branch development notes -=============================================== +Modularized I/O characterization using Darshan 3.x +================================================== == Introduction @@ -16,30 +16,7 @@ modules, which are responsible for gathering I/O data from a specific system com manage these modules at runtime and create a valid Darshan log regardless of how many or what types of modules are used. -== Checking out and building the modularization branch - -The Darshan source code is available at the following GitLab project page: -https://xgitlab.cels.anl.gov/darshan/darshan. It is worth noting that this page -also provides issue tracking to provide users the ability to browse known issues -with the code or to report new issues. - -The following commands can be used to clone the Darshan source code and checkout -the modularization branch: - ----- -git clone git@xgitlab.cels.anl.gov:darshan/darshan.git -cd darshan -git checkout dev-modular ----- - -For details on configuring, building, and using the Darshan runtime and utility -repositories, consult the documentation from previous versions -(http://www.mcs.anl.gov/research/projects/darshan/docs/darshan-runtime.html[darshan-runtime] and -http://www.mcs.anl.gov/research/projects/darshan/docs/darshan-util.html[darshan-util]) -- the -necessary steps for building these repositories should not have changed in the new version of -Darshan. 
- == Darshan dev-modular overview +== Overview of Darshan's modularized architecture The Darshan source tree is organized into two primary components: @@ -121,7 +98,7 @@ component so it is included in the output I/O characterization. The static initialization approach is useful for modules that do not have function calls that can be intercepted and instead can just grab all I/O characterization data at Darshan startup or shutdown time. A module can be statically initialized at Darshan startup time -by adding its initializatin routine to the `mod_static_init_fns` list at the top of the +by adding its initialization routine to the `mod_static_init_fns` array at the top of the `lib/darshan-core.c` source file. *NOTE*: Modules may wish to add a corresponding configure option to disable the module @@ -131,7 +108,7 @@ used on other systems. Most instrumentation modules can just bootstrap themselves within wrapper functions during normal application execution. Each of Darshan's current I/O library instrumentation modules -(POSIX, MPI-IO, HDF5, PnetCDF) follow this approach. Each wrapper function should just include +(POSIX, MPI-IO, stdio, HDF5, PnetCDF) follows this approach. Each wrapper function should just include logic to initialize data structures and register with `darshan-core` if this initialization has not already occurred. Darshan intercepts function calls of interest by inserting these wrappers at compile time for statically linked executables (e.g., using the linkers @@ -144,36 +121,23 @@ minimizing Darshan's impact on application I/O performance. When the instrumented application terminates and Darshan begins its shutdown procedure, it requires a way to interface with any active modules that have data to contribute to the output I/O characterization. 
-Darshan requires that module developers implement the following functions to allow the Darshan runtime -environment to coordinate with modules while shutting down: +The following function is implemented by each module to finalize (and perhaps reorganize) module records +before returning the record memory back to darshan-core to be compressed and written to file. [source,c] -struct darshan_module_funcs -{ - void (*begin_shutdown)(void); - void (*get_output_data)( - MPI_Comm mod_comm, - darshan_record_id *shared_recs, - int shared_rec_count, - void** mod_buf, - int* mod_buf_sz - ); - void (*shutdown)(void); -}; - -`begin_shutdown()` - -This function informs the module that Darshan is about to begin shutting down. It should disable -all wrappers to prevent the module from making future updates to internal data structures, primarily -to ensure data consistency and avoid other race conditions. - -`get_output_data()` - -This function is responsible for packing all module I/O data into a single buffer to be written -to the output I/O characterization. This function can be used to run collective MPI operations on -module data; for instance, Darshan typically tries to reduce file records which are shared across -all application processes into a single data record (more details on the shared file reduction -mechanism are given in link:darshan-modularization.html#_shared_record_reductions[Section 5]). +typedef void (*darshan_module_shutdown)( + MPI_Comm mod_comm, + darshan_record_id *shared_recs, + int shared_rec_count, + void** mod_buf, + int* mod_buf_sz +); + +This function can be used to run collective MPI operations on module data; for instance, Darshan +typically tries to reduce file records which are shared across all application processes into a +single data record (more details on the shared file reduction mechanism are given in +link:darshan-modularization.html#_shared_record_reductions[Section 5]). 
This function also serves +as a final opportunity for modules to clean up and free any allocated data structures, etc. * _mod_comm_ is the MPI communicator to use for collective communication @@ -182,14 +146,11 @@ processes * _shared_rec_count_ is the size of the shared record list -* _mod_buf_ is a pointer to the buffer of this module's I/O characterization data - -* _mod_buf_sz_ is the size of the module's output buffer +* _mod_buf_ is a pointer to the buffer address of the module's contiguous set of data records -`shutdown()` - -This function is a signal from Darshan that it is safe to shutdown. It should clean up and free -all internal data structures. +* _mod_buf_sz_ is a pointer to a variable storing the aggregate size of the module's records. On +input, the pointed-to value indicates the aggregate size of the module's registered records; on +output, the value may be updated if, for instance, certain records are discarded. ==== darshan-core @@ -206,9 +167,9 @@ described in detail below. [source,c] void darshan_core_register_module( darshan_module_id mod_id, - struct darshan_module_funcs *funcs, - int *my_rank, + darshan_module_shutdown mod_shutdown_func, int *mod_mem_limit, + int *rank, int *sys_mem_alignment); The `darshan_core_register_module` function registers Darshan instrumentation modules with the @@ -218,20 +179,18 @@ will contribute data to Darshan's final I/O characterization. * _mod_id_ is a unique identifier for the given module, which is defined in the Darshan log format header file (`darshan-log-format.h`). -* _funcs_ is the structure of function pointers (as described above in the previous section) that -a module developer must provide to interface with the darshan-core runtime. +* _mod_shutdown_func_ is the function pointer to the module shutdown function described in the +previous section. 
-* _my_rank_ is a pointer to an integer to store the calling process's application MPI rank in +* _inout_mod_buf_size_ is an input/output argument that stores the amount of module memory +being requested when calling the function and the amount of memory actually reserved by +darshan-core when returning. -* _mod_mem_limit_ is a pointer to an integer which will store the amount of memory Darshan -allows this module to use at runtime. Darshan's default module memory limit is currently set to -2 MiB, but the user can choose a different value at configure time (using the `--with-mod-mem` -configure option) or at runtime (using the DARSHAN_MODMEM environment variable). Note that Darshan -does not allocate any memory for modules; it just informs a module how much memory it can use. +* _rank_ is a pointer to an integer to store the calling process's application MPI rank in. +`NULL` may be passed in to ignore this value. * _sys_mem_alignment_ is a pointer to an integer which will store the system memory alignment value -Darshan was configured with. This parameter may be set to `NULL` if a module is not concerned with the -memory alignment value. +Darshan was configured with. `NULL` may be passed in to ignore this value. [source,c] void darshan_core_unregister_module( @@ -241,64 +200,56 @@ The `darshan_core_unregister_module` function disassociates the given module fro `darshan-core` runtime. Consequentially, Darshan does not interface with the given module at shutdown time and will not log any I/O data from the module. This function should only be used if a module registers itself with darshan-core but later decides it does not want to contribute -any I/O data. +any I/O data. Note that, in the current implementation, Darshan does not have the ability to +reclaim the record memory allocated to the calling module to assign to other modules. * _mod_id_ is the unique identifer for the module being unregistered. 
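As a rough illustration of the in/out `mod_buf_sz` semantics of the module shutdown function described earlier, the sketch below compacts a record buffer by dropping records that saw no activity and reports the reduced aggregate size back to the caller. It is not taken from the Darshan sources: `MPI_Comm` and `darshan_record_id` are replaced by local stand-in typedefs so the fragment compiles without MPI or `darshan.h`, and `struct example_record`, its `op_count` field, and `example_mod_shutdown` are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins (assumptions) so the sketch compiles without MPI or darshan.h. */
typedef int MPI_Comm;
typedef uint64_t darshan_record_id;

/* Hypothetical fixed-size module record. */
struct example_record {
    darshan_record_id id;
    int64_t op_count;   /* number of I/O operations observed for this record */
};

/* Has the same shape as the darshan_module_shutdown typedef shown earlier. */
void example_mod_shutdown(MPI_Comm mod_comm, darshan_record_id *shared_recs,
                          int shared_rec_count, void **mod_buf, int *mod_buf_sz)
{
    /* A real module might reduce shared records over mod_comm here. */
    (void)mod_comm;
    (void)shared_recs;
    (void)shared_rec_count;
    assert(mod_buf != NULL && *mod_buf != NULL && mod_buf_sz != NULL);

    struct example_record *recs = *mod_buf;
    int in_count = *mod_buf_sz / (int)sizeof(*recs);
    int out_count = 0;

    /* Keep only records that saw activity, packing them contiguously. */
    for (int i = 0; i < in_count; i++) {
        if (recs[i].op_count > 0) {
            if (out_count != i)
                recs[out_count] = recs[i];
            out_count++;
        }
    }

    /* On output, report the (possibly smaller) aggregate record size. */
    *mod_buf_sz = out_count * (int)sizeof(*recs);
}
```

In this sketch the buffer address in `*mod_buf` is left unchanged and only the size is updated; a module that reorganizes its records into a different buffer would presumably also update `*mod_buf`.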
[source,c] -void darshan_core_register_record( - void *name, - int len, - darshan_module_id mod_id, - int printable_flag, - int mod_limit_flag, - darshan_record_id *rec_id, - int *file_alignment); - -The `darshan_core_register_record` function registers some data record with the darshan-core -runtime. This record could reference a POSIX file or perhaps an object identifier for an -object storage system, for instance. A unique identifier for the given record name is -generated by Darshan, which should then be used by the module for referencing the corresponding -record. This allows multiple modules to refer to a specific data record in a consistent manner -and also provides a mechanism for mapping these records back to important metadata stored by -darshan-core. It is safe (and likely necessary) to call this function many times for the same -record -- darshan-core will just set the corresponding record identifier if the record has -been previously registered. - -* _name_ is just the name of the data record, which could be a file path, object ID, etc. - -* _len_ is the size of the input record name. For string record names, this would just be the -string length, but for nonprintable record names (e.g., an integer object identifier), this -is the size of the record name type. +darshan_record_id darshan_core_gen_record_id( + const char *name); -* _mod_id_ is the identifier for the module attempting to register this record. +The `darshan_core_gen_record_id` function simply generates a unique record identifier for a +given record name. This function is generally called to convert a name string to a unique record +identifier that is needed to register a data record with darshan-core. The generation of IDs +is consistent, such that modules which reference records with the same names will store these +records using the same unique IDs, simplifying the correlation of these records for analysis. 
-* _printable_flag_ indicates whether the input record name is a printable ASCII string. +* _name_ is the name of the corresponding data record (often this is just a file name). -* _mod_limit_flag_ indicates whether the calling module is out of memory to instrument new -records or not. If this flag is set, darshan-core will not create new records and instead just -search existing records for one corresponding to input _name_. +[source,c] +void *darshan_core_register_record( + darshan_record_id rec_id, + const char *name, + darshan_module_id mod_id, + int rec_len, + int *fs_info); -* _rec_id_ is a pointer to a variable which will store the unique record identifier generated -by Darshan. +The `darshan_core_register_record` function registers a data record with the darshan-core +runtime, allocating memory for the record so that it is persisted in the output log file. +This record could reference a POSIX file or perhaps an object identifier for an +object storage system, for instance. This function should only be called once for each +record being tracked by a module to avoid duplicating record memory. This function returns +the address at which the record should be stored or `NULL` if there is insufficient +memory for storing the record. -* _file_alignment_ is a pointer to an integer which will store the the file alignment (block size) -of the underlying storage system. This parameter may be set to `NULL` if it is not applicable to a -given module. +* _rec_id_ is a unique integer identifier for this record (generally generated using the +`darshan_core_gen_record_id` function). -[source,c] -void darshan_core_unregister_record( - darshan_record_id rec_id, - darshan_module_id mod_id); +* _name_ is the string name of the data record, which could be a file path, object ID, etc. +If given, darshan-core will associate the given name with the record identifier and store +this mapping in the log file so it can be retrieved for analysis. 
`NULL` may be passed in +to generate an anonymous (unnamed) record. -The `darshan_core_unregister_record` function disassociates the given module identifier from the -given record identifier. If no other modules are associated with the given record identifier, then -Darshan removes all internal references to the record. This function should only be used if a -module registers a record with darshan-core, but later decides not to store the record internally. +* _mod_id_ is the identifier for the module attempting to register this record. -* _rec_id_ is the record identifier we want to unregister. +* _rec_len_ is the length of the record. -* _mod_id_ is the module identifier that is unregistering _rec_id_. +* _fs_info_ is a pointer to a structure of relevant info for the file system associated +with the given record -- this structure is defined in the `darshan.h` header. Note that this +functionality only works for record names that are absolute file paths, since we determine +the file system by matching the file path to the list of mount points Darshan is aware of. +`NULL` may be passed in to ignore this value. [source,c] double darshan_core_wtime(void); @@ -307,6 +258,16 @@ The `darshan_core_wtime` function simply returns a floating point number of seco Darshan was initialized. This functionality can be used to time the duration of application I/O calls or to store timestamps of when functions of interest were called. +[source,c] +int darshan_core_excluded_path( + const char *path); + +The `darshan_core_excluded_path` function checks to see if a given file path is in Darshan's +list of excluded file paths (i.e., paths that we don't instrument I/O to/from, such as /etc, +/dev, /usr, etc.). + +* _path_ is the absolute file path we are checking. + ==== darshan-common `darshan-common` is a utility component of darshan-runtime, providing module developers with @@ -333,17 +294,20 @@ simplifying maintenance. 
=== Darshan-util -The darshan-util component is composed of a log parsing library (libdarshan-util) and a -corresponding set of utility programs that can parse and analyze Darshan I/O characterization -logs using this library. The log parsing library includes a generic interface (see -`darshan-logutils.h`) for retrieving specific portions of a given log file. Specifically, -this interface allows utilities to retrieve a log's header metadata, job details, record -identifier mapping, and any module-specific data contained within the log. - -Module developers may wish to define additional interfaces for parsing module-specific data -that can then be integrated into the log parsing library. This extended functionality can be -implemented in terms of the generic functions offered by darshan-logutils and by module-specific -formatting information. +The darshan-util component is composed of a helper library for accessing log file data +records (`libdarshan-util`) and a set of utilities that use this library to analyze +application I/O behavior. `libdarshan-util` includes a generic interface (`darshan-logutils`) +for retrieving specific components of a given log file. Specifically, this interface allows +utilities to retrieve a log's header metadata, job details, record ID to name mapping, and +any module-specific data contained within the log. + +`libdarshan-util` additionally includes the definition of a generic module interface (`darshan-mod-logutils`) +that may be implemented by modules to provide a consistent way for Darshan utilities to interact +with module data stored in log files. This interface is necessary since each module has records +of varying size and format, so module-specific code is needed to interact with the records in a +generic manner. 
This interface is used by the `darshan-parser` utility, for instance, to extract +data records from all modules contained in a log file and to print these records in a consistent +format that is amenable to further analysis by other tools. ==== darshan-logutils @@ -366,22 +330,22 @@ denotes whether the log is storing partial data (that is, all possible applicati were not tracked by darshan). Returns a Darshan file descriptor on success or `NULL` on error. [source,c] -int darshan_log_getjob(darshan_fd fd, struct darshan_job *job); -int darshan_log_putjob(darshan_fd fd, struct darshan_job *job); +int darshan_log_get_job(darshan_fd fd, struct darshan_job *job); +int darshan_log_put_job(darshan_fd fd, struct darshan_job *job); Reads/writes `job` structure from/to the log file referenced by descriptor `fd`. The `darshan_job` structure is defined in `darshan-log-format.h`. Returns `0` on success, `-1` on failure. [source,c] -int darshan_log_getexe(darshan_fd fd, char *buf); -int darshan_log_putexe(darshan_fd fd, char *buf); +int darshan_log_get_exe(darshan_fd fd, char *buf); +int darshan_log_put_exe(darshan_fd fd, char *buf); Reads/writes the corresponding executable string (exe name and command line arguments) from/to the Darshan log referenced by `fd`. Returns `0` on success, `-1` on failure. [source,c] -int darshan_log_getmounts(darshan_fd fd, char*** mnt_pts, char*** fs_types, int* count); -int darshan_log_putmounts(darshan_fd fd, char** mnt_pts, char** fs_types, int count); +int darshan_log_get_mounts(darshan_fd fd, char*** mnt_pts, char*** fs_types, int* count); +int darshan_log_put_mounts(darshan_fd fd, char** mnt_pts, char** fs_types, int count); Reads/writes mounted file system information for the Darshan log referenced by `fd`. 
`mnt_pnts` points to an array of strings storing mount points, `fs_types` points to an array of strings storing file @@ -389,12 +353,12 @@ system types (e.g., ext4, nfs, etc.), and `count` points to an integer storing t of mounted file systems recorded by Darshan. Returns `0` on success, `-1` on failure. [source,c] -int darshan_log_gethash(darshan_fd fd, struct darshan_record_ref **hash); -int darshan_log_puthash(darshan_fd fd, struct darshan_record_ref *hash); +int darshan_log_get_namehash(darshan_fd fd, struct darshan_name_record_ref **hash); +int darshan_log_put_namehash(darshan_fd fd, struct darshan_name_record_ref *hash); Reads/writes the hash table of Darshan record identifiers to full names for all records contained in the Darshan log referenced by `fd`. `hash` is a pointer to the hash table (of type -struct darshan_record_ref *, which should be initialized to `NULL` for reading). This hash table +struct darshan_name_record_ref *), which should be initialized to `NULL` for reading. This hash table is defined by the `uthash` hash table implementation and includes corresponding macros for searching, iterating, and deleting records from the hash. For detailed documentation on using this hash table, consult `uthash` documentation in `darshan-util/uthash-1.9.2/doc/txt/userguide.txt`. @@ -402,18 +366,19 @@ The `darshan-parser` utility (for parsing module information out of a Darshan lo example of how this hash table may be used. Returns `0` on success, `-1` on failure. 
[source,c] -int darshan_log_getmod(darshan_fd fd, darshan_module_id mod_id, void *mod_buf, int mod_buf_sz); -int darshan_log_putmod(darshan_fd fd, darshan_module_id mod_id, void *mod_buf, int mod_buf_sz); +int darshan_log_get_mod(darshan_fd fd, darshan_module_id mod_id, void *mod_buf, int mod_buf_sz); +int darshan_log_put_mod(darshan_fd fd, darshan_module_id mod_id, void *mod_buf, int mod_buf_sz, int ver); Reads/writes a chunk of (uncompressed) module data for the module identified by `mod_id` from/to -the Darshan log referenced by `fd`. `mod_buf_sz` specifies the number of uncompressed bytes to -read/write from/to the file and store in `mod_buf`. The `darshan_log_getmod` routine can be +the Darshan log referenced by `fd`. `mod_buf` is the buffer to read data into or write data from, +and `mod_buf_sz` is the corresponding size of the buffer. The `darshan_log_get_mod` routine can be repeatedly called to retrieve chunks of uncompressed data from a specific module region of the log file given by `fd`. The `darshan_log_putmod` routine just continually appends data to a -specific module region in the log file given by `fd`. This function returns the number of bytes -read/written on success, `-1` on failure. +specific module region in the log file given by `fd` and accepts an additional `ver` parameter +indicating the version number for the module data records being written. These functions return +the number of bytes read/written on success, `-1` on failure. -*NOTE*: Darshan use a reader makes right conversion strategy to rectify endianness issues +*NOTE*: Darshan uses a "reader makes right" conversion strategy to rectify endianness issues between the machine a log was generated on and a machine analyzing the log. Accordingly, module-specific log utility functions will need to check the `swap_flag` variable of the Darshan file descriptor to determine if byte swapping is necessary. 
32-bit and 64-bit byte swapping @@ -431,6 +396,42 @@ The correct order for writing all log file data to file is: (1) job data, (2) ex mount data, (4) record id -> file name map, (5) each module's data, in increasing order of module identifiers. +==== darshan-mod-logutils + +The `darshan-mod-logutils` interface provides a convenient way to implement new log functionality +across all Darshan instrumentation modules, which can potentially greatly simplify the development +of new Darshan log utilities. These functions are defined in the `darshan_mod_logutil_funcs` structure +in `darshan-logutils.h` -- instrumentation modules simply provide their own implementation of each +function, then utilities can leverage this functionality using the `mod_logutils` array defined in +`darshan-logutils.c`. A description of some of the currently implemented functions is provided below. + +[source,c] +int log_get_record(darshan_fd fd, void **buf); +int log_put_record(darshan_fd fd, void *buf); + +Reads/writes the module record stored in `buf` from/to the log referenced by `fd`. Notice that a +size parameter is not needed since the utilities calling this interface will likely not know +the record size -- the module-specific log utility code can determine the corresponding size +before reading/writing the record from/to file. + +*NOTE*: `log_get_record` takes a pointer to a buffer address rather than just the buffer address. +If the pointed-to address is equal to `NULL`, then record memory should be allocated instead. This +functionality helps optimize memory usage, since utilities often don't know the size of records +being accessed but still must provide a buffer to read them into. + +[source,c] +void log_print_record(void *rec, char *name, char *mnt_pt, char *fs_type); + +Prints all data associated with the record pointed to by `rec`. `name` holds the corresponding name +string for this record. 
`mnt_pt` and `fs_type` hold the corresponding mount point path and file +system type strings associated with the record (only valid for records with names that are absolute +file paths). + +[source,c] +void log_print_description(int ver); + +Prints a description of the data stored within records for this module (with version number `ver`). + == Adding new instrumentation modules In this section we outline each step necessary for adding a module to Darshan. To assist module @@ -487,11 +488,10 @@ provide the following notes to assist module developers: * Modules only need to include the `darshan.h` header to interface with darshan-core. -* The file record identifier given when registering a record with darshan-core can be used +* The file record identifier given when registering a record with darshan-core should be used to store the record structure in a hash table or some other structure. - - The `darshan_core_register_record` function is really more like a lookup function. It - may be called multiple times for the same record -- if the record already exists, the function - simply returns its record ID. + - Subsequent calls that need to modify this record can then use the corresponding record + identifier to look up the record in this local hash table. - It may be necessary to maintain a separate hash table for other handles which the module may use to refer to a given record. For instance, the POSIX module may need to look up a file record based on a given file descriptor, rather than a path name. @@ -527,8 +527,8 @@ data record, module developers should consider implementing this functionality e is not strictly required. 
Module developers should implement the shared record reduction mechanism within the module's -`get_output_data()` function, as it provides an MPI communicator for the module to use for -collective communication and a list of record identifiers which are shared globally by the +`darshan_module_shutdown()` function, as it provides an MPI communicator for the module to use +for collective communication and a list of record identifiers which are shared globally by the module (as described in link:darshan-modularization.html#_darshan_runtime[Section 3.1]). In general, implementing a shared record reduction involves the following steps: -- 2.26.2