Commit 2d18459c authored by Richard Zamora

updating some SOL documentation

parent a15b4cc5
# The Scalable Optimization Layer (SOL) for the HDF5 MPIO Virtual File Driver (MPIO-VFD)
This document provides an overview of the scalable optimization layer (SOL) for parallel HDF5 I/O operations. The purpose of the SOL is to provide a dedicated module, on *top* of the existing HDF5 MPIO virtual file driver (MPIO-VFD), for the implementation of scalable collective optimizations for parallel I/O. Currently, use of the SOL requires that an MPI library is available on the system. When enabled, calls to the MPIO-VFD can be rerouted into the SOL, which can use MPI-IO and/or POSIX/GNI/OFI to interact with the underlying parallel file system (PFS).
The SOL is designed to support/modify HDF5 `select_write` operations (and/or `select_read` in the future). In the MPIO-VFD, a `select_write` call will utilize `H5FD_mpio_custom_write`, which transfers data between *flatbuf* structures in both file and memory space. When such an operation is performed, the default HDF5 call path can be overridden by a custom SOL algorithm.
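A *flatbuf* is essentially a flattened description of a noncontiguous selection as a list of (offset, length) pairs. The sketch below illustrates the idea for a regular strided selection; the names `flatbuf_t` and `flatten_strided` are illustrative only, not the actual HDF5/ROMIO identifiers, and the real structures carry additional bookkeeping.

```c
#include <stdlib.h>

/* Illustrative flattened-selection structure: a list of
 * (offset, length) pairs describing a noncontiguous region.
 * The real HDF5/ROMIO flatbuf types differ in detail. */
typedef struct {
    int        count;    /* number of contiguous pieces      */
    long long *offsets;  /* starting byte offset of each one */
    long long *lengths;  /* byte length of each one          */
} flatbuf_t;

/* Flatten a regular strided selection (count blocks of
 * block_len bytes, stride bytes apart, starting at start)
 * into a flatbuf_t.  The caller frees the arrays. */
static flatbuf_t flatten_strided(long long start, long long block_len,
                                 long long stride, int count)
{
    flatbuf_t fb;
    fb.count   = count;
    fb.offsets = malloc((size_t)count * sizeof(long long));
    fb.lengths = malloc((size_t)count * sizeof(long long));
    for (int i = 0; i < count; i++) {
        fb.offsets[i] = start + (long long)i * stride;
        fb.lengths[i] = block_len;
    }
    return fb;
}
```

In the real code path, one such structure describes the memory buffer and another the file selection, and the write algorithm walks both lists in tandem.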
An ideal SOL end product will contain a suite of optimizations for system-aware data aggregation and PFS access. Since, by default, parallel HDF5 relies on the underlying MPI-IO library for collective I/O performance, the key advantage of this new layer is that it provides a dedicated module for the implementation of algorithms that may be missing from the available MPI-IO libraries on a given system (and/or may have unnecessary overhead within the existing HDF5 MPIO-VFD). The optimizations implemented in this new layer may be system dependent, application dependent, and/or experimental.
### `H5FDmpio_sol.c`
This file is entirely new and dedicated to the SOL algorithms.
### `H5FDmpio.c`
The `H5FD_sol_setup` function is added to define the appropriate variable structures if the `HDF5_CUSTOM_AGG` environment variable is set to `yes`. This function is also responsible for selecting the aggregator ranks. For BGQ, this is currently achieved by using the aggregator list provided by MPICH. For Theta/Lustre, a simple `generateRanklist` function (defined in `H5FDmpio_sol.c`) is used.
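For a Lustre-style system, a simple policy is to spread the aggregators evenly across the ranks. The sketch below shows what such a helper might look like; the function name and placement policy here are illustrative, and the actual `generateRanklist` in `H5FDmpio_sol.c` may differ.

```c
#include <stdlib.h>

/* Illustrative aggregator selection: choose naggs aggregator
 * ranks spread evenly across nprocs total ranks.  The actual
 * generateRanklist in H5FDmpio_sol.c may use another policy.
 * The caller frees the returned array. */
static int *generate_ranklist(int nprocs, int naggs)
{
    int *ranklist = malloc((size_t)naggs * sizeof(int));
    for (int i = 0; i < naggs; i++) {
        /* evenly spaced: first aggregator at rank 0, the
         * rest at regular intervals across the ranks     */
        ranklist[i] = (int)((long long)i * nprocs / naggs);
    }
    return ranklist;
}
```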
The call to `H5FD_sol_setup` is performed inside the `H5FD_mpio_open` function, when the file is opened. However, `file->custom_agg_data.ranklist` is also passed to the appropriate `ADIO_*_WriteStridedColl_CA` function (in `H5FDmpio_sol.c`) during each `H5Dwrite` operation, so the aggregator ranklist can be modified between operations for topology-aware aggregator placement.
We also define the `H5FD_mpio_custom_write` function (and set it as the appropriate *select_write* call in `H5FD_mpio_g`), which is responsible for actually calling the custom aggregation procedures in `H5FDmpio_sol.c`. The `custom_write` implementation uses a helper `H5FD_mpio_setup_flatbuf` function to define the appropriate *flatbuf* structure.
### `H5Dmpio.c`
Some *write* functions are modified to check for the `HDF5_CUSTOM_AGG` environment variable; when its value is `yes`, the call is redirected to `H5F_select_write`.
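The environment-variable gate itself is straightforward. A minimal sketch of the kind of check involved (the helper name `custom_agg_enabled` is illustrative, not the actual HDF5 symbol):

```c
#include <stdlib.h>
#include <string.h>

/* Return nonzero when the SOL write path should be taken,
 * i.e. when HDF5_CUSTOM_AGG is set to exactly "yes". */
static int custom_agg_enabled(void)
{
    const char *env = getenv("HDF5_CUSTOM_AGG");
    return env != NULL && strcmp(env, "yes") == 0;
}
```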
Note that `H5FDmpio.c` has access to `H5Sprivate.h` through the included `H5Dprivate.h` header file.
### `H5Dprivate.h`
See above.
### `H5Sprivate.h`
In order to expose the existing `H5S_<hyper,point,all>_get_seq_list` routines to the new `H5FD_mpio_custom_write` function, their definitions were moved to `H5Sprivate.h` from `H5Shyper.c`, `H5Spoint.c` and `H5Sall.c`.
## References
[1] François Tessier, Venkatram Vishwanath, Emmanuel Jeannot. TAPIOCA: An I/O Library for Optimized Topology-Aware Data Aggregation on Large-Scale Supercomputers. IEEE Cluster 2017, Honolulu, HI, Sept. 2017.