Commit b4cb5901 authored by Francois Tessier

Update README and TODO-list

parent be99d4c1
# TAPIOCA: Topology-Aware Parallel I/O Collective Aggregation
TAPIOCA is a static library implementing the two-phase I/O scheme on top of MPI I/O. The library is topology-aware: it provides several aggregator placement strategies that take into account the network characteristics, the deep memory hierarchy, and the data access pattern. TAPIOCA is optimized for large-scale supercomputers through an implementation based on MPI one-sided communication (RMA) and non-blocking operations.
TAPIOCA (under an earlier name) was introduced in an SC'16 workshop paper: [Topology-Aware Data Aggregation for Intensive I/O on Large-Scale Supercomputers](http://www.francoistessier.info/documents/COM-HPC16-IO.pdf). A more recent paper was published at the IEEE Cluster 2017 conference: [TAPIOCA: An I/O Library for Optimized Topology-Aware Data Aggregation on Large-Scale Supercomputers](http://www.francoistessier.info/documents/CLUSTER17.pdf). The latter is the recommended reference for citing this work.
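This README does not show TAPIOCA's own API. For context, the sketch below uses plain MPI-IO collective calls, the classic two-phase path (aggregate data on a few ranks, then issue large contiguous writes) that TAPIOCA re-implements with topology-aware aggregator placement. It is illustrative only and uses standard MPI functions, not TAPIOCA.

```cpp
// Illustrative only: a plain MPI-IO collective write, the baseline
// two-phase scheme TAPIOCA builds on. Not the TAPIOCA API.
#include <mpi.h>
#include <vector>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  const int count = 1 << 20;                     // elements per rank (example value)
  std::vector<double> buf(count, (double)rank);  // payload to write

  MPI_File fh;
  MPI_File_open(MPI_COMM_WORLD, "out.dat",
                MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

  // Collective write: the MPI-IO layer elects aggregators, gathers the data
  // on them, and writes large contiguous blocks to the file system.
  MPI_Offset offset = (MPI_Offset)rank * count * sizeof(double);
  MPI_File_write_at_all(fh, offset, buf.data(), count, MPI_DOUBLE,
                        MPI_STATUS_IGNORE);

  MPI_File_close(&fh);
  MPI_Finalize();
  return 0;
}
```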
## Environment variables
* TAPIOCA_STRATEGY = SHORTEST_PATH / LONGEST_PATH / TOPOLOGY_AWARE / MEMORY_AWARE
* TAPIOCA_NBAGGR = Number of aggregators per file
* TAPIOCA_NBBUFFERS = Number of aggregation buffers per aggregator. Two or more buffers allow the aggregation and I/O phases to be pipelined
* TAPIOCA_BUFFERSIZE = Buffer size in bytes. Use a multiple of the file system block size to avoid lock contention
* TAPIOCA_AGGRTIER = Tier of memory where aggregation buffers will be allocated. Depending on the system: DDR, HBM, NVR, RAN
* TAPIOCA_COMMSPLIT = true / false. If true, MPI_Comm_split is used to create one sub-communicator per aggregator. If false, the sub-communicators are created from MPI groups. For a single shared output file on a large-scale run, setting this variable to false can halve the time needed to elect the aggregators (a sketch of both approaches follows this list).
* TAPIOCA_PIPELINING = Enable or disable the pipelining of multiple aggregation buffers (intended for debugging)
* TAPIOCA_REELECTAGGR = Keep the same aggregators across multiple I/O transactions (Init, Read/Write, Finalize)
* TAPIOCA_DEVNULL = true / false. If true, instead of actually writing the file, the write operation is redirected to /dev/null. Useful for aggregation time measurements.
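The TAPIOCA_COMMSPLIT variable selects between two standard MPI ways of building one sub-communicator per aggregator. The sketch below is illustrative only (it is not TAPIOCA's internal code) and assumes each rank already knows its aggregator id, or the list of ranks attached to its aggregator.

```cpp
// Illustrative only: the two sub-communicator construction paths that
// TAPIOCA_COMMSPLIT selects between.
#include <mpi.h>
#include <vector>

// TAPIOCA_COMMSPLIT = true: a single MPI_Comm_split over the whole
// communicator, using the aggregator id as the color.
MPI_Comm subcomm_via_split(MPI_Comm comm, int my_aggregator_id) {
  int rank;
  MPI_Comm_rank(comm, &rank);
  MPI_Comm sub;
  MPI_Comm_split(comm, my_aggregator_id, rank, &sub);  // collective over comm
  return sub;
}

// TAPIOCA_COMMSPLIT = false: build an MPI group containing only the ranks
// attached to the same aggregator, then create the communicator from it.
// Only the ranks listed in member_ranks need to participate.
MPI_Comm subcomm_via_group(MPI_Comm comm, const std::vector<int> &member_ranks) {
  MPI_Group world_group, sub_group;
  MPI_Comm_group(comm, &world_group);
  MPI_Group_incl(world_group, (int)member_ranks.size(),
                 member_ranks.data(), &sub_group);
  MPI_Comm sub;
  MPI_Comm_create_group(comm, sub_group, /*tag=*/0, &sub);
  MPI_Group_free(&sub_group);
  MPI_Group_free(&world_group);
  return sub;
}
```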
### Default values
TAPIOCA_STRATEGY = SHORTEST_PATH
......
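For illustration only, the same variables can also be set programmatically before MPI_Init instead of being exported in the job script. The variable names below come from the list above; the chosen values are arbitrary examples, not recommendations.

```cpp
// Hypothetical usage sketch: select TAPIOCA settings from the application
// itself before MPI_Init. Values are arbitrary examples, not recommendations.
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv) {
  setenv("TAPIOCA_STRATEGY",   "TOPOLOGY_AWARE", 1);  // placement strategy
  setenv("TAPIOCA_NBAGGR",     "16",             1);  // aggregators per file (example)
  setenv("TAPIOCA_NBBUFFERS",  "2",              1);  // >= 2 enables pipelining
  setenv("TAPIOCA_BUFFERSIZE", "16777216",       1);  // use a multiple of the FS block size
  MPI_Init(&argc, &argv);
  /* ... I/O phase using TAPIOCA ... */
  MPI_Finalize();
  return 0;
}
```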
- Add Flash-IO as a benchmark (seems to be an F90 code)
- Verify all the benchmarks
* Include and adapt the getopt function (miniHACC-AoS-Tapioca-W.cpp)
* Adapt the running scripts to the binary parameters (getopt)
......
- Reuse aggregators? Does not work (wrong data read) with HACC-IO, SSF, SoA, while the AoS case works
- Isolate the three features
* Random placement
- For each architecture (copy from Theta?):
* Change the tiers of memory available: PFS, NLS (node-local storage), NVR, ...
* Function to determine the target storage according to the output file (i.e. no need for the user to set memTarg)
......