Commit 36263a53 authored by Francois Tessier

More details in the README

parent 3062b7f2
......@@ -5,16 +5,30 @@ TAPIOCA is a static library implementing the two-phase I/O scheme on top of MPI
TAPIOCA (before it was given this name) was introduced in an SC'16 workshop paper: [Topology-Aware Data Aggregation for Intensive I/O on Large-Scale Supercomputers](http://www.francoistessier.info/documents/COM-HPC16-IO.pdf). A more recent paper was published in 2017 at the IEEE Cluster Conference: [TAPIOCA: An I/O Library for Optimized Topology-Aware Data Aggregation on Large-Scale Supercomputers](http://www.francoistessier.info/documents/CLUSTER17.pdf). The latter is the recommended reference for citing this work.
## Environment variables
* TAPIOCA_STRATEGY = SHORTEST_PATH / LONGEST_PATH / TOPOLOGY_AWARE / MEMORY_AWARE
* TAPIOCA_STRATEGY = (see below)
* TAPIOCA_NBAGGR = Number of aggregators per file
* TAPIOCA_NBBUFFERS = Number of aggregation buffers per aggregator. Two or more buffers allow the aggregation and I/O phases to be pipelined
* TAPIOCA_BUFFERSIZE = Buffer size in bytes. Use a multiple of the file system block size to avoid lock contention
* TAPIOCA_AGGRTIER = Tier of memory where aggregation buffers will be allocated. Depending on the system: DDR, HBM, NVR, RAN
* TAPIOCA_AGGRTIER = Tier of memory where aggregation buffers will be allocated. See below for more details.
* TAPIOCA_COMMSPLIT = true / false. If true, MPI_Comm_split will be used to create one sub-communicator per aggregator. If false, the sub-communicator will be created from MPI_Groups. When writing a single shared file on a large-scale run, setting this variable to false can halve the time needed to elect the aggregators.
* TAPIOCA_PIPELINING = Enable / disable the pipelining of multiple aggregation buffers. Intended for debugging
* TAPIOCA_REELECTAGGR = Keep the same aggregators across multiple I/O transactions (Init, Read/Write, Finalize)
* TAPIOCA_DEVNULL = true / false. If true, instead of actually writing the file, the write operation is directed to /dev/null. Useful for measuring aggregation time.
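
As an illustration of the buffer size advice above, here is a minimal Python sketch that rounds a requested buffer size up to a multiple of the file system block size and collects a few `TAPIOCA_*` settings in a dictionary. The helper names (`round_up_to_block`, `tapioca_env`), the 4 MiB block size and the chosen values are assumptions made for this example only; they are not part of TAPIOCA.

```python
import os


def round_up_to_block(size_bytes: int, block_bytes: int) -> int:
    """Round size_bytes up to the nearest multiple of block_bytes."""
    if size_bytes <= 0 or block_bytes <= 0:
        raise ValueError("sizes must be positive")
    return ((size_bytes + block_bytes - 1) // block_bytes) * block_bytes


def tapioca_env(requested_buffer_bytes: int, fs_block_bytes: int) -> dict:
    """Build a dict of TAPIOCA_* settings (hypothetical helper, illustrative values)."""
    buffer_size = round_up_to_block(requested_buffer_bytes, fs_block_bytes)
    return {
        "TAPIOCA_STRATEGY": "TOPOLOGY_AWARE",
        "TAPIOCA_NBAGGR": "8",
        "TAPIOCA_NBBUFFERS": "2",  # two or more buffers allow pipelining
        "TAPIOCA_BUFFERSIZE": str(buffer_size),
        "TAPIOCA_AGGRTIER": "DDR",
        "TAPIOCA_COMMSPLIT": "false",
    }


# Example: request about 10 MB with an assumed 4 MiB file system block size.
settings = tapioca_env(10_000_000, 4 * 1024 * 1024)

# Merge with the current environment; the result can be passed as env=
# to subprocess.run(...) when launching the MPI application.
launch_env = {**os.environ, **settings}
```

The real block size depends on the file system in use and should be looked up on the target machine rather than hard-coded.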
### Aggregator placement strategies
* SHORTEST_PATH: The aggregators are placed close to the I/O nodes
* LONGEST_PATH: The aggregators are placed as far as possible from the I/O nodes
* TOPOLOGY_AWARE: Strategy taking into account the network interconnect topology and the data access pattern to find an efficient aggregator placement
* MEMORY_AWARE: Strategy based on the network interconnect topology as well as the deep memory hierarchy to select the most appropriate location for data aggregation
### Deep memory hierarchy
* DDR: Main memory
* HBM: High bandwidth memory (Intel Knights Landing)
* NVR: Non-volatile memory, for instance a node-local SSD. The aggregation buffers are managed through a file.
* NAM: Network-attached memory (Kove RAN for instance)
* NLS: Node-local storage. Cannot be used as an aggregation layer. Used internally when local storage is the file destination.
* PFS: Parallel file system (GPFS, Lustre, ...)
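
The tier names above can be checked before a job is launched. The following Python sketch validates a candidate value for `TAPIOCA_AGGRTIER` against this list. The function `check_aggr_tier` is hypothetical and only encodes what this README states: NLS is rejected because it cannot serve as an aggregation layer. Whether TAPIOCA accepts PFS as an aggregation tier is not stated here, so the sketch does not reject it.

```python
# Tier names and descriptions as listed in this README.
TIERS = {
    "DDR": "Main memory",
    "HBM": "High bandwidth memory",
    "NVR": "Non-volatile memory",
    "NAM": "Network-attached memory",
    "NLS": "Node-local storage",
    "PFS": "Parallel file system",
}

# The README only rules out NLS explicitly for aggregation.
NOT_FOR_AGGREGATION = {"NLS"}


def check_aggr_tier(value: str) -> str:
    """Return the normalized tier name, or raise ValueError if unusable."""
    tier = value.strip().upper()
    if tier not in TIERS:
        raise ValueError(f"unknown memory tier: {value!r}")
    if tier in NOT_FOR_AGGREGATION:
        raise ValueError(f"{tier} cannot be used for aggregation buffers")
    return tier
```

For example, `check_aggr_tier("hbm")` returns `"HBM"`, while `check_aggr_tier("NLS")` raises a `ValueError`.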
### Default values
TAPIOCA_STRATEGY = SHORTEST_PATH
TAPIOCA_NBAGGR = 8
......