test_write_cache.cpp is the benchmark code for evaluating write performance. In this test case, each MPI rank I has a local
buffer BI that is written into an HDF5 file organized as follows: [B0|B1|B2|B3]|[B0|B1|B2|B3]|...|[B0|B1|B2|B3]. The number of repetitions of [B0|B1|B2|B3] equals the number of iterations.
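
To make the layout concrete, here is a minimal sketch of where each buffer would land, assuming the file is treated as a flat sequence of equally sized blocks. The helper names `block_offset` and `total_elems` and their parameters are hypothetical and are not taken from test_write_cache.cpp; the actual benchmark may select its file region differently (for example through an HDF5 hyperslab on a 2D dataspace).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of test_write_cache.cpp.
// Returns the element offset at which buffer B<rank> starts for a given
// iteration, assuming every rank writes block_elems elements and the blocks
// are laid out as [B0|B1|...|B(nproc-1)] repeated once per iteration.
std::size_t block_offset(std::size_t iteration, std::size_t rank,
                         std::size_t nproc, std::size_t block_elems) {
    assert(rank < nproc);  // a rank index must be smaller than the rank count
    return (iteration * nproc + rank) * block_elems;
}

// Total number of elements in the file after niter iterations,
// under the same flat-layout assumption.
std::size_t total_elems(std::size_t niter, std::size_t nproc,
                        std::size_t block_elems) {
    return niter * nproc * block_elems;
}
```

With four ranks and 100 elements per buffer, `block_offset(1, 0, 4, 100)` gives 400: the second [B0|B1|B2|B3] group begins right after the four blocks written in the first iteration.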
* --dim: dimensions of the 2D array [BI], i.e. the size of the local buffer