Commit 83ffb9a7 authored by Huihuo Zheng

read

parent 23804dd9
......@@ -9,20 +9,22 @@ For write workflow,the main program writes data to the node-local storage and th
For read workflow, the main program reads data from the filesystem and then the pthread moves them to the node-local storage. The main program reads data from node-local storage afterwards.
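The write workflow described above can be sketched as follows. This is a minimal illustration in Python, not the repository's implementation: a `threading.Thread` stands in for the pthread, two temporary directories stand in for the node-local storage and the parallel file system, and the names `mover` and `write_with_cache` are hypothetical.

```python
import os
import queue
import shutil
import tempfile
import threading


def mover(tasks, pfs_dir):
    """Background thread: copy each staged file to the stand-in parallel file system."""
    while True:
        path = tasks.get()
        if path is None:  # sentinel: nothing left to migrate
            break
        shutil.copy(path, os.path.join(pfs_dir, os.path.basename(path)))


def write_with_cache(chunks, local_dir, pfs_dir):
    """Main program: write to node-local storage, then hand each file to the mover."""
    tasks = queue.Queue()
    worker = threading.Thread(target=mover, args=(tasks, pfs_dir))
    worker.start()
    for i, data in enumerate(chunks):
        path = os.path.join(local_dir, f"chunk_{i}.bin")
        with open(path, "wb") as f:
            f.write(data)  # fast write to the local device
        tasks.put(path)    # migration happens in the background
    tasks.put(None)
    worker.join()          # wait until everything has reached the file system


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as local_dir, tempfile.TemporaryDirectory() as pfs_dir:
        write_with_cache([b"a" * 16, b"b" * 16], local_dir, pfs_dir)
        print(sorted(os.listdir(pfs_dir)))
```

The main program only blocks on the local write; the copy to the slower file system overlaps with the next write, which is the point of staging data on node-local storage.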
## ./hdf5
* H5Dio_cache.c, H5Dio_cache.h -- source codes for incorporating node-local storage into parallel read and write HDF5.
* H5Dio_cache.cpp, H5Dio_cache.h -- source codes for incorporating node-local storage into parallel read and write HDF5.
* test_read_cache.cpp -- testing code for read
* test_write_cache.cpp -- testing code for write
## ./mpiio
mpiio_cache.c mpiio_cache.h -- source code for incorporating node-local storage into MPI I/O
* mpiio_cache.c, mpiio_cache.h -- source code for incorporating node-local storage into MPI I/O
* memory_map.cpp, run.py -- benchmark codes for evaluating the performance
memory_map.cpp, run.py --benchmark codes for evaluating the performance
## ./h5py
* Some Python code that mimics the Python frontend for loading datasets.
## ./tests
* Some testing codes for understanding the cache effect, MPI_windows, etc.
## ./utils
Some timing, debugging functions.
Some timing, debugging, and profiling functions.
## ./doc
Design document and initial evaluation report
\ No newline at end of file
# Incorporating node local storage in HDF5
# Incorporating node local storage in HDF5
Author: Huihuo Zheng <huihuo.zheng@anl.gov>
This folder contains the prototype of system-aware HDF5 incorporating node-local
storage. We developed this for multiple read workflows
storage. This is part of the ExaHDF5 ECP project led by Suren Byna <sbyna@lbl.gov>.
## Source file
* H5Dio_cache.c, H5Dio_cache.h -- source codes for incorporating node-local storage into parallel read and write HDF5.
* test_read_cache.cpp -- testing code for read
* test_write_cache.cpp -- testing code for write
* prepare_dataset.cpp -- preparing dataset for the read testing.
* H5Dio_cache.cpp, H5Dio_cache.h -- source codes for incorporating node-local storage into parallel read and write HDF5.
* test_read_cache.cpp -- testing code for read
* test_write_cache.cpp -- testing code for write
* prepare_dataset.cpp -- preparing dataset for the read testing.
## Function APIs
* H5Dwrite_cache -- writes data to node-local storage; the pthread then moves the data to the parallel file system
......@@ -22,8 +23,8 @@ storage. We developed this for multiple read workflows
* --scratch: the location of the raw data
* --sleep: sleep between different iterations
environmental variable to turn on the SSD_CACH=yes
SSD_PATH -- environmental variable setting the path of the
In this benchmark code, one can turn on the SSD cache by setting the environment variable SSD_CACH=yes.
SSD_PATH -- environment variable setting the path of the SSD.
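As an illustration of how a driver script might interpret these two variables, here is a small Python sketch. The helper name `ssd_cache_settings` and the default of "no" when SSD_CACH is unset are assumptions made for the example; the benchmark itself reads the variables in its own C/C++ code.

```python
import os


def ssd_cache_settings(env=os.environ):
    """Return (cache_enabled, ssd_path) from the environment.

    Assumption: the cache is treated as off unless SSD_CACH is exactly "yes".
    """
    enabled = env.get("SSD_CACH", "no") == "yes"
    path = env.get("SSD_PATH")  # None when the variable is not set
    return enabled, path


if __name__ == "__main__":
    enabled, path = ssd_cache_settings()
    print("SSD cache enabled:", enabled, "path:", path)
```

Passing the environment as an argument keeps the helper easy to check with a plain dictionary instead of the real process environment.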
## Parallel HDF5 Read incorporating node-local storage
**Preparing the dataset**
......@@ -35,7 +36,7 @@ python prepare_dataset.py --num_images 8192 --sz 224 --output images.h5
This will generate an HDF5 file, images.h5, which contains 8192 samples, each of shape 224*224*3 (an image-based dataset).
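To get a feel for how much data this is, the following sketch computes the number of values and the raw size in bytes. The element size is an assumption: the text does not state the dataset's dtype, so 4 bytes per value (for example float32) is used as a default, and `dataset_size` is a hypothetical helper.

```python
def dataset_size(num_images=8192, sz=224, channels=3, bytes_per_value=4):
    """Return (number of values, raw size in bytes) for an image dataset.

    bytes_per_value=4 is an assumed element size, not taken from the source.
    """
    values = num_images * sz * sz * channels
    return values, values * bytes_per_value


if __name__ == "__main__":
    values, nbytes = dataset_size()
    print(values, "values,", nbytes / 2**30, "GiB")
```

With one byte per value (for example uint8 pixels) the raw size is a quarter of the 4-byte figure, so whether the whole dataset fits on the node-local device depends on the actual dtype.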
**Benchmarks**
test_read_cache.cpp is the benchmark code for evaluating the performance.
test_read_cache.cpp is the benchmark code for evaluating the performance.
* --input: HDF5 file
* --dataset: the name of the dataset in the HDF5 file
* --num_epochs [Default: 2]: Number of epochs (at each epoch/iteration, we sweep through the dataset)
......