## Building and Running the Exerciser on Theta (ALCF - Cray XC40)

These are instructions for building the parallel HDF5 exerciser and running a basic exerciser test on Theta, the Cray XC40 machine at ALCF.

I am also including instructions for building HDF5 (the CCIO development branch), because you will likely need to do both together. Feel free to skip the HDF5 build steps if they don't apply to you.

### Setting up the Directory Structure

For all instructions in this document, we assume the directory structure defined in this section. This structure is not required for the exerciser code to work correctly, but using it will let you follow the instructions as closely as possible. The structure also assumes that you will build the CCIO version of HDF5 yourself inside it. That is optional: you can skip the HDF5 build instructions and use a different `HDF5_INSTALL_DIR` when you build the exerciser.

First, define the root directory for building HDF5 and the Exerciser:

```
export HDF5_ROOT=<your-desired-root-directory>
```

Create the top level of the directory structure for this example:

```
mkdir $HDF5_ROOT
cd $HDF5_ROOT
mkdir exerciser
mkdir library
mkdir repos
```

Clone the necessary git repositories. Note that the Custom Collective I/O (CCIO) version of HDF5 is under development for the ExaHDF5 project. The code is currently in the `ccio` branch of the official HDF5 development repository (see: https://bitbucket.hdfgroup.org/projects/HDFFV/repos/hdf5/browse).

First, clone the repository containing the Exerciser (if you have already cloned it, just move it to this location):

```
cd repos
git clone git@xgitlab.cels.anl.gov:ExaHDF5/BuildAndTest.git
```

If you are using CCIO, clone HDF5 and check out its `ccio` branch:

```
git clone https://bitbucket.hdfgroup.org/scm/hdffv/hdf5.git
cd hdf5
git checkout ccio
cd ..
```
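Before building, it can be worth confirming that the clone really is on the `ccio` branch, since building the default branch by mistake gives you a library without the CCIO changes. The helper below, `check_branch`, is a hypothetical sketch built on `git symbolic-ref`; it fails on a detached HEAD as well as on the wrong branch.

```shell
# Hypothetical helper: succeed only if the git repository in $1 has branch $2 checked out.
check_branch() {
  repo_dir=$1
  expected=$2
  # symbolic-ref fails on a detached HEAD or outside a git repository.
  current=$(git -C "$repo_dir" symbolic-ref --short HEAD 2>/dev/null) || {
    echo "could not determine the branch in $repo_dir" >&2
    return 1
  }
  if [ "$current" != "$expected" ]; then
    echo "$repo_dir is on branch '$current', expected '$expected'" >&2
    return 1
  fi
  echo "$repo_dir is on branch '$current'"
}
```

Run it from the directory that contains the `hdf5` clone: `check_branch hdf5 ccio`.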

Create the rest of the directory structure for this example:

```
cd ../library
mkdir build
mkdir install
cd install
mkdir ccio
cd ../build
mkdir ccio
```
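The `mkdir` steps in this section can also be collapsed into a single `mkdir -p` call, which creates missing parent directories and does not complain about directories that already exist, so it is safe to rerun. This is a sketch of the same layout; the fallback root below is a hypothetical placeholder used only when `HDF5_ROOT` is not already set.

```shell
# Use the root chosen earlier; fall back to a hypothetical default if it is unset.
export HDF5_ROOT="${HDF5_ROOT:-$PWD/hdf5-exerciser}"

# Create the whole tree used in this document in one idempotent call.
mkdir -p "$HDF5_ROOT/exerciser" \
         "$HDF5_ROOT/repos" \
         "$HDF5_ROOT/library/build/ccio" \
         "$HDF5_ROOT/library/install/ccio"
```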


### (If Desired) Building the CCIO Branch of HDF5

Here, we use the `ccio` directories created above: the build happens in `$HDF5_ROOT/library/build/ccio`, and the library is installed into `$HDF5_ROOT/library/install/ccio`.

First, run autogen in the HDF5 source directory (the `hdf5` clone from above):

```
cd $HDF5_ROOT/repos/hdf5
./autogen.sh
```

Next, unload the darshan module (darshan has issues with cray-mpi ad_lustre) and swap the `craype-mic-knl` target module for `craype-haswell`. The swap is necessary because the configure script needs to run some test programs on the login node, which is not a KNL node. We also need to link against the Lustre API library (`-llustreapi`) and make sure the Cray compiler wrappers link dynamically:

```
module unload darshan
module swap craype-mic-knl craype-haswell
export LDFLAGS="-llustreapi"
export CRAYPE_LINK_TYPE=dynamic
```

Now, go to the build directory and run the configuration step. Make sure the `configure` and `--prefix` paths in the following command are correct for your installation:

```
cd $HDF5_ROOT/library/build/ccio
CC=cc CFLAGS='-O3 -DTHETA -Dtopo_timing -Dtopo_debug' $HDF5_ROOT/repos/hdf5/configure --enable-parallel --enable-build-mode=production --enable-symbols=yes --prefix=$HDF5_ROOT/library/install/ccio
```

Once the configuration completes, you can build and install:

```
make -j 16 install
```
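The `module` commands and `export` settings above last only for the current shell session. If you expect to come back and rebuild later (for example, after pulling new commits on the `ccio` branch), one option is to save the exports to a small file and source it in each new session. This is a sketch: the file name `theta-hdf5-env.sh` is a hypothetical choice, and the module commands still have to be rerun by hand.

```shell
# Hypothetical file name; store it wherever you like.
ENV_FILE="${HDF5_ROOT:-$PWD}/theta-hdf5-env.sh"

# Unquoted EOF: ${HDF5_ROOT:-$PWD} is expanded now, so the file records the real path.
cat > "$ENV_FILE" <<EOF
# Environment for rebuilding the CCIO branch of HDF5 on Theta.
# The module commands (unload darshan, swap craype-mic-knl for craype-haswell)
# still need to be run in each new login session.
export HDF5_ROOT="${HDF5_ROOT:-$PWD}"
export LDFLAGS="-llustreapi"
export CRAYPE_LINK_TYPE=dynamic
EOF

# Load the saved settings into the current shell.
. "$ENV_FILE"
```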

Once the HDF5 library is built, it should be in `$HDF5_ROOT/library/install/ccio/lib/libhdf5.a`.
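A quick way to confirm that `make install` populated the install prefix is to look for the static library and the main header. The helper below, `check_hdf5_install`, is a hypothetical sketch; it only checks for `lib/libhdf5.a` and `include/hdf5.h` and does not verify that the library was built with parallel support.

```shell
# Hypothetical helper: report whether an HDF5 install prefix looks complete.
check_hdf5_install() {
  prefix=$1
  status=0
  for f in lib/libhdf5.a include/hdf5.h; do
    if [ -e "$prefix/$f" ]; then
      echo "found:   $prefix/$f"
    else
      echo "missing: $prefix/$f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For the layout used here, run `check_hdf5_install "$HDF5_ROOT/library/install/ccio"`.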

To build the exerciser against the CCIO version of HDF5, use the `ccio` installation location of HDF5 in the example below (that is, set `HDF5_INSTALL_DIR=$HDF5_ROOT/library/install/ccio`).

### Building the Exerciser

The instructions here assume the same directory structure as the optional CCIO build of HDF5 above. However, the Makefile can be used with any HDF5 installation location (`HDF5_INSTALL_DIR`).

Create and enter the `exerciser` build directory, then copy in the Theta Makefile and the exerciser source:

```
cd $HDF5_ROOT/exerciser
mkdir run
mkdir ccio
cd ccio
cp $HDF5_ROOT/repos/BuildAndTest/Exerciser/THETA/Makefile.theta .
cp $HDF5_ROOT/repos/BuildAndTest/Exerciser/exerciser.c .
```

Now, set the `HDF5_INSTALL_DIR` variable in `Makefile.theta` to the desired HDF5 installation and run make:

```
make -f Makefile.theta
```

This should generate the `hdf5Exerciser` executable.

### Running the Exerciser

For these instructions, we assume that you want to test the CCIO version of HDF5. First, go to the run directory and create a symbolic link to the CCIO exerciser executable:

```
cd $HDF5_ROOT/exerciser/run
ln -s  ../ccio/hdf5Exerciser hdf5Exerciser-ccio
```
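A relative symlink target such as `../ccio/hdf5Exerciser` is resolved relative to the directory that contains the link, so a typo or a missing build produces a dangling link that only fails once the job launches. The helper below, `check_exec`, is a hypothetical sketch that catches this before you submit.

```shell
# Hypothetical helper: fail if $1 is a dangling symlink, missing, or not executable.
check_exec() {
  # -L tests the link itself; -e follows the link to its target.
  if [ -L "$1" ] && [ ! -e "$1" ]; then
    echo "$1 is a dangling symlink" >&2
    return 1
  fi
  if [ ! -x "$1" ]; then
    echo "$1 is missing or not executable" >&2
    return 1
  fi
  echo "$1 is ready"
}
```

From the run directory: `check_exec ./hdf5Exerciser-ccio`.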

Copy the example python submission script:

```
cp $HDF5_ROOT/repos/BuildAndTest/Exerciser/Common/run-example.py .
```

This script sets up and runs a simple example with 8 aggregator ranks (set by `lfs_count`). The `lfs_count` and `lfs_size` variables in `run-example.py` correspond to the number of aggregators and the size of the aggregators in the CCIO code, respectively. They also set the Lustre stripe count and stripe size of the test file that is generated (and deleted upon completion). To run the code on 32 nodes on Theta:

```
qsub -A datascience -t 30 -n 32 run-example.py --machine theta --exec ./hdf5Exerciser-ccio --ppn 16 --ccio
```
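The submission above asks for 32 nodes with 16 ranks per node, so the exerciser runs on 32 x 16 = 512 MPI ranks in total. The sketch below keeps the job parameters in shell variables (the variable names are hypothetical) so they are easy to change, and prints the resulting `qsub` command as a dry run. It assumes that `--ppn` is the number of MPI ranks per node passed to `run-example.py`, as its name suggests.

```shell
# Job parameters from the example above; change them for your own runs.
PROJECT=datascience   # project/allocation to charge (-A)
WALLTIME=30           # wall time in minutes (-t)
NODES=32              # number of nodes (-n)
PPN=16                # ranks per node for run-example.py (--ppn)

TOTAL_RANKS=$((NODES * PPN))
echo "Requesting $NODES nodes x $PPN ranks/node = $TOTAL_RANKS MPI ranks"

# Dry run: print the command; remove the leading "echo" to actually submit.
echo qsub -A "$PROJECT" -t "$WALLTIME" -n "$NODES" run-example.py \
  --machine theta --exec ./hdf5Exerciser-ccio --ppn "$PPN" --ccio
```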

Note that I am using the `datascience` allocation; change the `-A` argument to a project you can charge to. Here, `-t 30` requests 30 minutes of wall time and `-n 32` requests 32 nodes. Leave off the `--ccio` flag if you are not using the CCIO version of HDF5.