# FlameStore

> Important: this project has been deprecated. A new version of FlameStore is
> currently under development.


FlameStore is a storage system specifically designed to store
Keras models in the context of the [CANDLE](http://candle.cels.anl.gov/) 
research workflows. It is based on the 
[Mochi](https://www.mcs.anl.gov/research/projects/mochi/)
components developed at Argonne National Laboratory.

## Overview of FlameStore

FlameStore is an in-memory distributed storage service meant to
store and keep track of Keras models (i.e. deep neural networks).
These models are composed of an _architecture_ (which we call _metadata_)
that can be represented in JSON format, and a set of _layers_,
which are (potentially large) Numpy arrays.

FlameStore is composed of a _Master_ process (also called _Manager_)
and some _Worker_ processes. The Master stores metadata and takes decisions
regarding where to store new models, or whether some models should be
persisted or discarded. The Workers offer storage space for deep
neural network layers as well as input datasets (training,
test, and validation sets).
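Conceptually, a stored model can be pictured as a JSON-serializable metadata blob (held by the Master) plus a set of Numpy arrays (held by the Workers). The sketch below is purely illustrative and is not FlameStore's internal representation; the `metadata` and `layers` values are made-up examples:

```python
import json
import numpy as np

# Illustrative only: a stored model seen as JSON metadata plus layer arrays.
# The Master would keep the architecture metadata; the Workers would keep
# the (potentially large) layer arrays.
metadata = {"name": "mymodel", "layers": [{"type": "Dense", "units": 128}]}
layers = [np.zeros((784, 128)), np.zeros((128,))]

serialized = json.dumps(metadata)            # architecture, kept by the Master
restored = json.loads(serialized)            # round-trips losslessly
total_bytes = sum(a.nbytes for a in layers)  # layer data, kept by the Workers
```

This separation is what lets the Master make placement decisions on small metadata while the bulk of the bytes live on the Workers.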

## Installing

FlameStore itself is written in Python. However, it depends
on the Mochi components and their Python wrappers. It also depends
on Keras and on the Python HDF5 package. The best way to get all
of these dependencies is to use [spack](https://spack.io/).

Once you have spack installed and set up, clone the `sds-repo`
repository and add it to spack:

```
git clone https://xgitlab.cels.anl.gov/sds/sds-repo.git
cd sds-repo
spack repo add .
```

You are now ready to install FlameStore by entering:

```
spack install flamestore
```

Note that two heavy dependencies of FlameStore are Python and Boost.
If you have them installed on your platform already, you can follow
[this tutorial](https://spack.readthedocs.io/en/latest/getting_started.html#system-packages) 
to tell spack about them. **Make sure that Boost has been compiled with
Boost.Python and Boost.Numpy support, that the Python development library
is installed (e.g., the python-dev package on Debian), and that Numpy is
installed; otherwise FlameStore will not work.**

Once spack has finished installing flamestore and all its dependencies,
you can load them into your environment by calling the following:

```
source <(spack module loads -m tcl --dependencies flamestore)
```

You can check that the installation worked by typing `import flamestore`
in a Python interpreter. Alternatively, if you type

```
flamestore -h
```

You should get the help message of the `flamestore` program.

## Using FlameStore

### FlameStore Workspace

To use FlameStore, we first need to create a Workspace, that is,
a folder containing input data, Keras models, and configuration
files. To create a workspace, type the following:

```
flamestore create --name myworkspace
```
This will create a _myworkspace_ directory containing some
subdirectories and a _config.json_ configuration file.

You can also use the `--path` parameter to specify the directory
in which the workspace should be created (by default it will create
it in the current working directory), and `--override`
to indicate that, if a workspace already exists, it should
be deleted first. If a directory with the same name exists but
is not a FlameStore workspace, this option will not attempt to
remove it and will fail safely.
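The fail-safe behavior of `--override` can be pictured as follows. This is an illustrative sketch, not FlameStore's actual code, and it assumes (hypothetically) that a workspace is recognized by the presence of its `config.json` file:

```python
import os
import shutil
import tempfile

def override_workspace(path):
    """Illustrative sketch of a fail-safe override: delete `path` only
    if it looks like a FlameStore workspace (heuristic assumption here:
    it contains a config.json file)."""
    if not os.path.isdir(path):
        return True  # nothing to remove, safe to proceed
    if not os.path.isfile(os.path.join(path, "config.json")):
        return False  # not a workspace: refuse to delete, fail safely
    shutil.rmtree(path)
    return True

base = tempfile.mkdtemp()

# A plain directory is left untouched.
not_ws = os.path.join(base, "plain_dir")
os.makedirs(not_ws)
assert override_workspace(not_ws) is False and os.path.isdir(not_ws)

# A directory that looks like a workspace is removed.
ws = os.path.join(base, "workspace")
os.makedirs(ws)
open(os.path.join(ws, "config.json"), "w").close()
assert override_workspace(ws) is True and not os.path.exists(ws)
```

The point of the guard is that a typo in `--name` can never wipe an unrelated directory.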

The workspace contains two folders.
  * _input_: this folder will contain your input data.
  * _models_: this folder will contain persisted Keras models
  in the form of HDF5 files.

### Configuring the workspace

Once the workspace is created, there are a few things you can tune.
Open the _config.json_ file that the workspace contains. You should
see something like this:

```json
{
  "worker": {
    "provider_id": 1
  },
  "manager": {
    "provider_id": 1,
    "controller": {
      "config": {},
      "class": "flamestore.controllers.DefaultController"
    }
  },
  "bake": {
    "provider_id": 1,
    "targets": [
      {
        "info": null,
        "name": "/dev/shm/flamestore.%{rank}.dat",
        "size": 200
      }
    ]
  },
  "protocol": "tcp"
}
```

One thing you may want to change is the name and size
of the target file in the `bake.targets` array.
This file is where the data will be stored on each worker. Typically,
this should be the path to a file on a local disk or on a memory
device (such as `/dev/shm` here). The size of the file is
expressed in MB.
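Since _config.json_ is plain JSON, it can also be adjusted programmatically with Python's standard `json` module. The sketch below edits an in-memory copy of the `bake` section shown above (the dict literal mirrors the example config rather than reading a file from disk):

```python
import json

# In-memory copy of the bake section from the example config.json above.
config = {
    "bake": {
        "provider_id": 1,
        "targets": [
            {"info": None,
             "name": "/dev/shm/flamestore.%{rank}.dat",
             "size": 200}
        ]
    },
    "protocol": "tcp"
}

# Point the target at a file on local disk and grow it to 1024 MB.
target = config["bake"]["targets"][0]
target["name"] = "/tmp/flamestore.%{rank}.dat"
target["size"] = 1024

print(json.dumps(config, indent=2))  # write this back to config.json
```

In practice you would `json.load` the workspace's _config.json_, apply the same changes, and `json.dump` it back.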

Note that the `%{rank}` token will be replaced by the rank of
the process in its `MPI_COMM_WORLD`. You can also use `%{time}`,
which will be replaced with the number of seconds since epoch
(useful to make sure a new target file is created at every run).
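The token substitution behaves roughly like the following stand-in function. This is an illustration of the described behavior, not FlameStore's implementation:

```python
import time

def expand_target_name(template, rank):
    """Illustrative stand-in for the token substitution described above:
    %{rank} -> the process's MPI rank, %{time} -> seconds since epoch."""
    return (template
            .replace("%{rank}", str(rank))
            .replace("%{time}", str(int(time.time()))))

name = expand_target_name("/dev/shm/flamestore.%{rank}.dat", rank=3)
# name == "/dev/shm/flamestore.3.dat"
```

Because `%{time}` changes at every run, using it guarantees each run gets a fresh target file.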

**Important:** right now these targets are not deleted when
the workers terminate. You need to manually remove them.

### Starting up FlameStore

FlameStore can be started in four ways.
  * **Single machine mode:** one process is created that acts as both
  a master and a worker. This is useful for debugging on a local
  workstation.
  * **Statically distributed mode:** all processes are started at the 
  same time as a single MPI program. This mode is useful to start
  FlameStore independently of the application(s) that use it. However, the
  number of processes cannot change over time.
  * **Dynamically distributed mode:** processes (master and workers) are started
  independently. When starting, workers attach to the master, which
  can start using them. This mode is useful to start FlameStore
  independently of the application(s) that use it and to enable elasticity
  (workers can be added and removed).
  * **Embedded mode:** a Python application can deploy FlameStore
  by itself by importing the flamestore package and by creating
  instances of the `FlameStoreMaster` and `FlameStoreWorker` classes.
  This is useful if the lifetime of the service is tied to the application
  that uses it.

The following sections go through each deployment method in detail.

#### Single machine mode

In your terminal, type the following:

```
flamestore run --name=<nameOfYourWorkspace> --path=<pathToYourWorkspace>
```

This will run a process that encapsulates both a master and a worker
instance. Note that you can omit the `--path` argument if you want
flamestore to pick the current working directory.

#### Statically distributed mode

You can use _mpirun_ to start flamestore in distributed mode:

```
mpirun -np X flamestore run --name=<nameOfYourWorkspace> --path=<pathToYourWorkspace>
```

By default, this will run a master on process rank 0, and workers on
process ranks 0 to X-1 (so rank 0 acts both as a master and a worker).

If you want rank 0 to act only as a master, use the `--master` flag.

Just like in single machine mode, you can omit the `--path` argument 
if you want flamestore to pick the current working directory.
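The rank-to-role mapping described above can be sketched as follows. This is an illustrative model of the documented behavior, not FlameStore's code; `master_only` stands in for the effect of the `--master` flag:

```python
def roles(nprocs, master_only=False):
    """Illustrative sketch: map MPI ranks to FlameStore roles.
    Rank 0 hosts the master (and also a worker, unless --master was
    given); every other rank runs a worker."""
    out = {}
    for rank in range(nprocs):
        if rank == 0:
            out[rank] = ["master"] if master_only else ["master", "worker"]
        else:
            out[rank] = ["worker"]
    return out

# Default: rank 0 is both master and worker.
assert roles(3) == {0: ["master", "worker"], 1: ["worker"], 2: ["worker"]}
# With --master: rank 0 is master only.
assert roles(2, master_only=True) == {0: ["master"], 1: ["worker"]}
```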

#### Dynamically distributed mode

Assume we have multiple host machines _host1_ to _hostN_. We want _host1_
to run the FlameStore master process and the other hosts to run the workers.
For the sake of this guide, we assume two machines _host1_ and _host2_.

On _host1_, run the following command to start the master:

```
flamestore run --master --name=<nameOfYourWorkspace> --path=<pathToYourWorkspace>
```

Once the master has started, run the following command on each of the other hosts:

```
flamestore run --worker --name=<nameOfYourWorkspace> --path=<pathToYourWorkspace>
```

Note that if you omit the `--master` flag when running the master, _host1_
will run both a master and a worker.

Note also that (although this isn't very useful) you can start the master
individually and then start all workers as an MPI program by providing
the `--worker` flag when deploying flamestore with MPI.

#### Embedded mode

The embedded mode is different from the other modes. It enables a Python program
to start a FlameStore master or a FlameStore worker. Embedded mode can be coupled
with the other modes. For instance, you can deploy the master using a command line
above, and deploy the workers in embedded mode.

To start a master in embedded mode, the following Python code can be used:

```python
import flamestore as fs

master = fs.FlameStoreMaster(path='abc')
master.wait_for_finalize()
```

Here `'abc'` is the path to the workspace.

Note that this will block the Python code as it now runs the FlameStore master
instance.

To start a worker in embedded mode, the following Python code can be used:

```python
import flamestore as fs

worker = fs.FlameStoreWorker(path='abc')
worker.connect()
worker.wait_for_finalize()
```

Again, 'abc' is the path to the workspace, and this will block the Python
program as it runs the FlameStore worker.
Note that the `worker.connect()` call will work only if a master instance
is already running (otherwise there is nothing to connect to).

To start both a master and a worker instance in the same Python program,
use the following:

```python
import flamestore as fs

master = fs.FlameStoreMaster(path='abc')
mid = master.get_margo_instance()
worker = fs.FlameStoreWorker(path='abc', mid=mid)
worker.connect()
mid.wait_for_finalize()
```

### Accessing FlameStore from an application

The following example shows how to create a WorkspaceHandle to
access the workspace and store/load a Keras model to/from it.
A more complete example can be found in the
_test_ directory of the source, which trains a 7-layer CNN model on 
the MNIST dataset, stores the resulting model, reloads it, and checks
that the reloaded model performs the same way as the original one.

```python
from flamestore import WorkspaceHandle
from keras.models import Sequential
from keras.layers import Dense

ws = WorkspaceHandle("abc")

model = Sequential()
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))

ws.store_model("mymodel", model)

reloaded_model = ws.load_model("mymodel")
```

The following example shows how to store and load input datasets.

```python
from flamestore import WorkspaceHandle
import numpy as np

ws = WorkspaceHandle("abc")
myarray = np.random.randn(5, 7)

# Simply storing without attaching metadata

ws.store_input("myinput", myarray)
_, myarray2 = ws.load_input("myinput")

# Storing with metadata attached

info = {'mymetadata': 'something', 'othermetadata': 'somethingelse'}
ws.store_input("myinput_with_info", myarray, info)

stored_info, myarray3 = ws.load_input("myinput_with_info")

# Shutting down the remote service
#   you want to do this only if you really want to shut down,
#   since all the stored data will be lost
ws.shutdown()
```