Commit 6f68e6f0 authored by Matthieu Dorier's avatar Matthieu Dorier

finished README.md

parent a41c528e
# FlameStore
FlameStore is a storage system specifically designed to store
Keras models in the context of the [CANDLE](http://candle.cels.anl.gov/)
research workflows. It is based on the
[Mochi](https://www.mcs.anl.gov/research/projects/mochi/)
components developed at Argonne National Laboratory.
## Overview of FlameStore
FlameStore is composed of a _Master_ process (also called _Manager_)
and some _Worker_ processes. The Master stores metadata and takes decisions
regarding where to store new models, or whether some models should be
persisted or discarded. The Workers offer storage spaces to store
deep neural network layers as well as input datasets (training,
test, and validation sets).
## Installing
FlameStore itself is written in Python. However, it depends
on the Mochi components and their Python wrappers. It also depends
on Keras and on the Python HDF5 package. The best way to get all
of these dependencies is to use [spack](https://spack.io/).
```
cd sds-repo
spack repo add .
```
You are now ready to install FlameStore by entering:
```
spack install flamestore
```
Note that two heavy dependencies of FlameStore are Python and Boost.
If you have them installed on your platform already, you can follow
[this tutorial](https://spack.readthedocs.io/en/latest/getting_started.html#system-packages)
to tell spack about them. **Make sure that Boost has been compiled with
Boost.Python and Boost.NumPy support, that the Python development files
are installed (e.g., the python-dev package on Debian), and that NumPy
is installed; otherwise FlameStore will not work.**
Once spack has finished installing flamestore and all its dependencies,
you can load them into your environment (for instance with `spack load`).
To check that the installation worked, type:
```
flamestore -h
```
You should get the help message of the `flamestore` program.
## Using FlameStore
### FlameStore Workspace
To use FlameStore, we first need to create a Workspace, that is,
a folder containing input data, Keras models, and configuration
files. To create a workspace, type the following:
```
flamestore create --name myworkspace
```
This will create a _myworkspace_ directory containing some
subdirectories and a config.json configuration file.
You can also use the `--path` parameter to specify the directory
in which the workspace should be created (by default it is created
in the current working directory), and `--override` to indicate
that, should a workspace already exist, it should be deleted first.
If a directory with the same name exists but is not a FlameStore
workspace, this option will not attempt to remove it and will fail
safely.
The workspace contains two folders.
* _input_: this folder will contain your input data.
* _models_: this folder will contain persisted Keras models
in the form of HDF5 files.
The following sections show how to use these folders.
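As an illustration of this layout, the sketch below mocks up a workspace by hand; normally `flamestore create` produces it, and since the contents of config.json are not documented here, an empty JSON object stands in:

```python
import json
import tempfile
from pathlib import Path

# Mock of the workspace layout described above: an input folder, a
# models folder, and a config.json file at the workspace root.
root = Path(tempfile.mkdtemp()) / "myworkspace"
(root / "input").mkdir(parents=True)
(root / "models").mkdir()
(root / "config.json").write_text(json.dumps({}))

print(sorted(p.name for p in root.iterdir()))
# → ['config.json', 'input', 'models']
```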
### Starting up FlameStore
FlameStore can be started in four ways.
* **Single machine mode:** one process is created that acts as both
a master and a worker. This is useful for debugging on a local
workstation.
* **Statically distributed mode:** all processes are started at the
same time as a single MPI program. This mode is useful to start
FlameStore independently of the application(s) that use it. However,
the number of processes cannot change over time.
* **Dynamically distributed mode:** processes (master and workers) are started
independently. When starting, workers attach to the master, which
can then start using them. This mode is useful to start FlameStore
independently of the application(s) that use it and to enable elasticity
(workers can be added and removed).
* **Embedded mode:** a Python application can deploy FlameStore processes
by itself by importing the flamestore package and creating
instances of the `FlameStoreMaster` and `FlameStoreWorker` classes.
This is useful if the lifetime of the service is tied to the application
that uses it.
The following sections go through each deployment method in detail.
#### Single machine mode
In your terminal, type the following:
```
flamestore run --name=<nameOfYourWorkspace> --path=<pathToYourWorkspace>
```
This will run a process that encapsulates both a master and a worker
instance. Note that you can omit the `--path` argument if you want
flamestore to pick the current working directory.
#### Statically distributed mode
You can use _mpirun_ to start flamestore in distributed mode:
```
mpirun -np X flamestore run --name=<nameOfYourWorkspace> --path=<pathToYourWorkspace>
```
By default, this will run a master on process rank 0, and workers on
process ranks 0 to X-1 (so rank 0 acts both as a master and a worker).
If you want rank 0 to act only as a master, use the `--master` flag.
Just like in single machine mode, you can omit the `--path` argument
if you want flamestore to pick the current working directory.
#### Dynamically distributed mode
Assume we have multiple host machines, _host1_ to _hostN_, and we want
_host1_ to run the FlameStore master process and the other hosts to run
the workers. For the sake of this guide, we assume two machines,
_host1_ and _host2_.
On _host1_, run the following command to start the master:
```
flamestore run --master --name=<nameOfYourWorkspace> --path=<pathToYourWorkspace>
```
Once the master has started, run the following command on all the other hosts:
```
flamestore run --worker --name=<nameOfYourWorkspace> --path=<pathToYourWorkspace>
```
Note that if you omit the `--master` flag when running the master, _host1_
will run both a master and a worker.
Note also that (although this isn't very useful) you can start the master
individually and then start all workers as an MPI program by providing
the `--worker` flag when deploying flamestore with MPI.
#### Embedded mode
The embedded mode is different from the other modes. It enables a Python program
to start a FlameStore master or a FlameStore worker. Embedded mode can be coupled
with the other modes. For instance, you can deploy the master using one
of the commands above, and deploy the workers in embedded mode.
To start a master in embedded mode, the following Python code can be used:
```python
import flamestore as fs
master = fs.FlameStoreMaster(path='abc')
master.wait_for_finalize()
```
Here `'abc'` is the path to the workspace.
Note that this will block the Python code as it now runs the FlameStore master
instance.
To start a worker in embedded mode, the following Python code can be used:
```python
import flamestore as fs
worker = fs.FlameStoreWorker(path='abc')
worker.connect()
worker.wait_for_finalize()
```
Again, 'abc' is the path to the workspace, and this will block the Python
program as it runs the FlameStore worker.
Note that the `worker.connect()` call will work only if a master instance
is already running (otherwise there is nothing to connect to).
To start both a master and a worker instance in the same Python program,
use the following:
```python
import flamestore as fs
master = fs.FlameStoreMaster(path='abc')
mid = master.get_margo_instance()
worker = fs.FlameStoreWorker(path='abc', mid=mid)
worker.connect()
mid.wait_for_finalize()
```
### Accessing FlameStore from an application
The following example shows how to create a `WorkspaceHandle` to
access the workspace and store/load Keras models to/from it.
A more complete example can be found in the
_test_ directory of the source, which trains a 7-layer CNN model on
the MNIST dataset, stores the resulting model, reloads it, and checks
that the reloaded model performs the same way as the original one.
```python
from flamestore import WorkspaceHandle
from keras.models import Sequential
from keras.layers import Dense

ws = WorkspaceHandle("abc")

model = Sequential()
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))

ws.store_model("mymodel", model)
reloaded_model = ws.load_model("mymodel")
```
The following example shows how to store and load input datasets.
```python
import numpy as np
from flamestore import WorkspaceHandle

ws = WorkspaceHandle("abc")
myarray = np.random.randn(5, 7)

# Storing without attaching metadata
ws.store_input("myinput", myarray)
_, myarray2 = ws.load_input("myinput")

# Storing with metadata attached
info = {'mymetadata': 'something', 'othermetadata': 'somethingelse'}
ws.store_input("myinput_with_info", myarray, info)
stored_info, myarray3 = ws.load_input("myinput_with_info")

# Shutting down the remote service: do this only if you really want
# to shut down, since all the stored data will be lost
ws.shutdown()
```
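The `info` argument is a plain Python dictionary. FlameStore does not mandate any particular keys, so the keys below are hypothetical names chosen purely for illustration, describing a training array:

```python
import numpy as np

# Illustrative metadata for a training-set array; the key names
# ("split", "shape", "dtype") are an assumption, not a FlameStore API.
x_train = np.random.randn(5, 7)
info = {
    "split": "training",
    "shape": "x".join(str(d) for d in x_train.shape),
    "dtype": str(x_train.dtype),
}
print(info["shape"])  # → 5x7
```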
import logging
import sys

# Custom formatter that picks a (colored) format string per log level.
class FlameStoreFormatter(logging.Formatter):

    err_fmt    = "\033[91m[ERROR]\033[0m %(asctime)-15s %(msg)s"
    dbg_fmt    = "\033[94m[DEBUG]\033[0m %(asctime)-15s %(module)s: %(lineno)d: %(msg)s"
    info_fmt   = "\033[92m[INFO]\033[0m %(asctime)-15s %(msg)s"
    critic_fmt = "\033[91m[FATAL]\033[0m %(asctime)-15s %(msg)s"
    warn_fmt   = "\033[93m[WARN]\033[0m %(asctime)-15s %(msg)s"

    def __init__(self, fmt="%(levelno)s: %(msg)s"):
        logging.Formatter.__init__(self, fmt)

    def format(self, record):
        # Save the original format configured by the user when the
        # formatter was instantiated (on Python 3 the active format
        # string lives in self._style, not self._fmt)
        format_orig = self._style._fmt

        # Replace the original format with one customized by logging level
        if record.levelno == logging.DEBUG:
            self._style._fmt = FlameStoreFormatter.dbg_fmt
        elif record.levelno == logging.INFO:
            self._style._fmt = FlameStoreFormatter.info_fmt
        elif record.levelno == logging.WARNING:
            self._style._fmt = FlameStoreFormatter.warn_fmt
        elif record.levelno == logging.ERROR:
            self._style._fmt = FlameStoreFormatter.err_fmt
        elif record.levelno == logging.CRITICAL:
            self._style._fmt = FlameStoreFormatter.critic_fmt

        # Call the original formatter class to do the grunt work
        result = logging.Formatter.format(self, record)

        # Restore the original format configured by the user
        self._style._fmt = format_orig

        return result

def static_vars(**kwargs):
    # Attach static attributes to a function (used below to make
    # init_logging idempotent)
    def decorate(func):
        for k in kwargs:
            setattr(func, k, kwargs[k])
        return func
    return decorate

@static_vars(counter=0)
def init_logging(level):
    # Install the handler on the root logger only once, even if this
    # function is called several times
    if init_logging.counter == 0:
        fmt = FlameStoreFormatter()
        hdlr = logging.StreamHandler(sys.stdout)
        hdlr.setFormatter(fmt)
        logging.root.addHandler(hdlr)
        logging.root.setLevel(level)
    init_logging.counter += 1

def get_logger():
    return logging.getLogger('flamestore')
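As a usage sketch, the per-level format-swapping pattern above can be exercised with the standard `logging` machinery alone; the class below is a trimmed, self-contained stand-in for the module above (not the FlameStore module itself), writing to a string buffer so the output can be inspected:

```python
import io
import logging

# Minimal stand-in for the per-level formatter: swap the style's format
# string based on record.levelno, format, then restore the original.
class LevelFormatter(logging.Formatter):
    fmts = {
        logging.DEBUG: "[DEBUG] %(message)s",
        logging.INFO: "[INFO] %(message)s",
        logging.WARNING: "[WARN] %(message)s",
    }

    def format(self, record):
        orig = self._style._fmt
        self._style._fmt = self.fmts.get(record.levelno, orig)
        result = logging.Formatter.format(self, record)
        self._style._fmt = orig
        return result

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(LevelFormatter())
logger = logging.getLogger("flamestore-demo")
logger.propagate = False
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

logger.info("service started")
logger.warning("low disk space")
print(stream.getvalue(), end="")
# → [INFO] service started
# → [WARN] low disk space
```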
import sys
sys.path.append('.')
import flamestore as fs
master = fs.FlameStoreMaster(path='abc')
master.wait_for_finalize()
import sys
sys.path.append('.')
import flamestore as fs
worker = fs.FlameStoreWorker(path='abc')
worker.connect()
worker.wait_for_finalize()