Commit 07963878 authored by Matthieu Dorier

added documentation

parent 80099ac0
# Minimal makefile for Sphinx documentation
# You can set these variables from the command line.
SPHINXBUILD = sphinx-build
SOURCEDIR = source
BUILDDIR = build
# Put it first so that "make" without argument is like "make help".
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
.toggle {
padding-bottom: 12px;
.toggle .header {
display: block;
clear: both;
/* background-color:#f3f6f6;
box-sizing: border-box;
margin: 0;
padding: 12px 12px;
overflow: auto;
white-space: pre;
font-size: 14px;
line-height: 1.4; */
.toggle .header:after {
content: " ▶";
.toggle {
content: " ▼";
{% extends "!page.html" %}
{% block footer %}
<script type="text/javascript">
$(document).ready(function() {
$(".toggle > *").hide();
$(".toggle .header").show();
$(".toggle .header").click(function() {
{% endblock %}
Creating and accessing objects
# -*- coding: utf-8 -*-
# Configuration file for the Sphinx documentation builder.
# This file does only contain a selection of the most common options. For a
# full list see the documentation:
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))
import sys
import os
sys.path.insert(0, os.path.abspath('..'))
# -- Project information -----------------------------------------------------
project = 'HEPnOS'
copyright = '2018-2020, Argonne National Laboratory'
author = 'Argonne National Laboratory'
# The short X.Y version
version = ''
# The full version, including alpha/beta/rc tags
release = ''
# -- General configuration ---------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
# needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
# source_suffix = ['.rst', '.md']
source_suffix = '.rst'
# The master toctree document.
master_doc = 'index'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path .
exclude_patterns = []
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'sphinx_rtd_theme'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
html_theme_options = {
'logo_only': False
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
# The default sidebars (for documents that don't match any pattern) are
# defined by theme itself. Builtin themes are using these templates by
# default: ``['localtoc.html', 'relations.html', 'sourcelink.html',
# 'searchbox.html']``.
# html_sidebars = {}
#html_logo = ''
# -- Options for HTMLHelp output ---------------------------------------------
# Output file base name for HTML help builder.
htmlhelp_basename = 'HEPnOSdoc'
# -- Options for LaTeX output ------------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
# 'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
# 'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
# 'preamble': '',
# Latex figure (float) alignment
# 'figure_align': 'htbp',
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'HEPnOS.tex', 'HEPnOS Documentation',
'Argonne National Laboratory', 'manual'),
# -- Options for manual page output ------------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
(master_doc, 'hepnos', 'HEPnOS Documentation',
[author], 1)
# -- Options for Texinfo output ----------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(master_doc, 'HEPnOS', 'HEPnOS Documentation',
author, 'HEPnOS', 'One line description of project.',
# -- Extension configuration -------------------------------------------------
# -- Options for todo extension ----------------------------------------------
# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = True
def setup(app):
Client connection
Creating a configuration file
The first step before deploying HEPnOS is to create a configuration file.
This configuration file should look like the following.
.. code-block:: yaml
address: ofi+gni://
threads: 63
name: hepnos-datasets
path: /dev/shm
type: map
targets: 1
providers: 1
name: hepnos-runs
path: /dev/shm
type: map
targets: 1
providers: 1
name: hepnos-subruns
path: /dev/shm
type: map
targets: 1
providers: 1
name: hepnos-events
path: /dev/shm
type: map
targets: 1
providers: 1
name: hepnos-products
path: /dev/shm
type: map
targets: 1
providers: 1
The first field of the configuration is an address, or more precisely,
a protocol to use. Here *ofi+gni* indicates that libfabric should be
used with the Cray GNI backend. On a laptop or single node, the *na+sm*
(shared memory) protocol may be used. *ofi+tcp* may be used on Linux
clusters with traditional TCP networks.
The *threads* field indicates how many threads should be used by
HEPnOS on each node. Typically this value should be set to the number
of cores available, minus one core that is used to run the network
progress loop.
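As a rough illustration of the rule above, the following Python sketch derives a value for *threads* from the number of CPUs reported by the operating system. The helper name ``suggested_thread_count`` is hypothetical and not part of HEPnOS, and the floor of one thread is an assumption. Note that ``os.cpu_count()`` reports logical CPUs, which may differ from the number of physical cores.

```python
import os


def suggested_thread_count(available_cores=None):
    """Return the core count minus one, leaving a core for the progress loop.

    Hypothetical helper for illustration only; not part of HEPnOS.
    """
    if available_cores is None:
        # os.cpu_count() may return None when the count cannot be determined.
        available_cores = os.cpu_count() or 1
    # Assumption: always keep at least one thread.
    return max(1, available_cores - 1)


# A node with 64 cores gives the value used in the example configuration.
print(suggested_thread_count(64))  # prints 63
```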
Then come five database entries, one each for DataSets, Runs,
SubRuns, Events, and Products. Each of these entries must have a name,
a path, a type, a number of targets per provider, and a number of providers.
The name should be distinct for each database. The type of database can
be *map* (in-memory database), *ldb* (LevelDB), or *bdb* (BerkeleyDB).
If *map* is used, the *path* is ignored since the database is stored in
memory. Otherwise, the path should point to a directory in a local
file system.
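The constraints described above (five required fields, distinct names, and a type among *map*, *ldb*, and *bdb*) can be checked before deployment. The sketch below is only an illustration: it assumes the database entries have already been loaded into a list of Python dictionaries, and ``check_database_entries`` is a hypothetical helper, not something HEPnOS provides.

```python
VALID_TYPES = {"map", "ldb", "bdb"}
REQUIRED_FIELDS = ("name", "path", "type", "targets", "providers")


def check_database_entries(entries):
    """Return a list of human-readable problems found in the entries."""
    problems = []
    seen_names = set()
    for index, entry in enumerate(entries):
        missing = [field for field in REQUIRED_FIELDS if field not in entry]
        if missing:
            problems.append(f"entry {index}: missing {', '.join(missing)}")
            continue
        if entry["type"] not in VALID_TYPES:
            problems.append(f"entry {index}: unknown type {entry['type']!r}")
        if entry["name"] in seen_names:
            problems.append(f"entry {index}: duplicate name {entry['name']!r}")
        seen_names.add(entry["name"])
    return problems


entries = [
    {"name": "hepnos-runs", "path": "/dev/shm", "type": "map",
     "targets": 1, "providers": 1},
    {"name": "hepnos-events", "path": "/dev/shm", "type": "map",
     "targets": 1, "providers": 1},
]
print(check_database_entries(entries))  # prints []
```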
There is currently no reason to set the *providers* entry to anything
other than 1. However, changing the number of *targets*
may be useful to improve performance under heavy concurrency. "Target" is
another term for "database instance." Setting *targets* to a value greater
than 1 will make each node handle multiple databases.
.. important::
If the *providers* or *targets* fields are set to a value greater than 1,
the name of the database should include the $PROVIDER or $TARGET keys
respectively. These keys will be replaced with the provider number and
the target number.
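To make the substitution concrete, here is a small Python sketch of how a name template could expand into one name per provider and target. It only mimics the behaviour described above. The function ``expand_names`` is hypothetical, and numbering providers and targets from 0 is an assumption that this documentation does not confirm.

```python
def expand_names(template, providers, targets):
    """Expand $PROVIDER and $TARGET in a database name template.

    Illustration only; numbering from 0 is an assumption.
    """
    names = []
    for provider in range(providers):
        for target in range(targets):
            name = template.replace("$PROVIDER", str(provider))
            name = name.replace("$TARGET", str(target))
            names.append(name)
    return names


# One provider with two targets yields two distinct database names.
print(expand_names("hepnos-events-$PROVIDER-$TARGET", 1, 2))
# prints ['hepnos-events-0-0', 'hepnos-events-0-1']
```

Without the keys in the template, every expansion would produce the same name, which is why the keys are needed once either field is greater than 1.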
.. important::
If multiple HEPnOS daemons are started on the same node, the $RANK
key should be used either in the database name or in the database
path. This $RANK key will be replaced with the MPI rank of the
Deploying HEPnOS on a single node
Simply ssh into the node where you want to run the HEPnOS service and type:
.. code-block:: console
hepnos-daemon config.yaml client.yaml
This tells HEPnOS to start and configure itself using the *config.yaml* file created earlier.
HEPnOS will generate a *client.yaml* file that clients can use to connect to it.
The command blocks. To run it as a daemon, put it in the background, use nohup, or
use any other mechanism available on your platform.
Deploying HEPnOS on multiple nodes
The hepnos-daemon program is actually an MPI program that can be deployed on multiple nodes:
.. code-block:: console
mpirun -np N -f hostfile hepnos-daemon config.yaml client.yaml
Replace N with the number of nodes and hostfile with the name of a file containing the list
of hosts on which to deploy HEPnOS.
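When the deployment is scripted, the command line above can be assembled programmatically. The Python sketch below builds the same argument list. ``build_launch_command`` and the file name ``hosts.txt`` are hypothetical, and the ``-f`` hostfile flag is taken as written above (other MPI launchers may spell this option differently).

```python
import shlex


def build_launch_command(num_nodes, hostfile,
                         config="config.yaml", client="client.yaml"):
    """Build the mpirun argument list shown in this section."""
    return ["mpirun", "-np", str(num_nodes), "-f", hostfile,
            "hepnos-daemon", config, client]


command = build_launch_command(4, "hosts.txt")
# shlex.join (Python 3.8+) renders the list as a shell-safe string.
print(shlex.join(command))
# prints: mpirun -np 4 -f hosts.txt hepnos-daemon config.yaml client.yaml
```

The resulting list can be passed to ``subprocess.run`` or written into a job script.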
.. Mochi documentation master file, created by
sphinx-quickstart on Thu Apr 11 10:32:12 2019.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Welcome to the HEPnOS project
HEPnOS is an ephemeral, in-memory, distributed storage system for
high energy physics (HEP) workflows running on supercomputers.
It is based on software components from the
`Mochi project <>`_
and was designed in the context of the
`SciDAC-4 "HEP on HPC" <>`_
collaboration between Argonne National Laboratory and Fermilab.
This website gathers documentation and tutorials on how to install
it and use it.
.. toctree::
:maxdepth: 2
Indices and tables
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
The recommended way to install HEPnOS and its dependencies
is to use `Spack <>`_.
Spack is a package management tool designed to support multiple
versions and configurations of software on a wide variety of
platforms and environments.
Installing Spack and the Mochi repository
First, you will need to install Spack as explained
`here <>`_.
Once Spack is installed and available in your path, clone the following
Git repository and add it as a Spack namespace.
.. code-block:: console
git clone
spack repo add sds-repo
You can then check that Spack can find HEPnOS by typing:
.. code-block:: console
spack info hepnos
You should see something like the following.
.. code-block:: console
CMakePackage: hepnos
Object store for High Energy Physics, build around Mochi components
... (more lines follow) ...
Installing HEPnOS
Installing HEPnOS is then as simple as typing the following.
.. code-block:: console
spack install hepnos
Loading and using HEPnOS
Once installed, you can load HEPnOS using the following command.
.. code-block:: console
spack load -r hepnos
This will load HEPnOS and its dependencies (Mercury, Thallium, Argobots, etc.).
You are now ready to use HEPnOS!
Using the HEPnOS client library with cmake
Within a cmake project, HEPnOS can be found using:
.. code-block:: cmake
find_package(hepnos REQUIRED)
You can now link targets as follows.
.. code-block:: cmake
add_executable(my_hepnos_client source.c)
target_link_libraries(my_hepnos_client hepnos)
Using the HEPnOS client libraries with pkg-config
pkg-config is not yet supported.
Optimizing accesses
Concepts and data organization
HEPnOS handles data in a hierarchy of DataSets, Runs, SubRuns, and Events.
Each of these constructs can be used to store data objects, or Products.
**DataSets** are named containers. They can contain other DataSets,
as well as Runs. DataSets can be seen as the equivalent of file system
directories. While HEPnOS enables iterating over the DataSets stored in
a parent DataSet, it has not been designed to efficiently handle a large
number of them. Operations on a DataSet include creating a child DataSet,
creating a Run, iterating over the child DataSets, iterating over Runs,
and searching for child DataSets by name and child Runs by run number.
**Runs** are numbered containers. They are identified by an integer between
0 and *InvalidRunNumber*-1, and can contain only SubRuns. Operations on a Run
include creating and accessing individual SubRuns, iterating over SubRuns,
and searching for specific SubRuns.
**SubRuns** are numbered containers. They are identified by an integer
between 0 and *InvalidSubRunNumber*-1, and can contain only Events.
Operations on a SubRun include creating and accessing individual Events,
iterating over Events, and searching for specific Events.
**Events** are numbered containers. They are identified by an integer
between 0 and *InvalidEventNumber*-1. They may only be used to store
and load Products.
**Products** are *key/value* pairs where the *key* is formed of a string
label and the C++ type of the *value* object, while *value* is the data from
the stored C++ object. While Products can be stored in DataSets, Runs, SubRuns,
and Events, they are typically only stored in Events.
As the only type of named container, DataSets are a convenient way of
naming data coming out of an experiment or a step in a workflow.
Runs, SubRuns, and Events are stored in a way that optimizes search and
iterability in a distributed manner. A DataSet can be expected to store
a large number of Runs, each containing a large number of SubRuns
and, ultimately, Events.
Products are stored in a way that does not make them iterable. It is
not possible, from a container, to list the contained Products. The
label and C++ type of a Product have to be known in order to retrieve
the corresponding Product data from a container.
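The following Python sketch is a toy, in-memory model of these concepts, meant only to illustrate the access pattern: child containers can be listed, while Products can only be retrieved when both the label and the type are known. It is not the HEPnOS API. The ``Container`` class and its methods are hypothetical, and the Python type name stands in for the C++ type of the stored object.

```python
class Container:
    """Toy container standing in for a DataSet, Run, SubRun, or Event."""

    def __init__(self):
        self._products = {}   # (label, type name) -> value; never listed
        self.children = {}    # name or number -> child Container

    def child(self, key):
        """Create the child container if needed and return it."""
        return self.children.setdefault(key, Container())

    def store(self, label, value):
        """Store a product under its label and the type of the value."""
        self._products[(label, type(value).__name__)] = value

    def load(self, label, value_type):
        """Load a product; both the label and the type must be known."""
        return self._products[(label, value_type.__name__)]


dataset = Container()
# Run 36, SubRun 2, Event 1001 (arbitrary example numbers).
event = dataset.child(36).child(2).child(1001)
event.store("hits", [1.5, 2.5])

print(event.load("hits", list))             # prints [1.5, 2.5]
print(sorted(dataset.child(36).children))   # prints [2]
```

Asking for ``event.load("hits", dict)`` raises ``KeyError`` in this model, since the type is part of the key.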
Storing and retrieving products
Deploying and running on Theta