Commit 14f2549a authored by Matthieu Dorier

filled documentation

parent e2bece0a
.. literalinclude:: ../../examples/01_init_shutdown/main.cpp
   :language: cpp

The :code:`DataStore::connect()` function may also take an additional
boolean parameter indicating whether to use a background thread for
network progress. Setting this value to :code:`true` can be useful
if the application relies on asynchronous operations (:code:`AsyncEngine`).

The :code:`DataStore::shutdown()` method can be used to tell the
HEPnOS service to shut down.

.. important::

   The :code:`DataStore::shutdown()` method should be called by only one
   client and will terminate all the HEPnOS processes. If HEPnOS is set up to
   use in-memory databases, you will lose all the data stored in HEPnOS.
   If multiple clients call this method, they will either block or fail,
   depending on the network protocol used by HEPnOS.

Accessing DataSets
==================

The example code below shows how to create DataSets inside other
DataSets, how to iterate over all the child DataSets of a parent
DataSet, how to access a DataSet using an "absolute path" from
a parent DataSet, and how to search for DataSets.

.. container:: toggle

    .. container:: header

       .. container:: btn btn-info

          main.cpp (show/hide)

    .. literalinclude:: ../../examples/03_datasets/main.cpp
       :language: cpp

The DataSet class presents an interface very similar to that
of an :code:`std::map<std::string,DataSet>`, providing users
with :code:`begin` and :code:`end` functions to get forward
iterators, as well as :code:`find`, :code:`lower_bound`, and
:code:`upper_bound` to search for DataSets.
DataSets are sorted in alphabetical order when iterating.
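
Since the DataSet interface is modeled on :code:`std::map`, a plain
:code:`std::map<std::string,int>` can serve as a rough, standalone analogy
for what alphabetical ordering and :code:`find`/:code:`lower_bound` mean for
child names. This sketch does not use HEPnOS: :code:`ChildMap`,
:code:`childNames`, and :code:`namesInRange` are hypothetical helpers, and the
:code:`int` values merely stand in for child DataSets. Alphabetical order
compares names character by character, so ``run10`` sorts before ``run9``.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the children of a DataSet: name -> placeholder.
// std::map orders its keys with std::less<std::string>, i.e. alphabetically.
using ChildMap = std::map<std::string, int>;

// Child names in iteration order.
std::vector<std::string> childNames(const ChildMap& children) {
    std::vector<std::string> names;
    for (const auto& entry : children) names.push_back(entry.first);
    return names;
}

// Child names in the half-open range [first, last), located with lower_bound.
std::vector<std::string> namesInRange(const ChildMap& children,
                                      const std::string& first,
                                      const std::string& last) {
    assert(first <= last);
    std::vector<std::string> names;
    for (auto it = children.lower_bound(first);
         it != children.end() && it->first < last; ++it)
        names.push_back(it->first);
    return names;
}
```

With these helpers, children named ``calib``, ``run9``, and ``run10`` are
visited in the order ``calib``, ``run10``, ``run9``; zero-padding numeric
suffixes (e.g. ``run09``) keeps such names in numeric order.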
Accessing Events
================
Accessing from a SubRun
-----------------------

The example code below shows how to create Events inside
SubRuns, how to iterate over all the Events in a
SubRun, how to access an Event from
a SubRun, and how to search for Events.

.. container:: toggle

    .. container:: header

       .. container:: btn btn-info

          main.cpp (show/hide)

    .. literalinclude:: ../../examples/06_events/main.cpp
       :language: cpp

The SubRun class presents an interface very similar to that
of an :code:`std::map<EventNumber,Event>`, providing users
with :code:`begin` and :code:`end` functions to get forward
iterators, as well as :code:`find`, :code:`lower_bound`, and
:code:`upper_bound` to search for specific Events.
Events are sorted in increasing order of event number.
Accessing from a DataSet
------------------------

Events are stored in SubRuns, hence they can be accessed
from their parent SubRuns, as shown above. They can also be
accessed directly from their parent DataSet, which provides a
more convenient way of iterating through them without
having to iterate through the intermediate Run and SubRun levels.
The following example code shows how to use the
:code:`DataSet::events()` method to get an :code:`EventSet` object.

.. container:: toggle

    .. container:: header

       .. container:: btn btn-info

          main.cpp (show/hide)

    .. literalinclude:: ../../examples/07_events_from_dataset/main.cpp
       :language: cpp

The EventSet object is a view of all the Events
inside a given DataSet. It provides :code:`begin` and
:code:`end` methods to iterate over the Events.
The :code:`DataSet::events()` method can accept an integer
argument representing a given target number. The available
number of targets can be obtained using :code:`DataStore::numTargets()`,
passing :code:`ItemType::EVENT` to indicate that we are interested
in the number of targets that are used for storing events.
Passing such a target number to :code:`DataSet::events()`
will restrict the view of the resulting EventSet to the Events
stored in that target. This feature allows parallel programs
to have distinct processes interact with distinct targets.

Note that the Events in an EventSet are not sorted lexicographically
by (run number, subrun number, event number). Rather, the EventSet
provides the following guarantees on its ordering of Events:

* In an EventSet restricted to a single target, the Events are
  sorted lexicographically by (run number, subrun number, event number).
* All the Events of a given SubRun are gathered in the same target,
  hence an EventSet restricted to a single target will contain
  *all* the Events of *a subset* of SubRuns of *a subset of Runs*.
* When iterating through an EventSet that is not restricted to a specific
  target, we are guaranteed to see all the Events of a given SubRun before
  another SubRun starts.

In the above sample program, iterating over the global EventSet yields
the same result as iterating over the restricted EventSets in order of
increasing target number.
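
The ordering guarantees above can be modeled without HEPnOS. The standalone
sketch below assumes, as stated above, that every Event of a SubRun lives in
a single target; the placement rule itself (hashing the run and subrun
numbers in the hypothetical :code:`targetOf`) is an illustrative choice, not
HEPnOS's actual policy. :code:`eventsInTarget` models a restricted EventSet
and :code:`allEvents` models the global one, visited target by target.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <tuple>
#include <vector>

// (run number, subrun number, event number); std::tuple compares
// lexicographically, matching the per-target ordering described above.
using EventKey = std::tuple<unsigned, unsigned, unsigned>;

// Hypothetical placement: the event number is ignored, so all the events
// of a SubRun land in the same target.
std::size_t targetOf(const EventKey& e, std::size_t numTargets) {
    assert(numTargets > 0);
    const unsigned long long subrunId =
        (static_cast<unsigned long long>(std::get<0>(e)) << 32) | std::get<1>(e);
    return std::hash<unsigned long long>{}(subrunId) % numTargets;
}

// Model of an EventSet restricted to one target: that target's events, sorted.
std::vector<EventKey> eventsInTarget(const std::vector<EventKey>& all,
                                     std::size_t target, std::size_t numTargets) {
    std::set<EventKey> sorted;
    for (const auto& e : all)
        if (targetOf(e, numTargets) == target) sorted.insert(e);
    return std::vector<EventKey>(sorted.begin(), sorted.end());
}

// Model of the unrestricted EventSet: targets visited in increasing order.
std::vector<EventKey> allEvents(const std::vector<EventKey>& all,
                                std::size_t numTargets) {
    std::vector<EventKey> out;
    for (std::size_t t = 0; t < numTargets; ++t) {
        const auto part = eventsInTarget(all, t, numTargets);
        out.insert(out.end(), part.begin(), part.end());
    }
    return out;
}
```

In this model the global sequence is not sorted as a whole, yet each
SubRun's events still form one contiguous block, which is the third
guarantee listed above.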
organization.rst
deployment.rst
connection.rst
datasets.rst
runs.rst
subruns.rst
events.rst
products.rst
optimizations.rst
theta.rst
Optimizing accesses
===================

Creating and accessing millions of Runs, SubRuns, or Events
can have a large performance impact. Hence, multiple optimizations
are available to speed up these operations.
Batching writes
---------------
The creation of Runs, SubRuns, and Events, as well as the storage
of data products can be batched. The following code sample illustrates
how to use the :code:`WriteBatch` object for this purpose.

.. container:: toggle

    .. container:: header

       .. container:: btn btn-info

          main.cpp (show/hide)

    .. literalinclude:: ../../examples/11_batching/main.cpp
       :language: cpp

The WriteBatch object is initialized with a DataStore. A second argument,
:code:`unsigned int max_batch_size` (which defaults to 128), can be provided
to indicate that at most this number of operations may be batched together.
When this number of operations has been added to the batch, the batch
automatically flushes its content. The WriteBatch can also be flushed manually
using :code:`WriteBatch::flush()`, and any remaining operations are
flushed automatically when the WriteBatch goes out of scope.
The WriteBatch object can be passed to :code:`DataSet::createRun`,
:code:`Run::createSubRun`, :code:`SubRun::createEvent`, as well
as all the :code:`store` methods.
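
The flushing rules described above (flush when :code:`max_batch_size`
operations are pending, flush on demand, flush when going out of scope) can
be illustrated with a small standalone model. :code:`ToyWriteBatch` is
hypothetical and is not the HEPnOS :code:`WriteBatch` class: operations are
plain strings, and sending a batch to the service is modeled as a callback.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy model of batching; not the HEPnOS WriteBatch.
class ToyWriteBatch {
public:
    using Sink = std::function<void(const std::vector<std::string>&)>;

    explicit ToyWriteBatch(Sink sink, unsigned int max_batch_size = 128)
        : sink_(std::move(sink)), max_(max_batch_size) {}

    // Operations still pending are flushed when the batch goes out of scope.
    ~ToyWriteBatch() { flush(); }

    ToyWriteBatch(const ToyWriteBatch&) = delete;
    ToyWriteBatch& operator=(const ToyWriteBatch&) = delete;

    void add(std::string op) {
        pending_.push_back(std::move(op));
        if (pending_.size() >= max_) flush(); // automatic flush when full
    }

    // Manual flush: one round trip for everything pending.
    void flush() {
        if (pending_.empty()) return;
        sink_(pending_);
        pending_.clear();
    }

private:
    Sink sink_;
    unsigned int max_;
    std::vector<std::string> pending_;
};
```

With a maximum batch size of 4, adding 10 operations produces batches of 4,
4, and 2 operations, the last one being flushed by the destructor.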
Prefetching reads
-----------------

Prefetching is a common technique to speed up read accesses. Used alone,
the Prefetcher class reads batches of items when iterating through a
container. The following code sample illustrates its use.

.. container:: toggle

    .. container:: header

       .. container:: btn btn-info

          main.cpp (show/hide)

    .. literalinclude:: ../../examples/12_prefetching/main.cpp
       :language: cpp

The Prefetcher object is initialized with a DataStore instance,
and may also be passed an :code:`unsigned int cache_size` and an
:code:`unsigned int batch_size`. The cache size is the maximum
number of items that can be prefetched and stored in the prefetcher's cache.
The batch size is the number of items that are requested from the underlying
DataStore in a single operation.
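
The roles of :code:`cache_size` and :code:`batch_size` can be illustrated
with a toy model. :code:`ToyPrefetcher` is hypothetical, not the HEPnOS
:code:`Prefetcher`, and its refill policy (refill only when the cache is
empty, up to :code:`cache_size` items, requested :code:`batch_size` at a
time) is an assumption chosen for simplicity; the actual Prefetcher may
refill differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Toy model of a prefetcher's two size parameters (not the HEPnOS class).
// fetch() models one request to the DataStore for a whole batch of keys.
class ToyPrefetcher {
public:
    using Fetch = std::function<std::vector<int>(const std::vector<int>&)>;

    ToyPrefetcher(Fetch fetch, std::size_t cache_size, std::size_t batch_size)
        : fetch_(std::move(fetch)), cache_size_(cache_size), batch_size_(batch_size) {
        assert(cache_size_ > 0 && batch_size_ > 0);
    }

    // Visit the values for keys, in order. Whenever the cache runs empty it is
    // refilled with up to cache_size items, requested batch_size at a time.
    void iterate(const std::vector<int>& keys, const std::function<void(int)>& visit) {
        std::size_t next = 0; // first key not yet requested
        for (std::size_t i = 0; i < keys.size(); ++i) {
            if (cache_.empty()) {
                const std::size_t limit = std::min(keys.size(), next + cache_size_);
                while (next < limit) {
                    const std::size_t n = std::min(batch_size_, limit - next);
                    const auto first = keys.begin() + static_cast<std::ptrdiff_t>(next);
                    const std::vector<int> batch(first, first + static_cast<std::ptrdiff_t>(n));
                    for (int v : fetch_(batch)) cache_.push_back(v);
                    next += n;
                }
            }
            visit(cache_.front()); // served from the cache, no extra request
            cache_.pop_front();
        }
    }

private:
    Fetch fetch_;
    std::size_t cache_size_;
    std::size_t batch_size_;
    std::deque<int> cache_;
};
```

With a cache size of 4 and a batch size of 2, iterating over 10 items issues
5 requests of 2 items each, instead of 10 individual reads.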
A Prefetcher instance can be passed to most functions from the
RunSet, Run, and SubRun classes that return an iterator. This iterator
will then use the Prefetcher when iterating through the container.
The syntax illustrated above, passing the subrun to the
:code:`Prefetcher::operator()()` method, shows a simple way of enabling
prefetching in a range-based :code:`for` loop.
By default, a Prefetcher does not prefetch products. To enable prefetching
products as well, the :code:`Prefetcher::fetchProduct<T>(label)` method can be
used. This method tells the Prefetcher to prefetch products of type T
with the specified label as the iteration goes on. The :code:`load` function
used to load the product then needs to take the prefetcher instance
as its first argument, so that it looks in the prefetcher's cache first rather
than in the DataStore.

.. important::

   When prefetching is enabled for a given product/label, the client
   program is expected to consume the prefetched products by calling
   :code:`load`. If it does not, the prefetcher's memory will fill up
   with prefetched products that are never consumed.

Using asynchronous operations
-----------------------------

Most of the operations on Runs, SubRuns, and Events,
as well as those of the Prefetcher and WriteBatch, can be made
asynchronous simply by using an :code:`AsyncEngine`
instance. The following code illustrates how.

.. container:: toggle

    .. container:: header

       .. container:: btn btn-info

          main.cpp (show/hide)

    .. literalinclude:: ../../examples/13_async/main.cpp
       :language: cpp

The AsyncEngine object is initialized with a DataStore instance
and a number of threads to spawn. Using 0 threads is perfectly fine:
since the AsyncEngine turns all communication operations into non-blocking
operations, the lack of background threads will not prevent the AsyncEngine
from making some amount of progress in the background.
The AsyncEngine object can be passed to :code:`DataSet::createRun`,
:code:`Run::createSubRun`, :code:`SubRun::createEvent`, as well
as all the :code:`store` methods. When used, these operations will
be queued in the AsyncEngine and eventually execute in the background.
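
The queue-then-execute behavior can be sketched with a small thread pool.
:code:`ToyAsyncEngine` is hypothetical and is not the HEPnOS
:code:`AsyncEngine`: :code:`submit` queues an operation without waiting for
it, and :code:`wait` blocks until everything queued has completed. Unlike
HEPnOS, which relies on non-blocking communication to make progress with 0
threads, this toy simply runs pending operations inside :code:`wait` when it
has no worker threads.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Toy engine: operations are queued and executed by background threads.
class ToyAsyncEngine {
public:
    explicit ToyAsyncEngine(std::size_t num_threads) {
        for (std::size_t i = 0; i < num_threads; ++i)
            workers_.emplace_back([this] { workerLoop(); });
    }

    ~ToyAsyncEngine() {
        wait();
        { std::lock_guard<std::mutex> lk(m_); stop_ = true; }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }

    // Queue an operation; the caller does not block on its completion.
    void submit(std::function<void()> op) {
        { std::lock_guard<std::mutex> lk(m_); queue_.push(std::move(op)); ++pending_; }
        cv_.notify_one();
    }

    // Block until all submitted operations are done (0 threads: run them here).
    void wait() {
        std::unique_lock<std::mutex> lk(m_);
        if (workers_.empty()) {
            while (!queue_.empty()) {
                auto op = std::move(queue_.front());
                queue_.pop();
                lk.unlock(); op(); lk.lock();
                --pending_;
            }
            return;
        }
        done_.wait(lk, [this] { return pending_ == 0; });
    }

private:
    void workerLoop() {
        std::unique_lock<std::mutex> lk(m_);
        for (;;) {
            cv_.wait(lk, [this] { return stop_ || !queue_.empty(); });
            if (queue_.empty()) return; // stopping and nothing left to run
            auto op = std::move(queue_.front());
            queue_.pop();
            lk.unlock(); op(); lk.lock();
            if (--pending_ == 0) done_.notify_all();
        }
    }

    std::mutex m_;
    std::condition_variable cv_, done_;
    std::queue<std::function<void()>> queue_;
    std::size_t pending_ = 0;
    bool stop_ = false;
    std::vector<std::thread> workers_;
};
```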
The AsyncEngine instance can also be passed to the constructor of
WriteBatch and Prefetcher. When used with a WriteBatch, the AsyncEngine
continually takes operations from the WriteBatch, batches them, and
executes them. Hence the batches issued by the AsyncEngine may be smaller
than the maximum batch size of the WriteBatch object.
When used with a Prefetcher, the Prefetcher still prefetches
batches of objects, but it does so asynchronously using the AsyncEngine's
threads.
Creating and accessing Products
===============================

DataSets, Runs, SubRuns, and Events can store *Products*.
A Product is an instance of any C++ class. Since the mechanism
for storing and loading products is the same for DataSets,
Runs, SubRuns, and Events, the following code sample only illustrates
how to store products in Events.

.. container:: toggle

    .. container:: header

       .. container:: btn btn-info

          main.cpp (show/hide)

    .. literalinclude:: ../../examples/08_load_store/main.cpp
       :language: cpp

In this example, we want to store instances of the Particle class.
For this, we need to provide a serialization function for Boost
to use when serializing the object into storage.
We then use the :code:`Event::store()` method to store the
desired object into the event. This method takes a *label* as
its first argument. The pair *(label, product type)* uniquely
addresses a product inside an event, and it is not possible to
overwrite an existing product. Multiple products of
the same type may, however, be stored in the same event using different
labels, and the same label may be used to store products of
different types in the same event.
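
The addressing rules above (a product is identified by its label *and* its
type, and existing products cannot be overwritten) can be modeled with a map
keyed on :code:`(label, std::type_index)`. :code:`ToyProductStore` is
hypothetical: it keeps C++ objects in :code:`std::any` instead of serializing
them with Boost, and its :code:`store` returning :code:`false` on a duplicate
is an assumption, not HEPnOS's documented error behavior.

```cpp
#include <any>
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Toy model of product addressing in an Event (not the HEPnOS API).
class ToyProductStore {
public:
    // Returns false, leaving the existing product untouched, if a product
    // with the same label and the same type is already stored.
    template <typename T>
    bool store(const std::string& label, const T& value) {
        return products_.emplace(Key{label, std::type_index(typeid(T))}, value).second;
    }

    // Looks up the product by (label, T); empty if there is none.
    template <typename T>
    std::optional<T> load(const std::string& label) const {
        auto it = products_.find(Key{label, std::type_index(typeid(T))});
        if (it == products_.end()) return std::nullopt;
        return std::any_cast<T>(it->second);
    }

private:
    using Key = std::pair<std::string, std::type_index>;
    std::map<Key, std::any> products_;
};
```

In the usage below, :code:`Particle` is a hypothetical one-field struct, not
the class from the example program.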

The second part of the example shows how to use the vector
storage interface. In this example, the :code:`Event::store`
function is used to store a sub-vector of the vector *v*,
from index 1 (included) to index 3 (excluded). The product
stored this way has type :code:`std::vector<Particle>`,
hence it can later be reloaded into a :code:`std::vector<Particle>`.
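
The index range in this example is half-open. The standalone helper below
(hypothetical, not part of HEPnOS) extracts the same elements that are
stored: with first = 1 and last = 3 it keeps :code:`v[1]` and :code:`v[2]`,
and the result has the same :code:`std::vector<T>` type as the input.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy of the half-open index range [first, last) of v.
template <typename T>
std::vector<T> subVector(const std::vector<T>& v, std::size_t first, std::size_t last) {
    assert(first <= last && last <= v.size());
    return std::vector<T>(v.begin() + static_cast<std::ptrdiff_t>(first),
                          v.begin() + static_cast<std::ptrdiff_t>(last));
}
```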
Accessing Runs
==============

The example code below shows how to create Runs inside
DataSets, how to iterate over all the Runs in a
DataSet, how to access a Run from
a parent DataSet, and how to search for Runs.

.. container:: toggle

    .. container:: header

       .. container:: btn btn-info

          main.cpp (show/hide)

    .. literalinclude:: ../../examples/04_runs/main.cpp
       :language: cpp

The Runs in a DataSet can be accessed using the :code:`DataSet::runs()`
method, which produces a :code:`RunSet` object. A :code:`RunSet` is
a view of the DataSet for the purpose of accessing Runs.
The RunSet class presents an interface very similar to that
of an :code:`std::map<RunNumber,Run>`, providing users
with :code:`begin` and :code:`end` functions to get forward
iterators, as well as :code:`find`, :code:`lower_bound`, and
:code:`upper_bound` to search for specific Runs.
Runs are sorted in increasing order of run number.
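
Because Runs are ordered by run number, :code:`lower_bound` and
:code:`upper_bound` can delimit a range of run numbers, as with
:code:`std::map`. The sketch below uses a plain :code:`std::map` as a
stand-in for a RunSet; :code:`ToyRunNumber` (assumed here to be a 64-bit
unsigned integer), :code:`ToyRunSet`, and :code:`runsBetween` are
hypothetical names, not HEPnOS types.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for a RunSet: run number -> placeholder payload.
using ToyRunNumber = std::uint64_t;
using ToyRunSet = std::map<ToyRunNumber, int>;

// Run numbers in the closed interval [lo, hi]: lower_bound finds the first
// run >= lo, upper_bound the first run > hi.
std::vector<ToyRunNumber> runsBetween(const ToyRunSet& runs,
                                      ToyRunNumber lo, ToyRunNumber hi) {
    assert(lo <= hi);
    std::vector<ToyRunNumber> out;
    for (auto it = runs.lower_bound(lo), end = runs.upper_bound(hi); it != end; ++it)
        out.push_back(it->first);
    return out;
}
```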
Accessing SubRuns
=================

The example code below shows how to create SubRuns inside
Runs, how to iterate over all the SubRuns in a
Run, how to access a SubRun from
a Run, and how to search for SubRuns.

.. container:: toggle

    .. container:: header

       .. container:: btn btn-info

          main.cpp (show/hide)

    .. literalinclude:: ../../examples/05_subruns/main.cpp
       :language: cpp

The Run class presents an interface very similar to that
of an :code:`std::map<SubRunNumber,SubRun>`, providing users
with :code:`begin` and :code:`end` functions to get forward
iterators, as well as :code:`find`, :code:`lower_bound`, and
:code:`upper_bound` to search for specific SubRuns.
SubRuns are sorted in increasing order of subrun number.