......@@ -6,3 +6,4 @@ Nicolas Denoyelle <ndenoyelle@anl.gov>
Clement Foyer <cfoyer@cray.com>
Brice Videau <bvideau@anl.gov>
Aleksandr Danilin <danilin96@gmail.com>
Florence Monna <fmonna@anl.gov>
......@@ -21,8 +21,8 @@ blocks*, to develop explicit memory and data management policies. AML goals
* **composability**: application developers and performance experts should be
to pick and choose which building blocks to use depending on their specific
able to pick and choose which building blocks to use depending on their
specific needs.
* **flexibility**: users should be able to customize, replace, or change the
configuration of each building block as much as possible.
......@@ -35,7 +35,7 @@ As of now, AML implements the following abstractions:
* :doc:`Areas <pages/areas>`, a set of addressable physical memories,
* :doc:`Layout <pages/layout>`, a description of data structures organization,
* :doc:`Tilings <pages/tilings>`, (soon to be replaced),
* :doc:`Tilings <pages/tilings>`, a description of data decomposition (soon to be replaced),
* :doc:`DMAs <pages/dmas>`, an engine to asynchronously move data structures between areas,
* :doc:`Scratchpads <pages/scratchs>`, a stage-in, stage-out abstraction for prefetching.
......@@ -76,7 +76,7 @@ Installation
Include aml header:
Include AML header:
.. code-block:: c
......@@ -93,7 +93,7 @@ Check AML version:
return 1;
Initialize and Cleanup AML:
Initialize and cleanup AML:
.. code-block:: c
......@@ -106,8 +106,8 @@ Initialize and Cleanup AML:
Link your program with *-laml*.
See above building blocks specific pages for further examples and information
on library features.
See the above pages on specific building blocks for further examples and
information on library features.
Area Linux Implementation API
Area Linux Implementation
This is the Linux implementation of AML areas.
This building block relies on libnuma and on Linux mmap/munmap to provide
mmap/munmap on NUMA host processor memory.
New areas may be created to allocate a specific subset of memories.
This building block also includes a static declaration of a default initialized
area that can be used out of the box with the abstract area API.
Using built-in features of Linux areas:
We allocate data accessible by several processes at the same address, spread
across all CPU memories (using the Linux interleave policy).
.. code-block:: c
// include ..
struct aml_area* area;
aml_area_linux_create(&area, AML_AREA_LINUX_MMAP_FLAG_SHARED, NULL,
// When work is done with this area, free resources associated with it
Integrating a new feature in a new area implementation with some Linux features:
You need an area feature that is not integrated in AML, but you want to work
with AML features around areas.
You can extend the features of the Linux area and provide custom
implementations of the mmap and munmap functions with
additional fields.
.. code-block:: c
// include ..
// declaration of data field used in generic areas
struct aml_area_data {
// uses features of linux areas
struct aml_area_linux_data linux_data;
// implements additional features
void* my_data;
// create your struct my_area_data with custom linux settings
struct aml_area_data {
.linux_data = {
.nodeset = NULL,
.my_data = whatever_floats_your_boat,
} my_area_data;
// implements mmap using linux area features and custom features
void* my_mmap(const struct aml_area_data* data, void* ptr, size_t size){
void* program_data = aml_area_linux_mmap(data->linux_data, ptr, size);
aml_area_linux_mbind(data->linux_data, program_data, size);
// additional work we want to do on top of area linux work
whatever_shark(data->my_data, program_data, size);
return program_data;
// same for munmap
int my_munmap(const struct aml_area_data* data, void* ptr, size_t size);
// builds your custom area
struct aml_area_ops {
.mmap = my_mmap,
.munmap = my_munmap,
} my_area_ops;
struct aml_area {
.ops = my_area_ops,
.data = my_area_data,
} my_area;
void* program_data = aml_area_mmap(&my_area, NULL, size);
And now you can call the generic API on your area.
Area Linux API
.. doxygengroup:: aml_area_linux
Areas: Addressable Physical Memories
AML areas represent places where data can belong.
In shared memory systems, locality is a major concern for performance.
Being able to query memory from specific places is of major interest to achieve
this goal.
AML areas provide low-level mmap/munmap functions to query memory from
specific places materialized as areas.
Available area implementations dictate the way such places can be arranged and
with which properties.
.. image:: img/area.png
"Illustration of areas on a complex system."
An AML area is an implementation of memory operations for several types of
devices through a consistent abstraction.
This abstraction is meant to be implemented for several kinds of devices, i.e.,
the same function calls allocate memory on different kinds of devices depending
on the area implementation provided.
With the high level API, you can:
* Use an area to allocate space for your data
* Release the data in this area
Let's look at how these operations can be done in a C program.
.. code-block:: c
#include <aml.h>
#include <aml/area/linux.h>
int main(){
void* data = aml_area_mmap(&aml_area_linux, s);
aml_area_munmap(data, s);
We start by importing the AML interface, as well as the area implementation we
want to use.
We then proceed to allocate space for the data of size s using the default area
of the AML Linux implementation.
The data will only be visible to this process and bound to the CPU with the
default Linux allocation policy.
Finally, when the work is done with data, we free it:
Area API
It is important to notice that the functions provided through the Area API are
low-level functions and are not optimized for performance as allocators are.
.. doxygengroup:: aml_area
Advanced users may create or modify an implementation by assembling the
appropriate operations in an aml_area_ops structure.
The Linux implementation is the go-to choice for simple areas on NUMA CPUs
running the Linux operating system.
There is ongoing work on hwloc, CUDA and OpenCL areas.
Let's look at an example of a dynamic creation of a linux area identical to the
static default aml_area_linux:
.. code-block:: c
#include <aml.h>
#include <aml/area/linux.h>
int main(){
struct aml_area* area;
aml_area_linux_create(&area, AML_AREA_LINUX_MMAP_FLAG_PRIVATE, NULL,
.. toctree::
Layout: Description of Data Organization
A layout describes how contiguous elements of a flat memory address space are
organized into a multidimensional array of fixed-size elements.
The abstraction provides functions to build layouts, access elements, reshape a
layout, or subset a layout.
A layout is characterized by:
* A pointer to the data it describes.
* A set of dimensions that the data spans.
* A stride between elements of a dimension.
* A pitch indicating the space between contiguous elements of a dimension.
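To make these four properties concrete, here is a minimal sketch of how an element address could be computed from them. It is not AML's implementation: the `toy_layout` structure and `toy_layout_deref()` are hypothetical, and the sketch assumes that dimension 0 is contiguous in memory, that the pitch counts the elements allocated along a dimension, and that the stride counts the allocated elements skipped between two visible ones.

```c
#include <stddef.h>

#define TOY_MAX_DIMS 4

/* Hypothetical layout descriptor, not AML's actual structure.
 * Dimension 0 is the one whose elements are contiguous in memory. */
struct toy_layout {
	void *ptr;                   /* start of the described data */
	size_t element_size;         /* size of one element in bytes */
	size_t ndims;                /* number of dimensions in use */
	size_t dims[TOY_MAX_DIMS];   /* elements visible per dimension */
	size_t stride[TOY_MAX_DIMS]; /* step between visible elements */
	size_t pitch[TOY_MAX_DIMS];  /* elements allocated per dimension */
};

/* Return a pointer to the element at coords, or NULL if out of bounds. */
void *toy_layout_deref(const struct toy_layout *l, const size_t *coords)
{
	size_t offset = 0; /* in elements */
	size_t span = 1;   /* elements covered by one step in dimension i */

	for (size_t i = 0; i < l->ndims; i++) {
		if (coords[i] >= l->dims[i])
			return NULL;
		offset += coords[i] * l->stride[i] * span;
		span *= l->pitch[i];
	}
	return (char *)l->ptr + offset * l->element_size;
}
```

With a pitch of 8 and a stride of 2 along dimension 0, for example, the layout exposes every other element of an 8-element row, while a step along dimension 1 still jumps over the full 8 allocated elements.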
The figure below describes a 2D layout with a sub-layout obtained with the
aml_layout_slice() operation.
The sub-layout has a stride of 1 element along the second dimension.
The slice has an offset of 1 element along the same dimension, and its pitch is
the pitch of the original layout.
Calling aml_layout_deref() on this sub-layout with appropriate coordinates will
return a pointer to the elements noted (coord_x, coord_y).
.. image:: img/layout.png
"2D layout with a 2D slice."
Access to specific elements of a layout can be done with the aml_layout_deref()
function.
Access to an element is always done relative to the dimension order set at
creation time.
However, internally, the library will store dimensions from the last dimension
to the first dimension such that elements along the first dimension are
contiguous in memory.
This order is defined with the value AML_LAYOUT_ORDER_FORTRAN.
Therefore, AML provides access to elements without the overhead of user order
choice through functions suffixed with "native".
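The cost that the "native" functions avoid can be pictured with a small sketch. The names below (`toy_order`, `toy_coords_to_native()`, `toy_native_index()`) are hypothetical and only illustrate the idea, assuming that the native order keeps dimension 0 contiguous and that a C-ordered user view lists dimensions in the reverse order.

```c
#include <stddef.h>

enum toy_order { TOY_ORDER_FORTRAN, TOY_ORDER_C };

/* Hypothetical helper: translate user coordinates to the internal
 * ("native") order, where dimension 0 is contiguous in memory. */
void toy_coords_to_native(enum toy_order order, size_t ndims,
                          const size_t *coords, size_t *native)
{
	for (size_t i = 0; i < ndims; i++)
		native[i] = (order == TOY_ORDER_C) ? coords[ndims - 1 - i]
		                                   : coords[i];
}

/* Linear element index from native coordinates and native dimensions. */
size_t toy_native_index(size_t ndims, const size_t *dims,
                        const size_t *native)
{
	size_t index = 0;
	size_t span = 1;

	for (size_t i = 0; i < ndims; i++) {
		index += native[i] * span;
		span *= dims[i];
	}
	return index;
}
```

A caller that already holds native coordinates can call `toy_native_index()` directly and skip the translation step on every access.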
The layout abstraction also provides a function to reshape data with a different
set of dimensions.
A reshaped layout will access the same data but with different coordinates as
pictured in the figure below.
.. image:: img/reshape.png
"2D layout turned into a 3D layout."
Let's look at a problem where layouts can be quite useful: DGEMM in multiple
levels.
Let's say you want to multiply matrix A (size [m, k]) with matrix B
(size [k, n]) to get matrix C (size [m, n]).
The naive matrix multiplication algorithm should look something like:
.. code:: c
   for (i = 0; i < m; i++){
           for (j = 0; j < n; j++){
                   cij = C[i*n + j];
                   for (l = 0; l < k; l++)
                           cij += A[i*k + l] * B[l*n + j];
                   C[i*n + j] = cij;
           }
   }
Unfortunately, this algorithm performs poorly on large matrices, because data
is evicted from cache before it can be reused.
We can then have 3 nested loops running on blocks of the matrices.
With several sizes of memory, we want to leverage the power of using blocks of
different sizes.
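The idea of running the nested loops on blocks can be sketched in plain C as follows. This is a single-level illustration only, not the multi-level algorithm described next: the function name `blocked_dgemm()` and the block edge `bs` are hypothetical, and all matrices are assumed to be stored in row major order.

```c
#include <stddef.h>

/* Smaller of two sizes, used to clip blocks at the matrix edges. */
static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Compute C += A * B block by block.
 * A is [m, k], B is [k, n], C is [m, n], all stored in row major order.
 * bs is a hypothetical block edge (must be > 0), chosen so that the
 * blocks being multiplied fit in cache.
 */
void blocked_dgemm(size_t m, size_t n, size_t k, size_t bs,
                   const double *A, const double *B, double *C)
{
	/* Three outer loops walk over blocks of the matrices. */
	for (size_t ib = 0; ib < m; ib += bs) {
		for (size_t lb = 0; lb < k; lb += bs) {
			for (size_t jb = 0; jb < n; jb += bs) {
				size_t i_end = min_size(ib + bs, m);
				size_t l_end = min_size(lb + bs, k);
				size_t j_end = min_size(jb + bs, n);

				/* Three inner loops multiply one pair of blocks. */
				for (size_t i = ib; i < i_end; i++)
					for (size_t l = lb; l < l_end; l++)
						for (size_t j = jb; j < j_end; j++)
							C[i * n + j] +=
								A[i * k + l] * B[l * n + j];
			}
		}
	}
}
```

The arithmetic is the same as in the naive version; only the order in which elements are visited changes, so that each block is reused while it is still in cache.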
Let's take an algorithm with three levels of granularity.
The first level is focused on fitting our blocks in the smallest cache.
We compute a block of C of size [mr, nr] noted C_r using a block of
A of size [mr, kb] noted A_r, and a block of B of size [kb, nr] noted B_r.
A_r is stored in column major order while C_r and B_r are stored in row major
order, allowing us to read A_r row by row, and go with B_r and C_r column by
column.
.. code:: c
   for (i = 0; i < m_r; i++){
           for (j = 0; j < n_r; j++){
                   for (l = 0; l < k_b; l++)
                           C_r[i][j] += A_r[i][l] * B_r[l][j];
           }
   }
These are our smallest blocks.
The implementation at this level simply performs the multiplication at a level
where it is fast enough.
B_r blocks need to be transposed before they can be accessed column by column.
The second level is when the matrices are so big that you need a second
We then use blocks of intermediate size.
We compute a block of C of size [mb, n] noted C_b using a block
of A of size [mb, kb] noted A_b, and a block of B of size [kb, n] noted B_b.
To be efficient, A_b is stored as mb/mr consecutive blocks of size [mr, kb]
(A_r) in column major order while C_b is stored as (mb/mr)*(n/nr) blocks of
size [mr, nr] (C_r) in row major order and B_b is stored as n/nr blocks of size
[kb, nr] (B_r) in row major order.
This means we need to have A_b laid out as a 3-dimensional array
mr x kb x (mb/mr), B_b as nr x kb x (n/nr), and C_b with 4 dimensions as
nr x mr x (mb/mr) x (n/nr).
The last level uses the actual matrices, of any size.
The original matrices are C of size [m, n], A of size [m, k] and B of size
[k, n].
The layouts used here are: C is stored as m/mb blocks of C_b, A is stored as
(k/kb) * (m/mb) blocks of A_b and B is stored as k/kb blocks of B_b.
This means we need to rework A to be laid out in 5 dimensions as
mr x kb x mb/mr x m/mb x k/kb,
B in 4 dimensions as nr x kb x n/nr x k/kb,
and C in 5 dimensions as nr x mr x mb/mr x n/nr x m/mb.
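As an illustration of what the 5-dimensional layout of A means for addressing, the sketch below maps an element A(i, l) of the original [m, k] matrix to its offset in the packed storage. The names `pack_sizes` and `packed_a_offset()` are hypothetical, and the sketch assumes that the first listed dimension (mr) is the contiguous one and that every block size divides the next one exactly.

```c
#include <stddef.h>

/* Hypothetical block sizes; every division below is assumed to be exact. */
struct pack_sizes {
	size_t m, k;   /* A is [m, k] */
	size_t mb, kb; /* A_b is [mb, kb] */
	size_t mr;     /* A_r is [mr, kb] */
};

/* Offset, in elements, of A(i, l) in the 5-dimensional packed layout
 * mr x kb x mb/mr x m/mb x k/kb, first dimension contiguous. */
size_t packed_a_offset(const struct pack_sizes *s, size_t i, size_t l)
{
	size_t c0 = i % s->mr;           /* row inside A_r */
	size_t c1 = l % s->kb;           /* column inside A_r */
	size_t c2 = (i % s->mb) / s->mr; /* A_r block inside A_b */
	size_t c3 = i / s->mb;           /* A_b block along m */
	size_t c4 = l / s->kb;           /* A_b block along k */

	size_t d0 = s->mr;
	size_t d1 = s->kb;
	size_t d2 = s->mb / s->mr;
	size_t d3 = s->m / s->mb;

	return c0 + d0 * (c1 + d1 * (c2 + d2 * (c3 + d3 * c4)));
}
```

Under these assumptions, all elements of one A_r block occupy mr * kb consecutive positions, and the A_r blocks of one A_b block follow each other directly, which is the property the blocked algorithm relies on.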
High level API
.. doxygengroup:: aml_layout
Tilings: Decomposing Data
Tiling is a representation of data structures as arrays.
An AML tiling structure can be defined as a multi-dimensional grid of data,
like a matrix, a stencil, etc...
Tilings are used in AML as a description of a macro data structure that will be
used by a library to do its own work.
This structure is exploitable by AML to perform optimized movement operations.
You can think of a tiling as a 1D or 2D contiguous array.
The tiles in the structure can be of custom size and AML provides iterators to
easily access tile elements.
The 1D type tiling is a regular linear tiling with uniform tile sizes.
The 2D type tiling is a 2-dimensional Cartesian tiling with uniform tile sizes,
that can be stored in two different orders, row major and column major.
With the tiling API, you can create and destroy a tiling.
You can also perform some operations over a tiling.
You can create and destroy an iterator, access the indexing, size of tiles or
their tiling dimensions.
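The indexing of a 2D tiling can be sketched as follows. This is not the AML tiling API: `toy_tiling_2d`, `toy_tile_id()` and `toy_tile_start()` are hypothetical names, and the sketch assumes that tiles have a uniform size in bytes and are stored contiguously, one whole tile after another.

```c
#include <stddef.h>

enum toy_tile_order { TOY_TILES_ROWMAJOR, TOY_TILES_COLUMNMAJOR };

/* Hypothetical 2D tiling descriptor: a grid of uniform tiles stored
 * contiguously, one whole tile after another. */
struct toy_tiling_2d {
	enum toy_tile_order order;
	size_t tile_bytes; /* size of one tile in bytes */
	size_t rows, cols; /* number of tiles per dimension */
};

/* Position of tile (row, col) in storage order. */
size_t toy_tile_id(const struct toy_tiling_2d *t, size_t row, size_t col)
{
	return t->order == TOY_TILES_ROWMAJOR ? row * t->cols + col
	                                      : col * t->rows + row;
}

/* Address of the first byte of tile (row, col) inside base. */
void *toy_tile_start(const struct toy_tiling_2d *t, void *base,
                     size_t row, size_t col)
{
	return (char *)base + toy_tile_id(t, row, col) * t->tile_bytes;
}
```

An iterator over such a tiling would simply walk tile identifiers from 0 to rows * cols - 1 and convert each one to a tile address.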
Tiling High Level API
.. doxygengroup:: aml_tiling
There are so far two implementations for the AML tiling, in 1D and in 2D:
.. toctree::
......@@ -15,15 +15,6 @@
* @defgroup aml_area_linux "AML Linux Areas"
* @brief Linux Implementation of Areas.
* Linux implementation of AML areas.
* This building block relies on libnuma implementation and
* linux mmap/munmap to provide mmap/munmap on NUMA host
* host processor memory. New areas may be created
* to allocate a specific subset of memories.
* This building block also include a static declaration of
* a default initialized area that can be used out of the box with
* abstract area API.
* #include <aml/area/linux.h>
* @{
......@@ -85,7 +76,7 @@ struct aml_area_linux_mmap_options {
* \brief Linux area creation.
* Allocate and initialize a struct aml_area implemented by aml_area_linux
* Allocates and initializes a struct aml_area implemented by aml_area_linux
* operations.
* @param[out] area pointer to an uninitialized struct aml_area pointer to
* receive the new area.
......@@ -107,7 +98,7 @@ int aml_area_linux_create(struct aml_area **area,
* \brief Linux area destruction.
* Destroy (finalize and free resources) a struct aml_area created by
* Destroys (finalizes and frees resources) a struct aml_area created by
* aml_area_linux_create().
* @param area is NULL after this call.
......@@ -115,13 +106,13 @@ int aml_area_linux_create(struct aml_area **area,
void aml_area_linux_destroy(struct aml_area **area);
* Bind memory of size "size" pointed by "ptr" to binding set in "bind".
* Binds memory of size "size" pointed by "ptr" to binding set in "bind".
* If the mbind call was not successful, i.e., AML_FAILURE is returned, then errno
* should be inspected for further error checking.
* @param bind: The binding settings. mmap_flags is actually unused.
* @param ptr: The data to bind.
* @param size: The size of the data pointed by ptr.
* @return an AML error code.
* @return An AML error code.
aml_area_linux_mbind(struct aml_area_linux_data *bind,
......@@ -129,8 +120,8 @@ aml_area_linux_mbind(struct aml_area_linux_data *bind,
size_t size);
* Function to check whether binding of a ptr obtained with
* aml_area_linux_mmap() then aml_area_linux_mbind() match area settings.
* Function to check whether the binding of a ptr obtained with
* aml_area_linux_mmap() and aml_area_linux_mbind() matches the area settings.
* @param area_data: The expected binding settings.
* @param ptr: The data supposedly bound.
* @param size: The data size.
......@@ -145,7 +136,7 @@ aml_area_linux_check_binding(struct aml_area_linux_data *area_data,
* \brief mmap block for aml area.
* This function is a wrapper on mmap function using arguments set in
* This function is a wrapper on the mmap function using arguments set in the
* mmap_flags of area_data.
* This function does not perform binding, unlike what is done in areas created
* with aml_area_linux_create().