Commit 72bf8ea4 authored by Valentin Reis

Initial commit.

parent cebb7e10
# benchmark-applications
This repository contains sample benchmark applications instrumented to report progress to NRM through libNRM.
It contains code similar to the previous "progress-benchmarks" repo, without the 600MB of extra branches.
# "simple" - contains a random walk and a dgemm
On KNL machines at ANL, the correct env vars are obtained via:
source /opt/intel/bin/compilervars.sh intel64
# "graph500" - contains the graph500 benchmark
*~
src/graph500_*
Copyright (c) 2011-2017 Graph500 Steering Committee
New code under University of Illinois/NCSA Open Source License
see license.txt or https://opensource.org/licenses/NCSA
====
Old code, including but not limited to generator code:
/* Copyright (C) 2009-2010 The Trustees of Indiana University. */
/* */
/* Use, modification and distribution is subject to the Boost Software */
/* License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at */
/* http://www.boost.org/LICENSE_1_0.txt) */
Graph500-3.0.0
Compiling should be pretty straightforward as long as you have a valid MPI-3 library loaded in your PATH.
There are no longer OpenMP, sequential, or XMT versions of the benchmark.
On a single node you can run the MPI code with reasonable performance.
To build the binaries, change directory to src and execute make.
If the build succeeds, four binaries are produced, two of which are of interest:
graph500_reference_bfs runs BFS kernel (and skips weight generation)
graph500_reference_bfs_sssp runs both BFS and SSSP kernels
Both binaries require one integer parameter, which is the scale of the graph.
Validation can be deactivated by specifying SKIP_VALIDATION=1 as an environment variable.
The bfs_sssp binary skips the BFS part if SKIP_BFS=1 is present in your environment.
If you want to store/read the generated graph to/from a file, use the environment variable TMPFILE=<filename>, and also REUSEFILE=1 to keep the file.
It's advised to use the bfs_sssp binary to generate graph files, as it generates both the edges file and the weights file (filename.weights).
The bfs binary only uses/writes the edges file. If bfs_sssp can't open the weights file, it generates both files even if the edges file is present.
N.B:
Current settings assume you are using powers of 2 for both the total number of cores and the number of cores per node.
It's possible to use a non-power-of-two number of nodes if you comment out the macro SIZE_MUST_BE_POWER_OF_TWO defined in common.h.
Be aware that this will normally drop your performance by more than 20%.
If you want to use a non-power-of-two number of processes per node, add -DPROCS_PER_NODE_NOT_POWER_OF_TWO to CFLAGS in src/Makefile;
this will enable SIZE_MUST_BE_POWER_OF_TWO automatically.
AML = Active Messages Library
AML is an SPMD communication library built on top of MPI-3, intended to be used in fine-grained applications like Graph500.
It has two main goals: clarity of user code and high performance through a tricky internal implementation.
It targets asynchronous delivery of small messages
while achieving reasonable performance on modern multicore systems by
transparently doing the following for the user:
1. message coalescing
2. software routing on multicore systems
To enable both optimizations, messages are delivered asynchronously.
To ensure delivery and completion of handler execution on remote nodes, a collective barrier should be called.
The current version supports only one-sided messages (a response cannot be sent from an active message handler),
but a future version would support two-sided active messages.
For each process, all delivered AMs are executed sequentially, so atomicity is guaranteed and no locking is required.
Progress of AM delivery is passive, which means that handlers are executed inside library calls (aml_send and aml_barrier).
How to send messages:
1. call aml_init(..)
2. register the handler of an active message, whose prototype should be:
void handler(int fromPE,void* data,int dataSize)
where fromPE is the sender's rank, data is a pointer to the message sent by the sender, and dataSize is its size in bytes
registration is done using the function aml_register_handler( handler, handlerid), where handlerid is an integer in the range [0..255]
3. send messages to other nodes using
aml_send(data,handlerid,dataSize,destPE)
where data points to dataSize bytes to be sent to the PE with rank destPE and processed by the handler registered under handlerid
4. collectively call aml_barrier(), which not only synchronizes all processes but also ensures that all active messages
sent prior to the aml_barrier call are delivered (and the requested handlers have completed their execution) on exit from aml_barrier
5. call aml_finalize()
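The steps above can be sketched without MPI. The block below is a single-process stand-in, not AML itself: every `mock_*` name, the queue capacity, and the payload limit are hypothetical and exist only for illustration. It mimics the passive-progress model described above, where a send only buffers the message and handlers run later inside a library call.

```c
#include <assert.h>
#include <string.h>

#define MOCK_MAX_HANDLERS 256   /* handler ids are in [0..255], as in the README */
#define MOCK_QUEUE_CAP 64       /* hypothetical buffer capacity */
#define MOCK_MAX_PAYLOAD 32     /* hypothetical per-message size limit */

typedef void (*mock_handler)(int fromPE, void *data, int dataSize);

typedef struct {
    int handlerid;
    int size;
    unsigned char payload[MOCK_MAX_PAYLOAD];
} mock_msg;

static mock_handler mock_handlers[MOCK_MAX_HANDLERS];
static mock_msg mock_queue[MOCK_QUEUE_CAP];
static int mock_queue_len = 0;

/* Step 2: associate a handler with an integer id. */
static void mock_register_handler(mock_handler f, int handlerid) {
    assert(handlerid >= 0 && handlerid < MOCK_MAX_HANDLERS);
    mock_handlers[handlerid] = f;
}

/* Step 4: run every buffered handler in order, then empty the buffer. */
static void mock_barrier(void) {
    for (int i = 0; i < mock_queue_len; ++i) {
        mock_msg *m = &mock_queue[i];
        mock_handlers[m->handlerid](0, m->payload, m->size);
    }
    mock_queue_len = 0;
}

/* Step 3: copy the message into the buffer; nothing executes yet unless
 * the buffer is full, in which case it is drained first. */
static void mock_send(const void *data, int handlerid, int dataSize, int destPE) {
    assert(destPE == 0); /* single-process mock: rank 0 is the only PE */
    assert(handlerid >= 0 && handlerid < MOCK_MAX_HANDLERS);
    assert(mock_handlers[handlerid] != NULL);
    assert(dataSize >= 0 && dataSize <= MOCK_MAX_PAYLOAD);
    if (mock_queue_len == MOCK_QUEUE_CAP) mock_barrier();
    mock_msg *m = &mock_queue[mock_queue_len++];
    m->handlerid = handlerid;
    m->size = dataSize;
    memcpy(m->payload, data, (size_t)dataSize);
}

/* Example handler: accumulate integers sent by other ranks. */
static long mock_total = 0;
static void add_handler(int fromPE, void *data, int dataSize) {
    int v;
    (void)fromPE;
    assert(dataSize == (int)sizeof(int));
    memcpy(&v, data, sizeof v);
    mock_total += v;
}
```

Registering `add_handler` under an id, sending two integers to rank 0, and then calling `mock_barrier()` leaves their sum in `mock_total`; before the barrier the total is still unchanged. With the real library the same sequence would use the functions declared in aml.h and would need an MPI launcher.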
/* Copyright (c) 2011-2017 Graph500 Steering Committee
All rights reserved.
Developed by: Anton Korzh anton@korzh.us
Graph500 Steering Committee
http://www.graph500.org
New code under University of Illinois/NCSA Open Source License
see license.txt or https://opensource.org/licenses/NCSA
*/
#ifdef __cplusplus
extern "C" {
#endif
//MPI-like init,finalize calls
extern int aml_init(int *,char***);
extern void aml_finalize(void);
//barrier which ensures that all AM sent before the barrier are completed everywhere after the barrier
extern void aml_barrier( void );
//register active message function(collective call)
extern void aml_register_handler(void(*f)(int,void*,int),int n);
//send AM to another(myself is ok) node
//execution of AM might be delayed till next aml_barrier() call
extern void aml_send(void *srcaddr, int type,int length, int node );
// rank and size
extern int aml_my_pe( void );
extern int aml_n_pes( void );
#ifdef __cplusplus
}
#endif
#define my_pe aml_my_pe
#define num_pes aml_n_pes
#define aml_time() MPI_Wtime()
#define aml_long_allsum(p) MPI_Allreduce(MPI_IN_PLACE,p,1,MPI_LONG_LONG,MPI_SUM,MPI_COMM_WORLD)
#define aml_long_allmin(p) MPI_Allreduce(MPI_IN_PLACE,p,1,MPI_LONG_LONG,MPI_MIN,MPI_COMM_WORLD)
#define aml_long_allmax(p) MPI_Allreduce(MPI_IN_PLACE,p,1,MPI_LONG_LONG,MPI_MAX,MPI_COMM_WORLD)
Boost Software License - Version 1.0 - August 17th, 2003
Permission is hereby granted, free of charge, to any person or organization
obtaining a copy of the software and accompanying documentation covered by
this license (the "Software") to use, reproduce, display, distribute,
execute, and transmit the Software, and to prepare derivative works of the
Software, and to permit third-parties to whom the Software is furnished to
do so, all subject to the following:
The copyright notices in the Software and this entire statement, including
the above license grant, this restriction and the following disclaimer,
must be included in all copies of the Software, in whole or in part, and
all derivative works of the Software, unless such copies or derivative
works are solely in the form of machine-executable object code generated by
a source language processor.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
Graph500 Benchmark: Scalable Kronecker Graph Generator
Jeremiah Willcock and Andrew Lumsdaine
This directory contains a parallel Kronecker graph generator set up for the
requirements of the Graph 500 Search benchmark. The generator is designed to
produce reproducible results for a given graph size and seed, regardless of the
level or type of parallelism in use.
The file make_graph.h declares a simplified interface suitable for
benchmark implementations; this interface hides many of the parameters that are
fixed in the benchmark specification. Most compile-time settings for user
modification are in generator/user_settings.h and mpi/datatypes.h, with the
Kronecker parameters set at the top of generator/graph_generator.c.
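To make the Kronecker parameters concrete, here is a minimal sketch of how one edge could be derived from a sequence of quadrant choices. It uses the initiator values defined in generator/graph_generator.c (a = 5700/10000, b = c = 1900/10000, so d = 500/10000) and follows the same comparison order and clip-and-flip rule, but it leaves out noise, the PRNG, and vertex scrambling. The names `sk_quadrant_for` and `sk_edge_from_quadrants` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SK_A_NUM  5700u   /* INITIATOR_A_NUMERATOR in the source */
#define SK_BC_NUM 1900u   /* INITIATOR_BC_NUMERATOR in the source */
#define SK_DENOM  10000u  /* INITIATOR_DENOMINATOR in the source */

/* Map a uniform value in [0, SK_DENOM) to a quadrant 0..3.
 * Quadrants 1 and 2 each get probability b = c, quadrant 0 gets a,
 * and quadrant 3 gets the remainder d. */
static int sk_quadrant_for(uint32_t val) {
    assert(val < SK_DENOM);
    if (val < SK_BC_NUM) return 1;
    val -= SK_BC_NUM;
    if (val < SK_BC_NUM) return 2;
    val -= SK_BC_NUM;
    if (val < SK_A_NUM) return 0;
    return 3;
}

/* Descend lgN levels of the adjacency matrix. At each level the quadrant
 * picks which half of the current row range and column range to keep. */
static void sk_edge_from_quadrants(int lgN, const int *squares,
                                   int64_t *src, int64_t *tgt) {
    int64_t nverts = (int64_t)1 << lgN;
    int64_t base_src = 0, base_tgt = 0;
    assert(lgN >= 0 && lgN < 62);
    for (int level = 0; level < lgN; ++level) {
        int src_offset = squares[level] / 2;
        int tgt_offset = squares[level] % 2;
        if (base_src == base_tgt && src_offset > tgt_offset) {
            /* Clip-and-flip: on the diagonal, fold the lower triangle
             * into the upper one so the graph is treated as undirected. */
            int temp = src_offset;
            src_offset = tgt_offset;
            tgt_offset = temp;
        }
        nverts /= 2;
        base_src += nverts * src_offset;
        base_tgt += nverts * tgt_offset;
    }
    *src = base_src;
    *tgt = base_tgt;
}
```

For a 4-vertex graph (lgN = 2), the quadrant sequence {1, 2} selects the edge (1, 2), and the sequence {2, 0} is flipped on the diagonal and selects (0, 2).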
Copyright (C) 2009-2011 The Trustees of Indiana University.
Use, modification and distribution is subject to the Boost Software License,
Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
http://www.boost.org/LICENSE_1_0.txt)
/* Copyright (C) 2009-2010 The Trustees of Indiana University. */
/* */
/* Use, modification and distribution is subject to the Boost Software */
/* License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at */
/* http://www.boost.org/LICENSE_1_0.txt) */
/* */
/* Authors: Jeremiah Willcock */
/* Andrew Lumsdaine */
#include <stdlib.h>
#include <stdint.h>
#include <assert.h>
#ifndef __STDC_FORMAT_MACROS
#define __STDC_FORMAT_MACROS
#endif
#include <inttypes.h>
#include "user_settings.h"
#include "splittable_mrg.h"
#include "graph_generator.h"
/* Initiator settings: for faster random number generation, the initiator
* probabilities are defined as fractions (a = INITIATOR_A_NUMERATOR /
* INITIATOR_DENOMINATOR, b = c = INITIATOR_BC_NUMERATOR /
* INITIATOR_DENOMINATOR, d = 1 - a - b - c. */
#define INITIATOR_A_NUMERATOR 5700
#define INITIATOR_BC_NUMERATOR 1900
#define INITIATOR_DENOMINATOR 10000
/* If this macro is defined to a non-zero value, use SPK_NOISE_LEVEL /
* INITIATOR_DENOMINATOR as the noise parameter to use in introducing noise
* into the graph parameters. The approach used is from "A Hitchhiker's Guide
* to Choosing Parameters of Stochastic Kronecker Graphs" by C. Seshadhri, Ali
* Pinar, and Tamara G. Kolda (http://arxiv.org/abs/1102.5046v1), except that
* the adjustment here is chosen based on the current level being processed
* rather than being chosen randomly. */
#define SPK_NOISE_LEVEL 0
/* #define SPK_NOISE_LEVEL 1000 -- in INITIATOR_DENOMINATOR units */
static int generate_4way_bernoulli(mrg_state* st, int level, int nlevels) {
#if SPK_NOISE_LEVEL == 0
/* Avoid warnings */
(void)level;
(void)nlevels;
#endif
/* Generate a pseudorandom number in the range [0, INITIATOR_DENOMINATOR)
* without modulo bias. */
static const uint32_t limit = (UINT32_C(0x7FFFFFFF) % INITIATOR_DENOMINATOR);
uint32_t val = mrg_get_uint_orig(st);
if (/* Unlikely */ val < limit) {
do {
val = mrg_get_uint_orig(st);
} while (val < limit);
}
#if SPK_NOISE_LEVEL == 0
int spk_noise_factor = 0;
#else
int spk_noise_factor = 2 * SPK_NOISE_LEVEL * level / nlevels - SPK_NOISE_LEVEL;
#endif
unsigned int adjusted_bc_numerator = (unsigned int)(INITIATOR_BC_NUMERATOR + spk_noise_factor);
val %= INITIATOR_DENOMINATOR;
if (val < adjusted_bc_numerator) return 1;
val = (uint32_t)(val - adjusted_bc_numerator);
if (val < adjusted_bc_numerator) return 2;
val = (uint32_t)(val - adjusted_bc_numerator);
#if SPK_NOISE_LEVEL == 0
if (val < INITIATOR_A_NUMERATOR) return 0;
#else
if (val < INITIATOR_A_NUMERATOR * (INITIATOR_DENOMINATOR - 2 * INITIATOR_BC_NUMERATOR) / (INITIATOR_DENOMINATOR - 2 * adjusted_bc_numerator)) return 0;
#endif
#if SPK_NOISE_LEVEL == 0
/* Avoid warnings */
(void)level;
(void)nlevels;
#endif
return 3;
}
/* Reverse bits in a number; this should be optimized for performance
* (including using bit- or byte-reverse intrinsics if your platform has them).
* */
static inline uint64_t bitreverse(uint64_t x) {
#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3)
#define USE_GCC_BYTESWAP /* __builtin_bswap* are in 4.3 but not 4.2 */
#endif
#ifdef FAST_64BIT_ARITHMETIC
/* 64-bit code */
#ifdef USE_GCC_BYTESWAP
x = __builtin_bswap64(x);
#else
x = (x >> 32) | (x << 32);
x = ((x >> 16) & UINT64_C(0x0000FFFF0000FFFF)) | ((x & UINT64_C(0x0000FFFF0000FFFF)) << 16);
x = ((x >> 8) & UINT64_C(0x00FF00FF00FF00FF)) | ((x & UINT64_C(0x00FF00FF00FF00FF)) << 8);
#endif
x = ((x >> 4) & UINT64_C(0x0F0F0F0F0F0F0F0F)) | ((x & UINT64_C(0x0F0F0F0F0F0F0F0F)) << 4);
x = ((x >> 2) & UINT64_C(0x3333333333333333)) | ((x & UINT64_C(0x3333333333333333)) << 2);
x = ((x >> 1) & UINT64_C(0x5555555555555555)) | ((x & UINT64_C(0x5555555555555555)) << 1);
return x;
#else
/* 32-bit code */
uint32_t h = (uint32_t)(x >> 32);
uint32_t l = (uint32_t)(x & UINT32_MAX);
#ifdef USE_GCC_BYTESWAP
h = __builtin_bswap32(h);
l = __builtin_bswap32(l);
#else
h = (h >> 16) | (h << 16);
l = (l >> 16) | (l << 16);
h = ((h >> 8) & UINT32_C(0x00FF00FF)) | ((h & UINT32_C(0x00FF00FF)) << 8);
l = ((l >> 8) & UINT32_C(0x00FF00FF)) | ((l & UINT32_C(0x00FF00FF)) << 8);
#endif
h = ((h >> 4) & UINT32_C(0x0F0F0F0F)) | ((h & UINT32_C(0x0F0F0F0F)) << 4);
l = ((l >> 4) & UINT32_C(0x0F0F0F0F)) | ((l & UINT32_C(0x0F0F0F0F)) << 4);
h = ((h >> 2) & UINT32_C(0x33333333)) | ((h & UINT32_C(0x33333333)) << 2);
l = ((l >> 2) & UINT32_C(0x33333333)) | ((l & UINT32_C(0x33333333)) << 2);
h = ((h >> 1) & UINT32_C(0x55555555)) | ((h & UINT32_C(0x55555555)) << 1);
l = ((l >> 1) & UINT32_C(0x55555555)) | ((l & UINT32_C(0x55555555)) << 1);
return ((uint64_t)l << 32) | h; /* Swap halves */
#endif
}
/* Apply a permutation to scramble vertex numbers; a randomly generated
* permutation is not used because applying it at scale is too expensive. */
static inline int64_t scramble(int64_t v0, int lgN, uint64_t val0, uint64_t val1) {
uint64_t v = (uint64_t)v0;
v += val0 + val1;
v *= (val0 | UINT64_C(0x4519840211493211));
v = (bitreverse(v) >> (64 - lgN));
assert ((v >> lgN) == 0);
v *= (val1 | UINT64_C(0x3050852102C843A5));
v = (bitreverse(v) >> (64 - lgN));
assert ((v >> lgN) == 0);
return (int64_t)v;
}
/* Make a single graph edge using a pre-set MRG state. */
static
void make_one_edge(int64_t nverts, int level, int lgN, mrg_state* st, packed_edge* result, uint64_t val0, uint64_t val1) {
int64_t base_src = 0, base_tgt = 0;
while (nverts > 1) {
int square = generate_4way_bernoulli(st, level, lgN);
int src_offset = square / 2;
int tgt_offset = square % 2;
assert (base_src <= base_tgt);
if (base_src == base_tgt) {
/* Clip-and-flip for undirected graph */
if (src_offset > tgt_offset) {
int temp = src_offset;
src_offset = tgt_offset;
tgt_offset = temp;
}
}
nverts /= 2;
++level;
base_src += nverts * src_offset;
base_tgt += nverts * tgt_offset;
}
write_edge(result,
scramble(base_src, lgN, val0, val1),
scramble(base_tgt, lgN, val0, val1));
}
/* Generate a range of edges (from start_edge to end_edge of the total graph),
* writing into elements [0, end_edge - start_edge) of the edges array. This
* code is parallel on OpenMP and XMT; it must be used with
* separately-implemented SPMD parallelism for MPI. */
void generate_kronecker_range(
const uint_fast32_t seed[5] /* All values in [0, 2^31 - 1), not all zero */,
int logN /* In base 2 */,
int64_t start_edge, int64_t end_edge,
packed_edge* edges
#ifdef SSSP
, float* weights
#endif
) {
mrg_state state;
int64_t nverts = (int64_t)1 << logN;
int64_t ei;
mrg_seed(&state, seed);
uint64_t val0, val1; /* Values for scrambling */
{
mrg_state new_state = state;
mrg_skip(&new_state, 50, 7, 0);
val0 = mrg_get_uint_orig(&new_state);
val0 *= UINT64_C(0xFFFFFFFF);
val0 += mrg_get_uint_orig(&new_state);
val1 = mrg_get_uint_orig(&new_state);
val1 *= UINT64_C(0xFFFFFFFF);
val1 += mrg_get_uint_orig(&new_state);
}
#ifdef _OPENMP
#pragma omp parallel for
#endif
#ifdef __MTA__
#pragma mta assert parallel
#pragma mta block schedule
#endif
for (ei = start_edge; ei < end_edge; ++ei) {
mrg_state new_state = state;
mrg_skip(&new_state, 0, (uint64_t)ei, 0);
make_one_edge(nverts, 0, logN, &new_state, edges + (ei - start_edge), val0, val1);
#ifdef SSSP
weights[ei-start_edge]=mrg_get_float_orig(&new_state);
#endif
}
}
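The scramble() helper in the file above relies on two facts: multiplying by an odd constant is invertible modulo a power of two, and reversing bits and keeping the top lgN of them reads back the low lgN bits in reverse. Together these make the mapping a permutation of [0, 2^lgN). The sketch below restates the same arithmetic with a slow, loop-based bit reversal so the property can be checked by brute force on small sizes. The names `bitreverse_naive`, `scramble_sketch`, and `is_permutation` are hypothetical, and the 2^16 size cap is only there to keep the check cheap.

```c
#include <assert.h>
#include <stdint.h>

/* Loop-based 64-bit reversal; slower than the swap ladder in the source
 * but easy to verify by inspection. */
static uint64_t bitreverse_naive(uint64_t x) {
    uint64_t r = 0;
    for (int i = 0; i < 64; ++i) {
        r = (r << 1) | (x & 1);
        x >>= 1;
    }
    return r;
}

/* Same arithmetic as scramble() in the source, using the naive reversal. */
static int64_t scramble_sketch(int64_t v0, int lgN, uint64_t val0, uint64_t val1) {
    uint64_t v = (uint64_t)v0;
    assert(lgN >= 1 && lgN <= 63);
    v += val0 + val1;
    v *= (val0 | UINT64_C(0x4519840211493211)); /* OR keeps the factor odd */
    v = bitreverse_naive(v) >> (64 - lgN);
    v *= (val1 | UINT64_C(0x3050852102C843A5)); /* OR keeps the factor odd */
    v = bitreverse_naive(v) >> (64 - lgN);
    return (int64_t)v;
}

/* Returns 1 if scramble_sketch maps [0, 2^lgN) onto itself with no
 * collisions, 0 otherwise. Brute force, so limited to small lgN. */
static int is_permutation(int lgN, uint64_t val0, uint64_t val1) {
    static unsigned char seen[1 << 16];
    int64_t n = (int64_t)1 << lgN;
    assert(lgN >= 1 && lgN <= 16);
    for (int64_t i = 0; i < n; ++i) seen[i] = 0;
    for (int64_t i = 0; i < n; ++i) {
        int64_t s = scramble_sketch(i, lgN, val0, val1);
        if (s < 0 || s >= n || seen[s]) return 0;
        seen[s] = 1;
    }
    return 1;
}
```

Calling `is_permutation(10, 12345, 67890)` checks all 1024 vertex ids for one arbitrary pair of scrambling values; any other pair should behave the same way, since the OR with an odd constant forces both multipliers to be odd.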
/* Copyright (C) 2009-2010 The Trustees of Indiana University. */
/* */
/* Use, modification and distribution is subject to the Boost Software */
/* License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at */
/* http://www.boost.org/LICENSE_1_0.txt) */
/* */
/* Authors: Jeremiah Willcock */
/* Andrew Lumsdaine */
#ifndef GRAPH_GENERATOR_H
#define GRAPH_GENERATOR_H
#include "user_settings.h"
#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
#ifndef __STDC_FORMAT_MACROS
#define __STDC_FORMAT_MACROS
#endif
#include <inttypes.h>
#ifdef __cplusplus
extern "C" {
#endif
#ifdef GENERATOR_USE_PACKED_EDGE_TYPE
typedef struct packed_edge {
uint32_t v0_low;
uint32_t v1_low;
uint32_t high; /* v1 in high half, v0 in low half */
} packed_edge;
static inline int64_t get_v0_from_edge(const packed_edge* p) {
return (p->v0_low | ((int64_t)((int16_t)(p->high & 0xFFFF)) << 32));
}
static inline int64_t get_v1_from_edge(const packed_edge* p) {
return (p->v1_low | ((int64_t)((int16_t)(p->high >> 16)) << 32));
}
static inline void write_edge(packed_edge* p, int64_t v0, int64_t v1) {
p->v0_low = (uint32_t)v0;
p->v1_low = (uint32_t)v1;
p->high = (uint32_t)(((v0 >> 32) & 0xFFFF) | (((v1 >> 32) & 0xFFFF) << 16));
}
#else
typedef struct packed_edge {
int64_t v0;
int64_t v1;
} packed_edge;
static inline int64_t get_v0_from_edge(const packed_edge* p) {
return p->v0;
}
static inline int64_t get_v1_from_edge(const packed_edge* p) {
return p->v1;
}
static inline void write_edge(packed_edge* p, int64_t v0, int64_t v1) {
p->v0 = v0;
p->v1 = v1;
}
#endif
/* Generate a range of edges (from start_edge to end_edge of the total graph),
* writing into elements [0, end_edge - start_edge) of the edges array. This
* code is parallel on OpenMP and XMT; it must be used with
* separately-implemented SPMD parallelism for MPI. */
void generate_kronecker_range(
const uint_fast32_t seed[5] /* All values in [0, 2^31 - 1) */,
int logN /* In base 2 */,
int64_t start_edge, int64_t end_edge /* Indices (in [0, M)) for the edges to generate */,
packed_edge* edges /* Size >= end_edge - start_edge */
#ifdef SSSP
,float* weights
#endif
);
#ifdef __cplusplus
}
#endif
#endif /* GRAPH_GENERATOR_H */
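When GENERATOR_USE_PACKED_EDGE_TYPE is defined, the header above stores two vertex ids in three 32-bit words instead of two 64-bit ones. The following sketch shows the same layout with hypothetical names (`edge48`, `edge48_write`, `edge48_v0`, `edge48_v1`) and checks that ids survive a round trip. It assumes non-negative ids below 2^47, which is the range where an unsigned read of the upper 16 bits agrees with the sign-extending read used in the source.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t v0_low;  /* low 32 bits of v0 */
    uint32_t v1_low;  /* low 32 bits of v1 */
    uint32_t high;    /* upper 16 bits of v1 in the high half, of v0 in the low half */
} edge48;

static void edge48_write(edge48 *p, int64_t v0, int64_t v1) {
    assert(v0 >= 0 && v0 < ((int64_t)1 << 47));
    assert(v1 >= 0 && v1 < ((int64_t)1 << 47));
    p->v0_low = (uint32_t)v0;
    p->v1_low = (uint32_t)v1;
    p->high = (uint32_t)(((v0 >> 32) & 0xFFFF) | (((v1 >> 32) & 0xFFFF) << 16));
}

static int64_t edge48_v0(const edge48 *p) {
    return (int64_t)p->v0_low | ((int64_t)(p->high & 0xFFFF) << 32);
}

static int64_t edge48_v1(const edge48 *p) {
    return (int64_t)p->v1_low | ((int64_t)(p->high >> 16) << 32);
}
```

On common ABIs this struct occupies 12 bytes per edge rather than 16, which is the point of the packed variant; the exact size is up to the compiler.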
/* Copyright (C) 2009-2010 The Trustees of Indiana University. */
/* */
/* Use, modification and distribution is subject to the Boost Software */
/* License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at */
/* http://www.boost.org/LICENSE_1_0.txt) */
/* */
/* Authors: Jeremiah Willcock */
/* Andrew Lumsdaine */
#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <limits.h>
#include <assert.h>
#include <math.h>
#ifdef __MTA__
#include <sys/mta_task.h>
#endif
#ifdef GRAPH_GENERATOR_OMP
#include <omp.h>
#endif
/* Simplified interface to build graphs with scrambled vertices. */
#include "graph_generator.h"
#include "utils.h"
#ifndef GRAPH_GENERATOR_MPI
void make_graph(int log_numverts, int64_t M, uint64_t userseed1, uint64_t userseed2, int64_t* nedges_ptr_in, packed_edge** result_ptr_in) {
/* Add restrict to input pointers. */
int64_t* restrict nedges_ptr = nedges_ptr_in;
packed_edge* restrict* restrict result_ptr = result_ptr_in;
/* Spread the two 64-bit numbers into five nonzero values in the correct
* range. */
uint_fast32_t seed[5];
make_mrg_seed(userseed1, userseed2, seed);
*nedges_ptr = M;
packed_edge* edges = (packed_edge*)xmalloc(M * sizeof(packed_edge));
*result_ptr = edges;
/* In OpenMP and XMT versions, the inner loop in generate_kronecker_range is
* parallel. */
generate_kronecker_range(seed, log_numverts, 0, M, edges);
}
#endif /* !GRAPH_GENERATOR_MPI */
/* PRNG interface for implementations; takes seed in same format as given by
* users, and creates a vector of doubles in a reproducible (and
* random-access) way. */
void make_random_numbers(
/* in */ int64_t nvalues /* Number of values to generate */,
/* in */ uint64_t userseed1 /* Arbitrary 64-bit seed value */,
/* in */ uint64_t userseed2 /* Arbitrary 64-bit seed value */,
/* in */ int64_t position /* Start index in random number stream */,
/* out */ double* result /* Returned array of values */
) {
int64_t i;
uint_fast32_t seed[5];
make_mrg_seed(userseed1, userseed2, seed);
mrg_state st;
mrg_seed(&st, seed);
mrg_skip(&st, 2, 0, 2 * (uint64_t)position); /* Each double takes two PRNG outputs */
for (i = 0; i < nvalues; ++i) {
result[i] = mrg_get_double_orig(&st);
}
}
/* Copyright (C) 2009-2010 The Trustees of Indiana University. */
/* */
/* Use, modification and distribution is subject to the Boost Software */
/* License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at */
/* http://www.boost.org/LICENSE_1_0.txt) */
/* */
/* Authors: Jeremiah Willcock */
/* Andrew Lumsdaine */
#ifndef MAKE_GRAPH_H
#define MAKE_GRAPH_H
#include <stdint.h>
#include "graph_generator.h"
#ifdef __cplusplus
extern "C" {
#endif
/* Simplified interface for users; implemented in different ways on different
* platforms. */
void make_graph(
/* in */ int log_numverts /* log_2 of vertex count */,
/* in */ int64_t desired_nedges /* Target number of edges */,
/* in */ uint64_t userseed1 /* Arbitrary 64-bit seed value */,
/* in */ uint64_t userseed2 /* Arbitrary 64-bit seed value */,
/* out */ int64_t* nedges /* Number of generated edges */,
/* out */ packed_edge** result /* Array of edges; allocated by
make_graph() but must be freed using
free() by user */
/* See functions in graph_generator.h for the definition of and how to
* manipulate packed_edge objects (functions are write_edge,
* get_v0_from_edge, get_v1_from_edge). */
);
/* PRNG interface for implementations; takes seed in same format as given by
* users, and creates a vector of doubles in a reproducible (and
* random-access) way. */
void make_random_numbers(
/* in */ int64_t nvalues /* Number of values to generate */,
/* in */ uint64_t userseed1 /* Arbitrary 64-bit seed value */,
/* in */ uint64_t userseed2 /* Arbitrary 64-bit seed value */,
/* in */ int64_t position /* Start index in random number stream */,
/* out */ double* result /* Returned array of values */
);
#ifdef __cplusplus
}
#endif
#endif /* MAKE_GRAPH_H */
/* Copyright (C) 2010 The Trustees of Indiana University. */
/* */
/* Use, modification and distribution is subject to the Boost Software */
/* License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at */
/* http://www.boost.org/LICENSE_1_0.txt) */
/* */
/* Authors: Jeremiah Willcock */
/* Andrew Lumsdaine */
#ifndef MOD_ARITH_H
#define MOD_ARITH_H
#include "user_settings.h"
/* Various modular arithmetic operations for modulus 2^31-1 (0x7FFFFFFF).
* These may need to be tweaked to get acceptable performance on some platforms
* (especially ones without conditional moves). */
/* This code is now just a dispatcher that chooses the right header file to use
* per-platform. */
#ifdef __MTA__
#include "mod_arith_xmt.h"
#else
#ifdef FAST_64BIT_ARITHMETIC
#include "mod_arith_64bit.h"
#else
#include "mod_arith_32bit.h"
#endif
#endif
#endif /* MOD_ARITH_H */
/* Copyright (C) 2010 The Trustees of Indiana University. */
/* */
/* Use, modification and distribution is subject to the Boost Software */
/* License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at */
/* http://www.boost.org/LICENSE_1_0.txt) */
/* */
/* Authors: Jeremiah Willcock */
/* Andrew Lumsdaine */
#ifndef MOD_ARITH_32BIT_H
#define MOD_ARITH_32BIT_H
#include <stdint.h>
#include <assert.h>
/* Various modular arithmetic operations for modulus 2^31-1 (0x7FFFFFFF).
* These may need to be tweaked to get acceptable performance on some platforms
* (especially ones without conditional moves). */
static inline uint_fast32_t mod_add(uint_fast32_t a, uint_fast32_t b) {
uint_fast32_t x;
assert (a <= 0x7FFFFFFE);
assert (b <= 0x7FFFFFFE);
#if 0
return (a + b) % 0x7FFFFFFF;
#else
x = a + b; /* x <= 0xFFFFFFFC */
x = (x >= 0x7FFFFFFF) ? (x - 0x7FFFFFFF) : x;
return x;
#endif
}
static inline uint_fast32_t mod_mul(uint_fast32_t a, uint_fast32_t b) {
uint_fast64_t temp;
uint_fast32_t temp2;
assert (a <= 0x7FFFFFFE);
assert (b <= 0x7FFFFFFE);
#if 0
return (uint_fast32_t)((uint_fast64_t)a * b % 0x7FFFFFFF);
#else
temp = (uint_fast64_t)a * b; /* temp <= 0x3FFFFFFE00000004 */
temp2 = (uint_fast32_t)(temp & 0x7FFFFFFF) + (uint_fast32_t)(temp >> 31); /* temp2 <= 0xFFFFFFFB */
return (temp2 >= 0x7FFFFFFF) ? (temp2 - 0x7FFFFFFF) : temp2;
#endif
}
static inline uint_fast32_t mod_mac(uint_fast32_t sum, uint_fast32_t a, uint_fast32_t b) {
uint_fast64_t temp;
uint_fast32_t temp2;
assert (sum <= 0x7FFFFFFE);
assert (a <= 0x7FFFFFFE);
assert (b <= 0x7FFFFFFE);
#if 0
return (uint_fast32_t)(((uint_fast64_t)a * b + sum) % 0x7FFFFFFF);
#else
temp = (uint_fast64_t)a * b + sum; /* temp <= 0x3FFFFFFE80000002 */
temp2 = (uint_fast32_t)(temp & 0x7FFFFFFF) + (uint_fast32_t)(temp >> 31); /* temp2 <= 0xFFFFFFFC */
return (temp2 >= 0x7FFFFFFF) ? (temp2 - 0x7FFFFFFF) : temp2;
#endif
}
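The fast paths in mod_arith_32bit.h above avoid a division by using the fact that 2^31 is congruent to 1 modulo 2^31 - 1: a value t = hi * 2^31 + lo is congruent to hi + lo, and within the stated input bounds one conditional subtraction finishes the reduction. The sketch below restates that idea with fixed-width types so it can be compared against the plain `%` operator. The names `m31_fold`, `m31_mul`, and `m31_mac` are hypothetical and are not part of the generator.

```c
#include <assert.h>
#include <stdint.h>

#define M31 UINT64_C(0x7FFFFFFF) /* the modulus 2^31 - 1 */

/* Reduce t modulo 2^31 - 1 for t up to (2^31 - 2)^2 + (2^31 - 2).
 * Splitting t at bit 31 gives hi + lo <= 0xFFFFFFFC, so a single
 * conditional subtraction brings the result into [0, M31). */
static uint32_t m31_fold(uint64_t t) {
    uint64_t s;
    assert(t <= UINT64_C(0x3FFFFFFE80000002));
    s = (t & M31) + (t >> 31);
    return (uint32_t)(s >= M31 ? s - M31 : s);
}

/* (a * b) mod (2^31 - 1) for already-reduced inputs. */
static uint32_t m31_mul(uint32_t a, uint32_t b) {
    assert(a < M31 && b < M31);
    return m31_fold((uint64_t)a * b);
}

/* (sum + a * b) mod (2^31 - 1) for already-reduced inputs. */
static uint32_t m31_mac(uint32_t sum, uint32_t a, uint32_t b) {
    assert(sum < M31 && a < M31 && b < M31);
    return m31_fold((uint64_t)a * b + sum);
}
```

A quick sanity check is the largest legal input: 0x7FFFFFFE is congruent to -1, so `m31_mul(0x7FFFFFFE, 0x7FFFFFFE)` yields 1 and `m31_mac(0x7FFFFFFE, 0x7FFFFFFE, 0x7FFFFFFE)` yields 0.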