1. 06 Aug, 2018 14 commits
    • [fix] Force mbind on allocation from arena · 759ec35a
      Swann Perarnau authored
      The way jemalloc handles big allocations can often result in surprising
      calls to mmap/mbind (splitting allocations, rounded up sizes). It also
      makes the path between an aml_alloc and mbind quite difficult to see.
      More worrying, if jemalloc reuses a previous allocation, the mbind will
      not be called again, which might result in the wrong binding happening.
      To fix those issues, we move the mbind logic to be around the
      allocations returned from jemalloc. This will ensure that we always bind
      properly. The only issue is that it might slow down allocations.
      It can also cause issues if the same arena is used by multiple areas, as
      allocations might be overlapping a page. We will move away from sharing
      arenas for benchmarks from now on.
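
      The page-overlap concern above follows from mbind(2) working on whole
      pages. A minimal sketch, assuming hypothetical helper names
      (`page_span_of`, `spans_overlap`) that are not part of aml, of how the
      page range covering an allocation can be computed and compared:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: a page-aligned address range. */
struct page_span {
    uintptr_t start; /* first byte of the first page */
    size_t len;      /* length in bytes, multiple of the page size */
};

/* Expand [addr, addr + size) to the whole pages that contain it. */
static struct page_span page_span_of(uintptr_t addr, size_t size,
                                     size_t pagesize)
{
    /* The masking trick below only works for power-of-two page sizes. */
    assert(pagesize != 0 && (pagesize & (pagesize - 1)) == 0);
    uintptr_t mask = (uintptr_t)pagesize - 1;
    uintptr_t start = addr & ~mask;               /* round down */
    uintptr_t end = (addr + size + mask) & ~mask; /* round up */
    struct page_span s = { start, (size_t)(end - start) };
    return s;
}

/* Two allocations conflict if their page ranges intersect. */
static int spans_overlap(struct page_span a, struct page_span b)
{
    return a.start < b.start + b.len && b.start < a.start + a.len;
}
```

      Two small allocations that sit next to each other end up with the same
      page span, so binding one of them also rebinds the other.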
    • [feature/fix] Duplicate dgemm_nofetch.c changes · b808beb2
      Kamil Iskra authored
      Duplicate changes to dgemm_nofetch.c from commits
      6a0d1cbd and
    • [fix] Make offset/pointer variables thread-local · 9326c388
      Kamil Iskra authored
      We had a race condition where OpenMP threads were accidentally reusing
      the same variables on stack, resulting in races and incorrect results.
      The number of FP operations was probably correct, although the memory
      accesses to the arrays may have been wrong.
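
      One common way to avoid this kind of race is to declare the offset and
      pointer variables inside the parallel loop body, which makes them
      private to each thread. The sketch below is illustrative only:
      `sum_tiles` and its parameters are hypothetical names, not the
      benchmark code.

```c
#include <assert.h>
#include <stddef.h>

/* Sum an array laid out as `ntiles` tiles of `tilesz` doubles each. */
static double sum_tiles(const double *a, size_t ntiles, size_t tilesz)
{
    double total = 0.0;

    assert(a != NULL);
#ifdef _OPENMP
#pragma omp parallel for reduction(+:total)
#endif
    for (long t = 0; t < (long)ntiles; t++) {
        /* Declared inside the loop body: every thread works on its own
         * copies, so no thread can overwrite another thread's offset. */
        size_t off = (size_t)t * tilesz;
        const double *tile = a + off;

        for (size_t i = 0; i < tilesz; i++)
            total += tile[i];
    }
    return total;
}
```

      Declaring `off` and `tile` before the pragma would make them shared by
      default, which is the kind of accidental reuse described above.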
    • [feature] Respect tiling representation of arrays · 6a0d1cbd
      Kamil Iskra authored
      The A, B, and C matrices are tiled (tiles in A are also transposed).
      Add initialization code for A and B and conversion code for C that
      respects the tiling, thus enabling a direct comparison of results with
      mkl and vanilla.
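
      As an illustration of what such a conversion can look like, here is a
      minimal sketch that copies a tiled square matrix into a plain row-major
      one. It assumes square tiles stored contiguously, tiles ordered
      row-major, and row-major data inside each tile; the actual layout used
      by the benchmark may differ, and `untile_rowmajor` is a hypothetical
      name.

```c
#include <assert.h>
#include <stddef.h>

/* Copy an n x n matrix stored as contiguous ts x ts tiles into a flat
 * row-major n x n matrix. Layout assumptions are stated in the lead-in. */
static void untile_rowmajor(const double *tiled, double *flat,
                            size_t n, size_t ts)
{
    assert(ts > 0 && n % ts == 0);
    size_t nt = n / ts; /* number of tiles per dimension */

    for (size_t ti = 0; ti < nt; ti++) {
        for (size_t tj = 0; tj < nt; tj++) {
            /* First element of tile (ti, tj). */
            const double *tile = tiled + (ti * nt + tj) * ts * ts;

            for (size_t i = 0; i < ts; i++)
                for (size_t j = 0; j < ts; j++)
                    flat[(ti * ts + i) * n + (tj * ts + j)] =
                        tile[i * ts + j];
        }
    }
}
```

      Once the result is in a flat layout it can be compared element by
      element with the output of a reference implementation.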
    • [fix] Wrong variable/type to aml calls · b67dc368
      Swann Perarnau authored
      Benchmarks were crashing due to bad parameters.
    • [refactor] use new tilings for dgemm_prefetch · 47c021c8
      Swann Perarnau authored
      Use the new 2D tilings for dgemm_prefetch, also refactor the code to
      match the rest of the benchmarks. The code should be a lot cleaner
    • [feature/fix] add column-major 2D tiling · 9764f3c6
      Swann Perarnau authored
      Fix dgemm_noprefetch to match pattern from @suchyb in #19.
      In order to do so we split our 2d tiling into column-major and
      row-major ones. Note that those are referring to the order of the tiles,
      not the internal data of a tile, as a tiling should be agnostic to it.
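
      The difference between the two tile orders can be sketched as follows.
      The function names are hypothetical and are not the aml tiling API; the
      only assumption is that tiles are contiguous and all have the same size
      in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Position of tile (row, col) when tiles are laid out row by row. */
static size_t tile_id_rowmajor(size_t row, size_t col,
                               size_t nrows, size_t ncols)
{
    assert(row < nrows && col < ncols);
    return row * ncols + col;
}

/* Position of tile (row, col) when tiles are laid out column by column. */
static size_t tile_id_colmajor(size_t row, size_t col,
                               size_t nrows, size_t ncols)
{
    assert(row < nrows && col < ncols);
    return col * nrows + row;
}

/* Byte offset of a tile from the start of the tiled area. What is stored
 * inside the tile does not matter here. */
static size_t tile_offset(size_t tile_id, size_t tile_bytes)
{
    return tile_id * tile_bytes;
}
```

      Only the tile index changes between the two orders; the offset
      computation and the contents of each tile stay the same.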
    • [refactor/fix] use proper tiling and tile order · f673af2e
      Swann Perarnau authored
      1. Refactor the overall main function to match the intended benchmark interface.
      2. Use the new tiling type to clean up the noprefetch version. Careful
      inspection unearthed some bad offset computations, which are fixed here.
      3. Double-checked the way we were spawning threads, new code should be
      I believe the code should be easier to read and to play with.
      Converting the prefetch versions might not be as easy.
    • [feature] add 2d tiling of contiguous tiles · 508c4695
      Swann Perarnau authored
      Add a tiling representing a 2d array of contiguous tiles. Also add an
      ndims function to retrieve the dimensions in tiles of the tiling.
      It also became quite obvious that the iterators are useless right now.
      We should think about changing that.
    • [feature] add regular dgemm with custom alloc · 0da98aca
      Swann Perarnau authored
      Just to be able to test various matrix placements.
    • [fix] remove exec flag from sources · 4f575f8c
      Swann Perarnau authored
    • Removed unneeded line · 47e2be44
      Brian Suchy authored and Swann Perarnau committed
    • [refactor] improve mkl version for benchmarking · 75aaeff0
      Swann Perarnau authored
      Match intended benchmarking interface, including computing
      flops directly.
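
      For reference, a dgemm on an m x k matrix times a k x n matrix is
      conventionally counted as 2*m*n*k floating-point operations (one
      multiply and one add per inner-product term). A minimal sketch of the
      rate computation, with `dgemm_flops` as a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

/* Floating-point operations per second for C = A * B, where A is m x k
 * and B is k x n, given the measured wall-clock time in seconds. */
static double dgemm_flops(size_t m, size_t n, size_t k, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * (double)m * (double)n * (double)k / seconds;
}
```

      For example, a 1000 x 1000 x 1000 multiplication that takes 2 seconds
      corresponds to 1e9 flops, that is 1 GFlop/s.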
    • [fix] make sure CI runs fast on KNL · 9982d4e4
      Swann Perarnau authored
      Add parallel option to the compile step to speed things up.
  2. 30 Jul, 2018 1 commit
  3. 26 Jul, 2018 1 commit
  4. 25 Jul, 2018 3 commits
    • [feature] add matrix multiply benchmarks · 2c452094
      Brian Suchy authored and Swann Perarnau committed
      Implement 3 dgemm benchmark versions:
      - standard mkl code on total matrix
      - prefetch scheme based on UTK/INRIA/ANL on-going collaboration
      - same tiling but no prefetch version of the code
      This is the version ready for merging, after several weeks of work on
      an independent branch. Further improvements to API/code will continue after
      it reaches master.
    • [feature] add 2D tiling, additional methods. · a13ddad2
      Brian Suchy authored and Swann Perarnau committed
      Implement a 2D tiling with contiguous tiles in memory, with tiles
      organized in row-major order inside the virtual address range.
      Also adds functions to query the size of a tile inside the tiling.
    • [fix] Avoid conflicts when jemalloc is used twice · c1ec7da8
      Swann Perarnau authored
      When a code using aml is also linking against jemalloc, errors can occur
      because we use the default jemk prefix for the aml specific jemalloc
      install. To fix these issues, we instead use an aml-specific prefix.
      Discovered when using mkl on a knl box.
  5. 24 Jul, 2018 2 commits
  6. 20 Jul, 2018 1 commit
    • [refactor] move functional tests, proper OpenMP · 51167d12
      Swann Perarnau authored
      We are starting to work on benchmarks to evaluate the usefulness of this
      library. Instead of integrating them into the testing infrastructure, it
      makes more sense for them to have their own directory and a different
      way of handling them.
      This patch:
       - creates a benchmark directory for actual codes that we want to use as
         benchmarks of our library.
       - moves functional tests into it.
       - adds proper OpenMP detection for these codes.
  7. 05 Jul, 2018 1 commit
  8. 02 Jul, 2018 3 commits
  9. 26 Jun, 2018 1 commit
  10. 23 May, 2018 1 commit
    • [feature] add gitlab CI pipeline · bce9e3ee
      Swann Perarnau authored
      Simple gitlab-ci config with a single step, running the full list of
      configure, make, make install and make check.
      No tags, no split build and test for now, as artifacts are a bit tricky
      to get right.
      This config should grow in the future to ensure that we run all the tests
      on all the platforms we want.
  11. 30 Mar, 2018 6 commits
    • [fix] enforce type conversion, better pointer management · 54d0b418
      Swann Perarnau authored
      Fix small issues with type conversion across several tests,
      as a change of architecture might trigger bad behavior in
      variadic functions.
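
      The underlying problem is that arguments passed through `...` are not
      converted to the type the callee reads with `va_arg`. A minimal sketch,
      using a hypothetical `total_size` function rather than any aml call:

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Sum `count` variadic arguments, each of which must be a size_t. */
static size_t total_size(int count, ...)
{
    va_list ap;
    size_t total = 0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, size_t); /* reads exactly a size_t */
    va_end(ap);
    return total;
}
```

      A call such as `total_size(2, (size_t)n, (size_t)8)` with an `int n` is
      well defined. Dropping the casts passes `int` values where `size_t` is
      read, which is undefined behavior and may only show up on
      architectures where the two types differ in size.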
    • [fix] unlocks are too early in dma_linux_* · c445b498
      Swann Perarnau authored
      We were unlocking the dma before the request type was set to a
      proper value, resulting in requests sometimes overlapping when
      multiple threads were used in benchmarks.
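
      The ordering issue can be sketched with a small request table protected
      by a pthread mutex. All names here (`struct dma`, `dma_request_create`,
      `REQ_INVALID`, `DMA_NREQS`) are hypothetical and only illustrate the
      pattern; this is not the dma_linux_* code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DMA_NREQS 4

enum req_type { REQ_INVALID = 0, REQ_COPY };

struct request {
    enum req_type type; /* REQ_INVALID marks a free slot */
};

struct dma {
    pthread_mutex_t lock;
    struct request reqs[DMA_NREQS];
};

static void dma_init(struct dma *d)
{
    pthread_mutex_init(&d->lock, NULL);
    for (size_t i = 0; i < DMA_NREQS; i++)
        d->reqs[i].type = REQ_INVALID;
}

/* Claim a free slot, or return NULL if the table is full. */
static struct request *dma_request_create(struct dma *d, enum req_type type)
{
    struct request *r = NULL;

    assert(type != REQ_INVALID);
    pthread_mutex_lock(&d->lock);
    for (size_t i = 0; i < DMA_NREQS; i++) {
        if (d->reqs[i].type == REQ_INVALID) {
            r = &d->reqs[i];
            /* Mark the slot as used while the lock is still held, so no
             * other thread can see it as free and claim it as well. */
            r->type = type;
            break;
        }
    }
    pthread_mutex_unlock(&d->lock);
    return r;
}
```

      If the unlock came before the assignment to `r->type`, two threads
      could both find the same slot free and end up sharing one request.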
    • [test] add openmp version of mt stream_add · 956d9453
      Swann Perarnau authored
      This is a second type of use for the scratchpad: a single master
      thread is responsible for launching all data movements, but the tiles
      are worked on in parallel. We support this model by using a sequential
      scratch on top of a parallel dma.
    • [refactor] add openmp version of stream_add_pth · 21d3724e
      Swann Perarnau authored
      Add openmp version of the previous functional test. We also rename them,
      to mark the fact that those two tests are designed to use a *single thread*
      to run the kernel across an entire tile.
    • [refactor] make use of functional tests again · f47dc685
      Swann Perarnau authored
      This patch reintroduces the first functional test, a stream add
      implementation using pthreads for parallelism. We make use of our
      scratch_par implementation to implement a pipelined version of the
      application, where each worker thread is using its own batch of tiles,
      and migrating data asynchronously.
    • [feature] add function to release a scratch tile · 7260868d
      Swann Perarnau authored
      When a user doesn't need a tile to be pushed back into the scratchpad,
      it is better to just `release` that tile instead. This is particularly
      useful for read-only data for applications that are bandwidth limited.
  12. 29 Mar, 2018 2 commits
  13. 28 Mar, 2018 4 commits
    • [feature] make scratch_par thread-safe · 1e1f1ced
      Swann Perarnau authored
      Add mutex to make request creation and destruction thread-safe. As for
      scratch_seq, we need to deal with both requests and tiles during these
      functions, so we lock the entire section.
    • [feature] make scratch_seq thread-safe · cd9dba51
      Swann Perarnau authored
      Add mutex to make request creation and destruction thread-safe. As we
      need to deal with both requests and tiles during these functions, we lock
      the entire section.
    • [feature] make dma_linux_par thread-safe · 7a69c840
      Swann Perarnau authored
      Add mutex to make request creation and destruction thread-safe. Same as
      dma_linux_seq, the changes are quite simple, as we only need to protect
      modifications to the requests array.
    • [feature] make dma_linux_seq thread-safe · 9f2b685d
      Swann Perarnau authored
      Add a mutex to make request creation and destruction thread-safe. As the
      code here is quite simple, we only need to protect modifications to the
      request array.