- 20 Aug, 2018 1 commit
Swann Perarnau authored
We are going to need more of those flags, and keeping track of the conversions is tricky. So let's use a copy of the macros.
-
- 06 Aug, 2018 4 commits
Swann Perarnau authored
mbind requires that the input ptr be page-aligned. NOTE: we could also figure out a way to ask jemalloc for page-aligned allocations, but that would probably be too costly to do for every allocation.
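This is a minimal sketch of the alignment step. It assumes the goal is to cover the whole allocation with whole pages before calling mbind(2). `page_align_range` and `struct aligned_range` are hypothetical names and are not aml's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: the smallest page-aligned range covering
 * [ptr, ptr + size), i.e. the kind of range mbind(2) accepts. */
struct aligned_range {
	void *start;
	size_t len;
};

static struct aligned_range page_align_range(void *ptr, size_t size)
{
	uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
	uintptr_t begin = (uintptr_t)ptr;
	uintptr_t end = begin + size;
	struct aligned_range r;

	assert(page != 0 && (page & (page - 1)) == 0); /* power of two */
	begin &= ~(page - 1);                 /* round start down to a page */
	end = (end + page - 1) & ~(page - 1); /* round end up to a page */
	r.start = (void *)begin;
	r.len = end - begin;
	return r;
}
```

Rounding the start down and the end up means the first and last pages can be shared with neighboring allocations. This is the page-overlap concern mentioned for arenas shared by multiple areas.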
-
Swann Perarnau authored
The way jemalloc handles big allocations can often result in surprising calls to mmap/mbind (split allocations, rounded-up sizes). It also makes the path between an aml_alloc and mbind quite difficult to follow. More worrying, if jemalloc reuses a previous allocation, mbind will not be called again, which might result in the wrong binding being applied. To fix those issues, we move the mbind logic around the allocations returned by jemalloc. This ensures that we always bind properly. The only drawback is that it might slow down allocations. It can also cause issues if the same arena is used by multiple areas, as allocations from different areas might share a page. We will stop sharing arenas in benchmarks from now on.
-
Swann Perarnau authored
Fix dgemm_noprefetch to match pattern from @suchyb in #19. In order to do so, we split our 2d tiling into column-major and row-major variants. Note that these refer to the order of the tiles, not to the internal data of a tile, as a tiling should be agnostic to it.
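To illustrate the distinction, here is a sketch of how a (row, col) tile coordinate could map to a linear tile id under each ordering. `tile_id` and `enum tile_order` are hypothetical and are not the aml tiling API. Only the order of the tiles changes; the data inside a tile is untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: a 2D tiling of nrows x ncols tiles. The ordering only
 * decides how a tile coordinate maps to a linear tile id. */
enum tile_order { TILE_ROW_MAJOR, TILE_COL_MAJOR };

static size_t tile_id(enum tile_order order, size_t nrows, size_t ncols,
		      size_t row, size_t col)
{
	assert(row < nrows && col < ncols);
	if (order == TILE_ROW_MAJOR)
		return row * ncols + col; /* walk along a row of tiles first */
	return col * nrows + row;         /* walk down a column of tiles first */
}
```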
-
Swann Perarnau authored
Add a tiling representing a 2d array of contiguous tiles. Also add an ndims function to retrieve the dimensions, in tiles, of the tiling. It also became quite obvious that the iterators are useless right now; we should think about changing that.
-
- 30 Jul, 2018 1 commit
Kamil Iskra authored
-
- 25 Jul, 2018 2 commits
Implement a 2D tiling with contiguous tiles in memory, with tiles organized in row-major order inside the virtual address range. Also add functions to query the size of a tile inside the tiling.
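This is a minimal sketch of the address computation such a layout allows. It assumes every tile has the same size and that tiles are packed back to back in row-major order. `struct tiling_2d`, `tiling_2d_tilesize` and `tiling_2d_ptr` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 2D tiling: equal-size tiles stored back to back in one
 * virtual address range, in row-major tile order. */
struct tiling_2d {
	size_t tilesize;  /* bytes per tile */
	size_t ndims[2];  /* number of tile rows, number of tile columns */
};

/* Every tile has the same size in this layout. */
static size_t tiling_2d_tilesize(const struct tiling_2d *t)
{
	return t->tilesize;
}

/* Start of tile (row, col): since tiles are contiguous, the address is
 * the row-major tile index times the tile size. */
static void *tiling_2d_ptr(const struct tiling_2d *t, void *base,
			   size_t row, size_t col)
{
	assert(row < t->ndims[0] && col < t->ndims[1]);
	size_t id = row * t->ndims[1] + col;
	return (char *)base + id * tiling_2d_tilesize(t);
}
```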
-
Swann Perarnau authored
When code using aml also links against jemalloc, errors can occur because we use the default jemk prefix for the aml-specific jemalloc install. To fix these issues, we use an aml-specific prefix instead. Discovered when using MKL on a KNL box.
-
- 05 Jul, 2018 1 commit
Swann Perarnau authored
Useful and currently missing.
-
- 30 Mar, 2018 2 commits
Swann Perarnau authored
We were unlocking the dma before the request type got set to a proper value, resulting in requests sometimes overlapping when multiple threads were used in benchmarks.
-
Swann Perarnau authored
When a user doesn't need a tile to be pushed back into the scratchpad, it is better to just `release` that tile instead. This is particularly useful for read-only data in bandwidth-limited applications.
-
- 29 Mar, 2018 2 commits
Kamil Iskra authored
Also add documentation to two forgotten functions in the header file.
-
Note that several comments are still missing, specifically for the area's acquire()/release()/available() routines, whose purpose is not clear to me.
-
- 28 Mar, 2018 7 commits
Swann Perarnau authored
Add mutex to make request creation and destruction thread-safe. As with scratch_seq, we need to deal with both requests and tiles in these functions, so we lock the entire section.
-
Swann Perarnau authored
Add mutex to make request creation and destruction thread-safe. As we need to deal with both requests and tiles in these functions, we lock the entire section.
-
Swann Perarnau authored
Add mutex to make request creation and destruction thread-safe. As with dma_linux_seq, the changes are quite simple, as we only need to protect modifications to the requests array.
-
Swann Perarnau authored
Add a mutex to make request creation and destruction thread-safe. As the code here is quite simple, we only need to protect modifications to the request array.
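The locking pattern described in these commits can be sketched as below. The sketch assumes a fixed-size request array where a zero type marks a free slot; both are assumptions, and so are the names `dma_sketch`, `request_create` and `request_destroy`. The slot is claimed and its type is set before the mutex is released.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_REQUESTS 16

/* Hypothetical request slot; type 0 marks a free slot. */
struct request {
	int type;
};

struct dma_sketch {
	pthread_mutex_t lock;
	struct request requests[MAX_REQUESTS];
};

/* Search and claim run under the lock, and the type is set before
 * unlocking, so two threads can never be handed the same slot.
 * Returns the slot index, or -1 if every slot is busy. */
static int request_create(struct dma_sketch *d, int type)
{
	int ret = -1;

	assert(type != 0);
	pthread_mutex_lock(&d->lock);
	for (int i = 0; i < MAX_REQUESTS; i++) {
		if (d->requests[i].type == 0) {
			d->requests[i].type = type;
			ret = i;
			break;
		}
	}
	pthread_mutex_unlock(&d->lock);
	return ret;
}

/* Release a slot; only the modification of the array is protected. */
static void request_destroy(struct dma_sketch *d, int id)
{
	assert(id >= 0 && id < MAX_REQUESTS);
	pthread_mutex_lock(&d->lock);
	d->requests[id].type = 0;
	pthread_mutex_unlock(&d->lock);
}
```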
-
Swann Perarnau authored
scratch_request_seq contains one extra tiling that is unnecessary. Remove it.
-
Swann Perarnau authored
The request type contains too much stuff; remove extra pointers to save some space.
-
Swann Perarnau authored
Add a scratchpad that creates one pthread per request to call synchronous dma operations. The intent is to end up with a cross product of programming language support between dma and scratch:
- scratch_par + dma_seq gives users parallel scratch requests
- scratch_seq + dma_par gives users sequential access to parallel moves
The two other combinations don't make as much sense, though.
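Here is a sketch of the one-thread-per-request idea, with `memcpy` standing in for a synchronous dma operation. `struct par_request`, `par_request_start` and `par_request_wait` are hypothetical names. A real scratchpad would also track tiles.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical parallel-scratch request: each request owns one thread
 * that runs a synchronous copy (memcpy stands in for a sequential dma). */
struct par_request {
	pthread_t thread;
	void *dst;
	const void *src;
	size_t size;
};

static void *par_request_run(void *arg)
{
	struct par_request *req = arg;
	memcpy(req->dst, req->src, req->size); /* synchronous "dma" copy */
	return NULL;
}

/* Start the request: returns as soon as the thread is spawned. */
static int par_request_start(struct par_request *req)
{
	assert(req->dst != NULL && req->src != NULL);
	return pthread_create(&req->thread, NULL, par_request_run, req);
}

/* Block until the request's thread has finished its copy. */
static int par_request_wait(struct par_request *req)
{
	return pthread_join(req->thread, NULL);
}
```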
-
- 27 Mar, 2018 3 commits
Swann Perarnau authored
Replace custom code with generic vectors for the scratch implementation. In the process, fix a bug in the management of tiles, as they were being freed on pull completion, which is wrong.
-
Swann Perarnau authored
Use the newly introduced vector type to manage requests inside dmas. This cleans up the API a bit, and removes dubious ops from the dma internals.
-
Swann Perarnau authored
Add a generic vector type to the library, with some special features:
- the elements are embedded in the vector, and not pointers
- each element must include an int field that is used as a "key"
- the element has a "null" value for its key, used to indicate that this element of the vector is null
- add/remove functions provide access to a new element/free it from the vector, but don't "destroy" it
- resize on add is exponential
This patch includes the implementation and a unit test.
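This is a minimal sketch of a vector with these properties. It assumes the key's offset inside the element is passed at init time. The names (`struct vector`, `vector_add`, and so on) are hypothetical and not necessarily those used in aml.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical generic vector: elements are stored inline, each element
 * has an int key at a known offset, and a reserved "null" key value
 * marks an unused slot. */
struct vector {
	size_t nbelems; /* number of slots */
	size_t sz;      /* size of one element */
	size_t off;     /* offset of the int key inside an element */
	int na;         /* "null" key value */
	char *ptr;      /* inline element storage */
};

static int *vector_key(struct vector *v, size_t i)
{
	return (int *)(v->ptr + i * v->sz + v->off);
}

static void vector_nullify(struct vector *v, size_t from, size_t to)
{
	for (size_t i = from; i < to; i++)
		*vector_key(v, i) = v->na;
}

static int vector_init(struct vector *v, size_t n, size_t sz, size_t off,
		       int na)
{
	assert(n > 0 && off + sizeof(int) <= sz);
	v->ptr = calloc(n, sz);
	if (v->ptr == NULL)
		return -1;
	v->nbelems = n;
	v->sz = sz;
	v->off = off;
	v->na = na;
	vector_nullify(v, 0, n);
	return 0;
}

/* Return a free slot, doubling the storage if none is left. The caller
 * must give the slot a non-null key to mark it as used. */
static void *vector_add(struct vector *v)
{
	for (size_t i = 0; i < v->nbelems; i++)
		if (*vector_key(v, i) == v->na)
			return v->ptr + i * v->sz;
	size_t old = v->nbelems;
	char *p = realloc(v->ptr, 2 * old * v->sz);
	if (p == NULL)
		return NULL;
	v->ptr = p;
	v->nbelems = 2 * old;
	vector_nullify(v, old, v->nbelems);
	return v->ptr + old * v->sz;
}

/* Hand a slot back to the vector; the element itself is not destroyed. */
static void vector_remove(struct vector *v, void *elem)
{
	size_t i = (size_t)((char *)elem - v->ptr) / v->sz;
	assert(i < v->nbelems);
	*vector_key(v, i) = v->na;
}

static void vector_fini(struct vector *v)
{
	free(v->ptr);
	v->ptr = NULL;
	v->nbelems = 0;
}
```

Because `vector_add` may call `realloc`, pointers to elements are only valid until the next add.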
-
- 26 Mar, 2018 2 commits
Swann Perarnau authored
Move the scratchpad tiles into an internal concern:
- the scratchpad does the allocation
- the scratchpad tracks available tiles internally
- the user can ask for the scratch baseptr
This is necessary to abstract move-based scratchpads, and to relieve the user of the responsibility of maintaining tiling and baseptr tracking. We still fail hard when tiles are not available, and the design is not thread-safe. But we are getting there.
-
Swann Perarnau authored
This is the initial implementation and validation of a scratchpad: a logic unit that handles tracking data across a "main" area and a "scratch" area. The API and internals will probably change again soon, as there's no clear way to implement a move-based scratchpad on top of this one. Note that this implementation doesn't really do any tracking yet; that's the next step.
-
- 23 Mar, 2018 5 commits
Kamil Iskra authored
Replace the unused "max" argument for file-based mappings with an offset argument (until now the offset was hardcoded to 0).
-
Swann Perarnau authored
Add working implementation of copy and move to dma_linux_par, and corresponding unit test.
-
Swann Perarnau authored
Fix a few typos in the dma_linux_seq code that for some reason didn't raise any flags until now. Also add a small validation to the unit test.
-
Swann Perarnau authored
Add a dma that spawns a fixed number of threads for each request created. The number of threads is configured at dma creation time.
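Here is a sketch of splitting one copy request across a fixed number of threads. It assumes an even split, with the last thread taking the remainder. `par_copy` and `struct copy_slice` are hypothetical names. In this sketch the thread count is passed per call rather than stored in a dma object at creation time.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define MAX_THREADS 8

/* Hypothetical per-thread slice of one copy request. */
struct copy_slice {
	char *dst;
	const char *src;
	size_t size;
};

static void *copy_slice_run(void *arg)
{
	struct copy_slice *s = arg;
	memcpy(s->dst, s->src, s->size);
	return NULL;
}

/* Split one copy across nthreads threads and wait for all of them.
 * Returns -1 if a thread could not be created (the copy is then partial). */
static int par_copy(void *dst, const void *src, size_t size, int nthreads)
{
	pthread_t th[MAX_THREADS];
	struct copy_slice s[MAX_THREADS];
	size_t chunk;
	int started = 0, err = 0;

	assert(nthreads > 0 && nthreads <= MAX_THREADS);
	chunk = size / (size_t)nthreads;
	for (int i = 0; i < nthreads; i++) {
		s[i].dst = (char *)dst + (size_t)i * chunk;
		s[i].src = (const char *)src + (size_t)i * chunk;
		/* the last slice also covers the remainder */
		s[i].size = (i == nthreads - 1) ? size - (size_t)i * chunk : chunk;
		if (pthread_create(&th[i], NULL, copy_slice_run, &s[i]) != 0) {
			err = -1;
			break;
		}
		started++;
	}
	for (int i = 0; i < started; i++)
		pthread_join(th[i], NULL);
	return err;
}
```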
-
Swann Perarnau authored
This patch refactors dma request types to remove generic function pointers from the library. This includes modifying the linux_seq implementation to:
- move the copy/move implementation to the dma ops
- remove one layer of indirection, as the request type no longer needs _data and _ops substructures
Forcing dma requests to have a fully qualified generic type, with function pointers, would cause issues for future kinds of dma implementations that might require a different way of handling requests altogether. This work is driven by our current work on a parallel dma implementation.
-
- 22 Mar, 2018 3 commits
Kamil Iskra authored
-
Kamil Iskra authored
-
Kamil Iskra authored
-
- 11 Mar, 2018 4 commits
Swann Perarnau authored
This patch adds the basics for a dma interface, including type-dependent request structures, and an API based on explicit copy/move calls. The API is flexible enough to deal with sync/async calls. The internal design is inspired by aml_area, with the goal that create/init stay type-specific, but the core interactions are generic.
-
Swann Perarnau authored
Using variable arguments for the tile id when retrieving tiling info makes the API difficult to use when more than one tile must be used at the same time. We change the API to use a tileid, with the assumption that any useful tiling will be able to define a workable uuid scheme.
-
Swann Perarnau authored
A few missing declarations in aml.h, to make it easier to deal with the library.
-
Swann Perarnau authored
As we cannot find out in advance which binding an area uses, it is not possible to use a correctly allocated pointer to aml_area_binding. Fixes a segfault we observed outside of the current unit tests.
-
- 08 Mar, 2018 3 commits
Swann Perarnau authored
Allows memory movement logic to ask a target area how memory should be bound to it. Note that it would be safer in the long term to have areas take a binding at creation time, and translate to nodemasks internally.
-
Swann Perarnau authored
Still the same schema, although it looks a bit messier on linux because of all the options needed.
-
Swann Perarnau authored
Same schema as for arena: we create init functions for each type of area, to make sure that users know what they are working with. The functions are easy here, as posix is more an allocator than anything proper.
-