[refactor] Rework code for better abstractions

This is a rewrite of the existing code into a memory library that
exposes more of its internal abstractions. This refactoring is
required to:

- make progress faster by focusing on the core new features
- abstract more of the underlying components and expose those
  abstractions
- build upon existing libraries (memkind) for the internals

Memkind is used as a crutch here. We do not intend to use it in the
long term, as some of its internals are opposed to what we want
(topology management in particular). Nevertheless, for now it provides
a good allocator internally and decent access to deep memory.

Over time, we figured out that the best way to build this API was to
create several layers of APIs, each with more abstractions over the
devices. At the same time, we want each layer to expose its internal
mechanisms, so that a user can customize any of them.

This is why we end up with areas and DMA engines. In the future we
will add other components, such as data decomposition and distribution
methods, as well as direct support for "pipelining".