Add a simple dma with a cuda backend. A dma contains a cuda stream and a request queue. Each dma issues a single type of request: device to host, host to device, or device to device.
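As a rough illustration of the structure described above, here is a minimal sketch in plain C. Every name in it (`struct dma`, `dma_push`, `dma_pop`, and so on) is hypothetical and not taken from this merge request. The stream is held as an opaque pointer standing in for a `cudaStream_t`, so the sketch compiles without the cuda toolkit, and the queue is assumed to be a simple FIFO linked list.

```c
#include <stddef.h>

/* Hypothetical names throughout: a sketch, not the code in this merge request. */

/* The single type of request a dma is allowed to issue. */
enum dma_direction {
    DMA_HOST_TO_DEVICE,
    DMA_DEVICE_TO_HOST,
    DMA_DEVICE_TO_DEVICE
};

/* One copy request waiting in the queue. */
struct dma_request {
    void *dst;
    const void *src;
    size_t size;
    int src_device;            /* only meaningful for device to device */
    int dst_device;            /* only meaningful for device to device */
    struct dma_request *next;  /* next request in FIFO order */
};

/* A dma: one direction, one stream, one request queue. */
struct dma {
    enum dma_direction direction; /* fixed when the dma is created */
    void *stream;                 /* stand-in for a cudaStream_t (assumption) */
    struct dma_request *head;     /* oldest pending request */
    struct dma_request *tail;     /* newest pending request */
    size_t pending;               /* number of queued requests */
};

/* Initialize an empty dma bound to one direction and one stream. */
static void dma_init(struct dma *d, enum dma_direction dir, void *stream)
{
    d->direction = dir;
    d->stream = stream;
    d->head = NULL;
    d->tail = NULL;
    d->pending = 0;
}

/* Append a request to the queue. Returns 0 on success, -1 on bad arguments. */
static int dma_push(struct dma *d, struct dma_request *r)
{
    if (d == NULL || r == NULL)
        return -1;
    r->next = NULL;
    if (d->tail != NULL)
        d->tail->next = r;
    else
        d->head = r;
    d->tail = r;
    d->pending++;
    return 0;
}

/* Remove and return the oldest request, or NULL if the queue is empty. */
static struct dma_request *dma_pop(struct dma *d)
{
    struct dma_request *r = d->head;
    if (r == NULL)
        return NULL;
    d->head = r->next;
    if (d->head == NULL)
        d->tail = NULL;
    r->next = NULL;
    d->pending--;
    return r;
}
```

Binding the direction to the dma rather than to each request means the code that drains the queue can pick the copy routine once per dma instead of branching per request.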
This merge request changes existing files:
- configure.ac: For efficiency, cuda callbacks are used, and they require recent drivers. The critical function from the latest runtime library is used to check for cuda support at compile time.
- layout cuda: A device id field is added. For a dma from device to device, we need to keep track of which devices are used in order to call the underlying `cudaMemcpyPeerAsync()`. Also fix the header guard names.
- area cuda: Add missing guards in header.