[feature] add matrix multiply benchmarks

Implement three versions of the dgemm benchmark:
- standard MKL dgemm on the full matrix
- tiled version with a prefetch scheme based on the ongoing UTK/INRIA/ANL collaboration
- same tiling as above, without prefetching
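To illustrate how the two tiled variants differ, here is a minimal sketch of a generic tiled C += A * B loop with an optional software-prefetch step. It is not the UTK/INRIA/ANL scheme itself. It assumes square, row-major `double` matrices, 64-byte cache lines, and a compiler that provides `__builtin_prefetch` (GCC, Clang and icc do). The names `dgemm_naive`, `dgemm_tiled`, `tile` and `prefetch` are hypothetical. The MKL baseline would simply call `cblas_dgemm` on the full matrices and is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Reference triple loop: C += A * B, all n x n, row-major. */
void dgemm_naive(size_t n, const double *A, const double *B, double *C) {
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            double a = A[i * n + k];
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += a * B[k * n + j];
        }
}

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Tiled C += A * B with tile x tile blocks (edge tiles may be smaller).
 * If prefetch != 0, before computing on a block of B the code issues
 * prefetch hints for the block of B to its right (next jj step),
 * one hint per assumed 64-byte cache line (8 doubles). */
void dgemm_tiled(size_t n, size_t tile, const double *A, const double *B,
                 double *C, int prefetch) {
    assert(tile > 0);
    for (size_t ii = 0; ii < n; ii += tile)
        for (size_t kk = 0; kk < n; kk += tile)
            for (size_t jj = 0; jj < n; jj += tile) {
                size_t i_end = min_sz(ii + tile, n);
                size_t k_end = min_sz(kk + tile, n);
                size_t j_end = min_sz(jj + tile, n);

                if (prefetch && jj + tile < n) {
                    size_t jn = jj + tile;
                    size_t jn_end = min_sz(jn + tile, n);
                    for (size_t k = kk; k < k_end; k++)
                        for (size_t j = jn; j < jn_end; j += 8)
                            /* read access (0), keep in all cache levels (3) */
                            __builtin_prefetch(&B[k * n + j], 0, 3);
                }

                /* Same i-k-j inner order as the reference loop. */
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}
```

Because the tiled loop visits k in the same increasing order for every C element, its result should match the reference loop exactly. Prefetch hints never change results, only memory timing, so toggling `prefetch` isolates the cost or benefit of prefetching under an otherwise identical tiling, which is the comparison the second and third benchmark versions are meant to make.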

This version is ready for merging after several weeks of work on an
independent branch. Further improvements to the API and code will continue
once it reaches master.

CI pipeline (icc-benchs): 2 jobs in 4 minutes and 54 seconds (queued for 1 second)
- Build stage, job #29336, make:generic: passed (00:02:50)
- Build stage, job #29337, make:knl: passed (00:04:54)