Improve performance measurement methodology
Right now, computing the time a given process spends performing I/O takes a fair amount of work, and certain usage scenarios (e.g., multithreaded concurrent I/O) can complicate the calculation.
We should add explicit support for measuring I/O time, at least per process, and possibly also compute a performance estimate directly at runtime.
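
As a starting point for discussion, here is a minimal sketch of what per-process I/O timing could look like. It is not taken from the existing code base: the class name `IOTimer`, the `measure()` context manager, and the `throughput()` estimate are all hypothetical. The sketch assumes that "I/O time" for a process means the wall-clock time during which at least one thread is inside an I/O call, so that overlapping calls from concurrent threads are not counted twice. The naive per-call sum is kept alongside it for comparison.

```python
import threading
import time
from contextlib import contextmanager


class IOTimer:
    """Hypothetical per-process I/O timer (illustrative only)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock            # injectable so the timer can be tested
        self._lock = threading.Lock()  # protects all counters below
        self._active = 0               # number of I/O calls currently in flight
        self._busy_start = 0.0         # when the process last went idle -> busy
        self.busy_seconds = 0.0        # wall time with at least one call in flight
        self.summed_seconds = 0.0      # naive sum of every call's duration
        self.bytes_moved = 0

    @contextmanager
    def measure(self, nbytes=0):
        """Wrap a single I/O call that is expected to transfer nbytes."""
        with self._lock:
            start = self._clock()
            if self._active == 0:
                self._busy_start = start
            self._active += 1
        try:
            yield
        finally:
            with self._lock:
                end = self._clock()
                self._active -= 1
                self.summed_seconds += end - start
                # Simplification: bytes are counted even if the call raised.
                self.bytes_moved += nbytes
                if self._active == 0:
                    # Last in-flight call finished: close the busy interval.
                    self.busy_seconds += end - self._busy_start

    def throughput(self):
        """Runtime estimate in bytes per second of busy I/O time."""
        with self._lock:
            if self.busy_seconds <= 0:
                return 0.0
            return self.bytes_moved / self.busy_seconds
```

Each I/O call would be wrapped as `with timer.measure(len(buf)): f.write(buf)`, with one `IOTimer` shared by all threads in the process. With concurrent threads, `summed_seconds` can exceed elapsed wall time while `busy_seconds` cannot, which is the distinction that makes the multithreaded case hard to compute by hand today. Which of the two definitions the tool should report is an open question for this issue.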