add per-OST metric tabulation to Lustre module
Right now the Lustre module retains the OSTs over which each file is striped, but it does not track any more detailed metrics such as
- number of bytes read/written to each OST
- number of ops issued to each OST
A recent post to the lustre-discuss mailing list pointed out that the FIEMAP ioctl, issued from any Lustre client, will return the exact OST corresponding to a file and an offset. Andreas went on to describe the placement algorithm as a simple three-step process:
- fetch file layout via llapi_layout_get_by_path() or similar
- stripe_index = (logical file offset / stripe_size) % stripe_count
- OST index = llapi_layout_ost_index_get(layout, stripe_index)
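To make the bookkeeping concrete, here is a minimal sketch of how a single read or write could be split across stripes using the formula above. It assumes a plain RAID-0 layout (no composite/PFL layouts) and deliberately avoids linking against liblustre: `struct stripe_layout`, `stripe_index_for_offset()` and `tally_bytes_per_stripe()` are hypothetical names invented for illustration, and the `ost_ids` array stands in for values that would be fetched once per file with llapi_layout_ost_index_get(). This is not existing Darshan code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of a plain (RAID-0) file layout.
 * In practice these values would be read once per file from the
 * llapi_layout_* accessors and cached in the Lustre module record. */
struct stripe_layout {
    uint64_t stripe_size;     /* bytes per stripe */
    uint64_t stripe_count;    /* number of stripes in the file */
    const uint64_t *ost_ids;  /* ost_ids[i] = OST index backing stripe i */
};

/* Step 2 of the list above: map a logical file offset to a stripe index. */
static uint64_t stripe_index_for_offset(const struct stripe_layout *l,
                                        uint64_t offset)
{
    assert(l->stripe_size > 0 && l->stripe_count > 0);
    return (offset / l->stripe_size) % l->stripe_count;
}

/* Split one access [offset, offset + length) at stripe boundaries and
 * add each piece to the counter of the stripe it lands on.
 * bytes_per_stripe must have stripe_count entries. */
static void tally_bytes_per_stripe(const struct stripe_layout *l,
                                   uint64_t offset, uint64_t length,
                                   uint64_t *bytes_per_stripe)
{
    while (length > 0) {
        uint64_t idx = stripe_index_for_offset(l, offset);
        /* bytes left before the next stripe boundary */
        uint64_t room = l->stripe_size - (offset % l->stripe_size);
        uint64_t chunk = length < room ? length : room;

        bytes_per_stripe[idx] += chunk;
        offset += chunk;
        length -= chunk;
    }
}
```

Keeping the counters indexed by stripe and translating through `ost_ids` only at shutdown would keep the per-operation cost to a few integer operations; whether that is cheap enough inside the POSIX wrappers is something we would still need to measure.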
This would allow us to tell if certain OSTs are receiving a large re-read or modify workload. There are two major drawbacks though:
- We would have to scope this very carefully. It may be possible to track a large chunk of the POSIX module counters on a per-file, per-OST basis, but that would be a tremendous amount of data. We would have to choose which POSIX counters are worth tracking at such fine granularity and which ones aren't.
- The placement algorithm might change in future versions of Lustre, which would cause Darshan to report reasonable-looking but wrong data. The llapi_* calls are designed explicitly to work around this, but we would need to carefully measure the overheads of using these over the standard ioctls.
This isn't a high priority, but it would give us a tremendous amount of insight into data-intensive applications that do a lot of modify-in-place or re-read.