request queue should be per LP
The triton rebuild simulator includes a request queue (queued_reqs_head etc.) that holds requests that haven't been serviced yet. This lets a request stall while it waits for send buffer availability.
The queue is currently global to each MPI process, though, when it should be local to each LP. When one MPI process hosts more than one LP, this could lead to the wrong LP servicing a request. I don't think any of the existing safety checks would catch this condition, either. It might also be inflating the reported performance numbers.
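
A rough sketch of what the per-LP version could look like, to make the proposal concrete. This is not the simulator's actual code: only the name queued_reqs_head comes from the existing source, and struct queued_req, struct lp_state, queued_reqs_tail, lp_enqueue and lp_dequeue are hypothetical names made up for illustration. It also assumes request ids are non-negative so that -1 can signal an empty queue. The point is just that the head/tail pointers move out of file scope and into the state that each LP already owns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request record; the real simulator's fields will differ. */
struct queued_req {
    int req_id;                 /* assumed non-negative */
    struct queued_req *next;
};

/* Hypothetical per-LP state. The queue pointers live here instead of
 * in globals shared by every LP in the MPI process. */
struct lp_state {
    int lp_id;
    struct queued_req *queued_reqs_head;
    struct queued_req *queued_reqs_tail;
};

/* Append a request to this LP's own queue. Returns 0 on success,
 * -1 if allocation fails. */
static int lp_enqueue(struct lp_state *s, int req_id)
{
    assert(s != NULL);
    struct queued_req *r = malloc(sizeof(*r));
    if (r == NULL)
        return -1;
    r->req_id = req_id;
    r->next = NULL;
    if (s->queued_reqs_tail != NULL)
        s->queued_reqs_tail->next = r;
    else
        s->queued_reqs_head = r;
    s->queued_reqs_tail = r;
    return 0;
}

/* Remove the oldest request from this LP's queue. Returns its id,
 * or -1 if the queue is empty. */
static int lp_dequeue(struct lp_state *s)
{
    assert(s != NULL);
    struct queued_req *r = s->queued_reqs_head;
    if (r == NULL)
        return -1;
    int id = r->req_id;
    s->queued_reqs_head = r->next;
    if (s->queued_reqs_head == NULL)
        s->queued_reqs_tail = NULL;
    free(r);
    return id;
}
```

With this layout, a request queued by one LP can only be dequeued through that LP's state pointer, so two LPs in the same MPI process can no longer pick up each other's requests.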