    Originally we only allowed a LOCK request to be piggybacked
    on small RMA operations (those whose data fits entirely in
    the packet header). This added communication overhead for
    larger operations, since the origin had to wait for the
    LOCK ACK before it could transmit data to the target.
    This patch adds support for piggybacking LOCK on RMA
    operations of arbitrary size. Note that (1) this only
    works with basic datatypes; (2) if the LOCK cannot be
    satisfied, we temporarily buffer the operation on the
    target side.
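
    The target-side behavior in (2) can be sketched roughly as
    below. This is an illustration only, not the code in this
    patch: every name (target_win, pending_op,
    handle_lock_and_put, release_lock) is hypothetical, the
    lock is modeled as a single exclusive flag, and a
    contiguous byte buffer stands in for basic-datatype data.

```c
/* Hypothetical sketch: none of these names come from the actual patch. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One buffered operation whose piggybacked LOCK could not be granted. */
typedef struct pending_op {
    struct pending_op *next;
    size_t offset;          /* displacement into the window */
    size_t len;             /* payload size in bytes */
    unsigned char *data;    /* private copy of the payload */
} pending_op;

typedef struct {
    int locked;                 /* 1 while some origin holds the lock */
    pending_op *head, *tail;    /* FIFO of buffered operations */
    unsigned char *win_base;    /* start of the exposed window memory */
    size_t win_size;            /* window size in bytes */
} target_win;

typedef enum { OP_APPLIED, OP_QUEUED, OP_ERROR } op_status;

/* Copy the payload into the window; callers have already checked bounds. */
static void apply_put(target_win *w, size_t offset, const void *data, size_t len)
{
    assert(len <= w->win_size && offset <= w->win_size - len);
    memcpy(w->win_base + offset, data, len);
}

/* Handle a PUT that carries a piggybacked exclusive LOCK request. */
op_status handle_lock_and_put(target_win *w, size_t offset,
                              const void *data, size_t len)
{
    if (len > w->win_size || offset > w->win_size - len)
        return OP_ERROR;

    if (!w->locked) {
        /* Lock is free: grant it and apply the operation immediately. */
        w->locked = 1;
        apply_put(w, offset, data, len);
        return OP_APPLIED;
    }

    /* Lock is held: buffer the operation until the lock is released. */
    pending_op *p = malloc(sizeof *p);
    if (p == NULL)
        return OP_ERROR;
    p->data = malloc(len ? len : 1);
    if (p->data == NULL) {
        free(p);
        return OP_ERROR;
    }
    memcpy(p->data, data, len);
    p->offset = offset;
    p->len = len;
    p->next = NULL;
    if (w->tail != NULL)
        w->tail->next = p;
    else
        w->head = p;
    w->tail = p;
    return OP_QUEUED;
}

/* Release the lock; if an operation is buffered, grant the lock to it and
 * apply it. Returns the number of buffered operations applied (0 or 1). */
int release_lock(target_win *w)
{
    pending_op *p = w->head;

    w->locked = 0;
    if (p == NULL)
        return 0;

    w->head = p->next;
    if (w->head == NULL)
        w->tail = NULL;

    w->locked = 1;  /* the buffered operation's origin now holds the lock */
    apply_put(w, p->offset, p->data, p->len);
    free(p->data);
    free(p);
    return 1;
}
```

    In this sketch an operation that arrives while the lock is
    held is copied and queued instead of being rejected, so the
    origin does not have to resend the data once the lock
    becomes available.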
