    We use new algorithms for RMA synchronization
    functions and RMA epochs. The old implementation
    uses a lazy-issuing algorithm, which queues up
    all operations and issues them at the end. This
    rules out opportunities to use hardware RMA operations
    and can exhaust memory resources when a
    large number of operations are queued.
    Here we use a new algorithm, which starts
    the synchronization at the beginning and issues operations
    as soon as the synchronization is finished.
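
    The difference in queue growth between the two approaches
    can be illustrated with a small toy model. This is only a
    sketch, not the actual implementation: the function names
    and the sync_after parameter are hypothetical, and the model
    assumes the synchronization finishes just before a given
    operation is posted.

```c
#include <stddef.h>

/* Toy model only: all names here are hypothetical, not real internals. */

/* Lazy issuing: every operation is queued until the epoch ends. */
size_t lazy_peak_queue(size_t nops)
{
    size_t queued = 0, peak = 0;
    for (size_t i = 0; i < nops; i++) {
        queued++;                   /* nothing is issued before the end */
        if (queued > peak)
            peak = queued;
    }
    /* End of epoch: all queued operations would be issued here. */
    return peak;
}

/* Eager issuing: synchronization starts with the epoch and is assumed
 * to finish just before operation number sync_after is posted. */
size_t eager_peak_queue(size_t nops, size_t sync_after)
{
    size_t queued = 0, peak = 0;
    for (size_t i = 0; i < nops; i++) {
        if (i >= sync_after) {
            queued = 0;             /* backlog flushed; issue directly */
            continue;
        }
        queued++;                   /* still waiting for synchronization */
        if (queued > peak)
            peak = queued;
    }
    return peak;
}
```

    In this model the lazy queue grows with the number of
    operations in the epoch, while the eager queue is bounded
    by the number of operations posted before the
    synchronization completes.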
