    Originally, do_accumulate_op() performed the ACC computation
    on the target between data from the origin side and data in
    the target window. It required the target side to first
    unpack the received origin data into the same data layout as
    the target data before calling this function, which could
    consume a potentially large amount of memory.
    This patch fixes the do_accumulate_op() function in the following ways:
    (1) It requires that the origin data passed to the function
    be "in a packed manner", meaning it looks as if all basic
    type elements in the origin data are placed one after another.
    Note that the origin data is not necessarily contiguous, since
    the basic type itself may be non-contiguous. If the basic type
    is contiguous, then the origin data must be contiguous.
    (2) It adds a new function argument, stream_offset, which
    specifies a starting location in the target data. This allows
    the origin data to work with a portion of the target data with stream
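
    The two changes above can be illustrated with a minimal sketch.
    It is not the MPICH implementation: sketch_accumulate_sum() and
    all of its parameters are hypothetical, the basic type is assumed
    to be a contiguous int, the operation is assumed to be a sum, the
    target layout is assumed to be a simple strided one (blocks of
    blocklen elements, stride elements apart), and stream_offset is
    assumed to be a byte offset into the packed view of the target.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for illustration only; NOT the MPICH function.
 * Assumptions: basic type is int, op is sum, target is strided, and
 * stream_offset is a byte offset into the packed view of the target. */
static void sketch_accumulate_sum(const int *packed_origin, size_t n_elems,
                                  int *target_base, size_t blocklen,
                                  size_t stride, size_t stream_offset)
{
    assert(blocklen > 0 && stride >= blocklen);
    /* The offset must fall on a basic-type element boundary. */
    assert(stream_offset % sizeof(int) == 0);

    /* Index of the first target element, counted in packed order. */
    size_t first = stream_offset / sizeof(int);

    for (size_t i = 0; i < n_elems; i++) {
        size_t k = first + i;          /* position in the packed stream */
        size_t block = k / blocklen;   /* which strided block it lands in */
        size_t within = k % blocklen;  /* position inside that block */

        /* Combine directly into the strided target; the origin data is
         * never unpacked into a full copy of the target layout. */
        target_base[block * stride + within] += packed_origin[i];
    }
}
```

    With blocklen 2 and stride 4, a first call with three packed
    elements and stream_offset 0 updates target indices 0, 1 and 4;
    a second call with three more elements and stream_offset
    3 * sizeof(int) continues at indices 5, 8 and 9. Each call only
    needs a buffer for the chunk it was handed.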
    Signed-off-by: Pavan Balaji <balaji@anl.gov>
