  01 Nov, 2014 2 commits
      Bug-fix: avoid free NULL pointer in RMA. · 72a1e6f8
      req->dev.user_buf points to the data sent from the origin process
      to the target process. For FOP it sometimes points to the IMMED
      area in the packet header, when the data fits in the packet header.
      In that case we should not free req->dev.user_buf in the final
      request handler, since that data area is freed by the
      runtime when the packet header is freed.
      This patch initializes user_buf to NULL when creating the
      request, sets it to NULL when FOP is completed, and avoids freeing
      a NULL pointer in the final request handler.
      Signed-off-by: Min Si <msi@il.is.s.u-tokyo.ac.jp>
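The ownership rule described in 72a1e6f8 can be sketched in a few lines of C. This is a minimal illustration, not the MPICH code: the `sketch_*` types and functions are hypothetical stand-ins, and the 8-byte IMMED area is an assumed size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the packet header and request. */
typedef struct {
    char immed[8];      /* assumed size of the in-header IMMED area */
} sketch_pkt_t;

typedef struct {
    void *user_buf;     /* heap buffer owned by the request, or NULL */
} sketch_req_t;

/* Request creation: user_buf starts out as NULL. */
void sketch_req_init(sketch_req_t *req)
{
    req->user_buf = NULL;
}

/* FOP completion: user_buf may point into the packet header, which the
 * runtime frees, so drop the borrowed pointer instead of freeing it. */
void sketch_fop_complete(sketch_req_t *req)
{
    req->user_buf = NULL;
}

/* Final request handler: free only a buffer the request still owns.
 * Returns 1 if a buffer was freed, 0 if there was nothing to free. */
int sketch_final_handler(sketch_req_t *req)
{
    if (req->user_buf == NULL)
        return 0;
    free(req->user_buf);
    req->user_buf = NULL;
    return 1;
}
```

With this shape, a request whose data lived in the header reaches the final handler with `user_buf == NULL` and the handler does nothing, while a request with a heap buffer is still cleaned up exactly once.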
      Bug-fix: always waiting for remote completion in Win_unlock. · c76aa786
      The original implementation includes an optimization which
      allows Win_unlock for exclusive lock to return without
      waiting for remote completion. This relies on the
      assumption that window memory on target process will not
      be accessed by a third party until that target process
      finishes all RMA operations and grants the lock to other
      processes. However, this assumption is not correct if the user
      passes the assert MPI_MODE_NOCHECK. Consider the following code:
                P0                              P1           P2
          MPI_Win_lock(P1, NULL, exclusive);
          MPI_Win_unlock(P1, exclusive);
          MPI_Send (P2);                                MPI_Recv(P0);
                                                        MPI_Win_lock(P1, MODE_NOCHECK, exclusive);
                                                        MPI_Win_unlock(P1, exclusive);
      Both P0 and P2 issue an exclusive lock to P1, and P2 uses the assert
      MPI_MODE_NOCHECK because the lock should be granted to P2 after
      synchronization between P2 and P0. However, in the original
      implementation, a GET operation on P2 might not get the updated
      value, since Win_unlock on P0 returns without waiting for remote
      completion.
      In this patch we delete this optimization. In Win_free, since every
      Win_unlock guarantees remote completion, the target process no
      longer needs to do additional counting work to detect target-side
      completion, but only needs to do a global barrier.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
      Fix completion on target side in Active Target synchronization. · aa36f043
      For Active Target synchronization, the original implementation
      does not guarantee the completion of all ops on target side
      when Win_wait / Win_fence returns. It is implemented using a
      counter, which is decremented when the last operation from that
      origin finishes. Win_wait / Win_fence waits until that counter
      reaches zero. The problem is that, when the last operation finishes,
      a previous GET-like operation (for example with a large data
      volume) may not have finished yet. This breaks the semantics of
      Win_wait / Win_fence.
      Here we fix this by incrementing the counter whenever we meet a
      GET-like operation, and decrementing it when that operation finishes
      on the target side. This guarantees that when the counter reaches
      zero and Win_wait / Win_fence returns, all operations are completed
      on the target.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
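The counting scheme described in aa36f043 can be modeled as follows. This is a sketch under assumptions: the `sketch_win_t` type and its functions are hypothetical, and the real target-side state in MPICH is more involved.

```c
#include <assert.h>

/* Hypothetical target-side state; not the actual MPICH window structure. */
typedef struct {
    int pending;    /* origins still active + GET-like ops still in flight */
} sketch_win_t;

/* Start of an exposure epoch: one count per origin. */
void sketch_epoch_start(sketch_win_t *win, int num_origins)
{
    win->pending = num_origins;
}

/* A GET-like operation arrives: keep the epoch open until it finishes. */
void sketch_get_like_arrived(sketch_win_t *win)
{
    win->pending++;
}

/* The GET-like operation has finished on the target side. */
void sketch_get_like_finished(sketch_win_t *win)
{
    assert(win->pending > 0);
    win->pending--;
}

/* The last operation from one origin has been processed. */
void sketch_origin_last_op_done(sketch_win_t *win)
{
    assert(win->pending > 0);
    win->pending--;
}

/* Win_wait / Win_fence may return only when this is nonzero. */
int sketch_epoch_complete(const sketch_win_t *win)
{
    return win->pending == 0;
}
```

In this model, a large GET that is still in flight when the origin's last operation completes keeps `pending` above zero, which is the case the original implementation missed.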
      Simplified RMA_Op structure. · 274a5a70
      We were duplicating information in the operation structure
      and in the packet structure when the message is actually issued.
      Since most of the information is the same anyway, this patch just
      embeds a packet structure into the operation structure.
      Signed-off-by: Xin Zhao <xinzhao3@illinois.edu>
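The idea in 274a5a70 of embedding the packet in the operation structure might look like the sketch below. The field names and the `sketch_*` types are assumptions chosen for illustration and do not reflect the actual RMA_Op layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet header: the fields that go on the wire. */
typedef struct {
    int type;           /* e.g. put / get / accumulate */
    int target_rank;
    size_t count;
} sketch_pkt_t;

/* Hypothetical queued operation: the packet is embedded rather than
 * having its fields duplicated here and copied at issue time. */
typedef struct sketch_op {
    struct sketch_op *next;
    sketch_pkt_t pkt;           /* filled in once, when the op is queued */
    const void *origin_addr;    /* origin-only state stays outside the packet */
} sketch_op_t;

/* Queue-time initialization writes directly into the embedded packet. */
void sketch_op_init(sketch_op_t *op, int type, int target_rank,
                    size_t count, const void *origin_addr)
{
    op->next = NULL;
    op->pkt.type = type;
    op->pkt.target_rank = target_rank;
    op->pkt.count = count;
    op->origin_addr = origin_addr;
}

/* Issue time: the packet is already built, so just hand it out. */
const sketch_pkt_t *sketch_op_packet(const sketch_op_t *op)
{
    return &op->pkt;
}
```

The design choice is that shared fields have one home, so issuing an operation no longer involves copying them from one structure to another.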
      Make handling of request cleanup more uniform · 1e171ff6
      There are quite a few places where the request cleanup is done via:
      MPIU_Object_set_ref(req, 0);
      when it should be:
      This makes the handling more uniform so requests are cleaned up by releasing
      references rather than hitting them with the destroy hammer.
      Fixes #1664
      Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
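The difference between forcing a reference count to zero and releasing a reference can be shown with a small model. The names below are hypothetical and are not the MPICH request API; the release call itself is elided in the log entry above and is not reconstructed here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted request. */
typedef struct {
    int ref_count;
} sketch_req_t;

/* A new request starts with one reference, held by its creator. */
sketch_req_t *sketch_req_create(void)
{
    sketch_req_t *req = malloc(sizeof *req);
    if (req != NULL)
        req->ref_count = 1;
    return req;
}

/* Another holder takes a reference. */
void sketch_req_addref(sketch_req_t *req)
{
    req->ref_count++;
}

/* Drop one reference; destroy the request only when the last one goes.
 * Returns 1 if the request was destroyed, 0 if other holders remain. */
int sketch_req_release(sketch_req_t *req)
{
    assert(req->ref_count > 0);
    if (--req->ref_count > 0)
        return 0;
    free(req);
    return 1;
}
```

Setting the count to zero and destroying the object ignores any other holder of the request, whereas releasing leaves the object alive until every holder has let go.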
      Removed unused single_op_opt field from MPID_Request · 255fb4a6
      The single_op_opt flag in the request object was previously used to
      track whether an operation is a lock-op-unlock type, for the purposes of
      completion.  Tracking this state has been merged into the packet header
      flags, so the single_op_opt flag is no longer needed.
      Reviewer: goodell
      RMA sync. piggybacking from origin->target · 4e67607f
      This patch uses packet header flags to piggyback the unlock operation on other
      RMA operations.  For most operations, there is no net change.  However, for FOP and
      GACC, unlock piggybacking was previously not implemented.
      Reviewer: goodell
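Piggybacking an unlock on an operation through a header flag, as 4e67607f describes, can be sketched like this. The flag name, the packet layout, and the accumulate-style operation are all assumptions made for the example.

```c
#include <assert.h>

/* Hypothetical packet header flags. */
enum {
    SKETCH_FLAG_NONE   = 0,
    SKETCH_FLAG_UNLOCK = 1u << 0    /* release the lock after this op */
};

/* Hypothetical operation packet with a flags field in the header. */
typedef struct {
    unsigned flags;
    int operand;
} sketch_pkt_t;

/* Hypothetical target-side window state. */
typedef struct {
    int locked;
    int value;
} sketch_target_t;

/* Target-side handler: apply the operation, then honor a piggybacked
 * unlock so that no separate unlock message is needed. */
void sketch_handle_op(sketch_target_t *t, const sketch_pkt_t *pkt)
{
    assert(t->locked);              /* ops arrive only while the lock is held */
    t->value += pkt->operand;       /* accumulate-style update */
    if (pkt->flags & SKETCH_FLAG_UNLOCK)
        t->locked = 0;
}
```

The origin sets the unlock flag on the last operation of the epoch, so the target performs the operation and the unlock in one message.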
      Consolidated RMA op finalization code · bba35589
      This patch consolidates the synchronization and tracking of RMA operations into
      a single routine that is called whenever we complete an operation.  The only
      exceptions are lock-op-unlock operations that are completed from within the lock
      operation processing code.
      This code is pretty ugly, but it will get cleaner once packet flags are been
      Reviewer: goodell
      Temporarily reverted is_gacc_op bugfix · c5312557
      Partially reverted [0b364068] in preparation for incorporating new
      piggybacking infrastructure.  This temporarily re-introduces that bug
      and it will be fixed again with the new piggybacking patch.
      Reviewer: goodell
      BUGFIX: Unlock piggybacking for Get_accumulate · 0b364068
      GACC operations were both piggybacking the unlock message to the origin, and
      sending back a PT done packet.  This was causing the origin to be unlocked
      twice.  When another lock operation was performed between the GACC and PT done
      unlock operations, there was a synchronization race.
      Updated the fetch_and_op implementation to have two data transfer paths; one
      where data can be embedded in the packet header and one where it is sent
      separately.  With this change, the header size is back to 40 bytes.
      Reviewer: buntinas
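The two fetch_and_op data paths mentioned above (data embedded in the header versus sent separately) could be selected roughly as follows. This is only a sketch: the 8-byte immediate capacity and every `sketch_*` name are assumptions, and the structure below is not meant to reproduce the 40-byte header.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_IMMED_LEN 8  /* assumed capacity of the in-header data area */

/* Hypothetical fetch_and_op packet header. */
typedef struct {
    int op;
    int immed_len;                  /* 0 when the payload is sent separately */
    char immed[SKETCH_IMMED_LEN];
} sketch_fop_pkt_t;

typedef enum {
    SKETCH_PATH_IMMED,      /* payload embedded in the packet header */
    SKETCH_PATH_SEPARATE    /* payload follows as a separate transfer */
} sketch_path_t;

/* Pick the data path by payload size and fill the header accordingly. */
sketch_path_t sketch_fop_pack(sketch_fop_pkt_t *pkt, const void *data,
                              size_t len)
{
    if (len <= SKETCH_IMMED_LEN) {
        memcpy(pkt->immed, data, len);
        pkt->immed_len = (int)len;
        return SKETCH_PATH_IMMED;
    }
    pkt->immed_len = 0;
    return SKETCH_PATH_SEPARATE;
}
```

Keeping the immediate area small is what lets the header stay compact while still avoiding a second message for the common single-element case.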
      [svn-r7416] Major improvement to RMA performance for long lists of operations, an immediate mode accumulate for single ints, store the MPID_Comm within the window, and added a basic performance instrumentation interface that was extensively used to improve the RMA performance (enabled with --enable-g=instr).  With these fixes, MPICH2 can run the one-sided version of the Graph500 benchmark at a respectable if not great rate
      [svn-r5368] Use the correct type in the segment calls - it must be an MPI_Aint, not an MPIDI_msg_sz_t, particularly when the size of MPI_Aint is changed to match MPI_Offset (the segment calls specify an MPI_Aint as the last argument, for example).  This is part of the changes needed to make attributes work properly when --with-aint-size=8 is selected
      [svn-r4062] (1) Modifying the datatype code to find the number of contig blocks in an instance of an MPI derived datatype - since finding the real number of contig blocks is not easy, we find a reasonable upper bound instead. This also fixes the case where the number of contig blocks was uninitialized for contiguous datatypes. Refer to ticket #428 for details. (2) Adding a test case, test1_dt.c, to test the fix - Review @ rross, thakur
      [svn-r3070] Added ATTRIBUTE((unused)) (which is defined to work with gcc and disappear for other compilers) only to the functions whose arguments are defined by a general pattern (such as a request handler) and thus must be present even if not needed.  For functions that have parameters that are not used and are not needed to conform to a pattern, do not use ATTRIBUTE((unused)); instead, fix the routine to either make use of the argument in a real way or change the routine to not pass the argument.
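A macro of the kind svn-r3070 describes might be defined along these lines. The macro name `SKETCH_ATTRIBUTE` and the handler signature are hypothetical; the log does not show the actual definition of ATTRIBUTE, so this only illustrates the general technique using the GCC `__attribute__((unused))` extension.

```c
#include <assert.h>
#include <stddef.h>

/* Expands to a GCC attribute where supported and to nothing elsewhere. */
#if defined(__GNUC__)
#define SKETCH_ATTRIBUTE(a) __attribute__(a)
#else
#define SKETCH_ATTRIBUTE(a)
#endif

/* Hypothetical handler pattern: every handler must take these arguments. */
typedef int (*sketch_handler_fn)(void *req, int *complete);

/* This handler has no use for req, but the pattern requires the parameter,
 * so it is marked unused rather than removed. */
int sketch_noop_handler(void *req SKETCH_ATTRIBUTE((unused)), int *complete)
{
    *complete = 1;
    return 0;
}
```

The annotation is reserved for signatures fixed by a shared function-pointer type; a function with a free-form signature should drop the parameter instead.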
