1. 31 Jul, 2014 1 commit
    • Add MPIX_Comm_failure_ack/get_acked · 8652e0ad
      Wesley Bland authored
      
      
      This commit adds the new functions MPI(X)_COMM_FAILURE_ACK and
      MPI(X)_COMM_FAILURE_GET_ACKED. These two functions together allow the user to
      get the group of failed processes.
      
      Most of the implementation for this is pushed into the MPID layer since some
      systems won't support this (PAMI). The existing function
      MPIDI_CH3U_Check_for_failed_procs has been modified to give back the group of
      acknowledged failed processes. One inefficiency remains: the list of failed
      processes is retrieved from PMI and parsed every time the user calls either
      failure_ack or get_acked. In exchange, we don't have to cache the list that
      comes back from PMI, which could be expensive and would add some cost even
      in the failure-free case.
      
      This commit adds a field called last_ack_rank to the MPID_Comm structure.
      It is a single integer that stores the last acknowledged failure for this
      communicator and is used to determine when to stop parsing when getting back
      the list of acknowledged failed processes.
      
      Lastly, this commit includes a test to make sure that all of the above works
      (test/mpi/ft/failure_ack). This tests that a failure is appropriately included
      in the failed group and excluded if the failure was not previously
      acknowledged.
      Signed-off-by: Junchao Zhang <jczhang@mcs.anl.gov>
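The role of last_ack_rank described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the MPICH code: it assumes the failed-process list arrives as a comma-separated string of ranks in the order the failures were reported, and the names parse_acked_failures, ack_failures, and NO_ACKED_RANK are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical sketch, not the MPICH implementation.
 * Assumes the failed-process list is a comma-separated string of ranks,
 * ordered by when each failure was reported (for example "3,7,1"). */

#define NO_ACKED_RANK (-1)

/* Copy the acknowledged failures into acked[] (capacity cap): every listed
 * rank up to and including last_ack_rank. Returns the number copied. */
int parse_acked_failures(const char *failed_list, int last_ack_rank,
                         int *acked, int cap)
{
    int n = 0;
    const char *p = failed_list;

    if (last_ack_rank == NO_ACKED_RANK)
        return 0;                   /* nothing has been acknowledged yet */

    while (*p != '\0' && n < cap) {
        char *end;
        long rank = strtol(p, &end, 10);
        if (end == p)
            break;                  /* malformed input: stop parsing */
        acked[n++] = (int) rank;
        if (rank == last_ack_rank)
            break;                  /* later failures are not acknowledged */
        p = (*end == ',') ? end + 1 : end;
    }
    return n;
}

/* Acknowledge every failure currently listed. Returns the last listed rank,
 * which becomes the new last_ack_rank, or NO_ACKED_RANK for an empty list. */
int ack_failures(const char *failed_list)
{
    int last = NO_ACKED_RANK;
    const char *p = failed_list;

    while (*p != '\0') {
        char *end;
        long rank = strtol(p, &end, 10);
        if (end == p)
            break;
        last = (int) rank;
        p = (*end == ',') ? end + 1 : end;
    }
    return last;
}
```

Because only one integer is stored per communicator, both calls re-parse the full list, which is the trade-off the commit message describes.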
  2. 22 Jul, 2014 2 commits
  3. 21 Jul, 2014 1 commit
    • Don't start enums with 0. · faa37d89
      Pavan Balaji authored
      
      
      This is to help with debugging.  Zero is too common a value, and is
      often set automatically by the system if not initialized.  Starting at
      a different value helps us catch uninitialized cases more easily.
      
      We pick "42" as our magic number as it is the answer to the ultimate
      question of life, the Universe, and everything.
      Signed-off-by: Wesley Bland <wbland@anl.gov>
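The debugging benefit described in the commit above can be shown with a small sketch. The type and function names here (conn_state_t, conn_t, conn_zero, conn_state_is_valid) are hypothetical and are not taken from MPICH.

```c
#include <string.h>

/* Hypothetical type for illustration; not an MPICH enum. */
typedef enum {
    CONN_STATE_CLOSED = 42,   /* first valid value is deliberately non-zero */
    CONN_STATE_OPENING,       /* 43 */
    CONN_STATE_OPEN           /* 44 */
} conn_state_t;

typedef struct {
    conn_state_t state;
    int peer_rank;
} conn_t;

/* Mimics what calloc or a static initializer leaves behind: all zeros. */
void conn_zero(conn_t *c)
{
    memset(c, 0, sizeof *c);
}

/* Returns 1 if the state is one of the declared enumerators, 0 otherwise. */
int conn_state_is_valid(conn_state_t s)
{
    return s >= CONN_STATE_CLOSED && s <= CONN_STATE_OPEN;
}
```

A zeroed conn_t fails the validity check, so an object that was never explicitly initialized is distinguishable from one that is legitimately in the first state.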
  4. 23 Mar, 2014 1 commit
    • Remove the use of MPIDI_TAG_UB · 055abbd3
      Wesley Bland authored
      
      
      The constant MPIDI_TAG_UB is used in only one place at the moment, in the
      initialization of ch3 (source:src/mpid/ch3/src/mpid_init.c@4b35902a#L131). The
      problem is that the value which is being set (MPIR_Process.attrs.tag_ub) is
      set differently in pamid (INT_MAX). This leads to weird results when we set
      apart a bit in the tag space for failure propagation in non-blocking
      collectives (see #2008).
      
      Since this value isn't being referenced anywhere else, there doesn't seem to
      be a use for it and it's just leading to confusion. To avoid this, here we
      remove this value and just set MPIR_Process.attrs.tag_ub to INT_MAX in both
      ch3 and pamid.
      
      See #2009
      Signed-off-by: Pavan Balaji <balaji@mcs.anl.gov>
  5. 27 Jan, 2014 2 commits
    • Remove a comment that doesn't apply anymore. · 201b0dbf
      Wesley Bland authored
      No reviewer
    • Moves the tag reservation to MPI layer · bb755b5c
      Wesley Bland authored
      
      
      Resets MPIDI_TAG_UB back to 0x7fffffff. This value was changed a while back,
      but the change should have happened at the MPI layer instead of the CH3 layer.
      This resets the value to allow CH3 to use the tag space.
      
      Instead, the value is now set in the MPI layer during initthread. This means
      that it will be safe regardless of the device being used. This prevents a
      collision that was occurring on the pamid device where the values for
      MPIR_TAG_ERROR_BIT and the MPIR_Process.attr.tagged_coll_mask values were the
      same.
      
      Fixes #2008
      Signed-off-by: Pavan Balaji <balaji@mcs.anl.gov>
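The idea of reserving one tag bit, and the kind of collision mentioned in the commit above, can be sketched as follows. The bit position and every name below (TAG_ERROR_BIT, user_tag_ub, masks_collide, and so on) are assumptions for illustration. The actual MPICH constants are not given in the commit message and may differ. The sketch also assumes a 32-bit int.

```c
/* Hypothetical value; the real reserved bit in MPICH may be different. */
#define TAG_ERROR_BIT (1 << 30)   /* bit set aside for failure propagation */

/* Largest tag a user may pass once the error bit is reserved. */
int user_tag_ub(void)
{
    return TAG_ERROR_BIT - 1;
}

/* Mark a tag as carrying an error indication. */
int tag_set_error(int tag)
{
    return tag | TAG_ERROR_BIT;
}

/* Returns 1 if the error bit is set on the tag. */
int tag_has_error(int tag)
{
    return (tag & TAG_ERROR_BIT) != 0;
}

/* Recover the original tag value. */
int tag_strip_error(int tag)
{
    return tag & ~TAG_ERROR_BIT;
}

/* Two reserved masks collide if they share any bit. */
int masks_collide(int a, int b)
{
    return (a & b) != 0;
}
```

Doing the reservation once in the device-independent layer means every device sees the same reduced upper bound, so a check like masks_collide only has to hold in one place.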
  6. 31 Oct, 2013 1 commit
  7. 01 Aug, 2013 1 commit
  8. 28 Jul, 2013 1 commit
    • Add "alloc_shm" info to MPI_Win_allocate. · 384d96b7
      Xin Zhao authored
      
      
      Add "alloc_shm" to window's info arguments and initialize it to FALSE.
      In MPID_Win_allocate, if "alloc_shm" is set to true, call ALLOCATE_SHARED,
      otherwise call ALLOCATE.
      
      Free window memory only when the SHM region is not allocated; otherwise it
      has already been freed in MPIDI_CH3I_SHM_Win_free.
      Signed-off-by: Pavan Balaji <balaji@mcs.anl.gov>
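The allocate and free logic described in the commit above might look roughly like this. It is a simplified sketch under stated assumptions: shm_win_t and all function names are hypothetical, and malloc stands in for a real shared-memory segment.

```c
#include <stdlib.h>

/* Hypothetical window object; not the MPICH MPID_Win structure. */
typedef struct {
    int   alloc_shm;      /* "alloc_shm" info value; 0 (FALSE) by default */
    int   shm_allocated;  /* 1 if base came from the shared-memory path */
    void *base;
} shm_win_t;

/* Stand-ins for the shared-memory allocator (assumption: malloc/free). */
static void *shm_region_alloc(size_t n) { return malloc(n); }

static void shm_region_free(shm_win_t *w)
{
    free(w->base);
    w->shm_allocated = 0;
}

/* Choose the allocation path from the info flag. Returns 0 on success. */
int win_allocate(shm_win_t *w, size_t n)
{
    if (w->alloc_shm) {
        w->base = shm_region_alloc(n);
        w->shm_allocated = (w->base != NULL);
    } else {
        w->base = malloc(n);
        w->shm_allocated = 0;
    }
    return w->base != NULL ? 0 : -1;
}

/* Free the window memory exactly once, on whichever path allocated it. */
void win_free(shm_win_t *w)
{
    if (w->shm_allocated)
        shm_region_free(w);   /* shared path releases the buffer itself */
    else
        free(w->base);        /* private memory is freed here */
    w->base = NULL;
}
```

Recording which path produced the buffer is what prevents the generic free from releasing memory the shared-memory path has already released.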
  9. 25 Jul, 2013 2 commits
  10. 07 May, 2013 3 commits
  11. 01 Apr, 2013 1 commit
    • Add per-communicator eager threshold support. · a3c816ac
      Ralf Gunter authored
      Message transfers now respect the communicator-specific threshold.  This
      change has not been carefully checked for impact on our shared-memory
      ping-pong latency.
      
      Reviewed-by: goodell
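A per-communicator threshold of the kind described above could be consulted along these lines. This is only a sketch: the default value, the fallback to a global threshold when none is set, the inclusive comparison, and all names (comm_t, choose_protocol, and so on) are assumptions, not details from the commit.

```c
#include <stddef.h>

#define GLOBAL_EAGER_THRESHOLD 131072L  /* hypothetical default, in bytes */
#define THRESHOLD_UNSET        (-1L)

/* Hypothetical communicator with an optional threshold override. */
typedef struct {
    long eager_threshold;   /* THRESHOLD_UNSET means "use the global value" */
} comm_t;

typedef enum { PROTO_EAGER, PROTO_RENDEZVOUS } proto_t;

/* Threshold that applies to messages sent on this communicator. */
long effective_threshold(const comm_t *c)
{
    return c->eager_threshold == THRESHOLD_UNSET ? GLOBAL_EAGER_THRESHOLD
                                                 : c->eager_threshold;
}

/* Messages up to the threshold go eagerly; larger ones use rendezvous. */
proto_t choose_protocol(const comm_t *c, size_t bytes)
{
    long t = effective_threshold(c);
    return (t >= 0 && bytes <= (size_t) t) ? PROTO_EAGER : PROTO_RENDEZVOUS;
}
```

The extra lookup sits on the send path for every message, which is why the commit notes that the latency impact had not yet been checked.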
  12. 21 Feb, 2013 2 commits
    • Removed unused single_op_opt field from MPID_Request · 255fb4a6
      James Dinan authored
      The single_op_opt flag in the request object was previously used to
      track whether an operation is a lock-op-unlock type, for the purposes of
      completion.  Tracking this state has been merged into the packet header
      flags, so the single_op_opt flag is no longer needed.
      
      Reviewer: goodell
    • Added flags to MPID_Request · 90be9ee1
      James Dinan authored
      Added a flags field to MPID_Request that we can use to stash flags from
      suspended RMA ops and retrieve them later when we complete the operation.
      
      Reviewer: goodell
  13. 06 Feb, 2013 1 commit
    • Eliminate enqueueing of lock op in RMA ops list · fbd95593
      James Dinan authored
      Prior to this patch, a lock entry was enqueued in the RMA ops list when
      Win_lock was called.  This patch adds a new state tracking mechanism, which we
      use to record the synchronization state with respect to each RMA target.  This
      new mechanism absorbs tracking of lock operation and the lock state at the
      target.  It significantly simplifies the RMA synchronization and ops list
      processing.
      
      Reviewer: goodell
  14. 11 Jan, 2013 1 commit
    • Implemented interprocess shared memory RMA ops · 58ec39c5
      James Dinan authored
      Communication operations on shared memory windows now perform the op directly
      on the shared buffer.  This required the addition of a per-window interprocess
      mutex to ensure that atomics and accumulates are performed atomically.
      
      Reviewer: buntinas
  15. 17 Dec, 2012 1 commit
  16. 27 Nov, 2012 3 commits
  17. 08 Nov, 2012 5 commits
    • c57c3bb1
    • [svn-r10593] Renamed EPOCH_GAT to PSCW · d45a8f45
      James Dinan authored
    • [svn-r10592] Updated active target to use a shared ops list · 5510107a
      James Dinan authored
      This fixes the performance regression that was introduced by concatenation of
      per-target lists.
      
      Reviewer: goodell
    • [svn-r10591] Moved device-only MPID_Win members to CH3 · f344bc2e
      James Dinan authored
      Moved fence_issued and start_assert MPID_Win members into CH3.  These should
      not be a required part of the ADI, since they are not needed above the ADI and
      implementations should be free to choose different mechanisms for tracking
      the state of synchronization operations.
      
      Reviewer: buntinas
    • [svn-r10587] RMA epoch tracking · b001136e
      James Dinan authored
      This patch adds code to track the RMA epoch state of the local process.
      Currently, we are tracking the synchronization states that are allowed by
      MPICH; in the future, we may want to restrict this to only states that are
      allowed by the standard.  The addition of epoch tracking has several benefits:
      
       * It allows us to detect synchronization errors (implemented in this patch).
       * It allows us to implement lock_all more efficiently (implemented in this
         patch).
       * It will allow us to distinguish between active and passive target epochs and
         avoid O(p) op list concatenation (future patch).
      
      Reviewer: balaji
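The way epoch tracking lets the library detect synchronization errors can be illustrated with a small state machine. This is a deliberately simplified sketch: the state names, error codes, and transition rules below are assumptions and cover far fewer cases than MPICH actually allows.

```c
/* Hypothetical epoch states; the real MPICH set may differ. */
typedef enum {
    EPOCH_NONE,
    EPOCH_FENCE,
    EPOCH_PSCW,
    EPOCH_LOCK,
    EPOCH_LOCK_ALL
} epoch_t;

typedef struct {
    epoch_t epoch;   /* RMA epoch state of the local process */
} rma_win_t;

#define RMA_OK       0
#define RMA_ERR_SYNC 1   /* operation not permitted in the current epoch */

/* Open a passive target epoch; only legal when no epoch is open. */
int rma_lock(rma_win_t *w)
{
    if (w->epoch != EPOCH_NONE)
        return RMA_ERR_SYNC;
    w->epoch = EPOCH_LOCK;
    return RMA_OK;
}

/* Close a passive target epoch; only legal after a matching lock. */
int rma_unlock(rma_win_t *w)
{
    if (w->epoch != EPOCH_LOCK)
        return RMA_ERR_SYNC;
    w->epoch = EPOCH_NONE;
    return RMA_OK;
}

/* lock_all needs only one state change, not one per target. */
int rma_lock_all(rma_win_t *w)
{
    if (w->epoch != EPOCH_NONE)
        return RMA_ERR_SYNC;
    w->epoch = EPOCH_LOCK_ALL;
    return RMA_OK;
}

/* Communication calls are rejected outside of any epoch. */
int rma_put(const rma_win_t *w)
{
    return w->epoch == EPOCH_NONE ? RMA_ERR_SYNC : RMA_OK;
}
```

With the state recorded in one place, an unmatched unlock or a put outside an epoch is caught by a single comparison rather than by inspecting the ops list.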
  18. 05 Nov, 2012 5 commits
  19. 25 Oct, 2012 1 commit
  20. 23 Oct, 2012 1 commit
  21. 20 Oct, 2012 2 commits
    • [svn-r10426] MPI-3 RMA Flush implementation · 7e3e73a2
      James Dinan authored
      This commit implements MPI-3 RMA's flush and flush_all operations.
      
      Reviewer: buntinas
    • [svn-r10423] Added passive target immediate locking · 5109ab1b
      James Dinan authored
      When enabled, this mode of operation immediately requests the lock when
      MPI_Win_lock is called.  Currently, this is enabled by setting the
      MPICH_RMA_LOCK_IMMED environment variable.  In the future, we can also make
      this mode of operation available through an info/assert.  This capability is
      needed to implement MPI-3's flush operations.
      
      Reviewer: buntinas
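Switching between deferred and immediate locking from the environment could be sketched as below. Only the variable name MPICH_RMA_LOCK_IMMED comes from the commit message. How its value is interpreted is an assumption (anything other than empty or "0" counts as enabled), and target_lock_t and the function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Assumption: the mode is on when the value is neither empty nor "0".
 * The commit message does not say how the value is parsed. */
int lock_immed_from_value(const char *value)
{
    if (value == NULL || value[0] == '\0')
        return 0;
    return strcmp(value, "0") != 0;
}

/* Read the setting from the environment variable named in the commit. */
int lock_immed_enabled(void)
{
    return lock_immed_from_value(getenv("MPICH_RMA_LOCK_IMMED"));
}

/* Hypothetical per-target lock bookkeeping. */
typedef struct {
    int lock_requested;   /* 1 once the lock request has been sent */
} target_lock_t;

/* Called from a Win_lock-like entry point. In immediate mode the request
 * goes out now; otherwise it is left for a later synchronization call. */
void on_win_lock(target_lock_t *t, int immed)
{
    t->lock_requested = immed ? 1 : 0;
}
```

Separating the parsing from the getenv call keeps the policy easy to exercise without touching the process environment.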
  22. 11 Oct, 2012 1 commit
  23. 10 Oct, 2012 1 commit