1. 03 Nov, 2014 10 commits
    • Rewrite all synchronization routines. · 38b20e57
      Xin Zhao authored
      
      
      We use new algorithms for RMA synchronization
      functions and RMA epochs. The old implementation
      uses a lazy-issuing algorithm, which queues up
      all operations and issues them at the end. This
      forbids opportunities to do hardware RMA operations
      and can use up all memory resources when we
      queue up a large number of operations.
      
      Here we use a new algorithm, which starts
      the synchronization at the beginning and issues operations
      as soon as the synchronization is finished.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Control no. of active RMA requests in the runtime. · 257faca2
      Xin Zhao authored
      
      
      When there are too many active requests in the runtime,
      the internal memory might be used up. This patch
      prevents such a situation by triggering a blocking
      wait loop in operation routines when the no. of active
      requests reaches a certain threshold value.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
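The throttling idea in the commit above can be sketched as follows. This is a minimal illustration, not the MPICH code: `rma_runtime`, `progress_poke`, and `post_op` are hypothetical names, and the stand-in progress engine simply completes one request per call.

```c
#include <assert.h>

/* Hypothetical runtime state; these names are illustrative only. */
typedef struct {
    int active_reqs;   /* requests issued but not yet completed */
    int threshold;     /* cap on simultaneously active requests */
} rma_runtime;

/* Stand-in for the progress engine: each poke completes one request. */
static void progress_poke(rma_runtime *rt)
{
    if (rt->active_reqs > 0)
        rt->active_reqs--;
}

/* Post one operation, blocking first while the runtime is saturated.
 * Returns how many times the progress engine had to be poked. */
int post_op(rma_runtime *rt)
{
    int pokes = 0;

    assert(rt->threshold > 0);   /* a zero cap could never be satisfied */
    while (rt->active_reqs >= rt->threshold) {
        progress_poke(rt);
        pokes++;
    }
    rt->active_reqs++;
    return pokes;
}
```

With a threshold of 2, the first two calls to `post_op` return immediately, and the third waits for one completion before issuing.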
    • Enable making progress in operation routines. · 33d96690
      Xin Zhao authored
      
      
      We no longer use the lazy-issuing model, which delays
      issuing all operations until the end; instead we issue them
      as early as possible. To achieve this, we enable
      making progress in RMA routines, so that RMA operations
      can be issued as soon as synchronization is finished.
      
      Sometimes we also need to poke the progress engine in
      operation routines to make sure that the target side
      makes enough progress on receiving packets. Here
      we trigger it when the no. of posted operations reaches
      a certain threshold value.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Keep track of no. of non-empty slots on window. · f91d4633
      Xin Zhao authored
      
      
      Keep track of no. of non-empty slots on window so that
      when the number is 0, there are no operations that need to
      be processed and we can ignore that window.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Add new RMA states on window / target and modify state checking. · f076f3fe
      Xin Zhao authored
      
      
      We define new states to indicate the current situation of
      RMA synchronization. The states contain both ACCESS states
      and EXPOSURE states, and specify whether the synchronization
      is initialized (_CALLED), on-going (_ISSUED), or completed
      (_GRANTED). For single lock in Passive Target, we use a
      per-target state, whereas the window state is set to PER_TARGET.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
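One way to picture the state scheme described above is the sketch below. The enum and function names are hypothetical; only the _CALLED / _ISSUED / _GRANTED suffixes, the ACCESS / EXPOSURE split, and PER_TARGET come from the commit message. The rule that operations may be issued only once access is granted is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state names modeled on the suffixes in the commit message. */
typedef enum {
    SYNC_NONE,
    SYNC_ACCESS_CALLED,     /* synchronization initialized */
    SYNC_ACCESS_ISSUED,     /* synchronization on-going */
    SYNC_ACCESS_GRANTED,    /* synchronization completed */
    SYNC_EXPOSURE_CALLED,
    SYNC_EXPOSURE_ISSUED,
    SYNC_EXPOSURE_GRANTED,
    SYNC_PER_TARGET         /* window defers to the per-target state */
} sync_state;

/* Assumed rule: operations may be issued once access has been granted,
 * consulting the per-target state when the window says PER_TARGET. */
bool can_issue_ops(sync_state win_state, sync_state target_state)
{
    assert(target_state != SYNC_PER_TARGET);  /* only valid on a window */
    if (win_state == SYNC_PER_TARGET)
        return target_state == SYNC_ACCESS_GRANTED;
    return win_state == SYNC_ACCESS_GRANTED;
}
```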
    • Add global window list. · 1d873639
      Xin Zhao authored
      
      
      Add a list of created windows on this process,
      so that we can make progress on all windows in
      the progress engine.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Add RMA slots and related APIs. · 0f596c48
      Xin Zhao authored
      
      
      We allocate a fixed-size array of target slots on the window
      during window creation. The size can be configured
      by the user via a CVAR. Each slot entry contains a list
      of target elements.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
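The slot layout described above might look roughly like the sketch below. All names are hypothetical, and mapping a target rank to a slot with `rank % num_slots` is an assumption; the commit message does not say how targets are assigned to slots.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical target element; real elements would carry op lists etc. */
typedef struct target_elem {
    int rank;
    struct target_elem *next;
} target_elem;

typedef struct {
    target_elem **slots;   /* fixed-size array of list heads */
    int num_slots;         /* chosen at window creation */
} rma_window;

int window_init(rma_window *win, int num_slots)
{
    assert(num_slots > 0);
    win->slots = calloc((size_t) num_slots, sizeof *win->slots);
    win->num_slots = win->slots ? num_slots : 0;
    return win->slots ? 0 : -1;
}

/* Look up an existing target element, or return NULL. */
target_elem *window_find_target(const rma_window *win, int rank)
{
    assert(rank >= 0 && win->num_slots > 0);
    for (target_elem *t = win->slots[rank % win->num_slots]; t; t = t->next)
        if (t->rank == rank)
            return t;
    return NULL;
}

/* Find the target element for a rank, creating it on demand. */
target_elem *window_get_target(rma_window *win, int rank)
{
    target_elem *t = window_find_target(win, rank);
    if (t == NULL && (t = malloc(sizeof *t)) != NULL) {
        target_elem **head = &win->slots[rank % win->num_slots];
        t->rank = rank;
        t->next = *head;   /* push onto the front of the slot's list */
        *head = t;
    }
    return t;
}

void window_destroy(rma_window *win)
{
    for (int i = 0; i < win->num_slots; i++) {
        while (win->slots[i]) {
            target_elem *t = win->slots[i];
            win->slots[i] = t->next;
            free(t);
        }
    }
    free(win->slots);
    win->slots = NULL;
    win->num_slots = 0;
}
```

Keeping the array size fixed bounds the per-window memory regardless of how many targets exist, at the cost of longer lists per slot when many targets are active.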
    • Add target element and global / local pools and related APIs. · 5dd8a0a4
      Xin Zhao authored
      
      
      Here we add a data structure to store information about an active target.
      The information includes operation lists, passive lock state,
      sync state, etc.
      
      The target element is created by the origin on demand, and can
      be freed after the remote completion of all previous operations
      is detected. After RMA ending synchronization calls, all
      target elements should be freed.
      
      Similarly to operation pools, we create two-level target
      pools for target elements: one per-window target pool and
      one global target pool.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Add global / local pools of RMA ops and related APIs. · fc7617f2
      Xin Zhao authored
      
      
      Instead of allocating / deallocating RMA operations whenever
      an RMA op is posted by the user, we allocate fixed-size operation
      pools beforehand and take the op element from those pools
      when an RMA op is posted.
      
      With only a local (per-window) op pool, the number of ops
      allocated can increase arbitrarily if many windows are created.
      Alternatively, if we only use a global op pool, other windows
      might use up all operations thus starving the window we are
      working on.
      
      In this patch we create two pools: a local (per-window) pool and a
      global pool.  Every window is guaranteed to have at least the number
      of operations in the local pool.  If we run out of these operations,
      we check in the global pool to see if we have any operations left.
      When an operation is released, it is added back to the same pool it
      was allocated from.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
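The two-level pool scheme described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the MPICH code: the type and function names are hypothetical, the pool sizes are tiny placeholders, and a per-element flag is one possible way to remember which pool an element came from.

```c
#include <assert.h>
#include <stddef.h>

#define LOCAL_POOL_SIZE  2   /* placeholder sizes, chosen for the example */
#define GLOBAL_POOL_SIZE 2

typedef struct op_elem {
    struct op_elem *next;
    int from_global;         /* remembers which pool owns this element */
} op_elem;

typedef struct {
    op_elem *free_list;
} op_pool;

typedef struct {
    op_elem local_storage[LOCAL_POOL_SIZE];
    op_pool local_pool;      /* guaranteed minimum for this window */
} window;

static op_elem global_storage[GLOBAL_POOL_SIZE];
static op_pool global_pool;  /* shared by all windows */

static void pool_init(op_pool *pool, op_elem *storage, size_t n, int is_global)
{
    pool->free_list = NULL;
    for (size_t i = 0; i < n; i++) {
        storage[i].from_global = is_global;
        storage[i].next = pool->free_list;
        pool->free_list = &storage[i];
    }
}

static op_elem *pool_pop(op_pool *pool)
{
    op_elem *e = pool->free_list;
    if (e != NULL)
        pool->free_list = e->next;
    return e;
}

void global_pool_init(void)
{
    pool_init(&global_pool, global_storage, GLOBAL_POOL_SIZE, 1);
}

void window_init(window *win)
{
    pool_init(&win->local_pool, win->local_storage, LOCAL_POOL_SIZE, 0);
}

/* Try the window's own pool first, then fall back to the global pool.
 * Returns NULL when both are exhausted. */
op_elem *op_alloc(window *win)
{
    op_elem *e = pool_pop(&win->local_pool);
    if (e == NULL)
        e = pool_pop(&global_pool);
    return e;
}

/* Return an element to the same pool it was allocated from. */
void op_free(window *win, op_elem *e)
{
    assert(e != NULL);
    op_pool *pool = e->from_global ? &global_pool : &win->local_pool;
    e->next = pool->free_list;
    pool->free_list = e;
}
```

Because a window always drains its local pool first, one busy window can exhaust the global pool without taking the reserved elements of any other window.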
    • Temporarily remove all RMA PVARs. · 5c513032
      Xin Zhao authored
      
      
      Because we are going to rewrite the RMA infrastructure
      and many PVARs will no longer be used, here we temporarily
      remove all PVARs and will add the needed PVARs back after the new
      implementation is done.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
  2. 01 Nov, 2014 1 commit
    • Bug-fix: always waiting for remote completion in Win_unlock. · c76aa786
      Xin Zhao authored
      
      
      The original implementation includes an optimization which
      allows Win_unlock for an exclusive lock to return without
      waiting for remote completion. This relies on the
      assumption that window memory on the target process will not
      be accessed by a third party until that target process
      finishes all RMA operations and grants the lock to other
      processes. However, this assumption is not correct if the user
      uses the assert MPI_MODE_NOCHECK. Consider the following code:
      
                P0                              P1           P2
          MPI_Win_lock(P1, NULL, exclusive);
          MPI_Put(X);
          MPI_Win_unlock(P1, exclusive);
          MPI_Send (P2);                                MPI_Recv(P0);
                                                        MPI_Win_lock(P1, MODE_NOCHECK, exclusive);
                                                        MPI_Get(X);
                                                        MPI_Win_unlock(P1, exclusive);
      
      Both P0 and P2 issue an exclusive lock to P1, and P2 uses the assert
      MPI_MODE_NOCHECK because the lock should be granted to P2 after
      synchronization between P2 and P0. However, in the original
      implementation, the GET operation on P2 might not get the updated
      value since Win_unlock on P0 returns without waiting for remote
      completion.
      
      In this patch we delete this optimization. In Win_free, since every
      Win_unlock guarantees remote completion, the target process no
      longer needs to do additional counting work to detect target-side
      completion, but only needs to do a global barrier.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
  3. 30 Oct, 2014 1 commit
  4. 01 Oct, 2014 1 commit
  5. 28 Sep, 2014 1 commit
  6. 23 Sep, 2014 1 commit
    • Bug-fix: waiting for ACKs for Active Target Synchronization. · 74189446
      Xin Zhao authored
      
      
      The original implementation of FENCE and PSCW does not
      guarantee the remote completion of issued RMA operations
      when MPI_Win_complete and MPI_Win_fence return. They only
      guarantee the local completion of issued operations and
      the completion of incoming operations. This is not correct
      if we try to get updated values on the target side using synchronizations
      with MPI_MODE_NOCHECK.
      
      Here we fix it by making the runtime wait for ACKs from all
      targets before returning from MPI_Win_fence and MPI_Win_complete.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
  7. 31 Jul, 2014 1 commit
    • Add MPI_Comm_revoke · 57f6ee88
      Wesley Bland authored
      
      
      MPI_Comm_revoke is a special function because it does not have a matching call
      on the "receiving side". This is because it has to act as an out-of-band,
      resilient broadcast algorithm. Because of this, in this commit, in addition to
      the usual functions to implement MPI communication calls (MPI/MPID/CH3/etc.),
      we add a new CH3 packet type that will handle revoking a communicator without
      involving a matching call from the MPI layer (similar to how RMA is currently
      implemented).
      
      The thing that must be handled most carefully when revoking a communicator is
      to ensure that a previously used context ID will eventually be returned to the
      pool of available context IDs and that after this occurs, no old messages will
      match the new usage of the context ID (for instance, if some messages are very
      slow and show up late). To accomplish this, revoke is implemented as an
      all-to-all algorithm. When one process calls revoke, it will send a message to
      all other processes in the communicator, which will trigger that process to
      send a message to all other processes, and so on. Once a process has already
      revoked its communicator locally, it won't send out another wave of messages.
      As each process receives the revoke messages from the other processes, it will
      track how many messages have been received. Once it has either received a
      revoke message or a message about a process failure for each other process, it
      will release its refcount on the communicator object. After the application
      has freed all of its references to the communicator (and all requests, files,
      etc. associated with it), the context ID will be returned to the available
      pool.
      Signed-off-by: Junchao Zhang <jczhang@mcs.anl.gov>
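The per-process bookkeeping described in the commit above can be sketched as below. This models only the counting logic on one process; no messages are sent. All names are hypothetical, `MAX_PROCS` is an arbitrary bound for the example, and releasing the reference only after the communicator is revoked locally is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PROCS 64   /* arbitrary bound for this sketch */

typedef struct {
    int rank, size;
    bool revoked;                /* revoked locally? */
    bool accounted[MAX_PROCS];   /* peer has revoked or is known failed */
    int waiting_for;             /* peers not yet accounted for */
    bool ref_released;           /* internal reference dropped */
} comm_state;

void comm_init(comm_state *c, int rank, int size)
{
    assert(size > 0 && size <= MAX_PROCS && rank >= 0 && rank < size);
    c->rank = rank;
    c->size = size;
    c->revoked = false;
    c->ref_released = false;
    c->waiting_for = size - 1;
    for (int i = 0; i < size; i++)
        c->accounted[i] = false;
}

static void account_for(comm_state *c, int peer)
{
    assert(peer >= 0 && peer < c->size);
    if (peer != c->rank && !c->accounted[peer]) {
        c->accounted[peer] = true;
        c->waiting_for--;
    }
}

static void maybe_release(comm_state *c)
{
    if (c->revoked && c->waiting_for == 0)
        c->ref_released = true;
}

/* Revoke locally. Returns true only the first time, which is when the
 * caller should send a revoke message to every other process. */
bool comm_revoke(comm_state *c)
{
    bool first = !c->revoked;
    c->revoked = true;
    maybe_release(c);
    return first;
}

/* A revoke message arrived from `from`. Returns true if this process
 * must now send its own wave of revoke messages. */
bool comm_on_revoke_msg(comm_state *c, int from)
{
    bool send_wave = comm_revoke(c);
    account_for(c, from);
    maybe_release(c);
    return send_wave;
}

/* A failure notice for `failed` counts the same as a revoke message. */
void comm_on_failure(comm_state *c, int failed)
{
    account_for(c, failed);
    maybe_release(c);
}
```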
  8. 11 Apr, 2014 1 commit
  9. 13 Mar, 2014 1 commit
    • Fixes inconsistent definition of parameters · 33337436
      Huiwei Lu authored
      
      
      In MPID_Win_allocate and MPID_Win_allocate_shared, baseptr is defined
      as void * and void **, respectively, while in MPIDI_Win_fns, both
      MPID_Win_allocate and MPID_Win_allocate_shared are registered as
      MPIDI_CH3U_Win_allocate, where baseptr is defined as void *.
      
      Fixes #1995
      Signed-off-by: Junchao Zhang <jczhang@mcs.anl.gov>
  10. 17 Dec, 2013 1 commit
  11. 26 Sep, 2013 1 commit
  12. 01 Aug, 2013 3 commits
  13. 28 Jul, 2013 1 commit
    • Add "alloc_shm" info to MPI_Win_allocate. · 384d96b7
      Xin Zhao authored
      
      
      Add "alloc_shm" to the window's info arguments and initialize it to FALSE.
      In MPID_Win_allocate, if "alloc_shm" is set to true, call ALLOCATE_SHARED,
      otherwise call ALLOCATE.
      
      Free window memory only when the SHM region is not allocated; otherwise it is
      already freed in MPIDI_CH3I_SHM_Win_free.
      Signed-off-by: Pavan Balaji <balaji@mcs.anl.gov>
  14. 08 Jul, 2013 1 commit
  15. 07 May, 2013 1 commit
  16. 08 Nov, 2012 3 commits
    • [svn-r10592] Updated active target to use a shared ops list · 5510107a
      James Dinan authored
      This fixes the performance regression that was introduced by concatenation of
      per-target lists.
      
      Reviewer: goodell
    • [svn-r10590] Renamed fence_cnt to fence_issued · b054ac23
      James Dinan authored
      The fence_cnt field in MPID_Win is not a counter, it's a flag that indicates if
      fence has been called.
      
      Reviewer: buntinas
    • [svn-r10587] RMA epoch tracking · b001136e
      James Dinan authored
      This patch adds code to track the RMA epoch state of the local process.
      Currently, we are tracking the synchronization states that are allowed by
      MPICH; in the future, we may want to restrict this to only states that are
      allowed by the standard.  The addition of epoch tracking has several benefits:
      
       * It allows us to detect synchronization errors (implemented in this patch).
       * It allows us to implement lock_all more efficiently (implemented in this
         patch).
       * It will allow us to distinguish between active and passive target epochs and
         avoid O(p) op list concatenation (future patch).
      
      Reviewer: balaji
  17. 05 Nov, 2012 5 commits
    • [svn-r10531] Refactored struct and enum naming to MPICH style · 7e179a85
      James Dinan authored
      Updated RMA code to remove trailing "_e" and "_s" on enum and struct type
      names to match the MPICH style.
      
      Reviewer: goodell
    • [svn-r10515] Implementation of passive multi-target synch · 656b26f5
      James Dinan authored
      Updated RMA implementation to track the passive target status individually, for
      each target.  Includes new implementation for lock/unlock_all.  Lock_all is
      currently unoptimized, see #1734 for future plans.
      
      Reviewer: buntinas
    • [svn-r10513] Support for one RMA op list per target · ab97edb7
      James Dinan authored
      The use of a dense array is a temporary measure to support the reference
      implementation.  This will be much improved by ticket #1735.
      
      Reviewer: goodell
    • [svn-r10511] Removed old synch. error checking in RMA · 4bff013d
      James Dinan authored
      The old "lockRank" error checking is no longer sufficient in MPI 3.0 and must
      be removed to add support for locking multiple targets.
      
      Reviewer: balaji
    • [svn-r10508] Refactoring RMA Ops list to DL · cdb1b3e4
      James Dinan authored
      In this patch, I have refactored the RMA ops list again to use the MPL UTList
      doubly-linked list and to treat the list as a proper object.  This should set
      us up to work with multiple lists, as we will soon have one list per target.
      Doubly-linking the list is a big help in terms of maintainability (no more
      prevNext pointers) and flexibility (better implementation of request-based
      ops and other optimizations).
      
      Reviewer: goodell
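The maintainability point in the commit above (no more prevNext pointers) can be illustrated with a small hand-written doubly-linked list. This sketch does not use the MPL UTList macros the commit refers to; `rma_op`, `ops_list`, and the functions are hypothetical names used only to show why removal needs no extra pointer bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical op element; a real one would describe the RMA operation. */
typedef struct rma_op {
    int id;
    struct rma_op *prev, *next;
} rma_op;

/* The list is treated as an object with its own head and tail. */
typedef struct {
    rma_op *head, *tail;
} ops_list;

void ops_list_init(ops_list *l)
{
    l->head = l->tail = NULL;
}

void ops_list_append(ops_list *l, rma_op *op)
{
    op->next = NULL;
    op->prev = l->tail;
    if (l->tail)
        l->tail->next = op;
    else
        l->head = op;        /* list was empty */
    l->tail = op;
}

/* Unlink any element in O(1); the caller does not need to track the
 * predecessor while iterating, unlike with a singly-linked list. */
void ops_list_remove(ops_list *l, rma_op *op)
{
    assert(l->head != NULL);
    if (op->prev)
        op->prev->next = op->next;
    else
        l->head = op->next;
    if (op->next)
        op->next->prev = op->prev;
    else
        l->tail = op->prev;
    op->prev = op->next = NULL;
}
```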
  18. 25 Oct, 2012 2 commits
  19. 22 Oct, 2012 1 commit
  20. 20 Oct, 2012 1 commit
    • [svn-r10423] Added passive target immediate locking · 5109ab1b
      James Dinan authored
      When enabled, this mode of operation immediately requests the lock when
      MPI_Win_lock is called.  Currently, this is enabled by setting the
      MPICH_RMA_LOCK_IMMED environment variable.  In the future, we can also make
      this mode of operation available through an info/assert.  This capability is
      needed to implement MPI-3's flush operations.
      
      Reviewer: buntinas
  21. 19 Oct, 2012 1 commit
  22. 10 Oct, 2012 1 commit