1. 26 Jun, 2015 1 commit
  2. 24 Jun, 2015 1 commit
  3. 20 Jun, 2015 1 commit
  4. 12 Jun, 2015 1 commit
  5. 04 Mar, 2015 3 commits
    • Xin Zhao's avatar
      Add a function hook to initialize window attributes in channel layer. · 7c1a8fb1
      Xin Zhao authored
      
      
      There are some window attributes in the channel layer that
      needs to be initialized during window creation. In this
      patch, we first add a win_hooks table that contains pointers
      to the channel's implementation of the function hooks. Secondly,
      we add a function hook 'win_init' to allow the channel layer to
      initialize its own attributes. The hook is called from the
      CH3 win_init function.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      7c1a8fb1
    • Xin Zhao's avatar
      Reduce size of shm_base_addrs[] from comm_size to node_size. · eddd8b91
      Xin Zhao authored
      
      
      Given one process, shm_base_addrs[] is used to store the base
      addresses (in the address space of this process) of SHM window
      on other processes. The original size of it is comm_size. However,
      the maximum number of SHM windows that this process can access
      to is node_size instead of comm_size, which results in a waste
      of memory since most slots in the array is NULL. In this patch
      we reduce the size of shm_base_addrs[] from comm_size to node_size.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      eddd8b91
    • Xin Zhao's avatar
      Store window basic attributes into a struct on window. · 9404e953
      Xin Zhao authored
      
      
      In this patch, we gather window basic attributes of other
      processes (base_addr, size, disp_unit, win_handle) using a
      struct called "basic_info_table". By doing this, we can use
      one contiguous memory region to store them.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      9404e953
  6. 03 Mar, 2015 1 commit
  7. 16 Dec, 2014 7 commits
  8. 20 Nov, 2014 1 commit
  9. 13 Nov, 2014 1 commit
    • Xin Zhao's avatar
      Perf-tuning: issue FLUSH, FLUSH ACK, UNLOCK ACK messages only when needed. · a9d968cc
      Xin Zhao authored
      
      
      When operation pending list and request lists are all empty, FLUSH message
      needs to be sent by origin only when origin issued PUT/ACC operations since
      the last synchronization calls, otherwise origin does not need to issue FLUSH
      at all and does not need to wait for FLUSH ACK message.
      
      Similiarly, origin waits for ACK of UNLOCK message only when origin issued
      PUT/ACC operations since the last synchronization calls. However, UNLOCK
      message always needs to be sent out because origin needs to unlock the
      target process. This patch avoids issuing unnecessary
      FLUSH / FLUSH ACK / UNLOCK ACK messages.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      a9d968cc
  10. 07 Nov, 2014 1 commit
    • Xin Zhao's avatar
      Move definition of global window counters to nemesis / sock. · a4223bc3
      Xin Zhao authored
      
      
      num_active_issued_win and num_passive_win are counters of
      windows in active ISSUED mode and in passive mode.
      It is modified in CH3 and is used in progress engine of
      nemesis / sock to skip windows that do not need to make
      progress on. Here we define them in mpidi_ch3_pre.h in
      nemesis / sock so that they can be exposed to upper layers.
      Signed-off-by: default avatarMin Si <msi@il.is.s.u-tokyo.ac.jp>
      a4223bc3
  11. 03 Nov, 2014 13 commits
    • Xin Zhao's avatar
      add original RMA PVARs back. · ed20cd37
      Xin Zhao authored
      
      
      Add some original RMA PVARs back to the new
      RMA infrastructure, including timing of packet
      handlers, op allocation and setting, window
      creation, etc.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      ed20cd37
    • Xin Zhao's avatar
      Delete no longer needed code. · cc63b367
      Xin Zhao authored
      
      
      We made a huge change to RMA infrastructure and
      a lot of old code can be droped, including separate
      handlers for lock-op-unlock, ACCUM_IMMED specific
      code, O(p) data structure code, code of lazy issuing,
      etc.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      cc63b367
    • Xin Zhao's avatar
      Rewrite code of passive lock control messages. · 0542e304
      Xin Zhao authored
      
      
      1. Piggyback LOCK request with first IMMED operation.
      
      When we see an IMMED operation, we can always piggyback
      LOCK request with that operation to reduce one sync
      message of single LOCK request. When packet header of
      that operation is received on target, we will try to
      acquire the lock and perform that operation. The target
      either piggybacks LOCK_GRANTED message with the response
      packet (if available), or sends a single LOCK_GRANTED
      message back to origin.
      
      2. Rewrite code of manage lock queue.
      
      When the lock request cannot be satisfied on target,
      we need to buffer that lock request on target. All we
      need to do is enqueuing the packet header, which contains
      all information we need after lock is granted. When
      the current lock is released, the runtime will goes
      over the lock queue and grant the lock to the next
      available request. After lock is granted, the runtime
      just trigger the packet handler for the second time.
      
      3. Release lock on target side if piggybacking with UNLOCK.
      
      If there are active-message operations to be issued,
      we piggyback a UNLOCK flag with the last operation.
      When the target recieves it, it will release the current
      lock and grant the lock to the next process.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      0542e304
    • Xin Zhao's avatar
      Rewrite all synchronization routines. · 38b20e57
      Xin Zhao authored
      
      
      We use new algorithms for RMA synchronization
      functions and RMA epochs. The old implementation
      uses a lazy-issuing algorithm, which queues up
      all operations and issues them at end. This
      forbid opportunites to do hardware RMA operations
      and can use up all memory resources when we
      queue up large number of operations.
      
      Here we use a new algorithm, which will initialize
      the synchonization at beginning, and issue operations
      as soon as the synchronization is finished.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      38b20e57
    • Xin Zhao's avatar
      Control no. of active RMA requests in the runtime. · 257faca2
      Xin Zhao authored
      
      
      When there are too many active requests in the runtime,
      the internal memory might be used up. This patch
      prevents such situation by triggering blocking
      wait loop in operation routines when no. of active
      requests reaches certain threshold value.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      257faca2
    • Xin Zhao's avatar
      Enable making progress in operation routines. · 33d96690
      Xin Zhao authored
      
      
      We no longer use the lazy-issuing model, which delays
      all operations to the end to issue, but issues them
      as early as possible. To achieve this, we enable
      making progress in RMA routines, so that RMA operations
      can be issued out as long as synchronization is finished.
      
      Sometimes we also need to poke the progress in
      operation routines to make sure that target side
      makes enough progress to receiving packets. Here
      we trigger it when no. of posted operations reaches
      certain threshold value.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      33d96690
    • Xin Zhao's avatar
      Keep track of no. of non-empty slots on window. · f91d4633
      Xin Zhao authored
      
      
      Keep track of no. of non-empty slots on window so that
      when number is 0, there are no operations needed to
      be processed and we can ignore that window.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      f91d4633
    • Xin Zhao's avatar
      Add new RMA states on window / target and modify state checking. · f076f3fe
      Xin Zhao authored
      
      
      We define new states to indicate the current situation of
      RMA synchronization. The states contain both ACCESS states
      and EXPOPSURE states, and specify if the synchronization
      is initialized (_CALLED), on-going (_ISSUED) and completed
      (_GRANTED). For single lock in Passive Target, we use
      per-target state whereas the window state is set to PER_TARGET.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      f076f3fe
    • Xin Zhao's avatar
      Add global window list. · 1d873639
      Xin Zhao authored
      
      
      Add a list of created windows on this process,
      so that we can make progress on all windows in
      the progress engine.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      1d873639
    • Xin Zhao's avatar
      Add RMA slots and related APIs. · 0f596c48
      Xin Zhao authored
      
      
      We allocate a fixed size of targets array on window
      during window creation. The size can be configured
      by the user via CVAR. Each slot entry contains a list
      of target elements.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      0f596c48
    • Xin Zhao's avatar
      Add target element and global / local pools and related APIs. · 5dd8a0a4
      Xin Zhao authored
      
      
      Here we add a data structure to store information of active target.
      The information includes operation lists, pasive lock state,
      sync state, etc.
      
      The target element is created by origin on-demand, and can
      be freed after the remote completion of all previous oeprations
      is detected. After RMA ending synchrnization calls, all
      target elements should be freed.
      
      Similiarly with operation pools, we create two-level target
      pools for target elements: one pre-window target pool and
      one global target pool.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      5dd8a0a4
    • Xin Zhao's avatar
      Add global / local pools of RMA ops and related APIs. · fc7617f2
      Xin Zhao authored
      
      
      Instead of allocating / deallocating RMA operations whenever
      an RMA op is posted by user, we allocate fixed size operation
      pools beforehand and take the op element from those pools
      when an RMA op is posted.
      
      With only a local (per-window) op pool, the number of ops
      allocated can increase arbitrarily if many windows are created.
      Alternatively, if we only use a global op pool, other windows
      might use up all operations thus starving the window we are
      working on.
      
      In this patch we create two pools: a local (per-window) pool and a
      global pool.  Every window is guaranteed to have at least the number
      of operations in the local pool.  If we run out of these operations,
      we check in the global pool to see if we have any operations left.
      When an operation is released, it is added back to the same pool it
      was allocated from.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      fc7617f2
    • Xin Zhao's avatar
      Temporarily remove all RMA PVARs. · 5c513032
      Xin Zhao authored
      
      
      Because we are going to rewrite the RMA infrastructure
      and many PVARs will no longer be used, here we temporarily
      remove all PVARs and will add needed PVARs back after new
      implementation is done.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      5c513032
  12. 01 Nov, 2014 1 commit
    • Xin Zhao's avatar
      Bug-fix: always waiting for remote completion in Win_unlock. · c76aa786
      Xin Zhao authored
      
      
      The original implementation includes an optimization which
      allows Win_unlock for exclusive lock to return without
      waiting for remote completion. This relys on the
      assumption that window memory on target process will not
      be accessed by a third party until that target process
      finishes all RMA operations and grants the lock to other
      processes. However, this assumption is not correct if user
      uses assert MPI_MODE_NOCHECK. Consider the following code:
      
                P0                              P1           P2
          MPI_Win_lock(P1, NULL, exclusive);
          MPI_Put(X);
          MPI_Win_unlock(P1, exclusive);
          MPI_Send (P2);                                MPI_Recv(P0);
                                                        MPI_Win_lock(P1, MODE_NOCHECK, exclusive);
                                                        MPI_Get(X);
                                                        MPI_Win_unlock(P1, exclusive);
      
      Both P0 and P2 issue exclusive lock to P1, and P2 uses assert
      MPI_MODE_NOCHECK because the lock should be granted to P2 after
      synchronization between P2 and P0. However, in the original
      implementation, GET operation on P2 might not get the updated
      value since Win_unlock on P0 return without waiting for remote
      completion.
      
      In this patch we delete this optimization. In Win_free, since every
      Win_unlock guarantees the remote completion, target process no
      longer needs to do additional counting works to detect target-side
      completion, but only needs to do a global barrier.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      c76aa786
  13. 30 Oct, 2014 1 commit
  14. 01 Oct, 2014 1 commit
  15. 28 Sep, 2014 1 commit
  16. 23 Sep, 2014 1 commit
    • Xin Zhao's avatar
      Bug-fix: waiting for ACKs for Active Target Synchronization. · 74189446
      Xin Zhao authored
      
      
      The original implementation of FENCE and PSCW does not
      guarantee the remote completion of issued-out RMA operations
      when MPI_Win_complete and MPI_Win_fence returns. They only
      guarantee the local completion of issued-out operations and
      the completion of coming-in operations. This is not correct
      if we try to get updated values on target side using synchronizations
      with MPI_MODE_NOCHECK.
      
      Here we modify it by making runtime wait for ACKs from all
      targets before returning from MPI_Win_fence and MPI_Win_complete.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      74189446
  17. 31 Jul, 2014 1 commit
    • Wesley Bland's avatar
      Add MPI_Comm_revoke · 57f6ee88
      Wesley Bland authored
      
      
      MPI_Comm_revoke is a special function because it does not have a matching call
      on the "receiving side". This is because it has to act as an out-of-band,
      resilient broadcast algorithm. Because of this, in this commit, in addition to
      the usual functions to implement MPI communication calls (MPI/MPID/CH3/etc.),
      we add a new CH3 packet type that will handle revoking a communicator without
      involving a matching call from the MPI layer (similar to how RMA is currently
      implemented).
      
      The thing that must be handled most carefully when revoking a communicator is
      to ensure that a previously used context ID will eventually be returned to the
      pool of available context IDs and that after this occurs, no old messages will
      match the new usage of the context ID (for instance, if some messages are very
      slow and show up late). To accomplish this, revoke is implemented as an
      all-to-all algorithm. When one process calls revoke, it will send a message to
      all other processes in the communicator, which will trigger that process to
      send a message to all other processes, and so on. Once a process has already
      revoked its communicator locally, it won't send out another wave of messages.
      As each process receives the revoke messages from the other processes, it will
      track how many messages have been received. Once it has either received a
      revoke message or a message about a process failure for each other process, it
      will release its refcount on the communicator object. After the application
      has freed all of its references to the communicator (and all requests, files,
      etc. associated with it), the context ID will be returned to the available
      pool.
      Signed-off-by: default avatarJunchao Zhang <jczhang@mcs.anl.gov>
      57f6ee88
  18. 11 Apr, 2014 1 commit
  19. 13 Mar, 2014 1 commit
    • Huiwei Lu's avatar
      Fixes inconsistent definition of parameters · 33337436
      Huiwei Lu authored
      
      
      In MPID_Win_allocate and MPID_Win_allocate_shared, baseptr are defined
      as void * and void ** separately, while in MPIDI_Win_fns, both
      MPID_Win_allocate and MPID_Win_allocate_shared are registered as
      MPIDI_CH3U_Win_allocate, where baseptr is defined as void *.
      
      Fixes #1995
      Signed-off-by: default avatarJunchao Zhang <jczhang@mcs.anl.gov>
      33337436
  20. 17 Dec, 2013 1 commit