1. 03 Nov, 2014 6 commits
  2. 01 Nov, 2014 1 commit
    • Bug-fix: always waiting for remote completion in Win_unlock. · c76aa786
      Xin Zhao authored and Pavan Balaji committed
      
      
      The original implementation includes an optimization which allows
      Win_unlock for an exclusive lock to return without waiting for
      remote completion. This relies on the assumption that the window
      memory on the target process will not be accessed by a third party
      until the target process finishes all RMA operations and grants the
      lock to another process. However, this assumption does not hold if
      the user passes the MPI_MODE_NOCHECK assert. Consider the following
      code:
      
                P0                              P1           P2
          MPI_Win_lock(P1, NULL, exclusive);
          MPI_Put(X);
          MPI_Win_unlock(P1, exclusive);
          MPI_Send (P2);                                MPI_Recv(P0);
                                                        MPI_Win_lock(P1, MODE_NOCHECK, exclusive);
                                                        MPI_Get(X);
                                                        MPI_Win_unlock(P1, exclusive);
      
      Both P0 and P2 request an exclusive lock on P1, and P2 uses the
      MPI_MODE_NOCHECK assert because the lock should already be granted
      to P2 after the synchronization between P0 and P2. However, in the
      original implementation, the GET operation on P2 might not see the
      updated value, since Win_unlock on P0 returns without waiting for
      remote completion.
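      A minimal C sketch of the same scenario (the function name and the
      buffer are illustrative assumptions; only the MPI calls mirror the
      pseudocode above):
      
      #include <mpi.h>
      
      /* P0 updates X on P1, then notifies P2; P2 reads X under an exclusive
       * lock with MPI_MODE_NOCHECK because the send/recv already orders the
       * two lock epochs. The Get on P2 only sees P0's update if Win_unlock
       * on P0 waited for remote completion. */
      void scenario(MPI_Win win, int *x, int rank)
      {
          int token = 0;
          if (rank == 0) {
              MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, win);
              MPI_Put(x, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
              MPI_Win_unlock(1, win);          /* must imply remote completion */
              MPI_Send(&token, 1, MPI_INT, 2, 0, MPI_COMM_WORLD);
          } else if (rank == 2) {
              MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
              MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, MPI_MODE_NOCHECK, win);
              MPI_Get(x, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
              MPI_Win_unlock(1, win);          /* x must now hold P0's update */
          }
      }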
      
      In this patch we remove this optimization. Since every Win_unlock
      now guarantees remote completion, the target process in Win_free no
      longer needs to do additional counting work to detect target-side
      completion; it only needs to participate in a global barrier.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
  3. 30 Oct, 2014 1 commit
  4. 20 Oct, 2014 1 commit
  5. 01 Oct, 2014 1 commit
  6. 28 Sep, 2014 1 commit
  7. 23 Sep, 2014 1 commit
    • Bug-fix: waiting for ACKs for Active Target Synchronization. · 74189446
      Xin Zhao authored
      
      
      The original implementation of FENCE and PSCW does not guarantee
      the remote completion of issued-out RMA operations when
      MPI_Win_complete and MPI_Win_fence return. It only guarantees the
      local completion of issued-out operations and the completion of
      incoming operations. This is not correct if we try to read updated
      values on the target side using synchronizations with
      MPI_MODE_NOCHECK.
      
      Here we modify it by making the runtime wait for ACKs from all
      targets before returning from MPI_Win_fence and MPI_Win_complete.
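      For the FENCE case, a minimal sketch of the guarantee this fix
      restores (rank numbers and the buffer are illustrative assumptions):
      
      #include <mpi.h>
      
      /* After the closing fence, the target's window memory must already hold
       * the value the origin Put; that only holds if the fence waited for
       * remote-completion ACKs, not just local completion at the origin. */
      void fence_epoch(MPI_Win win, int *winbuf, int rank)
      {
          int val = 42;
          MPI_Win_fence(0, win);
          if (rank == 0)
              MPI_Put(&val, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
          MPI_Win_fence(0, win);
          if (rank == 1) {
              /* winbuf[0] is now guaranteed to contain 42 */
              (void) winbuf;
          }
      }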
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
  8. 03 Sep, 2014 1 commit
    • Enabled SHM segments detection in MPI_Win_create · b58d4baf
      Min Si authored
      
      
      First, cache every SHM window created by Win_allocate or
      Win_allocate_shared in a global list, and unlink it in Win_free.
      
      Then, when the user calls Win_create for a new window, check the
      user-specified buffer and communicator, and enable local SHM
      communication in the new window if it matches a cached SHM window.
      Note that all the shared resources are still freed by the original
      SHM window.
      
      Matching an SHM window must satisfy the following two conditions (a
      sketch of the enabled usage follows the list):
      1. The new node comm is equal to, or a subset of, the SHM node comm.
      (In the other case, where the two node comms merely overlap, the
      overlapped processes could logically share memory, but this is not
      supported for now. To support it, we would first need to modify the
      implementation of RMA operations to remember the shared status per
      target instead of just comparing node_ids.)
      2. The buffer is within the range of the SHM segment mapped across
      the local processes of the original SHM window (a contiguous segment
      is mapped across local processes regardless of whether
      alloc_shared_noncontig is set).
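      A hedged sketch of the case this enables (node_comm, the size, and
      the displacement unit are assumptions for illustration):
      
      #include <mpi.h>
      
      void reuse_shm_buffer(MPI_Comm node_comm)
      {
          MPI_Win shm_win, created_win;
          int *buf;
      
          /* The SHM window is cached in the global list when it is created. */
          MPI_Win_allocate_shared(4096, 1, MPI_INFO_NULL, node_comm, &buf, &shm_win);
      
          /* Same buffer, same (or subset) node communicator: Win_create matches
           * the cached SHM window and enables local shared-memory communication. */
          MPI_Win_create(buf, 4096, 1, MPI_INFO_NULL, node_comm, &created_win);
      
          MPI_Win_free(&created_win);
          MPI_Win_free(&shm_win);   /* the original SHM window still frees the segment */
      }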
      
      Resolves #2161
      Signed-off-by: Xin Zhao <xinzhao3@illinois.edu>
  9. 27 Aug, 2014 1 commit
  10. 26 Aug, 2014 1 commit
  11. 25 Aug, 2014 1 commit
    • Fix error case for MPIDI_Request_create_null_rreq · cf1240d6
      Wesley Bland authored
      
      
      For some reason, the error-case code in MPIDI_Request_create_rreq
      and MPIDI_Request_create_null_rreq was different. This is odd,
      because both macros take FAIL_ as an argument, which is executed
      directly in the error case of create_rreq but not in null_rreq.
      This commit makes the two behave the same and updates the only two
      calls to the macro that existed in the code.
      Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
  12. 31 Jul, 2014 4 commits
    • Add MPI_Comm_revoke · 57f6ee88
      Wesley Bland authored
      
      
      MPI_Comm_revoke is a special function because it does not have a
      matching call on the "receiving side"; it has to act as an
      out-of-band, resilient broadcast. Because of this, in addition to
      the usual functions used to implement MPI communication calls
      (MPI/MPID/CH3/etc.), this commit adds a new CH3 packet type that
      handles revoking a communicator without involving a matching call
      from the MPI layer (similar to how RMA is currently implemented).
      
      The thing that must be handled most carefully when revoking a
      communicator is ensuring that a previously used context ID is
      eventually returned to the pool of available context IDs, and that
      after this occurs, no old messages will match the new usage of the
      context ID (for instance, if some messages are very slow and show
      up late). To accomplish this, revoke is implemented as an all-to-all
      algorithm. When one process calls revoke, it sends a message to all
      other processes in the communicator, which triggers each of those
      processes to send a message to all other processes, and so on. Once
      a process has already revoked its communicator locally, it won't
      send out another wave of messages. As each process receives revoke
      messages from the other processes, it tracks how many have been
      received. Once it has received either a revoke message or a
      notification of process failure for every other process, it
      releases its refcount on the communicator object. After the
      application has freed all of its references to the communicator
      (and all requests, files, etc. associated with it), the context ID
      is returned to the available pool.
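      A rough sketch of that propagation, using made-up names (none of the
      types or helpers below are the actual CH3 internals):
      
      /* Per-communicator revocation state (illustrative only). */
      typedef struct revoke_state {
          int rank, size;
          int revoked;        /* already sent our wave of revoke packets?         */
          int waiting_for;    /* peers we still need a revoke/failure notice from */
      } revoke_state;
      
      void send_revoke_pkt(revoke_state *c, int dest);   /* out-of-band send, assumed */
      void release_comm_ref(revoke_state *c);            /* lets the context ID recycle */
      
      /* Send our wave of revoke packets exactly once, whether the revocation
       * was requested locally or learned from a peer. */
      static void start_revoke_wave(revoke_state *c)
      {
          if (c->revoked)
              return;
          c->revoked = 1;
          for (int r = 0; r < c->size; r++)
              if (r != c->rank)
                  send_revoke_pkt(c, r);
      }
      
      /* Handle one incoming revoke packet (or failure notice) from a peer. */
      void on_revoke_pkt(revoke_state *c)
      {
          start_revoke_wave(c);
          if (--c->waiting_for == 0)   /* heard from every other process */
              release_comm_ref(c);
      }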
      Signed-off-by: Junchao Zhang <jczhang@mcs.anl.gov>
    • Remove coll_active field in MPIDI_Comm · 5c71c3a8
      Wesley Bland authored
      
      
      The collectively-active field wasn't doing anything anymore, so it
      has been removed. It was a remnant of a previous FT proposal.
      Signed-off-by: Junchao Zhang <jczhang@mcs.anl.gov>
    • Add MPIX_Comm_failure_ack/get_acked · 8652e0ad
      Wesley Bland authored
      
      
      This commit adds the new functions MPI(X)_COMM_FAILURE_ACK and
      MPI(X)_COMM_FAILURE_GET_ACKED. These two functions together allow the user to
      get the group of failed processes.
      
      Most of the implementation for this is pushed into the MPID layer
      since some systems (PAMI) won't support it. The existing function
      MPIDI_CH3U_Check_for_failed_procs has been modified to give back
      the group of acknowledged failed processes. There is an inefficiency
      here in that the list of failed processes is retrieved from PMI and
      parsed every time the user calls failure_ack or get_acked, but this
      means we don't have to cache the list that comes back from PMI
      (which could potentially be expensive and would have some cost even
      in the failure-free case).
      
      This commit adds a field to the MPID_Comm structure, called
      last_ack_rank. This is a single integer that stores the last
      acknowledged failure for this communicator, and it is used to
      determine when to stop parsing when getting back the list of
      acknowledged failed processes.
      
      Lastly, this commit includes a test to make sure that all of the
      above works (test/mpi/ft/failure_ack). It checks that a failure is
      included in the failed group once acknowledged and excluded if it
      was not previously acknowledged.
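      A minimal usage sketch of the two new calls (the surrounding
      function and the output are illustrative; the prototypes are assumed
      to be available through mpi.h in an MPICH build):
      
      #include <mpi.h>
      #include <stdio.h>
      
      void report_acked_failures(MPI_Comm comm)
      {
          MPI_Group failed;
          int nfailed;
      
          MPIX_Comm_failure_ack(comm);                 /* acknowledge failures known so far */
          MPIX_Comm_failure_get_acked(comm, &failed);  /* group of acknowledged failures    */
      
          MPI_Group_size(failed, &nfailed);
          printf("%d acknowledged failed process(es)\n", nfailed);
          MPI_Group_free(&failed);
      }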
      Signed-off-by: Junchao Zhang <jczhang@mcs.anl.gov>
    • Add MPIDI_CH3U_Get_failed_group · 665ced28
      Wesley Bland authored
      
      
      This function takes a last_failed value and generates an
      MPID_Group. If the value is MPI_PROC_NULL, it parses the entire
      list. The function is exposed by MPID so it can be used by any
      function that needs the list of failed processes.
      
      This change necessitated changing the way the list of failed
      processes is retrieved from PMI. Rather than allocating a char
      array on demand every time we get the list from PMI, the string is
      now allocated at init time and freed at finalize time. This means
      we can cache the value to be used later for things like querying
      the list of processes that we already know have failed, rather than
      also fetching the new list (which is important for the
      failure_ack/get_acked semantics).
      Signed-off-by: Junchao Zhang <jczhang@mcs.anl.gov>
  13. 22 Jul, 2014 2 commits
  14. 21 Jul, 2014 4 commits
  15. 18 Jul, 2014 1 commit
  16. 17 Jul, 2014 1 commit
    • Simplified RMA_Op structure. · 274a5a70
      Pavan Balaji authored
      
      
      We were duplicating information between the operation structure and
      the packet structure built when the message is actually issued.
      Since most of the information is the same anyway, this patch simply
      embeds a packet structure into the operation structure.
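      An illustrative before/after of the idea, with made-up field and
      type names (not the actual CH3 definitions):
      
      /* Hypothetical packet header for a Put-like operation. */
      typedef struct put_pkt {
          int  target_rank;
          long target_disp;
          int  count;
      } put_pkt;
      
      /* Before, the op kept its own copies of these fields and the same data
       * was copied again into a freshly built packet when the message was
       * issued. After, the packet lives inside the op, so issuing the
       * operation just sends &op->pkt directly. */
      typedef struct rma_op {
          put_pkt pkt;            /* prepared in place, no second copy        */
          struct rma_op *next;    /* ops are queued until the epoch is closed */
      } rma_op;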
      Signed-off-by: Xin Zhao <xinzhao3@illinois.edu>
  17. 11 Apr, 2014 1 commit
  18. 23 Mar, 2014 1 commit
    • Remove the use of MPIDI_TAG_UB · 055abbd3
      Wesley Bland authored and Pavan Balaji committed
      
      
      The constant MPIDI_TAG_UB is used in only one place at the moment,
      in the initialization of ch3
      (source:src/mpid/ch3/src/mpid_init.c@4b35902a#L131). The problem is
      that the value being set there (MPIR_Process.attrs.tag_ub) is set
      differently in pamid (to INT_MAX). This leads to weird results when
      we set apart a bit in the tag space for failure propagation in
      non-blocking collectives (see #2008).
      
      Since this value isn't referenced anywhere else, there doesn't seem
      to be a use for it, and it is just leading to confusion. To avoid
      this, we remove the constant and simply set
      MPIR_Process.attrs.tag_ub to INT_MAX in both ch3 and pamid.
      
      See #2009
      Signed-off-by: Pavan Balaji <balaji@mcs.anl.gov>
  19. 26 Feb, 2014 1 commit
  20. 27 Jan, 2014 2 commits
    • Remove a comment that doesn't apply anymore. · 201b0dbf
      Wesley Bland authored
      No reviewer
    • Moves the tag reservation to MPI layer · bb755b5c
      Wesley Bland authored and Pavan Balaji committed
      
      
      Resets MPIDI_TAG_UB back to 0x7fffffff. This value was changed a
      while back, but the change should have happened at the MPI layer
      instead of the CH3 layer. Resetting it allows CH3 to use the full
      tag space.
      
      Instead, the value is now set in the MPI layer during initthread.
      This means that it will be safe regardless of the device being
      used, and it prevents the collision that was occurring on the pamid
      device, where the values of MPIR_TAG_ERROR_BIT and
      MPIR_Process.attr.tagged_coll_mask were the same.
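      A rough sketch of the idea behind the reservation (the bit position
      and the macro names below are illustrative, not the MPICH
      definitions):
      
      #include <limits.h>
      
      #define DEVICE_TAG_UB          INT_MAX      /* 0x7fffffff, reported by ch3/pamid   */
      #define ERROR_PROPAGATION_BIT  (1 << 30)    /* carved out by the MPI layer at init */
      
      /* The tag upper bound advertised to applications excludes the reserved
       * bit, so a tag with that bit set can only come from the runtime. */
      static const int app_tag_ub = ERROR_PROPAGATION_BIT - 1;   /* 0x3fffffff */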
      
      Fixes #2008
      Signed-off-by: Pavan Balaji <balaji@mcs.anl.gov>
  21. 17 Dec, 2013 1 commit
  22. 15 Nov, 2013 2 commits
  23. 31 Oct, 2013 1 commit
  24. 29 Oct, 2013 1 commit
  25. 26 Oct, 2013 1 commit
  26. 27 Sep, 2013 1 commit