1. 12 Nov, 2014 1 commit
  2. 31 Jul, 2014 3 commits
      Add MPIX_Comm_agree · 1f0ee136
      Wesley Bland authored
      
      
      Adds MPIX_Comm_agree, which implements an agreement algorithm for the user.
      The function lets the user perform an agreement manually and also detects
      unacknowledged failures.
      Signed-off-by: Junchao Zhang <jczhang@mcs.anl.gov>
      Add MPIX_Comm_shrink functionality · 5be10ce9
      Wesley Bland authored
      
      
      This adds a new function MPIX_COMM_SHRINK. This is a communicator creation
      function that creates a new communicator based on a previous communicator, but
      excluding any failed processes.
      
      As part of the operation, the shrink call needs to perform an agreement to
      determine the group of failed processes. This is done using the algorithm
      published by Hursey et al. in their EuroMPI '12 paper.
      
      The list of failed processes is collected using a bit array. A few new
      functions in the CH3 layer create the bit array, send it to the master
      process, and receive the updated bit array. This implementation is not very
      scalable yet, but a better one can be plugged in here to replace the naïve
      version. For future reference, this is also a use case for an
      MPI_Recv_reduce.
      Signed-off-by: Junchao Zhang <jczhang@mcs.anl.gov>
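      A minimal sketch of the bit-array bookkeeping described above: one bit per
      rank, with the master folding each contribution into its own copy using a
      bitwise OR. The helper names (bitarray_words, bitarray_set, bitarray_test,
      bitarray_merge) and the 32-bit word size are assumptions made for
      illustration; they are not the CH3 functions this commit adds.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical helpers; names and word size are illustrative only. */
      #define BITS_PER_WORD 32

      /* Number of 32-bit words needed to hold one bit per rank. */
      static size_t bitarray_words(int comm_size)
      {
          assert(comm_size > 0);
          return ((size_t)comm_size + BITS_PER_WORD - 1) / BITS_PER_WORD;
      }

      /* Mark `rank` as failed. */
      static void bitarray_set(uint32_t *bits, int rank)
      {
          assert(rank >= 0);
          bits[rank / BITS_PER_WORD] |= (uint32_t)1 << (rank % BITS_PER_WORD);
      }

      /* Return 1 if `rank` is marked as failed, 0 otherwise. */
      static int bitarray_test(const uint32_t *bits, int rank)
      {
          assert(rank >= 0);
          return (int)((bits[rank / BITS_PER_WORD] >> (rank % BITS_PER_WORD)) & 1u);
      }

      /* Master side: fold one process's contribution into the accumulated set. */
      static void bitarray_merge(uint32_t *acc, const uint32_t *in, size_t nwords)
      {
          for (size_t i = 0; i < nwords; i++)
              acc[i] |= in[i];
      }
      ```

      Because the merge is a plain OR over fixed-size words, it is the kind of
      step that a combined receive-and-reduce operation could absorb.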
      Add MPI_Comm_revoke · 57f6ee88
      Wesley Bland authored
      
      
      MPI_Comm_revoke is a special function because it does not have a matching call
      on the "receiving side". This is because it has to act as an out-of-band,
      resilient broadcast algorithm. In addition to the usual functions that
      implement MPI communication calls (MPI/MPID/CH3/etc.), this commit therefore
      adds a new CH3 packet type that handles revoking a communicator without
      involving a matching call from the MPI layer (similar to how RMA is currently
      implemented).
      
      The thing that must be handled most carefully when revoking a communicator is
      to ensure that a previously used context ID will eventually be returned to the
      pool of available context IDs and that after this occurs, no old messages will
      match the new usage of the context ID (for instance, if some messages are very
      slow and show up late). To accomplish this, revoke is implemented as an
      all-to-all algorithm. When one process calls revoke, it sends a message to
      all other processes in the communicator. Each recipient that has not yet
      revoked the communicator locally does so and sends its own message to all
      other processes; a process that has already revoked does not send another wave.
      As each process receives the revoke messages from the other processes, it will
      track how many messages have been received. Once it has either received a
      revoke message or a message about a process failure for each other process, it
      will release its refcount on the communicator object. After the application
      has freed all of its references to the communicator (and all requests, files,
      etc. associated with it), the context ID will be returned to the available
      pool.
      Signed-off-by: Junchao Zhang <jczhang@mcs.anl.gov>
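      The bookkeeping described above can be sketched roughly as follows. This is
      an illustration under stated assumptions, not the code in this commit: the
      struct, its fields, the fixed MAX_PROCS bound, and the function names are
      all hypothetical, and actual message sending is left to the caller.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <string.h>

      #define MAX_PROCS 64   /* assumption: a fixed bound keeps the sketch simple */

      /* Hypothetical per-communicator state; field names are illustrative only. */
      struct revoke_state {
          int  size;              /* processes in the communicator */
          int  self;              /* rank of this process */
          bool revoked;           /* local revoke has happened */
          bool released;          /* protocol reference already dropped */
          bool heard[MAX_PROCS];  /* peer sent a revoke or is known to have failed */
          int  waiting;           /* peers not yet accounted for */
          int  refcount;          /* references on the communicator object */
      };

      static void revoke_state_init(struct revoke_state *s, int size, int self)
      {
          assert(size > 0 && size <= MAX_PROCS && self >= 0 && self < size);
          memset(s, 0, sizeof *s);
          s->size = size;
          s->self = self;
          s->waiting = size - 1;
          s->refcount = 1;        /* assumption: the application holds one reference */
      }

      /* Drop the protocol's reference once every peer is accounted for. */
      static void maybe_release(struct revoke_state *s)
      {
          if (s->revoked && s->waiting == 0 && !s->released) {
              s->released = true;
              s->refcount--;
          }
      }

      /* Revoke locally. Returns true if the caller should now send a revoke
       * message to every peer, false if the communicator was already revoked. */
      static bool revoke_local(struct revoke_state *s)
      {
          if (s->revoked)
              return false;
          s->revoked = true;
          s->refcount++;          /* keep the object alive while messages are in flight */
          maybe_release(s);
          return true;
      }

      /* Record a revoke message (is_revoke_msg) or a failure notice from `peer`.
       * Returns true if this process must start its own wave of revoke messages. */
      static bool revoke_on_notice(struct revoke_state *s, int peer, bool is_revoke_msg)
      {
          bool must_broadcast = false;
          assert(peer >= 0 && peer < s->size);
          if (is_revoke_msg)
              must_broadcast = revoke_local(s);
          if (peer != s->self && !s->heard[peer]) {
              s->heard[peer] = true;
              s->waiting--;
          }
          maybe_release(s);
          return must_broadcast;
      }
      ```

      In this sketch the context ID would become reusable only when refcount
      reaches zero, that is, after the protocol reference is dropped and the
      application has released its own references.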
  3. 01 Apr, 2013 1 commit
      Add per-communicator eager threshold support. · a3c816ac
      Ralf Gunter authored
      Message transfers now respect the communicator-specific threshold.  This
      change has not been carefully checked for impact on our shared-memory
      ping-pong latency.
      
      Reviewed-by: goodell
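      As a rough illustration of what respecting a communicator-specific
      threshold could look like, the sketch below picks between an eager and a
      rendezvous transfer. The types, the function name, the "negative means
      unset" convention, and the inclusive comparison are all assumptions for
      this example and may differ from the actual code.

      ```c
      #include <assert.h>
      #include <stddef.h>

      enum protocol { PROTO_EAGER, PROTO_RENDEZVOUS };

      /* Hypothetical per-communicator setting. */
      struct comm_hints {
          long eager_max_msg_sz;  /* assumption: negative means "no per-communicator value" */
      };

      /* Use the communicator's threshold when it is set, else the global one. */
      static enum protocol choose_protocol(const struct comm_hints *comm,
                                           size_t msg_bytes,
                                           size_t global_threshold)
      {
          size_t threshold = global_threshold;
          assert(comm != NULL);
          if (comm->eager_max_msg_sz >= 0)
              threshold = (size_t)comm->eager_max_msg_sz;
          return msg_bytes <= threshold ? PROTO_EAGER : PROTO_RENDEZVOUS;
      }
      ```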
  4. 10 Oct, 2012 1 commit
  5. 27 Jul, 2010 1 commit
      [svn-r6919] completion counter cleanup (adds MPID_cc_t) · 0a5c22ae
      David Goodell authored
      When compiled for fine-grained threading, the completion counter serves
      as a form of lockfree signalling.  As such, atomic access and memory
      barriers must be used to ensure correctness.
      
      In per-object mode, this code also contains valgrind client request annotations
      to inform Helgrind/DRD/TSan about the lockfree signalling pattern.
      
      No reviewer.
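      The lock-free signalling pattern can be sketched with C11 atomics as
      below. This is only an illustration: cc_t and the cc_* functions are
      hypothetical stand-ins, not the MPID_cc_t definition or its accessors, and
      the memory orderings are one reasonable choice rather than the ones used
      in the tree.

      ```c
      #include <assert.h>
      #include <stdatomic.h>
      #include <stdbool.h>

      /* Hypothetical stand-in for a completion counter. */
      typedef atomic_int cc_t;

      /* Initialize the counter with the number of pending operations. */
      static void cc_init(cc_t *cc, int pending)
      {
          assert(pending >= 0);
          atomic_init(cc, pending);
      }

      /* Called by whoever finishes one operation. The release half of the
       * ordering publishes that operation's results before the count drops.
       * Returns true if this call completed the last pending operation. */
      static bool cc_decr(cc_t *cc)
      {
          int prev = atomic_fetch_sub_explicit(cc, 1, memory_order_acq_rel);
          assert(prev > 0);
          return prev == 1;
      }

      /* Called by a waiter. The acquire load pairs with the release above, so
       * a waiter that sees zero also sees the completed operations' results. */
      static bool cc_is_complete(cc_t *cc)
      {
          return atomic_load_explicit(cc, memory_order_acquire) == 0;
      }
      ```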
  6. 24 Sep, 2009 1 commit
      [svn-r5368] Use the correct type in the segment calls - it must be an... · efb35431
      William Gropp authored
      [svn-r5368] Use the correct type in the segment calls - it must be an MPI_Aint, not an MPIDI_msg_sz_t, particularly when the size of MPI_Aint is changed to match MPI_Offset (the segment calls specify an MPI_Aint as the last argument, for example).  This is part of the changes needed to make attributes work properly when --with-aint-size=8 is selected.
  7. 19 Aug, 2009 1 commit
  8. 06 May, 2009 1 commit
      [svn-r4411] Fixed nemesis to correctly set vc state (instead of setting all... · 14a1e1cf
      Darius Buntinas authored
      [svn-r4411] Fixed nemesis to correctly set vc state (instead of setting all vcs to active).  Renamed MPIDI_Comm_get_vc to MPIDI_Comm_get_vc_set_active to alert caller to side-effect.  Added MPIDI_CHANGE_VC_STATE macro to set the vc state and call debugging macro.  Changed all places where vc state is changed to use this macro.  Reviewed by goodell@.
  9. 05 Mar, 2009 1 commit
  10. 02 Nov, 2007 1 commit