1. 26 Nov, 2014 1 commit
  2. 12 Nov, 2014 1 commit
    • Change errflag to be an enum · 3850e6bf
      Wesley Bland authored
      
      
      The errflag value being used in the MPIC helper functions only
      propagated whether or not an error occurred. It did not contain any
      information about what kind of error occurred, which made returning the
      correct error code after a process failure impossible.
      
      This patch converts the binary value to an enum with three options:
      MPIR_ERR_NONE
      MPIR_ERR_PROC_FAILED
      MPIR_ERR_OTHER
      
      The original values FALSE and TRUE map to MPIR_ERR_NONE and
      MPIR_ERR_OTHER, respectively.
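
      As a rough illustration of the mapping described above, the following
      sketch declares the three options and converts an old boolean flag. The
      type name errflag_sketch_t, the helper names, and the numeric values are
      assumptions made for this example and are not taken from the patch.

```c
#include <stdbool.h>

/* Illustrative copy of the three options named in the commit message.
 * The numeric values are assumptions made for this sketch. */
typedef enum {
    MPIR_ERR_NONE = 0,          /* no error occurred */
    MPIR_ERR_PROC_FAILED = 1,   /* error caused by a process failure */
    MPIR_ERR_OTHER = 2          /* any other error */
} errflag_sketch_t;             /* hypothetical type name */

/* Hypothetical helper: translate the old boolean errflag to the enum.
 * FALSE (no error) becomes MPIR_ERR_NONE, TRUE becomes MPIR_ERR_OTHER. */
static errflag_sketch_t errflag_from_bool(bool old_errflag)
{
    return old_errflag ? MPIR_ERR_OTHER : MPIR_ERR_NONE;
}

/* Hypothetical helper: report whether the flag records a process failure. */
static bool errflag_is_proc_failure(errflag_sketch_t flag)
{
    return flag == MPIR_ERR_PROC_FAILED;
}
```

      A boolean can never produce MPIR_ERR_PROC_FAILED under this mapping,
      which is why the extra enum value is needed.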
      
      MPIR_ERR_PROC_FAILED indicates that the error occurred
      because of a process failure. It uses the new bit set aside from the tag
      space to track such information between processes.
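
      One way to picture a bit set aside from the tag space is the sketch
      below. The bit position and all names prefixed with sketch_ or SKETCH_
      are hypothetical; the commit message does not state the actual layout.

```c
/* Hypothetical position of the reserved bit; the real layout is not
 * given in the commit message. */
#define SKETCH_PROC_FAILED_BIT (1 << 30)

/* Mark a tag as carrying a "process failure" notification. */
static int sketch_tag_set_proc_failed(int tag)
{
    return tag | SKETCH_PROC_FAILED_BIT;
}

/* Check whether a received tag carries the notification. */
static int sketch_tag_has_proc_failed(int tag)
{
    return (tag & SKETCH_PROC_FAILED_BIT) != 0;
}

/* Recover the plain tag value without the reserved bit. */
static int sketch_tag_strip(int tag)
{
    return tag & ~SKETCH_PROC_FAILED_BIT;
}
```

      Under this assumption, a receiver that sees the bit set would record
      MPIR_ERR_PROC_FAILED rather than MPIR_ERR_OTHER.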
      
      This change required modifying many function signatures and type
      declarations to use the new enum type, but these changes are not very
      intrusive and should not be a problem going forward.
      Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
  3. 20 Oct, 2014 1 commit
  4. 31 Jul, 2014 2 commits
    • Change MPID_Comm_valid_ptr to optionally ignore revoke · 05cb62bd
      Wesley Bland authored
      
      
      Adds a parameter to MPID_Comm_valid_ptr that controls whether the macro
      ignores the revoke flag.
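
      A minimal sketch of a validity-check macro with such a parameter is
      shown below. The macro name, the struct, and the error constants are
      hypothetical stand-ins and do not reproduce the MPICH definitions.

```c
#include <stddef.h>

/* Hypothetical stand-in for a communicator object. */
typedef struct {
    int revoked;   /* nonzero once the communicator has been revoked */
} comm_sketch_t;

/* Hypothetical error codes for this sketch only. */
enum { SKETCH_OK = 0, SKETCH_ERR_NULL_COMM = 1, SKETCH_ERR_REVOKED = 2 };

/* Set err if ptr is NULL, or if the communicator is revoked and the
 * caller did not ask to ignore the revoke flag. */
#define SKETCH_COMM_VALID_PTR(ptr, err, ignore_revoke)        \
    do {                                                      \
        if ((ptr) == NULL) {                                  \
            (err) = SKETCH_ERR_NULL_COMM;                     \
        } else if ((ptr)->revoked && !(ignore_revoke)) {      \
            (err) = SKETCH_ERR_REVOKED;                       \
        }                                                     \
    } while (0)
```

      Calls that must still work on a revoked communicator would pass a
      nonzero ignore_revoke; all other calls would pass zero.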
      Signed-off-by: Junchao Zhang <jczhang@mcs.anl.gov>
    • Add MPIX_Comm_shrink functionality · 5be10ce9
      Wesley Bland authored
      
      
      This adds a new function MPIX_COMM_SHRINK. This is a communicator creation
      function that creates a new communicator based on a previous communicator, but
      excluding any failed processes.
      
      As part of the operation, the shrink call needs to perform an agreement to
      determine the group of failed processes. This is done using the algorithm
      published by Hursey et al. in their EuroMPI '12 paper.
      
      The list of failed processes is collected using a bitarray. This happens via
      a few new functions in the CH3 layer that create a bitarray, send it to the
      master process, and receive an updated bitarray in return. This
      implementation is not very scalable yet, but something better can easily be
      plugged in to replace the naïve version. This is also a use case for an
      MPI_Recv_reduce, for future reference.
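
      The bitarray bookkeeping can be sketched without any MPI calls. The
      function names below are hypothetical and the code only illustrates the
      idea of merging per-process failure bits and listing the survivors; it is
      not the CH3 implementation.

```c
#include <stdint.h>

/* Number of 32-bit words needed to hold one bit per rank. */
#define SKETCH_WORDS_FOR(nranks) (((nranks) + 31) / 32)

/* Mark a rank as failed. */
static void sketch_bitarray_set(uint32_t *bits, int rank)
{
    bits[rank / 32] |= (uint32_t)1 << (rank % 32);
}

/* Return nonzero if the rank is marked as failed. */
static int sketch_bitarray_test(const uint32_t *bits, int rank)
{
    return (bits[rank / 32] >> (rank % 32)) & 1u;
}

/* Merge a bitarray received from another process into the master's
 * copy with a bitwise OR. */
static void sketch_bitarray_merge(uint32_t *dst, const uint32_t *src, int nranks)
{
    for (int i = 0; i < SKETCH_WORDS_FOR(nranks); i++)
        dst[i] |= src[i];
}

/* Write the ranks that are not marked failed into out and return their
 * count; a shrunken communicator would contain exactly these ranks. */
static int sketch_surviving_ranks(const uint32_t *failed, int nranks, int *out)
{
    int count = 0;
    for (int rank = 0; rank < nranks; rank++)
        if (!sketch_bitarray_test(failed, rank))
            out[count++] = rank;
    return count;
}
```

      In this sketch the master calls sketch_bitarray_merge once per received
      array. A combined receive-and-reduce operation would fold that OR into
      the receive itself.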
      Signed-off-by: Junchao Zhang <jczhang@mcs.anl.gov>