1. 12 Nov, 2014 9 commits
    • Adds checking VC state in MPID_Recv · af8b2d04
      Huiwei Lu authored
      
      
      Similar to d086ac27, check the state of a VC to see if it is valid
      before creating a group, request or communicator in MPID_Recv.
      Signed-off-by: Wesley Bland <wbland@anl.gov>
    • Adds checking VC state in MPID_Send · a91f178a
      Huiwei Lu authored
      
      
      MPID_Send should first check the state of a VC to see if it is valid
      before creating a group, request or communicator.
      
      In the fault-tolerance case, if the VC has already been revoked or
      marked as terminated (e.g., in test/mpi/ft/senddead), the send
      operation involved should exit without creating any request, group,
      or communicator objects.
      Signed-off-by: Wesley Bland <wbland@anl.gov>
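A minimal sketch of the ordering described in the MPID_Send commit, assuming a simplified connection model: the VC states (`VC_ACTIVE`, `VC_REVOKED`, `VC_TERMINATED`), `vc_t`, `send_req_t`, and `sketch_send()` are hypothetical stand-ins, not MPICH's actual types or the real `MPID_Send` signature. The point is only that the VC check happens before any allocation, so the failure path has nothing to free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical VC states standing in for the device's real ones. */
typedef enum { VC_ACTIVE, VC_REVOKED, VC_TERMINATED } vc_state_t;

typedef struct { vc_state_t state; } vc_t;
typedef struct { int dest; int tag; } send_req_t;

enum { SEND_OK = 0, SEND_ERR_PROC_FAILED, SEND_ERR_NOMEM };

/* Only an active VC can carry a new send. */
static int vc_usable(const vc_t *vc)
{
    return vc->state == VC_ACTIVE;
}

/* Check the VC before allocating anything, so the failure path
 * leaves no request (or group/communicator) behind to clean up. */
static int sketch_send(const vc_t *vc, int dest, int tag, send_req_t **req_out)
{
    assert(vc != NULL && req_out != NULL);
    *req_out = NULL;

    if (!vc_usable(vc))
        return SEND_ERR_PROC_FAILED;   /* nothing allocated yet */

    send_req_t *req = malloc(sizeof *req);
    if (req == NULL)
        return SEND_ERR_NOMEM;
    req->dest = dest;
    req->tag = tag;
    *req_out = req;
    return SEND_OK;
}
```

The same early-exit shape applies to the MPID_Recv change: validate first, allocate second.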
    • Mark collective FT tests as passing · 96deece2
      Wesley Bland authored
      
      
      The collective FT tests now pass with debug output turned off.
      
      See #1945
      Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
    • Correctly handle errflag in MPI collectives · 47f62b0c
      Wesley Bland authored
      
      
      The MPI collectives get and set the errflag used by the collective
      helper functions (MPIC_*). The possible values of the errflag changed,
      so the collective functions need to set it appropriately to one of
      MPIR_ERR_NONE (MPI_SUCCESS), MPIR_ERR_PROC_FAILED
      (MPIX_ERR_PROC_FAILED), or MPIR_ERR_OTHER (MPI_ERR_OTHER).
      
      This should now allow collectives to correctly report process failures
      when they occur, fixing the FT tests that use collectives (see #1945).
      Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
    • Change errflag to be an enum · 3850e6bf
      Wesley Bland authored
      
      
      The errflag value being used in the MPIC helper functions only
      propagated whether or not an error occurred. It did not contain any
      information about what kind of error occurred, which made returning the
      correct error code after a process failure impossible.
      
      This patch converts the binary value to an enum with three options:
      MPIR_ERR_NONE
      MPIR_ERR_PROC_FAILED
      MPIR_ERR_OTHER
      
      The original boolean values FALSE and TRUE map to MPIR_ERR_NONE and
      MPIR_ERR_OTHER, respectively.
      
      MPIR_ERR_PROC_FAILED indicates that the error occurred
      because of a process failure. It uses the new bit set aside from the tag
      space to track such information between processes.
      
      This change required modifying lots of function signatures and type
      declarations to use the new enum type, but these are actually not very
      intrusive changes and shouldn't be a problem going forward.
      Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
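A sketch of the boolean-to-enum conversion, using the enum constant names from the commit message but a hypothetical type name (`errflag_sketch_t`) and hypothetical helpers. Giving `MPIR_ERR_NONE` the value 0 is an assumption here, consistent with it corresponding to MPI_SUCCESS; it keeps existing `if (errflag)` checks meaning "some error occurred".

```c
#include <assert.h>
#include <stdbool.h>

/* Enum names follow the commit message; the type name is hypothetical.
 * MPIR_ERR_NONE is 0 so "no error" stays false in boolean contexts. */
typedef enum {
    MPIR_ERR_NONE = 0,     /* no error (MPI_SUCCESS) */
    MPIR_ERR_PROC_FAILED,  /* error caused by a process failure */
    MPIR_ERR_OTHER         /* any other error */
} errflag_sketch_t;

/* Old callers only knew "error or not", so TRUE can only become OTHER. */
static errflag_sketch_t errflag_from_bool(bool old_flag)
{
    return old_flag ? MPIR_ERR_OTHER : MPIR_ERR_NONE;
}

/* Answers the old boolean question for code that only needs that. */
static bool errflag_is_set(errflag_sketch_t flag)
{
    return flag != MPIR_ERR_NONE;
}
```

Only code that learns of a failure from the tag bit can produce MPIR_ERR_PROC_FAILED; a plain boolean never carried that information.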
    • Take a bit in the tag space for proc failure · 46f59276
      Wesley Bland authored
      
      
      We need to take another bit from the tag space to specify the difference
      between a generic failure and a process failure. This patch modifies the
      macros to handle this situation.
      Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
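A sketch of how an existing error bit plus one more process-failure bit, both taken from the top of the tag space, could carry the three error states between processes. The bit positions and all macro and function names (`TAG_ERROR_BIT`, `TAG_PROC_FAILURE_BIT`, `tag_with_error()`, `tag_error_kind()`) are hypothetical; real positions depend on the implementation's tag upper bound.

```c
#include <assert.h>

/* Hypothetical layout: bit 30 = "an error occurred",
 * bit 29 = "the error was a process failure". */
#define TAG_ERROR_BIT        (1 << 30)
#define TAG_PROC_FAILURE_BIT (1 << 29)
#define TAG_ERROR_BITS       (TAG_ERROR_BIT | TAG_PROC_FAILURE_BIT)

#define TAG_SET_ERROR(tag)        ((tag) | TAG_ERROR_BIT)
#define TAG_SET_PROC_FAILURE(tag) ((tag) | TAG_ERROR_BITS)
#define TAG_MASK_ERROR_BITS(tag)  ((tag) & ~TAG_ERROR_BITS)

typedef enum { ERR_NONE, ERR_PROC_FAILED, ERR_OTHER } errkind_t;

/* Encode an error kind onto a tag before sending. */
static int tag_with_error(int tag, errkind_t kind)
{
    assert((tag & TAG_ERROR_BITS) == 0);  /* tag must not use reserved bits */
    switch (kind) {
    case ERR_PROC_FAILED: return TAG_SET_PROC_FAILURE(tag);
    case ERR_OTHER:       return TAG_SET_ERROR(tag);
    default:              return tag;
    }
}

/* Decode the error kind the sender encoded in a received tag. */
static errkind_t tag_error_kind(int tag)
{
    if (!(tag & TAG_ERROR_BIT))
        return ERR_NONE;
    return (tag & TAG_PROC_FAILURE_BIT) ? ERR_PROC_FAILED : ERR_OTHER;
}
```

The receiver strips both bits with `TAG_MASK_ERROR_BITS` before matching, so the original tag is recovered unchanged.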
    • Added large message cases to getfence1/putfence1 · 316ac29b
      Antonio Pena Monferrer authored
      
      
      These are meant to exceed 1 GB in message size and hence test the
      large-message case in Portals4.
      Signed-off-by: Wesley Bland <wbland@anl.gov>
    • 9d7d493b
      Antonio Pena Monferrer authored
    • portals4: implement cancel send · b56f4f1d
      Kenneth Raffenetti authored
      
      
      All MPI_Sends in the Portals4 netmod cause some or all of the data to be
      sent eagerly to the receiver. Canceling a send therefore means finding the
      data in the receiver's unexpected message queue and removing it in order to
      preserve matching. Because the message queues exist at the netmod level,
      the netmod needs its own cancel protocol.
      
      The protocol is modeled on a similar case in CH3, but with its own method
      for searching the unexpected queue. Custom netmod packet handlers are used to
      receive and process the control messages.
      
      Known Issue:
        Because we are using different PTs for the send and cancel messages, it is
        possible for the cancel request to arrive before the message being canceled.
      Signed-off-by: Antonio Pena Monferrer <apenya@mcs.anl.gov>
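A minimal sketch of the receiver-side search in a cancel protocol: walk the unexpected queue, unlink the entry the cancel message names, and report whether it was found. The entry layout, the matching key (source rank, context id, and a sender-side request id), and the names `uq_entry_t`, `uq_cancel()`, and `uq_push()` are assumptions for illustration, not the Portals4 netmod's actual data structures or packet format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical unexpected-queue entry: data that arrived eagerly
 * before a matching receive was posted. */
typedef struct uq_entry {
    int source;             /* sender rank */
    int tag;
    int context_id;
    unsigned long sreq_id;  /* sender-side request identifier */
    struct uq_entry *next;
} uq_entry_t;

/* Remove the message named by a cancel request. Returns true if it was
 * still unmatched (cancel succeeds), false if it was not found. */
static bool uq_cancel(uq_entry_t **head, int source, int context_id,
                      unsigned long sreq_id)
{
    for (uq_entry_t **pp = head; *pp != NULL; pp = &(*pp)->next) {
        uq_entry_t *e = *pp;
        if (e->source == source && e->context_id == context_id &&
            e->sreq_id == sreq_id) {
            *pp = e->next;  /* unlink so no later receive can match it */
            free(e);
            return true;
        }
    }
    return false;
}

/* Push a newly arrived unexpected message onto the queue head. */
static uq_entry_t *uq_push(uq_entry_t *head, int source, int tag,
                           int context_id, unsigned long sreq_id)
{
    uq_entry_t *e = malloc(sizeof *e);
    assert(e != NULL);
    e->source = source;
    e->tag = tag;
    e->context_id = context_id;
    e->sreq_id = sreq_id;
    e->next = head;
    return e;
}
```

In this sketch, a cancel that overtakes its message (the known issue noted in the commit) simply finds nothing and returns false.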
  2. 11 Nov, 2014 11 commits
  3. 10 Nov, 2014 3 commits
  4. 08 Nov, 2014 2 commits
  5. 07 Nov, 2014 6 commits
  6. 06 Nov, 2014 9 commits