- 13 Nov, 2014 12 commits
-
-
Kenneth Raffenetti authored
No reviewer
-
Min Si authored
Nightly testing reported timeouts on octopus with a 12-minute time limit. Each of them took 12:05 to 12:10 minutes.
-
Wesley Bland authored
This tests the behavior after a failure when using revoke+shrink. Right now this test still fails, so it is marked as xfail. See #2198. No reviewer
-
Antonio Pena Monferrer authored
Those were introduced for a robust protocol during development. They are no longer needed. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
-
Pavan Balaji authored
The user pointer was set, but later overwritten with an internal value. Signed-off-by:
Antonio Pena Monferrer <apenya@mcs.anl.gov>
-
Ported test programs using collective I/O in the ROMIO test directory to the nonblocking collective I/O version. They were temporarily added to the MPICH test directory to run with Jenkins and nightly tests. However, they may need to be moved to the ROMIO test directory later. Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
Added nonblocking versions of bigtype, hindexed_io, rdwrord, and setviewcur for testing nonblocking collective I/O functions. Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
This patch implemented four functions for nonblocking collective I/O, which will be added to the MPI 3.1 standard. Details for these functions can be found in the MPI-Forum ticket, https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/273 . Currently, they are implemented as MPIX functions. Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
Wesley Bland authored
Some of the FT tests were not correctly setting their error handlers to MPI_ERRORS_RETURN. While this doesn't seem to have caused problems, it's safer to do so. This commit also cleans up some unused variables, reorders communicator creation, and correctly frees some variables to avoid some debugging output. Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Wesley Bland authored
Depending on how some uninitialized data was prepopulated, the ibsend check was periodically crashing out in the call to MPID_Request_is_pending_failure. Some simple sanity checking to make sure the input data wasn't NULL takes care of this. Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Kenneth Raffenetti authored
The previous code only detected a datatype mismatch when the message was copied out of the unexpected queue. Now it will throw an error in both cases. We also set the error in the status object to match the default ch3 behavior. This fixed an issue where the request would not be freed and cause extra debugging output at MPI_Finalize. Signed-off-by:
Antonio Pena Monferrer <apenya@mcs.anl.gov>
-
- 12 Nov, 2014 19 commits
-
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Pavan Balaji authored
The terminology "flow_control" was a bit of a misnomer since we do more than just enable/disable flow control based on whether messages are on the data or control portal. Signed-off-by:
Antonio Pena Monferrer <apenya@mcs.anl.gov>
-
Pavan Balaji authored
Signed-off-by:
Antonio Pena Monferrer <apenya@mcs.anl.gov>
-
Pavan Balaji authored
We now use a target structure for each target ID that we want to send data to. This allows us to separate out target-specific states and more cleanly manage operations to a single target. Signed-off-by:
Antonio Pena Monferrer <apenya@mcs.anl.gov>
-
Signed-off-by:
Antonio Pena Monferrer <apenya@mcs.anl.gov>
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Full redesign, mainly of the functions in ptl_nm.c and the communications involving the "control" portal. Still some problems with flow control. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
Min Si authored
Timeouts were reported on some overloaded machines with the 10-minute time limit. Signed-off-by:
Xin Zhao <xinzhao3@illinois.edu>
-
Huiwei Lu authored
Free the group and communicator created in the test so it does not complain when memory debug is on. Signed-off-by:
Wesley Bland <wbland@anl.gov>
-
Huiwei Lu authored
Fixes #1945 Signed-off-by:
Wesley Bland <wbland@anl.gov>
-
Huiwei Lu authored
Similar to d086ac27, check the state of a VC to see if it is valid before creating a group, request or communicator in MPID_Recv. Signed-off-by:
Wesley Bland <wbland@anl.gov>
-
Huiwei Lu authored
MPID_Send should first check the state of a VC to see if it is valid before creating a group, request or communicator. In the fault-tolerance case, if the VC has already been revoked or marked as terminated (e.g., in test/mpi/ft/senddead), the send operation involved should exit without creating any request, group or communicator objects. Signed-off-by:
Wesley Bland <wbland@anl.gov>
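The check-before-allocate ordering described above can be pictured with a minimal sketch. Every type, constant, and function name here is a hypothetical stand-in chosen for illustration; none of them is the actual MPICH VC, request, or MPID_Send code.

```c
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the MPICH VC or request types. */
typedef enum {
    SKETCH_VC_ACTIVE,
    SKETCH_VC_REVOKED,
    SKETCH_VC_TERMINATED
} sketch_vc_state_t;

typedef struct { sketch_vc_state_t state; } sketch_vc_t;
typedef struct { int dest; } sketch_request_t;

/* Hypothetical return codes for the sketch. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_PROC_FAILED = 1, SKETCH_ERR_NOMEM = 2 };

/* Validate the connection first, so a revoked or terminated connection
 * never causes an object to be allocated (and later leaked). */
static int sketch_send(const sketch_vc_t *vc, int dest, sketch_request_t **req_out)
{
    *req_out = NULL;
    if (vc->state != SKETCH_VC_ACTIVE)
        return SKETCH_ERR_PROC_FAILED;

    /* Only a valid connection reaches the allocation. */
    sketch_request_t *req = malloc(sizeof(*req));
    if (req == NULL)
        return SKETCH_ERR_NOMEM;
    req->dest = dest;
    *req_out = req;
    return SKETCH_SUCCESS;
}
```

With this ordering, the early-exit path has nothing to free, which is the property that keeps a memory-debug build quiet in the failure case.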
-
Wesley Bland authored
The collective FT tests now pass with debug output turned off. See #1945 Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Wesley Bland authored
The MPI collectives get and set the errflag used by the collective helper functions (MPIC_*). The possible values of the errflag changed, so the collective functions need to appropriately set this value using either MPIR_ERR_NONE (MPI_SUCCESS), MPIR_ERR_PROC_FAILED (MPIX_ERR_PROC_FAILED), or MPIR_ERR_OTHER (MPI_ERR_OTHER). This should allow collectives to correctly report process failures when they occur now, fixing the FT tests that use collectives (see #1945). Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Wesley Bland authored
The errflag value being used in the MPIC helper functions only propagated whether or not an error occurred. It did not contain any information about what kind of error occurred, which made returning the correct error code after a process failure impossible. This patch converts the binary value to an enum with three options: MPIR_ERR_NONE, MPIR_ERR_PROC_FAILED, and MPIR_ERR_OTHER. The original values FALSE and TRUE map to MPIR_ERR_NONE and MPIR_ERR_OTHER, respectively. MPIR_ERR_PROC_FAILED indicates that the error occurred because of a process failure. It uses the new bit set aside from the tag space to track such information between processes. This change required modifying lots of function signatures and type declarations to use the new enum type, but these are actually not very intrusive changes and shouldn't be a problem going forward. Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
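The boolean-to-enum conversion can be sketched as follows. The three constant names come from the commit message; the type name `errflag_sketch_t`, the numeric values, and both helper functions are assumptions made for illustration and are not the actual MPICH definitions.

```c
#include <stdbool.h>

/* Illustrative stand-in for the errflag enum. The constant names follow the
 * commit message; the type name and numeric values are assumptions. */
typedef enum {
    MPIR_ERR_NONE = 0,        /* no error seen so far */
    MPIR_ERR_PROC_FAILED = 1, /* error caused by a process failure */
    MPIR_ERR_OTHER = 2        /* any other error */
} errflag_sketch_t;

/* Hypothetical helper: map the old boolean flag onto the enum. */
static errflag_sketch_t errflag_from_bool(bool error_occurred)
{
    return error_occurred ? MPIR_ERR_OTHER : MPIR_ERR_NONE;
}

/* Hypothetical helper: fold a newly observed error into the running flag.
 * Assumed policy: a process failure, once seen, is never downgraded, so it
 * can still be reported after later, less specific errors. */
static errflag_sketch_t errflag_merge(errflag_sketch_t current,
                                      errflag_sketch_t incoming)
{
    if (current == MPIR_ERR_PROC_FAILED || incoming == MPIR_ERR_PROC_FAILED)
        return MPIR_ERR_PROC_FAILED;
    if (current == MPIR_ERR_OTHER || incoming == MPIR_ERR_OTHER)
        return MPIR_ERR_OTHER;
    return MPIR_ERR_NONE;
}
```

The point of the third value is that a caller can distinguish "something went wrong" from "a peer died" and choose the matching error code.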
-
Wesley Bland authored
We need to take another bit from the tag space to specify the difference between a generic failure and a process failure. This patch modifies the macros to handle this situation. Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
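One way to picture reserving a second tag bit is the sketch below. All macro names and bit positions are hypothetical; the actual MPICH macros and the bits they use may differ. It assumes one bit already marks a generic error and the newly taken bit marks a process failure.

```c
/* All names and bit positions below are hypothetical; they only illustrate
 * reserving two bits of an integer tag for error reporting. */
#define SKETCH_ERROR_BIT       (1 << 30) /* set whenever an error is propagated */
#define SKETCH_PROC_FAILED_BIT (1 << 29) /* additionally set for a process failure */
#define SKETCH_TAG_MASK        (~(SKETCH_ERROR_BIT | SKETCH_PROC_FAILED_BIT))

/* Mark a tag as carrying a generic error or a process failure. */
#define SKETCH_TAG_SET_ERROR(tag)       ((tag) | SKETCH_ERROR_BIT)
#define SKETCH_TAG_SET_PROC_FAILED(tag) ((tag) | SKETCH_ERROR_BIT | SKETCH_PROC_FAILED_BIT)

/* Query the reserved bits on a received tag. */
#define SKETCH_TAG_HAS_ERROR(tag)       (((tag) & SKETCH_ERROR_BIT) != 0)
#define SKETCH_TAG_IS_PROC_FAILED(tag)  (((tag) & SKETCH_PROC_FAILED_BIT) != 0)

/* Strip the reserved bits to recover the original tag value. */
#define SKETCH_TAG_CLEAR(tag)           ((tag) & SKETCH_TAG_MASK)
```

The cost of this design is that each reserved bit halves the range of tag values left for ordinary use.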
-
Antonio Pena Monferrer authored
These are meant to hit the >1GB message size and hence test the large message case in Portals4. Signed-off-by:
Wesley Bland <wbland@anl.gov>
-
Antonio Pena Monferrer authored
Signed-off-by:
Wesley Bland <wbland@anl.gov>
-
Kenneth Raffenetti authored
All MPI_Sends in the Portals4 netmod will cause some or all of the data to be sent eagerly to the receiver. Canceling a send means having to find the data in the unexpected message queue and remove it in order to preserve matching. Because the message queues exist at the netmod level, it needs its own cancel protocol. The protocol is modeled on a similar case in CH3, but with its own method for searching the unexpected queue. Custom netmod packet handlers are used to receive and process the control messages. Known Issue: Because we are using different PTs for the send and cancel message, the cancel request could arrive before the message being canceled. Signed-off-by:
Antonio Pena Monferrer <apenya@mcs.anl.gov>
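The receiver-side step of such a cancel protocol, finding the message in the unexpected queue and unlinking it, might look like the sketch below. The queue layout, the `(sender, tag)` match key, and all names are assumptions for illustration; the real netmod queue and its matching criteria are not shown in the commit message.

```c
#include <stddef.h>

/* Hypothetical unexpected-queue entry; not the Portals4 netmod structure. */
typedef struct sketch_msg {
    int sender;
    int tag;
    struct sketch_msg *next;
} sketch_msg_t;

/* Unlink and return the first queued message matching (sender, tag).
 * Returns NULL when no such message is queued, for example because it was
 * already matched by a receive, in which case the cancel cannot succeed. */
static sketch_msg_t *sketch_cancel_unexpected(sketch_msg_t **head,
                                              int sender, int tag)
{
    /* Walk the links rather than the nodes so removal needs no special
     * case for the head of the list. */
    for (sketch_msg_t **link = head; *link != NULL; link = &(*link)->next) {
        sketch_msg_t *msg = *link;
        if (msg->sender == sender && msg->tag == tag) {
            *link = msg->next;
            msg->next = NULL;
            return msg;
        }
    }
    return NULL;
}
```

A NULL result is also what this sketch would return in the known-issue case, where the cancel request arrives before the message it targets.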
-
- 11 Nov, 2014 9 commits
-
-
Min Si authored
We should never change the ADI, which is exposed to the MPI layer, for the sake of the CH3 internal implementation. However, commit 3e005f03 changed the ADI of put/get/accumulate/get_accumulate in order to reuse the routines of normal RMA operations in request-based operations. This patch defines new CH3 internal functions for put/get/accumulate/get_accumulate to be reused by both normal and request-based operations and reverts the ADI change in commit 3e005f03. Signed-off-by:
Xin Zhao <xinzhao3@illinois.edu> Signed-off-by:
Junchao Zhang <jczhang@mcs.anl.gov>
-
Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
We already moved all functions from src/mpid/ch3/src/ch3u_rma_acc_ops.c to src/mpid/ch3/src/ch3u_rma_ops.c and deleted the former from Makefile.mk; here we just delete the file itself. Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
We already use window states to specify the current state in an RMA epoch; therefore, the epoch states are no longer used. Here we delete those states. Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
For the lock type, we only need one internal value to specify cases when there is currently no passive lock issued from the origin side or no passive lock imposed on the target side. If there are passive locks, we directly use MPI_LOCK_SHARED and MPI_LOCK_EXCLUSIVE to indicate the lock type. This patch deletes the redundant enum for lock types and just defines MPID_LOCK_NONE. Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
It is helpful for us to find variables that are not initialized or wrongly initialized. Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
MPIDI_RMA_NONE is the initial value of window state and should not be used with sync flag. The initial value of sync flag should be set to MPIDI_RMA_SYNC_NONE. Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
Instead of overriding malloc functions, set some hook functions only when using netmod-IB.
-