- 13 Nov, 2014 24 commits
-
-
Xin Zhao authored
Here we wrap the common actions taken when an RMA operation finishes on the target into a single function to make the code structure cleaner. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
Originally do_accumulate_op() accepted only a request pointer as its argument, which was too restrictive for it to be reused. Here we modify it to accept a buffer address, count, datatype and op, so that it can be reused in more general cases. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
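The interface change described above can be illustrated with a minimal sketch. It is not the MPICH implementation: the name sk_accumulate, the two-value datatype and op enums, and the argument order are all assumptions made for illustration. The point is only that a routine taking the buffer, count, datatype and op directly can be called from any code path, not just one that holds a request object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for MPI datatypes and reduction ops. */
typedef enum { SK_INT, SK_DOUBLE } sk_dtype_t;
typedef enum { SK_SUM, SK_MAX } sk_op_t;

/* Combine count elements of src into dst using op.
 * The caller passes the buffers and type information explicitly,
 * so no request object is needed. */
void sk_accumulate(const void *src, void *dst, size_t count,
                   sk_dtype_t dtype, sk_op_t op)
{
    size_t i;
    assert(src != NULL && dst != NULL);
    if (dtype == SK_INT) {
        const int *s = src;
        int *d = dst;
        for (i = 0; i < count; i++)
            d[i] = (op == SK_SUM) ? d[i] + s[i] : (s[i] > d[i] ? s[i] : d[i]);
    } else {
        const double *s = src;
        double *d = dst;
        for (i = 0; i < count; i++)
            d[i] = (op == SK_SUM) ? d[i] + s[i] : (s[i] > d[i] ? s[i] : d[i]);
    }
}
```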
-
Xin Zhao authored
Here we initialize the packet flags to FLAG_NONE when creating a packet, and add flags later when needed. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
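A minimal sketch of this pattern follows. The packet type, the flag names and the helper functions are hypothetical and are not the actual MPICH definitions; only the idea of starting from an empty flag set and OR-ing flags in later comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits; the real MPICH names and values may differ. */
typedef enum {
    SK_FLAG_NONE   = 0,
    SK_FLAG_LOCK   = 1 << 0,
    SK_FLAG_UNLOCK = 1 << 1,
    SK_FLAG_FLUSH  = 1 << 2
} sk_flag_t;

/* Hypothetical packet header. */
typedef struct {
    int target_rank;
    unsigned flags;
} sk_pkt_t;

/* Create the packet with no flags set. */
void sk_pkt_init(sk_pkt_t *pkt, int target_rank)
{
    assert(pkt != NULL);
    pkt->target_rank = target_rank;
    pkt->flags = SK_FLAG_NONE;
}

/* Add a flag later, once the caller knows it is needed. */
void sk_pkt_add_flag(sk_pkt_t *pkt, sk_flag_t flag)
{
    assert(pkt != NULL);
    pkt->flags |= (unsigned) flag;
}
```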
-
Xin Zhao authored
When the operation pending list and the request lists are all empty, the origin needs to send a FLUSH message only if it has issued PUT/ACC operations since the last synchronization call; otherwise the origin does not need to issue FLUSH at all and does not need to wait for a FLUSH ACK message. Similarly, the origin waits for the ACK of an UNLOCK message only if it has issued PUT/ACC operations since the last synchronization call. However, the UNLOCK message always needs to be sent because the origin needs to unlock the target process. This patch avoids issuing unnecessary FLUSH / FLUSH ACK / UNLOCK ACK messages. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
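The decision rule in this message can be written down as a small sketch. It models only the case the commit describes, where the pending list and request lists are already empty. The struct and function names are hypothetical and do not correspond to MPICH data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical origin-side state for one target. */
typedef struct {
    bool issued_put_acc;   /* PUT/ACC issued since the last synchronization call */
} sk_origin_t;

/* What the origin has to do for one synchronization message. */
typedef struct {
    bool send_msg;
    bool wait_ack;
} sk_action_t;

/* FLUSH: send and wait only if PUT/ACC operations were issued. */
sk_action_t sk_plan_flush(const sk_origin_t *o)
{
    sk_action_t a;
    assert(o != NULL);
    a.send_msg = o->issued_put_acc;
    a.wait_ack = o->issued_put_acc;
    return a;
}

/* UNLOCK: always send, but wait for the ACK only if PUT/ACC were issued. */
sk_action_t sk_plan_unlock(const sk_origin_t *o)
{
    sk_action_t a;
    assert(o != NULL);
    a.send_msg = true;
    a.wait_ack = o->issued_put_acc;
    return a;
}
```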
-
Junchao Zhang authored
It makes testlist.in files more flexible and easier to read. Signed-off-by:
Sangmin Seo <sseo@anl.gov>
-
Junchao Zhang authored
No review since F08 binding is experimental now.
-
Junchao Zhang authored
No review since F08 binding is experimental now.
-
Junchao Zhang authored
Without doing so, the script wrongly thinks #ifdef etc. are part of a subroutine's prototype line. No review since F08 binding is experimental now.
-
Junchao Zhang authored
No review since F08 binding is experimental now.
-
Sangmin Seo authored
Stack variables should not have been used as the sendbuf for MPI_Iallgather because we do not wait for the completion of MPI_Iallgather in the caller functions. This fix moves them into the struct used for keeping track of operation state and uses the variables in that struct for MPI_Iallgather. Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
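The lifetime problem behind this fix can be shown without an MPI library. In the sketch below, sk_istart and sk_wait are hypothetical stand-ins for a nonblocking call such as MPI_Iallgather and its completion: the buffer address is recorded at start time but may be read only at completion. Storing the send value in a heap-allocated state struct keeps it valid after the starting function returns. All names are assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a nonblocking request: remembers the buffer address. */
typedef struct {
    const int *sendbuf;
    int result;
} sk_request_t;

/* Start: only records the address; the buffer is not read yet. */
void sk_istart(const int *sendbuf, sk_request_t *req)
{
    req->sendbuf = sendbuf;
    req->result = 0;
}

/* Complete: the buffer is read here, possibly long after the start. */
void sk_wait(sk_request_t *req)
{
    req->result = *req->sendbuf;
}

/* Hypothetical operation state; it owns the send value. */
typedef struct {
    int send_value;      /* lives as long as the state, not a stack frame */
    sk_request_t req;
} sk_op_state_t;

/* Returns before the operation completes, so a local variable
 * must not be passed as the send buffer. */
sk_op_state_t *sk_start_op(int value)
{
    sk_op_state_t *st = malloc(sizeof *st);
    if (st == NULL)
        return NULL;
    st->send_value = value;
    sk_istart(&st->send_value, &st->req);
    return st;
}

/* Completes the operation, releases the state, returns the result. */
int sk_finish_op(sk_op_state_t *st)
{
    int result;
    assert(st != NULL);
    sk_wait(&st->req);
    result = st->req.result;
    free(st);
    return result;
}
```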
-
Kenneth Raffenetti authored
Helps clarity since we no longer use ACKs in the netmod code. Signed-off-by:
Antonio Pena Monferrer <apenya@mcs.anl.gov>
-
Antonio Pena Monferrer authored
The rportals layer is taking care of retransmissions, so we should only be interested in delivery events in the netmod layer. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
Kenneth Raffenetti authored
No reviewer
-
Min Si authored
Nightly testing reported timeouts on octopus with a 12-minute time limit. Each of them took 12:05 ~ 12:10 mins.
-
Wesley Bland authored
This tests the behavior after a failure when using revoke+shrink. Right now this test still fails, so it is marked as xfail. See #2198. No reviewer
-
Antonio Pena Monferrer authored
Those were introduced for a robust protocol during development. No longer needed. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
-
Pavan Balaji authored
The user pointer was set, but later overwritten with an internal value. Signed-off-by:
Antonio Pena Monferrer <apenya@mcs.anl.gov>
-
Ported test programs using collective I/O in the ROMIO test directory to the nonblocking collective I/O version. They were temporarily added to the MPICH test directory to run with Jenkins and nightly tests. However, they may need to be moved to the ROMIO test directory later. Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
Added nonblocking version of bigtype, hindexed_io, rdwrord, and setviewcur for testing nonblocking collective I/O functions. Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
This patch implements four functions for nonblocking collective I/O, which will be added to the MPI 3.1 standard. Details for these functions can be found in the MPI-Forum ticket, https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/273 . Currently, they are implemented as MPIX functions. Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
Wesley Bland authored
Some of the FT tests were not correctly setting their error handlers to MPI_ERRORS_RETURN. While this doesn't seem to have caused problems, it's safer to do so. This commit also cleans up some unused variables, reorders communicator creation, and correctly frees some variables to avoid some debugging output. Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Wesley Bland authored
Depending on how some uninitialized data was prepopulated, the ibsend check was periodically crashing out in the call to MPID_Request_is_pending_failure. Some simple sanity checking to make sure the input data wasn't NULL takes care of this. Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Kenneth Raffenetti authored
The previous code only detected a datatype mismatch when the message was copied out of the unexpected queue. Now it will throw an error in both cases. We also set the error in the status object to match the default ch3 behavior. This fixed an issue where the request would not be freed and cause extra debugging output at MPI_Finalize. Signed-off-by:
Antonio Pena Monferrer <apenya@mcs.anl.gov>
-
- 12 Nov, 2014 16 commits
-
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Pavan Balaji authored
The terminology "flow_control" was a bit of a misnomer since we do more than just enable/disable flow control based on whether messages are on the data or control portal. Signed-off-by:
Antonio Pena Monferrer <apenya@mcs.anl.gov>
-
Pavan Balaji authored
Signed-off-by:
Antonio Pena Monferrer <apenya@mcs.anl.gov>
-
Pavan Balaji authored
We now use a target structure for each target ID that we want to send data to. This allows us to separate out target-specific states and more cleanly manage operations to a single target. Signed-off-by:
Antonio Pena Monferrer <apenya@mcs.anl.gov>
-
Signed-off-by:
Antonio Pena Monferrer <apenya@mcs.anl.gov>
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Full redesign, mainly of the functions in ptl_nm.c and the communications involving the "control" portal. Still some problems with flow control. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
Min Si authored
Timeouts are reported on some overloaded machines with a 10-minute time limit. Signed-off-by:
Xin Zhao <xinzhao3@illinois.edu>
-
Huiwei Lu authored
Free the group and communicator created in the test so it does not complain when memory debug is on. Signed-off-by:
Wesley Bland <wbland@anl.gov>
-
Huiwei Lu authored
Fixes #1945 Signed-off-by:
Wesley Bland <wbland@anl.gov>
-
Huiwei Lu authored
Similar to d086ac27, check the state of a VC to see if it is valid before creating a group, request or communicator in MPID_Recv. Signed-off-by:
Wesley Bland <wbland@anl.gov>
-
Huiwei Lu authored
MPID_Send should first check the state of a VC to see if it is valid before creating a group, request or communicator. In the fault tolerance case, if the VC has already been revoked or marked as terminated (e.g., in test/mpi/ft/senddead), the send operation involved should exit without creating any request, group or communicator objects. Signed-off-by:
Wesley Bland <wbland@anl.gov>
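The ordering this commit establishes, validate first and allocate second, can be sketched as follows. The VC states, return codes and the allocation counter are hypothetical placeholders and do not reflect the real MPICH types. The counter only stands in for creating a request object so that the early exit is observable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical connection states and return codes. */
typedef enum { SK_VC_ACTIVE, SK_VC_REVOKED, SK_VC_TERMINATED } sk_vc_state_t;
typedef struct { sk_vc_state_t state; } sk_vc_t;
enum { SK_SEND_OK = 0, SK_SEND_FAILED = 1 };

/* Counts how many request objects the sketch has "created". */
int sk_requests_created = 0;

int sk_send(const sk_vc_t *vc)
{
    assert(vc != NULL);
    /* Check the VC before allocating anything, so a dead
     * connection leaves no objects behind to clean up. */
    if (vc->state != SK_VC_ACTIVE)
        return SK_SEND_FAILED;
    sk_requests_created++;   /* stands in for creating the request */
    return SK_SEND_OK;
}
```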
-
Wesley Bland authored
The collective FT tests now pass with debug output turned off. See #1945 Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Wesley Bland authored
The MPI collectives get and set the errflag used by the collective helper functions (MPIC_*). The possible values of the errflag changed, so the collective functions need to appropriately set this value using either MPIR_ERR_NONE (MPI_SUCCESS), MPIR_ERR_PROC_FAILED (MPIX_ERR_PROC_FAILED), or MPIR_ERR_OTHER (MPI_ERR_OTHER). This should allow collectives to correctly report process failures when they occur now, fixing the FT tests that use collectives (see #1945). Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Wesley Bland authored
The errflag value used in the MPIC helper functions only propagated whether or not an error occurred. It did not contain any information about what kind of error occurred, which made returning the correct error code after a process failure impossible. This patch converts the binary value to an enum with three options: MPIR_ERR_NONE, MPIR_ERR_PROC_FAILED, and MPIR_ERR_OTHER. The original FALSE and TRUE values map to MPIR_ERR_NONE and MPIR_ERR_OTHER, respectively. MPIR_ERR_PROC_FAILED indicates that the error occurred because of a process failure. It uses the new bit set aside from the tag space to track such information between processes. This change required modifying many function signatures and type declarations to use the new enum type, but these changes are not very intrusive and should not be a problem going forward. Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
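A minimal sketch of the three-valued errflag follows. The enumerator names are taken from the commit messages; the type name, the numeric values, the helper functions and the SK_* return codes are assumptions. The SK_* codes are placeholders for MPI_SUCCESS, MPIX_ERR_PROC_FAILED and MPI_ERR_OTHER so that the sketch compiles without mpi.h.

```c
#include <assert.h>

/* Enumerator names follow the commit message; values are assumed. */
typedef enum {
    MPIR_ERR_NONE = 0,
    MPIR_ERR_PROC_FAILED,
    MPIR_ERR_OTHER
} sk_errflag_t;

/* Placeholder error codes standing in for the real MPI constants. */
enum { SK_SUCCESS = 0, SK_ERR_PROC_FAILED = 1, SK_ERR_OTHER = 2 };

/* Map the old boolean errflag onto the new enum:
 * no error becomes NONE, any error becomes OTHER. */
sk_errflag_t sk_errflag_from_bool(int had_error)
{
    return had_error ? MPIR_ERR_OTHER : MPIR_ERR_NONE;
}

/* Translate the errflag into the code a collective would return. */
int sk_errflag_to_code(sk_errflag_t flag)
{
    switch (flag) {
    case MPIR_ERR_NONE:        return SK_SUCCESS;
    case MPIR_ERR_PROC_FAILED: return SK_ERR_PROC_FAILED;
    case MPIR_ERR_OTHER:       return SK_ERR_OTHER;
    }
    assert(0 && "invalid errflag");
    return SK_ERR_OTHER;
}
```

With only a boolean, the second function could not distinguish a process failure from any other error, which is the limitation the commit removes.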
-
Wesley Bland authored
We need to take another bit from the tag space to specify the difference between a generic failure and a process failure. This patch modifies the macros to handle this situation. Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
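Reserving bits of the tag for error information can be sketched with a few macros. The bit positions, the macro names and the choice to set the error bit together with the process-failure bit are assumptions for illustration; the actual MPICH macros and layout may differ.

```c
#include <assert.h>

/* Hypothetical layout: two high bits of a non-negative int tag. */
#define SK_TAG_ERROR_BIT        (1 << 30)
#define SK_TAG_PROC_FAILURE_BIT (1 << 29)
#define SK_TAG_USER_MASK        (~(SK_TAG_ERROR_BIT | SK_TAG_PROC_FAILURE_BIT))

/* Query and strip the reserved bits. */
#define SK_TAG_HAS_ERROR(tag)        (((tag) & SK_TAG_ERROR_BIT) != 0)
#define SK_TAG_HAS_PROC_FAILURE(tag) (((tag) & SK_TAG_PROC_FAILURE_BIT) != 0)
#define SK_TAG_USER(tag)             ((tag) & SK_TAG_USER_MASK)

/* Mark a tag as carrying an error; optionally flag it as a
 * process failure rather than a generic failure. */
int sk_tag_mark_error(int tag, int proc_failed)
{
    /* The user tag must fit below the reserved bits. */
    assert(tag >= 0 && tag < SK_TAG_PROC_FAILURE_BIT);
    tag |= SK_TAG_ERROR_BIT;
    if (proc_failed)
        tag |= SK_TAG_PROC_FAILURE_BIT;
    return tag;
}
```

Each reserved bit halves the range left for user tags, which is why the commit describes this as taking another bit from the tag space.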
-