- 28 Nov, 2014 2 commits
-
-
Pavan Balaji authored
Also remove the function prototype declaration since it is not used out-of-order. Signed-off-by:
Wesley Bland <wbland@anl.gov>
-
Pavan Balaji authored
Signed-off-by:
Wesley Bland <wbland@anl.gov>
-
- 26 Nov, 2014 7 commits
-
-
Wesley Bland authored
This test was left out of the testlist for some reason. No reviewer
-
Wesley Bland authored
The function that converts the group of failed processes to a bitarray was incorrectly quitting early if one of the globally known failed processes was not in the communicator being handled. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
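A minimal sketch of the corrected loop, assuming a hypothetical representation in which the communicator is a plain array of world ranks and the bitarray is LSB-first. None of these names come from MPICH. A failed process that is not a member is skipped with `continue` rather than ending the conversion early.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: communicator-local rank of a world rank, or -1 if
 * that process is not a member of the communicator. */
static int world_to_comm_rank(const int *comm_members, int comm_size, int world_rank)
{
    for (int i = 0; i < comm_size; i++)
        if (comm_members[i] == world_rank)
            return i;
    return -1;
}

/* Set bit r (LSB-first within each byte) for every failed process that is a
 * member of the communicator. `bits` must hold (comm_size + 7) / 8 bytes. */
static void failed_to_bitarray(const int *failed_world, int nfailed,
                               const int *comm_members, int comm_size,
                               uint8_t *bits)
{
    memset(bits, 0, (size_t)(comm_size + 7) / 8);
    for (int f = 0; f < nfailed; f++) {
        int r = world_to_comm_rank(comm_members, comm_size, failed_world[f]);
        if (r < 0)
            continue;   /* not in this communicator: skip it, don't stop early */
        bits[r / 8] |= (uint8_t)(1u << (r % 8));
    }
}
```

With members {10, 11, 12, 13} and failed world ranks {5, 12, 13}, rank 5 is skipped and local ranks 2 and 3 are still marked.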
-
Wesley Bland authored
Since pamid doesn't include any of the fault tolerance functions, it should never report that a message is pending failure. We also can't call abort here since the function is used at the MPI layer. Signed-off-by:
Paul Coffman <pkcoff@us.ibm.com>
-
Kenneth Raffenetti authored
MPICH now behaves correctly for this test. There is no reason for it to output " No errors", since the only thing we are testing is that it does not time out. We also use a non-zero error code in MPI_Abort to meet the requirements of the test runner. Closes #1537 Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Kenneth Raffenetti authored
If a fatal error occurs, pass the MPI error code to MPID_Abort. To ensure a non-zero exit status with dynamic error codes, we set the first available dynamic error class to 1. Refs #1537 Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Kenneth Raffenetti authored
Implement abort in the Hydra PMI server and modify simple PMI to send an abort command. Previously, we just exited the calling process and relied on the process manager to detect it and clean up the rest of the job. Refs #1537 Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Kenneth Raffenetti authored
We simply use PMI_Abort in both the sock and nemesis code. Remove extra functions and constants that are not useful. Refs #1537 Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
- 24 Nov, 2014 3 commits
-
-
ROMIO GPFSMPIO_P2PCONTIG threaded read needs to toggle the first read buffer. When both the GPFSMPIO_P2PCONTIG and GPFSMPIO_PTHREADIO optimizations were used, there was a correctness bug in reading: in the first round, the read buffer was not toggled to the two-phase buffer for the pthread reader, so the data was disseminated from the wrong buffer. The fix is to do the toggle after the first read. Signed-off-by:
Paul Coffman <pkcoff@us.ibm.com> Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
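A hypothetical sequential model of the two-buffer (ping-pong) read pipeline, to show why the toggle after the first read matters. In ROMIO the read of round r+1 runs in a pthread while round r is disseminated. Here both happen in order, and `read_round` is a stand-in for the file read. Without the `cur ^= 1` after the first read, round 1 would overwrite round 0's buffer and round 0 would be disseminated from the wrong buffer.

```c
#include <string.h>

#define BUFLEN 4

/* Stand-in for one round's file read: fill buf with round-specific data. */
static void read_round(int round, int *buf)
{
    for (int i = 0; i < BUFLEN; i++)
        buf[i] = round * 100 + i;
}

/* Read `nrounds` rounds with double buffering, copying each round's data
 * into out[r] as the "dissemination" step. */
static void pipelined_read(int nrounds, int out[][BUFLEN])
{
    int bufs[2][BUFLEN];
    int cur = 0;                    /* buffer the next read goes into */

    read_round(0, bufs[cur]);       /* first read is synchronous */
    cur ^= 1;                       /* the fix: toggle after the first read */

    for (int r = 0; r < nrounds; r++) {
        if (r + 1 < nrounds)
            read_round(r + 1, bufs[cur]);           /* "background" read */
        memcpy(out[r], bufs[cur ^ 1], sizeof bufs[0]); /* disseminate round r */
        cur ^= 1;
    }
}
```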
-
Xin Zhao authored
It is possible for the request handler of an RMA request to be called a second time from within the first invocation of that handler on the same request. Consider the following case: a req is queued in the Nemesis SHM queue with a ref count of 2: one for request completion and another for dequeueing from the SHM queue. The first invocation of the handler completes the request and decrements the ref count to 1, but the request is still in the queue. However, within this handler, we trigger the same handler on the same request again (for example, by making progress on the SHM queue), and the second invocation also tries to complete the request, which leads to incorrect execution. In this patch we check whether the request has already been completed when entering the handler, to prevent processing the same request twice. We also move the call to finish_op_on_target() (where the same handler can be triggered again) after the request completion routine, so that the current request is marked as completed before the handler is entered a second time. Fix #2204 Signed-off-by:
Pavan Balaji <balaji@anl.gov>
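The re-entrancy guard described in this commit can be sketched as follows, using a simplified, hypothetical request struct (not MPICH's request object) and a stand-in for SHM queue progress that re-invokes the handler once. Completing the request before doing work that can re-enter the handler, and checking the completed flag on entry, keeps completion to exactly one run.

```c
#include <stdbool.h>

/* Hypothetical, simplified request; not MPICH's real request type. */
struct req {
    int ref_count;     /* e.g. one ref for completion, one for the SHM queue */
    bool completed;
    int completions;   /* how many times completion ran (for illustration) */
};

static void req_handler(struct req *r, int depth);

/* Stand-in for SHM queue progress, which may re-invoke the same handler
 * on a request that is still queued. */
static void progress_shm_queue(struct req *r, int depth)
{
    if (depth == 0)
        req_handler(r, depth + 1);
}

static void req_handler(struct req *r, int depth)
{
    if (r->completed)          /* guard: nested call must not complete again */
        return;
    r->completed = true;       /* complete first ... */
    r->completions++;
    r->ref_count--;
    progress_shm_queue(r, depth); /* ... then do work that can re-enter */
}
```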
-
Update the use of DOCTEXT to match the rest of MPICH, including adding -nolocation (which drops the location of the source file from the documentation) and ensuring that the mpi.cit file contains the I/O routines as well as the others (this file can be used to add links to the man pages in other documents). Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
- 23 Nov, 2014 2 commits
-
-
Min Si authored
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Min Si authored
Three datatype test levels are defined: basic, min, and full (default: full). The default level can be overridden at runtime by setting the environment variable MPITEST_DATATYPE_TEST_LEVEL. An MPI test can also specify a different level for each datatype loop by calling the corresponding datatype test initialization function before that loop; otherwise the default is used. Basic: MTestInitBasicDatatypes; Minimum: MTestInitMinDatatypes; Full: MTestInitFullDatatypes. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
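A minimal sketch of selecting the default level from MPITEST_DATATYPE_TEST_LEVEL. The accepted strings "basic", "min", and "full" are an assumption based on the level names above, and the enum and function names are hypothetical. Unset or unrecognized values fall back to full, the documented default.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical level enum; names follow the commit's basic/min/full. */
enum dtp_level { DTP_BASIC, DTP_MIN, DTP_FULL };

/* Map a level string to an enum; NULL or unknown strings mean "full". */
static enum dtp_level parse_datatype_level(const char *s)
{
    if (s == NULL)
        return DTP_FULL;
    if (strcmp(s, "basic") == 0)
        return DTP_BASIC;
    if (strcmp(s, "min") == 0)
        return DTP_MIN;
    return DTP_FULL;
}

/* Default level for datatype loops that do not pick one explicitly. */
enum dtp_level default_datatype_level(void)
{
    return parse_datatype_level(getenv("MPITEST_DATATYPE_TEST_LEVEL"));
}
```

A test that wants a specific level for one loop would instead call MTestInitBasicDatatypes, MTestInitMinDatatypes, or MTestInitFullDatatypes before that loop, as the commit describes.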
-
- 21 Nov, 2014 2 commits
-
-
* Implements a tag matching interface netmod over the OFIWG Scalable Fabric Interfaces (SFI)
-
At the end of MPIDI_Init_collsel_extension in the pami device init code (mpid_init.c), there is logic to disable the optimized collectives based on criteria that are invalid on BGQ but nonetheless always evaluated to true, disabling the optimized collectives on BGQ. Compiler directives were placed around the logic to skip this code on the BGQ platform. Signed-off-by:
Paul Coffman <pkcoff@us.ibm.com> Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
- 20 Nov, 2014 2 commits
-
-
Min Si authored
This program creates a window with a SHM window buffer and checks the correctness of RMA operations issued through that window. It generates two tests, with and without the alloc_shm info, in which operations are issued as SHM ops and as AM, respectively. Signed-off-by:
Xin Zhao <xinzhao3@illinois.edu>
-
Min Si authored
If the user does not explicitly set alloc_shm to TRUE in win_create, we should never detect SHM windows because of the expensive overhead. However, the current code does not check this info flag. This patch fixes that. Closes #2161 Signed-off-by:
Xin Zhao <xinzhao3@illinois.edu>
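On the user side, the flag would be set with MPI_Info_set(info, "alloc_shm", "true") before passing the info object to MPI_Win_create. The check itself can be sketched as a predicate over the info value. The function name is hypothetical, and treating only the exact string "true" as enabling is an assumption (the real parser may accept other spellings). An unset key (NULL) means SHM detection is skipped.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical: `alloc_shm_value` is the string stored under the
 * "alloc_shm" info key, or NULL if the user never set it. */
static bool should_detect_shm_window(const char *alloc_shm_value)
{
    /* Only pay for SHM window detection when explicitly requested. */
    return alloc_shm_value != NULL && strcmp(alloc_shm_value, "true") == 0;
}
```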
-
- 19 Nov, 2014 3 commits
-
-
Kenneth Raffenetti authored
Signed-off-by:
Antonio J. Pena <apenya@mcs.anl.gov>
-
Kenneth Raffenetti authored
Signed-off-by:
Antonio J. Pena <apenya@mcs.anl.gov>
-
Kenneth Raffenetti authored
PtlMEAppend can return a PTL_NO_SPACE error, meaning that too many operations are already outstanding. To avoid an abort, we simply retry after processing events that have queued up locally. Signed-off-by:
Antonio J. Pena <apenya@mcs.anl.gov>
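The retry pattern can be sketched against a hypothetical mock network interface, not the real Portals 4 API: `fake_me_append` fails with a no-space code when the outstanding limit is reached, and processing local events retires completed operations. The `max_retries` bound is added only so the sketch cannot spin forever; the commit text describes a plain retry.

```c
/* Hypothetical mock of an interface with a cap on outstanding operations. */
enum { SKETCH_OK = 0, SKETCH_NO_SPACE = 1 };

struct fake_ni {
    int outstanding;     /* operations currently active */
    int limit;           /* maximum allowed outstanding operations */
    int queued_events;   /* completions waiting to be processed locally */
};

static int fake_me_append(struct fake_ni *ni)
{
    if (ni->outstanding >= ni->limit)
        return SKETCH_NO_SPACE;
    ni->outstanding++;
    return SKETCH_OK;
}

/* Processing locally queued events retires completed operations. */
static void fake_process_local_events(struct fake_ni *ni)
{
    ni->outstanding -= ni->queued_events;
    ni->queued_events = 0;
}

/* Retry the append after draining local events instead of aborting. */
static int me_append_retry(struct fake_ni *ni, int max_retries, int *retries)
{
    int ret;
    *retries = 0;
    while ((ret = fake_me_append(ni)) == SKETCH_NO_SPACE && *retries < max_retries) {
        fake_process_local_events(ni);
        (*retries)++;
    }
    return ret;
}
```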
-
- 18 Nov, 2014 4 commits
-
-
Junchao Zhang authored
No reviewer
-
Junchao Zhang authored
No reviewer
-
Junchao Zhang authored
No reviewer
-
Junchao Zhang authored
It should be MPI_DATATYPE_NULL. MPI does not have MPI_TYPE_NULL. Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
- 17 Nov, 2014 1 commit
-
-
Kenneth Raffenetti authored
-
- 14 Nov, 2014 9 commits
-
-
Min Si authored
Some overloaded nightly test nodes take almost 20 minutes to run these tests. We increase their time limit for now so that other bugs reported by the nightly tests are easier to identify.
-
Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
The ABI string is set to 0:0:0 since it's a pre-release. No guarantees on ABI compatibility. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
Antonio Pena Monferrer authored
Converting the macro into a function fixes the issue because the function operates on its own copy of the pointer. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
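The commit does not show the macro involved, so this is a hypothetical illustration of the general effect. A macro that advances its pointer argument modifies the caller's variable, while a function receives a copy of the pointer and leaves the caller's variable untouched.

```c
/* Hypothetical macro: expands in the caller, so it advances the caller's p. */
#define NEXT_BYTE_MACRO(p) (*(p)++)

/* Function version: p is a local copy, so the caller's pointer is unchanged. */
static unsigned char next_byte_fn(const unsigned char *p)
{
    unsigned char b = *p;
    p++;            /* advances only this function's local copy */
    return b;
}
```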
-
Kenneth Raffenetti authored
No reviewer
-
Pavan Balaji authored
Now, when we pop an event, we queue up the buddy event (e.g., ACK for SEND) to return next. This way, we don't need to search for the event every time. Since we know that there will be at most one such pending event, we maintain a single event structure for this. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
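A simplified sketch of the single pending-event slot, with hypothetical event types and an array standing in for the underlying event stream. In this model the ACK for a SEND is synthesized into the slot when the SEND is popped, which is a simplification of how rptl actually pairs events. Because at most one buddy is ever pending, one struct plus a flag suffices and no search is needed.

```c
#include <stdbool.h>

/* Hypothetical event types; a SEND is paired with a buddy ACK. */
enum ev_type { EV_SEND, EV_ACK, EV_PUT };

struct event { enum ev_type type; int op_id; };

struct event_source {
    struct event pending;     /* at most one buddy event is ever pending */
    bool has_pending;
    const struct event *raw;  /* underlying event stream */
    int nraw, next_raw;
};

/* Return the pending buddy if there is one; otherwise pop the next raw
 * event and, for a SEND, stash its ACK to be returned next. */
static bool pop_event(struct event_source *s, struct event *out)
{
    if (s->has_pending) {
        *out = s->pending;
        s->has_pending = false;
        return true;
    }
    if (s->next_raw >= s->nraw)
        return false;
    *out = s->raw[s->next_raw++];
    if (out->type == EV_SEND) {
        s->pending.type = EV_ACK;
        s->pending.op_id = out->op_id;
        s->has_pending = true;
    }
    return true;
}
```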
-
Pavan Balaji authored
We were stashing events when the origin receives a NACK. This is unnecessary since we retransmit the op and never use those stashed events. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
Pavan Balaji authored
Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
Pavan Balaji authored
1. Move op management to a different file. 2. Make rptl_info an extern so it can be shared by multiple files. 3. Separate out the rptl initialization routines. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
- 13 Nov, 2014 5 commits
-
-
Xin Zhao authored
ReqHandler_GaccumLikeSendComplete is used for GACC-like operations, including GACC, CAS, and FOP. Here we split it into the following three functions: ReqHandler_GaccumSendComplete, ReqHandler_CASSendComplete, and ReqHandler_FOPSendComplete. This makes it easier to add different actions for each of these three kinds of operations in the future. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
Here we wrap the common actions performed when an RMA op finishes on the target into a single function to make the code structure cleaner. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
Originally, do_accumulate_op() accepted only a request pointer as its argument, which was too restrictive for it to be reused. Here we modify it to accept the buffer address, count, datatype, and op, so that it can be reused in more general cases. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
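A sketch of an accumulate routine with the generalized signature (buffers, count, datatype, op) instead of a request pointer. The two-element datatype and op enums are hypothetical reductions of MPI's much larger sets, and `do_accumulate` is not MPICH's actual function. Any caller holding plain buffers can use it, whether or not a request object exists.

```c
#include <stddef.h>

/* Hypothetical, reduced datatype and op sets. */
enum dtype { DT_INT, DT_DOUBLE };
enum acc_op { OP_SUM, OP_REPLACE };

/* Accumulate `count` elements of `src` into `dst`.
 * Returns 0 on success, -1 for an unsupported datatype. */
static int do_accumulate(const void *src, void *dst, size_t count,
                         enum dtype dt, enum acc_op op)
{
    for (size_t i = 0; i < count; i++) {
        switch (dt) {
        case DT_INT: {
            const int *s = src;
            int *d = dst;
            d[i] = (op == OP_SUM) ? d[i] + s[i] : s[i];
            break;
        }
        case DT_DOUBLE: {
            const double *s = src;
            double *d = dst;
            d[i] = (op == OP_SUM) ? d[i] + s[i] : s[i];
            break;
        }
        default:
            return -1;
        }
    }
    return 0;
}
```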
-
Xin Zhao authored
Here we initialize the packet flag to FLAG_NONE when creating the packet, and add flags later when needed. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
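A minimal sketch of the "start from FLAG_NONE, OR in flags later" pattern. Only FLAG_NONE comes from the commit; the other flag names, the packet struct, and the helper functions are hypothetical. Initializing at creation guarantees that no stale bits survive from whatever memory the packet occupied.

```c
/* Hypothetical packet flags as a bitmask; only FLAG_NONE is from the commit. */
enum pkt_flags {
    FLAG_NONE   = 0,
    FLAG_LOCK   = 1 << 0,
    FLAG_FLUSH  = 1 << 1,
    FLAG_UNLOCK = 1 << 2
};

struct pkt { int type; unsigned flags; };

/* Create the packet with a known, empty flag set. */
static void pkt_init(struct pkt *p, int type)
{
    p->type = type;
    p->flags = FLAG_NONE;
}

/* Add a flag later, only when it is needed. */
static void pkt_add_flag(struct pkt *p, unsigned flag)
{
    p->flags |= flag;
}
```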
-
Xin Zhao authored
When the operation pending list and request lists are all empty, the origin needs to send a FLUSH message only if it issued PUT/ACC operations since the last synchronization call; otherwise, the origin does not need to issue a FLUSH at all and does not need to wait for a FLUSH ACK message. Similarly, the origin waits for the ACK of the UNLOCK message only if it issued PUT/ACC operations since the last synchronization call. However, the UNLOCK message always needs to be sent, because the origin needs to unlock the target process. This patch avoids unnecessary FLUSH / FLUSH ACK / UNLOCK ACK messages. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
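The decision rules in this commit reduce to two small functions, sketched here with hypothetical names. Both assume the case the commit describes, where the operation pending list and request lists are already empty. The only input is whether PUT/ACC operations were issued since the last synchronization call.

```c
#include <stdbool.h>

/* What the origin should do for one synchronization message. */
struct sync_plan { bool send_msg; bool wait_ack; };

/* FLUSH: send it, and wait for its ACK, only after PUT/ACC operations. */
static struct sync_plan plan_flush(bool issued_put_or_acc)
{
    struct sync_plan p = { issued_put_or_acc, issued_put_or_acc };
    return p;
}

/* UNLOCK: always sent to release the target's lock, but its ACK is
 * awaited only after PUT/ACC operations. */
static struct sync_plan plan_unlock(bool issued_put_or_acc)
{
    struct sync_plan p = { true, issued_put_or_acc };
    return p;
}
```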
-