- 03 Sep, 2014 3 commits
-
-
Pavan Balaji authored
The original MPICH code added external library dependencies directly to LIBS. This forces configure to use these libraries, which requires the user to point to them through LD_LIBRARY_PATH (for shared libraries). This is unnecessary, since those libraries are only needed when building the executables and not the rest of MPICH.
Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
-
Min Si authored
Flush should guarantee that operations are complete on both the origin and the target side. However, in the MPI implementation, flush may return before the operations have completed on the target side. This causes an error in the following case: P0 and P1 allocate a shared window, and P2 locks both of them. P2 first issues a put to P0 and flushes, then gets the updated data from P1. The put may complete on P0 after the get on P1 has already completed.
Signed-off-by: Xin Zhao <xinzhao3@illinois.edu>
Signed-off-by: Antonio J. Pena <apenya@mcs.anl.gov>
-
Min Si authored
FLUSH should guarantee the completion of operations on both the origin and the target side. However, for exclusive locks, MPICH has an optimization that allows FLUSH to return without waiting for the target side to acknowledge remote completion. It relies on the fact that no other process accesses the window during an exclusive lock epoch. This optimization is not correct when two processes allocate windows on an overlapping SHM region. Suppose P0 and P1 (on the same node) allocate RMA windows using the same SHM region, and P2 (on a different node) locks both windows. P2 first issues a PUT and a FLUSH to P0, then issues a GET to P1 on the same memory location as the PUT. Since FLUSH does not guarantee remote completion of the PUT, the GET may not see the updated value. This patch disables the optimization for FLUSH and forces FLUSH to always wait for remote completion of operations.
Signed-off-by: Xin Zhao <xinzhao3@illinois.edu>
Signed-off-by: Antonio J. Pena <apenya@mcs.anl.gov>
-
- 02 Sep, 2014 6 commits
-
-
Rob Latham authored
No reviewer
-
Rob Latham authored
Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
-
Rob Latham authored
A flag like this would have warned about #2160 at compile time. See #2060.
Signed-off-by: Antonio J. Pena <apenya@mcs.anl.gov>
-
Do not reset the temp buffer. The upper layer (MPID_nem_send_iov) might allocate a temp buffer (MPIDI_CH3U_SRBuf_alloc) and set a flag; this buffer is freed when the request is freed. The removed assertion is not valid for RMA operations.
Signed-off-by: Devendar Bureddy <devendar@mellanox.com>
Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>
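The ownership rule above (the upper layer allocates a temporary buffer and sets a flag, and the buffer is released only when the request is freed) can be sketched as follows. All type and function names here are hypothetical stand-ins, not MPICH's actual request structure or the `MPIDI_CH3U_SRBuf_alloc` internals.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical request object; MPICH's real request structure differs. */
typedef struct {
    void  *tmpbuf;       /* temporary buffer, if one was allocated */
    size_t tmpbuf_sz;
    bool   owns_tmpbuf;  /* set by the layer that allocated tmpbuf */
} request_t;

/* Upper layer: allocate a temp buffer and record ownership in the request. */
static int request_alloc_tmpbuf(request_t *req, size_t sz)
{
    req->tmpbuf = malloc(sz);
    if (req->tmpbuf == NULL)
        return -1;
    req->tmpbuf_sz = sz;
    req->owns_tmpbuf = true;
    return 0;
}

/* Lower layer: uses the buffer but must NOT reset tmpbuf/owns_tmpbuf,
 * otherwise the buffer would leak when the request is freed. */
static void lower_layer_send(request_t *req, const void *data, size_t len)
{
    if (req->owns_tmpbuf && len <= req->tmpbuf_sz)
        memcpy(req->tmpbuf, data, len);  /* pack into the existing buffer */
}

/* The buffer is released only when the request itself is freed. */
static void request_free(request_t *req)
{
    if (req->owns_tmpbuf)
        free(req->tmpbuf);
    req->tmpbuf = NULL;
    req->tmpbuf_sz = 0;
    req->owns_tmpbuf = false;
}
```

Keeping the flag in the request means whichever layer frees the request can decide whether a buffer needs releasing, without the lower layer having to know who allocated it.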
-
Wesley Bland authored
All of the FT tests will stay xfail for a while until I can figure out what's causing all of the nasty debug output. No reviewer
-
Pavan Balaji authored
This release only includes source-code changes. No interfaces were added or removed. Note that the FT functions are added as MPIX_ functions, which are not included in the ABI string.
Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
-
- 01 Sep, 2014 1 commit
-
-
Pavan Balaji authored
-
- 29 Aug, 2014 3 commits
-
-
Kenneth Raffenetti authored
Fixes dependency issues in the testsuite. Some tests use functions that might be located in additional libraries (e.g. on Solaris). Use AC_SEARCH_LIBS to find and add them when necessary.
Signed-off-by: Pavan Balaji <balaji@anl.gov>
-
Kenneth Raffenetti authored
When a platform supports inter-library dependencies, remove MPICH dependencies from the compile wrappers. Specifying them can confuse the linker and cause a run-time "library not found" error for the resulting binary.
Signed-off-by: Pavan Balaji <balaji@anl.gov>
-
Wesley Bland authored
I accidentally introduced a compiler warning in `a184bd01`. This commit checks the value of mpi_errno after it is used, which silences the warning.
Signed-off-by: Antonio J. Pena <apenya@mcs.anl.gov>
-
- 28 Aug, 2014 2 commits
-
-
Marked a variable as unused, since it is only used in an assertion. This is less intrusive than enclosing the affected code in #ifdefs that check for the definition of the NDEBUG macro.
Signed-off-by: Pavan Balaji <balaji@anl.gov>
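The pattern described above looks roughly like this in C. It is a sketch assuming a GCC/Clang-style `__attribute__((unused))`; MPICH has its own portability macro for this, and the `UNUSED_VAR`, `increment`, and `do_work` names below are hypothetical.

```c
#include <assert.h>

/* GCC and Clang accept the `unused` attribute; other compilers get nothing.
 * UNUSED_VAR is a hypothetical name, not MPICH's actual macro. */
#if defined(__GNUC__)
#define UNUSED_VAR __attribute__((unused))
#else
#define UNUSED_VAR
#endif

static int counter;

static int increment(void)
{
    return ++counter;
}

int do_work(void)
{
    /* `rc` is read only by assert(). With NDEBUG defined the assert
     * expands to nothing, and without the attribute the compiler would
     * warn that `rc` is set but never used. */
    int rc UNUSED_VAR = increment();
    assert(rc > 0);
    return counter;
}
```

A `(void)rc;` cast after the assertion is another common way to silence the same warning without an #ifdef.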
-
When unlocking a lock variable that is accessed by HCAs, perform a CAS on it through the IB HCA instead of issuing a CPU store instruction. The CPU cannot safely unlock it, because CAS atomicity between a PCI device and the CPU is not supported on the combination of Mellanox ConnectX-3 and Intel Ivy Bridge.
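As a CPU-side analogy of the idea above, the sketch below releases a lock with a compare-and-swap rather than a plain store, so every modification of the lock word goes through the same atomic primitive. It uses C11 `<stdatomic.h>` only and does not model the HCA, InfiniBand verbs, or PCIe atomics; `cas_lock_t`, `LOCK_FREE_VAL`, and the function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCK_FREE_VAL 0u  /* hypothetical encoding: 0 means unlocked */

typedef struct {
    _Atomic uint64_t word;  /* holds the owner's (nonzero) id when locked */
} cas_lock_t;

/* Acquire: CAS the word from "free" to our id. Returns true on success. */
static bool cas_lock_try_acquire(cas_lock_t *lk, uint64_t my_id)
{
    uint64_t expected = LOCK_FREE_VAL;
    return atomic_compare_exchange_strong(&lk->word, &expected, my_id);
}

/* Release: also a CAS (owner id -> free) instead of a plain store, so the
 * lock word is only ever modified by one kind of atomic operation.
 * Returns false if the caller did not hold the lock. */
static bool cas_lock_release(cas_lock_t *lk, uint64_t my_id)
{
    uint64_t expected = my_id;
    return atomic_compare_exchange_strong(&lk->word, &expected, LOCK_FREE_VAL);
}
```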
-
- 27 Aug, 2014 8 commits
-
-
Xin Zhao authored
The added test mutex_bench_shm_ordered causes deadlock when MPIR_PARAM_CH3_ODD_EVEN_CLIQUES is set to 1. It is modified from mutex_bench_shm by changing the work distribution pattern from odd/even ranks to continuous ranks. See #2127.
Signed-off-by: Pavan Balaji <balaji@anl.gov>
-
Antonio J. Pena authored
This was making our FreeBSD 32 bit platform unhappy, causing segfaults. Fixes #2160
Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
-
Norio Yamaguchi authored
After one thread finishes processing all operations in the ops list, another thread may enqueue a new RMA operation in MPID_Progress_wait(). In that case, the new operation has not been issued yet, and we should avoid processing it at the end of synchronization calls. This situation occurred when running test/mpi/threads/rma/multirma.c.
Signed-off-by: Pavan Balaji <balaji@anl.gov>
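The rule above (complete only operations that were actually issued, and leave later arrivals for a subsequent call) can be sketched with an explicit `issued` flag per operation. This is a single-threaded illustration that omits all locking; the `rma_op_t` structure and both functions are hypothetical, not MPICH's ops-list code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical RMA operation node; not MPICH's real structure. */
typedef struct rma_op {
    struct rma_op *next;
    bool issued;     /* set once the op has been handed to the network */
    bool completed;
} rma_op_t;

/* Issue every op currently in the list. Ops appended afterwards (e.g. by
 * another thread while this one waits for progress) stay unissued. */
static void issue_all(rma_op_t *head)
{
    for (rma_op_t *op = head; op != NULL; op = op->next)
        op->issued = true;
}

/* At the end of a synchronization call, complete only ops that were
 * issued; an op enqueued later is left in place for a subsequent call.
 * Returns the number of ops completed. */
static size_t finish_issued_ops(rma_op_t *head)
{
    size_t n = 0;
    for (rma_op_t *op = head; op != NULL; op = op->next) {
        if (op->issued && !op->completed) {
            op->completed = true;
            n++;
        }
    }
    return n;
}
```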
-
Kenneth Raffenetti authored
PtlNIInit can optionally return the limitations of a network interface. Get these limits so we can account for things like the max_msg_size.
Signed-off-by: Antonio Pena Monferrer <apenya@mcs.anl.gov>
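Once the interface's max_msg_size is known, a message larger than that limit has to be split into several operations. The helper below is a sketch of that arithmetic only: it does not call the Portals4 API, takes the limit as a plain `size_t`, and the function names are hypothetical. Treating a zero-byte message as one operation is an assumption.

```c
#include <stddef.h>

/* Number of operations needed to send `len` bytes when the NI accepts at
 * most `max_msg_size` bytes per operation (max_msg_size must be > 0).
 * Assumption: a zero-byte message still needs one operation. */
static size_t num_chunks(size_t len, size_t max_msg_size)
{
    if (len == 0)
        return 1;
    return (len + max_msg_size - 1) / max_msg_size;
}

/* Size of chunk `i` (0-based) of a `len`-byte message. */
static size_t chunk_size(size_t len, size_t max_msg_size, size_t i)
{
    size_t offset = i * max_msg_size;
    if (offset >= len)
        return 0;
    size_t remaining = len - offset;
    return remaining < max_msg_size ? remaining : max_msg_size;
}
```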
-
Kenneth Raffenetti authored
Similar to 494f597b, take the datatype offset into account during non-contiguous operations in the Portals4 netmod.
Signed-off-by: Antonio Pena Monferrer <apenya@mcs.anl.gov>
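A rough illustration of why the offset matters: a non-contiguous datatype can be flattened into (displacement, length) segments relative to the datatype start, and every segment address must include the extra offset. The sketch assumes such a flattened form; `segment_t`, `piece_t`, and `build_pieces` are hypothetical and not the Portals4 netmod's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened datatype: (displacement, length) pairs relative
 * to the start of the datatype. */
typedef struct {
    ptrdiff_t disp;
    size_t    len;
} segment_t;

typedef struct {
    uintptr_t addr;  /* absolute address of one piece to transfer */
    size_t    len;
} piece_t;

/* Build the pieces for a transfer starting at `base`. The extra `offset`
 * must be added to every segment; leaving it out makes every piece point
 * `offset` bytes away from the intended data. */
static size_t build_pieces(uintptr_t base, ptrdiff_t offset,
                           const segment_t *segs, size_t nsegs, piece_t *out)
{
    for (size_t i = 0; i < nsegs; i++) {
        out[i].addr = base + (uintptr_t)(offset + segs[i].disp);
        out[i].len  = segs[i].len;
    }
    return nsegs;
}
```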
-
Pavan Balaji authored
We were not cleaning up some of the soft links created during make uninstall.
Signed-off-by: Antonio J. Pena <apenya@mcs.anl.gov>
-
Pavan Balaji authored
We were trying to clean up mpic++ (which is a soft link to mpicxx) during "make clean". This is incorrect since mpic++ is located in the install directory, which we should not touch during make clean. This patch moves such cleanup to make uninstall.
Signed-off-by: Antonio J. Pena <apenya@mcs.anl.gov>
-
Pavan Balaji authored
Most of the hcoll code is in a separate directory, except for a few changes in mainline MPICH:
1. The comm structure stores some hcoll-specific data structures.
2. The nemesis and sock progress engines need to poke the hcoll progress.
3. CH3 added comm creation hooks into hcoll.
Signed-off-by: Devendar Bureddy <devendar@mellanox.com>
Signed-off-by: Antonio J. Pena <apenya@mcs.anl.gov>
-
- 26 Aug, 2014 5 commits
-
-
Pavan Balaji authored
Signed-off-by: Devendar Bureddy <devendar@mellanox.com>
-
Pavan Balaji authored
Move device-specific comm structure components to 'dev'. Move channel-specific code into a different channel structure. The context stores MXM-specific information for sending/receiving data.
Signed-off-by: Devendar Bureddy <devendar@mellanox.com>
-
Pavan Balaji authored
The mxm netmod was using getenv directly instead of going through the CVAR interface. This prevents users from controlling it through MPI_T.
Signed-off-by: Devendar Bureddy <devendar@mellanox.com>
-
Also move device-specific comm structure components to 'dev' to clean up the naming a bit.
Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
-
Put a barrier in the initialization and finalization phases to make sure that all ranks call mxm_ep_connect before, and mxm_ep_disconnect after, the mxm connections are ready.
Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>
Change-Id: I99d41517c58ce9767735a8eccc21ad360e824bf8
Signed-off-by: Pavan Balaji <balaji@anl.gov>
-
- 25 Aug, 2014 7 commits
-
-
Wesley Bland authored
When searching for a corresponding comm_ptr, we should also check the node_comm and node_roots_comm if they exist.
Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
-
Wesley Bland authored
When revoking a communicator, if it has node-aware communicators attached to it (as node_comm and node_roots_comm), revoke those as well.
Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
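The revocation rule above can be sketched as a recursive walk over the attached node-aware communicators. The `comm_t` structure and `comm_revoke` function are hypothetical simplifications; MPICH's communicator object and revoke path carry much more state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical communicator with optional node-aware sub-communicators. */
typedef struct comm {
    bool revoked;
    struct comm *node_comm;        /* ranks on the same node, or NULL */
    struct comm *node_roots_comm;  /* one rank per node, or NULL */
} comm_t;

/* Revoke a communicator and any node-aware communicators attached to it,
 * so operations built on them (e.g. an SMP-aware barrier) also see the
 * revocation. Already-revoked communicators are skipped. */
static void comm_revoke(comm_t *comm)
{
    if (comm == NULL || comm->revoked)
        return;
    comm->revoked = true;
    comm_revoke(comm->node_comm);
    comm_revoke(comm->node_roots_comm);
}
```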
-
Wesley Bland authored
When a message is received, if the communicator has already been revoked, we shouldn't bother keeping the message since it's now invalid (unless it's for an AGREE or SHRINK request). Instead, just drop the request and return a null request to signal to the calling function that the request was ignored.
Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
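The drop decision described above reduces to a small predicate. In this sketch the message kinds, the `req_t` type, and `handle_incoming` are hypothetical; MPICH identifies AGREE and SHRINK traffic through its own tags and request types.

```c
#include <stdbool.h>

/* Hypothetical message kinds; MPICH distinguishes these differently. */
typedef enum { MSG_NORMAL, MSG_AGREE, MSG_SHRINK } msg_kind_t;

typedef struct {
    int  id;
    bool is_null;  /* true for a "null" request signalling a dropped message */
} req_t;

/* On a revoked communicator only AGREE and SHRINK traffic is kept; any
 * other message is dropped and a null request is returned so the caller
 * knows it was ignored. */
static req_t handle_incoming(bool comm_revoked, msg_kind_t kind, int id)
{
    req_t r = { id, false };
    if (comm_revoked && kind != MSG_AGREE && kind != MSG_SHRINK)
        r.is_null = true;
    return r;
}
```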
-
Wesley Bland authored
For some reason, the error-case code in MPIDI_Request_create_rreq and MPIDI_Request_create_null_rreq was different. This is odd, because both macros take FAIL_ as an argument, which is executed directly in the error case of create_rreq but not in create_null_rreq. This commit makes the two behave the same and updates the only two calls to the macro that existed in the code.
Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
-
Wesley Bland authored
After some more testing on fusion, some problems transmitting the failed procs bitarray sprang up. This seems to solve those problems now.
Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
-
Huiwei Lu authored
barrier_smp_intra completes the barrier in two steps: first within each SMP node, then across SMP nodes. It uses an additional node_comm for the intra-node barrier. This node_comm should also be cancelled inside MPIDI_CH3U_Clean_recvq when the communicator is revoked.
Signed-off-by: Wesley Bland <wbland@anl.gov>
-
Wesley Bland authored
The receive queue had some hacky ways of reporting errors related to process failure that didn't really match up with the way the codes should be returned correctly. This patch sets the correct error class in the correct place and doesn't require extra logic in dequeue_and_set_error to set the class itself. This seems to get a couple of the tests to pass in non-debug mode.
Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
-
- 22 Aug, 2014 3 commits
-
-
Rob Latham authored
Instead of every process reading and processing the hint config file, one process reads it and broadcasts it, then everyone processes it. Yay! Fewer file system calls!
Signed-off-by: Paul Coffman <pkcoff@us.ibm.com>
-
Rob Latham authored
We'll just over-allocate enough space for a bunch of hints, instead of trying to get it exactly right.
Signed-off-by: Paul Coffman <pkcoff@us.ibm.com>
-
Rob Latham authored
There is no need for the non-aggregators to immediately open the file before calling sync: if they have not opened the file, they will have no data to flush.
Signed-off-by: Paul Coffman <pkcoff@us.ibm.com>
-
- 21 Aug, 2014 2 commits
-
-
Pavan Balaji authored
No reviewer.
-
Junchao Zhang authored
Checks whether the compiler supports the intrinsic storage_size() and a non-bind(C) argument x in C_FUNLOC(x). These checks were added because IBM XLF 15.1 fails on these two tests.
-