- 13 Feb, 2015 1 commit
-
-
Xin Zhao authored
We are going to revert commit 389aab16 because it reordered the attributes in the RMA packet structs in mpidpkt.h and broke their alignment. This commit temporarily reverts the following commits, undoing only the modifications they made to mpidpkt.h after commit 389aab16: e36203c3, 45afd1fd, 3a05784f, 87acbbbe, b155e7e0. We will re-apply those modifications after we revert 389aab16. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
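A minimal sketch of why member order matters in a packet struct. The two structs below are hypothetical and are not the RMA packet definitions from mpidpkt.h; they only illustrate that reordering the same members can change member offsets and the amount of padding the compiler inserts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts for illustration only, NOT the structs in mpidpkt.h. */
struct pkt_interleaved {    /* narrow members separated by a wide one */
    uint8_t  type;
    uint64_t addr;
    uint8_t  flags;
};

struct pkt_grouped {        /* same members, widest first */
    uint64_t addr;
    uint8_t  type;
    uint8_t  flags;
};

/* The first member of a struct always sits at offset 0. */
static_assert(offsetof(struct pkt_grouped, addr) == 0,
              "first member must be at offset 0");

/* Bytes that carry data in either layout. */
#define PAYLOAD_BYTES (sizeof(uint64_t) + 2 * sizeof(uint8_t))

/* Padding bytes the compiler added to each layout. */
static size_t padding_interleaved(void)
{
    return sizeof(struct pkt_interleaved) - PAYLOAD_BYTES;
}

static size_t padding_grouped(void)
{
    return sizeof(struct pkt_grouped) - PAYLOAD_BYTES;
}
```

On common ABIs the interleaved layout carries more padding than the grouped one, which is why a reordering that looks cosmetic can change the size and alignment of a struct that is sent over the wire.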
-
- 09 Feb, 2015 3 commits
-
-
Kenneth Raffenetti authored
Two libtool patches we were carrying were released in version 2.4.3. This commit drops our patches and bumps version required to run autogen.sh. Our patch for ifort on OSX is still present and updated to work with the new version. Signed-off-by:
Wesley Bland <wbland@anl.gov>
-
Kenneth Raffenetti authored
Fixes a potential "arg list too long" error at make time. See https://lists.gnu.org/archive/html/bug-automake/2014-10/msg00009.html for more info. Closes #2215 Signed-off-by:
Wesley Bland <wbland@anl.gov>
-
Igor Ivanov authored
A call to the MPID_Sched_cb callback function can force the list memory to be reallocated. As a result, the entry pointer being processed before the call can become invalid. It should be set again after the callback returns. Signed-off-by:
Devendar Bureddy <devendar@mellanox.com> Signed-off-by:
Igor Ivanov <Igor.Ivanov@itseez.com> Signed-off-by:
Wesley Bland <wbland@anl.gov>
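The hazard described in this commit can be sketched as follows. All names here (struct sched, sched_push, process_entry, grow_cb) are hypothetical and do not come from the MPICH sources; the sketch only assumes a schedule stored in a growable array and a callback that may append to it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical schedule stored in a growable array. */
struct entry { int status; };
struct sched {
    struct entry *entries;
    size_t num, cap;
};

/* Append one entry. Growing the array may move it in memory.
 * Returns 0 on success, -1 on allocation failure. */
static int sched_push(struct sched *s)
{
    if (s->num == s->cap) {
        size_t ncap = s->cap ? 2 * s->cap : 1;
        struct entry *p = realloc(s->entries, ncap * sizeof *p);
        if (p == NULL)
            return -1;
        s->entries = p;
        s->cap = ncap;
    }
    s->entries[s->num++].status = 0;
    return 0;
}

/* A callback that adds work to the schedule and so may trigger realloc. */
static int grow_cb(struct sched *s)
{
    for (int i = 0; i < 64; i++)
        if (sched_push(s) != 0)
            return -1;
    return 0;
}

/* Process entry i, invoking a callback in the middle. */
static int process_entry(struct sched *s, size_t i, int (*cb)(struct sched *))
{
    assert(i < s->num);
    struct entry *e = &s->entries[i];
    e->status = 1;              /* started */
    if (cb(s) != 0)
        return -1;
    e = &s->entries[i];         /* re-fetch: the old pointer may be stale */
    e->status = 2;              /* finished */
    return 0;
}
```

Keeping the index rather than the pointer across the callback is what makes the re-fetch possible.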
-
- 08 Feb, 2015 5 commits
-
-
Xin Zhao authored
The entire "read-modify-write" should be atomic for CAS, FOP and GACC operations. This patch adds corresponding tests for them. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
FOP, CAS and GACC are atomic "read-modify-write" operations, which means that when the target window is defined on a SHM region, we need an inter-process lock to guarantee the atomicity of the entire "read+OP". The current implementation is correct for SHM-based RMA operations, but not for AM-based RMA operations: for SHM-based operations it protects the entire "read+OP", but for AM-based operations it only protects the "OP" part. This patch fixes the issue by protecting the memory copy to the temporary buffer and the computation together for AM-based operations. Fixes ticket 2226. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
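A minimal sketch of the "read+OP" rule, assuming a fetch-and-sum on an int. The process-local spinlock below is a stand-in for the inter-process lock that a real SHM window would need, and fetch_and_sum is a hypothetical name, not an MPICH function.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Stand-in for the inter-process window lock (process-local here). */
static atomic_flag win_lock = ATOMIC_FLAG_INIT;

static void lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&win_lock, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void lock_release(void)
{
    atomic_flag_clear_explicit(&win_lock, memory_order_release);
}

/* Fetch-and-sum: the copy of the old value and the update happen inside
 * one critical section, so no other update can slip in between them. */
static void fetch_and_sum(int *target, int operand, int *result)
{
    assert(target != NULL && result != NULL);
    lock_acquire();
    *result = *target;      /* the "read": copy out the old value */
    *target += operand;     /* the "OP": apply the operation */
    lock_release();
}
```

Taking the lock only around the second statement would reproduce the bug the commit describes: another process could modify the target between the copy and the update.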
-
Xin Zhao authored
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
In commit 7d71278, if node_comm is NULL (only the calling process is on that node), we call allocate_no_shm() in CH3 to allocate the window. If node_comm is not NULL (more than one process is on the same node), we call allocate_shm() in Nemesis to allocate a SHM window. However, the amount of information exchanged (in MPI_Allgather) differs between allocate_no_shm() and allocate_shm(), which leads to incorrect execution when both SHM and non-SHM windows exist. This patch fixes this issue. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
We allocate / free SHM regions only when node_comm exists, which means there is more than one process on the same node. When node_comm is NULL (only the calling process is on that node), we call the default allocate / free functions in CH3. (Please refer to commit f02eed5b.) Here we delete unnecessary code dealing with node_comm being NULL in the SHM allocate / free functions. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
- 05 Feb, 2015 1 commit
-
-
Rajeev Thakur authored
code of MPIU_Strncpy. Added test program. Closes #2225 Signed-off-by:
William Gropp <wgropp@illinois.edu>
-
- 04 Feb, 2015 3 commits
-
-
Wesley Bland authored
Some of the MPIX functions did not have weak symbols set up correctly which causes problems on some compilers (Pathscale). This patch adds the correct attribute for all of them that were missing. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
Kenneth Raffenetti authored
Merges the existing send callbacks into a single function. Uses the completion counter to track remaining operations and complete the request once finished. Signed-off-by:
Antonio Pena Monferrer <apenya@mcs.anl.gov>
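The completion-counter idea can be sketched as below. struct request, request_start and send_done_cb are hypothetical names used for illustration; the sketch assumes every pending operation invokes the same callback exactly once.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request with a completion counter. */
struct request {
    int  cc;          /* operations still outstanding */
    bool complete;    /* set once cc reaches zero */
};

/* Arm the request for nops outstanding operations. */
static void request_start(struct request *r, int nops)
{
    assert(nops > 0);
    r->cc = nops;
    r->complete = false;
}

/* Single callback shared by all send operations: each call retires one
 * operation, and the last one completes the request. */
static void send_done_cb(struct request *r)
{
    assert(r->cc > 0);
    if (--r->cc == 0)
        r->complete = true;
}
```

With one shared callback, the request does not need to know which operation finished, only how many remain.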
-
Kenneth Raffenetti authored
Signed-off-by:
Antonio Pena Monferrer <apenya@mcs.anl.gov>
-
- 03 Feb, 2015 2 commits
-
-
set by type_create_resized are not sticky. Changes darray and subarray types to use type_create_resized instead of type_struct with explicit lb/ub, because explicit MPI_LB/MPI_UB have been removed from MPI in MPI-3 and they also cause other problems because they were defined to be sticky in MPI-1. Fixes type_create_struct, which was incorrectly setting lb and ub to true_lb and true_ub in the non-sticky case. Closes #2218 Closes #2220 Closes #2224 Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
Wesley Bland authored
The errflag in the request object is used in common code and should not have been put in device specific code. This moves it up to the MPI_Request object. Signed-off-by:
Sameh Sharkawi <sssharka@us.ibm.com>
-
- 02 Feb, 2015 1 commit
-
-
Wesley Bland authored
This include was present but commented out. Normally, it wasn't needed, but to pick up the definition of MPIX_ERR_PROC_FAILED correctly, it needs to be there. No reviewer
-
- 31 Jan, 2015 1 commit
-
-
Wesley Bland authored
The previous commits didn't take empty requests into account when extracting the status. They also introduced an untested bug in a null pointer check. Signed-off-by:
Junchao Zhang <jczhang@mcs.anl.gov>
-
- 30 Jan, 2015 8 commits
-
-
Wesley Bland authored
The error code set in the status was being ignored for NBC and one-sided requests (which wasn't right anyway so it didn't matter). This grabs the error code from the status now. Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Wesley Bland authored
Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Wesley Bland authored
The scheduler functions now use MPIC_* functions to handle their communication instead of directly calling the MPID_* functions. This helps to simplify code related to error handling and allows the collectives to complete even if a failure is detected because the error will be tracked via the errflag inside the request object. Fixes #2222 Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Wesley Bland authored
Non-blocking communication requests need a way to track whether an error has occurred in a previous part of the NBC schedule. This adds an errflag to the request object itself so the tracking is possible. Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Wesley Bland authored
Having mpir_errflag_t defined in mpiimpl.h causes a problem if it needs to be used in some other headers. This moves the definition to mpitypedefs.h so it can be used elsewhere. Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Wesley Bland authored
Part of converting the NBC code to use the MPIC_* functions requires an MPIC_Issend function to exist. This adds it. Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Wesley Bland authored
The MPIC helper functions have been using MPI_Comm and MPI_Request objects instead of their MPID_* counterparts. This leads to a bunch of unnecessary conversions back and forth between the two types of objects and makes the work incompatible with other parts of the codebase (non-blocking collectives for instance). This patch converts all of the MPIC_* functions to use MPID_Comm and MPID_Request and changes all of the collective calls to use them now too. Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Wesley Bland authored
The collective helper functions generally have an errflag that is used when a failure is detected to allow the collective to continue while also communicating that a failure occurred. That flag is now included as a parameter for MPIC_Wait. The rest of this commit is the refactoring necessary in the rest of the helper functions to support the change. Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
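The errflag pattern running through these commits can be sketched as follows. errflag_t, step_fn and run_steps are hypothetical and are not the MPIC_* signatures; the sketch only shows a failure being recorded in a flag while the remaining steps still run.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error flag, not the real mpir_errflag_t. */
typedef enum { ERR_NONE = 0, ERR_PROC_FAILED = 1 } errflag_t;

/* Stand-in for one communication step; returns nonzero on failure. */
typedef int (*step_fn)(int step);

/* Run every step of a collective. A failing step is recorded in *errflag
 * instead of aborting, so later steps still execute. Returns the number
 * of steps attempted. */
static int run_steps(step_fn step, int nsteps, errflag_t *errflag)
{
    assert(errflag != NULL);
    int attempted = 0;
    for (int i = 0; i < nsteps; i++) {
        if (step(i) != 0)
            *errflag = ERR_PROC_FAILED;   /* remember, but keep going */
        attempted++;
    }
    return attempted;
}

/* Example step function: the step with index 1 fails. */
static int fail_on_second(int step)
{
    return step == 1;
}
```

The caller inspects the flag once the collective has finished, which is the behaviour the commit messages describe: the operation completes and the failure is still reported.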
-
- 27 Jan, 2015 2 commits
-
-
Jithin Jose authored
Signed-off-by:
Charles J Archer <charles.j.archer@intel.com>
-
Kenneth Raffenetti authored
The tag for send was ignored and recvtag incorrectly used in its place. Signed-off-by:
Wesley Bland <wbland@anl.gov>
-
- 23 Jan, 2015 3 commits
-
-
HDF5 folks reported a bug with ROMIO and one of their slightly-strange (but 100% legal) datatypes. git-bisect points to the "promote size of length" change. Seems that MPICH does not like struct datatypes with zero-count elements? Further investigation required. This change (construct a simpler datatype in more cases) is sufficient to help HDF5 move forward. See #2221 Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
many many places where a 64 bit value is stored in a 32 bit value Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
- bump up subtypes from 3 to 6. The limit is arbitrary. I am trying to figure out a type with 4 sub-types. - split up indexed/hindexed lists onto separate lines. MPICH debug output format adds its own newlines, but we have to clean out MPICH's extra debug output anyway: joining a few lines isn't that much more work. - output a name of the digraph that graphviz can actually parse. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
- 22 Jan, 2015 2 commits
-
-
Huiwei Lu authored
When a process fails, the fault tolerance scheme takes a different path from the existing one to deal with MPI object reference counts. Some reference counts were not properly set in the FT path, so when configured with --enable-g=all, some ft tests show a leaked context id and dirty COMM, GROUP and REQUEST objects at exit. This patch fixes ft/shrink and ft/agree with "--enable-g=all". Stack-allocated request, communicator and group objects will be freed by FT. Signed-off-by:
Wesley Bland <wbland@anl.gov>
-
Wesley Bland authored
MPIX_Comm_agree should not return errors if the failed processes have all been acknowledged. Previously, it was returning errors unnecessarily, but this makes sure that the errcode is MPI_SUCCESS when appropriate. Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
- 16 Jan, 2015 1 commit
-
-
Su Huang authored
The segfault was caused by the library trying to free an already freed mpid_statp structure. The structure is freed right after the status information is printed. To fix the problem, the mpid_statp is set to NULL after the free is done. (ibm) D202018 Signed-off-by:
Sameh Sharkawi <sssharka@us.ibm.com>
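The fix described here follows a common C idiom, sketched below. struct stats, statp, stats_init and stats_release are hypothetical stand-ins for the mpid_statp structure and its owner code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical statistics block standing in for mpid_statp. */
struct stats { long sends; };
static struct stats *statp;

/* Allocate the block. Must not be called while one is already live.
 * Returns 0 on success, -1 on allocation failure. */
static int stats_init(void)
{
    assert(statp == NULL);
    statp = calloc(1, sizeof *statp);
    return statp ? 0 : -1;
}

/* Release the block and clear the pointer. Because free(NULL) is a
 * no-op, a second call after this one is harmless. */
static void stats_release(void)
{
    free(statp);
    statp = NULL;
}
```

Without the assignment to NULL, a second release would pass a dangling pointer to free, which is the double free behind the reported segfault.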
-
- 15 Jan, 2015 1 commit
-
-
For some reason, there was no MPIR_Testall_impl as there is with many of the other MPI_* functions. This causes a linking problem when weak symbols are disabled and another MPI function needs to call MPI_*. This patch moves most of the MPI_Testall code into MPIR_Testall_impl and has MPI_Waitall call that function instead of MPI_Testall. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
- 14 Jan, 2015 3 commits
-
-
Rob Latham authored
A user on the OpenMPI list wanted to create a file with a 259-character name. The shared file pointer name construction used the magic '256' value to construct a full path to the hidden shared file pointer file. PATH_MAX already exists for this purpose, so use it. While there, found a few spots checking/setting PATH_MAX, so do that in one place. Closes #2212 Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
Rob Latham authored
Right now there's only one error condition: file name too long. This change checks return codes of ADIOI_Strncpy and informs caller. Otherwise, really long names result in buffer overruns. See #2212 Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
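A minimal sketch of the two ideas in these commits: size the buffer with PATH_MAX, and report a name that does not fit instead of overrunning the buffer. copy_name and set_shared_name are hypothetical helpers, not ADIOI_Strncpy, and the fallback value for PATH_MAX is an assumption for platforms that do not define it.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096   /* assumed fallback where <limits.h> lacks it */
#endif

/* Copy src into dest (capacity n bytes, including the terminator).
 * Returns 0 on success, 1 if src does not fit; dest is then left as an
 * empty string so it is never unterminated. */
static int copy_name(char *dest, const char *src, size_t n)
{
    assert(dest != NULL && src != NULL && n > 0);
    size_t len = strlen(src);
    if (len >= n) {
        dest[0] = '\0';
        return 1;
    }
    memcpy(dest, src, len + 1);
    return 0;
}

/* Buffer for the hidden shared-file-pointer path, sized by PATH_MAX
 * rather than a magic constant. */
static char shared_name[PATH_MAX];

/* Store a path, passing the error on to the caller. */
static int set_shared_name(const char *path)
{
    return copy_name(shared_name, path, sizeof shared_name);
}
```

The caller checks the return value and can raise a "file name too long" error instead of continuing with a truncated path.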
-
Charles J Archer authored
Compile-time fix required for the OFI threading model. No semantic changes. Signed-off-by:
Yohann Burette <yohann.burette@intel.com>
-
- 13 Jan, 2015 2 commits
-
-
Wesley Bland authored
There was an accidental ADI breakage earlier when MPI level codes would query into the dev part of the MPID request object. This commit removes that breakage by adding a new macro into the mpiimpl.h file to portably check whether a request is anysource. For now, in pamid, this macro always evaluates to 0. This can easily be fixed by overwriting it in the pamid code, but since pamid doesn't support FT, it won't have any functional change either. Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Wesley Bland authored
It was pointed out that putting this in a macro and failing silently when unimplemented makes things challenging for derivatives that will implement this function in the future. By moving this to an MPID level function, it becomes more obvious that the function should be implemented later. Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
- 12 Jan, 2015 1 commit
-
-
Wesley Bland authored
This macro was used inside CH3 to determine if the communicator could be used for anysource communication. With the rewrite of the anysource fault tolerance logic, it is now necessary to use it at the MPI level. Because it is a macro and not a function, the macro is defined in mpiimpl.h as (1) and then overwritten in the ch3 device. Future devices can also overwrite it if desired. Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
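The default-then-override arrangement can be sketched in a single file as below. COMM_ANYSOURCE_OK, struct comm and can_post_anysource are hypothetical names; the real macro name and the ch3 definition are not given in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical communicator with a device-maintained flag. */
struct comm { int anysource_enabled; };

/* Generic header: the default says anysource is always allowed. */
#define COMM_ANYSOURCE_OK(comm) (1)

/* Device header, included later: replace the default with a real check.
 * A device that does nothing keeps the (1) definition above. */
#undef COMM_ANYSOURCE_OK
#define COMM_ANYSOURCE_OK(comm) ((comm)->anysource_enabled)

/* MPI-level code uses the macro without knowing which device is built. */
static int can_post_anysource(const struct comm *c)
{
    assert(c != NULL);
    return COMM_ANYSOURCE_OK(c);
}
```

A macro default keeps the MPI-level code free of device-specific fields, at the cost of failing silently on devices that never override it.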
-