- 09 Feb, 2014 4 commits
-
-
Pavan Balaji authored
A bad cast to int was losing information when the message size was larger than 2GB. Signed-off-by:
Wesley Bland <wbland@mcs.anl.gov>
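The truncation described in this commit can be guarded against with a range check before narrowing. The following is a minimal sketch, not the code from the patch; `fits_in_int` and `checked_size_to_int` are hypothetical helper names introduced only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if nbytes can be stored in an int
 * without losing information, 0 otherwise. */
int fits_in_int(size_t nbytes)
{
    return nbytes <= (size_t) INT_MAX;
}

/* Hypothetical helper: narrow a byte count to int, aborting (in debug
 * builds) instead of silently truncating a value that does not fit. */
int checked_size_to_int(size_t nbytes)
{
    assert(fits_in_int(nbytes));
    return (int) nbytes;
}
```

On platforms where int is 32 bits, a 3GB message size fails the check, which is the case where a plain cast would lose information. Keeping the value in a wider type end to end avoids the need for the check altogether.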
-
Pavan Balaji authored
Signed-off-by:
Wesley Bland <wbland@mcs.anl.gov>
-
Pavan Balaji authored
This works around problems with writev on some platforms (such as Mac OSX, 10.9.1) where writev hangs for large messages. Signed-off-by:
Wesley Bland <wbland@mcs.anl.gov>
-
Pavan Balaji authored
Using writev on larger than 2GB messages seems to cause some platforms to hang (e.g., Mac OSX, at least as of 10.9.1). This patch creates a new function that reduces the message size being transmitted. The function does not attempt to send all data, in order to avoid making it a blocking function. A higher-level function would need to check how much data is sent and retry later if needed. Signed-off-by:
Wesley Bland <wbland@mcs.anl.gov>
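The approach described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the function added by the patch: the names `truncate_iov` and `bounded_writev`, the 1 GiB cap, and the 16-entry limit are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

#define MAX_WRITEV_BYTES ((size_t) 1 << 30)   /* assumed cap: 1 GiB per call */
#define MAX_WRITEV_IOVS  16                   /* assumed cap on iovec entries */

/* Copy a prefix of iov[] into out[] so that the total length is at most
 * max_bytes. The last copied entry is shortened if needed. Returns the
 * number of entries written to out[]. */
int truncate_iov(const struct iovec *iov, int iovcnt, size_t max_bytes,
                 struct iovec *out, int out_cap)
{
    size_t remaining = max_bytes;
    int n = 0;

    assert(iovcnt >= 0 && out_cap >= 0);
    for (int i = 0; i < iovcnt && n < out_cap && remaining > 0; i++) {
        out[n] = iov[i];
        if (out[n].iov_len > remaining)
            out[n].iov_len = remaining;
        remaining -= out[n].iov_len;
        n++;
    }
    return n;
}

/* Issue a single writev() on a size-limited copy of the iovec array.
 * It does not loop until everything is sent, so it stays non-blocking
 * friendly: the return value is the number of bytes actually written
 * (or -1 on error), and the caller retries later for the rest. */
ssize_t bounded_writev(int fd, const struct iovec *iov, int iovcnt)
{
    struct iovec capped[MAX_WRITEV_IOVS];
    int n = truncate_iov(iov, iovcnt, MAX_WRITEV_BYTES,
                         capped, MAX_WRITEV_IOVS);

    if (n == 0)
        return 0;   /* nothing to send; avoid calling writev with iovcnt 0 */
    return writev(fd, capped, n);
}
```

A higher-level caller would compare the return value with the total it wanted to send, advance its buffers by that amount, and call `bounded_writev` again once the descriptor is writable.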
-
- 06 Feb, 2014 4 commits
-
-
In the pamid layer, mpir_nbc defaults to 0, i.e., scheduled NBC work is not advanced, which causes the job to hang. NBC works only if the schedule is advanced; to enable that, a user has to set the environment variable PAMID_MPIR_NBC to 1. With MPICH 3.1 being released and NBC being one of the supported items in the release, the default for mpir_nbc should be changed from 0 to 1. Signed-off-by:
Michael Blocksome <blocksom@us.ibm.com> Signed-off-by:
Pavan Balaji <balaji@mcs.anl.gov>
-
Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
No new interfaces were added or removed. Just the source code changed. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
- 04 Feb, 2014 3 commits
-
-
Pavan Balaji authored
No reviewer.
-
Pavan Balaji authored
No reviewer.
-
Kenneth Raffenetti authored
No reviewer.
-
- 03 Feb, 2014 4 commits
-
-
'make testing' now also generates an additional output file, "summary.junit.xml", in JUnit format. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
Pavan Balaji authored
No reviewer.
-
Pavan Balaji authored
-
Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
- 02 Feb, 2014 2 commits
-
-
Pavan Balaji authored
This test does not do anything special for 4 processes. Rank 0 communicates with each other process sequentially, and waits for all communication to be over before communicating with a different target. Reducing the number of processes allows us to reduce the time taken by the test without affecting any tested parameter. Signed-off-by:
Wesley Bland <wbland@mcs.anl.gov>
-
Kenneth Raffenetti authored
Return MPI_ERR_SERVICE when the name being unpublished is not found. No reviewer.
-
- 01 Feb, 2014 8 commits
-
-
Pavan Balaji authored
Signed-off-by:
Wesley Bland <wbland@mcs.anl.gov>
-
Pavan Balaji authored
The test suite already recognizes --disable-error-checking which is the right way to tell it to not enable error checking tests. Signed-off-by:
Wesley Bland <wbland@mcs.anl.gov>
-
Pavan Balaji authored
Signed-off-by:
Wesley Bland <wbland@mcs.anl.gov>
-
Pavan Balaji authored
1. enable-fast=all|none should really reflect "all"/"none", not "some". We were enabling some of the optimizations, but not all, e.g., mpit pvars disabling.
2. Make naming consistent and move related checks closer to each other, so they are easier to verify.
3. Don't control enable-timing, enable-mpit-pvars, and enable-error-checking from enable-fast. They have their own configure options. enable-fast is kind of weird in that it sets some of its own configure variables, but also resets variables set by other configure options, making it very confusing for users. Instead we should point out in the README what users should do for performance tests.
4. Allow optimization levels like O3 to be used with other enable-fast options, such as ndebug.
5. Remove some incorrect and/or unnecessary comments.
6. Don't force default compiler optimizations when --disable-fast is given.
Signed-off-by:
Wesley Bland <wbland@mcs.anl.gov>
-
Pavan Balaji authored
No reviewer.
-
Pavan Balaji authored
Signed-off-by:
Jeff Hammond <jeff.science@gmail.com>
-
When the autotools versions are not as expected, error out instead of issuing a warning. In reality, only the libtool version should be critical for ABI, not the autoconf or automake version. But we want to avoid going backward in autotools versions in releases. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
Pavan Balaji authored
No reviewer.
-
- 31 Jan, 2014 5 commits
-
-
Pavan Balaji authored
This patch only removes the most obvious pieces of Windows code. There is certainly more Windows-related code remaining. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
Check that attempting to unpublish a service the nameserver does not know about returns MPI_ERR_SERVICE. See 10.4.4 in the MPI standard. Signed-off-by:
Pavan Balaji <balaji@mcs.anl.gov>
-
The error class should be MPI_ERR_SERVICE, not MPI_ERR_NAME, as defined by the MPI-3 standard section 10.4.4. Signed-off-by:
Pavan Balaji <balaji@mcs.anl.gov>
-
Improves error reporting in the pmi nameservice by checking return codes from the pmi server. Debug messages are also printed when enabled. Signed-off-by:
Pavan Balaji <balaji@mcs.anl.gov>
-
When MPI_Unpublish_name is called on the pmi nameserver before any services have been published, a segfault will occur. Add a check to determine if anything is published so we can exit early instead. Signed-off-by:
Pavan Balaji <balaji@mcs.anl.gov>
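The early-exit check described in this commit can be illustrated with a small model. This is a sketch under assumptions, not the pmi nameservice code: the `ns_table` and `nameserv` types, the lazily allocated table, and the `NS_SUCCESS` / `NS_ERR_SERVICE` codes (stand-ins for the MPI error classes) are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NS_MAX_NAMES 8

/* Hypothetical stand-ins for MPI_SUCCESS and MPI_ERR_SERVICE. */
enum { NS_SUCCESS = 0, NS_ERR_SERVICE = 1 };

struct ns_table {
    int count;
    const char *names[NS_MAX_NAMES];
};

struct nameserv {
    struct ns_table *table;   /* assumed to stay NULL until the first publish */
};

int ns_unpublish(struct nameserv *ns, const char *service_name)
{
    assert(ns != NULL && service_name != NULL);

    /* Nothing has been published yet: return early instead of
     * dereferencing a table that was never allocated. */
    if (ns->table == NULL)
        return NS_ERR_SERVICE;

    for (int i = 0; i < ns->table->count; i++) {
        if (strcmp(ns->table->names[i], service_name) == 0) {
            /* Remove the entry by moving the last one into its slot. */
            ns->table->names[i] = ns->table->names[ns->table->count - 1];
            ns->table->count--;
            return NS_SUCCESS;
        }
    }
    return NS_ERR_SERVICE;   /* name not known to the nameserver */
}
```

Without the NULL check, the loop condition would read `ns->table->count` through a null pointer, which is the kind of access that produces a segfault.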
-
- 30 Jan, 2014 7 commits
-
-
Change the default name service system from file to pmi. The file method can be unreliable on network filesystems with client side caching. Closes #2007 Signed-off-by:
Pavan Balaji <balaji@mcs.anl.gov>
-
Pavan Balaji authored
No reviewer.
-
Pavan Balaji authored
By default we still print warnings, but if the HYDRA_LAUNCHER_SSH_ENABLE_WARNINGS environment variable is set to false, we disable them. Refs #2007. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
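A default-on switch controlled by an environment variable, as described above, might look like the sketch below. Only the variable name comes from the commit message; the function names and the accepted spellings ("false"/"0" and "true"/"1") are assumptions, and Hydra's actual parsing may differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser: map a textual setting to 0 or 1. An unset or
 * unrecognized value yields default_value. */
int parse_bool_setting(const char *value, int default_value)
{
    assert(default_value == 0 || default_value == 1);

    if (value == NULL)
        return default_value;
    if (strcmp(value, "false") == 0 || strcmp(value, "0") == 0)
        return 0;
    if (strcmp(value, "true") == 0 || strcmp(value, "1") == 0)
        return 1;
    return default_value;
}

/* Warnings stay enabled unless the variable explicitly turns them off. */
int ssh_warnings_enabled(void)
{
    return parse_bool_setting(getenv("HYDRA_LAUNCHER_SSH_ENABLE_WARNINGS"), 1);
}
```

Keeping the default at "enabled" means existing users see no change in behavior unless they opt out.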
-
Use type (const void *) instead of (void *) and explicitly cast to (void *) when needed in order to make gcc -Wall happy. See #2010 Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
Use MPI_Type_size_x instead of MPI_Type_size and make sure the second argument has type MPI_Count. Fixes #2010 Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
Ensure the third argument of MPIR_Status_set_bytes has type MPI_Count in order to avoid integer overflow. Fixes #2010 Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
Pavan Balaji authored
During finalize, we were destroying the COMM_WORLD, COMM_SELF and COMM_IWORLD communicator objects, and all other associated resources internally, before the final progress checks for incoming messages had finished. This resulted in the following sequence of cleanup:
1. COMM_WORLD got cleaned up. Internally, there is a check to see if a group object has been allocated for COMM_WORLD. If there is one, it is freed.
2. We waited for other messages to arrive. We noticed a failure at this time, so we tried to create a failed process group. This uses the COMM_WORLD group internally, causing it to be created again, but with a reference count of 2, since the code assumes that the first reference count is always for the original COMM_WORLD.
3. When we tried to free the world group, we noticed that the reference count was 2, so we decremented the reference count and did not actually free the object.
Moving the check for incoming messages to happen before the communicator free fixes this problem. Note that the PG finalization still needs to be the last step since that cleans up all the VCs as well. See #1996
Signed-off-by:
Wesley Bland <wbland@mcs.anl.gov>
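The ordering problem described in this commit can be reproduced in a toy reference-counting model. This is a heavily simplified sketch, not MPICH code: the types and function names are hypothetical, and the model only captures the lazily created group, its initial count of 2, and the two orderings of "free communicator" and "drain incoming messages".

```c
#include <assert.h>
#include <stdlib.h>

struct group { int ref_count; };
struct comm  { struct group *group; };   /* group is created lazily */

int live_groups = 0;                     /* allocated and not yet freed */

/* Return the communicator's group, creating it on first use. A new group
 * starts at 1 (the communicator's own reference); the caller gets one more. */
struct group *comm_get_group(struct comm *c)
{
    if (c->group == NULL) {
        c->group = malloc(sizeof(*c->group));
        assert(c->group != NULL);
        c->group->ref_count = 1;
        live_groups++;
    }
    c->group->ref_count++;
    return c->group;
}

void group_release(struct group *g)
{
    if (--g->ref_count == 0) {
        free(g);
        live_groups--;
    }
}

/* Drop the communicator's own reference, if a group was ever created. */
void comm_free(struct comm *c)
{
    if (c->group != NULL) {
        group_release(c->group);
        c->group = NULL;
    }
}

/* Final progress check: a detected failure needs the world group. */
void drain_incoming(struct comm *world, int saw_failure)
{
    if (saw_failure) {
        struct group *g = comm_get_group(world);
        /* ... a failed-process group would be derived from g here ... */
        group_release(g);
    }
}

/* Run the two steps in either order; return how many groups leaked. */
int finalize(int drain_first, int saw_failure)
{
    struct comm world = { NULL };
    int leaked;

    if (drain_first) {
        drain_incoming(&world, saw_failure);
        comm_free(&world);
    } else {
        comm_free(&world);
        drain_incoming(&world, saw_failure);
    }
    leaked = live_groups;
    if (world.group != NULL) {           /* tidy up so the model itself is leak-free */
        free(world.group);
        world.group = NULL;
        live_groups--;
    }
    return leaked;
}
```

In this model, `finalize(0, 1)` (free first, then drain) leaves one group with a reference count of 1 that nobody releases, while `finalize(1, 1)` (drain first, then free) releases it, matching the reordering the commit describes.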
-
- 29 Jan, 2014 3 commits
-
-
Pavan Balaji authored
If the PMI process mapping string wraps around to node 0, we were creating a bad node list of which processes are local and which are not. This patch provides a hacky fix for this case by only repeating the part of the PMI mapping string from the point where it wrapped around. The patch is hacky because it assumes that seeing a start node ID of 0 means a wraparound. This is not necessarily true. A user-defined node list can use the node ID 0 without actually creating a wraparound. The reason this patch still works in this case is that Hydra creates a new node list starting from node ID 0 for user-specified nodes during MPI_Comm_spawn{_multiple}. If a different process manager searches for allocated nodes in the user-specified list, this patch will break. Fixes #2007. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
When qsort is not available, don't define the comparison function and fall back to a simple insertion sort implementation. In the future, a more general function with fallback should be added in MPL so it can be used in other cases like comm_split. Refs #2007 Signed-off-by:
Pavan Balaji <balaji@mcs.anl.gov>
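The fallback pattern described above can be sketched as follows. This is an illustration, not the patched code: `HAVE_QSORT` is assumed to be a configure-generated macro, and `sort_ints` and `compare_ints` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#ifdef HAVE_QSORT
/* The comparison function is only needed (and only defined) when
 * qsort is available. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;
    return (x > y) - (x < y);
}
#endif

/* Sort v[0..n-1] in ascending order. */
void sort_ints(int *v, size_t n)
{
    assert(v != NULL || n == 0);
    if (n < 2)
        return;   /* nothing to do; also avoids passing NULL to qsort */

#ifdef HAVE_QSORT
    qsort(v, n, sizeof(int), compare_ints);
#else
    /* Simple insertion sort: O(n^2), but fine for short lists. */
    for (size_t i = 1; i < n; i++) {
        int key = v[i];
        size_t j = i;
        while (j > 0 && v[j - 1] > key) {
            v[j] = v[j - 1];
            j--;
        }
        v[j] = key;
    }
#endif
}
```

Both branches produce the same ordering, so callers do not need to know which one was compiled in.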
-
Pavan Balaji authored
The original PMI process mapping parsing code had a number of assumptions that allowed it to work only on COMM_WORLD. This patch corrects these so that it works for dynamic processes as well. It also fixes the computation of the number of nodes used so that it is correct in the general case. Refs #2007. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-