- 16 Apr, 2015 2 commits
-
-
Pavan Balaji authored
Ticket #2243 was resolved as a duplicate of ticket #2183. Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Pavan Balaji authored
These tests seem to use a lot of memory per process, causing us to hit swap space when running with too many processes. Reducing the count to two processes allows these tests to run on more machines. Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
- 15 Apr, 2015 3 commits
-
-
Pavan Balaji authored
We increased the number of cases the bcast test runs in [e01a20b6]. This is causing it to time out on some platforms, where the test now takes close to 3 minutes. The increased timeout should be sufficient on those platforms. No reviewer.
-
Charles J Archer authored
* Rename MPIR_CVAR_DUMP_PROVIDERS to MPIR_CVAR_OFI_DUMP_PROVIDERS
* Add MPIR_CVAR_OFI_USE_PROVIDER, which takes a string naming the desired provider
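The following minimal C sketch shows how a provider could be selected by name, which is what MPIR_CVAR_OFI_USE_PROVIDER allows. `struct provider_info` is a hypothetical stand-in for the list the fabric layer returns and is not the libfabric `fi_info` type. The sketch also assumes the CVAR value arrives through the environment variable of the same name, since MPICH CVARs can be set that way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a provider list returned by the fabric layer. */
struct provider_info {
    const char *name;
    struct provider_info *next;
};

/* Return the first provider whose name matches `wanted`. With no request
 * (NULL or empty string) keep the default, i.e. the head of the list.
 * Return NULL if the requested provider is not in the list. */
struct provider_info *select_provider(struct provider_info *list, const char *wanted)
{
    if (wanted == NULL || wanted[0] == '\0')
        return list;
    for (struct provider_info *p = list; p != NULL; p = p->next)
        if (strcmp(p->name, wanted) == 0)
            return p;
    return NULL;
}

/* Assumption: the CVAR is read from the environment variable of the same name. */
struct provider_info *select_provider_from_env(struct provider_info *list)
{
    return select_provider(list, getenv("MPIR_CVAR_OFI_USE_PROVIDER"));
}
```

An unknown name returns NULL instead of silently falling back to the default, so the caller can report that the requested provider is unavailable. That policy is an assumption on our part; the commit does not specify it.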
-
Sameh Sharkawi authored
This commit includes multiple fixes:
- Fixes for MPI_IN_PLACE checking. cudaGetPointerAttributes returns true for MPI_IN_PLACE, which causes issues. Now we check for MPI_IN_PLACE before passing the pointer to CUDA.
- Enabling PAMID geometries (in order to get to the PAMID collectives) when MP_CUDA_AWARE=yes. This allows CUDA buffers to be intercepted.
- Disabling FCA when MP_CUDA_AWARE=yes, if the user has enabled FCA.
- Copying the user recv buffer into a temporary host recv buffer before the collective starts, especially in MPI_IN_PLACE cases.
(ibm) D203255 Signed-off-by:
Tsai-Yang (Alan) Jea <tjea@us.ibm.com>
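The sketch below illustrates the check ordering from the first fix and the staging copy from the last one. It relies on a few stand-ins. `SKETCH_IN_PLACE` is a hypothetical substitute for MPI_IN_PLACE, which MPICH defines as a special pointer constant rather than a real buffer. `sketch_is_device_ptr` is a hypothetical stub in place of cudaGetPointerAttributes. In the staging helper, `memcpy` stands in for the device-to-host copy that a real CUDA buffer would require.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sentinel standing in for MPI_IN_PLACE. */
#define SKETCH_IN_PLACE ((void *) -1)

/* Hypothetical stub for a CUDA pointer query; treats every pointer as host memory. */
static bool sketch_is_device_ptr(const void *buf)
{
    (void) buf;
    return false;
}

/* Check the sentinel first so it is never handed to the device query. */
bool buffer_is_device(const void *buf)
{
    if (buf == SKETCH_IN_PLACE)
        return false;
    return sketch_is_device_ptr(buf);
}

/* Allocate a host staging buffer and seed it with the user's receive data,
 * so an in-place collective starts from the user's contents.
 * memcpy stands in for a device-to-host copy. The caller frees the result. */
void *stage_recvbuf(const void *user_recvbuf, size_t nbytes)
{
    void *host = malloc(nbytes);
    if (host != NULL)
        memcpy(host, user_recvbuf, nbytes);
    return host;
}
```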
-
- 14 Apr, 2015 2 commits
-
-
Min Si authored
The linker on Darwin does not allow common symbols, so libtool adds the -fno-common option by default for shared libraries. However, common symbols defined in different shared libraries and object files still cannot be treated as the same symbol. For example, with gfortran, the same common block in the shared libraries and in the object files gets different memory locations; with ifort, the same common block in different shared libraries gets the same memory location, but a different location in the object file.
The -Wl,-commons,use_dylibs option asks the linker to check dylibs for definitions and use them to replace tentative definitions (commons) from object files. This solves the common-symbol mismatch between the object file and the dylibs (i.e., the address of a common symbol is set to its location in the first dylib that is linked with the object file and contains that symbol). It needs to be added only in the linking stage for the final executable. The -flat-namespace option allows the linker to unify the same common symbols across different dylibs. It needs to be added in the linking stage for both the shared library and the final executable. (See man ld for their definitions.)
Although gfortran works by adding only -flat-namespace, and ifort works by adding only -Wl,-commons,use_dylibs, we add both options here as a generic solution to make sure everything is safe. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
Charles J Archer authored
-
- 11 Apr, 2015 1 commit
-
-
Pavan Balaji authored
When we skip non-comm-world communicators, we need to make sure they are actually skipped. No reviewer.
-
- 10 Apr, 2015 7 commits
-
-
Kenneth Raffenetti authored
Changes the value of various static limits in the Portals4 netmod, based on experimentation results and suggestions from collaborators.
1. Bump most ni_limits from 32K to 64K. These limits relate closely to queue depth. We can reasonably expect to support a queue depth of 64K.
2. Limit issued origin events to 500. This translates to sending ~250 operations to Portals at a time, which over IB is roughly the saturation point. TODO: turn this into a CVAR.
3. Limit per-target issued operations to 50. This gives the target a better chance to process events without being overwhelmed by a single process. TODO: turn this into a CVAR, also.
4. Allocate more buffer space for incoming control messages. Observed results, especially with larger messages, showed that more buffer space cuts down on flow-control events.
Signed-off-by:
Antonio J. Pena <apenya@mcs.anl.gov>
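Items 2 and 3 amount to admission control applied before each operation is issued. The hypothetical sketch below takes its two constants from the values quoted above. `struct throttle`, `NUM_TARGETS`, and the functions are illustrative names and are not taken from the netmod's code.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ORIGIN_EVENTS  500  /* item 2 */
#define MAX_PER_TARGET_OPS 50   /* item 3 */
#define NUM_TARGETS        4    /* illustrative */

struct throttle {
    int origin_events;            /* origin-side events outstanding */
    int target_ops[NUM_TARGETS];  /* operations outstanding per target */
};

/* Try to reserve room for one operation to `target` that will produce
 * `events` origin-side events. On false, the caller should queue the
 * operation and retry after draining completions. */
bool throttle_try_issue(struct throttle *t, int target, int events)
{
    assert(target >= 0 && target < NUM_TARGETS);
    if (t->origin_events + events > MAX_ORIGIN_EVENTS)
        return false;
    if (t->target_ops[target] >= MAX_PER_TARGET_OPS)
        return false;
    t->origin_events += events;
    t->target_ops[target]++;
    return true;
}

/* Release the reservation once the operation's events have completed. */
void throttle_complete(struct throttle *t, int target, int events)
{
    assert(t->target_ops[target] > 0);
    t->origin_events -= events;
    t->target_ops[target]--;
}
```

If each operation generates about two origin events, the 500-event cap works out to the ~250 in-flight operations mentioned in item 2.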
-
Kenneth Raffenetti authored
The 2 commits being reverted introduced a "safe" PtlMEAppend function that would call MPID_nem_ptl_poll to process some events in case there was no space to append the match list entry. However, the poll function is not reentrant, which could lead to ordering problems. The increased list entry limit from [c6c0d6f6] should prevent PTL_NO_SPACE errors from happening, except in extreme cases. If we still find we are hitting this error, a proper fix can be done in the Rportals layer. Signed-off-by:
Antonio J. Pena <apenya@mcs.anl.gov>
-
Charles J Archer authored
-
Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
We currently check too many combinations, causing the test to take too long on some platforms. This patch simplifies the test by building two versions of it. In the first version, we run only on COMM_WORLD but go through all datatypes. In the second version, we run on all communicators but go through only a small subset of datatypes. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
1. Renamed bcast2 to bcast.
2. White-space cleanup for bcast.c.
Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
This test is exactly the same as bcast2. Originally these two tests were different, but over time they have become essentially the same. There's no point testing the same thing twice. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
- 09 Apr, 2015 1 commit
-
-
Antonio Pena Monferrer authored
The datatype size was checked outside the appropriate branches in a couple of places. Signed-off-by:
Xin Zhao <xinzhao3@illinois.edu>
-
- 08 Apr, 2015 2 commits
-
-
Antonio J. Pena authored
This reverts commit b47d95f7.
-
Kenneth Raffenetti authored
The previous design for MPICH control messages used a small set of "use once" buffers that could be quickly exhausted. The new approach processes all control messages via an unexpected queue. The benefit is a larger incoming message capacity, leading to fewer flow-control events. Signed-off-by:
Antonio J. Pena <apenya@mcs.anl.gov>
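Here is a generic sketch of the unexpected-queue idea. Each incoming control message is copied into a dynamically allocated record and appended to a FIFO, so capacity is limited by memory instead of by a small pool of use-once buffers. The sketch does not use the Portals4 API. `struct ctl_msg`, `CTL_MAX`, and the enqueue/dequeue functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define CTL_MAX 64  /* illustrative maximum control-message size */

struct ctl_msg {
    struct ctl_msg *next;
    size_t len;
    char data[CTL_MAX];
};

struct ctl_queue {
    struct ctl_msg *head;
    struct ctl_msg *tail;
};

/* Copy an incoming control message onto the tail of the queue.
 * Returns 0 on success, -1 if the message is too large or allocation fails. */
int ctl_enqueue(struct ctl_queue *q, const void *buf, size_t len)
{
    if (len > CTL_MAX)
        return -1;
    struct ctl_msg *m = malloc(sizeof *m);
    if (m == NULL)
        return -1;
    m->next = NULL;
    m->len = len;
    memcpy(m->data, buf, len);
    if (q->tail != NULL)
        q->tail->next = m;
    else
        q->head = m;
    q->tail = m;
    return 0;
}

/* Remove the oldest queued message (FIFO order); the caller frees it. */
struct ctl_msg *ctl_dequeue(struct ctl_queue *q)
{
    struct ctl_msg *m = q->head;
    if (m != NULL) {
        q->head = m->next;
        if (q->head == NULL)
            q->tail = NULL;
    }
    return m;
}
```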
-
- 07 Apr, 2015 14 commits
-
-
Norio Yamaguchi authored
Also change individual author to organization names. Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Norio Yamaguchi authored
Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Norio Yamaguchi authored
Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Norio Yamaguchi authored
Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Norio Yamaguchi authored
Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Norio Yamaguchi authored
Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Norio Yamaguchi authored
Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Norio Yamaguchi authored
Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Norio Yamaguchi authored
Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Norio Yamaguchi authored
Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Norio Yamaguchi authored
Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Kenneth Raffenetti authored
Use the VC private area to track outstanding send operations. This way, when a VC close packet comes in, we wait until all remaining operations are complete before closing locally. This allows for a simpler netmod finalize function, where we are sure the network is safe to shut down. Signed-off-by:
Antonio J. Pena <apenya@mcs.anl.gov>
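The close protocol above comes down to one counter and one flag per VC. In this hypothetical sketch, `struct vc_priv` stands in for the VC private area. The local close happens only after a close packet has arrived and no sends remain outstanding, whichever of those two events occurs last.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-VC private state. */
struct vc_priv {
    int  outstanding_sends;  /* sends issued but not yet completed */
    bool close_requested;    /* a VC close packet has arrived */
    bool closed;             /* local close has been performed */
};

static void vc_maybe_close(struct vc_priv *vc)
{
    if (vc->close_requested && vc->outstanding_sends == 0 && !vc->closed)
        vc->closed = true;   /* real code would release network resources here */
}

void vc_send_issued(struct vc_priv *vc)
{
    vc->outstanding_sends++;
}

void vc_send_completed(struct vc_priv *vc)
{
    assert(vc->outstanding_sends > 0);
    vc->outstanding_sends--;
    vc_maybe_close(vc);
}

void vc_close_received(struct vc_priv *vc)
{
    vc->close_requested = true;
    vc_maybe_close(vc);
}
```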
-
Antonio J. Pena authored
The tests were modifying local buffers after window creation without locking them, causing potential race conditions. I've moved the buffer initialization so that it is performed before the global window is created. These tests were failing due to incorrect results in Jenkins with async enabled. Signed-off-by:
Xin Zhao <xinzhao3@illinois.edu>
-
Antonio J. Pena authored
The datatypes shouldn't be released until we make sure that no more remote operations are using them. I've changed several tests to release the datatype after a barrier. To avoid introducing a barrier in every iteration, and to stress the implementation a little more, I've restructured the tests so that the datatypes are not created and freed in every iteration. This was causing intermittent segfaults, mainly with async enabled. Signed-off-by:
Xin Zhao <xinzhao3@illinois.edu>
-
- 06 Apr, 2015 1 commit
-
-
Sameh Sharkawi authored
(ibm) D203212 Signed-off-by:
Coffman <pkcoff@bldlnx65.pok.stglabs.ibm.com> Signed-off-by:
Sameh Sharkawi <sssharka@us.ibm.com>
-
- 03 Apr, 2015 7 commits
-
-
Rob Latham authored
Instead of creating the window at open time (depending on hints), let's defer the window creation until we need it. Signed-off-by:
Paul Coffman <pkcoff@us.ibm.com>
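The sketch below shows the deferred-creation pattern. It uses a hypothetical `struct file_state` in which the `win_created` flag stands in for an MPI_Win handle. The window is created on first use and then reused.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical file-handle state; win_created stands in for an MPI_Win. */
struct file_state {
    bool win_created;
    int  win_create_calls;  /* counts creations, for illustration */
};

/* Open time: no window is created here any more. */
void file_open(struct file_state *fs)
{
    fs->win_created = false;
    fs->win_create_calls = 0;
}

/* Any path that needs the window calls this; creation happens only once. */
void file_ensure_window(struct file_state *fs)
{
    if (!fs->win_created) {
        /* real code would create the RMA window here */
        fs->win_created = true;
        fs->win_create_calls++;
    }
}
```

MPI window creation is collective. In a real implementation, the first-use point therefore has to be one that every process in the file's communicator reaches together.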
-
Optimization to use the PAMI_Rput_typed / PAMI_Rget_typed call in the case where PAMID MPI_Put / MPI_Get is called with a derived (non-contiguous) datatype. Instead of breaking the MPI datatype up into contiguous chunks on the MPICH side and repeatedly calling PAMI_Rput / PAMI_Rget for each chunk, with the associated overhead, create a PAMI datatype to represent the MPI derived type and make just one call to PAMI_Rput_typed / PAMI_Rget_typed. We deal with non-contiguous buffers by avoiding packing and using origin buffers (as in PAMI). Guarded by the PAMID_TYPED_ONESIDED environment variable. Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
Added support to additionally run two-phase aggregation, which has the read-modify-write capability, in cases where the one-sided write aggregation encounters holes in the data. Addition of two new environment variables (GPFSMPIO_ONESIDED_NO_RMW, GPFSMPIO_ONESIDED_INFORM_RMW) to control this behavior and inform the user. Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
read-modify-write for holes at the beginning. Added support to correctly handle a data pattern that has a hole only at the beginning of the file offset range: the hole is essentially ignored and writing begins at the first offset with actual data, thereby avoiding the need for a read-modify-write. Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
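The hole handling in the two commits above depends on a single test: does the data destined for the file leave any gaps? Below is a hypothetical sketch of that test over extents sorted by offset. The scan starts at the first extent's offset, so a hole that exists only at the beginning is never reported, which matches the behavior described above. Interior gaps are reported, and those are the cases that would fall back to the two-phase read-modify-write path. `struct extent` and `has_interior_hole` are illustrative names and do not come from ROMIO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One contiguous piece of data destined for the file. */
struct extent {
    long long off;  /* file offset in bytes */
    long long len;  /* length in bytes */
};

/* Return true if the sorted extents leave a gap between the first and last
 * byte of data. The write range starts at the first extent's offset, so any
 * space before it (a hole at the beginning) is not counted. */
bool has_interior_hole(const struct extent *ext, size_t n)
{
    if (n == 0)
        return false;
    long long covered_end = ext[0].off + ext[0].len;
    for (size_t i = 1; i < n; i++) {
        assert(ext[i].off >= ext[i - 1].off);  /* input must be sorted */
        if (ext[i].off > covered_end)
            return true;                       /* gap: RMW would be needed */
        long long end = ext[i].off + ext[i].len;
        if (end > covered_end)
            covered_end = end;                 /* extend coverage, allowing overlap */
    }
    return false;
}
```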
-
source buffer bug fixes. The CESM climate model decomps for fill-value support exposed several bugs in the algorithm related to non-contiguous source buffers, which have been fixed. Those issues include:
- Mishandling of ranks with no data.
- Miscalculation of the source buffer offsets utilizing the flattened buffer mechanisms.
- Mishandling of negative source buffer offsets.
- Inefficient and inaccurate memory management of temporary buffers used to collect non-contiguous chunks for a given file offset.
Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
Code to enable the usage of the optimized one-sided collective IO aggregation algorithm from the ADIOI_GPFS_WriteStridedColl and ADIOI_GPFS_ReadStridedColl functions. Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
Optimized collective IO algorithm for GPFS, replacing the existing two-phase algorithm with one that uses one-sided MPI_Put and MPI_Get. Significant performance and memory improvements are possible for certain workloads. Guarded by the GPFSMPIO_AGGMETHOD environment variable; see ad_gpfs_tuning.c for details. Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-