- 26 Jun, 2015 13 commits
-
-
We should always allocate the target lock entry pool in win_init, even when the info key no_locks is set to TRUE during window creation, because the user can set that info to FALSE after the window is created. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Since it does not help performance. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Originally we always dynamically allocated a request array for the current RMA operation, since the operation might be streamed and need multiple requests to track each stream unit. However, in most cases no streaming happens, so we need only one request per operation and do not need to allocate it dynamically. This patch optimizes that case. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
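A minimal sketch of the idea in the message above, not the actual MPICH code: the operation keeps one inline request slot and only falls back to a heap-allocated array when it is streamed. The type `rma_op_t`, its fields, and the helper names are hypothetical, and requests are modeled as plain ints for brevity.

```c
#include <stdlib.h>

/* Hypothetical RMA operation record; requests are modeled as ints. */
typedef struct {
    int nreqs;        /* number of stream units, 1 when not streamed */
    int single_req;   /* inline slot used in the common nreqs == 1 case */
    int *multi_reqs;  /* heap array, allocated only when nreqs > 1 */
} rma_op_t;

/* Set up request storage; allocate only when the op is streamed. */
static int op_init_reqs(rma_op_t *op, int stream_units)
{
    op->nreqs = stream_units;
    op->single_req = 0;
    op->multi_reqs = NULL;
    if (stream_units > 1) {
        op->multi_reqs = calloc((size_t) stream_units, sizeof *op->multi_reqs);
        if (op->multi_reqs == NULL)
            return -1;
    }
    return 0;
}

/* Return the slot that tracks stream unit i. */
static int *op_req_slot(rma_op_t *op, int i)
{
    return op->nreqs == 1 ? &op->single_req : &op->multi_reqs[i];
}

/* Release the heap array if there is one; free(NULL) is a no-op. */
static void op_free_reqs(rma_op_t *op)
{
    free(op->multi_reqs);
    op->multi_reqs = NULL;
}
```

In the non-streamed case no allocator call happens at all, which is where the saving in the message would come from.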
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Originally we poked the progress engine at the end of RMA sync calls if the progress engine had not been poked earlier in the call. The purpose was to prevent a possible deadlock. However, the deadlock can only happen in self-lock cases; if the target is not the calling process, the poke adds unnecessary overhead to RMA sync calls. This patch deletes those progress pokes and keeps only the ones where the target is the calling process. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
This patch adds a reduce-scatter based algorithm to MPI_Win_fence, which is used when the number of processes is small or medium. With this algorithm, memory usage is O(P), but the ending FENCE only needs to wait for local completion, not remote completion. When the number of processes is large, FENCE switches to the original barrier-based algorithm, which has O(1) memory usage but needs to wait for remote completion in the ending FENCE. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
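A rough illustration of where the O(P) memory in the message above could come from. The message does not say what is reduced; this sketch assumes each process keeps one counter per target for the operations it issued, and that a sum reduce-scatter then tells every target how many incoming operations to expect. It simulates that step in plain C without MPI, and the function name and flat matrix layout are hypothetical.

```c
/* Simulate a sum reduce-scatter over per-target operation counts.
 * ops_issued is a flat nprocs x nprocs matrix: entry [src * nprocs + dst]
 * is the number of RMA ops that process src issued to process dst.
 * Each row is the O(P) array one process would hold (assumed layout).
 * On return, incoming[dst] is the total number of ops targeting dst. */
static void simulate_reduce_scatter(const int *ops_issued, int nprocs,
                                    int *incoming)
{
    for (int dst = 0; dst < nprocs; dst++) {
        incoming[dst] = 0;
        for (int src = 0; src < nprocs; src++)
            incoming[dst] += ops_issued[src * nprocs + dst];
    }
}
```

Under this assumption a target can count arrivals against its expected total, so an origin would not have to wait for remote completion itself.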
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
On the target side, after we receive a GACC/FOP packet, we should first start sending back the data and then perform the ACC computation. This way, issuing the data and the computation can overlap. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
This optimization was missed in 7189bcde. Here we add it back so that when there is no issued active win or passive win, we skip the while loop in RMA progress. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Originally the arguments passed to MPI_Win_create in this test were wrong. This patch fixes the issue. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
- 25 Jun, 2015 3 commits
-
-
Pavan Balaji authored
Signed-off-by:
Halim Amer <aamer@anl.gov>
-
Pavan Balaji authored
Signed-off-by:
Halim Amer <aamer@anl.gov>
-
Halim Amer authored
-
- 24 Jun, 2015 5 commits
-
-
Xin Zhao authored
In the Nemesis implementation of Win_gather_info(), we allocate a memory region in SHM to store window information for other processes, so that all processes on the same node can share that information. However, the memory size was previously set incorrectly to O(node_comm_size); it should be O(comm_size). This patch fixes the bug. Signed-off-by:
Min Si <msi@il.is.s.u-tokyo.ac.jp> Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
The original implementation of Win_flush_local counts the total number of local and remote completions it needs to wait for, and then waits for the current local/remote completion counts to reach those values. The bug is that the current counts must be reset to zero on each iteration of the while loop; otherwise targets that have already completed are counted again and we fail to wait for some targets to complete. This patch fixes the issue. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
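The counting pattern described above can be sketched as follows. This is not the Win_flush_local code; `target_t`, `poke_progress`, and `wait_local_completion` are hypothetical stand-ins, and the fake progress function simply completes one target per call.

```c
#include <stddef.h>

/* Hypothetical per-target state. */
typedef struct {
    int local_done;  /* nonzero once the target is locally complete */
} target_t;

/* Stand-in for the progress engine: completes one pending target. */
static void poke_progress(target_t *targets, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!targets[i].local_done) {
            targets[i].local_done = 1;
            return;
        }
    }
}

/* Wait until all n targets are complete; returns the number of passes. */
static int wait_local_completion(target_t *targets, size_t n)
{
    size_t done;
    int passes = 0;
    do {
        done = 0;  /* reset on every pass so no target is counted twice */
        for (size_t i = 0; i < n; i++) {
            if (targets[i].local_done)
                done++;
        }
        if (done < n)
            poke_progress(targets, n);
        passes++;
    } while (done < n);
    return passes;
}
```

If `done` were initialized only once before the loop, completed targets would be added again on each pass and the loop could exit while some targets are still pending, which is the failure the message describes.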
-
No reviewer. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Charles J Archer authored
Signed-off-by:
Junchao Zhang <jczhang@mcs.anl.gov>
-
Rob Latham authored
No Reviewer
-
- 23 Jun, 2015 9 commits
-
-
Junchao Zhang authored
No reviewer
-
Rob Latham authored
Lisandro Dalcin <dalcinl@gmail.com> reports that mpi4py's test suite invokes MPIR_Add_finalize() 33 times. It's been 6.5 years since we doubled it, so bump it up once again. Closes: #2272 Signed-off-by:
Junchao Zhang <jczhang@mcs.anl.gov>
-
Rob Latham authored
disable this memory-intensive test on 32 bit platforms No Reviewer
-
Rob Latham authored
resize and struct are the two type constructors that can set the LB and UB markers on a type. Struct, due to MPI-1 ideas, is a strange beast (they adjust only if they are lower/higher than the old ones (!) ) but for resized it's clear that the markers shift. Closes #2088 Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Rob Latham authored
in deep types we might want to update the lb and ub, not simply append/prepend two tuples to the flattened representation. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Rob Latham authored
some libraries like HDF5 want to register their cleanup routines into finalize. these cleanup routines use MPI-IO, so they need to fire before ROMIO cleans up. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Halim Amer authored
Signed-off-by:
Min Si <msi@il.is.s.u-tokyo.ac.jp> Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
Signed-off-by:
Sangmin Seo <sseo@anl.gov>
-
Signed-off-by:
Sangmin Seo <sseo@anl.gov>
-
- 22 Jun, 2015 4 commits
-
-
Rob Latham authored
type promotions have resulted in a change to the device layer. Ref: 1767 Signed-off-by:
Pavan Balaji <balaji@anl.gov> Signed-off-by:
Sameh S Sharkawi <sssharka@us.ibm.com>
-
Rob Latham authored
despite promoting types throughout the gather path, still had one case of constructing structs with larger-than-int blocklens. solution: borrow BigMPI strategy and construct types-of-chunks to get around limitations. Ref: #1767 Signed-off-by:
Pavan Balaji <balaji@anl.gov>
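A sketch of the arithmetic behind a types-of-chunks approach, assuming the split is into INT_MAX-sized chunks plus a remainder so that every blocklen fits in an int. The struct and function names are hypothetical, and the actual datatype construction is omitted.

```c
#include <limits.h>
#include <stdint.h>

/* Result of splitting a large count into int-sized pieces. */
typedef struct {
    int64_t nchunks;  /* number of full chunks of INT_MAX elements */
    int remainder;    /* leftover elements, always less than INT_MAX */
} chunk_split_t;

/* Split a non-negative 64-bit count so each piece fits in an int. */
static chunk_split_t split_count(int64_t count)
{
    chunk_split_t s;
    s.nchunks = count / INT_MAX;
    s.remainder = (int) (count % INT_MAX);
    return s;
}
```

A caller would then describe `nchunks` copies of a chunk type plus one remainder block instead of a single oversized blocklen.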
-
Rob Latham authored
The ongoing march towards 64-bit clean continues. Address areas where large product of two ints might have overflowed. Ref: #1767 Signed-off-by:
Pavan Balaji <balaji@anl.gov>
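For illustration only, the usual fix for this class of overflow is to widen one operand before multiplying. The helper below is hypothetical and uses int64_t rather than any MPICH type.

```c
#include <stdint.h>

/* Multiply in 64 bits: casting one operand first makes the whole
 * product 64-bit, so it cannot wrap the way int * int can. */
static int64_t total_bytes(int count, int elem_size)
{
    return (int64_t) count * elem_size;
}
```

Writing `(int64_t) (count * elem_size)` instead would not help, because the multiplication would already have been done in int.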
-
Rob Latham authored
Preprocessor constants need parens (which showed the "always fail" case wasn't big enough); the compiler warned about variables possibly being used uninitialized. Ref: #1767 Signed-off-by:
Pavan Balaji <balaji@anl.gov>
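A generic example of why a preprocessor constant needs parentheses. The macro names and values are made up and are not the constants from the test in question.

```c
#define CHUNK_BAD  1024 + 1    /* expands textually, no grouping */
#define CHUNK_GOOD (1024 + 1)  /* grouped, behaves like one value */

/* Expands to n * 1024 + 1, which is not n * 1025. */
static int scaled_bad(int n)
{
    return n * CHUNK_BAD;
}

/* Expands to n * (1024 + 1). */
static int scaled_good(int n)
{
    return n * CHUNK_GOOD;
}
```

With n = 2 the first form gives 2049 and the second 2050, so a size computed from the unparenthesized constant can end up smaller than intended.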
-
- 20 Jun, 2015 3 commits
-
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
- 19 Jun, 2015 2 commits
-
-
Kenneth Raffenetti authored
No reviewer.
-
Rob Latham authored
commit 83253a41 triggered a bunch of new warnings. Take a different approach. For simplicity of implementation, do_accumulate_op is defined as MPI_User_function. We could split up internal routines and user-provided routines, but that complicates the code for little benefit. Instead, keep do_accumulate_op with an int type, but check for overflow before explicitly casting. In many places the count is simply '1'. In stream processing there is an internal limit of 256k, so the assertion should never fire. Signed-off-by:
Xin Zhao <xinzhao3@illinois.edu>
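The check-then-cast pattern from the message can be sketched like this. The helper name is hypothetical, and int64_t stands in for whatever wider count type the caller holds.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Narrow a wide count to int, asserting first that it fits. */
static int checked_int_count(int64_t count)
{
    assert(count >= 0 && count <= INT_MAX);
    return (int) count;
}
```

The explicit cast silences the conversion warning, while the assertion documents and enforces the assumption that the value is small.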
-
- 18 Jun, 2015 1 commit
-
-
Kenneth Raffenetti authored
Encode request handles in unused tag bits, eliminating the need for hash table. Signed-off-by:
Antonio Pena Monferrer <apenya@mcs.anl.gov>
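A sketch of the general technique of packing a handle into spare bits next to a tag, so the handle can be recovered from the match bits without a lookup table. The 64-bit layout here (low 32 bits tag, high 32 bits handle) and all names are assumptions; the commit does not state the real bit layout.

```c
#include <stdint.h>

#define TAG_BITS 32                                  /* assumed tag width */
#define TAG_MASK ((UINT64_C(1) << TAG_BITS) - 1)

/* Pack the handle above the tag in one 64-bit value. */
static uint64_t pack_match_bits(uint32_t tag, uint32_t handle)
{
    return ((uint64_t) handle << TAG_BITS) | tag;
}

/* Recover the handle directly from the packed bits. */
static uint32_t unpack_handle(uint64_t bits)
{
    return (uint32_t) (bits >> TAG_BITS);
}

/* Recover the original tag. */
static uint32_t unpack_tag(uint64_t bits)
{
    return (uint32_t) (bits & TAG_MASK);
}
```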
-