- 26 Jun, 2015 22 commits
-
-
When the number of processes is 1, we do not need to schedule the current NBC communication and can simply return a REQUEST_NULL request handle. This patch fixes this issue. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Here we add the internal functions flush_local_all and flush_all so that Win_fence/Win_complete can call them internally. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
After Win_flush_local/Win_flush_local_all/Win_flush/Win_flush_all, we should set the upgrade_flush_local flag back to 0. Originally we forgot to do this in Win_flush/Win_flush_all. Here we add the missing resets. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Here we modify the MPIDI_CH3I_RMA_Make_progress_target and MPIDI_CH3I_RMA_Make_progress_win functions so that they poke the progress engine once if the current window/target state does not allow issuing operations. Note that MPIDI_CH3I_RMA_Make_progress_target is only called from operation routines (MPI_PUT, MPI_GET, ...) and MPIDI_CH3I_RMA_Make_progress_win is only called from synchronization routines (MPI_WIN_FENCE, MPI_WIN_LOCK, ...). They cannot be called from the RMA progress engine. issue_ops_target(), issue_ops_win(), check_and_switch_target_state(), and check_and_switch_win_state() are core functions, and they are called by MPIDI_CH3I_RMA_Make_progress_target(), MPIDI_CH3I_RMA_Make_progress_win(), and the RMA progress engine. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Here we add progress poking and GC progress while issuing operations, so that incoming messages are received while outgoing messages are being issued. Otherwise, if all processes are busy issuing a large number of operations, no process makes progress on receiving, and the sends cannot complete until the end of the epoch is reached. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
In the check_and_switch_target_state function, we return a flag indicating whether the state is satisfied for issuing operations. The flag should indicate only the current state and should not be mixed with the pending list condition. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Originally in RMA synchronization, we always tried to piggyback LOCK/UNLOCK/FLUSH flags on operations by delaying the issuing of some operations. This is good when the number of operations is very small, but delaying is not good when the message size or the number of operations is large. In this patch, we add a CVAR that turns piggybacking of LOCK/UNLOCK/FLUSH flags on or off. By default it is off, which means we only piggyback when operations are available, but not at the cost of delaying the issuing of operations. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
We should always allocate the target lock entry pool in win_init, even if the info key no_locks is set to TRUE during window creation, because that info can be set to FALSE by the user after window creation. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Since it does not help on performance. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Originally we always dynamically allocated a request array for the current RMA operation, since the operation might be streamed and need multiple requests to track each stream unit. However, in most cases, where streaming does not happen, we only need one request for each operation and do not need to allocate it dynamically. This patch optimizes that case. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
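A minimal sketch of the single-request optimization described in the commit above, assuming simplified stand-in types. The names `request_t`, `rma_op_t`, `op_init_reqs`, and `op_free_reqs` are hypothetical and are not the actual MPICH structures or functions; the point is only that the heap array is allocated when an operation is split into more than one stream unit.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a request object. */
typedef struct { int id; } request_t;

/* Hypothetical stand-in for an RMA operation. */
typedef struct {
    int nreqs;               /* number of requests tracked by this operation */
    request_t *single_req;   /* used when nreqs == 1; no array is allocated */
    request_t **multi_reqs;  /* heap array, used only when streaming */
} rma_op_t;

/* Set up request tracking for an operation split into stream_units pieces.
 * Returns 0 on success, -1 on allocation failure. */
int op_init_reqs(rma_op_t *op, int stream_units)
{
    op->nreqs = stream_units;
    op->single_req = NULL;
    op->multi_reqs = NULL;
    if (stream_units > 1) {
        /* Streaming case: one slot per stream unit. */
        op->multi_reqs = calloc((size_t)stream_units, sizeof(*op->multi_reqs));
        if (op->multi_reqs == NULL)
            return -1;
    }
    /* Common case (stream_units == 1): single_req is used, nothing allocated. */
    return 0;
}

/* Release the array if one was allocated; free(NULL) is a no-op. */
void op_free_reqs(rma_op_t *op)
{
    free(op->multi_reqs);
    op->multi_reqs = NULL;
}
```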
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Originally we poked the progress engine at the end of RMA sync calls if the progress engine had not been poked earlier in the call. The purpose was to prevent a possible deadlock. However, the deadlock should only happen in self-lock cases; if the target is not the calling process, the poke adds unnecessary overhead to RMA sync calls. In this patch, we delete those progress pokes and keep only the ones where the target is the calling process. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
In this patch, we add a reduce-scatter based algorithm to MPI_Win_fence, which is triggered when the number of processes is small or medium. When this algorithm is used, memory usage is O(P), but the ending FENCE only needs to wait for local completion and does not need to wait for remote completion. When the number of processes is large, we switch FENCE to the original barrier based algorithm, which has O(1) memory usage but needs to wait for remote completion in the ending FENCE. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
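The trade-off in the fence commit above can be sketched as a simple selection function. This is only an illustration: the commit does not state the process-count threshold, so `FENCE_RS_MAX_PROCS` is a hypothetical value, and `fence_algo_t`, `choose_fence_algo`, and `fence_counter_slots` are assumed names rather than MPICH identifiers.

```c
#include <stddef.h>

typedef enum { FENCE_REDUCE_SCATTER, FENCE_BARRIER } fence_algo_t;

/* Hypothetical threshold; the actual value is not given in the commit. */
#define FENCE_RS_MAX_PROCS 1024

/* Small or medium process counts use the reduce-scatter algorithm;
 * large counts fall back to the barrier based algorithm. */
fence_algo_t choose_fence_algo(int nprocs)
{
    return (nprocs <= FENCE_RS_MAX_PROCS) ? FENCE_REDUCE_SCATTER : FENCE_BARRIER;
}

/* Memory cost in counter slots: O(P) for reduce-scatter (one per process),
 * O(1) for the barrier based algorithm. */
size_t fence_counter_slots(fence_algo_t algo, int nprocs)
{
    return (algo == FENCE_REDUCE_SCATTER) ? (size_t)nprocs : (size_t)1;
}
```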
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
On the target side, after we receive the GACC/FOP packet, we should first start sending back the data and then perform the ACC computation. This way, sending the data and the computation can overlap. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
This optimization was missed in 7189bcde. Here we add it back so that when there is no issued active win or passive win, we skip the while loop in RMA progress. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
- 24 Jun, 2015 2 commits
-
-
Xin Zhao authored
The original implementation of Win_flush_local counts the total number of local and remote completions it needs to wait for, and then waits for the current local/remote completion counts to reach those values. The bug is that the current counts must be reset to zero in each iteration of the while loop; otherwise targets that are already completed are counted again, and we fail to wait for some targets to complete. This patch fixes this issue. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
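The counting bug fixed in the commit above can be illustrated with a small self-contained simulation. It is a sketch under stated assumptions, not the MPICH code: `target_t`, `poke_progress`, and `wait_all_complete` are hypothetical names, and a target is modeled as completing after a fixed number of progress pokes.

```c
/* Hypothetical target model: complete once ticks_left reaches 0. */
typedef struct { int ticks_left; } target_t;

/* Simulated progress engine: advances every incomplete target by one step. */
void poke_progress(target_t *targets, int n)
{
    for (int i = 0; i < n; i++)
        if (targets[i].ticks_left > 0)
            targets[i].ticks_left--;
}

/* Wait until all n targets are complete; returns the number of pokes used. */
int wait_all_complete(target_t *targets, int n)
{
    int pokes = 0;
    int done;
    do {
        /* Reset on every pass. Without this, targets counted in an earlier
         * pass are counted again and the loop can exit too early. */
        done = 0;
        for (int i = 0; i < n; i++)
            if (targets[i].ticks_left == 0)
                done++;
        if (done < n) {
            poke_progress(targets, n);
            pokes++;
        }
    } while (done < n);
    return pokes;
}
```

With targets needing 0, 2, and 1 pokes, an accumulating counter would reach 3 on the second pass (1 + 2) while one target is still incomplete; resetting the counter avoids that.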
-
No reviewer. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
- 23 Jun, 2015 2 commits
-
-
Signed-off-by:
Sangmin Seo <sseo@anl.gov>
-
Signed-off-by:
Sangmin Seo <sseo@anl.gov>
-
- 22 Jun, 2015 1 commit
-
-
Rob Latham authored
The ongoing march towards 64-bit clean continues. Address areas where a large product of two ints might overflow. Ref: #1767 Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
- 20 Jun, 2015 1 commit
-
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
- 19 Jun, 2015 1 commit
-
-
Rob Latham authored
Commit 83253a41 triggered a bunch of new warnings. Take a different approach. For simplicity of implementation, do_accumulate_op is defined as MPI_User_function. We could split up the internal routine and user-provided routines, but that complicates the code for little benefit. Instead, keep do_accumulate_op with an int type, but check for overflow before explicitly casting. In many places the count is simply '1'. In stream processing there is an internal limit of 256k, so the assertion should never fire. Signed-off-by:
Xin Zhao <xinzhao3@illinois.edu>
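A minimal sketch of the "check for overflow before explicitly casting" pattern from the commit above. The helper names `fits_in_int` and `checked_count_to_int` are hypothetical, not MPICH functions, and the use of `size_t` as the wide type is an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Returns nonzero if a wide count can be represented as an int. */
int fits_in_int(size_t count)
{
    return count <= (size_t)INT_MAX;
}

/* Narrow a wide count to int, asserting first that no overflow occurs.
 * For counts such as 1, or a bounded stream unit, the assertion never fires. */
int checked_count_to_int(size_t count)
{
    assert(fits_in_int(count));
    return (int)count;
}
```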
-
- 16 Jun, 2015 1 commit
-
-
The loser of a head-to-head connection sometimes tries to reconnect later, after MPI_Finalize was called. This can lead to several errors in the socket layer, depending on the state of the discarded connection and the appearance of the connection events. Refs #2180 This patch handles this in two ways: 1.) Discarded connections are marked with CONN_STATE_DISCARD, so they are held back from connecting. Furthermore, an error on any discarded connection (because the remote side closed in MPI_Finalize) is ignored and the connection is closed. 2.) Add a finalize flag for process groups. If a process group is closing and tries to close all VCs, a flag is set to mark this. If the flag is set, a reconnection (in the socket state) is refused and the connection is closed on both sides. Both steps are necessary to catch all reconnection attempts after MPI_Finalize was called. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
- 15 Jun, 2015 2 commits
-
-
Originally the Request_load_recv_iov() function assumed that the initial value of req->dev.segment_first is always zero, which is not correct if we set it to a non-zero value for streaming RMA operations. Request_load_recv_iov() is triggered multiple times for the same receive request until all data is received. During this process, req->dev.segment_first is overwritten with the current offset. When the initial value of req->dev.segment_first is non-zero, we need another variable to store that value until receiving for this request is finished. Here we use a static variable in this function for that purpose. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
In this patch, we fix mistakes in calculating the streaming size in the GetAccumulate pkt handler on the target side. The original code has two mistakes: 1. It uses the size and extent of the target datatype, which is wrong. We should use the size / extent of the basic type in the target datatype. 2. It always uses the total data size to calculate the current streaming size, which is wrong. We should use the remaining data size instead. This patch fixes these two issues. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
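The two fixes in the commit above can be sketched as follows, assuming a fixed-size streaming buffer. The function `stream_unit_bytes` and its parameters are hypothetical names for illustration; the sketch also assumes the basic type size is non-zero and no larger than the buffer.

```c
#include <stddef.h>

/* Bytes to process in the current stream unit.
 *   rest_bytes : data still to be processed (not the total size)
 *   buf_bytes  : capacity of the streaming buffer
 *   basic_size : size of the basic type inside the target datatype
 * The unit holds a whole number of basic elements and never exceeds the
 * remaining data. */
size_t stream_unit_bytes(size_t rest_bytes, size_t buf_bytes, size_t basic_size)
{
    size_t max_elems = buf_bytes / basic_size;   /* whole elements that fit */
    size_t max_bytes = max_elems * basic_size;
    return (rest_bytes < max_bytes) ? rest_bytes : max_bytes;
}
```

For example, with 100 bytes of 12-byte elements... the first unit through a 64-byte buffer carries 60 bytes (five elements), and the second carries the remaining 40.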
-
- 14 Jun, 2015 2 commits
-
-
This patch includes three changes: (1) Added the netmod API get_ordering to allow a netmod to expose the network ordering. A netmod may issue some packets over multiple connections in parallel if those packets (such as RMA) do not require ordering, and thus the packets may be unordered. This patch sets the network ordering in every existing netmod (tcp|mxm|ofi|portals|llc) to true, since all packets are sent in order over one connection. (2) Nemesis exposes the window packet orderings, such as AM flush ordering, at init time. It supports ordered packets only when the netmod supports an ordered network. (3) If AM flush is ordered (flush must be finished after all previous operations), then CH3 RMA only requests a FLUSH ACK on the last operation. Otherwise, CH3 must request a per-OP FLUSH ACK to ensure all operations are remotely completed. Signed-off-by:
Xin Zhao <xinzhao3@illinois.edu> Signed-off-by:
Pavan Balaji <balaji@anl.gov>
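Point (3) of the commit above amounts to a per-operation decision, sketched here under simplifying assumptions. The names `op_requests_flush_ack` and `count_flush_acks` are hypothetical and do not correspond to CH3 functions.

```c
#include <stdbool.h>

/* Decide whether operation op_index (0-based) out of nops should carry a
 * FLUSH ACK request. With ordered AM flush, an ACK for the last operation
 * implies that all earlier ones finished; otherwise every operation needs
 * its own ACK. */
bool op_requests_flush_ack(bool flush_ordered, int op_index, int nops)
{
    if (flush_ordered)
        return op_index == nops - 1;
    return true;
}

/* Total number of FLUSH ACKs the origin expects for nops operations. */
int count_flush_acks(bool flush_ordered, int nops)
{
    int acks = 0;
    for (int i = 0; i < nops; i++)
        if (op_requests_flush_ack(flush_ordered, i, nops))
            acks++;
    return acks;
}
```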
-
The outstanding_acks counter was increased at each sync call (such as fence and flush). However, the counter had to be decreased again if a flush ack was not required. It is more straightforward to increase it only when the flush packet is issued (FLUSH flag piggyback or a separate flush message). Signed-off-by:
Xin Zhao <xinzhao3@illinois.edu> Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
- 12 Jun, 2015 6 commits
-
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Here we make check_and_switch_target/window_state return a flag indicating whether the current window/target states are OK for issuing operations. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
check_window_state ---> check_and_switch_window_state check_target_state ---> check_and_switch_target_state Both functions check and, if possible, switch the RMA state. Here we rename them to reflect that. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
When GACC/FOP is used with MPI_NO_OP, the operation is essentially an atomic GET. Originally MPICH implemented this by converting GACC/FOP to GET, which lost the atomicity of the operation. In this patch, we modify the implementation of GACC/FOP to support atomic GET. The main modifications are listed below: (1) When streaming a GACC operation, we originally used the origin data size to calculate the stream unit size. Since the origin data size is zero in atomic GET, we now use the target data size to calculate the stream unit size. (2) On the origin side, for an atomic GET, CH3 issues only the packet header and the metadata for derived datatypes (if needed) and does not try to issue data from the origin buffer; on the target side, after the packet header and the metadata for derived datatypes (if needed) are received, the final request handler is triggered, and CH3 does not try to receive any data from the origin. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-