- 16 Dec, 2014 6 commits
-
-
Xin Zhao authored
No reviewer.
-
Xin Zhao authored
No reviewer.
-
Xin Zhao authored
Use int instead of size_t in RMA pkt header to reduce packet size. No reviewer.
-
Xin Zhao authored
Originally we only allowed a LOCK request to be piggybacked with small RMA operations (where all data fits in the packet header). This adds communication overhead for larger operations, since the origin must wait for the LOCK ACK before it can transmit data to the target. In this patch we add support for piggybacking LOCK with RMA operations of arbitrary size. Note that (1) this only works with basic datatypes; (2) if the LOCK cannot be satisfied, we temporarily buffer the operation on the target side. No reviewer.
-
Xin Zhao authored
No reviewer.
-
Xin Zhao authored
Arrange RMA sync functions in src/mpid/ch3/src/ch3u_rma_sync.c in the following order: Win_fence, Win_post, Win_start, Win_complete, Win_wait, Win_test, Win_lock, Win_unlock, Win_flush, Win_flush_local, Win_lock_all, Win_unlock_all, Win_flush_all, Win_flush_local_all, Win_sync. No reviewer.
-
- 13 Nov, 2014 3 commits
-
-
Xin Zhao authored
Here we wrap the common actions performed when an RMA op finishes on the target into a function, to make the code structure cleaner. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
Originally do_accumulate_op() only accepted a request pointer as its argument, which was too restrictive for reuse. Here we modify it to accept the buffer address, count, datatype and op, so that it can be reused in more general cases. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
When the operation pending list and request lists are all empty, the origin needs to send a FLUSH message only if it issued PUT/ACC operations since the last synchronization call; otherwise the origin does not need to issue a FLUSH at all and does not need to wait for a FLUSH ACK message. Similarly, the origin waits for the ACK of the UNLOCK message only when it issued PUT/ACC operations since the last synchronization call. However, the UNLOCK message always needs to be sent, because the origin needs to unlock the target process. This patch avoids issuing unnecessary FLUSH / FLUSH ACK / UNLOCK ACK messages. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
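The FLUSH/UNLOCK rule above can be sketched in C as a few predicates. This is a minimal sketch, not the MPICH code: the struct `target_sync_state`, its fields, and the helper functions are hypothetical names introduced for illustration, and the sketch assumes the pending-operation and request lists are already empty, as the commit message stipulates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-target bookkeeping on the origin (not MPICH's real struct). */
struct target_sync_state {
    int pending_ops;        /* operations still queued for this target */
    int pending_requests;   /* issued operations whose requests are not complete */
    bool put_acc_issued;    /* PUT/ACC issued since the last sync call */
};

/* With nothing pending, a FLUSH (and the wait for its ACK) is needed only
 * if PUT/ACC operations were issued since the last synchronization call. */
static bool need_flush(const struct target_sync_state *t)
{
    assert(t->pending_ops == 0 && t->pending_requests == 0);
    return t->put_acc_issued;
}

/* UNLOCK is always sent, because the target must release the lock. */
static bool need_unlock_msg(const struct target_sync_state *t)
{
    (void)t;
    return true;
}

/* The origin waits for the UNLOCK ACK only after PUT/ACC operations. */
static bool need_unlock_ack(const struct target_sync_state *t)
{
    assert(t->pending_ops == 0 && t->pending_requests == 0);
    return t->put_acc_issued;
}
```

In this sketch GET-only epochs skip both acknowledgements; presumably that is safe because a read operation already completes only when its response arrives at the origin.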
-
- 03 Nov, 2014 13 commits
-
-
Xin Zhao authored
Add some original RMA PVARs back to the new RMA infrastructure, including timing of packet handlers, op allocation and setting, window creation, etc. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
We made a huge change to the RMA infrastructure, and a lot of old code can be dropped, including the separate handlers for lock-op-unlock, ACCUM_IMMED-specific code, O(p) data structure code, lazy-issuing code, etc. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
1. Piggyback the LOCK request with the first IMMED operation. When we see an IMMED operation, we can always piggyback the LOCK request with that operation, saving the separate LOCK sync message. When the packet header of that operation is received on the target, the target tries to acquire the lock and perform the operation. The target either piggybacks a LOCK_GRANTED message on the response packet (if there is one) or sends a separate LOCK_GRANTED message back to the origin.
2. Rewrite the lock queue management code. When a lock request cannot be satisfied on the target, we need to buffer that request on the target. All we need to do is enqueue the packet header, which contains all the information needed after the lock is granted. When the current lock is released, the runtime goes over the lock queue and grants the lock to the next available request. After the lock is granted, the runtime simply triggers the packet handler a second time.
3. Release the lock on the target side when piggybacked with UNLOCK. If there are active-message operations to be issued, we piggyback an UNLOCK flag on the last operation. When the target receives it, it releases the current lock and grants the lock to the next process. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
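The lock queue described in item 2 can be sketched as a small FIFO of buffered requests. This is an illustrative sketch under stated assumptions: `struct lock_req` stands in for the buffered packet header, all names are hypothetical, the queue has a fixed capacity, and grants are handed out strictly in arrival order from the head of the queue (the commit only says the runtime grants the lock to "the next available request", so the real policy may differ).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum lock_type { LOCK_SHARED, LOCK_EXCLUSIVE };

/* Hypothetical stand-in for a buffered packet header: after the lock is
 * granted, this is all the information needed to process the request. */
struct lock_req {
    int origin_rank;
    enum lock_type type;
};

#define QUEUE_CAP 16

struct target_lock {
    int shared_holders;               /* processes holding a shared lock */
    bool exclusive_held;              /* true while an exclusive lock is held */
    struct lock_req queue[QUEUE_CAP]; /* FIFO ring buffer of waiting requests */
    size_t head, count;
    int granted[QUEUE_CAP];           /* log of granted ranks, for illustration */
    size_t ngranted;
};

static bool can_grant(const struct target_lock *l, enum lock_type t)
{
    if (l->exclusive_held)
        return false;
    return t == LOCK_SHARED || l->shared_holders == 0;
}

static void grant(struct target_lock *l, struct lock_req r)
{
    if (r.type == LOCK_EXCLUSIVE)
        l->exclusive_held = true;
    else
        l->shared_holders++;
    /* The real runtime would run the packet handler a second time here;
     * this sketch only records which origin was granted the lock. */
    if (l->ngranted < QUEUE_CAP)
        l->granted[l->ngranted++] = r.origin_rank;
}

/* A lock request arrives. Returns true if it was granted immediately,
 * false if it was buffered in the queue. */
static bool lock_request(struct target_lock *l, struct lock_req r)
{
    if (l->count == 0 && can_grant(l, r.type)) {
        grant(l, r);
        return true;
    }
    assert(l->count < QUEUE_CAP);
    l->queue[(l->head + l->count) % QUEUE_CAP] = r;
    l->count++;
    return false;
}

/* The current holder releases its lock; waiting requests at the head of
 * the queue are granted for as long as they are compatible. */
static void lock_release(struct target_lock *l, enum lock_type t)
{
    if (t == LOCK_EXCLUSIVE)
        l->exclusive_held = false;
    else
        l->shared_holders--;

    while (l->count > 0 && can_grant(l, l->queue[l->head].type)) {
        struct lock_req next = l->queue[l->head];
        l->head = (l->head + 1) % QUEUE_CAP;
        l->count--;
        grant(l, next);
    }
}
```

Keeping new requests behind any queued ones (the `l->count == 0` test) avoids starving an exclusive requester behind a stream of shared ones; that choice belongs to the sketch, not to the commit.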
-
Xin Zhao authored
Here we extract the common code of different issuing functions at origin side and simplify those issuing functions. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
We add an IMMED data area (16 bytes by default) to the packet header, which holds as much origin data as possible. If the origin can fit all data in the packet header, it no longer needs to send a separate data packet. When the target receives the packet header, it first copies data out of the IMMED data area. If more data is coming, it continues to receive the following packets; if all data was included in the header, receiving is done. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
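A sketch of how the IMMED area splits data between the header and follow-up packets. The struct layout, `IMMED_LEN`, and the helper names are assumptions for illustration (only the 16-byte default comes from the commit message), and real packets carry datatype information rather than a plain byte count.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical size matching the 16-byte default mentioned above. */
#define IMMED_LEN 16

/* Hypothetical, simplified RMA packet header. */
struct rma_pkt_hdr {
    int count;                       /* total bytes of origin data */
    unsigned char immed[IMMED_LEN];  /* first bytes of origin data */
};

/* Origin side: copy as much data as fits into the header. Returns the
 * number of bytes that still have to be sent in following packets. */
static size_t pack_immed(struct rma_pkt_hdr *hdr, const void *data, size_t len)
{
    size_t n = len < IMMED_LEN ? len : IMMED_LEN;
    hdr->count = (int)len;
    memcpy(hdr->immed, data, n);
    return len - n;
}

/* Target side: copy data out of the IMMED area first. Returns the number
 * of bytes still expected (0 means receiving is already done). */
static size_t unpack_immed(const struct rma_pkt_hdr *hdr, void *buf)
{
    assert(hdr->count >= 0);
    size_t len = (size_t)hdr->count;
    size_t n = len < IMMED_LEN ? len : IMMED_LEN;
    memcpy(buf, hdr->immed, n);
    return len - n;
}
```

For an 8-byte PUT both functions return 0, so no data packet follows; a 40-byte PUT leaves 24 bytes for the following packets.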
-
Xin Zhao authored
During PSCW, when there are active-message operations to be issued in Win_complete, we piggyback an AT_COMPLETE flag with them, so that when the target receives it, it can decrement a counter on the target side and detect completion when the target counter reaches zero. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
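The target-side completion counter for PSCW can be sketched as follows. The names are hypothetical, and initializing the counter to the size of the post group in a `win_post`-style call is an assumption; the commit message only says the counter is decremented on AT_COMPLETE and completion is detected when it reaches zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical target-side state for one PSCW exposure epoch. */
struct exposure_epoch {
    int remaining;   /* origins that have not signalled completion yet */
};

/* Assumed: the counter starts at the number of origins in the post group. */
static void win_post(struct exposure_epoch *e, int group_size)
{
    e->remaining = group_size;
}

/* Handler for a packet carrying the AT_COMPLETE flag.
 * Returns true when the whole epoch is complete. */
static bool on_at_complete(struct exposure_epoch *e)
{
    assert(e->remaining > 0);
    return --e->remaining == 0;
}

/* Win_test-style check: have all origins completed? */
static bool win_test(const struct exposure_epoch *e)
{
    return e->remaining == 0;
}
```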
-
Xin Zhao authored
When the origin wants to do a FLUSH sync, if there are active-message operations to be issued, we piggyback the FLUSH message on the last operation; if there are no such operations, we just send a single FLUSH packet. If the last operation is a write op (PUT, ACC) or only a single FLUSH packet is sent, then after the target receives it, the target sends back a single FLUSH_ACK packet; if the last operation contains a read action (GET, GACC, FOP, CAS), then after the target receives it, the target piggybacks a FLUSH_ACK flag on the response packet. After the origin receives the FLUSH_ACK packet or a response packet with the FLUSH_ACK flag, it decrements the counter of outstanding sync messages (FLUSH / UNLOCK). When that counter reaches zero, the origin knows that remote completion has been achieved. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
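Both halves of the FLUSH protocol above, the target's choice of how to acknowledge and the origin's outstanding-message counter, can be sketched in C. The enum, struct, and function names are hypothetical; the sketch only encodes the rules stated in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operation kinds; OP_NONE means a standalone FLUSH packet. */
enum op_kind { OP_NONE, OP_PUT, OP_ACC, OP_GET, OP_GACC, OP_FOP, OP_CAS };

/* Target side: decide whether the FLUSH_ACK travels as a flag on the
 * response packet instead of as a separate FLUSH_ACK packet. */
static bool ack_piggybacked_on_response(enum op_kind last_op)
{
    switch (last_op) {
    case OP_GET: case OP_GACC: case OP_FOP: case OP_CAS:
        return true;    /* read action: a response packet is sent anyway */
    default:
        return false;   /* PUT/ACC or standalone FLUSH: separate FLUSH_ACK */
    }
}

/* Origin side: counter of sync messages (FLUSH / UNLOCK) awaiting ACK. */
struct origin_sync {
    int outstanding;
};

static void sync_sent(struct origin_sync *s)
{
    s->outstanding++;
}

/* Called on a FLUSH_ACK packet or a response carrying the FLUSH_ACK flag.
 * Returns true once remote completion has been achieved. */
static bool ack_received(struct origin_sync *s)
{
    assert(s->outstanding > 0);
    return --s->outstanding == 0;
}
```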
-
We were duplicating information in the operation structure and in the packet structure when the message is actually issued. Since most of the information is the same anyway, this patch just embeds a packet structure into the operation structure, so that we eliminate an unnecessary copy. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
The packet type MPIDI_CH3_PKT_PT_RMA_DONE is used for ACK of FLUSH / UNLOCK packets. Here we rename it to MPIDI_CH3_PKT_FLUSH_ACK and modify the related functions and data structures. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
We were adding an unnecessary dependency on VC structure declarations in the mpidpkt.h file. The RMA lock queue only needs the rank, not the actual VC, so here we replace the VC with the rank. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Split RMA functionality into smaller files, and move functions to where they belong based on the file names. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Because we are going to rewrite the RMA infrastructure and many PVARs will no longer be used, here we temporarily remove all PVARs and will add needed PVARs back after new implementation is done. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
- 01 Nov, 2014 1 commit
-
-
The original implementation included an optimization that allowed Win_unlock for an exclusive lock to return without waiting for remote completion. This relies on the assumption that window memory on the target process will not be accessed by a third party until that target process finishes all RMA operations and grants the lock to other processes. However, this assumption does not hold if the user passes the assert MPI_MODE_NOCHECK. Consider the following code:

    P0                                   P1    P2
    MPI_Win_lock(P1, NULL, exclusive);
    MPI_Put(X);
    MPI_Win_unlock(P1, exclusive);
    MPI_Send(P2);                              MPI_Recv(P0);
                                               MPI_Win_lock(P1, MODE_NOCHECK, exclusive);
                                               MPI_Get(X);
                                               MPI_Win_unlock(P1, exclusive);

Both P0 and P2 issue an exclusive lock to P1, and P2 uses the assert MPI_MODE_NOCHECK because the lock should be granted to P2 after the synchronization between P2 and P0. However, in the original implementation, the GET operation on P2 might not see the updated value, since Win_unlock on P0 returns without waiting for remote completion. In this patch we delete this optimization. In Win_free, since every Win_unlock now guarantees remote completion, the target process no longer needs additional counting to detect target-side completion; it only needs to do a global barrier. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
- 30 Oct, 2014 1 commit
-
-
Xin Zhao authored
No reviewer.
-
- 20 Oct, 2014 1 commit
-
-
Pavan Balaji authored
We were not setting the function states correctly in a bunch of functions. Modifications by Wesley to split up big commit. Signed-off-by:
Wesley Bland <wbland@anl.gov> Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
- 27 Aug, 2014 1 commit
-
-
Norio Yamaguchi authored
After one thread finishes processing all operations in the ops list, a new RMA operation may be enqueued by another thread during MPID_Progress_wait(). In such a case, that operation has not been issued yet, and we should avoid processing it at the end of synchronization calls. This situation occurred when running test/mpi/threads/rma/multirma.c. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
- 21 Jul, 2014 1 commit
-
-
Pavan Balaji authored
This is to help with debugging. Zero is too common a value, and is often set automatically by the system if not initialized. Starting at a different value helps us catch uninitialized cases more easily. We pick "42" as our magic number as it is the answer to the ultimate question of life, the Universe, and everything. Signed-off-by:
Wesley Bland <wbland@anl.gov>
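The commit message does not say which values are affected; assuming it concerns enumerations such as packet types, the idea looks like the sketch below. All names are hypothetical. Because no valid value is 0, a zero-filled, never-initialized field fails the range check instead of silently passing as the first valid value.

```c
#include <assert.h>

/* Hypothetical packet types; the real MPICH names differ. */
enum pkt_type {
    PKT_TYPE_FIRST = 42,        /* start away from 0 to catch uninitialized fields */
    PKT_PUT = PKT_TYPE_FIRST,
    PKT_GET,
    PKT_ACCUMULATE,
    PKT_TYPE_LAST               /* one past the last valid value */
};

static int pkt_type_is_valid(int t)
{
    return t >= PKT_TYPE_FIRST && t < PKT_TYPE_LAST;
}

/* Debug check: a zero-filled (uninitialized) header trips this assertion. */
static void check_pkt_type(int t)
{
    assert(pkt_type_is_valid(t) && "uninitialized or corrupt packet type");
}
```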
-
- 18 Jul, 2014 1 commit
-
-
Pavan Balaji authored
This reverts commit 274a5a70.
-
- 17 Jul, 2014 1 commit
-
-
Pavan Balaji authored
We were duplicating information in the operation structure and in the packet structure when the message is actually issued. Since most of the information is the same anyway, this patch just embeds a packet structure into the operation structure. Signed-off-by:
Xin Zhao <xinzhao3@illinois.edu>
-
- 17 Dec, 2013 1 commit
-
-
Junchao Zhang authored
Fixes #1962 Signed-off-by: Junchao Zhang <jczhang@mcs.anl.gov> (Reviewed by Bill Gropp)
-
- 15 Nov, 2013 1 commit
-
-
Antonio J. Pena authored
Fixes the following warnings: PGC-W-0095-Type cast required for this conversion (./src/mpid/ch3/include/mpidrma.h: 703) PGC-W-0095-Type cast required for this conversion (./src/mpid/ch3/include/mpidrma.h: 864) Signed-off-by:
Wesley Bland <wbland@mcs.anl.gov>
-
- 31 Oct, 2013 1 commit
-
-
Also includes random fixes to `-Wshorten-64-to-32` warnings which might need to be teased out. Signed-off-by:
Pavan Balaji <balaji@mcs.anl.gov>
-
- 27 Sep, 2013 1 commit
-
-
Pavan Balaji authored
Optimize the case where the origin and target both use basic datatypes. In this case, we assume that the data is aligned correctly for the appropriate datatype and perform a direct assignment instead of a memory copy. Signed-off-by:
Xin Zhao <xinzhao3@illinois.edu>
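A sketch of the basic-datatype fast path: when origin and target use the same basic type, the copy is done by typed assignment on the assumption that both buffers are correctly aligned; otherwise it falls back to a byte copy. The `dtype` enum and the `local_copy` signature are hypothetical stand-ins for MPI datatypes and the localcopy routine, and the `memcpy` fallback simplifies what a datatype engine would do for derived types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical datatype tags; DT_OTHER stands in for derived types. */
enum dtype { DT_INT, DT_DOUBLE, DT_OTHER };

static int is_basic(enum dtype t)
{
    return t == DT_INT || t == DT_DOUBLE;
}

/* count is the element count for the fast path; bytes is the total size,
 * used only by the fallback path. */
static void local_copy(void *dst, enum dtype dt, const void *src,
                       enum dtype st, size_t count, size_t bytes)
{
    assert(dst != NULL && src != NULL);
    if (is_basic(dt) && is_basic(st) && dt == st) {
        /* Fast path: assume both buffers are aligned for the type and
         * assign element by element instead of calling memcpy. */
        if (st == DT_INT) {
            int *d = dst;
            const int *s = src;
            for (size_t i = 0; i < count; i++)
                d[i] = s[i];
        } else {
            double *d = dst;
            const double *s = src;
            for (size_t i = 0; i < count; i++)
                d[i] = s[i];
        }
        return;
    }
    /* Fallback: plain byte copy (a real implementation would pack and
     * unpack through the datatype engine). */
    memcpy(dst, src, bytes);
}
```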
-
- 26 Sep, 2013 2 commits
-
-
Pavan Balaji authored
The check was originally in the ch3 layer, but doesn't seem to use any ch3-specific information. This macro will be useful in the upper layers for optimizations, e.g., in the localcopy routine. Signed-off-by:
Xin Zhao <xinzhao3@illinois.edu>
-
Pavan Balaji authored
We already do a check for shared memory before calling the shared-memory specific functions. This patch simplifies some of those redundant checks. Signed-off-by:
Xin Zhao <xinzhao3@illinois.edu>
-
- 01 Aug, 2013 3 commits
-
-
Pavan Balaji authored
-
When judging whether the origin and target processes are on the same node, use the vc->node_id flag instead of the vc->ch.is_local flag. The 'is_local' flag is not correct here because it is defined in nemesis, not in CH3, whereas 'node_id' is defined in CH3. Note that for ch3:sock, even if the origin and target are on the same node, they are not within the same SHM region. Currently ch3:sock is filtered out by checking the shm_allocated flag first. In the future we need to figure out a way to check whether the origin and target are within the same "SHM comm". Signed-off-by:
Pavan Balaji <balaji@mcs.anl.gov>
-
Because if MPI_WIN_FLAVOR_SHARED is used in ch3:sock, it will allocate normal memory instead of shared memory, therefore shm_base_addrs will not be used. Signed-off-by:
Pavan Balaji <balaji@mcs.anl.gov>
-
- 28 Jul, 2013 2 commits
-
-
If shared memory is allocated for window, and target vc is local, do SHM RMA operations. Signed-off-by:
Pavan Balaji <balaji@mcs.anl.gov>
-
The code in the inline functions is moved from the operation routines in ch3u_rma_ops.c and ch3u_rma_acc_ops.c. By moving it into inline functions, both the operation routines and the synchronization routines can call it. Signed-off-by:
Pavan Balaji <balaji@mcs.anl.gov>
-