- 13 Feb, 2015 4 commits
-
-
Xin Zhao authored
In this patch, we replace one argument of function finish_op_on_target, "packet(op) type", with "has_response_data". Since finish_op_on_target does not care what specific packet(op) type it is processing on, but only cares about if the current op has response data (like GET/GACC), changing the argument in this way can simplify the code by avoiding acquiring packet(op) type everytime before calling finish_op_on_target. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
Originally we add "immed_data" and "immed_len" areas to RMA packets, in order to piggyback small amount of data with packet header to reduce number of packets (Note that "immed_len" is necessary when the piggybacked data is not the entire data). However, those areas potentially increase the packet union size and worsen the two-sided communication. This patch fixes this issue. In this patch, we remove "immed_data" and "immed_len" from normal "MPIDI_CH3_Pkt_XXX_t" operation type (e.g. MPIDI_CH3_Pkt_put_t), and we introduce new "MPIDI_CH3_Pkt_XXX_immed_t" packt type for each operation (e.g. MPIDI_CH3_Pkt_put_immed_t). "MPIDI_CH3_Pkt_XXX_immed_t" is used when (1) both origin and target are basic datatypes, AND, (2) the data to be sent can be entirely fit into the header. By doing this, "MPIDI_CH3_Pkt_XXX_immed_t" needs "immed_data" area but can drop "immed_len" area. Also, since it only works with basic target datatype, it can drop "dataloop_size" area as well. All operations that do not satisfy (1) or (2) will use normal "MPIDI_CH3_Pkt_XXX_t" type. Originally we always piggyback FOP data into the packet header, which makes the packet size too large. In this patch we split the FOP operaton into IMMED packets and normal packets. Because CAS only work with 2 basic datatype and non-complex elements, the data amount is relatively small, we always piggyback the data with packet header and only use "MPIDI_CH3_Pkt_XXX_immed_t" packet type for CAS. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
Originally we added lock_type and origin_rank areas in RMA packet, in order to piggyback passive lock request with RMA operations. However, those areas potentially enlarged the packet union size, and actually they are not necessary and can be completetly avoided. "Lock_type" is used to remember what types of lock (shared or exclusive) the origin wants to acquire on the target. To remove it from RMA packet, we use flags (already exists in RMA packet) to remember such information. "Origin_rank" is used to remember which origin has sent lock request to the target, so that when the lock is granted to this origin later, the target can send ack to that origin. Actually the target does not need to store origin_rank but can only store origin_vc, which is known from progress engine on target side. Therefore, we can completely remove origin_rank from RMA packet. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
- 08 Feb, 2015 1 commit
-
-
Xin Zhao authored
FOP, CAS and GACC are atomic "read-modify-write" operations, which means when the target window is defined on a SHM region, we need inter-process lock to guarantee the atomicity of the entire "read+OP". The current implementation is correct for SHM-based RMA operations, but not correct for AM-based RMA operations: for SHM-based operations, it protects the entire "read+OP", but for AM-based operations, it only protects the "OP" part. This patch fixes this issue by protecting the memory copy to temporary buffer and computation together for AM-based operations. Fix ticket 2226 Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
- 16 Dec, 2014 14 commits
-
-
Xin Zhao authored
When data is dropped but lock is queued, we should still store the lock entry in current request, so that we can try to acquire the lock when we received and dropped all data. No reviewer.
-
Xin Zhao authored
Here we should first dequeue the current lock queue entry from lock queue then performing the operation in it. This is because when performing op in current lock entry, we may trigger release_lock() function, which go to check the lock queue again. If we did not remove current entry from the queue, release_lock() will try to process it for the second time, which leads to the wrong execution. No reviewer.
-
Xin Zhao authored
No reviewer.
-
Xin Zhao authored
No reviewer.
-
Xin Zhao authored
No reviewer.
-
Xin Zhao authored
The behavior of UNLOCK_ACK flag is exactly the same with the behavior of FLUSH_ACK, so here we just delete UNLOCK_ACK flag and use FLUSH_ACK flag for all FLUSH ACK packets. No reviewer.
-
Xin Zhao authored
No reviewer.
-
Xin Zhao authored
Because we will send different kinds of LOCK ACKs (not just LOCK_GRANTED, but maybe LOCK_DISCARDED, for example), so naming related packets and function as "LOCK_GRANTED" is not proper anymore. Here we rename them to "LOCK_ACK". No reviewer.
-
Xin Zhao authored
No reviewer.
-
Xin Zhao authored
No reviewer.
-
Xin Zhao authored
No reviewer.
-
Xin Zhao authored
Use int instead of size_t in RMA pkt header to reduce packet size. No reviewer.
-
Xin Zhao authored
In this patch we allow GET/GACC response packets to piggyback some IMMED data, just like what we did for PUT/GACC/FOP/CAS packets. No reviewer.
-
Xin Zhao authored
Originally we only allows LOCK request to be piggybacked with small RMA operations (all data can be fit in packet header). This brings communication overhead for larger operations since origin side needs to wait for the LOCK ACK before it can transmit data to the target. In this patch we add support of piggybacking LOCK with RMA operations with arbitrary size. Note that (1) this only works with basic datatypes; (2) if the LOCK cannot be satisfied, we temporarily buffer this operation on the target side. No reviewer.
-
- 24 Nov, 2014 1 commit
-
-
Xin Zhao authored
It is possible that a request handler of RMA request is called for the second time inside the first called request handler on the same request. Consider the following case: a req is queued up in Nemesis SHM queue with ref count of 2: one is for request completion and another is for dequeueing from SHM queue. The first called req handler completed this request and decrement ref count to 1. This request is still in the queue. However, within this handler, we trigger the same req handler on the same request again (for example making progress on SHM queue), and the second called handler also tries to complete this request, which leads to the wrong execution. In this patch we check if request has already been completed when entering the req handler, to prevent processing the same request twice. We also move the function finish_op_on_target() (where the same req handler can be triggered again) after request completion routine, so that we can mark the current request as completed before enter the same req handler for the second time. Fix #2204 Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
- 13 Nov, 2014 3 commits
-
-
Xin Zhao authored
ReqHandler_GaccumLikeSendComplete is used for GACC-like operations, including GACC, CAS and FOP. Here we split it into following three functions: ReqHandler_GaccumSendComplete ReqHandler_CASSendComplete ReqHandler_FOPSendComplete It is convenient for us to add different actions in future for those three kinds of operations. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
Here we wrap up common action when one RMA op is finished on target into a function to make code structure cleaner. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
Originally do_accumulate_op() only accepts request pointer as argument which is too restrict to be reused. Here we modify it to access buffer address, count, datatype and op, so that it can be reused in more general cases. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
- 03 Nov, 2014 11 commits
-
-
Xin Zhao authored
We made a huge change to RMA infrastructure and a lot of old code can be droped, including separate handlers for lock-op-unlock, ACCUM_IMMED specific code, O(p) data structure code, code of lazy issuing, etc. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
1. Piggyback LOCK request with first IMMED operation. When we see an IMMED operation, we can always piggyback LOCK request with that operation to reduce one sync message of single LOCK request. When packet header of that operation is received on target, we will try to acquire the lock and perform that operation. The target either piggybacks LOCK_GRANTED message with the response packet (if available), or sends a single LOCK_GRANTED message back to origin. 2. Rewrite code of manage lock queue. When the lock request cannot be satisfied on target, we need to buffer that lock request on target. All we need to do is enqueuing the packet header, which contains all information we need after lock is granted. When the current lock is released, the runtime will goes over the lock queue and grant the lock to the next available request. After lock is granted, the runtime just trigger the packet handler for the second time. 3. Release lock on target side if piggybacking with UNLOCK. If there are active-message operations to be issued, we piggyback a UNLOCK flag with the last operation. When the target recieves it, it will release the current lock and grant the lock to the next process. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
For FOP operation, all data can be fit into the packet header, so on origin side we do not need to send separate data packets, and on target side we do not need request handler, only packet handler is needed. Similar with FOP response packet, we can receive all data in FOP resp packet handler. This patch delete the request handler on target side and simplify packet handler on target / origin side. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
We add a IMMED data area (16 bytes by default) in packet header which will contains as much origin data as possible. If origin can put all data in packet header, then it no longer needs to send separate data packet. When target recieves the packet header, it will first copy data out from the IMMED data area. If there is still more data coming, it continues to receive following packets; if all data is included in header, then recieving is done. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
During PSCW, when there are active-message operations to be issued in Win_complete, we piggback a AT_COMPLETE flag with it so that when target receives it, it can decrement a counter on target side and detect completion when target counter reaches zero. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
When the origin wants to do a FLUSH sync, if there are active-message operations that are going to be issued, we piggback the FLUSH message with the last operation; if no such operations, we just send a single FLUSH packet. If the last operation is a write op (PUT, ACC) or only a single FLUSH packet is sent, after target recieves it, target will send back a single FLUSH_ACK packet; if the last operation contains a read action (GET, GACC, FOP, CAS), after target receiveds it, target will piggback a FLUSH_ACK flag with the response packet. After origin receives the FLUSH_ACK packet or response packet with FLUSH_ACK flag, it will decrement the counter which indicates number of outgoing sync messages (FLUSH / UNLOCK). When that counter reaches zero, origin can know that remote completion is achieved. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
Separate final request handler of PUT, ACC, GACC into three. Separate derived DT request handler of ACC and GACC into two. Renaming request handlers as follows: (1) Normal request handler: it is triggered on target side when all data from origin is received. It includes: ReqHandler_PutRecvComplete --- for PUT ReqHandler_AccumRecvComplete --- for ACC ReqHandler_GaccumRecvComplete --- for GACC (2) Derived DT request handler: it is triggered on target side when all derived DT info is recieved. It includes: ReqHandler_PutDerivedDTRecvComplete --- for PUT ReqHandler_AccumDerivedDTRecvComplete --- for ACC ReqHandler_GaccumDerivedDTRecvComplete --- for GACC (3) Reponse request handler: it is triggered on target side when sending back process is finished in GET-like operations. It includes: ReqHandler_GetSendComplete --- for GET ReqHandler_GaccumLikeSendComplete --- for GACC, FOP, CAS Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
We were duplicating information in the operation structure and in the packet structure when the message is actually issued. Since most of the information is the same anyway, this patch just embeds a packet structure into the operation structure, so that we eliminate unnessary copy. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
The packet type MPIDI_CH3_PKT_PT_RMA_DONE is used for ACK of FLUSH / UNLOCK packets. Here we rename it to MPIDI_CH3_PKT_FLUSH_ACK and modify the related functions and data structures. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
We were adding an unnecessary dependency on VC structure declarations in the mpidpkt.h file. The required information in RMA lock queue is only the rank, but not actual VC. Here we replace VC with rank. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Split RMA functionality into smaller files, and move functions to where they belong based on the file names. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
- 01 Nov, 2014 2 commits
-
-
Xin Zhao authored
req->dev.user_buf points to the data sent from origin process to target process, and for FOP sometimes it points to the IMMED area in packet header when data can be fit in packet header. In such case, we should not free req->dev.user_buf in final request handler since that data area will be freed by the runtime when packet header is freed. In this patch we initialize user_buf to NULL when creating the request, and set it to NULL when FOP is completed, and avoid free a NULL pointer in final request handler. Signed-off-by:
Min Si <msi@il.is.s.u-tokyo.ac.jp>
-
The original implementation includes an optimization which allows Win_unlock for exclusive lock to return without waiting for remote completion. This relys on the assumption that window memory on target process will not be accessed by a third party until that target process finishes all RMA operations and grants the lock to other processes. However, this assumption is not correct if user uses assert MPI_MODE_NOCHECK. Consider the following code: P0 P1 P2 MPI_Win_lock(P1, NULL, exclusive); MPI_Put(X); MPI_Win_unlock(P1, exclusive); MPI_Send (P2); MPI_Recv(P0); MPI_Win_lock(P1, MODE_NOCHECK, exclusive); MPI_Get(X); MPI_Win_unlock(P1, exclusive); Both P0 and P2 issue exclusive lock to P1, and P2 uses assert MPI_MODE_NOCHECK because the lock should be granted to P2 after synchronization between P2 and P0. However, in the original implementation, GET operation on P2 might not get the updated value since Win_unlock on P0 return without waiting for remote completion. In this patch we delete this optimization. In Win_free, since every Win_unlock guarantees the remote completion, target process no longer needs to do additional counting works to detect target-side completion, but only needs to do a global barrier. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
- 01 Oct, 2014 3 commits
-
-
Xin Zhao authored
at_completion_counter is used to indicate if all Active Target operations have completed on this target. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
For GET-like operations, We should increment the Active Target counter when the process of sending back data is not completed immediately on target and a response request is created. We should decrement the counter when the process of sending back data is completed on target side. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
In the original implementation, for GACC/FOP/CAS, the function MPIDI_CH3_Finish_rma_op_target (includes operations that should be performed on target when that operation finishes on target) is not called when that operation real finishes, but is called after starting send back data. Here we fix it to make the function called after sending process on target is completed. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
- 28 Sep, 2014 1 commit
-
-
Xin Zhao authored
For Active Target synchronization, the original implementation does not guarantee the completion of all ops on target side when Win_wait / Win_fence returns. It is implemented using a counter, which is decremented when the last operation from that origin finishes. Win_wait / Win_fence waits until that counter reaches zero. Problem is that, when the last operation finishes, the previous GET-like operation (for example with a large data volume) may have not finished yet. This breaks the semantic of Win_wait / Win_fence. Here we fix this by increment the counter whenever we meet a GET-like operation, and decrement it when that operation finishes on target side. This will guarantee that when counter reaches zero and Win_wait / Win_fence returns, all operations are completed on the target. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-