- 13 Nov, 2014 1 commit
-
-
Xin Zhao authored
When the operation pending list and request lists are all empty, a FLUSH message needs to be sent by the origin only if the origin issued PUT/ACC operations since the last synchronization call; otherwise the origin does not need to issue FLUSH at all and does not need to wait for the FLUSH ACK message. Similarly, the origin waits for the ACK of the UNLOCK message only if it issued PUT/ACC operations since the last synchronization call. However, the UNLOCK message always needs to be sent out because the origin needs to unlock the target process. This patch avoids issuing unnecessary FLUSH / FLUSH ACK / UNLOCK ACK messages. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
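A minimal sketch of the decision logic described above, assuming a hypothetical per-target flag put_acc_issued and an outstanding-ACK counter (both names are invented here, not the actual CH3 fields):

    #include <stdbool.h>

    /* Hypothetical per-target bookkeeping (illustrative only). */
    typedef struct rma_target {
        bool put_acc_issued;     /* set when a PUT/ACC was issued since the last sync */
        int  outstanding_acks;   /* ACKs (FLUSH / UNLOCK) we still wait for */
    } rma_target_t;

    /* Flush: only needed if a write op went out since the last sync. */
    static void do_flush(rma_target_t *t)
    {
        if (t->put_acc_issued) {
            /* send_flush_pkt(t);  -- hypothetical send helper */
            t->outstanding_acks++;       /* wait for the FLUSH ACK */
        }
        /* else: nothing was written, skip FLUSH and its ACK entirely */
    }

    /* Unlock: always send UNLOCK, but only wait for its ACK after writes. */
    static void do_unlock(rma_target_t *t)
    {
        /* send_unlock_pkt(t);  -- hypothetical, always required */
        if (t->put_acc_issued)
            t->outstanding_acks++;       /* UNLOCK ACK needed for remote completion */
        t->put_acc_issued = false;       /* reset for the next epoch */
    }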
-
- 11 Nov, 2014 5 commits
-
-
Min Si authored
We should never change the ADI that is exposed to the MPI layer for a CH3-internal implementation. However, commit 3e005f03 changed the ADI of put/get/accumulate/get_accumulate in order to reuse the routines of normal RMA operations in request-based operations. This patch defines new CH3-internal functions for put/get/accumulate/get_accumulate that are reused by both normal and request-based operations, and reverts the ADI change made in commit 3e005f03. Signed-off-by:
Xin Zhao <xinzhao3@illinois.edu> Signed-off-by:
Junchao Zhang <jczhang@mcs.anl.gov>
-
We already use window states to specify the current state of the RMA epoch, therefore the epoch states are no longer used. Here we delete those states. Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
For the lock type, we only need one internal value to specify the case where there is currently no passive lock issued from the origin side or no passive lock imposed on the target side. If there are passive locks, we directly use MPI_LOCK_SHARED and MPI_LOCK_EXCLUSIVE to indicate the lock type. This patch deletes the redundant enum for lock types and just defines MPID_LOCK_NONE. Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
This is helpful for finding variables that are not initialized or are wrongly initialized. Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
MPIDI_RMA_NONE is the initial value of the window state and should not be used for the sync flag. The initial value of the sync flag should be MPIDI_RMA_SYNC_NONE. Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
- 07 Nov, 2014 1 commit
-
-
Xin Zhao authored
num_active_issued_win and num_passive_win are counters of windows in active ISSUED mode and in passive mode. They are modified in CH3 and used in the progress engine of nemesis / sock to skip windows that do not need progress. Here we define them in mpidi_ch3_pre.h in nemesis / sock so that they can be exposed to the upper layers. Signed-off-by:
Min Si <msi@il.is.s.u-tokyo.ac.jp>
-
- 04 Nov, 2014 1 commit
-
-
Min Si authored
There are two requests associated with each request-based operation: one normal internal request (req) and one newly added user request (ureq). We return ureq to the user when the request-based op call returns. The ureq is initialized with the completion counter (CC) set to 1 and the reference count set to 2 (one reference held by CH3 and one by the user). If the corresponding op can be finished immediately in CH3, the runtime completes ureq in CH3 and lets the user's MPI_Wait/Test destroy ureq. If the corresponding op cannot be finished immediately, we first increment the reference count to 3 (because ureq is now referenced in three places: user, CH3 and the progress engine). The progress engine completes ureq when the op is completed, then CH3 releases its reference during garbage collection, and finally the user's MPI_Wait/Test destroys ureq. The ureq can be completed in the following three ways:
1. If the op is issued and completed immediately in CH3 (req is NULL), we just complete ureq before freeing the op.
2. If the op is issued but not completed, we remember the ureq handle in req and set the OnDataAvail / OnFinal handlers in req to a newly added request handler, which completes the user request. The handler is triggered at three places:
2-a. when the progress engine completes a put/acc req;
2-b. when the get/getacc handler completes a get/getacc req;
2-c. when the progress engine completes a get/getacc req.
3. If the op is not issued (i.e., it is waiting for the lock to be granted), the second way is eventually performed when the op is issued by the progress engine. Signed-off-by:
Xin Zhao <xinzhao3@illinois.edu>
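The reference-count handling can be sketched as follows; the types, field names and helper functions below are invented for illustration and are not the actual CH3 request API:

    /* Illustrative stand-in for the real user request object. */
    typedef struct user_request {
        int cc;       /* completion counter, 1 until the op completes */
        int refcount; /* user + CH3 (+ progress engine when not immediate) */
    } user_request_t;

    static void ureq_release_ref(user_request_t *ureq)
    {
        if (--ureq->refcount == 0) {
            /* last reference gone: the object can be destroyed here */
        }
    }

    static void ureq_complete(user_request_t *ureq)
    {
        ureq->cc = 0;              /* MPI_Wait/Test on ureq can now return */
    }

    /* Called when a request-based op (Rput/Racc/...) is posted. */
    static void start_request_based_op(user_request_t *ureq, int finished_immediately)
    {
        ureq->cc = 1;
        ureq->refcount = 2;        /* one reference for the user, one for CH3 */

        if (finished_immediately) {
            /* case 1: op done inside CH3, complete ureq right away;
             * the user's MPI_Wait/Test drops the last reference later */
            ureq_complete(ureq);
            ureq_release_ref(ureq);   /* CH3 drops its reference */
        } else {
            /* cases 2/3: the progress engine also holds a reference until
             * its OnDataAvail/OnFinal handler completes the user request */
            ureq->refcount++;         /* now 3: user, CH3, progress engine */
        }
    }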
-
- 03 Nov, 2014 32 commits
-
-
Xin Zhao authored
Add some original RMA PVARs back to the new RMA infrastructure, including timing of packet handlers, op allocation and setting, window creation, etc. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
We made a huge change to the RMA infrastructure and a lot of old code can be dropped, including the separate handlers for lock-op-unlock, the ACCUM_IMMED specific code, the O(p) data structure code, the lazy-issuing code, etc. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
1. Piggyback the LOCK request with the first IMMED operation. When we see an IMMED operation, we can always piggyback the LOCK request with that operation, saving the sync message for a standalone LOCK request. When the packet header of that operation is received on the target, we try to acquire the lock and perform the operation. The target either piggybacks a LOCK_GRANTED message with the response packet (if available), or sends a separate LOCK_GRANTED message back to the origin.
2. Rewrite the lock queue management code. When the lock request cannot be satisfied on the target, we need to buffer that lock request on the target. All we need to do is enqueue the packet header, which contains all the information we need after the lock is granted. When the current lock is released, the runtime goes over the lock queue and grants the lock to the next available request. After the lock is granted, the runtime just triggers the packet handler a second time.
3. Release the lock on the target side if UNLOCK is piggybacked. If there are active-message operations to be issued, we piggyback an UNLOCK flag with the last operation. When the target receives it, it releases the current lock and grants the lock to the next process. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
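A rough sketch of the target-side handling of a piggybacked lock, with invented flag and structure names (the real CH3 packet layout and lock queue differ):

    #include <stddef.h>

    /* Hypothetical packet header flags and lock-queue entry. */
    enum { PKT_FLAG_LOCK = 0x1, PKT_FLAG_UNLOCK = 0x2, PKT_FLAG_GRANTED = 0x4 };

    typedef struct pkt_header {
        int flags;                 /* LOCK / UNLOCK piggybacked on the op */
        int lock_type;             /* MPI_LOCK_SHARED / MPI_LOCK_EXCLUSIVE */
        struct pkt_header *next;   /* link for the target-side lock queue */
    } pkt_header_t;

    typedef struct lock_queue {
        pkt_header_t *head, *tail;
    } lock_queue_t;

    /* Target side: try to satisfy a piggybacked LOCK; if the window is
     * already locked, buffer the whole header and retry after release. */
    static int handle_pkt(lock_queue_t *q, pkt_header_t *pkt, int lock_available)
    {
        if ((pkt->flags & PKT_FLAG_LOCK) && !lock_available) {
            pkt->next = NULL;                       /* defer: enqueue the header only */
            if (q->tail) q->tail->next = pkt; else q->head = pkt;
            q->tail = pkt;
            return 0;                               /* op is performed later */
        }
        /* lock acquired (or not needed): perform the op, then either
         * piggyback LOCK_GRANTED on the response or send it separately */
        return 1;
    }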
-
Xin Zhao authored
We must make the initial value of the enum zero because some places compute the number of packet types from the ending type value. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
Rearrange the ordering of packet types so that all RMA issuing types are placed together. This is convenient when we check whether the packets currently involved are all RMA packets. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
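For illustration, a hypothetical packet-type layout showing why a contiguous block of RMA types (and an enum starting at zero) makes such checks cheap; the names and ordering are not the actual CH3 values:

    /* Keeping all RMA issuing types contiguous turns "is this an RMA
     * packet?" into a simple range check. */
    enum pkt_type {
        PKT_PUT = 0,          /* first RMA issuing type */
        PKT_GET,
        PKT_ACCUMULATE,
        PKT_GET_ACCUM,
        PKT_FOP,
        PKT_CAS,              /* last RMA issuing type */
        PKT_EAGER_SEND,       /* non-RMA types follow */
        PKT_TYPE_COUNT        /* valid as a count only because the enum starts at 0 */
    };

    static int pkt_is_rma(enum pkt_type t)
    {
        return t >= PKT_PUT && t <= PKT_CAS;
    }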
-
Xin Zhao authored
For the FOP operation, all data fits into the packet header, so on the origin side we do not need to send separate data packets, and on the target side we do not need a request handler; only a packet handler is needed. Similarly, for the FOP response packet, we can receive all data in the FOP response packet handler. This patch deletes the request handler on the target side and simplifies the packet handlers on the target / origin sides. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
Here we extract the common code of the different issuing functions on the origin side and simplify those issuing functions. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
We add an IMMED data area (16 bytes by default) in the packet header which holds as much of the origin data as possible. If the origin can fit all data into the packet header, it no longer needs to send a separate data packet. When the target receives the packet header, it first copies data out of the IMMED data area. If there is still more data coming, it continues to receive the following packets; if all data is included in the header, then receiving is done. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
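A simplified sketch of the header-embedded IMMED area, using an invented put_pkt_t layout rather than the real CH3 packet structures:

    #include <string.h>

    #define IMMED_BYTES 16   /* default size of the header-embedded data area */

    typedef struct put_pkt {
        size_t data_sz;                  /* total payload size */
        size_t immed_len;                /* bytes carried inside the header */
        char   immed_data[IMMED_BYTES];
    } put_pkt_t;

    /* Origin side: copy as much payload as fits into the header. */
    static size_t pack_immed(put_pkt_t *pkt, const void *buf, size_t len)
    {
        pkt->data_sz = len;
        pkt->immed_len = len < IMMED_BYTES ? len : IMMED_BYTES;
        memcpy(pkt->immed_data, buf, pkt->immed_len);
        return len - pkt->immed_len;     /* bytes still needing a data packet */
    }

    /* Target side: drain the IMMED area first, then receive the rest (if any). */
    static size_t unpack_immed(const put_pkt_t *pkt, void *dest)
    {
        memcpy(dest, pkt->immed_data, pkt->immed_len);
        return pkt->data_sz - pkt->immed_len;   /* remaining bytes to receive */
    }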
-
Xin Zhao authored
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
During PSCW, when there are active-message operations to be issued in Win_complete, we piggyback an AT_COMPLETE flag with the last operation, so that when the target receives it, it can decrement a counter on the target side and detect completion when that counter reaches zero. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
When the origin wants to do a FLUSH sync, if there are active-message operations that are going to be issued, we piggyback the FLUSH message with the last operation; if there are no such operations, we just send a single FLUSH packet. If the last operation is a write op (PUT, ACC) or only a single FLUSH packet is sent, then after the target receives it, the target sends back a single FLUSH_ACK packet; if the last operation contains a read action (GET, GACC, FOP, CAS), then after the target receives it, the target piggybacks a FLUSH_ACK flag with the response packet. After the origin receives the FLUSH_ACK packet or a response packet carrying the FLUSH_ACK flag, it decrements the counter which tracks the number of outstanding sync messages (FLUSH / UNLOCK). When that counter reaches zero, the origin knows that remote completion has been achieved. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
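The counter handling can be summarized with a small sketch; sync_state_t and the helper names are assumptions of this illustration, not CH3 identifiers:

    /* Hypothetical per-target counter of outstanding sync messages
     * (FLUSH / UNLOCK) awaiting acknowledgement. */
    typedef struct sync_state {
        int outstanding_sync_msgs;
    } sync_state_t;

    /* Origin: called whenever a FLUSH (piggybacked or standalone) or an
     * UNLOCK that requires an ACK is sent to this target. */
    static void sync_msg_sent(sync_state_t *s)
    {
        s->outstanding_sync_msgs++;
    }

    /* Origin: called when a FLUSH_ACK packet, or a response packet carrying
     * the FLUSH_ACK flag, arrives from this target. */
    static int sync_ack_received(sync_state_t *s)
    {
        return --s->outstanding_sync_msgs == 0;   /* zero => remote completion */
    }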
-
Xin Zhao authored
Separate the final request handler of PUT, ACC, GACC into three, and separate the derived DT request handler of ACC and GACC into two. Rename the request handlers as follows:
(1) Normal request handlers: triggered on the target side when all data from the origin has been received. They include:
ReqHandler_PutRecvComplete --- for PUT
ReqHandler_AccumRecvComplete --- for ACC
ReqHandler_GaccumRecvComplete --- for GACC
(2) Derived DT request handlers: triggered on the target side when all derived DT info has been received. They include:
ReqHandler_PutDerivedDTRecvComplete --- for PUT
ReqHandler_AccumDerivedDTRecvComplete --- for ACC
ReqHandler_GaccumDerivedDTRecvComplete --- for GACC
(3) Response request handlers: triggered on the target side when the send-back is finished in GET-like operations. They include:
ReqHandler_GetSendComplete --- for GET
ReqHandler_GaccumLikeSendComplete --- for GACC, FOP, CAS
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
Previously, several RMA packet types shared the same structure, which was misleading. Here we make the different RMA packet types use different packet data structures. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
We use new algorithms for RMA synchronization functions and RMA epochs. The old implementation uses a lazy-issuing algorithm, which queues up all operations and issues them at the end. This forbids opportunities to do hardware RMA operations and can use up all memory resources when a large number of operations is queued. Here we use a new algorithm, which initializes the synchronization at the beginning and issues operations as soon as the synchronization is finished. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
When there are too many active requests in the runtime, the internal memory may be used up. This patch prevents that situation by triggering a blocking wait loop in the operation routines when the number of active requests reaches a certain threshold value. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
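A minimal sketch of such a threshold-triggered blocking wait; the threshold value, the counter and the make_progress() helper are assumed for illustration only:

    #define ACTIVE_REQ_THRESHOLD 65536

    extern int num_active_requests;      /* maintained by the runtime (assumed) */
    extern int make_progress(void);      /* one pass of the progress engine (assumed) */

    static int throttle_if_needed(void)
    {
        /* Block in the progress engine until enough requests complete,
         * so the runtime cannot exhaust internal request memory. */
        while (num_active_requests >= ACTIVE_REQ_THRESHOLD) {
            int rc = make_progress();
            if (rc != 0)
                return rc;
        }
        return 0;
    }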
-
Xin Zhao authored
We no longer use the lazy-issuing model, which delays issuing all operations until the end; instead we issue them as early as possible. To achieve this, we enable making progress in RMA routines, so that RMA operations can be issued as soon as the synchronization is finished. Sometimes we also need to poke the progress engine in the operation routines to make sure that the target side makes enough progress to receive packets. Here we trigger it when the number of posted operations reaches a certain threshold value. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
The GET_OP function may be a blocking function which guarantees to return an RMA operation. Inside GET_OP we first call the normal OP_ALLOC function, which tries to get a new op from the op pools; if that fails, we call the nonblocking GC function to clean up completed ops and then call OP_ALLOC again; if we still cannot get a new op, we call the nonblocking FREE_OP_BEFORE_COMPLETION function (if hardware ordering is provided) and then call OP_ALLOC again; if that still fails, we finally call the blocking aggressive cleanup function, which is guaranteed to return a new op element. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
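The fallback chain can be sketched as below; the helper names (op_alloc, cleanup_completed_ops_nonblocking, ...) are placeholders, not the actual CH3 routine names:

    typedef struct rma_op { struct rma_op *next; } rma_op_t;

    extern rma_op_t *op_alloc(void);                    /* try the op pools (assumed) */
    extern void cleanup_completed_ops_nonblocking(void);
    extern void free_ops_before_completion(void);       /* needs hardware ordering */
    extern rma_op_t *aggressive_cleanup_blocking(void); /* guaranteed to succeed */

    static rma_op_t *get_op(int have_hw_ordering)
    {
        rma_op_t *op = op_alloc();
        if (op) return op;

        /* 1) non-blocking GC of completed ops, then retry */
        cleanup_completed_ops_nonblocking();
        op = op_alloc();
        if (op) return op;

        /* 2) if the hardware orders the final FLUSH after earlier ops,
         *    we may free issued-but-uncompleted ops early, then retry */
        if (have_hw_ordering) {
            free_ops_before_completion();
            op = op_alloc();
            if (op) return op;
        }

        /* 3) last resort: block until an op element is definitely free */
        return aggressive_cleanup_blocking();
    }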
-
Xin Zhao authored
When a FLUSH sync is issued and remote completion ordering between the last FLUSH message and all previous ops is provided by the current hardware, we no longer need to maintain the incomplete operations but only need to wait for the ACK of the current FLUSH. Therefore we can free those operation resources without blocking. Note that if we do this, we temporarily lose the opportunity to do a real FLUSH_LOCAL until the current FLUSH ACK is received. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
When we run out of resources for operations and targets, we need to make the runtime complete some operations so that it can free some resources. For RMA operations, we do this by performing an internal FLUSH_LOCAL for one target and waiting for operation resources; for RMA targets, we do this by performing an internal FLUSH for one target and waiting for target resources. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
Progress making functions check whether the current synchronization is finished, change the synchronization state if possible, and issue as many pending operations on the window as possible. There are three granularities of progress making functions: per-target, per-window and per-process. The per-target routine is used in the RMA operation routines (PUT/GET/ACC...) and in single passive lock calls (Win_unlock, Win_flush, Win_flush_local); the per-window routine is used in window-wide synchronization calls (Win_fence, Win_complete, Win_unlock_all, Win_flush_all, Win_flush_local_all); and the per-process routine is used in the progress engine. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
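Illustrative signatures for the three granularities (the names and arguments are assumptions of this sketch, not the real CH3 prototypes):

    typedef struct rma_win    rma_win_t;     /* opaque window object */
    typedef struct rma_target rma_target_t;  /* opaque per-target object */

    /* per-target: used in op routines and single passive-lock calls
     * (Win_unlock, Win_flush, Win_flush_local) */
    extern int progress_make_target(rma_win_t *win, rma_target_t *target);

    /* per-window: used in window-wide sync calls (Win_fence, Win_complete,
     * Win_unlock_all, Win_flush_all, Win_flush_local_all) */
    extern int progress_make_win(rma_win_t *win);

    /* per-process: walks the list of all windows; used in the progress engine */
    extern int progress_make_all(void);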
-
Xin Zhao authored
Here we implement garbage collection functions for both operations and targets. There are two levels of GC functions: per-target and per-window. Per-target functions are used in the single passive-lock ending call Win_unlock; per-window functions are used in the window-wide ending calls Win_fence, Win_complete and Win_unlock_all. The garbage collection functions for RMA ops go over all incomplete operation lists in the target element and free completed operations; they also return flags indicating local completion and remote completion. The garbage collection functions for RMA targets go over all targets and free those whose operation lists have been completed and emptied. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
Keep track of the number of non-empty slots on the window so that when the number is 0, there are no operations that need to be processed and we can ignore that window. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
We define new states to indicate the current situation of RMA synchronization. The states contain both ACCESS states and EXPOSURE states, and specify whether the synchronization is initialized (_CALLED), on-going (_ISSUED) or completed (_GRANTED). For a single lock in Passive Target mode, we use a per-target state while the window state is set to PER_TARGET. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
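One possible shape of such a state set, shown only to illustrate the _CALLED / _ISSUED / _GRANTED progression and the PER_TARGET case; the actual CH3 state names differ:

    /* A window (or a target, for per-target passive locks) moves from
     * _CALLED to _ISSUED to _GRANTED as the synchronization completes. */
    enum rma_access_state {
        ACCESS_NONE = 0,
        ACCESS_FENCE_ISSUED,   ACCESS_FENCE_GRANTED,
        ACCESS_PSCW_ISSUED,    ACCESS_PSCW_GRANTED,
        ACCESS_LOCK_CALLED,    ACCESS_LOCK_ISSUED,   ACCESS_LOCK_GRANTED,
        ACCESS_PER_TARGET      /* single locks: state tracked per target */
    };

    enum rma_exposure_state {
        EXPOSURE_NONE = 0,
        EXPOSURE_FENCE_GRANTED,
        EXPOSURE_PSCW_GRANTED
    };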
-
Xin Zhao authored
Add a flag is_dt in the op structure which is set when any buffer involved in the RMA operation contains derived datatype data. This makes it convenient to enqueue issued-but-not-completed operations to the DT-specific list. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
Add a list of created windows on this process, so that we can make progress on all windows in the progress engine. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
Given an RMA op, find the correct slot and target, and enqueue the op to the pending op list in that target object. If the target does not exist, create one in that slot. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
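A sketch of the lookup-or-create-and-enqueue path, assuming rank-modulo hashing into a fixed slot array (an assumption of this sketch; the real slot selection may differ):

    #include <stdlib.h>

    typedef struct rma_op { struct rma_op *next; } rma_op_t;

    typedef struct rma_target {
        int target_rank;
        rma_op_t *pending_ops;
        struct rma_target *next;
    } rma_target_t;

    typedef struct rma_slot { rma_target_t *targets; } rma_slot_t;

    static rma_target_t *find_or_create_target(rma_slot_t *slots, int num_slots,
                                               int target_rank)
    {
        rma_slot_t *slot = &slots[target_rank % num_slots];
        rma_target_t *t;

        for (t = slot->targets; t; t = t->next)   /* look up an existing target */
            if (t->target_rank == target_rank)
                return t;

        t = calloc(1, sizeof(*t));                /* not found: create one in the slot */
        if (!t) return NULL;
        t->target_rank = target_rank;
        t->next = slot->targets;
        slot->targets = t;
        return t;
    }

    static int enqueue_op(rma_slot_t *slots, int num_slots, int rank, rma_op_t *op)
    {
        rma_target_t *t = find_or_create_target(slots, num_slots, rank);
        if (!t) return -1;
        op->next = t->pending_ops;                /* prepend to the pending op list */
        t->pending_ops = op;
        return 0;
    }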
-
Xin Zhao authored
We allocate a fixed-size array of target slots on the window during window creation. The size can be configured by the user via a CVAR. Each slot entry contains a list of target elements. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
Here we add a data structure to store information about an active target. The information includes the operation lists, the passive lock state, the sync state, etc. The target element is created by the origin on demand, and can be freed after the remote completion of all previous operations is detected. After the RMA ending synchronization calls, all target elements should be freed. As with the operation pools, we create two-level target pools for the target elements: one per-window target pool and one global target pool. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Instead of allocating / deallocating an RMA operation whenever an RMA op is posted by the user, we allocate fixed-size operation pools beforehand and take an op element from those pools when an RMA op is posted. With only a local (per-window) op pool, the number of ops allocated can increase arbitrarily if many windows are created. Alternatively, if we only use a global op pool, other windows might use up all operations, thus starving the window we are working on. In this patch we create two pools: a local (per-window) pool and a global pool. Every window is guaranteed to have at least the number of operations in the local pool. If we run out of these operations, we check the global pool to see whether any operations are left. When an operation is released, it is added back to the same pool it was allocated from. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
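A compact sketch of such a two-level pool; the structure and function names are invented for illustration:

    typedef struct op_elem {
        struct op_elem *next;
        struct op_pool *home;     /* pool this element is returned to; set once,
                                   * when the pools are populated at creation time */
    } op_elem_t;

    typedef struct op_pool { op_elem_t *free_list; } op_pool_t;

    static op_pool_t global_pool;            /* shared by all windows */

    static op_elem_t *pool_pop(op_pool_t *p)
    {
        op_elem_t *e = p->free_list;
        if (e) p->free_list = e->next;
        return e;
    }

    /* Allocation: per-window pool first, global pool as the fallback. */
    static op_elem_t *op_alloc(op_pool_t *win_pool)
    {
        op_elem_t *e = pool_pop(win_pool);
        if (!e) e = pool_pop(&global_pool);
        return e;                            /* NULL => caller must free some ops first */
    }

    /* Release: an element always goes back to the pool it came from. */
    static void op_free(op_elem_t *e)
    {
        e->next = e->home->free_list;
        e->home->free_list = e;
    }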
-
We were duplicating information in the operation structure and in the packet structure when the message was actually issued. Since most of the information is the same anyway, this patch just embeds a packet structure into the operation structure, so that we eliminate the unnecessary copy. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
The packet type MPIDI_CH3_PKT_PT_RMA_DONE is used as the ACK for FLUSH / UNLOCK packets. Here we rename it to MPIDI_CH3_PKT_FLUSH_ACK and modify the related functions and data structures. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-