- 13 Feb, 2015 1 commit
-
-
Xin Zhao authored
Originally we added lock_type and origin_rank areas in RMA packet, in order to piggyback passive lock request with RMA operations. However, those areas potentially enlarged the packet union size, and actually they are not necessary and can be completetly avoided. "Lock_type" is used to remember what types of lock (shared or exclusive) the origin wants to acquire on the target. To remove it from RMA packet, we use flags (already exists in RMA packet) to remember such information. "Origin_rank" is used to remember which origin has sent lock request to the target, so that when the lock is granted to this origin later, the target can send ack to that origin. Actually the target does not need to store origin_rank but can only store origin_vc, which is known from progress engine on target side. Therefore, we can completely remove origin_rank from RMA packet. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
- 16 Dec, 2014 6 commits
-
-
Xin Zhao authored
No reviewer.
-
Xin Zhao authored
Rewrite progress engine functions as following: Basic functions: (1) check_target_state: check to see if we can switch target state, issue synchronization messages if needed. (2) issue_ops_target: issue al pending operations to this target. (3) check_window_state: check to see if we can switch window state. (4) issue_ops_win: issue all pending operations on this window. Currently it internally calls check_target_state and issue_ops_target, it should be optimized in future. Progress making functions: (1) Make_progress_target: make progress on one target, which internally call check_target_state and issue_ops_target. (2) Make_progress_win: make progress on all targets on one window, which internally call check_window_state and issue_ops_win. (3) Make_progress_global: make progress on all windows, which internally call make_progress_win. No reviewer.
-
Xin Zhao authored
No reviewer.
-
Xin Zhao authored
No reviewer.
-
Xin Zhao authored
Originally we only allows LOCK request to be piggybacked with small RMA operations (all data can be fit in packet header). This brings communication overhead for larger operations since origin side needs to wait for the LOCK ACK before it can transmit data to the target. In this patch we add support of piggybacking LOCK with RMA operations with arbitrary size. Note that (1) this only works with basic datatypes; (2) if the LOCK cannot be satisfied, we temporarily buffer this operation on the target side. No reviewer.
-
Xin Zhao authored
accumulated_ops_cnt is used to track no. of accumulated posted RMA operations between two synchronization calls, so that we can decide when to poke progress engine based on the current value of this counter. Here we initialize it to zero in the BEGINNING synchronization calls (Win_fence, Win_start, first Win_lock, Win_lock_all), and correctly decrement it in the ENDING synchronization calls (Win_fence, Win_complete, Win_unlock, Win_unlock_all, Win_flush, Win_flush_local, Win_flush_all, Win_flush_local_all). We also use a per-target counter to track single target case. No reviewer.
-
- 13 Nov, 2014 1 commit
-
-
Xin Zhao authored
When operation pending list and request lists are all empty, FLUSH message needs to be sent by origin only when origin issued PUT/ACC operations since the last synchronization calls, otherwise origin does not need to issue FLUSH at all and does not need to wait for FLUSH ACK message. Similiarly, origin waits for ACK of UNLOCK message only when origin issued PUT/ACC operations since the last synchronization calls. However, UNLOCK message always needs to be sent out because origin needs to unlock the target process. This patch avoids issuing unnecessary FLUSH / FLUSH ACK / UNLOCK ACK messages. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
- 11 Nov, 2014 2 commits
-
-
For lock type, we only need one internal value to specify cases when currently there is no passive lock issued from origin side or there is no passive lock imposed on target side. If there are passive locks, we directly use MPI_LOCK_SHARED and MPI_LOCK_EXCLUSIVE to indicate the lock type. This patch deletes redundant enum for lock types and just defines MPID_LOCK_NONE. Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
It is helpful for us to find variables that are not initialized or wrongly initialized. Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
- 04 Nov, 2014 1 commit
-
-
Min Si authored
There are two requests associated with each request-based operation: one normal internal request (req) and one newly added user request (ureq). We return ureq to user when request-based op call returns. The ureq is initialized with completion counter (CC) to 1 and ref count to 2 (one is referenced by CH3 and another is referenced by user). If the corresponding op can be finished immediately in CH3, the runtime will complete ureq in CH3, and let user's MPI_Wait/Test to destroy ureq. If corresponding op cannot be finished immediately, we will first increment ref count to 3 (because now there are three places needed to reference ureq: user, CH3, progress engine). Progress engine will complete ureq when op is completed, then CH3 will release its reference during garbage collection, finally user's MPI_Wait/Test will destroy ureq. The ureq can be completed in following three ways: 1. If op is issued and completed immediately in CH3 (req is NULL), we just complete ureq before free op. 2. If op is issued but not completed, we remember the ureq handler in req and specify OnDataAvail / OnFinal handlers in req to a newly added request handler, which will complete user reqeust. The handler is triggered at three places: 2-a. when progress engine completes a put/acc req; 2-b. when get/getacc handler completes a get/getacc req; 2-c. when progress engine completes a get/getacc req; 3. If op is not issued (i.e., wait for lock granted), the 2nd way will be eventually performed when such op is issued by progress engine. Signed-off-by:
Xin Zhao <xinzhao3@illinois.edu>
-
- 03 Nov, 2014 12 commits
-
-
Xin Zhao authored
1. Piggyback LOCK request with first IMMED operation. When we see an IMMED operation, we can always piggyback LOCK request with that operation to reduce one sync message of single LOCK request. When packet header of that operation is received on target, we will try to acquire the lock and perform that operation. The target either piggybacks LOCK_GRANTED message with the response packet (if available), or sends a single LOCK_GRANTED message back to origin. 2. Rewrite code of manage lock queue. When the lock request cannot be satisfied on target, we need to buffer that lock request on target. All we need to do is enqueuing the packet header, which contains all information we need after lock is granted. When the current lock is released, the runtime will goes over the lock queue and grant the lock to the next available request. After lock is granted, the runtime just trigger the packet handler for the second time. 3. Release lock on target side if piggybacking with UNLOCK. If there are active-message operations to be issued, we piggyback a UNLOCK flag with the last operation. When the target recieves it, it will release the current lock and grant the lock to the next process. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
During PSCW, when there are active-message operations to be issued in Win_complete, we piggback a AT_COMPLETE flag with it so that when target receives it, it can decrement a counter on target side and detect completion when target counter reaches zero. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
When FLUSH sync is issued and remote completion ordering between the last FLUSH message and all previous ops is provided by curent hardware, we no longer need to maintain incomplete operations but only need to wait for the ACK of current FLUSH. Therefore we can free those operation resources without blocking waiting. Not that if we do this, we temporarily lose the opportunity to do a real FLUSH_LOCAl until the current FLUSH ACK is received. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
We define new states to indicate the current situation of RMA synchronization. The states contain both ACCESS states and EXPOPSURE states, and specify if the synchronization is initialized (_CALLED), on-going (_ISSUED) and completed (_GRANTED). For single lock in Passive Target, we use per-target state whereas the window state is set to PER_TARGET. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
Add flag is_dt in op structure which is set when any buffers involved in RMA operations contains derived datatype data. It is convenient for us to enqueue issued but not completed operation to the DT specific list. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
Add a list of created windows on this process, so that we can make progress on all windows in the progress engine. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
We allocate a fixed size of targets array on window during window creation. The size can be configured by the user via CVAR. Each slot entry contains a list of target elements. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Xin Zhao authored
Here we add a data structure to store information of active target. The information includes operation lists, pasive lock state, sync state, etc. The target element is created by origin on-demand, and can be freed after the remote completion of all previous oeprations is detected. After RMA ending synchrnization calls, all target elements should be freed. Similiarly with operation pools, we create two-level target pools for target elements: one pre-window target pool and one global target pool. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Instead of allocating / deallocating RMA operations whenever an RMA op is posted by user, we allocate fixed size operation pools beforehand and take the op element from those pools when an RMA op is posted. With only a local (per-window) op pool, the number of ops allocated can increase arbitrarily if many windows are created. Alternatively, if we only use a global op pool, other windows might use up all operations thus starving the window we are working on. In this patch we create two pools: a local (per-window) pool and a global pool. Every window is guaranteed to have at least the number of operations in the local pool. If we run out of these operations, we check in the global pool to see if we have any operations left. When an operation is released, it is added back to the same pool it was allocated from. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
We were duplicating information in the operation structure and in the packet structure when the message is actually issued. Since most of the information is the same anyway, this patch just embeds a packet structure into the operation structure, so that we eliminate unnessary copy. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
We were adding an unnecessary dependency on VC structure declarations in the mpidpkt.h file. The required information in RMA lock queue is only the rank, but not actual VC. Here we replace VC with rank. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Split RMA functionality into smaller files, and move functions to where they belong based on the file names. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-