1. 12 Jun, 2015 3 commits
  2. 09 Mar, 2015 1 commit
  3. 04 Mar, 2015 5 commits
  4. 03 Mar, 2015 1 commit
  5. 13 Feb, 2015 2 commits
    • Rewrite code of piggybacking IMMED data with RMA packets. · de9d0f21
      Xin Zhao authored
      
      
      Originally we added "immed_data" and "immed_len" areas to RMA packets
      in order to piggyback a small amount of data with the packet header and
      reduce the number of packets (note that "immed_len" is necessary when
      the piggybacked data is not the entire data). However, those areas
      potentially increase the packet union size and degrade two-sided
      communication. This patch fixes this issue.
      
      In this patch, we remove "immed_data" and "immed_len" from the normal
      "MPIDI_CH3_Pkt_XXX_t" operation types (e.g. MPIDI_CH3_Pkt_put_t), and
      we introduce a new "MPIDI_CH3_Pkt_XXX_immed_t" packet type for each
      operation (e.g. MPIDI_CH3_Pkt_put_immed_t).
      
      "MPIDI_CH3_Pkt_XXX_immed_t" is used when (1) both origin and target
      are basic datatypes, AND (2) the data to be sent fits entirely
      into the header. Under these conditions, "MPIDI_CH3_Pkt_XXX_immed_t"
      needs the "immed_data" area but can drop the "immed_len" area. Also,
      since it only works with a basic target datatype, it can drop the
      "dataloop_size" area as well. All operations that fail (1) or (2) use
      the normal "MPIDI_CH3_Pkt_XXX_t" type.
      
      Originally we always piggybacked FOP data into the packet header,
      which made the packet size too large. In this patch we split the
      FOP operation into IMMED packets and normal packets.
      
      Because CAS only works with 2 basic datatypes and non-complex
      elements, the amount of data is relatively small, so we always piggyback
      the data with the packet header and only use the
      "MPIDI_CH3_Pkt_XXX_immed_t" packet type for CAS.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
      de9d0f21
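
The split described in this commit message can be sketched roughly as follows. This is only an illustration under stated assumptions: the `sketch_` struct layouts, the field names other than `immed_data` and `dataloop_size`, and the 16-byte payload capacity are hypothetical and are not taken from the MPICH sources.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity of the in-header payload (assumption, not from the commit). */
#define SKETCH_IMMED_BYTES 16

/* Normal packet: keeps dataloop_size, carries no piggybacked payload. */
typedef struct {
    int      type;
    uint32_t flags;
    void    *addr;
    int      count;
    int      datatype;
    int      dataloop_size;
} sketch_pkt_put_t;

/* IMMED packet: carries the whole payload, so it needs neither
 * immed_len nor dataloop_size. */
typedef struct {
    int      type;
    uint32_t flags;
    void    *addr;
    int      count;
    int      datatype;
    char     immed_data[SKETCH_IMMED_BYTES];
} sketch_pkt_put_immed_t;

/* Conditions (1) and (2) from the commit message: both datatypes are
 * basic AND the data fits entirely into the header. */
static bool sketch_use_immed(bool origin_is_basic, bool target_is_basic,
                             size_t data_bytes)
{
    return origin_is_basic && target_is_basic &&
           data_bytes <= SKETCH_IMMED_BYTES;
}
```

Any operation for which `sketch_use_immed` returns false would fall back to the normal packet type in this sketch.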
    • Remove lock_type and origin_rank areas from RMA packet. · 81e2b274
      Xin Zhao authored
      
      
      Originally we added lock_type and origin_rank areas
      to the RMA packet in order to piggyback passive lock requests
      with RMA operations. However, those areas potentially
      enlarged the packet union size, and they are actually
      not necessary and can be completely avoided.
      
      "Lock_type" is used to remember which type of lock (shared or
      exclusive) the origin wants to acquire on the target. To remove
      it from the RMA packet, we use the flags (which already exist in
      the RMA packet) to remember this information.
      
      "Origin_rank" is used to remember which origin has sent a lock
      request to the target, so that when the lock is later granted to
      this origin, the target can send an ack to that origin. In fact,
      the target does not need to store origin_rank; it only needs to store
      origin_vc, which is known from the progress engine on the target side.
      Therefore, we can completely remove origin_rank from the RMA packet.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
      81e2b274
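
One way to picture carrying the lock type in the existing flags is the sketch below. The flag bit values, the `sketch_` names, and the queue-entry layout are assumptions made for illustration; the commit message only states that flags replace lock_type and that the target keeps origin_vc instead of origin_rank.

```c
#include <stdint.h>

/* Hypothetical flag bits (assumption; the real flag names are not given here). */
#define SKETCH_FLAG_LOCK_SHARED    (1u << 0)
#define SKETCH_FLAG_LOCK_EXCLUSIVE (1u << 1)

typedef enum {
    SKETCH_LOCK_NONE,
    SKETCH_LOCK_SHARED,
    SKETCH_LOCK_EXCLUSIVE
} sketch_lock_t;

/* Origin side: record the requested lock type in the packet flags. */
static uint32_t sketch_set_lock(uint32_t flags, sketch_lock_t lock)
{
    flags &= ~(SKETCH_FLAG_LOCK_SHARED | SKETCH_FLAG_LOCK_EXCLUSIVE);
    if (lock == SKETCH_LOCK_SHARED)
        flags |= SKETCH_FLAG_LOCK_SHARED;
    else if (lock == SKETCH_LOCK_EXCLUSIVE)
        flags |= SKETCH_FLAG_LOCK_EXCLUSIVE;
    return flags;
}

/* Target side: recover the lock type from the flags alone. */
static sketch_lock_t sketch_get_lock(uint32_t flags)
{
    if (flags & SKETCH_FLAG_LOCK_EXCLUSIVE)
        return SKETCH_LOCK_EXCLUSIVE;
    if (flags & SKETCH_FLAG_LOCK_SHARED)
        return SKETCH_LOCK_SHARED;
    return SKETCH_LOCK_NONE;
}

/* Target side: a queued lock request remembers the connection the packet
 * arrived on (origin_vc) rather than an origin_rank sent in the packet. */
typedef struct {
    void         *origin_vc;
    sketch_lock_t lock;
} sketch_lock_entry_t;
```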
  6. 16 Dec, 2014 7 commits
    • Support handling different LOCK ACKs · 45afd1fd
      Xin Zhao authored
      No reviewer.
      45afd1fd
    • Code-refactor: Move send_flush_msg function to header file. · 2b53ff69
      Xin Zhao authored
      No reviewer.
      2b53ff69
    • Re-organize progress engine functions. · 1962d3b1
      Xin Zhao authored
      Rewrite progress engine functions as follows:
      
      Basic functions:
      
      (1) check_target_state: check whether we can switch the target state,
          and issue synchronization messages if needed.
      (2) issue_ops_target: issue all pending operations to this target.
      (3) check_window_state: check whether we can switch the window state.
      (4) issue_ops_win: issue all pending operations on this window.
          Currently it internally calls check_target_state and
          issue_ops_target; this should be optimized in the future.
      
      Progress making functions:
      
      (1) Make_progress_target: make progress on one target, which
          internally calls check_target_state and issue_ops_target.
      (2) Make_progress_win: make progress on all targets on one window,
          which internally calls check_window_state and issue_ops_win.
      (3) Make_progress_global: make progress on all windows, which
          internally calls make_progress_win.
      
      No reviewer.
      1962d3b1
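
The call hierarchy listed in this commit message can be sketched as below. Only the function roles and who calls whom come from the message; the `sketch_` types, the fields, and the trivial bodies (a state check that always succeeds, "issuing" that just drains a counter) are stand-ins assumed for illustration.

```c
#include <stddef.h>

typedef struct {
    int sync_done;    /* set once the target state may be switched */
    int pending_ops;  /* operations queued for this target */
} sketch_target_t;

typedef struct {
    int              window_ready;
    sketch_target_t *targets;
    size_t           num_targets;
} sketch_win_t;

/* Basic functions (bodies are placeholders). */
static void sketch_check_target_state(sketch_target_t *t) { t->sync_done = 1; }

static int sketch_issue_ops_target(sketch_target_t *t)
{
    int issued = 0;
    if (t->sync_done) {
        issued = t->pending_ops;
        t->pending_ops = 0;
    }
    return issued;
}

static void sketch_check_window_state(sketch_win_t *w) { w->window_ready = 1; }

/* As in the message, this internally calls the per-target functions. */
static int sketch_issue_ops_win(sketch_win_t *w)
{
    int issued = 0;
    for (size_t i = 0; i < w->num_targets; i++) {
        sketch_check_target_state(&w->targets[i]);
        issued += sketch_issue_ops_target(&w->targets[i]);
    }
    return issued;
}

/* Progress making functions, from finest to coarsest granularity. */
static int sketch_make_progress_target(sketch_target_t *t)
{
    sketch_check_target_state(t);
    return sketch_issue_ops_target(t);
}

static int sketch_make_progress_win(sketch_win_t *w)
{
    sketch_check_window_state(w);
    return sketch_issue_ops_win(w);
}

static int sketch_make_progress_global(sketch_win_t *wins, size_t num_wins)
{
    int issued = 0;
    for (size_t i = 0; i < num_wins; i++)
        issued += sketch_make_progress_win(&wins[i]);
    return issued;
}
```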
    • Modify struct name: replace "struct XXX" with "XXX_t" · 7c533ef3
      Xin Zhao authored
      No reviewer.
      7c533ef3
    • Bug-fix: modify free_ops_before_completion function · 04d15190
      Xin Zhao authored
      Originally the free_ops_before_completion function only
      worked with active target. Here we modify it to accommodate
      passive target as well.
      
      Also, every time we trigger free_ops_before_completion,
      we lose the chance to do a real Win_flush_local operation
      and must do a Win_flush instead. Here we convert
      Win_flush_local to Win_flush if the disable_flush_local flag
      is set, and unset that flag after the current flush
      is done.
      
      No reviewer.
      04d15190
    • Bug-fix: set put_acc_issued flag correctly · cc158ff2
      Xin Zhao authored
      No reviewer.
      cc158ff2
    • Perf-optimize: avoid FLUSH/FLUSH_ACK messages if no PUT/ACC. · 2493e98b
      Xin Zhao authored
      No reviewer.
      2493e98b
  7. 13 Nov, 2014 1 commit
    • Perf-tuning: issue FLUSH, FLUSH ACK, UNLOCK ACK messages only when needed. · a9d968cc
      Xin Zhao authored
      
      
      When the operation pending list and request lists are all empty, the origin
      needs to send a FLUSH message only if it has issued PUT/ACC operations since
      the last synchronization call; otherwise the origin does not need to issue
      FLUSH at all and does not need to wait for a FLUSH ACK message.
      
      Similarly, the origin waits for the ACK of an UNLOCK message only if it has
      issued PUT/ACC operations since the last synchronization call. However, the
      UNLOCK message always needs to be sent because the origin needs to unlock
      the target process. This patch avoids issuing unnecessary
      FLUSH / FLUSH ACK / UNLOCK ACK messages.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
      a9d968cc
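
The decision rule in this commit message, for the case where the pending-operation list and request lists are already empty, can be written as a small sketch. The `sketch_` names and the plan struct are hypothetical; only the rule itself (FLUSH and its ACK depend on prior PUT/ACC, UNLOCK is always sent, its ACK depends on prior PUT/ACC) is taken from the message.

```c
#include <stdbool.h>

/* What the origin should do for one synchronization message. */
typedef struct {
    bool send_msg;  /* send FLUSH or UNLOCK */
    bool wait_ack;  /* wait for FLUSH ACK or UNLOCK ACK */
} sketch_sync_plan_t;

/* FLUSH: needed only if PUT/ACC was issued since the last sync call. */
static sketch_sync_plan_t sketch_plan_flush(bool put_acc_issued)
{
    sketch_sync_plan_t plan;
    plan.send_msg = put_acc_issued;
    plan.wait_ack = put_acc_issued;
    return plan;
}

/* UNLOCK: always sent so the target is unlocked; the ACK is awaited
 * only if PUT/ACC was issued since the last sync call. */
static sketch_sync_plan_t sketch_plan_unlock(bool put_acc_issued)
{
    sketch_sync_plan_t plan;
    plan.send_msg = true;
    plan.wait_ack = put_acc_issued;
    return plan;
}
```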
  8. 07 Nov, 2014 1 commit
  9. 04 Nov, 2014 1 commit
    • Implement true request-based RMA operations. · 3e005f03
      Min Si authored
      
      
      There are two requests associated with each request-based
      operation: one normal internal request (req) and one newly
      added user request (ureq). We return ureq to the user when
      the request-based op call returns.
      
      The ureq is initialized with its completion counter (CC) set to 1
      and its ref count set to 2 (one reference is held by CH3 and the
      other by the user). If the corresponding op can be
      finished immediately in CH3, the runtime completes ureq
      in CH3 and lets the user's MPI_Wait/Test destroy ureq. If the
      corresponding op cannot be finished immediately, we
      first increment the ref count to 3 (because there are now
      three places that need to reference ureq: user, CH3, and the
      progress engine). The progress engine completes ureq when the
      op is completed, then CH3 releases its reference during
      garbage collection, and finally the user's MPI_Wait/Test
      destroys ureq.
      
      The ureq can be completed in the following three ways:
      
      1. If the op is issued and completed immediately in CH3
      (req is NULL), we just complete ureq before freeing the op.
      
      2. If the op is issued but not completed, we remember the ureq
      handle in req and set the OnDataAvail / OnFinal handlers
      in req to a newly added request handler, which completes the
      user request. The handler is triggered at three places:
         2-a. when the progress engine completes a put/acc req;
         2-b. when the get/getacc handler completes a get/getacc req;
         2-c. when the progress engine completes a get/getacc req.
      
      3. If the op is not issued (i.e., it is waiting for the lock to be
      granted), the 2nd way is eventually performed when the op is issued
      by the progress engine.
      Signed-off-by: Xin Zhao <xinzhao3@illinois.edu>
      3e005f03
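
The reference-count lifecycle of ureq described above can be traced with a small sketch. The `sketch_ureq_t` type and its helpers are hypothetical, and the sketch assumes that whoever completes the ureq also drops its own reference at that point, which the commit message implies but does not state outright.

```c
typedef struct {
    int cc;         /* completion counter: 1 = pending, 0 = complete */
    int ref_count;  /* number of holders: user, CH3, progress engine */
    int destroyed;  /* set when the last reference is dropped */
} sketch_ureq_t;

/* Initial state from the message: CC = 1, ref count = 2 (CH3 + user). */
static void sketch_ureq_init(sketch_ureq_t *u)
{
    u->cc = 1;
    u->ref_count = 2;
    u->destroyed = 0;
}

/* Deferred path: the progress engine becomes a third holder. */
static void sketch_ureq_addref(sketch_ureq_t *u) { u->ref_count++; }

/* Drop one reference; the last holder destroys the request. */
static void sketch_ureq_release(sketch_ureq_t *u)
{
    if (--u->ref_count == 0)
        u->destroyed = 1;
}

/* Mark the request complete and drop the completer's reference
 * (assumption: completion and release happen together). */
static void sketch_ureq_complete(sketch_ureq_t *u)
{
    u->cc = 0;
    sketch_ureq_release(u);
}
```

In the immediate path, CH3 calls the complete helper (ref count 2 to 1) and the user's wait releases the last reference. In the deferred path, the count goes 2, 3, then drops by one each time the progress engine completes, CH3 garbage-collects, and the user waits.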
  10. 03 Nov, 2014 10 commits
    • Delete no longer needed code. · cc63b367
      Xin Zhao authored
      
      
      We made a huge change to the RMA infrastructure, and
      a lot of old code can be dropped, including separate
      handlers for lock-op-unlock, ACCUM_IMMED specific
      code, O(p) data structure code, code for lazy issuing,
      etc.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
      cc63b367
    • Simplify issuing functions at origin side. · 52c2fc11
      Xin Zhao authored
      
      
      Here we extract the common code of different
      issuing functions at origin side and simplify
      those issuing functions.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
      52c2fc11
    • Split shared RMA packet structures. · c0094faa
      Xin Zhao authored
      
      
      Previously several RMA packet types shared the same structure,
      which was misleading when coding. Here we make different
      RMA packet types use different packet data structures.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
      c0094faa
    • Rewrite all synchronization routines. · 38b20e57
      Xin Zhao authored
      
      
      We use new algorithms for RMA synchronization
      functions and RMA epochs. The old implementation
      used a lazy-issuing algorithm, which queues up
      all operations and issues them at the end. This
      forfeits opportunities to do hardware RMA operations
      and can use up all memory resources when we
      queue up a large number of operations.
      
      Here we use a new algorithm, which starts
      the synchronization at the beginning and issues operations
      as soon as the synchronization is finished.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
      38b20e57
    • Control no. of active RMA requests in the runtime. · 257faca2
      Xin Zhao authored
      
      
      When there are too many active requests in the runtime,
      the internal memory might be used up. This patch
      prevents this situation by triggering a blocking
      wait loop in operation routines when the number of active
      requests reaches a certain threshold value.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
      257faca2
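
The throttling idea in this commit message can be sketched as a wait loop in the operation routine. The `sketch_` names are hypothetical, and the progress function here is a stand-in that simply retires one request per call; the real runtime would poll its progress engine instead.

```c
typedef struct {
    int active_reqs;  /* requests currently in flight */
    int threshold;    /* hypothetical limit on in-flight requests */
} sketch_runtime_t;

/* Stand-in for a progress-engine call: retires one active request. */
static void sketch_poke_progress(sketch_runtime_t *rt)
{
    if (rt->active_reqs > 0)
        rt->active_reqs--;
}

/* Operation routine: block in a wait loop while the number of active
 * requests has reached the threshold, then issue the new request. */
static void sketch_issue_op(sketch_runtime_t *rt)
{
    while (rt->active_reqs > 0 && rt->active_reqs >= rt->threshold)
        sketch_poke_progress(rt);
    rt->active_reqs++;
}
```

The `active_reqs > 0` guard keeps the loop from spinning forever if the threshold is misconfigured as zero or negative.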
    • Free incomplete ops when FLUSH ordering is provided. · 7c1e12f0
      Xin Zhao authored
      
      
      When a FLUSH sync is issued and remote completion
      ordering between the last FLUSH message and all
      previous ops is provided by the current hardware, we
      no longer need to maintain incomplete operations
      but only need to wait for the ACK of the current
      FLUSH. Therefore we can free those operation
      resources without a blocking wait.
      
      Note that if we do this, we temporarily lose the
      opportunity to do a real FLUSH_LOCAL until the
      current FLUSH ACK is received.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
      7c1e12f0
    • Add blocking ops / targets aggressive cleanup functions. · 41a365ec
      Xin Zhao authored
      
      
      When we run out of resources for operations and targets,
      we need to make the runtime complete some operations
      so that it can free some resources.
      
      For RMA operations, we implement this by doing an internal
      FLUSH_LOCAL for one target and waiting for operation
      resources; for RMA targets, we implement this by doing an
      internal FLUSH operation for one target and waiting for
      target resources.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
      41a365ec
    • Add nonblocking progress making functions. · ab058906
      Xin Zhao authored
      
      
      Progress making functions check whether the current
      synchronization is finished, change the synchronization
      state if possible, and issue as many pending operations
      on the window as possible.
      
      There are three granularities of progress making functions:
      per-target, per-window and per-process. The per-target
      routine is used in RMA operation routines (PUT/GET/ACC...)
      and single passive lock routines (Win_unlock, Win_flush, Win_flush_local);
      the per-window routine is used in window-wide synchronization
      calls (Win_fence, Win_complete, Win_unlock_all,
      Win_flush_all, Win_flush_local_all); and the per-process
      routine is used in the progress engine.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
      ab058906
    • Embedding packet structure into RMA operation structure. · b1685139
      Xin Zhao authored
      
      
      We were duplicating information in the operation structure and in the
      packet structure when the message is actually issued.  Since most of
      the information is the same anyway, this patch just embeds a packet
      structure into the operation structure, so that we eliminate the
      unnecessary copy.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
      b1685139
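
Embedding the packet in the operation structure might look like the sketch below. The `sketch_` types and fields are hypothetical and are only meant to show that the packet is filled in once, in place, when the operation is created, so nothing has to be copied at issue time.

```c
#include <stddef.h>

/* Hypothetical packet header fields. */
typedef struct {
    int   type;
    void *addr;
    int   count;
    int   datatype;
} sketch_pkt_t;

/* The operation embeds the packet instead of duplicating its fields. */
typedef struct sketch_op {
    struct sketch_op *next;
    int               target_rank;
    sketch_pkt_t      pkt;
} sketch_op_t;

/* Fill the embedded packet directly when the operation is queued. */
static void sketch_op_init_put(sketch_op_t *op, int target_rank,
                               void *addr, int count, int datatype)
{
    op->next = NULL;
    op->target_rank = target_rank;
    op->pkt.type = 1;  /* hypothetical "put" type code */
    op->pkt.addr = addr;
    op->pkt.count = count;
    op->pkt.datatype = datatype;
}

/* At issue time the packet is already in place; no copy is made. */
static const sketch_pkt_t *sketch_op_packet(const sketch_op_t *op)
{
    return &op->pkt;
}
```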
    • Code refactoring to clean up the RMA code. · 61f952c7
      Xin Zhao authored
      
      
      Split RMA functionality into smaller files, and move functions
      to where they belong based on the file names.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
      61f952c7