  1. 13 Feb, 2015 5 commits
    • Bug-fix: use do_accumulate_op function for ACC computation. · c8ecef8d
      Xin Zhao authored
      
      
      do_accumulate_op() handles ACC computation more comprehensively
      than the OP function. For example, MPI_REPLACE is not defined as
      a predefined computation and is therefore not handled by the OP
      function, but it is handled safely in do_accumulate_op(). This
      patch replaces the OP function with do_accumulate_op() on the
      target side.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
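      The difference between the two code paths can be illustrated with a
      minimal sketch. Every name below (sketch_op_t, sketch_predefined_op,
      sketch_accumulate) is hypothetical and is not the MPICH code; the
      sketch only assumes that a wrapper handles the REPLACE case itself
      and delegates the predefined computations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for MPI_Op values; not the MPICH definitions. */
typedef enum { SKETCH_OP_SUM, SKETCH_OP_MAX, SKETCH_OP_REPLACE } sketch_op_t;

/* Plays the role of the "OP function": predefined computations only.
 * Returns 0 on success, -1 if the op is not a predefined computation. */
static int sketch_predefined_op(const int *src, int *dst, size_t n, sketch_op_t op)
{
    for (size_t i = 0; i < n; i++) {
        switch (op) {
        case SKETCH_OP_SUM: dst[i] += src[i]; break;
        case SKETCH_OP_MAX: if (src[i] > dst[i]) dst[i] = src[i]; break;
        default: return -1; /* e.g. REPLACE is not handled here */
        }
    }
    return 0;
}

/* Plays the role of do_accumulate_op(): handles REPLACE as a plain copy
 * and delegates everything else to the predefined-op path. */
static int sketch_accumulate(const int *src, int *dst, size_t n, sketch_op_t op)
{
    assert(src != NULL && dst != NULL);
    if (op == SKETCH_OP_REPLACE) {
        memcpy(dst, src, n * sizeof(int));
        return 0;
    }
    return sketch_predefined_op(src, dst, n, op);
}
```

      Calling the wrapper on the target side means an ACC with REPLACE no
      longer falls through to a path that cannot handle it.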
    • Change argument of function finish_op_on_target. · 1b30ab19
      Xin Zhao authored
      
      
      This patch replaces one argument of finish_op_on_target,
      the packet (op) type, with "has_response_data".
      finish_op_on_target does not care which specific packet (op)
      type it is processing; it only cares whether the current op
      has response data (as GET and GACC do). Changing the argument
      this way simplifies the code, because callers no longer need
      to look up the packet (op) type every time before calling
      finish_op_on_target.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Rewrite code of piggybacking IMMED data with RMA packets. · de9d0f21
      Xin Zhao authored
      
      
      Originally we added "immed_data" and "immed_len" areas to RMA packets
      in order to piggyback a small amount of data with the packet header
      and reduce the number of packets (note that "immed_len" is necessary
      when the piggybacked data is not the entire data). However, those
      areas potentially increase the packet union size and degrade
      two-sided communication. This patch fixes this issue.
      
      In this patch, we remove "immed_data" and "immed_len" from the normal
      "MPIDI_CH3_Pkt_XXX_t" operation types (e.g. MPIDI_CH3_Pkt_put_t) and
      introduce a new "MPIDI_CH3_Pkt_XXX_immed_t" packet type for each
      operation (e.g. MPIDI_CH3_Pkt_put_immed_t).

      "MPIDI_CH3_Pkt_XXX_immed_t" is used when (1) both the origin and
      target datatypes are basic datatypes, AND (2) the data to be sent
      fits entirely into the header. Under these conditions,
      "MPIDI_CH3_Pkt_XXX_immed_t" needs the "immed_data" area but can drop
      the "immed_len" area. Also, since it only works with a basic target
      datatype, it can drop the "dataloop_size" area as well. All
      operations that do not satisfy both (1) and (2) use the normal
      "MPIDI_CH3_Pkt_XXX_t" type.
      
      Originally we always piggybacked FOP data into the packet header,
      which made the packet size too large. In this patch we split the
      FOP operation into IMMED packets and normal packets.
      
      Because CAS only works with 2 basic datatypes and non-complex
      elements, the amount of data is relatively small, so we always
      piggyback the data with the packet header and use only the
      "MPIDI_CH3_Pkt_XXX_immed_t" packet type for CAS.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
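      The selection rule between the two packet types could look roughly
      like the sketch below. The struct layouts, the 16-byte capacity and
      all sketch_* names are assumptions made for illustration; they are
      not the actual MPIDI_CH3_Pkt_put_t / MPIDI_CH3_Pkt_put_immed_t
      definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_IMMED_BYTES 16 /* assumed capacity of the immed_data area */

typedef enum { SKETCH_PKT_PUT, SKETCH_PKT_PUT_IMMED } sketch_pkt_type_t;

/* Normal packet: no immediate data, payload follows in separate packets. */
typedef struct {
    sketch_pkt_type_t type;
    int count;
    int dataloop_size; /* only meaningful for derived target datatypes */
} sketch_pkt_put_t;

/* IMMED packet: the whole payload is in the header, so neither
 * immed_len nor dataloop_size is needed. */
typedef struct {
    sketch_pkt_type_t type;
    int count;
    unsigned char immed_data[SKETCH_IMMED_BYTES];
} sketch_pkt_put_immed_t;

/* Conditions (1) and (2): basic datatypes on both sides AND the data
 * fits entirely into the header. */
static sketch_pkt_type_t sketch_choose_put_pkt(bool origin_basic, bool target_basic,
                                               size_t bytes)
{
    if (origin_basic && target_basic && bytes <= SKETCH_IMMED_BYTES)
        return SKETCH_PKT_PUT_IMMED;
    return SKETCH_PKT_PUT;
}

/* Fill an IMMED packet; the caller must have checked the size already. */
static void sketch_fill_put_immed(sketch_pkt_put_immed_t *pkt, const void *data,
                                  int count, size_t bytes)
{
    assert(bytes <= SKETCH_IMMED_BYTES);
    pkt->type = SKETCH_PKT_PUT_IMMED;
    pkt->count = count;
    memcpy(pkt->immed_data, data, bytes);
}
```

      Keeping the immediate area out of the normal packet type is what
      keeps it from inflating every packet in the union.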
    • Remove lock_type and origin_rank areas from RMA packet. · 81e2b274
      Xin Zhao authored
      
      
      Originally we added lock_type and origin_rank areas
      to the RMA packet in order to piggyback passive lock requests
      with RMA operations. However, those areas potentially
      enlarged the packet union size, and they are not actually
      necessary and can be avoided completely.

      "Lock_type" records which type of lock (shared or exclusive)
      the origin wants to acquire on the target. To remove it from
      the RMA packet, we use the flags field (which already exists
      in the RMA packet) to carry this information.

      "Origin_rank" records which origin sent the lock request to
      the target, so that when the lock is later granted to this
      origin, the target can send an ack to it. In fact the target
      does not need to store origin_rank; it only needs to store
      origin_vc, which is known from the progress engine on the
      target side. Therefore, we can completely remove origin_rank
      from the RMA packet.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
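      Carrying the lock type in an existing flags word can be sketched as
      follows. The bit values and every sketch_* name are hypothetical and
      chosen only for illustration; the real flag names and layout in the
      RMA packet may differ.

```c
#include <assert.h>

typedef enum {
    SKETCH_LOCK_NONE,
    SKETCH_LOCK_SHARED,
    SKETCH_LOCK_EXCLUSIVE
} sketch_lock_type_t;

/* Hypothetical flag bits sharing one word with other piggybacked flags. */
#define SKETCH_FLAG_LOCK_SHARED    0x1u
#define SKETCH_FLAG_LOCK_EXCLUSIVE 0x2u
#define SKETCH_FLAG_UNLOCK         0x4u

/* Origin side: record the requested lock type in the flags word,
 * leaving unrelated flag bits untouched. */
static unsigned sketch_flags_with_lock(unsigned flags, sketch_lock_type_t type)
{
    assert(type == SKETCH_LOCK_SHARED || type == SKETCH_LOCK_EXCLUSIVE);
    flags &= ~(SKETCH_FLAG_LOCK_SHARED | SKETCH_FLAG_LOCK_EXCLUSIVE);
    return flags | (type == SKETCH_LOCK_SHARED ? SKETCH_FLAG_LOCK_SHARED
                                               : SKETCH_FLAG_LOCK_EXCLUSIVE);
}

/* Target side: recover the lock type without a dedicated lock_type field. */
static sketch_lock_type_t sketch_lock_from_flags(unsigned flags)
{
    if (flags & SKETCH_FLAG_LOCK_EXCLUSIVE) return SKETCH_LOCK_EXCLUSIVE;
    if (flags & SKETCH_FLAG_LOCK_SHARED) return SKETCH_LOCK_SHARED;
    return SKETCH_LOCK_NONE;
}
```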
  2. 08 Feb, 2015 1 commit
    • Bug-fix: guarantee atomicity for FOP and GACC. · bad898f9
      Xin Zhao authored
      
      
      FOP, CAS and GACC are atomic "read-modify-write" operations,
      which means that when the target window is defined on a SHM
      region, we need an inter-process lock to guarantee the atomicity
      of the entire "read+OP". The current implementation is correct
      for SHM-based RMA operations but not for AM-based RMA
      operations: for SHM-based operations it protects the entire
      "read+OP", but for AM-based operations it protects only the
      "OP" part.

      This patch fixes this issue by protecting the memory copy to the
      temporary buffer and the computation together for AM-based
      operations.

      Fix ticket 2226
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
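      The shape of the fix can be sketched with a fetch-and-add in which
      the read and the OP sit inside one critical section. This is an
      illustration only: a C11 atomic_flag spin lock stands in for the
      inter-process lock, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Stand-in for the inter-process lock protecting the window. */
static atomic_flag sketch_win_lock = ATOMIC_FLAG_INIT;

static void sketch_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&sketch_win_lock, memory_order_acquire)) {
        /* spin until the lock is free */
    }
}

static void sketch_unlock(void)
{
    atomic_flag_clear_explicit(&sketch_win_lock, memory_order_release);
}

/* Fetch-and-add on the target: the copy of the old value (the "read")
 * and the update (the "OP") are protected together, not just the OP. */
static long sketch_fetch_and_add(long *target, long operand)
{
    long old;
    assert(target != NULL);
    sketch_lock();
    old = *target;            /* read: copy into a temporary buffer */
    *target = old + operand;  /* OP: computation on the window */
    sketch_unlock();
    return old;
}
```

      If only the second statement were inside the lock, another process
      could update the window between the read and the OP, and the
      returned value would not match the value the OP was applied to.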
  3. 16 Dec, 2014 9 commits
  4. 13 Nov, 2014 3 commits
    • Split shared request handler. · 88d34091
      Xin Zhao authored
      
      
      ReqHandler_GaccumLikeSendComplete is used for GACC-like operations,
      including GACC, CAS and FOP. Here we split it into the following
      three functions:
      
      ReqHandler_GaccumSendComplete
      ReqHandler_CASSendComplete
      ReqHandler_FOPSendComplete
      
      This makes it easier to add different actions for those three
      kinds of operations in the future.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Code-refactoring: wrapping up action of finishing op on target. · 8b1a69b9
      Xin Zhao authored
      
      
      Here we wrap the common actions performed when an RMA op finishes
      on the target into a function to make the code structure cleaner.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Perf-tuning: issue FLUSH, FLUSH ACK, UNLOCK ACK messages only when needed. · a9d968cc
      Xin Zhao authored
      
      
      When the operation pending list and the request lists are all empty, the
      origin needs to send a FLUSH message only if it has issued PUT/ACC
      operations since the last synchronization call; otherwise the origin does
      not need to issue FLUSH at all and does not need to wait for a FLUSH ACK
      message.

      Similarly, the origin waits for the ACK of an UNLOCK message only if it has
      issued PUT/ACC operations since the last synchronization call. However, the
      UNLOCK message always needs to be sent because the origin needs to unlock
      the target process. This patch avoids issuing unnecessary
      FLUSH / FLUSH ACK / UNLOCK ACK messages.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
  5. 04 Nov, 2014 2 commits
    • Bug-fix: trigger OnFinal at end when receiving multiple segments. · ea444c34
      Min Si authored
      
      
      There are two request handlers used when receiving data:
      (1) OnDataAvail, which is triggered when data arrives;
      (2) OnFinal, which is triggered when receiving data is finished.

      In the progress engine, only OnDataAvail is triggered when a request
      is completed. The upper ch3 layer should change OnDataAvail to OnFinal
      when the incoming data will complete the request.

      However, in the original implementation, when receiving multiple
      segments of a large message, OnDataAvail was reset to 0 at the last
      segment, so the final action was lost. This patch fixes this bug.

      In the RMA target put/acc/gacc packet handlers, OnDataAvail was reset
      to the OnFinal function if OnDataAvail was 0 because of this bug. This
      patch also rewrites this part so that the packet handlers only set the
      proper OnFinal handler at the beginning and let the data-receiving
      function change OnDataAvail to OnFinal at the last segment.
      Signed-off-by: Xin Zhao <xinzhao3@illinois.edu>
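      The intended handler hand-off for a multi-segment receive can be
      sketched as below. The request structure, the fixed segment size and
      all sketch_* names are assumptions for illustration; the real request
      object and receive path are more involved.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_req;
typedef void (*sketch_handler_t)(struct sketch_req *);

typedef struct sketch_req {
    size_t bytes_left;              /* data still to be received */
    size_t segment_size;            /* assumed fixed segment size */
    sketch_handler_t on_data_avail; /* the only handler the engine triggers */
    sketch_handler_t on_final;      /* set once by the packet handler */
    int reload_calls;
    int final_calls;
} sketch_req_t;

static void sketch_final(sketch_req_t *r) { r->final_calls++; }
static void sketch_reload(sketch_req_t *r);

/* Data-receiving function: before each segment, pick the handler. At the
 * last segment OnDataAvail becomes OnFinal instead of being reset to 0. */
static void sketch_post_next_segment(sketch_req_t *r)
{
    if (r->bytes_left <= r->segment_size)
        r->on_data_avail = r->on_final;
    else
        r->on_data_avail = sketch_reload;
}

/* Intermediate handler: more data is expected, so post the next segment. */
static void sketch_reload(sketch_req_t *r)
{
    r->reload_calls++;
    sketch_post_next_segment(r);
}

/* Progress engine: one segment arrived; only OnDataAvail is triggered. */
static void sketch_segment_arrived(sketch_req_t *r)
{
    size_t n;
    assert(r->bytes_left > 0);
    n = r->bytes_left < r->segment_size ? r->bytes_left : r->segment_size;
    r->bytes_left -= n;
    if (r->on_data_avail != NULL)
        r->on_data_avail(r);
}
```

      With 10 bytes and 4-byte segments, the first two arrivals run the
      intermediate handler and the third runs the final action exactly
      once.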
    • Implement true request-based RMA operations. · 3e005f03
      Min Si authored
      
      
      There are two requests associated with each request-based
      operation: one normal internal request (req) and one newly
      added user request (ureq). We return ureq to the user when
      the request-based op call returns.

      The ureq is initialized with its completion counter (CC) set
      to 1 and its ref count set to 2 (one reference held by CH3 and
      one by the user). If the corresponding op can be finished
      immediately in CH3, the runtime completes ureq in CH3 and lets
      the user's MPI_Wait/Test destroy ureq. If the corresponding op
      cannot be finished immediately, we first increment the ref
      count to 3 (because ureq is now referenced from three places:
      user, CH3, and progress engine). The progress engine completes
      ureq when the op is completed, then CH3 releases its reference
      during garbage collection, and finally the user's MPI_Wait/Test
      destroys ureq.
      
      The ureq can be completed in the following three ways:

      1. If the op is issued and completed immediately in CH3
      (req is NULL), we just complete ureq before freeing the op.

      2. If the op is issued but not completed, we remember the ureq
      handler in req and set the OnDataAvail / OnFinal handlers
      in req to a newly added request handler, which will complete
      the user request. The handler is triggered in three places:
         2-a. when the progress engine completes a put/acc req;
         2-b. when the get/getacc handler completes a get/getacc req;
         2-c. when the progress engine completes a get/getacc req.

      3. If the op is not issued (i.e., it is waiting for the lock to
      be granted), the 2nd way will eventually be performed when the
      op is issued by the progress engine.
      Signed-off-by: Xin Zhao <xinzhao3@illinois.edu>
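      The reference counting described above can be traced with a small
      sketch. The structure and the sketch_* functions are hypothetical,
      and the sketch assumes that the progress engine drops its own
      reference right after completing ureq, which the message implies but
      does not state.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int cc;         /* completion counter */
    int ref_count;  /* number of holders: user, CH3, progress engine */
    bool destroyed;
} sketch_ureq_t;

/* CC starts at 1, ref count at 2 (CH3 and user). */
static void sketch_ureq_init(sketch_ureq_t *u)
{
    u->cc = 1;
    u->ref_count = 2;
    u->destroyed = false;
}

/* Taken when the op cannot finish immediately (progress engine holder). */
static void sketch_ureq_add_ref(sketch_ureq_t *u) { u->ref_count++; }

/* The last holder to release destroys the request. */
static void sketch_ureq_release(sketch_ureq_t *u)
{
    assert(u->ref_count > 0);
    if (--u->ref_count == 0)
        u->destroyed = true;
}

static void sketch_ureq_complete(sketch_ureq_t *u)
{
    assert(u->cc == 1);
    u->cc = 0;
}

static bool sketch_ureq_is_complete(const sketch_ureq_t *u) { return u->cc == 0; }
```

      In both the immediate and the deferred path the user's wait or test
      call is the last holder, so it is the one that destroys ureq.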
  6. 03 Nov, 2014 14 commits
    • Add original RMA PVARs back. · ed20cd37
      Xin Zhao authored
      
      
      Add some original RMA PVARs back to the new
      RMA infrastructure, including timing of packet
      handlers, op allocation and setting, window
      creation, etc.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Delete no longer needed code. · cc63b367
      Xin Zhao authored
      
      
      We made a huge change to the RMA infrastructure, and
      a lot of old code can be dropped, including separate
      handlers for lock-op-unlock, ACCUM_IMMED-specific
      code, O(p) data structure code, lazy-issuing code,
      etc.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Rewrite code of passive lock control messages. · 0542e304
      Xin Zhao authored
      
      
      1. Piggyback LOCK request with first IMMED operation.

      When we see an IMMED operation, we can always piggyback
      the LOCK request with that operation, which saves the sync
      message of a standalone LOCK request. When the packet header
      of that operation is received on the target, we try to
      acquire the lock and perform that operation. The target
      either piggybacks the LOCK_GRANTED message with the response
      packet (if available), or sends a standalone LOCK_GRANTED
      message back to the origin.

      2. Rewrite code for managing the lock queue.

      When the lock request cannot be satisfied on the target,
      we need to buffer that lock request on the target. All we
      need to do is enqueue the packet header, which contains
      all the information we need after the lock is granted. When
      the current lock is released, the runtime goes over the
      lock queue and grants the lock to the next available
      request. After the lock is granted, the runtime just
      triggers the packet handler a second time.

      3. Release lock on target side if piggybacking with UNLOCK.

      If there are active-message operations to be issued,
      we piggyback an UNLOCK flag with the last operation.
      When the target receives it, it releases the current
      lock and grants the lock to the next process.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
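      The lock queue in item 2 can be sketched as a small FIFO of buffered
      packet headers whose handler is re-run once the lock becomes
      available. The window structure, the fixed queue capacity and all
      sketch_* names are hypothetical; the "performed" log only records
      which origin's operation ran, so the order can be checked.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CAP 8 /* assumed queue capacity for this sketch */

typedef enum { SKETCH_SHARED, SKETCH_EXCLUSIVE } sketch_lock_kind_t;
typedef struct { int origin; sketch_lock_kind_t kind; } sketch_pkt_t;

typedef struct {
    int shared_holders;
    int exclusive_held;
    sketch_pkt_t queue[SKETCH_CAP]; /* buffered packet headers (FIFO) */
    size_t head, count;
    int performed[SKETCH_CAP];      /* origins whose op was performed */
    size_t n_performed;
} sketch_win_t;

static int sketch_can_grant(const sketch_win_t *w, sketch_lock_kind_t kind)
{
    if (w->exclusive_held) return 0;
    return kind == SKETCH_SHARED || w->shared_holders == 0;
}

/* Packet handler: runs on arrival and again after the lock is granted. */
static void sketch_pkt_handler(sketch_win_t *w, const sketch_pkt_t *pkt)
{
    if (!sketch_can_grant(w, pkt->kind)) {
        assert(w->count < SKETCH_CAP);
        w->queue[(w->head + w->count) % SKETCH_CAP] = *pkt; /* buffer header */
        w->count++;
        return;
    }
    if (pkt->kind == SKETCH_EXCLUSIVE) w->exclusive_held = 1;
    else w->shared_holders++;
    assert(w->n_performed < SKETCH_CAP);
    w->performed[w->n_performed++] = pkt->origin; /* perform the operation */
}

/* On release, walk the queue and re-trigger the handler for each
 * request that can now be granted, stopping at the first that cannot. */
static void sketch_release(sketch_win_t *w, sketch_lock_kind_t kind)
{
    if (kind == SKETCH_EXCLUSIVE) w->exclusive_held = 0;
    else w->shared_holders--;
    while (w->count > 0 && sketch_can_grant(w, w->queue[w->head].kind)) {
        sketch_pkt_t pkt = w->queue[w->head];
        w->head = (w->head + 1) % SKETCH_CAP;
        w->count--;
        sketch_pkt_handler(w, &pkt);
    }
}
```

      Because the buffered header already carries the operation, nothing
      else has to be stored while the request waits.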
    • Simplify PktHandler_FOP and PktHandler_FOPResp. · a42b916d
      Xin Zhao authored
      
      
      For the FOP operation, all data fits into the packet
      header, so on the origin side we do not need to send separate
      data packets, and on the target side we do not need a request
      handler; only the packet handler is needed. Similarly, for
      the FOP response packet, we can receive all data in the FOP
      resp packet handler. This patch deletes the request handler
      on the target side and simplifies the packet handlers on the
      target and origin sides.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Add IMMED area in packet header. · e8d4c6d5
      Xin Zhao authored
      
      
      We add an IMMED data area (16 bytes by default) to the
      packet header, which contains as much origin data
      as possible. If the origin can put all data in the
      packet header, it no longer needs to send a
      separate data packet. When the target receives the
      packet header, it first copies the data out of
      the IMMED data area. If more data is still
      coming, it continues to receive the following
      packets; if all data is included in the header,
      receiving is done.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
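      A rough sketch of packing and unpacking the IMMED area follows. Only
      the 16-byte default comes from the message; the header layout and
      the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_IMMED_BYTES 16 /* default size named in the commit message */

typedef struct {
    size_t total_len; /* length of the whole origin data */
    unsigned char immed_data[SKETCH_IMMED_BYTES];
} sketch_hdr_t;

static size_t sketch_immed_len(size_t total)
{
    return total < SKETCH_IMMED_BYTES ? total : SKETCH_IMMED_BYTES;
}

/* Origin: put as much data as possible into the header. Returns the
 * number of bytes that still need separate data packets (0 if none). */
static size_t sketch_pack_header(sketch_hdr_t *hdr, const void *data, size_t len)
{
    size_t n = sketch_immed_len(len);
    assert(hdr != NULL && (data != NULL || len == 0));
    hdr->total_len = len;
    if (n > 0)
        memcpy(hdr->immed_data, data, n);
    return len - n;
}

/* Target: copy the data out of the IMMED area first. Returns the number
 * of bytes still expected in following packets (0 means done). */
static size_t sketch_unpack_header(const sketch_hdr_t *hdr, void *dst)
{
    size_t n = sketch_immed_len(hdr->total_len);
    if (n > 0)
        memcpy(dst, hdr->immed_data, n);
    return hdr->total_len - n;
}
```

      For a 20-byte payload the header carries the first 16 bytes and 4
      bytes remain for a data packet; an 8-byte payload needs no data
      packet at all.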
    • Decrement Active Target counter at target side. · b73778ea
      Xin Zhao authored
      
      
      During PSCW, when there are active-message operations
      to be issued in Win_complete, we piggyback an AT_COMPLETE
      flag with the last one so that when the target receives it,
      it can decrement a counter on the target side and detect
      completion when that counter reaches zero.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Detect remote completion by FLUSH / FLUSH_ACK messages. · 6578785d
      Xin Zhao authored
      
      
      When the origin wants to do a FLUSH sync, if there are
      active-message operations that are going to be issued,
      we piggyback the FLUSH message with the last operation;
      if there are no such operations, we just send a standalone
      FLUSH packet.

      If the last operation is a write op (PUT, ACC) or only
      a standalone FLUSH packet is sent, then after the target
      receives it, the target sends back a standalone FLUSH_ACK
      packet; if the last operation contains a read action (GET,
      GACC, FOP, CAS), then after the target receives it, the
      target piggybacks a FLUSH_ACK flag with the response packet.

      After the origin receives the FLUSH_ACK packet or a response
      packet with the FLUSH_ACK flag, it decrements the counter that
      tracks the number of outgoing sync messages (FLUSH / UNLOCK).
      When that counter reaches zero, the origin knows that remote
      completion is achieved.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
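      The two halves of this protocol can be sketched as a target-side
      decision and an origin-side counter. The enum, the structure and the
      sketch_* names are hypothetical and do not mirror the MPICH data
      structures.

```c
#include <assert.h>
#include <stdbool.h>

/* What carried the FLUSH: a bare FLUSH packet or the last operation. */
typedef enum {
    SKETCH_FLUSH_ONLY,
    SKETCH_PUT, SKETCH_ACC,                          /* write ops */
    SKETCH_GET, SKETCH_GACC, SKETCH_FOP, SKETCH_CAS  /* ops with a response */
} sketch_last_op_t;

/* Target: true means the FLUSH_ACK flag rides on the response packet,
 * false means a standalone FLUSH_ACK packet is sent back. */
static bool sketch_ack_is_piggybacked(sketch_last_op_t op)
{
    switch (op) {
    case SKETCH_GET:
    case SKETCH_GACC:
    case SKETCH_FOP:
    case SKETCH_CAS:
        return true;
    default:
        return false;
    }
}

/* Origin: counter of outgoing sync messages (FLUSH / UNLOCK). */
typedef struct { int outstanding_sync; } sketch_origin_t;

static void sketch_sync_sent(sketch_origin_t *o) { o->outstanding_sync++; }

/* Called for a FLUSH_ACK packet or a response carrying the FLUSH_ACK
 * flag. Returns true once remote completion is reached. */
static bool sketch_ack_received(sketch_origin_t *o)
{
    assert(o->outstanding_sync > 0);
    return --o->outstanding_sync == 0;
}
```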
    • Separate request handler of PUT, ACC, GACC and rename them. · fe15ea26
      Xin Zhao authored
      
      
      Separate the final request handlers of PUT, ACC and GACC into three.
      Separate the derived DT request handlers of ACC and GACC into two.

      Rename the request handlers as follows:
      
      (1) Normal request handler: it is triggered on target side
          when all data from origin is received.
      
          It includes:
      
          ReqHandler_PutRecvComplete --- for PUT
          ReqHandler_AccumRecvComplete --- for ACC
          ReqHandler_GaccumRecvComplete --- for GACC
      
      (2) Derived DT request handler: it is triggered on target
          side when all derived DT info is received.
      
          It includes:
      
          ReqHandler_PutDerivedDTRecvComplete --- for PUT
          ReqHandler_AccumDerivedDTRecvComplete --- for ACC
          ReqHandler_GaccumDerivedDTRecvComplete --- for GACC
      
      (3) Response request handler: it is triggered on target
          side when the send-back process is finished in GET-like
          operations.
      
          It includes:
      
          ReqHandler_GetSendComplete --- for GET
          ReqHandler_GaccumLikeSendComplete --- for GACC, FOP, CAS
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Split shared RMA packet structures. · c0094faa
      Xin Zhao authored
      
      
      Previously several RMA packet types shared the same structure,
      which was misleading when coding. This patch makes different
      RMA packet types use different packet data structures.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • bfbb1048
    • Embedding packet structure into RMA operation structure. · b1685139
      Xin Zhao authored and Pavan Balaji committed
      
      
      We were duplicating information in the operation structure and in the
      packet structure when the message is actually issued.  Since most of
      the information is the same anyway, this patch just embeds a packet
      structure into the operation structure, so that we eliminate the
      unnecessary copy.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Rename ACK packets in RMA. · ba1a400c
      Xin Zhao authored and Pavan Balaji committed
      
      
      The packet type MPIDI_CH3_PKT_PT_RMA_DONE is used for ACK
      of FLUSH / UNLOCK packets. Here we rename it to
      MPIDI_CH3_PKT_FLUSH_ACK and modify the related functions
      and data structures.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Avoid using VC in RMA lock queue structure. · 0eaf344b
      Xin Zhao authored and Pavan Balaji committed
      
      
      We were adding an unnecessary dependency on VC structure
      declarations in the mpidpkt.h file. The only information
      the RMA lock queue requires is the rank, not the actual VC.
      Here we replace VC with rank.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Code refactoring to clean up the RMA code. · 61f952c7
      Xin Zhao authored and Pavan Balaji committed
      
      
      Split RMA functionality into smaller files, and move functions
      to where they belong based on the file names.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>