1. 04 Mar, 2015 7 commits
    • Xin Zhao's avatar
      Add counter in op struct to remember number of stream units issued. · c986b927
      Xin Zhao authored and Pavan Balaji's avatar Pavan Balaji committed
      
      
      Add a counter in op struct to remember number of stream units
      that have already been issued. For example, when the first stream
      unit piggybacked with LOCK is issued out, we temporarily stop
      issuing the following units. After the origin receives the ACK
      from the target, it can continue to issue the following units.
      This counter helps avoid issuing the first unit again.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      c986b927
    • Xin Zhao's avatar
      Use a request array in RMA operation. · ab8386e7
      Xin Zhao authored and Pavan Balaji's avatar Pavan Balaji committed
      
      
      Because we may cut one RMA operation into multiple packets,
      and each packet needs a request object to track the completion,
      here we use a request array instead of single request in
      RMA operation structure.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      ab8386e7
    • Xin Zhao's avatar
      Code refactoring: setting ureq after issuing op in a function. · a36fd9dd
      Xin Zhao authored and Pavan Balaji's avatar Pavan Balaji committed
      
      
      After (1) issuing an op (no LOCK flag), or (2) issuing an op
      (with LOCK flag) and receiving an ACK that LOCK is granted or
      queued, we should set the user request (ureq) to be completed.
      This patch wraps up the work of setting ureq into a function,
      and call that function after (1) and (2) happens.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      a36fd9dd
    • Xin Zhao's avatar
      Modify location of setting next_op_to_issue and sync_flag to NONE · fd92b7bc
      Xin Zhao authored and Pavan Balaji's avatar Pavan Balaji committed
      
      
      After we issue an op, we set the next_op_to_issue to the next op,
      and if next op is NULL, we set sync_flag to NONE. When we receive
      the lock ACK saying that lock request is discarded, we set the
      next_op_to_issue back to the current op, we reset the sync_flag
      from NONE to corresponding flag, since we need to re-transmit the
      current op.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      fd92b7bc
    • Xin Zhao's avatar
      Change name from data_size to buf_size. · 45cdb282
      Xin Zhao authored and Pavan Balaji's avatar Pavan Balaji committed
      
      
      When the lock is not satisfied, we queue up
      the lock request and op data in a lock entry
      queue. In the struct of lock entry, we use 'data_size'
      to remember the size of buffer for storing the
      data. Since the size of buffer is not type_size*count
      but might be type_extent*extent, here we change
      its name from 'data_size' to 'buf_size'.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      45cdb282
    • Xin Zhao's avatar
      Bug-fix: make RMA work correctly with pair basic type. · ce8bc310
      Xin Zhao authored and Pavan Balaji's avatar Pavan Balaji committed
      
      
      The original implementation of RMA does not consider pair basic
      types (e.g. MPI_FLOAT_INT, MPI_DOUBLE_INT). It only
      works correctly with builtin datatypes (e.g. MPI_INT, MPI_FLOAT).
      This patch makes the RMA work correctly with pair basic types.
      
      The bug is that: (1) when performing the ACC computation, the original
      implementation uses 'eltype' in the datatype structure, which is set
      when all basic elements in this datatype have the same builtin
      datatype. When basic elements have different builtin datatypes, like
      pair datatypes, the 'eltype' is set to MPI_DATATYPE_NULL. This makes
      the ACC computation be unable to work with pair types; (2) for all
      basic type of data, the original implementation assumes that
      they are all contiguous and issues them in an unpacked manner
      with length of data size (count*type_size). This is incorrect for
      pair datatypes, because most pair datatypes are non-contiguous
      (type_extent != type_size).
      
      In the previous patch, we already made 'eltype' to store basic
      type instead of builtin type. In this patch, we fixed this
      bug by (1) modify ACC computation to treat 'eltype' as basic
      type; (2) For non-contiguous basic type data, we use the noncontig
      API so that it will be issued in a packed manner.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      ce8bc310
    • Xin Zhao's avatar
      Store window basic attributes into a struct on window. · 9404e953
      Xin Zhao authored
      
      
      In this patch, we gather window basic attributes of other
      processes (base_addr, size, disp_unit, win_handle) using a
      struct called "basic_info_table". By doing this, we can use
      one contiguous memory region to store them.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      9404e953
  2. 03 Mar, 2015 1 commit
  3. 13 Feb, 2015 4 commits
    • Xin Zhao's avatar
      Remove source_win_handle from GET-like RMA packets. · 80a71e11
      Xin Zhao authored
      
      
      For GET-like RMA packets and response packets (GACC,
      GET, FOP, CAS, GACC_RESP, GET_RESP, FOP_RESP, CAS_RESP),
      originally we carry source_win_handle in packet struct
      in order to locate window handle on origin side in the
      packet handler of response packets. However, this is
      not necessary because source_win_handle can be stored
      in the request on the origin side. This patch delete
      source_win_handle from those packets to reduce the size
      of packet union.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      80a71e11
    • Xin Zhao's avatar
      Change argument of function finish_op_on_target. · 1b30ab19
      Xin Zhao authored
      
      
      In this patch, we replace one argument of function
      finish_op_on_target, "packet(op) type", with "has_response_data".
      Since finish_op_on_target does not care what specific
      packet(op) type it is processing on, but only cares
      about if the current op has response data (like GET/GACC),
      changing the argument in this way can simplify the
      code by avoiding acquiring packet(op) type everytime
      before calling finish_op_on_target.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      1b30ab19
    • Xin Zhao's avatar
      Rewrite code of piggybacking IMMED data with RMA packets. · de9d0f21
      Xin Zhao authored
      
      
      Originally we add "immed_data" and "immed_len" areas to RMA packets,
      in order to piggyback small amount of data with packet header to
      reduce number of packets (Note that "immed_len" is necessary when
      the piggybacked data is not the entire data). However, those areas
      potentially increase the packet union size and worsen the two-sided
      communication. This patch fixes this issue.
      
      In this patch, we remove "immed_data" and "immed_len" from normal
      "MPIDI_CH3_Pkt_XXX_t" operation type (e.g. MPIDI_CH3_Pkt_put_t), and
      we introduce new "MPIDI_CH3_Pkt_XXX_immed_t" packt type for each
      operation (e.g. MPIDI_CH3_Pkt_put_immed_t).
      
      "MPIDI_CH3_Pkt_XXX_immed_t" is used when (1) both origin and target
      are basic datatypes, AND, (2) the data to be sent can be entirely fit
      into the header. By doing this, "MPIDI_CH3_Pkt_XXX_immed_t" needs
      "immed_data" area but can drop "immed_len" area. Also, since it only
      works with basic target datatype, it can drop "dataloop_size" area
      as well. All operations that do not satisfy (1) or (2) will use
      normal "MPIDI_CH3_Pkt_XXX_t" type.
      
      Originally we always piggyback FOP data into the packet header,
      which makes the packet size too large. In this patch we split the
      FOP operaton into IMMED packets and normal packets.
      
      Because CAS only work with 2 basic datatype and non-complex
      elements, the data amount is relatively small, we always piggyback
      the data with packet header and only use "MPIDI_CH3_Pkt_XXX_immed_t"
      packet type for CAS.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      de9d0f21
    • Xin Zhao's avatar
      Remove lock_type and origin_rank areas from RMA packet. · 81e2b274
      Xin Zhao authored
      
      
      Originally we added lock_type and origin_rank areas
      in RMA packet, in order to piggyback passive lock request
      with RMA operations. However, those areas potentially
      enlarged the packet union size, and actually they are
      not necessary and can be completetly avoided.
      
      "Lock_type" is used to remember what types of lock (shared or
      exclusive) the origin wants to acquire on the target. To remove
      it from RMA packet, we use flags (already exists in RMA packet)
      to remember such information.
      
      "Origin_rank" is used to remember which origin has sent lock
      request to the target, so that when the lock is granted to this
      origin later, the target can send ack to that origin. Actually
      the target does not need to store origin_rank but can only store
      origin_vc, which is known from progress engine on target side.
      Therefore, we can completely remove origin_rank from RMA packet.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      81e2b274
  4. 16 Dec, 2014 17 commits
  5. 13 Nov, 2014 3 commits
  6. 03 Nov, 2014 8 commits
    • Xin Zhao's avatar
      add original RMA PVARs back. · ed20cd37
      Xin Zhao authored
      
      
      Add some original RMA PVARs back to the new
      RMA infrastructure, including timing of packet
      handlers, op allocation and setting, window
      creation, etc.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      ed20cd37
    • Xin Zhao's avatar
      Delete no longer needed code. · cc63b367
      Xin Zhao authored
      
      
      We made a huge change to RMA infrastructure and
      a lot of old code can be droped, including separate
      handlers for lock-op-unlock, ACCUM_IMMED specific
      code, O(p) data structure code, code of lazy issuing,
      etc.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      cc63b367
    • Xin Zhao's avatar
      Rewrite code of passive lock control messages. · 0542e304
      Xin Zhao authored
      
      
      1. Piggyback LOCK request with first IMMED operation.
      
      When we see an IMMED operation, we can always piggyback
      LOCK request with that operation to reduce one sync
      message of single LOCK request. When packet header of
      that operation is received on target, we will try to
      acquire the lock and perform that operation. The target
      either piggybacks LOCK_GRANTED message with the response
      packet (if available), or sends a single LOCK_GRANTED
      message back to origin.
      
      2. Rewrite code of manage lock queue.
      
      When the lock request cannot be satisfied on target,
      we need to buffer that lock request on target. All we
      need to do is enqueuing the packet header, which contains
      all information we need after lock is granted. When
      the current lock is released, the runtime will goes
      over the lock queue and grant the lock to the next
      available request. After lock is granted, the runtime
      just trigger the packet handler for the second time.
      
      3. Release lock on target side if piggybacking with UNLOCK.
      
      If there are active-message operations to be issued,
      we piggyback a UNLOCK flag with the last operation.
      When the target recieves it, it will release the current
      lock and grant the lock to the next process.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      0542e304
    • Xin Zhao's avatar
      Simplify issuing functions at origin side. · 52c2fc11
      Xin Zhao authored
      
      
      Here we extract the common code of different
      issuing functions at origin side and simplify
      those issuing functions.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      52c2fc11
    • Xin Zhao's avatar
      Add IMMED area in packet header. · e8d4c6d5
      Xin Zhao authored
      
      
      We add a IMMED data area (16 bytes by default) in
      packet header which will contains as much origin
      data as possible. If origin can put all data in
      packet header, then it no longer needs to send
      separate data packet. When target recieves the
      packet header, it will first copy data out from
      the IMMED data area. If there is still more
      data coming, it continues to receive following
      packets; if all data is included in header, then
      recieving is done.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      e8d4c6d5
    • Xin Zhao's avatar
      Decrement Active Target counter at target side. · b73778ea
      Xin Zhao authored
      
      
      During PSCW, when there are active-message operations
      to be issued in Win_complete, we piggback a AT_COMPLETE
      flag with it so that when target receives it, it can
      decrement a counter on target side and detect completion
      when target counter reaches zero.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      b73778ea
    • Xin Zhao's avatar
      Detect remote completion by FLUSH / FLUSH_ACK messages. · 6578785d
      Xin Zhao authored
      
      
      When the origin wants to do a FLUSH sync, if there are
      active-message operations that are going to be issued,
      we piggback the FLUSH message with the last operation;
      if no such operations, we just send a single FLUSH packet.
      
      If the last operation is a write op (PUT, ACC) or only
      a single FLUSH packet is sent, after target recieves it,
      target will send back a single FLUSH_ACK packet;
      if the last operation contains a read action (GET, GACC, FOP,
      CAS), after target receiveds it, target will piggback a
      FLUSH_ACK flag with the response packet.
      
      After origin receives the FLUSH_ACK packet or response packet
      with FLUSH_ACK flag, it will decrement the counter which
      indicates number of outgoing sync messages (FLUSH / UNLOCK).
      When that counter reaches zero, origin can know that remote
      completion is achieved.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      6578785d
    • Xin Zhao's avatar
      Embedding packet structure into RMA operation structure. · b1685139
      Xin Zhao authored and Pavan Balaji's avatar Pavan Balaji committed
      
      
      We were duplicating information in the operation structure and in the
      packet structure when the message is actually issued.  Since most of
      the information is the same anyway, this patch just embeds a packet
      structure into the operation structure, so that we eliminate unnessary
      copy.
      Signed-off-by: Pavan Balaji's avatarPavan Balaji <balaji@anl.gov>
      b1685139