1. 26 Jun, 2015 2 commits
  2. 22 Jun, 2015 1 commit
  3. 20 Jun, 2015 1 commit
  4. 19 Jun, 2015 1 commit
    • better approach for do_accumulate_op · f039eebb
      Rob Latham authored
      commit 83253a41 triggered a bunch of new warnings.  Take a different
      approach.  For simplicity of implementation, do_accumulate_op is defined
      as MPI_User_function.  We could split up internal routines and
      user-provided routines, but that complicates the code for little
      benefit.
      
      Instead, keep do_accumulate_op with an int type, but check for overflow
      before explicitly casting.  In many places the count is simply '1'.  In
      stream processing there is an internal limit of 256k, so the assertion
      should never fire.
      Signed-off-by: Xin Zhao <xinzhao3@illinois.edu>
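The check-then-cast pattern described in this commit message can be sketched as follows. This is a minimal illustration, not the MPICH code: the helper name `checked_count_to_int` and the use of `size_t` for the wide count are assumptions made for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper (not an MPICH function): narrow a wide element
 * count to int, asserting first that the value fits. */
static int checked_count_to_int(size_t count)
{
    /* Fail loudly on overflow instead of silently truncating. */
    assert(count <= (size_t) INT_MAX);
    return (int) count;
}
```

A caller would pass the result to a routine that takes an int count; for small counts such as 1, or counts bounded by an internal stream limit, the assertion never fires.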
  5. 17 Jun, 2015 1 commit
  6. 16 Jun, 2015 1 commit
    • Handling of discard connection to avoid reconnect · ac07f982
      Lena Oden authored
      
      
      The loser of a head-to-head connection sometimes tries
      to reconnect later, after MPI_Finalize was called. This
      can lead to several errors in the socket layer, depending
      on the state of the discarded connection and the appearance
      of the connection events. Refs #2180
      This patch handles this in two ways:
      
      1.)
      Discarded connections are marked with CONN_STATE_DISCARD,
      so they are held back from connecting.  Furthermore, an error on
      any discarded connection (because the remote side closed in
      MPI_Finalize) is ignored and the connection is closed.
      
      2.)
      Add a finalize flag for process groups. If a process group is
      closing and tries to close all VCs, a flag is set to mark this.
      If the flag is set, a reconnection (in the socket state) is
      refused and the connection is closed on both sides.
      
      Both steps are necessary to catch all reconnection attempts after
      MPI_Finalize was called.
      Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
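The two checks described in this commit message might look roughly like the sketch below. Only `CONN_STATE_DISCARD` is named in the message; the other states, the struct layouts, and both function names are hypothetical and chosen just to make the logic concrete.

```c
#include <stdbool.h>

/* Hypothetical states; only CONN_STATE_DISCARD appears in the commit message. */
typedef enum {
    CONN_STATE_CONNECTED,
    CONN_STATE_DISCARD,
    CONN_STATE_CLOSED
} conn_state_t;

typedef struct { bool finalizing; } process_group_t;   /* hypothetical finalize flag */
typedef struct { conn_state_t state; process_group_t *pg; } conn_t;

/* Step 1 and step 2 combined: refuse a reconnect if the connection was
 * discarded or its process group is closing, and close the connection. */
static bool may_reconnect(conn_t *c)
{
    if (c->state == CONN_STATE_DISCARD || c->pg->finalizing) {
        c->state = CONN_STATE_CLOSED;
        return false;
    }
    return true;
}

/* Step 1: an error on a discarded connection is ignored and the
 * connection is closed. Returns true if the error should be reported. */
static bool should_report_error(conn_t *c)
{
    if (c->state == CONN_STATE_DISCARD) {
        c->state = CONN_STATE_CLOSED;
        return false;
    }
    return true;
}
```

Keeping both checks mirrors the point made in the message: the discard mark covers connections that lost the head-to-head race, while the finalize flag covers reconnects that arrive while the process group is shutting down.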
  7. 14 Jun, 2015 2 commits
    • Expose AM flush ordering and issue per OP flush if unordered. · 5324a41f
      Min Si authored
      
      
      This patch includes three changes:
      (1) Added netmod API get_ordering to allow a netmod to expose the network
      ordering. A netmod may issue some packets via multiple connections in
      parallel if those packets (such as RMA) do not require ordering, and
      thus the packets may be unordered. This patch sets the network ordering
      in every existing netmod (tcp|mxm|ofi|portals|llc) to true, since all
      packets are sent in order via one connection.
      (2) Nemesis exposes the window packet orderings, such as AM flush
      ordering, at init time. It supports ordered packets only when the netmod
      supports an ordered network.
      (3) If AM flush is ordered (flush must be finished after all previous
      operations), then CH3 RMA only requests FLUSH ACK on the last operation.
      Otherwise, CH3 must request per-OP FLUSH ACK to ensure all operations
      are remotely completed.
      Signed-off-by: Xin Zhao <xinzhao3@illinois.edu>
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
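Change (3) above can be illustrated with a small sketch of the decision it describes. The `rma_op_t` type, its field, and the function name are hypothetical stand-ins, not the CH3 data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-operation record; not the CH3 structure. */
typedef struct { bool request_flush_ack; } rma_op_t;

/* If AM flush is ordered, only the last operation asks for a FLUSH ACK;
 * otherwise every operation asks for its own. Returns the number of
 * ACKs that will be requested. */
static size_t mark_flush_acks(rma_op_t *ops, size_t n, bool am_flush_ordered)
{
    size_t acks = 0;
    for (size_t i = 0; i < n; i++) {
        ops[i].request_flush_ack = am_flush_ordered ? (i == n - 1) : true;
        if (ops[i].request_flush_ack)
            acks++;
    }
    return acks;
}
```

On an ordered network a single ACK for the last operation implies that all earlier ones completed remotely, which is why the unordered case needs one ACK per operation.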
    • Always free issued OPs when window resource is used up. · c83b6b2d
      Min Si authored
      
      
      When the window resource is used up, the current code frees OPs before
      completion only if flush_remote is ordered. However, we can always free
      them, even on an out-of-order network, because remote completion is
      waited for via the ack counter, and local completion (flush_local) is
      translated to remote completion (flush).
      Signed-off-by: Xin Zhao <xinzhao3@illinois.edu>
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
  8. 12 Jun, 2015 14 commits
  9. 30 May, 2015 9 commits
  10. 20 Apr, 2015 2 commits
    • Set size of IMMED data in RMA packets to 8 bytes. · de0412c2
      Xin Zhao authored
      
      
      Originally the size of IMMED data in RMA packets was 16 bytes,
      which made the size of a CH3 packet 56 bytes. Here we reduce
      the size of IMMED data in RMA packets to 8 bytes, so that the
      size of a CH3 packet is reduced to 48 bytes, the same as in
      mpich-3.1.4 (the old RMA infrastructure).
      Signed-off-by: Min Si <msi@il.is.s.u-tokyo.ac.jp>
      Signed-off-by: Antonio J. Pena <apenya@mcs.anl.gov>
    • Move 'stream_offset' out of RMA packet struct. · 19f29078
      Xin Zhao authored
      
      
      'stream_offset' is used to specify the starting position
      (on the target window) of the current streaming unit in ACC-like
      operations. It was originally put in the RMA packet struct,
      which potentially increases the CH3 packet size.
      
      In this patch, we move 'stream_offset' out of the RMA
      packet as follows: 1. when the target data is a basic datatype,
      we use 'stream_offset' and the starting address for the entire
      operation to calculate the starting address for the current
      streaming unit, and rewrite 'addr' in the RMA packet with that
      value; 2. when the target data is a derived datatype, we cannot do
      the same thing as for a basic datatype because the target needs to
      know both the starting address for the entire operation and
      the starting address for the current streaming unit. Therefore,
      we send 'stream_offset' separately to the target side.
      Signed-off-by: Min Si <msi@il.is.s.u-tokyo.ac.jp>
      Signed-off-by: Antonio J. Pena <apenya@mcs.anl.gov>
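Case 1 above (basic datatype on the target) amounts to simple address arithmetic, sketched below. The `rma_pkt_t` struct is a hypothetical one-field stand-in for the RMA packet, and the sketch assumes 'stream_offset' is measured in bytes, which the commit message does not state.

```c
#include <stddef.h>

/* Hypothetical stand-in for the RMA packet; only 'addr' is modeled. */
typedef struct { void *addr; } rma_pkt_t;

/* Rewrite 'addr' so it points at the current streaming unit.
 * Assumption: stream_offset is a byte offset from the start of the
 * entire operation on the target window. */
static void set_stream_unit_addr(rma_pkt_t *pkt, void *op_base, size_t stream_offset)
{
    pkt->addr = (char *) op_base + stream_offset;
}
```

This shortcut does not carry over to case 2: with a derived datatype the target needs both addresses, so the offset still has to travel to the target separately.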
  11. 09 Mar, 2015 1 commit
  12. 04 Mar, 2015 5 commits
    • Rename predefined_type / predef_type to basic_type. · 04deb880
      Xin Zhao authored
      
      
      In the MPI standard, a predefined datatype is called a basic type.
      It is better for the name in the code to match the standard.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Xin Zhao's avatar
      98c76f78
    • Modify SHM ACC/GACC to avoid allocate large buffer. · 7c890ab2
      Xin Zhao authored
      
      
      The original implementation of ACC/GACC on SHM first
      allocates a temporary buffer which has the same data
      layout as the target data, copies the entire origin
      data to that temporary buffer, and then performs the
      ACC computation between the temporary buffer and the
      target buffer. The temporary buffer can use a potentially
      large amount of memory.
      
      This patch fixes this issue as follows: (1) SHM ACC/GACC
      routines directly call the do_accumulate_op() function, which
      requires the origin data to be in a 'packed manner';
      (2) if the origin data is a basic type, we directly perform
      do_accumulate_op() between the origin buffer and the target buffer;
      if the origin data is derived, we stream the origin data
      by copying part of the origin data into a packed streaming
      buffer and performing do_accumulate_op() between the
      streaming buffer and the target buffer each time.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
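The streaming approach for derived origin data can be sketched as below. This is an illustration under simplifying assumptions, not the MPICH routine: the origin is modeled as strided ints, the operation is a plain sum, `STREAM_ELEMS` is an arbitrary small unit size, and `accumulate_sum` stands in for do_accumulate_op().

```c
#include <stddef.h>

#define STREAM_ELEMS 4   /* hypothetical streaming-unit size, in elements */

/* Stand-in for do_accumulate_op(): elementwise sum of packed ints. */
static void accumulate_sum(const int *packed, int *target, size_t n)
{
    for (size_t i = 0; i < n; i++)
        target[i] += packed[i];
}

/* Stream a strided (non-contiguous) origin into the target: pack one
 * unit at a time into a small fixed buffer, then accumulate it. */
static void stream_accumulate(const int *origin, size_t stride,
                              size_t count, int *target)
{
    int stream_buf[STREAM_ELEMS];
    for (size_t done = 0; done < count; done += STREAM_ELEMS) {
        size_t left = count - done;
        size_t n = left < STREAM_ELEMS ? left : STREAM_ELEMS;
        for (size_t i = 0; i < n; i++)
            stream_buf[i] = origin[(done + i) * stride];   /* pack */
        accumulate_sum(stream_buf, target + done, n);
    }
}
```

The temporary storage is bounded by the streaming unit rather than by the size of the whole operation, which is the memory saving the message describes.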
    • Allocate buffer with stream size for ACC/GACC data piggybacked with LOCK. · 002ce8c8
      Xin Zhao authored
      
      
      For queued ACC/GACC data piggybacked with LOCK, we do not
      need to allocate the buffer for the entire operation, but
      only need to allocate a buffer with stream unit size. This
      patch fixes this issue.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Modify do_accumulate_op to allow for packed basic type data as input. · 0d5146ba
      Xin Zhao authored
      
      
      Originally, do_accumulate_op() was used to perform the ACC
      computation on the target between data from the origin side and
      data on the target window. It required that the target side
      first unpack the received origin data into the same data
      layout as the target data before calling this function, which
      may consume a potentially large amount of memory.
      
      This patch changes the do_accumulate_op() function in the following
      aspects:
      
      (1) It requires that the origin data passed to the function
      be "in a packed manner", which means it looks as if all
      basic type elements in the origin data are placed one by one.
      Note that the origin data is not necessarily contiguous, since
      we may use a non-contiguous basic type. If the basic type
      is contiguous, then the origin data must be contiguous.
      
      (2) It adds a new function argument, stream_offset, which
      specifies a starting location in the target data. This allows
      the origin data to work with a stream-sized part of the target
      data.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
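The effect of the new stream_offset argument can be shown with a reduced sketch. The function below is not the real do_accumulate_op() signature; it assumes contiguous int data on both sides, a sum operation, and a byte-valued offset, all of which are simplifications made for the example.

```c
#include <stddef.h>

/* Hypothetical reduced form of an accumulate with a stream offset:
 * apply a packed chunk of origin data to the part of the target that
 * starts stream_offset bytes past target_base (assumed units: bytes). */
static void accumulate_at_offset(const int *packed_src, size_t src_count,
                                 void *target_base, size_t stream_offset)
{
    int *dst = (int *) ((char *) target_base + stream_offset);
    for (size_t i = 0; i < src_count; i++)
        dst[i] += packed_src[i];
}
```

Calling it once per received chunk, with an increasing offset, lets the target apply origin data as it arrives instead of first unpacking the whole operation into a target-shaped buffer.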