1. 16 Dec, 2014 2 commits
    • Change routine/pkt name from LOCK_GRANTED to LOCK_ACK · e36203c3
      Xin Zhao authored
      Because we will send different kinds of LOCK ACKs (not
      just LOCK_GRANTED, but maybe LOCK_DISCARDED, for example),
      naming the related packets and functions "LOCK_GRANTED"
      is no longer appropriate. Here we rename them to "LOCK_ACK".
      No reviewer.
    • Perf-optimize: support piggybacking LOCK on large RMA operations. · 4739df59
      Xin Zhao authored
      Originally we only allowed a LOCK request to be piggybacked
      on small RMA operations (where all data fits in the packet
      header). This adds communication overhead for larger
      operations, since the origin side needs to wait for the LOCK
      ACK before it can transmit data to the target.
      In this patch we add support for piggybacking LOCK on
      RMA operations of arbitrary size. Note that (1) this
      only works with basic datatypes; (2) if the LOCK cannot
      be satisfied, we temporarily buffer the operation on
      the target side.
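The target-side behaviour described above can be sketched as follows. This is a minimal illustration, not the CH3 code: the types, names, and the fixed-size queue are all hypothetical, only an exclusive lock is modelled, and granting the lock to the oldest buffered operation on unlock is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MAX_PENDING = 8, MAX_PAYLOAD = 64 };   /* hypothetical limits */

typedef struct {
    int origin;
    size_t len;
    unsigned char data[MAX_PAYLOAD];
} pending_op_t;

typedef struct {
    int lock_owner;                     /* -1 when the window is unlocked */
    unsigned char mem[MAX_PAYLOAD];     /* stands in for the window memory */
    pending_op_t queue[MAX_PENDING];    /* ops buffered while the lock is busy */
    int n_pending;
} target_win_t;

typedef enum { OP_APPLIED, OP_BUFFERED, OP_REJECTED } op_result_t;

void target_init(target_win_t *w)
{
    memset(w, 0, sizeof *w);
    w->lock_owner = -1;
}

/* Handle a PUT that carries a piggybacked LOCK request. */
op_result_t target_recv_lock_put(target_win_t *w, int origin,
                                 const void *data, size_t len)
{
    assert(w != NULL && data != NULL);
    if (len > MAX_PAYLOAD)
        return OP_REJECTED;
    if (w->lock_owner == -1 || w->lock_owner == origin) {
        w->lock_owner = origin;         /* lock satisfied: apply right away */
        memcpy(w->mem, data, len);
        return OP_APPLIED;
    }
    if (w->n_pending == MAX_PENDING)
        return OP_REJECTED;
    pending_op_t *p = &w->queue[w->n_pending++];
    p->origin = origin;                 /* lock busy: buffer the whole op */
    p->len = len;
    memcpy(p->data, data, len);
    return OP_BUFFERED;
}

/* Release the lock and apply the oldest buffered op, if any.
 * Returns the new lock owner, or -1 if the window is now unlocked. */
int target_unlock(target_win_t *w, int origin)
{
    assert(w->lock_owner == origin);
    w->lock_owner = -1;
    if (w->n_pending == 0)
        return -1;
    pending_op_t p = w->queue[0];
    memmove(&w->queue[0], &w->queue[1],
            (size_t)(w->n_pending - 1) * sizeof p);
    w->n_pending--;
    w->lock_owner = p.origin;
    memcpy(w->mem, p.data, p.len);
    return p.origin;
}
```

The cost of the optimization is visible in `pending_op_t`: the target has to hold the full payload of an operation whose lock it could not yet grant.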
      No reviewer.
  2. 13 Nov, 2014 1 commit
    • Split shared request handler. · 88d34091
      Xin Zhao authored
      ReqHandler_GaccumLikeSendComplete is used for GACC-like operations,
      including GACC, CAS and FOP. Here we split it into three separate
      handlers, which makes it convenient to add different actions in
      the future for those three kinds of operations.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
  3. 11 Nov, 2014 1 commit
  4. 04 Nov, 2014 1 commit
    • Implement true request-based RMA operations. · 3e005f03
      Min Si authored
      There are two requests associated with each request-based
      operation: one normal internal request (req) and one newly
      added user request (ureq). We return ureq to the user when
      the request-based op call returns.
      The ureq is initialized with its completion counter (CC) set to 1
      and its ref count set to 2 (one reference is held by CH3 and the
      other by the user). If the corresponding op can be
      finished immediately in CH3, the runtime completes ureq
      in CH3 and lets the user's MPI_Wait/Test destroy ureq. If the
      corresponding op cannot be finished immediately, we
      first increment the ref count to 3 (because there are now
      three places that need to reference ureq: user, CH3, and
      progress engine). The progress engine completes ureq when the
      op is completed, then CH3 releases its reference during
      garbage collection, and finally the user's MPI_Wait/Test
      destroys ureq.
      The ureq can be completed in the following three ways:
      1. If the op is issued and completed immediately in CH3
      (req is NULL), we just complete ureq before freeing the op.
      2. If the op is issued but not completed, we remember the ureq
      handler in req and set the OnDataAvail / OnFinal handlers
      in req to a newly added request handler, which will complete
      the user request. The handler is triggered at three places:
         2-a. when progress engine completes a put/acc req;
         2-b. when get/getacc handler completes a get/getacc req;
         2-c. when progress engine completes a get/getacc req.
      3. If the op is not issued (i.e., it is waiting for the lock to be
      granted), the 2nd way will eventually be performed when the op is
      issued by the progress engine.
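The reference-counting life cycle of ureq described above can be traced with a small model. This is a sketch under stated assumptions: `ureq_t` and every function name here are hypothetical, the `freed` flag stands in for actually releasing the request, and the model assumes CH3 drops its reference when it completes ureq on the immediate path.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int  cc;         /* completion counter: 1 = pending, 0 = complete */
    int  ref_count;
    bool freed;      /* set when the last reference is dropped */
} ureq_t;

void ureq_init(ureq_t *u)
{
    u->cc = 1;
    u->ref_count = 2;    /* one reference for CH3, one for the user */
    u->freed = false;
}

static void ureq_release(ureq_t *u)
{
    assert(u->ref_count > 0);
    if (--u->ref_count == 0)
        u->freed = true;
}

/* Way 1: the op completes immediately inside CH3. */
void ch3_complete_immediately(ureq_t *u)
{
    u->cc = 0;
    ureq_release(u);     /* assumed: CH3 drops its reference here */
}

/* Way 2: the op is issued but not complete, so the progress engine
 * takes a third reference. */
void ch3_defer_to_progress(ureq_t *u) { u->ref_count++; }

void progress_complete(ureq_t *u)
{
    u->cc = 0;
    ureq_release(u);     /* progress engine is done with ureq */
}

void ch3_garbage_collect(ureq_t *u) { ureq_release(u); }

/* Models the user's MPI_Wait/Test: drops the user reference once the
 * request is complete. Returns true if the request was complete. */
bool user_test(ureq_t *u)
{
    if (u->cc != 0)
        return false;
    ureq_release(u);
    return true;
}
```

On both paths the user's test call is the one that drops the final reference, which matches the ordering in the commit message.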
      Signed-off-by: Xin Zhao <xinzhao3@illinois.edu>
  5. 03 Nov, 2014 9 commits
    • add original RMA PVARs back. · ed20cd37
      Xin Zhao authored
      Add some original RMA PVARs back to the new
      RMA infrastructure, including timing of packet
      handlers, op allocation and setting, window
      creation, etc.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Delete no longer needed code. · cc63b367
      Xin Zhao authored
      We made a huge change to the RMA infrastructure, and
      a lot of old code can be dropped, including separate
      handlers for lock-op-unlock, ACCUM_IMMED specific
      code, O(p) data structure code, code of lazy issuing,
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Simplify PktHandler_FOP and PktHandler_FOPResp. · a42b916d
      Xin Zhao authored
      For a FOP operation, all data fits in the packet
      header, so on the origin side we do not need to send separate
      data packets, and on the target side we do not need a request
      handler; only a packet handler is needed. Similarly, for the FOP
      response packet, we can receive all data in the FOP resp packet
      handler. This patch deletes the request handler on the target
      side and simplifies the packet handlers on the target / origin side.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Decrement Active Target counter at target side. · b73778ea
      Xin Zhao authored
      During PSCW, when there are active-message operations
      to be issued in Win_complete, we piggyback an AT_COMPLETE
      flag on such an operation so that when the target receives it, it can
      decrement a counter on the target side and detect completion
      when the counter reaches zero.
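A minimal sketch of the counting scheme above, assuming the counter starts at the number of origins in the access group. The flag value, the struct, and the function names are hypothetical, not the CH3 definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical packet flag bits. */
enum { PKT_FLAG_NONE = 0, PKT_FLAG_AT_COMPLETE = 1 << 0 };

typedef struct {
    int at_completion_counter;   /* origins that have not completed yet */
} target_state_t;

/* Target side of Win_post: expect one completion per origin. */
void target_post(target_state_t *t, int n_origins)
{
    assert(n_origins >= 0);
    t->at_completion_counter = n_origins;
}

/* Called for every incoming RMA packet. Returns true once all origins
 * have signalled completion, i.e. Win_wait may return. */
bool target_on_packet(target_state_t *t, unsigned flags)
{
    if (flags & PKT_FLAG_AT_COMPLETE) {
        assert(t->at_completion_counter > 0);
        t->at_completion_counter--;
    }
    return t->at_completion_counter == 0;
}
```

Because the flag rides on an operation that is sent anyway, no separate completion packet is needed in this case.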
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Separate request handler of PUT, ACC, GACC and rename them. · fe15ea26
      Xin Zhao authored
      Separate final request handler of PUT, ACC, GACC into three.
      Separate derived DT request handler of ACC and GACC into two.
      Renaming request handlers as follows:
      (1) Normal request handler: it is triggered on target side
          when all data from origin is received.
          It includes:
          ReqHandler_PutRecvComplete --- for PUT
          ReqHandler_AccumRecvComplete --- for ACC
          ReqHandler_GaccumRecvComplete --- for GACC
      (2) Derived DT request handler: it is triggered on target
          side when all derived DT info is received.
          It includes:
          ReqHandler_PutDerivedDTRecvComplete --- for PUT
          ReqHandler_AccumDerivedDTRecvComplete --- for ACC
          ReqHandler_GaccumDerivedDTRecvComplete --- for GACC
      (3) Response request handler: it is triggered on target
          side when the send-back process is finished in GET-like operations.
          It includes:
          ReqHandler_GetSendComplete --- for GET
          ReqHandler_GaccumLikeSendComplete --- for GACC, FOP, CAS
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Add nonblocking progress making functions. · ab058906
      Xin Zhao authored
      Progress making functions check if the current
      synchronization is finished, change the synchronization
      state if possible, and issue as many pending operations
      on the window as possible.
      There are three granularities of progress making functions:
      per-target, per-window and per-process. The per-target
      routine is used in RMA operation routines (PUT/GET/ACC...)
      and single-target passive lock calls (Win_unlock, Win_flush, Win_flush_local);
      the per-window routine is used in window-wide synchronization
      calls (Win_fence, Win_complete, Win_unlock_all,
      Win_flush_all, Win_flush_local_all); and the per-process
      routine is used in the progress engine.
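One way to picture the three granularities is as nested loops, each level delegating to the one below. The sketch below is an assumption about structure only: the types, the function names, and the `budget` parameter (a cap on how many operations one call may issue) are hypothetical, and the synchronization-state checks are left out.

```c
#include <assert.h>

enum { MAX_TARGETS = 8, MAX_WINS = 4 };   /* hypothetical limits */

typedef struct { int pending_ops[MAX_TARGETS]; int n_targets; } win_t;
typedef struct { win_t *wins[MAX_WINS]; int n_wins; } process_t;

/* Per-target: issue pending ops for one target without blocking.
 * Returns the number of ops issued. */
int make_progress_target(win_t *w, int target, int budget)
{
    assert(target >= 0 && target < w->n_targets && budget >= 0);
    int issued = w->pending_ops[target] < budget ? w->pending_ops[target]
                                                 : budget;
    w->pending_ops[target] -= issued;
    return issued;
}

/* Per-window: visit every target of one window. */
int make_progress_win(win_t *w, int budget)
{
    int issued = 0;
    for (int t = 0; t < w->n_targets; t++)
        issued += make_progress_target(w, t, budget - issued);
    return issued;
}

/* Per-process: visit every window; this is the entry point a progress
 * engine would call. */
int make_progress_global(process_t *p, int budget)
{
    int issued = 0;
    for (int i = 0; i < p->n_wins; i++)
        issued += make_progress_win(p->wins[i], budget - issued);
    return issued;
}
```

None of the three functions waits for anything, which is what makes them safe to call from both user-facing routines and the progress engine.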
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Rename ACK packets in RMA. · ba1a400c
      Xin Zhao authored and Pavan Balaji committed
      The packet type MPIDI_CH3_PKT_PT_RMA_DONE is used for ACK
      of FLUSH / UNLOCK packets. Here we rename it to
      MPIDI_CH3_PKT_FLUSH_ACK and modify the related functions
      and data structures.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Code refactoring to clean up the RMA code. · 61f952c7
      Xin Zhao authored and Pavan Balaji committed
      Split RMA functionality into smaller files, and move functions
      to where they belong based on the file names.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
  6. 03 Sep, 2014 1 commit
    • Enabled SHM segments detection in MPI_Win_create · b58d4baf
      Min Si authored
      First, cache every SHM window created by Win_allocate or
      Win_allocate_shared in a global list, and unlink it in Win_free.
      Then, when the user calls Win_create for a new window, check the user-specified
      buffer and comm. Enable local SHM communication in the new window if it
      matches a cached SHM window. Note that all the shared resources
      are still freed by the original SHM window.
      Matching a SHM window must satisfy the following two conditions:
      1. The new node comm is equal to, or a subset of, the SHM node comm.
      (Note that in the other cases where two node comms overlap,
      although the overlapping processes could logically share memory, this is not
      supported for now. To support it, we first need to modify the implementation
      of RMA operations to remember the shared status per target rather than
      just comparing its node_id.)
      2. The buffer is in the range of the SHM segment across local processes
      in the original SHM window (a contiguous segment is mapped across local
      processes regardless of whether alloc_shared_noncontig is set).
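The two matching conditions can be expressed as a pair of predicates. This is an illustrative sketch: `shm_win_t` and the function names are hypothetical, a node comm is represented as a plain array of ranks, and addresses are compared as integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one cached SHM window. */
typedef struct {
    const int *node_ranks;   /* ranks in the SHM window's node comm */
    int        n_ranks;
    uintptr_t  seg_base;     /* start of the contiguous SHM segment */
    size_t     seg_len;      /* length of the segment in bytes */
} shm_win_t;

static bool has_rank(const shm_win_t *w, int rank)
{
    for (int i = 0; i < w->n_ranks; i++)
        if (w->node_ranks[i] == rank)
            return true;
    return false;
}

/* Condition 1: the new node comm is equal to, or a subset of, the
 * cached window's node comm. */
bool node_comm_is_subset(const shm_win_t *w, const int *ranks, int n)
{
    for (int i = 0; i < n; i++)
        if (!has_rank(w, ranks[i]))
            return false;
    return true;
}

/* Condition 2: the user buffer lies entirely inside the segment.
 * Written to avoid overflow in base + len. */
bool buffer_in_segment(const shm_win_t *w, uintptr_t base, size_t len)
{
    if (base < w->seg_base || len > w->seg_len)
        return false;
    return base - w->seg_base <= w->seg_len - len;
}

bool shm_win_matches(const shm_win_t *w, const int *ranks, int n,
                     uintptr_t base, size_t len)
{
    assert(w != NULL && ranks != NULL);
    return node_comm_is_subset(w, ranks, n) &&
           buffer_in_segment(w, base, len);
}
```

A partially overlapping comm fails the first predicate, which corresponds to the unsupported case noted in the commit message.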
      Resolves #2161
      Signed-off-by: Xin Zhao <xinzhao3@illinois.edu>
  7. 26 Aug, 2014 1 commit
  8. 25 Aug, 2014 1 commit
    • Fix error case for MPIDI_Request_create_null_rreq · cf1240d6
      Wesley Bland authored
      For some reason, the error case code between MPIDI_Request_create_rreq and
      MPIDI_Request_create_null_rreq was different. This is odd, because both macros
      take FAIL_ as an argument which is executed directly in the error case of
      create_rreq, but not in null_req. This commit makes the two act the same and
      updates the only two calls to the function that existed in the code.
      Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
  9. 31 Jul, 2014 2 commits
    • Add MPI_Comm_revoke · 57f6ee88
      Wesley Bland authored
      MPI_Comm_revoke is a special function because it does not have a matching call
      on the "receiving side". This is because it has to act as an out-of-band,
      resilient broadcast algorithm. Because of this, in this commit, in addition to
      the usual functions to implement MPI communication calls (MPI/MPID/CH3/etc.),
      we add a new CH3 packet type that will handle revoking a communicator without
      involving a matching call from the MPI layer (similar to how RMA is currently
      The thing that must be handled most carefully when revoking a communicator is
      to ensure that a previously used context ID will eventually be returned to the
      pool of available context IDs and that after this occurs, no old messages will
      match the new usage of the context ID (for instance, if some messages are very
      slow and show up late). To accomplish this, revoke is implemented as an
      all-to-all algorithm. When one process calls revoke, it will send a message to
      all other processes in the communicator, which will trigger that process to
      send a message to all other processes, and so on. Once a process has already
      revoked its communicator locally, it won't send out another wave of messages.
      As each process receives the revoke messages from the other processes, it will
      track how many messages have been received. Once it has either received a
      revoke message or a message about a process failure for each other process, it
      will release its refcount on the communicator object. After the application
      has freed all of its references to the communicator (and all requests, files,
      etc. associated with it), the context ID will be returned to the available pool.
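The all-to-all revoke wave can be simulated in a few dozen lines. This is a toy model, not the CH3 packet handler: every name is hypothetical, messages are delivered from a single in-memory queue, and process failures are assumed to be known up front, so each live process simply expects one revoke message from every other live process.

```c
#include <assert.h>
#include <stdbool.h>

enum { MAX_PROCS = 16 };

typedef struct {
    int  n;                        /* processes in the communicator */
    bool failed[MAX_PROCS];        /* processes known to have failed */
    bool revoked[MAX_PROCS];       /* has this process revoked locally? */
    bool released[MAX_PROCS];      /* has it dropped its internal refcount? */
    int  waiting_for[MAX_PROCS];   /* revoke messages still expected */
    int  messages_sent;
} revoke_sim_t;

typedef struct { int src, dst; } msg_t;

static msg_t queue[MAX_PROCS * MAX_PROCS];
static int q_head, q_tail;

/* Revoke locally and start one wave of messages to all live peers.
 * A process that has already revoked does not send a second wave. */
static void local_revoke(revoke_sim_t *s, int p)
{
    if (s->revoked[p])
        return;
    s->revoked[p] = true;
    for (int q = 0; q < s->n; q++) {
        if (q == p || s->failed[q])
            continue;
        queue[q_tail++] = (msg_t){ p, q };
        s->messages_sent++;
    }
    if (s->waiting_for[p] == 0)    /* no live peers to hear from */
        s->released[p] = true;
}

/* Runs the wave to completion; returns how many processes released
 * their reference on the communicator. */
int simulate_revoke(revoke_sim_t *s, int n, int initiator, const bool *failed)
{
    assert(n > 0 && n <= MAX_PROCS && initiator >= 0 && initiator < n);
    assert(!failed[initiator]);
    int live = 0;
    s->n = n;
    s->messages_sent = 0;
    q_head = q_tail = 0;
    for (int p = 0; p < n; p++) {
        s->failed[p] = failed[p];
        s->revoked[p] = s->released[p] = false;
        if (!failed[p])
            live++;
    }
    for (int p = 0; p < n; p++)
        s->waiting_for[p] = live - 1;

    local_revoke(s, initiator);
    while (q_head < q_tail) {
        msg_t m = queue[q_head++];
        local_revoke(s, m.dst);                /* first message triggers a wave */
        if (--s->waiting_for[m.dst] == 0)
            s->released[m.dst] = true;         /* heard from every live peer */
    }

    int released = 0;
    for (int p = 0; p < n; p++)
        if (s->released[p])
            released++;
    return released;
}
```

With L live processes the model sends L * (L - 1) messages in total, which is the price of not relying on any single process to relay the revocation.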
      Signed-off-by: Junchao Zhang <jczhang@mcs.anl.gov>
    • Add MPIDI_CH3U_Get_failed_group · 665ced28
      Wesley Bland authored
      This function will take a last_failed value and generate an MPID_Group. If the
      value is MPI_PROC_NULL, then it will parse the entire list. This function is
      exposed by MPID so this can be used by any functions that need the list of
      failed processes.
      This change necessitated changing the way the list of failed processes is
      retrieved from PMI. Rather than allocating a char array on demand every time
      we get the list from PMI, this string is allocated at init time and freed at
      finalize time now. This means that we can cache the value to be used later for
      things like just querying the list of processes that we already know have
      failed, rather than also getting the new list (which is important for the
      failure_ack/get_acked semantics).
      Signed-off-by: Junchao Zhang <jczhang@mcs.anl.gov>
  10. 11 Apr, 2014 1 commit
  11. 15 Nov, 2013 1 commit
  12. 29 Oct, 2013 1 commit
  13. 26 Sep, 2013 1 commit
  14. 27 Aug, 2013 1 commit
  15. 01 Aug, 2013 5 commits
  16. 22 Apr, 2013 1 commit
    • use C99-standard `__VA_ARGS__` · 9873c9a2
      Dave Goodell authored
      Use C99-standard `__VA_ARGS__` instead of a non-standard GCC extension
      that does the same thing.
      This version based on code review feedback from Pavan.
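For readers unfamiliar with the two spellings, the difference looks roughly like this. The macro and function names are hypothetical and are not taken from the commit; the GCC form is shown only in a comment.

```c
#include <assert.h>
#include <stdio.h>

/* GCC named-variadic extension (non-standard), for comparison:
 *
 *   #define FORMAT_MSG(buf, len, args...) snprintf((buf), (len), args)
 *
 * C99-standard equivalent: the variable arguments are written as "..."
 * and referenced as __VA_ARGS__. Keeping the format string inside the
 * variadic part means the list is never empty. */
#define FORMAT_MSG(buf, len, ...) snprintf((buf), (len), __VA_ARGS__)

/* Small wrapper showing the macro in use. Returns what snprintf returns. */
int format_rank_msg(char *buf, size_t len, int rank, const char *what)
{
    assert(buf != NULL && len > 0);
    return FORMAT_MSG(buf, len, "rank %d: %s", rank, what);
}
```

Calling `format_rank_msg(buf, sizeof buf, 3, "ok")` writes `rank 3: ok` into `buf`.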
      No reviewer.
  17. 01 Apr, 2013 1 commit
    • Add per-communicator eager threshold support. · a3c816ac
      Ralf Gunter authored
      Message transfers now respect the communicator-specific threshold.  This
      change has not been carefully checked for impact on our shared-memory
      ping-pong latency.
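A per-communicator threshold typically feeds a protocol decision like the one below. This is a hedged sketch: `comm_t`, its field, the fallback to a global default, and the use of `<=` at the boundary are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { PROTO_EAGER, PROTO_RENDEZVOUS } proto_t;

/* Hypothetical communicator record. */
typedef struct {
    long eager_threshold;   /* bytes; negative means "not set on this comm" */
} comm_t;

/* Pick the transfer protocol for one message, preferring the
 * communicator-specific threshold over the global default. */
proto_t choose_protocol(const comm_t *comm, size_t msg_bytes,
                        size_t global_threshold)
{
    assert(comm != NULL);
    size_t limit = comm->eager_threshold >= 0
                       ? (size_t)comm->eager_threshold
                       : global_threshold;
    return msg_bytes <= limit ? PROTO_EAGER : PROTO_RENDEZVOUS;
}
```

The extra branch sits on the send path of every message, which is why its effect on shared-memory ping-pong latency is worth measuring.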
      Reviewed-by: goodell
  18. 22 Feb, 2013 1 commit
    • CH3 default shared memory window implementation · 8cbf6414
      James Dinan authored
      This adds a default shared memory window implementation for CH3 (used
      by e.g. sock), which works only for MPI_COMM_SELF (this is what the
      default comm_split_type provides).  This closes ticket #1666.
      Reviewer: apenya
  19. 21 Feb, 2013 3 commits
    • Implemented lock op piggybacking for MODE_NOCHECK · 223fce45
      James Dinan authored
      When the MPI_MODE_NOCHECK assertion is given to a passive target lock
      operation, we defer acquisition of the lock and piggyback the request on
      the first RMA op to the target.  This eliminates a round-trip
      lock-request message.
      Reviewer: goodell
    • RMA sync. piggybacking from origin->target · 4e67607f
      James Dinan authored
      This patch uses packet header flags to piggyback the unlock operation on other
      RMA operations.  For most operations, there is no net change.  However, for FOP and
      GACC, unlock piggybacking was previously not implemented.
      Reviewer: goodell
    • Consolidated RMA op finalization code · bba35589
      James Dinan authored
      This patch consolidates the synchronization and tracking of RMA operations into
      a single routine that is called whenever we complete an operation.  The only
      exceptions are lock-op-unlock operations, which are completed from within the lock
      operation processing code.
      This code is pretty ugly, but it will get cleaner once packet flags are been
      Reviewer: goodell
  20. 17 Dec, 2012 1 commit
  21. 05 Nov, 2012 2 commits
  22. 25 Oct, 2012 1 commit
  23. 20 Oct, 2012 1 commit