1. 17 Jun, 2015 1 commit
  2. 16 Jun, 2015 1 commit
  3. 14 Jun, 2015 1 commit
    • Expose AM flush ordering and issue per OP flush if unordered. · 5324a41f
      Min Si authored and Pavan Balaji committed
      
      This patch includes three changes:
      (1) Added netmod API get_ordering to allow netmod to expose the network
      ordering. A netmod may issue some packets via multiple connections in
      parallel if those packets (such as RMA) do not require ordering, and
      thus the packets may be unordered. This patch sets the network ordering
      in every existing netmod (tcp|mxm|ofi|portals|llc) to true, since all
      packets are sent in order via one connection.
      (2) Nemesis exposes the window packet orderings such as AM flush
      ordering at init time. It supports ordered packets only when the netmod
      provides an ordered network.
      (3) If AM flush is ordered (flush must be finished after all previous
      operations), then CH3 RMA only requests FLUSH ACK on the last operation.
      Otherwise, CH3 must request per-OP FLUSH ACK to ensure all operations
      are remotely completed.
      Signed-off-by: Xin Zhao <xinzhao3@illinois.edu>
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
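
The decision described in points (1) to (3) of the commit above can be sketched in C as follows. This is a minimal illustration, not the MPICH code: only the hook name get_ordering comes from the commit message, while netmod_funcs, am_flush_is_ordered, mark_flush_acks and the two sample ordering functions are hypothetical names made up for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a netmod function table. Only the
 * get_ordering hook is named in the commit message. */
typedef struct {
    bool (*get_ordering)(void);
} netmod_funcs;

/* A netmod that sends every packet over one connection is ordered. */
bool single_connection_ordering(void) { return true; }

/* A netmod that spreads packets over parallel connections is not. */
bool multi_connection_ordering(void) { return false; }

/* Init-time query (assumed shape): AM flush is treated as ordered
 * only when the netmod reports an ordered network. */
bool am_flush_is_ordered(const netmod_funcs *nm)
{
    return nm != NULL && nm->get_ordering != NULL && nm->get_ordering();
}

/* Decide which of n_ops operations must request a FLUSH ACK.
 * Ordered: only the last operation. Unordered: every operation.
 * Returns the number of FLUSH ACK requests issued. */
size_t mark_flush_acks(bool ordered, bool *wants_ack, size_t n_ops)
{
    size_t count = 0;
    for (size_t i = 0; i < n_ops; i++) {
        wants_ack[i] = ordered ? (i + 1 == n_ops) : true;
        if (wants_ack[i])
            count++;
    }
    return count;
}
```

With an ordered netmod, a batch of four operations would carry a single FLUSH ACK request on the last one; with an unordered netmod, all four would carry one.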
  4. 30 May, 2015 2 commits
  5. 27 May, 2015 1 commit
  6. 20 May, 2015 1 commit
  7. 10 Apr, 2015 1 commit
    • portals4: revert [722d85a4] and [d459c025] · 2f97f429
      Kenneth Raffenetti authored
      The 2 commits being reverted introduced a "safe" PtlMEAppend function
      that would call MPID_nem_ptl_poll to process some events in case there
      was no space to append the match list entry. However, the poll function
      is not reentrant-safe, which could lead to ordering problems.
      
      The increased list entry limit from [c6c0d6f6] should prevent PTL_NO_SPACE
      errors from happening, except in the extreme case. If we still find we are
      hitting this error, a proper fix can be done in the Rportals layer.
      Signed-off-by: Antonio J. Pena <apenya@mcs.anl.gov>
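
One generic way a non-reentrant poll function can produce ordering problems is sketched below. This is a toy model under stated assumptions, not the MPID_nem_ptl_poll code: event_loop, handle_event and poll_events are hypothetical names, and the flag nested_poll stands in for an append wrapper that polls again when there is no space.

```c
#include <stdbool.h>
#include <stddef.h>

#define LOG_CAP 8   /* capacity of the toy queues in this sketch */

typedef struct {
    int pending[LOG_CAP];    /* event ids in arrival order */
    size_t head, tail;       /* pending[head..tail) is still queued */
    int completed[LOG_CAP];  /* order in which handlers finished */
    size_t n_completed;
    bool nested_poll;        /* model a handler that polls again */
} event_loop;

void poll_events(event_loop *el);

/* Handle one event. In the nested model, event 1 re-enters the poll
 * loop before its own handling is complete. */
static void handle_event(event_loop *el, int id)
{
    if (el->nested_poll && id == 1)
        poll_events(el);     /* later events are processed first */
    el->completed[el->n_completed++] = id;
}

/* Drain the pending queue in arrival order. */
void poll_events(event_loop *el)
{
    while (el->head < el->tail) {
        int id = el->pending[el->head++];
        handle_event(el, id);
    }
}
```

In this model, events queued as 1, 2, 3 complete in that order without the nested call, but complete as 2, 3, 1 when the handler for event 1 re-enters the poll loop.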
  8. 19 Nov, 2014 1 commit
  9. 14 Nov, 2014 1 commit
  10. 13 Nov, 2014 1 commit
  11. 12 Nov, 2014 4 commits
  12. 05 Nov, 2014 4 commits
  13. 04 Nov, 2014 1 commit
  14. 03 Nov, 2014 1 commit
    • Initial draft of flow-control in the portals4 netmod. · f4253c38
      Pavan Balaji authored and Kenneth Raffenetti committed
      
      Portals4 by itself does not provide any flow-control.  This needs to
      be managed by an upper layer, such as MPICH.  Before this patch we
      were relying on a set of unexpected buffers that were posted to the
      portals library to manage unexpected messages.  However, since portals
      asynchronously pulls out messages from the network, if the application
      is delayed, the unexpected buffers might fill up and the portal might
      be disabled.  This would cause MPICH to abort.
      
      In this patch, we implement an initial version of flow-control that
      allows us to reenable the portal when it gets disabled.  All this is
      done in the context of the "rportals" wrappers that are implemented in
      the rptl.* files.  We create an extra control portal that is only used
      by rportals.  When the primary data portal gets disabled, the target
      sends PAUSE messages to all other processes.  Once each process
      confirms that it has no outstanding packets on the wire (i.e., all
      packets have either been ACKed or NACKed), it sends a PAUSE-ACK
      message.  When the target receives PAUSE-ACK messages from all
      processes (thus confirming that the network traffic to itself has been
      quiesced), it reenables the portal and sends an UNPAUSE message to all
      processes.
      
      This patch still does not deal with origin-side resource exhaustion.
      This can happen, for example, if we run out of space on the event
      queue on the origin side.
      Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
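
The PAUSE / PAUSE-ACK / UNPAUSE handshake described in the commit above can be modelled as a small state machine. The sketch below is a single-process simulation under stated assumptions, not the rptl.* implementation: all type and function names are hypothetical, and message delivery is modelled as direct function calls rather than traffic on a control portal.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { ORIGIN_ACTIVE, ORIGIN_PAUSE_PENDING, ORIGIN_PAUSED } origin_state;

typedef struct {
    origin_state state;
    int outstanding;     /* packets sent but not yet ACKed or NACKed */
} origin;

typedef struct {
    bool portal_enabled; /* state of the primary data portal */
    size_t n_peers;      /* number of other processes */
    size_t pause_acks;   /* PAUSE-ACK messages received so far */
} target;

/* Target: the data portal was disabled, so send PAUSE to every peer. */
void target_on_portal_disabled(target *t, origin *peers)
{
    t->portal_enabled = false;
    t->pause_acks = 0;
    for (size_t i = 0; i < t->n_peers; i++)
        peers[i].state = ORIGIN_PAUSE_PENDING;
}

/* Origin: one outstanding packet was ACKed or NACKed. */
void origin_on_packet_done(origin *o)
{
    if (o->outstanding > 0)
        o->outstanding--;
}

/* Origin: send PAUSE-ACK once nothing is left on the wire.
 * Returns true if a PAUSE-ACK was sent by this call. */
bool origin_try_pause_ack(origin *o)
{
    if (o->state == ORIGIN_PAUSE_PENDING && o->outstanding == 0) {
        o->state = ORIGIN_PAUSED;
        return true;
    }
    return false;
}

/* Target: count a PAUSE-ACK and re-enable the portal once every peer
 * has confirmed. Returns true when UNPAUSE should be sent. */
bool target_on_pause_ack(target *t)
{
    t->pause_acks++;
    if (t->pause_acks == t->n_peers) {
        t->portal_enabled = true;
        return true;
    }
    return false;
}

/* Drive one round of the handshake. Returns true once the portal has
 * been re-enabled and all peers have been sent UNPAUSE. */
bool flow_control_progress(target *t, origin *peers)
{
    for (size_t i = 0; i < t->n_peers; i++) {
        if (origin_try_pause_ack(&peers[i]) && target_on_pause_ack(t)) {
            for (size_t j = 0; j < t->n_peers; j++)
                peers[j].state = ORIGIN_ACTIVE;   /* UNPAUSE */
            return true;
        }
    }
    return false;
}
```

A peer with packets still in flight holds back its PAUSE-ACK, so the portal stays disabled until that peer's traffic has drained. As the commit notes, origin-side resource exhaustion is not covered, and this sketch does not model it either.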
  15. 27 Aug, 2014 1 commit
  16. 07 Aug, 2014 2 commits
  17. 29 Oct, 2013 1 commit
  18. 17 Dec, 2012 1 commit
  19. 05 Nov, 2012 1 commit
  20. 26 Oct, 2012 1 commit
  21. 12 Oct, 2012 1 commit
  22. 10 Oct, 2012 1 commit
  23. 03 Oct, 2012 1 commit
  24. 11 Sep, 2012 1 commit
  25. 07 Sep, 2012 1 commit
  26. 29 Aug, 2012 1 commit
  27. 30 Jul, 2012 1 commit