1. 12 Nov, 2014 2 commits
    • Fix Portals4 RMA · 50461978
      Antonio J. Pena authored and Kenneth Raffenetti committed
      
      
      Full redesign, mainly of the functions in ptl_nm.c and of the
      communication over the "control" portal. Some flow-control
      problems remain.
      Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
    • portals4: implement cancel send · b56f4f1d
      Kenneth Raffenetti authored
      
      
      Every MPI_Send in the Portals4 netmod sends some or all of its data
      eagerly to the receiver. Canceling a send therefore means finding the data
      in the unexpected message queue and removing it in order to preserve
      matching. Because the message queues exist at the netmod level, the netmod
      needs its own cancel protocol.
      
      The protocol is modeled on a similar case in CH3, but with its own method
      for searching the unexpected queue. Custom netmod packet handlers are used to
      receive and process the control messages.
      
      Known Issue:
        Because the send and the cancel message use different PTs, the cancel
        request can arrive before the message it is meant to cancel.
      Signed-off-by: Antonio Pena Monferrer <apenya@mcs.anl.gov>
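The receiver side of the cancel protocol described in the commit above can be sketched as follows. This is a minimal illustration, not the MPICH code: the struct `unexp_msg`, its fields, and the functions `unexp_dequeue` and `handle_cancel_request` are hypothetical names, and the sketch assumes a message is identified by its source rank plus a sender-side request id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry in a netmod-level unexpected message queue. */
struct unexp_msg {
    int src_rank;            /* rank that sent the eager data */
    unsigned sender_req_id;  /* assumed id of the send request on the sender */
    struct unexp_msg *next;
};

/* Unlink and return the entry matching (src_rank, sender_req_id),
 * or NULL if no such entry is queued. The caller owns the result. */
static struct unexp_msg *unexp_dequeue(struct unexp_msg **head,
                                       int src_rank, unsigned sender_req_id)
{
    assert(head != NULL);
    for (struct unexp_msg **pp = head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->src_rank == src_rank &&
            (*pp)->sender_req_id == sender_req_id) {
            struct unexp_msg *found = *pp;
            *pp = found->next;   /* splice the entry out of the list */
            found->next = NULL;
            return found;
        }
    }
    return NULL;
}

/* Handler for an incoming cancel request. Returns the value the receiver
 * would report back: true if the message was still unexpected and has been
 * removed, false if it was already matched or has not arrived yet. */
static bool handle_cancel_request(struct unexp_msg **head,
                                  int src_rank, unsigned sender_req_id)
{
    return unexp_dequeue(head, src_rank, sender_req_id) != NULL;
}
```

In this sketch the known issue shows up as a `false` result: if the cancel request overtakes the data, the search finds nothing and the cancel is reported as failed even though the message is still in flight.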
  2. 05 Nov, 2014 1 commit
  3. 04 Nov, 2014 2 commits
  4. 03 Nov, 2014 1 commit
    • Initial draft of flow-control in the portals4 netmod. · f4253c38
      Pavan Balaji authored and Kenneth Raffenetti committed
      
      
      Portals4 by itself does not provide any flow control.  This needs to
      be managed by an upper layer, such as MPICH.  Before this patch we
      relied on a set of unexpected buffers posted to the portals library
      to absorb unexpected messages.  However, since portals pulls
      messages off the network asynchronously, a delayed application can
      let the unexpected buffers fill up, at which point the portal is
      disabled.  This would cause MPICH to abort.
      
      In this patch, we implement an initial version of flow-control that
      allows us to reenable the portal when it gets disabled.  All this is
      done in the context of the "rportals" wrappers that are implemented in
      the rptl.* files.  We create an extra control portal that is only used
      by rportals.  When the primary data portal gets disabled, the target
      sends PAUSE messages to all other processes.  Once each process
      confirms that it has no outstanding packets on the wire (i.e., all
      packets have either been ACKed or NACKed), it sends a PAUSE-ACK
      message.  When the target receives PAUSE-ACK messages from all
      processes (thus confirming that the network traffic to itself has been
      quiesced), it reenables the portal and sends an UNPAUSE message to all
      processes.
      
      This patch still does not deal with origin-side resource exhaustion.
      This can happen, for example, if we run out of space on the event
      queue on the origin side.
      Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
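The PAUSE / PAUSE-ACK / UNPAUSE handshake described in the commit above can be sketched as two small state machines, one for the target whose data portal was disabled and one for each origin. This is an illustration under stated assumptions, not the code in the rptl.* files: all type and function names are hypothetical, and the functions only return what should be sent instead of touching the network.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process state; names are illustrative only. */
struct flow_target {
    bool portal_enabled;  /* is the primary data portal accepting messages? */
    int acks_pending;     /* PAUSE-ACKs still expected from other processes */
};

struct flow_origin {
    bool paused;          /* true between PAUSE and UNPAUSE */
    int outstanding;      /* packets sent but not yet ACKed or NACKed */
};

/* Target: the data portal was disabled. Returns how many PAUSE messages
 * must go out on the control portal (one per other process). */
static int target_portal_disabled(struct flow_target *t, int nprocs)
{
    assert(t != NULL && nprocs >= 1);
    t->acks_pending = nprocs - 1;
    t->portal_enabled = (t->acks_pending == 0);  /* nobody to wait for */
    return t->acks_pending;
}

/* Target: one PAUSE-ACK arrived. Returns true once all processes have
 * answered, i.e. the portal is re-enabled and UNPAUSE must be broadcast. */
static bool target_pause_ack(struct flow_target *t)
{
    assert(t != NULL && !t->portal_enabled && t->acks_pending > 0);
    if (--t->acks_pending > 0)
        return false;
    t->portal_enabled = true;
    return true;
}

/* Origin: try to send a data packet. Refused while paused. */
static bool origin_try_send(struct flow_origin *o)
{
    assert(o != NULL);
    if (o->paused)
        return false;
    o->outstanding++;
    return true;
}

/* Origin: PAUSE received. Returns true if PAUSE-ACK can be sent at once. */
static bool origin_pause(struct flow_origin *o)
{
    assert(o != NULL);
    o->paused = true;
    return o->outstanding == 0;
}

/* Origin: a packet was ACKed or NACKed. Returns true if this completion
 * drained the wire while paused, so PAUSE-ACK must be sent now. */
static bool origin_completion(struct flow_origin *o)
{
    assert(o != NULL && o->outstanding > 0);
    o->outstanding--;
    return o->paused && o->outstanding == 0;
}

/* Origin: UNPAUSE received, sending may resume. */
static void origin_unpause(struct flow_origin *o)
{
    assert(o != NULL);
    o->paused = false;
}
```

The sketch does not model origin-side resource exhaustion, which the commit message also leaves open, nor how NACKed packets are retransmitted after UNPAUSE.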
  5. 29 Oct, 2014 1 commit
    • portals4: set reasonable interface limits · 36d11a13
      Kenneth Raffenetti authored
      
      
      Set reasonable limits for maximum unexpected headers and EQs at init
      time. We accomplish this with a pre-init stage where we fill in a limits
      struct with the system defaults, increase certain values (if they are not
      set already in the environment), then do the real init.
      
      If the "desired" limits structure had a way to allow default values for
      limits we don't care about, the pre-init stage could go away.
      Signed-off-by: Antonio J. Pena <apenya@mcs.anl.gov>
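The "fill in defaults, raise selected values, then do the real init" step from the commit above might look roughly like this. It is a sketch only: `struct ni_limits` is a two-field stand-in for the Portals4 limits structure, the environment variable names and the target values 2048 and 64 are invented for illustration, and "increase" is assumed to mean "raise to at least the wanted value".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the interface limits structure; only the two limits the
 * commit message mentions are modeled here. */
struct ni_limits {
    int max_unexpected_headers;
    int max_eqs;
};

/* Raise *field to at least `wanted`, unless the user already configured
 * this limit through the environment (env_value != NULL). */
static void raise_unless_set(int *field, int wanted, const char *env_value)
{
    assert(field != NULL);
    if (env_value != NULL)
        return;              /* respect the user's explicit setting */
    if (*field < wanted)
        *field = wanted;
}

/* Build the "desired" limits from the defaults reported by a pre-init
 * stage. Every field not raised here keeps its default value, which is
 * why the defaults have to be queried first. */
static struct ni_limits desired_limits(struct ni_limits defaults)
{
    struct ni_limits desired = defaults;
    /* Hypothetical variable names and target values. */
    raise_unless_set(&desired.max_unexpected_headers, 2048,
                     getenv("SKETCH_MAX_UNEXPECTED_HEADERS"));
    raise_unless_set(&desired.max_eqs, 64, getenv("SKETCH_MAX_EQS"));
    return desired;
}
```

The result of `desired_limits` would then be passed to the real initialization call. As the commit message notes, the pre-init query exists only because the desired-limits structure offers no way to say "use the default" for fields the caller does not care about.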
  6. 23 Oct, 2014 1 commit
  7. 22 Oct, 2014 1 commit
  8. 27 Aug, 2014 1 commit
  9. 07 Aug, 2014 1 commit
  10. 31 Jul, 2014 1 commit
  11. 09 Jul, 2014 1 commit
  12. 29 Jun, 2014 1 commit
  13. 26 Oct, 2013 1 commit
  14. 17 Dec, 2012 1 commit
  15. 26 Oct, 2012 1 commit
  16. 25 Oct, 2012 1 commit
  17. 24 Oct, 2012 1 commit
  18. 12 Oct, 2012 1 commit
  19. 03 Oct, 2012 3 commits
  20. 11 Sep, 2012 1 commit
  21. 07 Sep, 2012 1 commit
  22. 29 Aug, 2012 1 commit
  23. 30 Jul, 2012 2 commits