1. 04 Mar, 2015 1 commit
    • Correct the usage of req's segment_first and segment_size in sendNonContig · 5132e070
      Xin Zhao authored and Pavan Balaji committed
      The implementations of sendNoncontig for intra-node communication in
      Nemesis and inter-node communication in network modules (except for
      TCP and SCIF) assume that req->dev.segment_first is zero and that
      req->dev.segment_size is the size of the data, which is not always true.
      If we stream an RMA operation and issue only part of the derived-datatype data,
      req->dev.segment_first specifies the current starting location of the data
      and req->dev.segment_size specifies the current ending location of the data,
      so the data size should be (req->dev.segment_size - req->dev.segment_first).
      This patch corrects this issue in Nemesis and the network modules.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
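
The size calculation described in this commit message can be sketched in a few lines of C. This is only an illustration: `sketch_req_dev_t` and `sketch_data_size` are hypothetical names, `size_t` is an assumed offset type, and the struct stands in for just the two fields of `req->dev` that the message mentions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two req->dev fields named in the commit
 * message; the real request structure has many more members. */
typedef struct {
    size_t segment_first; /* current starting offset of the data */
    size_t segment_size;  /* current ending offset of the data */
} sketch_req_dev_t;

/* Bytes covered by this (possibly partial) send: the distance between
 * the ending and starting offsets, not segment_size on its own. */
static size_t sketch_data_size(const sketch_req_dev_t *dev)
{
    assert(dev->segment_size >= dev->segment_first);
    return dev->segment_size - dev->segment_first;
}
```

When `segment_first` is zero the result equals `segment_size`, which is the case the older code assumed. For a streamed chunk that starts at a non-zero offset, the two values differ and only the difference is the number of bytes to send.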
  2. 26 Feb, 2015 4 commits
  3. 19 Nov, 2014 2 commits
  4. 13 Nov, 2014 2 commits
  5. 12 Nov, 2014 3 commits
  6. 06 Nov, 2014 1 commit
  7. 03 Nov, 2014 1 commit
    • Initial draft of flow-control in the portals4 netmod. · f4253c38
      Pavan Balaji authored and Kenneth Raffenetti committed
      Portals4 by itself does not provide any flow-control.  This needs to
      be managed by an upper layer, such as MPICH.  Before this patch we
      were relying on a set of unexpected buffers that were posted to the
      portals library to manage unexpected messages.  However, since portals
      asynchronously pulls messages out of the network, if the application
      is delayed, the unexpected buffers might fill up and the portal might
      be disabled.  This would cause MPICH to abort.
      In this patch, we implement an initial version of flow-control that
      allows us to reenable the portal when it gets disabled.  All this is
      done in the context of the "rportals" wrappers that are implemented in
      the rptl.* files.  We create an extra control portal that is only used
      by rportals.  When the primary data portal gets disabled, the target
      sends PAUSE messages to all other processes.  Once each process
      confirms that it has no outstanding packets on the wire (i.e., all
      packets have either been ACKed or NACKed), it sends a PAUSE-ACK
      message.  When the target receives PAUSE-ACK messages from all
      processes (thus confirming that the network traffic to itself has been
      quiesced), it reenables the portal and sends an UNPAUSE message to all
      processes.
      This patch still does not deal with origin-side resource exhaustion.
      This can happen, for example, if we run out of space on the event
      queue on the origin side.
      Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
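
The PAUSE / PAUSE-ACK / UNPAUSE handshake described in this commit message can be sketched as two small state machines. This is a rough illustration only and not the code in the rptl.* files: every type and function name below is hypothetical, sends are modelled as counters, and no Portals4 calls are made.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical target-side state (the process whose data portal was disabled). */
typedef struct {
    int  num_peers;      /* number of other processes */
    int  acks_received;  /* PAUSE-ACKs seen since the portal was disabled */
    bool portal_enabled; /* state of the primary data portal */
    int  pause_sent;     /* counters standing in for real control sends */
    int  unpause_sent;
} sketch_target_t;

/* Hypothetical origin-side state (a process sending to the target). */
typedef struct {
    int  outstanding;    /* packets not yet ACKed or NACKed */
    bool paused;         /* true between PAUSE and UNPAUSE */
    bool ack_pending;    /* PAUSE seen, PAUSE-ACK not yet sent */
} sketch_origin_t;

/* Target: the data portal was disabled, so send PAUSE to every peer. */
static void sketch_target_on_disabled(sketch_target_t *t)
{
    t->portal_enabled = false;
    t->acks_received = 0;
    t->pause_sent += t->num_peers;
}

/* Target: one peer reported it is quiesced. Returns true once all peers
 * have done so, at which point the portal is reenabled and UNPAUSE sent. */
static bool sketch_target_on_pause_ack(sketch_target_t *t)
{
    assert(!t->portal_enabled);
    if (++t->acks_received < t->num_peers)
        return false;
    t->portal_enabled = true;
    t->unpause_sent += t->num_peers;
    return true;
}

/* Origin: returns true if a PAUSE-ACK should be sent now, i.e. a PAUSE
 * is pending and nothing is left on the wire. */
static bool sketch_origin_try_ack(sketch_origin_t *o)
{
    if (o->ack_pending && o->outstanding == 0) {
        o->ack_pending = false;
        return true;
    }
    return false;
}

/* Origin: PAUSE received; stop issuing new data and ACK when quiesced. */
static bool sketch_origin_on_pause(sketch_origin_t *o)
{
    o->paused = true;
    o->ack_pending = true;
    return sketch_origin_try_ack(o);
}

/* Origin: one outstanding packet was ACKed or NACKed. */
static bool sketch_origin_on_packet_done(sketch_origin_t *o)
{
    assert(o->outstanding > 0);
    o->outstanding--;
    return sketch_origin_try_ack(o);
}

/* Origin: UNPAUSE received; sending may resume. */
static void sketch_origin_on_unpause(sketch_origin_t *o)
{
    o->paused = false;
}
```

In this sketch an origin with no outstanding packets acknowledges the PAUSE immediately, while one with packets in flight defers its PAUSE-ACK until the last packet has been ACKed or NACKed. Origin-side resource exhaustion, which the commit message says is not yet handled, is likewise left out.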
  8. 20 Oct, 2014 1 commit
  9. 23 Sep, 2014 1 commit
  10. 26 Oct, 2012 2 commits