- 17 Jun, 2015 1 commit
-
-
Kenneth Raffenetti authored
MPID_Segment_pack/unpack should be all we need to manipulate noncontig messages. Not sure why these were originally included. Signed-off-by:
Antonio J. Pena <apenya@mcs.anl.gov>
-
- 16 Jun, 2015 1 commit
-
-
Kenneth Raffenetti authored
No reviewer.
-
- 14 Jun, 2015 1 commit
-
-
This patch includes three changes:
(1) Added the netmod API get_ordering so that a netmod can expose its network ordering. A netmod may issue some packets over multiple connections in parallel if those packets (such as RMA) do not require ordering, so the packets may arrive unordered. This patch sets the network ordering in every existing netmod (tcp|mxm|ofi|portals|llc) to true, since all packets are sent in order over a single connection.
(2) Nemesis exposes the window packet orderings, such as AM flush ordering, at init time. It supports ordered packets only when the netmod supports an ordered network.
(3) If AM flush is ordered (a flush must finish after all previous operations), CH3 RMA requests a FLUSH ACK only on the last operation. Otherwise, CH3 must request a FLUSH ACK per operation to ensure that all operations are remotely completed. Signed-off-by:
Xin Zhao <xinzhao3@illinois.edu> Signed-off-by:
Pavan Balaji <balaji@anl.gov>
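The FLUSH ACK policy in change (3) can be sketched as follows. This is a minimal illustration, not the CH3 code: the flag names and the `mark_flush_acks` helper are hypothetical, and the real implementation works on RMA packet headers rather than a plain array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values; these are not the actual MPICH packet flags. */
enum { OP_FLAG_NONE = 0, OP_FLAG_FLUSH_ACK = 1 };

/* Decide which operations in a batch must request a FLUSH ACK.
 * With ordered AM flush, an ACK for the last operation implies that all
 * earlier ones completed, so only the last one is marked. Without
 * ordering, every operation must be acknowledged individually. */
void mark_flush_acks(int *flags, size_t n_ops, bool am_flush_ordered)
{
    for (size_t i = 0; i < n_ops; i++) {
        bool is_last = (i + 1 == n_ops);
        flags[i] = (!am_flush_ordered || is_last) ? OP_FLAG_FLUSH_ACK
                                                  : OP_FLAG_NONE;
    }
}
```

With three operations and an ordered flush, only the third is marked; with an unordered flush, all three are.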
-
- 30 May, 2015 2 commits
-
-
Two APIs issue data with a request that has already been created and passed in: iSendv(), which issues contiguous data, and sendNoncontig_fn(), which issues non-contiguous data. This patch modifies the implementation of both functions in nemesis and the netmods (tcp/mxm/ptl) so that they issue the extended packet header stored in the request. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
- 27 May, 2015 1 commit
-
-
Kenneth Raffenetti authored
Two tag bits are reserved for error propagation. We need to make sure they are ignored by the network matching capabilities. Refs #2260 No reviewer.
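A minimal sketch of how reserved bits can be excluded from matching, assuming Portals-style match/ignore semantics (a bit set in the ignore mask is not compared). The position of the two error bits below is a hypothetical choice for illustration; the commit does not state the actual layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: two reserved error bits at positions 30 and 31. */
#define ERR_BITS_MASK ((uint64_t)0x3 << 30)

/* An incoming message matches when all bits that are NOT ignored agree
 * with the posted match bits. */
bool me_matches(uint64_t incoming, uint64_t match_bits, uint64_t ignore_bits)
{
    return ((incoming ^ match_bits) & ~ignore_bits) == 0;
}
```

Posting a receive with `ERR_BITS_MASK` in the ignore bits lets a message match whether or not the sender set the error bits, while the remaining tag bits are still compared.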
-
- 20 May, 2015 1 commit
-
-
Kenneth Raffenetti authored
Signed-off-by:
Antonio J. Pena <apenya@mcs.anl.gov>
-
- 10 Apr, 2015 1 commit
-
-
Kenneth Raffenetti authored
The two commits being reverted introduced a "safe" PtlMEAppend function that called MPID_nem_ptl_poll to process some events when there was no space to append the match list entry. However, the poll function is not reentrant, which could lead to ordering problems. The increased list entry limit from [c6c0d6f6] should prevent PTL_NO_SPACE errors except in extreme cases. If we still hit this error, a proper fix can be made in the rportals layer. Signed-off-by:
Antonio J. Pena <apenya@mcs.anl.gov>
-
- 19 Nov, 2014 1 commit
-
-
Kenneth Raffenetti authored
Signed-off-by:
Antonio J. Pena <apenya@mcs.anl.gov>
-
- 14 Nov, 2014 1 commit
-
-
Antonio Pena Monferrer authored
Going from a macro to a function fixes the issue because the function receives a copy of the pointer. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
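The commit does not say which macro was involved, so the following is only a generic illustration of the effect it describes. The names `SUM_BYTES_MACRO` and `sum_bytes_fn` are hypothetical: a macro substitutes its argument textually and so advances the caller's pointer, whereas a function works on its own copy.

```c
#include <stddef.h>

/* Macro version: 'p' is substituted textually, so the caller's pointer
 * variable itself is advanced by the loop. */
#define SUM_BYTES_MACRO(p, n, out)                      \
    do {                                                \
        (out) = 0;                                      \
        for (size_t i_ = 0; i_ < (n); i_++)             \
            (out) += *(p)++;                            \
    } while (0)

/* Function version: 'p' is a copy of the caller's pointer, so the
 * caller's variable is left untouched. */
unsigned sum_bytes_fn(const unsigned char *p, size_t n)
{
    unsigned out = 0;
    for (size_t i = 0; i < n; i++)
        out += *p++;
    return out;
}
```

After the macro call the caller's pointer points past the buffer; after the function call it still points at the start.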
-
- 13 Nov, 2014 1 commit
-
-
Kenneth Raffenetti authored
Helps clarity since we no longer use ACKs in the netmod code. Signed-off-by:
Antonio Pena Monferrer <apenya@mcs.anl.gov>
-
- 12 Nov, 2014 4 commits
-
-
Signed-off-by:
Antonio Pena Monferrer <apenya@mcs.anl.gov>
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Full redesign, mainly of the functions in ptl_nm.c and the communications involving the "control" portal. Still some problems with flow control. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
Kenneth Raffenetti authored
All MPI_Sends in the Portals4 netmod cause some or all of the data to be sent eagerly to the receiver. Canceling a send means finding the data in the unexpected message queue and removing it in order to preserve matching. Because the message queues exist at the netmod level, the netmod needs its own cancel protocol. The protocol is modeled on a similar case in CH3, but with its own method for searching the unexpected queue. Custom netmod packet handlers are used to receive and process the control messages. Known issue: because we use different PTs for the send and cancel messages, the cancel request could arrive before the message being canceled. Signed-off-by:
Antonio Pena Monferrer <apenya@mcs.anl.gov>
-
- 05 Nov, 2014 4 commits
-
-
The large send handler incorrectly assumed event ordering from portals. This could lead to a request being freed while pending events would still attempt to access it, causing a segfault or causing the wrong handler to execute. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
In large message cases, when multiple get operations are issued, the data may arrive out-of-order back at the initiator. A counter is required to ensure all operations have completed. In the temporary buffer case, we simply wait for all the data to arrive, and unpack in one operation. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
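The completion counter described above might look like the sketch below. The `large_recv_t` type and function names are hypothetical; the point is only that completion is decided by counting replies, never by their arrival order.

```c
#include <stdbool.h>

/* Hypothetical per-request state for a large receive split into gets. */
typedef struct {
    int  pending_gets;  /* get operations whose reply has not yet arrived */
    bool complete;      /* set once every reply has been seen */
} large_recv_t;

/* Record that n get operations were issued for this request. */
void issue_gets(large_recv_t *r, int n)
{
    r->pending_gets = n;
    r->complete = false;
}

/* Called once per reply event, in whatever order replies arrive.
 * Returns true only when the last outstanding reply has been counted,
 * which is the point where a temporary buffer could be unpacked. */
bool on_get_reply(large_recv_t *r)
{
    if (--r->pending_gets == 0)
        r->complete = true;
    return r->complete;
}
```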
-
Large messages (either larger than max_msg_size or exceeding MPID_IOV_LIMIT) will be packed into a temporary buffer. These cases still need to be optimized. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
If a message is larger than the max_msg_size limit, issue multiple MEs for the remainder of the message. Completion events for the intermediate operations will be ignored. Only the final operation will trigger the event handler to tell MPI that communication is complete. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
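A rough sketch of the splitting logic, assuming a simple fixed-size chunking scheme. The `chunk_t` type and `split_message` helper are hypothetical, and a zero-length message (which yields no chunks here) would need separate handling in real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one piece of a large message. */
typedef struct {
    size_t offset;    /* byte offset into the message */
    size_t length;    /* bytes covered by this piece, <= max_msg_size */
    bool   is_final;  /* only the final piece should trigger the handler */
} chunk_t;

/* Split 'total' bytes into pieces of at most 'max_msg_size' bytes.
 * Writes up to 'max_chunks' entries to 'out' and returns the number of
 * pieces the message needs (which may exceed max_chunks). */
size_t split_message(size_t total, size_t max_msg_size,
                     chunk_t *out, size_t max_chunks)
{
    assert(max_msg_size > 0);
    size_t n = 0, off = 0;
    while (off < total) {
        size_t left = total - off;
        size_t len = left < max_msg_size ? left : max_msg_size;
        if (n < max_chunks) {
            out[n].offset = off;
            out[n].length = len;
            out[n].is_final = (off + len == total);
        }
        n++;
        off += len;
    }
    return n;
}
```

For a 10-byte message with a 4-byte limit this yields pieces of 4, 4, and 2 bytes, with only the last marked final.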
-
- 04 Nov, 2014 1 commit
-
-
Kenneth Raffenetti authored
An EQ for origin events is useful for rate-limiting operations so that a process does not locally trigger a flow control event on its portal. We will implement the rate-limiting logic in the rportals layer. Signed-off-by:
Antonio J. Pena <apenya@mcs.anl.gov>
-
- 03 Nov, 2014 1 commit
-
-
Portals4 by itself does not provide any flow control. This needs to be managed by an upper layer, such as MPICH. Before this patch we relied on a set of unexpected buffers posted to the portals library to manage unexpected messages. However, since portals asynchronously pulls messages from the network, a delayed application could cause the unexpected buffers to fill up and the portal to be disabled. This would cause MPICH to abort. In this patch, we implement an initial version of flow control that allows us to reenable the portal when it gets disabled. All of this is done in the context of the "rportals" wrappers implemented in the rptl.* files. We create an extra control portal that is used only by rportals. When the primary data portal gets disabled, the target sends PAUSE messages to all other processes. Once each process confirms that it has no outstanding packets on the wire (i.e., all packets have been either ACKed or NACKed), it sends a PAUSE-ACK message. When the target receives PAUSE-ACK messages from all processes (thus confirming that the network traffic to itself has been quiesced), it reenables the portal and sends an UNPAUSE message to all processes. This patch still does not deal with origin-side resource exhaustion, which can happen, for example, if we run out of space on the event queue on the origin side. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
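The PAUSE / PAUSE-ACK / UNPAUSE handshake can be sketched as two small state machines. All type and function names below are hypothetical and do not come from the rptl.* sources; message transport is left out, and each function's return value only indicates what would be sent.

```c
#include <stdbool.h>

/* Hypothetical target-side state (the process whose portal was disabled). */
typedef struct {
    int  nprocs;          /* total number of processes, including self */
    int  acks_received;   /* PAUSE-ACKs seen in the current round */
    bool portal_enabled;
} target_state_t;

/* Hypothetical origin-side state, kept per paused target. */
typedef struct {
    int  outstanding;     /* packets not yet ACKed or NACKed */
    bool paused;
    bool pause_ack_sent;
} origin_state_t;

/* Portal got disabled: returns how many PAUSE messages to send. */
int target_on_portal_disabled(target_state_t *t)
{
    t->portal_enabled = false;
    t->acks_received = 0;
    return t->nprocs - 1;           /* one PAUSE per other process */
}

/* Returns true when the portal was reenabled and UNPAUSE should go out. */
bool target_on_pause_ack(target_state_t *t)
{
    t->acks_received++;
    if (t->acks_received == t->nprocs - 1) {
        t->portal_enabled = true;   /* traffic is quiesced */
        return true;
    }
    return false;
}

/* Returns true if a PAUSE-ACK should be sent now. */
static bool origin_try_pause_ack(origin_state_t *o)
{
    if (o->paused && o->outstanding == 0 && !o->pause_ack_sent) {
        o->pause_ack_sent = true;
        return true;
    }
    return false;
}

bool origin_on_pause(origin_state_t *o)
{
    o->paused = true;
    return origin_try_pause_ack(o);
}

bool origin_on_ack_or_nack(origin_state_t *o)
{
    o->outstanding--;
    return origin_try_pause_ack(o);
}

void origin_on_unpause(origin_state_t *o)
{
    o->paused = false;
    o->pause_ack_sent = false;
}
```

An origin with packets still in flight defers its PAUSE-ACK until the last ACK or NACK arrives, which is what lets the target conclude that nothing more is on the wire.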
-
- 27 Aug, 2014 1 commit
-
-
Kenneth Raffenetti authored
PtlNIInit can optionally return the limitations of a network interface. Get these limits so we can account for things like the max_msg_size. Signed-off-by:
Antonio Pena Monferrer <apenya@mcs.anl.gov>
-
- 07 Aug, 2014 2 commits
-
-
Antonio Pena Monferrer authored
First working version. See #2152. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
Antonio Pena Monferrer authored
The out-of-interest bits must be zeroed out so that they do not collide with their neighboring bits. This is relevant in cases of special values, i.e., negative values such as MPI_*_ANY. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
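The collision can be reproduced with a small sketch. The field widths and names are hypothetical (a 16-bit rank below a 32-bit tag) and are not the netmod's actual match-bit layout; the value -1 stands in for a negative wildcard constant.

```c
#include <stdint.h>

/* Hypothetical layout: rank in bits 0..15, tag in bits 16..47. */
#define RANK_BITS 16
#define RANK_MASK ((UINT64_C(1) << RANK_BITS) - 1)
#define TAG_SHIFT RANK_BITS
#define TAG_MASK  ((UINT64_C(1) << 32) - 1)

/* Without masking, a negative rank converts to all ones and spills
 * over the tag field (and everything above it). */
uint64_t pack_unmasked(int tag, int rank)
{
    return ((uint64_t)tag << TAG_SHIFT) | (uint64_t)rank;
}

/* Masking each value to its field width keeps the fields separate. */
uint64_t pack_masked(int tag, int rank)
{
    return (((uint64_t)tag & TAG_MASK) << TAG_SHIFT)
         | ((uint64_t)rank & RANK_MASK);
}
```

With tag 5 and rank -1, the unmasked version yields all ones and the tag is lost, while the masked version yields `0x5FFFF` and the tag can still be extracted.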
-
- 29 Oct, 2013 1 commit
-
-
Pavan Balaji authored
Based on an Intel contributed patch. The idea is to use the bits from the cancelled field to extend the count, rather than increasing the count datatype itself. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov> Fixes to the bit manipulation based on feedback from Artem Yalozo @ Intel. Fixes to the naming convention based on feedback from Bill Gropp. Signed-off-by:
William Gropp <wgropp@illinois.edu>
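One way to picture the idea is a status word in which the cancelled flag shares storage with the high bits of the count. The struct and accessors below are a hypothetical sketch, not the actual MPICH status layout or macros.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: the low 32 bits of the count live in count_lo;
 * bit 0 of hi_and_cancelled is the cancelled flag and its upper 31 bits
 * hold the high part of the count, giving a 63-bit count in total. */
typedef struct {
    uint32_t count_lo;
    uint32_t hi_and_cancelled;
} status_sketch_t;

void set_count(status_sketch_t *s, uint64_t count)
{
    s->count_lo = (uint32_t)(count & 0xFFFFFFFFu);
    /* Keep the cancelled bit; store the high count bits above it. */
    s->hi_and_cancelled = (s->hi_and_cancelled & 1u)
                        | ((uint32_t)(count >> 32) << 1);
}

uint64_t get_count(const status_sketch_t *s)
{
    return ((uint64_t)(s->hi_and_cancelled >> 1) << 32) | s->count_lo;
}

void set_cancelled(status_sketch_t *s, bool cancelled)
{
    s->hi_and_cancelled = (s->hi_and_cancelled & ~1u) | (cancelled ? 1u : 0u);
}

bool get_cancelled(const status_sketch_t *s)
{
    return (s->hi_and_cancelled & 1u) != 0;
}
```

The struct stays the same size as two 32-bit fields, yet counts larger than 32 bits survive a round trip and toggling the flag leaves the count intact.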
-
- 17 Dec, 2012 1 commit
-
-
Darius Buntinas authored
[svn-r10770] removed channel_private field in VC and used MPIDI_CH3_VC_DECL macro which is overridden by channel. Reviewed by Dinan
-
- 05 Nov, 2012 1 commit
-
-
Darius Buntinas authored
-
- 26 Oct, 2012 1 commit
-
-
Darius Buntinas authored
-
- 12 Oct, 2012 1 commit
-
-
Darius Buntinas authored
-
- 10 Oct, 2012 1 commit
-
-
David Goodell authored
By setting "indent-tabs-mode:nil" we should hopefully begin to slowly squeeze out hard tabs from the source without a disruptive (to downstream projects) whitespace-fixing change. No reviewer.
-
- 03 Oct, 2012 1 commit
-
-
Darius Buntinas authored
-
- 11 Sep, 2012 1 commit
-
-
Darius Buntinas authored
-
- 07 Sep, 2012 1 commit
-
-
Darius Buntinas authored
-
- 29 Aug, 2012 1 commit
-
-
Darius Buntinas authored
-
- 30 Jul, 2012 1 commit
-
-
Darius Buntinas authored
-