- 09 Mar, 2015 2 commits
-
-
Kenneth Raffenetti authored
Removes additional whitespace in testlist files to make them easier to manipulate with tools like sed. No reviewer.
-
Kenneth Raffenetti authored
Put tests that check if MPI correctly detects aliased buffers in collective operations into the errors section of the testsuite. Fixes #2211 Signed-off-by:
Wesley Bland <wbland@anl.gov>
-
- 06 Mar, 2015 3 commits
-
-
Norio yamaguchi authored
-
Norio yamaguchi authored
-
Wesley Bland authored
No reviewer
-
- 05 Mar, 2015 4 commits
-
-
Although MPIDI_CH3I_progress_blocked is a variable used only in CH3, it was referenced in the ROMIO glue code. This caused a build problem when pamid is used as the device. This patch removes the reference to MPIDI_CH3I_progress_blocked, but doing so degrades the efficiency of MPIR_Ext_cs_yield_allfunc_if_progress_blocked(), since there is currently no way to check whether the progress engine is blocked (related to ticket #2202). A better solution for ticket #2202 requires fixing the wait function of the extended generalized request. Fixes #2242 Signed-off-by:
Rob Latham <robl@mcs.anl.gov>
-
Huiwei Lu authored
comm_idup was caught failing on mpich-portals4 with the configuration "intel,strict,ib". It was not fully tested on portals4 because portals was added after the comm_dup patch. On other platforms comm_idup seems to be OK. Ticket #2243 No reviewer
-
Junchao Zhang authored
Because MPI-3.1 has not been voted on at this time. Signed-off-by:
Antonio J. Pena <apenya@mcs.anl.gov>
-
Antonio J. Pena authored
-
- 04 Mar, 2015 31 commits
-
-
Antonio J. Pena authored
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Antonio J. Pena authored
This reverts commit 077346f6 . Signed-off-by:
Wesley Bland <wbland@anl.gov>
-
Antonio J. Pena authored
This reverts commit c2ea6afc . Signed-off-by:
Wesley Bland <wbland@anl.gov>
-
Antonio J. Pena authored
This reverts commit 38df8d2a . Signed-off-by:
Wesley Bland <wbland@anl.gov>
-
Antonio J. Pena authored
-
Antonio J. Pena authored
-
Antonio J. Pena authored
-
Wesley Bland authored
No reviewer
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
The MPI standard calls a predefined datatype a basic type. The code should use the same name as the standard. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
The implementations of sendNoncontig for intra-node communication in Nemesis and for inter-node communication in the network modules (except TCP and SCIF) assume that req->dev.segment_first is zero and that req->dev.segment_size is the size of the data, which is not always true. If we stream an RMA operation and issue only part of the derived data, req->dev.segment_first specifies the current starting location of the data and req->dev.segment_size specifies the current ending location of the data. The data size is therefore (req->dev.segment_size - req->dev.segment_first). This patch corrects the issue in Nemesis and the network modules. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
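The size computation described in the commit message above can be sketched as follows. This is a minimal illustration, not the MPICH code: `struct seg_range` and `seg_bytes` are hypothetical names, and `size_t` stands in for whatever integer type the real request fields use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two request fields named in the commit. */
struct seg_range {
    size_t segment_first;  /* current starting location of the data */
    size_t segment_size;   /* current ending location of the data */
};

/* Number of bytes covered by the range. Treating segment_size as the
 * byte count is only correct when segment_first happens to be zero. */
static size_t seg_bytes(const struct seg_range *r)
{
    assert(r->segment_first <= r->segment_size);
    return r->segment_size - r->segment_first;
}
```

For a range that starts at zero the result equals `segment_size`, which is why the old assumption went unnoticed until operations were streamed.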
-
The original implementation of ACC/GACC on SHM first allocates a temporary buffer that has the same data layout as the target data, copies the entire origin data to that temporary buffer, and then performs the ACC computation between the temporary buffer and the target buffer. The temporary buffer can use a potentially large amount of memory. This patch fixes the issue as follows: (1) the SHM ACC/GACC routines directly call the do_accumulate_op() function, which requires the origin data to be in a 'packed manner'; (2) if the origin data is a basic type, we directly perform do_accumulate_op() between the origin buffer and the target buffer; if the origin data is derived, we stream the origin data by copying part of it into a packed streaming buffer and performing do_accumulate_op() between the streaming buffer and the target buffer each time. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
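A rough sketch of the streaming idea in the commit message above, under simplifying assumptions: the origin is a strided array of `int` (a stand-in for a derived datatype), the target is contiguous, and the operation is a sum. All names here (`STREAM_ELEMS`, `pack_strided`, `acc_sum_streamed`) are hypothetical and the streaming buffer is deliberately tiny.

```c
#include <assert.h>
#include <stddef.h>

enum { STREAM_ELEMS = 4 };  /* tiny streaming buffer, for illustration only */

/* Pack up to `max` ints into `packed`, starting at logical element `first`.
 * Logical element i of the origin lives at origin[i * stride].
 * Returns the number of elements packed. */
static size_t pack_strided(const int *origin, size_t stride, size_t count,
                           size_t first, int *packed, size_t max)
{
    size_t n = 0;
    while (n < max && first + n < count) {
        packed[n] = origin[(first + n) * stride];
        n++;
    }
    return n;
}

/* Sum-accumulate a strided origin into a contiguous target, one stream
 * unit at a time, so only STREAM_ELEMS ints of scratch space are needed. */
static void acc_sum_streamed(const int *origin, size_t stride, size_t count,
                             int *target)
{
    int stream_buf[STREAM_ELEMS];
    size_t done = 0;

    assert(stride >= 1);
    while (done < count) {
        size_t n = pack_strided(origin, stride, count, done,
                                stream_buf, STREAM_ELEMS);
        for (size_t i = 0; i < n; i++)
            target[done + i] += stream_buf[i];
        done += n;
    }
}
```

The scratch memory is bounded by the streaming buffer size regardless of how large the origin data is, which is the point of the change.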
-
For queued ACC/GACC data piggybacked with LOCK, we do not need to allocate a buffer for the entire operation, but only a buffer the size of one stream unit. This patch fixes this issue. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
On the target side, we always allocate an SRBuf of 256K, which equals the size of a stream unit, to receive ACC/GACC data. Note that in MPIDI_CH3U_Request_load_recv_iov(), for ACC/GACC operations, since we already use an SRBuf to receive the data at the beginning, we do not use another SRBuf here, in order to avoid one more memory copy. Also, we pass the stream_offset in the current RMA packet to the request struct (when receiving is not finished) and to the do_accumulate_op function (when receiving is finished). Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Originally, do_accumulate_op() was used to perform the ACC computation on the target between data from the origin side and data in the target window. It required the target side to first unpack the received origin data into the same data layout as the target data before calling this function, which may consume a potentially large amount of memory. This patch changes the do_accumulate_op() function in the following aspects: (1) It requires that the origin data passed to the function be "in a packed manner", which means it looks as if all basic type elements in the origin data are placed one after another. Note that the origin data is not necessarily contiguous, since we may use a non-contiguous basic type. If the basic type is contiguous, then the origin data must be contiguous. (2) It adds a new function argument, stream_offset, which specifies a starting location in the target data. This allows the origin data to work with a stream-sized part of the target data. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
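To illustrate the role of a stream_offset argument as described above, here is a simplified sketch. It assumes a contiguous `int` target and a byte-valued offset; the names `accumulate_at`, `acc_fn`, `acc_sum`, and `acc_max` are hypothetical and this is not the signature of the real do_accumulate_op().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element-wise operation: combines origin and target values. */
typedef int (*acc_fn)(int origin, int target);

static int acc_sum(int o, int t) { return o + t; }
static int acc_max(int o, int t) { return o > t ? o : t; }

/* Apply `op` between `count` packed origin elements and the part of the
 * target that starts `stream_offset` bytes into the target data. */
static void accumulate_at(const int *packed, size_t count, int *target,
                          size_t stream_offset, acc_fn op)
{
    /* A stream unit holds whole elements, so the offset is element-aligned. */
    assert(stream_offset % sizeof(int) == 0);
    int *dst = target + stream_offset / sizeof(int);
    for (size_t i = 0; i < count; i++)
        dst[i] = op(packed[i], dst[i]);
}
```

Each call touches only the slice of the target that corresponds to one stream unit, so the caller never has to unpack the whole origin data first.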
-
This patch adds request types for the FOP operation and calls the FOP request handler after the SRBuf is unpacked. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Add a stream_offset field to the ACC-related packets and the request struct to remember the current stream unit's starting position in the entire target data. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Add a counter to the op struct to remember the number of stream units that have already been issued. For example, when the first stream unit piggybacked with LOCK is issued, we temporarily stop issuing the following units. After the origin receives the ACK from the target, it can continue to issue the following units. This counter helps avoid issuing the first unit again. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
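The counter described above might behave roughly as in the following sketch. The struct and function names are hypothetical, sending is reduced to incrementing the counter, and the case where the target refuses the lock and the first unit must be re-transmitted is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-operation state; not the MPICH op struct. */
struct stream_op {
    size_t unit_total;      /* stream units in this operation */
    size_t issued;          /* stream units already issued */
    int lock_piggybacked;   /* first unit carries a LOCK request */
    int lock_acked;         /* target has acknowledged the lock */
};

/* Issue as many units as currently allowed; returns how many were issued
 * by this call. Units before op->issued are never issued again. */
static size_t issue_ready_units(struct stream_op *op)
{
    size_t before = op->issued;

    assert(op->issued <= op->unit_total);
    while (op->issued < op->unit_total) {
        /* After the first unit with LOCK, pause until the ACK arrives. */
        if (op->lock_piggybacked && op->issued >= 1 && !op->lock_acked)
            break;
        op->issued++;  /* a real implementation would send the unit here */
    }
    return op->issued - before;
}
```

Calling the function again after the ACK resumes from `issued` rather than from zero, which is what keeps the first unit from going out twice.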
-
For all stream units within one RMA operation, we only need to piggyback flags for the first operation onto the first stream unit, and piggyback flags for the last operation onto the last stream unit. Note that for operations piggybacked with the LOCK flag, we should issue just the first stream unit and wait until we receive the ACK from the target to decide whether to continue issuing the following units or to re-transmit the first unit. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
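One way to read the flag placement rule above is sketched below. It assumes the flags can be split into a group that belongs at the start of the operation and a group that belongs at its end; the flag names and the `unit_flags` helper are hypothetical and do not correspond to the real MPICH flag definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits, for illustration only. */
enum {
    FLAG_LOCK   = 1u << 0,  /* belongs with the start of the operation */
    FLAG_UNLOCK = 1u << 1   /* belongs with the end of the operation */
};

/* Flags carried by stream unit `idx` out of `n` units: start flags only
 * on the first unit, end flags only on the last unit. */
static unsigned unit_flags(size_t idx, size_t n,
                           unsigned head_flags, unsigned tail_flags)
{
    unsigned f = 0;

    assert(n > 0 && idx < n);
    if (idx == 0)
        f |= head_flags;
    if (idx == n - 1)
        f |= tail_flags;
    return f;
}
```

When an operation fits in a single stream unit, that unit is both first and last and carries both groups of flags.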
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
In this patch, we define the size of a streaming unit to be the same as the SRBuf size (256 * 1024 bytes) and cut the ACC/GACC packet according to this size. A streaming unit always contains complete basic type elements and never a partial one. Note that we also increment the ref counter of the pointer to the derived datatype, since multiple streaming units within one RMA operation will refer to it. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
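The arithmetic implied by the paragraph above can be sketched like this. The helper names are hypothetical, and the rounding rule (round the unit down to a whole number of basic elements) is an assumption about how "complete basic type data" is achieved.

```c
#include <assert.h>
#include <stddef.h>

/* Streaming unit size from the commit message: 256 * 1024 bytes. */
#define STREAM_UNIT_BYTES ((size_t)256 * 1024)

/* Largest payload per unit that holds only whole basic elements. */
static size_t unit_bytes(size_t elem_size)
{
    assert(elem_size > 0 && elem_size <= STREAM_UNIT_BYTES);
    return (STREAM_UNIT_BYTES / elem_size) * elem_size;
}

/* Number of stream units needed for `total_bytes` of basic elements. */
static size_t unit_count(size_t total_bytes, size_t elem_size)
{
    size_t per = unit_bytes(elem_size);

    assert(total_bytes % elem_size == 0);
    return (total_bytes + per - 1) / per;  /* ceiling division */
}
```

For element sizes that do not divide 262144 evenly, such as 12 bytes, each unit carries slightly less than the full buffer so that no element is split across units.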
-
The stream version of issue_from_origin_buffer is used in ACC/GACC operations. It allows the caller to stream the data by passing stream_offset and stream_size to the function. The normal version of issue_from_origin_buffer is used in other RMA operations. It issues all the data as a whole. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
The original implementation of create_datatype can only generate a new datatype that describes 'dtype_info + dataloop + one data layout'. It does not support generating 'dtype_info + dataloop + multiple data layouts'. This patch extends the create_datatype function to support that. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
In the request handler, we should use MPIDI_CH3U_Request_complete to complete the user request instead of directly marking it as completed. This is because when one operation is cut into several packets, we must wait until all packets are completed before marking the user request as completed. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
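The reasoning above amounts to counter-based completion, which can be sketched as follows. This is a generic illustration with hypothetical names (`struct user_req`, `packet_complete`), not the implementation of MPIDI_CH3U_Request_complete.

```c
#include <assert.h>

/* Hypothetical user request tracking several outstanding packets. */
struct user_req {
    int pending;    /* packets still outstanding */
    int completed;  /* set once every packet has finished */
};

/* Called once per finished packet. The request is marked completed only
 * when the last outstanding packet finishes. */
static void packet_complete(struct user_req *r)
{
    assert(r->pending > 0);
    if (--r->pending == 0)
        r->completed = 1;
}
```

Setting `completed` directly in a packet handler would report the operation as done while other packets of the same operation are still in flight.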
-
Because we may cut one RMA operation into multiple packets, and each packet needs a request object to track its completion, we use a request array instead of a single request in the RMA operation structure. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Increment active_req_cnt when actually issuing a packet rather than when issuing the operation, since we may cut one operation into multiple packets. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
In the original implementation, issue_from_origin_buffer is used to issue one RMA packet. Since each RMA operation has only one packet, it simply attaches the returned request pointer to the RMA operation structure. Now that we are going to cut one RMA operation into multiple stream packets, this function will be used to issue each streamed packet, and each RMA operation may have multiple requests. Therefore, we make this function return the request pointer and let the caller store the request in the request array of the op structure. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
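The calling convention described above, where the issue routine returns a request and the caller stores it in a per-operation array, could look roughly like this. Everything here is a hypothetical sketch: `struct req`, `struct rma_op`, `issue_unit`, `issue_op`, and `free_op` are illustrative names, and "issuing" is reduced to allocating a request record.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical request and operation structures. */
struct req { size_t offset; size_t size; };
struct rma_op { struct req **reqs; size_t nreqs; };

/* Issue one stream packet and return its request (NULL on failure).
 * The function no longer touches the operation structure itself. */
static struct req *issue_unit(size_t offset, size_t size)
{
    struct req *r = malloc(sizeof *r);
    if (r != NULL) {
        r->offset = offset;
        r->size = size;
    }
    return r;
}

/* Release every request stored in the operation. */
static void free_op(struct rma_op *op)
{
    for (size_t i = 0; i < op->nreqs; i++)
        free(op->reqs[i]);
    free(op->reqs);
    op->reqs = NULL;
    op->nreqs = 0;
}

/* Cut `total` bytes into units of at most `unit` bytes; the caller side
 * stores each returned request in the operation's request array. */
static int issue_op(struct rma_op *op, size_t total, size_t unit)
{
    assert(unit > 0);
    size_t n = (total + unit - 1) / unit;

    op->nreqs = 0;
    op->reqs = calloc(n ? n : 1, sizeof *op->reqs);
    if (op->reqs == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        size_t off = i * unit;
        size_t len = (total - off < unit) ? total - off : unit;
        struct req *r = issue_unit(off, len);
        if (r == NULL) {
            free_op(op);
            return -1;
        }
        op->reqs[op->nreqs++] = r;
    }
    return 0;
}
```

Keeping storage in the caller lets the same issue routine serve both single-packet operations and streamed ones.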
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-