- 03 Nov, 2014 6 commits
-
-
We were duplicating information in the operation structure and in the packet structure when the message is actually issued. Since most of the information is the same anyway, this patch just embeds a packet structure into the operation structure, so that we eliminate unnecessary copies. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
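A minimal sketch of the idea, assuming simplified, hypothetical names (`pkt_put_t`, `rma_op_t`, `issue_op`); the real MPICH structures carry many more fields. The operation owns its packet, so the issue path hands that packet to the send layer instead of building a second copy from the operation's fields.

```c
/* Hypothetical, simplified PUT packet header (not the real MPICH type). */
typedef struct {
    int type;          /* packet type tag */
    void *addr;        /* target address */
    int count;         /* number of elements */
    int target_rank;   /* rank of the target process */
} pkt_put_t;

/* The operation embeds the packet instead of duplicating its fields. */
typedef struct rma_op {
    struct rma_op *next;      /* link in the per-target operation list */
    const void *origin_addr;  /* origin buffer */
    int origin_count;
    pkt_put_t pkt;            /* filled in once, when the op is enqueued */
} rma_op_t;

/* Returns the packet to hand to the send path: no field-by-field copy
 * is made at issue time. */
const pkt_put_t *issue_op(const rma_op_t *op)
{
    return &op->pkt;
}
```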
-
The packet type MPIDI_CH3_PKT_PT_RMA_DONE is used for ACK of FLUSH / UNLOCK packets. Here we rename it to MPIDI_CH3_PKT_FLUSH_ACK and modify the related functions and data structures. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
We were adding an unnecessary dependency on VC structure declarations in the mpidpkt.h file. The required information in RMA lock queue is only the rank, but not actual VC. Here we replace VC with rank. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Split RMA functionality into smaller files, and move functions to where they belong based on the file names. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
Because we are going to rewrite the RMA infrastructure and many PVARs will no longer be used, here we temporarily remove all PVARs and will add needed PVARs back after new implementation is done. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
- 01 Nov, 2014 1 commit
-
-
The original implementation includes an optimization which allows Win_unlock for an exclusive lock to return without waiting for remote completion. This relies on the assumption that window memory on the target process will not be accessed by a third party until that target process finishes all RMA operations and grants the lock to other processes. However, this assumption is not correct if the user passes the assert MPI_MODE_NOCHECK. Consider the following code:
P0: MPI_Win_lock(P1, NULL, exclusive); MPI_Put(X); MPI_Win_unlock(P1, exclusive); MPI_Send(P2);
P1: (target; no calls shown)
P2: MPI_Recv(P0); MPI_Win_lock(P1, MODE_NOCHECK, exclusive); MPI_Get(X); MPI_Win_unlock(P1, exclusive);
Both P0 and P2 issue an exclusive lock to P1, and P2 uses the assert MPI_MODE_NOCHECK because the lock should be granted to P2 after the synchronization between P2 and P0. However, in the original implementation, the GET operation on P2 might not see the updated value, since Win_unlock on P0 returns without waiting for remote completion. In this patch we delete this optimization. In Win_free, since every Win_unlock now guarantees remote completion, the target process no longer needs to do additional counting work to detect target-side completion; it only needs to do a global barrier. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
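The race above can be illustrated with a toy, single-threaded model (hypothetical, not MPICH code): the target holds one integer X and at most one PUT that has left the origin but has not been applied yet. With the removed optimization, unlock returns while the PUT is still in flight, so a NOCHECK reader sees the stale value; waiting for the ACK closes the window.

```c
#include <stdbool.h>

/* Toy model of the target (P1): window memory X plus one in-flight PUT. */
typedef struct {
    int x;             /* window memory on the target */
    bool put_pending;  /* PUT sent by the origin but not yet applied */
    int put_value;
} target_t;

void origin_put(target_t *t, int value)
{
    t->put_pending = true;   /* message is in flight */
    t->put_value = value;
}

/* Models the target applying an in-flight PUT (after which it would ACK). */
void target_progress(target_t *t)
{
    if (t->put_pending) {
        t->x = t->put_value;
        t->put_pending = false;
    }
}

/* Exclusive unlock. wait_for_ack == false models the removed optimization
 * (return as soon as the PUT is sent); true keeps driving progress until
 * the target has applied the PUT, i.e. remote completion. */
void origin_unlock(target_t *t, bool wait_for_ack)
{
    if (wait_for_ack) {
        while (t->put_pending)
            target_progress(t);
    }
}

/* A third process reading X under MPI_MODE_NOCHECK: no lock request is
 * sent to the target, so nothing forces the pending PUT to land first. */
int third_party_get(const target_t *t)
{
    return t->x;
}
```

With `wait_for_ack == false`, `third_party_get` returns the old value even though P0's unlock has already returned, which is the behavior the patch removes.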
-
- 30 Oct, 2014 1 commit
-
-
Xin Zhao authored
No reviewer.
-
- 20 Oct, 2014 1 commit
-
-
Pavan Balaji authored
We were not setting the function states correctly in a bunch of functions. Modifications by Wesley to split up big commit. Signed-off-by:
Wesley Bland <wbland@anl.gov> Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
- 01 Oct, 2014 1 commit
-
-
Xin Zhao authored
at_completion_counter is used to indicate if all Active Target operations have completed on this target. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
-
- 28 Sep, 2014 1 commit
-
- 23 Sep, 2014 1 commit
-
-
Xin Zhao authored
The original implementation of FENCE and PSCW does not guarantee remote completion of issued RMA operations when MPI_Win_complete and MPI_Win_fence return. It only guarantees local completion of issued operations and completion of incoming operations. This is not correct if we try to read updated values on the target side after synchronizing with MPI_MODE_NOCHECK. Here we fix it by making the runtime wait for ACKs from all targets before returning from MPI_Win_fence and MPI_Win_complete. Signed-off-by:
Pavan Balaji <balaji@anl.gov>
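A sketch of the bookkeeping this implies, under assumptions of our own: a hypothetical per-epoch bitmap of targets that were sent operations, a fixed `MAX_TARGETS` bound, and one ACK requested per target regardless of how many operations it received. Fence or complete would keep calling the progress engine until `epoch_remote_complete` holds.

```c
#include <stdbool.h>

#define MAX_TARGETS 64  /* hypothetical bound for this sketch */

/* Hypothetical per-window epoch state. */
typedef struct {
    bool needs_ack[MAX_TARGETS];  /* target received ops this epoch */
    int outstanding_acks;         /* targets that have not ACKed yet */
} epoch_t;

/* Called when an operation is issued to `target`. */
void epoch_issue(epoch_t *e, int target)
{
    if (!e->needs_ack[target]) {
        e->needs_ack[target] = true;
        e->outstanding_acks++;
    }
}

/* Called from the progress engine when an ACK arrives from `target`. */
void epoch_ack(epoch_t *e, int target)
{
    if (e->needs_ack[target]) {
        e->needs_ack[target] = false;
        e->outstanding_acks--;
    }
}

/* MPI_Win_fence / MPI_Win_complete may return only when every target
 * that was sent operations has confirmed remote completion. */
bool epoch_remote_complete(const epoch_t *e)
{
    return e->outstanding_acks == 0;
}
```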
-
- 03 Sep, 2014 1 commit
-
-
Min Si authored
First, cache every SHM window created by Win_allocate or Win_allocate_shared in a global list, and unlink it in Win_free. Then, when the user calls Win_create for a new window, check the user-specified buffer and comm, and enable local SHM communication in the new window if they match a cached SHM window. Note that all shared resources are still freed by the original SHM window. Matching an SHM window requires the following two conditions:
1. The new node comm is equal to, or a subset of, the SHM node comm. (In other cases where the two node comms overlap, the overlapping processes could logically be shared, but this is not supported for now. Supporting it would first require modifying the implementation of RMA operations to remember the shared status per target rather than just comparing its node_id.)
2. The buffer lies within the range of the SHM segment across local processes in the original SHM window (a contiguous segment is mapped across local processes regardless of whether alloc_shared_noncontig is set).
Resolves #2161 Signed-off-by:
Xin Zhao <xinzhao3@illinois.edu>
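The two matching conditions can be sketched as plain checks. This assumes, for illustration only, that each node communicator is represented as an array of world ranks and that the cached segment is described by a hypothetical base pointer and total size; the function names are not MPICH's.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Condition 1: every rank of the new node comm is also in the cached
 * SHM window's node comm (equal or subset). */
bool comm_is_subset(const int *new_ranks, int n_new,
                    const int *shm_ranks, int n_shm)
{
    for (int i = 0; i < n_new; i++) {
        bool found = false;
        for (int j = 0; j < n_shm && !found; j++)
            found = (new_ranks[i] == shm_ranks[j]);
        if (!found)
            return false;
    }
    return true;
}

/* Condition 2: [buf, buf + size) lies inside the contiguous segment
 * [seg_base, seg_base + seg_size). Written to avoid pointer overflow. */
bool buffer_in_segment(const void *buf, size_t size,
                       const void *seg_base, size_t seg_size)
{
    uintptr_t b = (uintptr_t) buf, s = (uintptr_t) seg_base;
    if (b < s)
        return false;
    uintptr_t off = b - s;
    return off <= seg_size && size <= seg_size - off;
}

bool shm_window_matches(const int *new_ranks, int n_new,
                        const int *shm_ranks, int n_shm,
                        const void *buf, size_t size,
                        const void *seg_base, size_t seg_size)
{
    return comm_is_subset(new_ranks, n_new, shm_ranks, n_shm) &&
           buffer_in_segment(buf, size, seg_base, seg_size);
}
```

An overlapping but non-nested comm (for example ranks {3, 4} against {0, 1, 2, 3}) is rejected by condition 1, matching the "not supported for now" case above.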
-
- 27 Aug, 2014 1 commit
-
-
Norio Yamaguchi authored
After one thread finishes processing all operations in the ops list, a new RMA operation may be enqueued by another thread during MPID_Progress_wait(). In that case, the new operation has not been issued yet, and we should avoid processing it at the end of synchronization calls. This situation occurred when running test/mpi/threads/rma/multirma.c Signed-off-by:
Pavan Balaji <balaji@anl.gov>
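One way to express "don't touch operations that arrived late" is to remember the last operation that existed when processing started and stop there. This is a hypothetical, single-threaded sketch with locking omitted; the list type and `complete_ops_up_to` are illustrative names, not the actual fix.

```c
#include <stddef.h>

typedef struct op {
    struct op *next;
    int id;
} op_t;

typedef struct {
    op_t *head;
    op_t *tail;
} op_list_t;

void op_list_append(op_list_t *l, op_t *op)
{
    op->next = NULL;
    if (l->tail)
        l->tail->next = op;
    else
        l->head = op;
    l->tail = op;
}

/* Dequeue operations up to and including `last` (the tail snapshot taken
 * before the progress wait). Anything appended after the snapshot stays
 * in the list for a later call to issue. Returns the number dequeued. */
int complete_ops_up_to(op_list_t *l, op_t *last)
{
    int n = 0;
    if (last == NULL)
        return 0;
    for (;;) {
        op_t *op = l->head;
        l->head = op->next;
        if (l->head == NULL)
            l->tail = NULL;
        n++;
        if (op == last)
            break;
    }
    return n;
}
```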
-
- 26 Aug, 2014 1 commit
-
-
Also move device specific comm structure components to 'dev' to clean up the naming a bit. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
- 25 Aug, 2014 1 commit
-
-
Wesley Bland authored
For some reason, the error case code between MPIDI_Request_create_rreq and MPIDI_Request_create_null_rreq was different. This is odd, because both macros take FAIL_ as an argument which is executed directly in the error case of create_rreq, but not in null_req. This commit makes the two act the same and updates the only two calls to the function that existed in the code. Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
- 31 Jul, 2014 4 commits
-
-
Wesley Bland authored
MPI_Comm_revoke is a special function because it does not have a matching call on the "receiving side". This is because it has to act as an out-of-band, resilient broadcast algorithm. Because of this, in this commit, in addition to the usual functions to implement MPI communication calls (MPI/MPID/CH3/etc.), we add a new CH3 packet type that will handle revoking a communicator without involving a matching call from the MPI layer (similar to how RMA is currently implemented). The thing that must be handled most carefully when revoking a communicator is to ensure that a previously used context ID will eventually be returned to the pool of available context IDs and that after this occurs, no old messages will match the new usage of the context ID (for instance, if some messages are very slow and show up late). To accomplish this, revoke is implemented as an all-to-all algorithm. When one process calls revoke, it will send a message to all other processes in the communicator, which will trigger that process to send a message to all other processes, and so on. Once a process has already revoked its communicator locally, it won't send out another wave of messages. As each process receives the revoke messages from the other processes, it will track how many messages have been received. Once it has either received a revoke message or a message about a process failure for each other process, it will release its refcount on the communicator object. After the application has freed all of its references to the communicator (and all requests, files, etc. associated with it), the context ID will be returned to the available pool. Signed-off-by:
Junchao Zhang <jczhang@mcs.anl.gov>
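The all-to-all wave can be simulated in a few lines. This is a hypothetical model (not MPICH's data structures) with a FIFO standing in for the network and no process failures, so each process releases its reference only after hearing a revoke message from every other process; failure notifications would count toward the same total in the real design.

```c
#include <stdbool.h>

#define NPROCS 4   /* size of the simulated communicator */

/* Hypothetical per-process revoke state. */
typedef struct {
    bool revoked;         /* communicator revoked locally */
    bool heard[NPROCS];   /* received a revoke message from this peer */
    int heard_count;
    bool ref_released;    /* dropped the reference held for revoke */
} proc_t;

typedef struct { int src, dst; } msg_t;

/* FIFO standing in for the network; each process sends at most
 * NPROCS - 1 messages, so NPROCS * NPROCS slots are enough. */
typedef struct {
    msg_t q[NPROCS * NPROCS];
    int head, tail;
} net_t;

static void revoke_locally(proc_t *p, int me, net_t *net)
{
    if (p[me].revoked)
        return;                      /* only one wave per process */
    p[me].revoked = true;
    for (int dst = 0; dst < NPROCS; dst++)
        if (dst != me)
            net->q[net->tail++] = (msg_t){ me, dst };
}

static void on_revoke_msg(proc_t *p, msg_t m, net_t *net)
{
    proc_t *r = &p[m.dst];
    revoke_locally(p, m.dst, net);   /* first message triggers our wave */
    if (!r->heard[m.src]) {
        r->heard[m.src] = true;
        if (++r->heard_count == NPROCS - 1)
            r->ref_released = true;  /* heard from every other process */
    }
}

/* `initiator` calls revoke; deliver messages until the network is quiet.
 * Returns the total number of messages sent. */
int simulate_revoke(proc_t *p, int initiator)
{
    net_t net = { .head = 0, .tail = 0 };
    revoke_locally(p, initiator, &net);
    while (net.head < net.tail)
        on_revoke_msg(p, net.q[net.head++], &net);
    return net.tail;
}
```

Each process sends exactly one wave, so a single revoke costs N(N-1) messages in total; that is the price paid for not needing a matching call on the receiving side.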
-
Wesley Bland authored
The collectively active field wasn't doing anything anymore so it's been removed. This was a remnant from a previous FT proposal. Signed-off-by:
Junchao Zhang <jczhang@mcs.anl.gov>
-
Wesley Bland authored
This commit adds the new functions MPI(X)_COMM_FAILURE_ACK and MPI(X)_COMM_FAILURE_GET_ACKED. Together, these two functions allow the user to get the group of failed processes. Most of the implementation is pushed into the MPID layer since some systems, such as PAMI, won't support this. The existing function MPIDI_CH3U_Check_for_failed_procs has been modified to return the group of acknowledged failed processes. There is an inefficiency here: the list of failed processes is retrieved from PMI and parsed every time the user calls either failure_ack or get_acked. In exchange, we don't have to cache the list that comes back from PMI (caching could be expensive and would have some cost even in the failure-free case). This commit adds a field called last_ack_rank to the MPID_Comm structure. This single integer stores the last acknowledged failure for this communicator and is used to determine when to stop parsing when retrieving the list of acknowledged failed processes. Lastly, this commit includes a test to make sure that all of the above works (test/mpi/ft/failure_ack). It checks that a failure is included in the failed group once it has been acknowledged and excluded if it was not previously acknowledged. Signed-off-by:
Junchao Zhang <jczhang@mcs.anl.gov>
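The role of last_ack_rank can be sketched over an already-parsed list of failed ranks. The sketch assumes the list only grows, in detection order, and uses a hypothetical `PROC_NULL` stand-in for MPI_PROC_NULL and hypothetical function names.

```c
#include <stddef.h>

#define PROC_NULL (-1)  /* stand-in for MPI_PROC_NULL in this sketch */

/* Hypothetical per-communicator state: only the last acked failure. */
typedef struct {
    int last_ack_rank;
} comm_ft_t;

/* failure_ack: remember the most recent entry of the current list. */
void failure_ack(comm_ft_t *c, const int *failed, size_t nfailed)
{
    c->last_ack_rank = nfailed ? failed[nfailed - 1] : PROC_NULL;
}

/* get_acked: copy failed ranks up to and including last_ack_rank, so
 * failures detected after the ack are left out. Returns the count. */
size_t failure_get_acked(const comm_ft_t *c, const int *failed,
                         size_t nfailed, int *out)
{
    size_t n = 0;
    if (c->last_ack_rank == PROC_NULL)
        return 0;
    for (size_t i = 0; i < nfailed; i++) {
        out[n++] = failed[i];
        if (failed[i] == c->last_ack_rank)
            break;
    }
    return n;
}
```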
-
Wesley Bland authored
This function takes a last_failed value and generates an MPID_Group. If the value is MPI_PROC_NULL, it parses the entire list. The function is exposed by MPID so that any function that needs the list of failed processes can use it. This change required changing the way the list of failed processes is retrieved from PMI. Rather than allocating a char array on demand every time we get the list from PMI, the string is now allocated at init time and freed at finalize time. This means that we can cache the value for later use, for example when only querying the list of processes we already know have failed rather than also fetching the new list (which is important for the failure_ack/get_acked semantics). Signed-off-by:
Junchao Zhang <jczhang@mcs.anl.gov>
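A sketch of the parsing step over the cached string. The list format (comma-separated decimal ranks such as "3,5,1") is an assumption made for this example, as are the function name and the `PROC_NULL` stand-in; building the actual MPID_Group from the ranks is omitted.

```c
#include <stdlib.h>

#define PROC_NULL (-1)  /* stand-in for MPI_PROC_NULL */

/* Parse the cached failed-process list, stopping after `last_failed`,
 * or parsing everything when last_failed == PROC_NULL. Writes at most
 * max_out ranks to `out` and returns how many were written. */
int parse_failed_list(const char *list, int last_failed,
                      int *out, int max_out)
{
    int n = 0;
    const char *p = list;
    while (*p != '\0' && n < max_out) {
        char *end;
        long r = strtol(p, &end, 10);
        if (end == p)
            break;               /* malformed input: stop parsing */
        out[n++] = (int) r;
        if (r == last_failed)
            break;               /* reached the requested stop point */
        p = (*end == ',') ? end + 1 : end;
    }
    return n;
}
```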
-
- 22 Jul, 2014 2 commits
-
-
Pavan Balaji authored
This reverts commit 9443bde4.
-
- Added cancel_recv and cancel_send netmod calls under ENABLE_COMM_OVERRIDES
- Extended the MPIDI_CH3I_comm structure with a netmode_comm field (this field can store netmod context information related to the communicator; for example, mxm stores its mxm_mq_h value there)
Change-Id: If89860d44840313bce6f7403190faec302c1bafc Signed-off-by:
Igor Ivanov <Igor.Ivanov@itseez.com>
-
- 21 Jul, 2014 4 commits
-
-
Pavan Balaji authored
We were using defines instead of enum to represent the same class of flags. Signed-off-by:
Wesley Bland <wbland@anl.gov>
-
Pavan Balaji authored
We were using an enum for packet types and used an int16_t for the storage. Instead we should directly use the enum as the storage. Signed-off-by:
Wesley Bland <wbland@anl.gov>
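A before/after sketch with hypothetical packet-type names. Declaring the field with the enum type lets the compiler and debugger show symbolic values; one trade-off to keep in mind is that the C standard leaves an enum's underlying integer type implementation-defined (commonly int), so the field may become wider than 16 bits.

```c
#include <stdint.h>

/* Hypothetical subset of packet types. */
typedef enum {
    PKT_PUT,
    PKT_GET,
    PKT_ACCUMULATE
} pkt_type_t;

/* Before: the enum value was stored in a fixed-width integer, so the
 * field's type said nothing about which values are valid. */
typedef struct { int16_t type; } pkt_header_old_t;

/* After: the field is declared with the enum type itself. */
typedef struct { pkt_type_t type; } pkt_header_t;
```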
-
Pavan Balaji authored
This is to help with debugging. Zero is too common a value, and is often set automatically by the system if not initialized. Starting at a different value helps us catch uninitialized cases more easily. We pick "42" as our magic number as it is the answer to the ultimate question of life, the Universe, and everything. Signed-off-by:
Wesley Bland <wbland@anl.gov>
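A minimal sketch of the technique with hypothetical enumerator names: the first value is pinned to 42, and a trailing sentinel gives a cheap validity check, so a zero-filled header is rejected instead of being mistaken for the first packet type.

```c
#include <stdbool.h>

/* Hypothetical packet-type enum starting at 42. */
typedef enum {
    PKT_TYPE_FIRST = 42,
    PKT_TYPE_PUT = PKT_TYPE_FIRST,
    PKT_TYPE_GET,
    PKT_TYPE_ACCUMULATE,
    PKT_TYPE_END            /* one past the last valid type */
} pkt_type_t;

/* A zero (never-initialized) value falls outside the valid range. */
bool pkt_type_is_valid(int t)
{
    return t >= PKT_TYPE_FIRST && t < PKT_TYPE_END;
}
```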
-
Pavan Balaji authored
Bad placement of commas was making indent very unhappy. Signed-off-by:
Wesley Bland <wbland@anl.gov>
-
- 18 Jul, 2014 1 commit
-
-
Pavan Balaji authored
This reverts commit 274a5a70.
-
- 17 Jul, 2014 1 commit
-
-
Pavan Balaji authored
We were duplicating information in the operation structure and in the packet structure when the message is actually issued. Since most of the information is the same anyway, this patch just embeds a packet structure into the operation structure. Signed-off-by:
Xin Zhao <xinzhao3@illinois.edu>
-
- 11 Apr, 2014 1 commit
-
-
Antonio J. Pena authored
Signed-off-by:
Pavan Balaji <balaji@mcs.anl.gov>
-
- 23 Mar, 2014 1 commit
-
-
The constant MPIDI_TAG_UB is used in only one place at the moment, in the initialization of ch3 (source:src/mpid/ch3/src/mpid_init.c@4b35902a#L131). The problem is that the value which is being set (MPIR_Process.attrs.tag_ub) is set differently in pamid (INT_MAX). This leads to weird results when we set apart a bit in the tag space for failure propagation in non-blocking collectives (see #2008). Since this value isn't being referenced anywhere else, there doesn't seem to be a use for it and it's just leading to confusion. To avoid this, here we remove this value and just set MPIR_Process.attrs.tag_ub to INT_MAX in both ch3 and pamid. See #2009 Signed-off-by:
Pavan Balaji <balaji@mcs.anl.gov>
-
- 26 Feb, 2014 1 commit
-
-
Pavan Balaji authored
Simply ran the new ./maint/check_copyright.bash script. Fixes #2032. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
- 27 Jan, 2014 2 commits
-
-
Wesley Bland authored
No reviewer
-
Resets MPIDI_TAG_UB back to 0x7fffffff. This value was changed a while back, but the change should have happened at the MPI layer instead of the CH3 layer. This resets the value to allow CH3 to use the tag space. Instead, the value is now set in the MPI layer during initthread. This means that it will be safe regardless of the device being used. This prevents a collision that was occurring on the pamid device where the values for MPIR_TAG_ERROR_BIT and the MPIR_Process.attr.tagged_coll_mask values were the same. Fixes #2008 Signed-off-by:
Pavan Balaji <balaji@mcs.anl.gov>
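The kind of collision described above can be shown with bit masks. The bit positions below are illustrative assumptions, not MPICH's actual values: starting from INT_MAX for every device, the MPI layer reserves high tag bits for internal flags, and those flags must not share a bit with each other.

```c
#include <limits.h>

/* Hypothetical tag layout (positions are illustrative only). */
#define TAG_SPACE      INT_MAX     /* 0x7fffffff, the same on every device */
#define TAG_ERROR_BIT  (1 << 30)   /* failure propagation */
#define TAG_COLL_BIT   (1 << 29)   /* tagged collectives */

/* Largest user tag once the reserved bits are removed. */
int user_tag_ub(void)
{
    return TAG_SPACE & ~(TAG_ERROR_BIT | TAG_COLL_BIT);
}

/* Nonzero if two internal tag flags share a bit, which is the kind of
 * clash that occurred when both values ended up identical on pamid. */
int tag_bits_collide(int a, int b)
{
    return (a & b) != 0;
}
```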
-
- 17 Dec, 2013 1 commit
-
-
Junchao Zhang authored
Fixes #1962 Signed-off-by: Junchao Zhang <jczhang@mcs.anl.gov> (Reviewed by Bill Gropp)
-
- 15 Nov, 2013 2 commits
-
-
Antonio J. Pena authored
Fixes the following warnings: PGC-W-0095-Type cast required for this conversion (./src/mpid/ch3/include/mpidrma.h: 703) PGC-W-0095-Type cast required for this conversion (./src/mpid/ch3/include/mpidrma.h: 864) Signed-off-by:
Wesley Bland <wbland@mcs.anl.gov>
-
Antonio J. Pena authored
Added support for Ibsend and persistent sends, and fixed all other cases by clearing out the dgb-next field of send requests. Closes #1932. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov>
-
- 31 Oct, 2013 1 commit
-
-
Also includes random fixes to `-Wshorten-64-to-32` warnings which might need to be teased out. Signed-off-by:
Pavan Balaji <balaji@mcs.anl.gov>
-
- 29 Oct, 2013 1 commit
-
-
Pavan Balaji authored
Based on an Intel contributed patch. The idea is to use the bits from the cancelled field to extend the count, rather than increasing the count datatype itself. Signed-off-by:
Ken Raffenetti <raffenet@mcs.anl.gov> Fixes to the bit manipulation based on feedback from Artem Yalozo @ Intel. Fixes to the naming convention based on feedback from Bill Gropp. Signed-off-by:
William Gropp <wgropp@illinois.edu>
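A sketch of the bit-packing idea under assumed field names and unsigned 32-bit words (not the exact MPICH status layout or macros): the low word holds the low 32 bits of the count, and the second word keeps the cancelled flag in bit 0 with the upper count bits in bits 1..31, giving a 63-bit count without enlarging the structure.

```c
#include <stdint.h>

/* Hypothetical status layout: two 32-bit words. */
typedef struct {
    uint32_t count_lo;                /* low 32 bits of the count */
    uint32_t count_hi_and_cancelled;  /* bit 0: cancelled; bits 1..31: high count bits */
} status_t;

void status_set_count(status_t *s, uint64_t count)
{
    s->count_lo = (uint32_t) count;
    uint32_t hi = (uint32_t) (count >> 32) << 1;   /* skip the flag bit */
    s->count_hi_and_cancelled = hi | (s->count_hi_and_cancelled & 1u);
}

void status_set_cancelled(status_t *s, int cancelled)
{
    s->count_hi_and_cancelled =
        (s->count_hi_and_cancelled & ~1u) | (cancelled ? 1u : 0u);
}

uint64_t status_get_count(const status_t *s)
{
    return ((uint64_t) (s->count_hi_and_cancelled >> 1) << 32) | s->count_lo;
}

int status_get_cancelled(const status_t *s)
{
    return (int) (s->count_hi_and_cancelled & 1u);
}
```

Setting the count preserves the flag and vice versa, so the two can be updated in either order.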
-
- 26 Oct, 2013 1 commit
-
-
New code should use interfaces provided by new MPI_T impl. Signed-off-by:
Pavan Balaji <balaji@mcs.anl.gov>
-
- 27 Sep, 2013 1 commit
-
-
Pavan Balaji authored
Optimize the case where the origin and target both use basic datatypes. In this case, we assume that the data is aligned correctly for the appropriate datatype and perform a direct assignment instead of a memory copy. Signed-off-by:
Xin Zhao <xinzhao3@illinois.edu>
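A sketch of the fast path, assuming a hypothetical datatype enum with two basic types. The typed assignment is only valid under the alignment assumption the commit states (misaligned pointers would make it undefined behavior), which is why non-basic types still fall back to memcpy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of datatypes. */
typedef enum { DT_INT, DT_DOUBLE, DT_OTHER } dtype_t;

/* Copy one element from src to dst. Basic types use a direct typed
 * assignment (assumes both pointers are suitably aligned); anything
 * else falls back to memcpy of `size` bytes. */
void copy_basic(void *dst, const void *src, dtype_t type, size_t size)
{
    switch (type) {
    case DT_INT:
        *(int *) dst = *(const int *) src;
        break;
    case DT_DOUBLE:
        *(double *) dst = *(const double *) src;
        break;
    default:
        memcpy(dst, src, size);
        break;
    }
}
```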
-