1. 27 Aug, 2014 1 commit
  2. 30 Jul, 2014 1 commit
    • Change default values of CVARs in RMA code. · 522c2688
      Xin Zhao authored
      
      
      Change default values of MPIR_CVAR_CH3_RMA_NREQUEST_NEW_THRESHOLD,
      MPIR_CVAR_CH3_RMA_NREQUEST_VISIT_THRESHOLD and
      MPIR_CVAR_CH3_RMA_NREQUEST_TEST_THRESHOLD for better performance.
      
      These values come from running graph500 on a single node of the
      BLUES and breadboard machines, with 8 or 16 processes and problem
      sizes from 2^16 to 2^20. We set the threshold on the number of new
      requests since the last attempt to complete pending requests to 0,
      so that the issuing code always tries to complete pending requests.
      We also disable the threshold on completed requests in GC and set
      the threshold on tested requests in GC to 100, so that GC has the
      opportunity to find more pending requests.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
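
The effect of a zero "new requests" threshold can be sketched as follows. This is a minimal model, not the MPICH code: the struct, the field names and the `>=` comparison are assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical model; none of these names are MPICH identifiers. */
typedef struct {
    int new_since_last_attempt; /* requests issued since the last completion attempt */
    int new_threshold;          /* assumed stand-in for the "new requests" CVAR */
} issue_state;

/* Record one newly issued request and report whether the issuing code
 * should now try to complete pending requests. */
static bool issue_and_check(issue_state *s)
{
    s->new_since_last_attempt++;
    if (s->new_since_last_attempt >= s->new_threshold) {
        s->new_since_last_attempt = 0; /* an attempt is made; restart the count */
        return true;
    }
    return false;
}
```

Under this model a threshold of 0 makes every issued request trigger a completion attempt, while a threshold of 3 triggers one on every third request.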
  3. 18 Jul, 2014 1 commit
  4. 17 Jul, 2014 1 commit
    • Simplified RMA_Op structure. · 274a5a70
      Pavan Balaji authored
      
      
      We were duplicating information between the operation structure
      and the packet structure built when the message is actually issued.
      Since most of the information is the same anyway, this patch just
      embeds a packet structure into the operation structure.
      Signed-off-by: Xin Zhao <xinzhao3@illinois.edu>
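
A rough illustration of embedding the packet in the operation follows. All type and field names are hypothetical and far simpler than the real structures.

```c
#include <stddef.h>

/* Hypothetical packet header: what would go on the wire. */
typedef struct {
    int    type;        /* e.g. put, get, accumulate */
    void  *target_addr;
    size_t count;
} sketch_pkt;

/* Hypothetical operation: queue linkage plus the embedded packet. */
typedef struct sketch_op {
    struct sketch_op *next;
    sketch_pkt        pkt;         /* filled once at enqueue time, sent as-is at issue time */
    const void       *origin_addr; /* origin-side data that is not part of the packet */
} sketch_op;

/* Fill the operation; the packet fields are written exactly once, here. */
static void op_init(sketch_op *op, int type, void *target, size_t count,
                    const void *origin)
{
    op->next = NULL;
    op->pkt.type = type;
    op->pkt.target_addr = target;
    op->pkt.count = count;
    op->origin_addr = origin;
}

/* At issue time there is nothing to copy: the packet already exists. */
static const sketch_pkt *op_packet(const sketch_op *op)
{
    return &op->pkt;
}
```

With this layout the issue path only hands out a pointer to data it already holds, rather than copying the same fields into a second structure.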
  5. 13 Jul, 2014 2 commits
  6. 11 Jul, 2014 1 commit
  7. 08 Jul, 2014 3 commits
    • Add a memory barrier at the end of the Win_complete function. · c244ba49
      Pavan Balaji authored
      
      
      We need to add a memory barrier at the end of the Win_complete
      function, so that shared memory operations issued during the
      start/complete epoch are visible to other processes on the node.
      Signed-off-by: Xin Zhao <xinzhao3@illinois.edu>
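
The visibility requirement can be illustrated with portable C11 fences. This is an analogy under stated assumptions, not the primitive MPICH itself uses: `shared_data`, `epoch_done` and both function names are invented for the sketch.

```c
#include <stdatomic.h>

static int        shared_data; /* stands in for a window's shared-memory buffer */
static atomic_int epoch_done;  /* stands in for the "complete" notification */

/* Origin side: store directly into shared memory, then fence before
 * signalling, so the store cannot be reordered after the notification. */
static void origin_complete(int value)
{
    shared_data = value;
    atomic_thread_fence(memory_order_release); /* barrier at the end of "complete" */
    atomic_store_explicit(&epoch_done, 1, memory_order_relaxed);
}

/* Target side: wait for the notification, then fence before reading. */
static int target_wait(void)
{
    while (atomic_load_explicit(&epoch_done, memory_order_relaxed) == 0)
        ; /* spin until the origin signals */
    atomic_thread_fence(memory_order_acquire);
    return shared_data;
}
```

Without the release fence, another process or thread could observe the notification before the data store becomes visible.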
    • Add barrier-like semantics in PSCW for shared-memory operations. · 39361532
      Pavan Balaji authored
      
      
      When a window uses direct shared-memory operations that are
      immediately issued internally, we cannot avoid synchronization during
      the start operation.  This patch synchronizes processes that reside on
      the same node during start and the processes that do not reside on the
      same node during complete.
      
      Fixes #2041.
      Signed-off-by: Xin Zhao <xinzhao3@illinois.edu>
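
The split described above (same-node peers synchronized at start, off-node peers at complete) can be sketched as a simple partition. The types and the function name are hypothetical, and the sketch only counts peers instead of communicating with them.

```c
/* Hypothetical result: how many peers are synchronized in each phase. */
typedef struct {
    int at_start;    /* peers on the same node: must be synchronized in start */
    int at_complete; /* peers on other nodes: synchronization deferred to complete */
} sync_plan;

/* Partition the peers of an access group by node. */
static sync_plan plan_pscw_sync(const int *peer_node_ids, int npeers, int my_node_id)
{
    sync_plan plan = {0, 0};
    for (int i = 0; i < npeers; i++) {
        if (peer_node_ids[i] == my_node_id)
            plan.at_start++;
        else
            plan.at_complete++;
    }
    return plan;
}
```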
    • Fix bug: add barrier semantic in FENCE for SHM ops. · 1c07dbaf
      Xin Zhao authored
      
      
      When SHM is allocated for an RMA window, operations are completed
      eagerly (as soon as they are posted by the user). We therefore
      need barrier semantics in the FENCE that opens an epoch, to prevent
      SHM ops from happening on the target process before that target
      process starts an epoch.
      
      Note that we need a memory barrier before and after the
      synchronization calls in both the FENCE that starts an epoch and
      the FENCE that ends it, to guarantee the ordering of load/store
      operations with respect to synchronization.
      
      See #2041.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
  8. 30 Jun, 2014 6 commits
    • Add CVAR (# of tested reqs) to control when to stop in RMA GC function · 283319f5
      Xin Zhao authored
      
      
      When cleaning up completed requests, the original RMA implementation
      keeps traversing the op list until it finds a completed request. This
      may cause significant O(N) overhead if there is no completed request
      in the list. We add a CVAR that lets the user cap the number of
      visited requests at a fixed value.
      
      Note that the default value is set to -1 in order to match the
      performance of the original implementation.
      
      Note that in the garbage collection function, if the runtime finds a
      chain of completed RMA requests, it temporarily ignores this CVAR
      and collects as many consecutive completed requests as possible,
      until it meets an incomplete request.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Add CVAR (# of completed reqs) to control when to stop in RMA GC function · dda458a1
      Xin Zhao authored
      
      
      Add a CVAR that lets the user specify how many completed requests
      the runtime must find before it stops looking for more in the
      garbage collection function. A higher value may let the runtime
      find more completed requests, but may also cause significant
      overhead from visiting too many requests.
      
      Note that the default value is set to 1 in order to match the
      performance of the original implementation.
      
      Note that in the garbage collection function, if the runtime finds a
      chain of completed RMA requests, it temporarily ignores this CVAR
      and collects as many consecutive completed requests as possible,
      until it meets an incomplete request.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
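
How the "tested" and "completed" thresholds might interact with the chain rule is sketched below. This is a toy model over an array rather than the real op list: the function names are hypothetical, and treating any negative limit as "no limit" is an assumption (the commit messages only state -1 for the tested threshold).

```c
#include <stdbool.h>

/* A negative limit is assumed to mean "no limit". */
static bool limit_hit(int count, int limit)
{
    return limit >= 0 && count >= limit;
}

/* Walk done[0..n-1] as if it were the op list, marking freed[i] for each
 * collected request. Returns the number of completed requests collected. */
static int gc_sketch(const bool *done, bool *freed, int n,
                     int max_tested, int max_completed)
{
    int tested = 0, completed = 0, i = 0;

    while (i < n) {
        if (limit_hit(tested, max_tested) || limit_hit(completed, max_completed))
            break;
        tested++;
        if (!done[i]) {
            i++;
            continue;
        }
        /* Found a completed request: collect it, then keep collecting the
         * run of consecutive completed requests, ignoring both limits. */
        freed[i++] = true;
        completed++;
        while (i < n && done[i]) {
            freed[i++] = true;
            completed++;
            tested++;
        }
    }
    return completed;
}
```

With limits (-1, 1) the walk behaves like the original implementation described above: it scans until the first completed request, takes that request's whole chain, and stops.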
    • Simplify RMA requests completion function · 7dbdc413
      Xin Zhao authored
      
      
      Originally the rma_list_complete() function traversed the
      operation list to clean up completed requests, which is
      what rma_list_gc() does now. So we simplify
      rma_list_complete() by deleting the traversal loop and
      just invoking rma_list_gc() from rma_list_complete().
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
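
One way to picture completion built on top of garbage collection is the toy model below. Only the idea that completion delegates cleanup to the GC routine comes from the commit message; the loop shape, the counters and the progress stand-in are assumptions.

```c
/* Hypothetical list state, reduced to two counters. */
typedef struct {
    int pending;   /* issued, not yet completed */
    int completed; /* completed, not yet freed */
} sketch_list;

/* Stand-in for the progress engine: completes at most one request per call. */
static void poke_progress(sketch_list *l)
{
    if (l->pending > 0) {
        l->pending--;
        l->completed++;
    }
}

/* Garbage collection only: free completed requests and report how many
 * requests are still left in the list. */
static int list_gc(sketch_list *l)
{
    l->completed = 0;
    return l->pending;
}

/* Completion: no traversal of its own, just GC plus progress until the
 * list is empty. Returns the number of progress pokes that were needed. */
static int list_complete(sketch_list *l)
{
    int pokes = 0;
    while (list_gc(l) > 0) {
        poke_progress(l);
        pokes++;
    }
    return pokes;
}
```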
    • Separate progress engine code from garbage collection · da7700a0
      Xin Zhao authored
      
      
      Currently the code that pokes the progress engine to complete
      requests and the code that cleans up completed requests
      are mixed together in one function, rma_list_gc(), which makes
      the code structure unclear. We move the progress-engine code
      out of rma_list_gc() and encapsulate it in a separate function,
      so that rma_list_gc() only does garbage collection work.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Rename RMA request gc and complete function · 73f6a4b3
      Xin Zhao authored
      
      
      Rename RMAListPartialComplete to rma_list_gc
      and rename RMAListComplete to rma_list_complete.
      Declare both functions as inline functions.
      Add error handling code for both functions.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Rename static functions in RMA code · 33b7d251
      Xin Zhao authored
      
      
      Static functions should not have names starting with the prefix "MPIDI_CH3I_".
      We delete that prefix from the function names as well as from the state names.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
  9. 22 May, 2014 1 commit
    • Make handling of request cleanup more uniform · 1e171ff6
      Wesley Bland authored
      
      
      There are quite a few places where the request cleanup is done via:
      
      MPIU_Object_set_ref(req, 0);
      MPIDI_CH3_Request_destroy(req);
      
      when it should be:
      
      MPID_Request_release(req);
      
      This makes the handling more uniform so requests are cleaned up by releasing
      references rather than hitting them with the destroy hammer.
      
      Fixes #1664
      Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
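
The difference between releasing a reference and forcing destruction can be shown with a generic reference-counting sketch. The `sketch_req` type and its helpers are hypothetical and do not reproduce the MPICH request API.

```c
#include <stdlib.h>

/* Hypothetical request object with a reference count. */
typedef struct {
    int  ref_count;
    int *destroyed_flag; /* lets the caller observe destruction in this sketch */
} sketch_req;

static sketch_req *req_create(int *destroyed_flag)
{
    sketch_req *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->ref_count = 1; /* the creator holds the first reference */
    r->destroyed_flag = destroyed_flag;
    return r;
}

static void req_add_ref(sketch_req *r)
{
    r->ref_count++;
}

static void req_destroy(sketch_req *r)
{
    if (r->destroyed_flag != NULL)
        *r->destroyed_flag = 1;
    free(r);
}

/* Drop one reference; the object is destroyed only when the last holder lets go. */
static void req_release(sketch_req *r)
{
    if (--r->ref_count == 0)
        req_destroy(r);
}
```

Forcing the count to zero and destroying the object would free it even if another holder still had a reference; releasing leaves that decision to the count.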
  10. 19 Dec, 2013 1 commit
  11. 17 Dec, 2013 1 commit
  12. 15 Nov, 2013 4 commits
    • Fix #1701 - cleanup code for zero-size data transfer. · dc9275be
      Xin Zhao authored
      
      
      Delete the code for zero-size data transfer in the packet handlers
      of Put/Accumulate/Accumulate_Immed/Get_AccumulateResp/GetResp/
      LockPutUnlock/LockAccumUnlock, because it is redundant.
      
      (Note that packet handlers of LockPutUnlock and LockAccumUnlock
      are for single operation optimization in passive RMA)
      
      Zero-size data transfer has already been handled when issuing
      RMA operations (L146, L258, L369 in src/mpid/ch3/src/ch3u_rma_ops.c
      and L50 in src/mpid/ch3/src/ch3u_rma_acc_ops.c). RMA operation
      routines will directly exit if data size is zero.
      Signed-off-by: Wesley Bland <wbland@mcs.anl.gov>
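
The reason the handler-side checks are redundant can be seen in a small sketch of an issue routine that returns early on zero-size data. The function, the counter and the use of `memcpy` as a stand-in for issuing the operation are all assumptions.

```c
#include <stddef.h>
#include <string.h>

static int ops_enqueued; /* hypothetical counter of operations actually issued */

/* Hypothetical put: exits before doing any work when there is no data. */
static int sketch_put(const void *origin, size_t count, size_t elem_size, void *target)
{
    size_t bytes = count * elem_size;

    if (bytes == 0)
        return 0; /* nothing to transfer: no operation is ever created */

    memcpy(target, origin, bytes); /* stands in for enqueueing and issuing the op */
    ops_enqueued++;
    return 0;
}
```

Because no operation is created for zero bytes, no packet for it can reach a handler on the target side.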
    • Revert Fixed --enabled-debuginfo segfaults tt#1932 · b9531d3d
      Antonio J. Pena authored
      This reverts commit 676c29f9.
    • Fixed --enabled-debuginfo segfaults tt #1932 · 676c29f9
      Antonio J. Pena authored
      
      
      Addresses #1932. Includes:
        - MPI_Bsend/MPI_Ibsend
        - Several collectives
        - Some RMA operations
        - MPI_Dist_graph_create
      Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
    • Fix ticket-1960: delete redundant MPIU_Assert. · 0b126663
      Xin Zhao authored
      
      
      MPIU_Assert at L2311 checks if rma_ops_list is empty before exiting
      MPIDI_Win_flush. It causes /test/mpi/threads/rma/multirma to fail
      because while one thread is executing the loop of poking progress
      engine at L2293 ~ L2302, another thread may enqueue new RMA operations
      to rma_ops_list.
      
      rma_ops_list has already been checked for empty before exiting
      MPIDI_CH3I_Do_passive_target_rma (L2724) to ensure that all enqueued
      operations are issued out, therefore it does not need to be checked
      again here.
      Signed-off-by: Wesley Bland <wbland@mcs.anl.gov>
  13. 31 Oct, 2013 1 commit
  14. 26 Oct, 2013 1 commit
  15. 26 Sep, 2013 11 commits
  16. 08 Aug, 2013 1 commit
  17. 01 Aug, 2013 1 commit
    • When judging if origin and target process are on the same node, using... · c7bc4694
      Xin Zhao authored and Pavan Balaji committed
      
      When judging whether the origin and target processes are on the same node, use the vc->node_id flag instead of the vc->ch.is_local flag.
      
      The 'is_local' flag is not correct here because it is defined in nemesis, not in CH3.
      The 'node_id' flag is defined in CH3.
      
      Note that for ch3:sock, even if the origin and target are on the same node, they are not within the same SHM region.
      Currently ch3:sock is filtered out by checking the shm_allocated flag first. In the future we need to figure out a way to
      check whether the origin and target are within the same "SHM comm".
      Signed-off-by: Pavan Balaji <balaji@mcs.anl.gov>
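
The check order described above (shm_allocated first, then node_id) can be sketched as follows. The two structs are hypothetical reductions to the single field each one needs here; only the field names `node_id` and `shm_allocated` come from the commit message.

```c
#include <stdbool.h>

/* Hypothetical reductions of the virtual connection and the window. */
typedef struct { int  node_id; }       sketch_vc;
typedef struct { bool shm_allocated; } sketch_win;

/* True when a direct shared-memory path may be used between origin and target. */
static bool can_use_shm(const sketch_win *win,
                        const sketch_vc *origin, const sketch_vc *target)
{
    if (!win->shm_allocated)
        return false; /* filters out configurations with no shared window memory */
    return origin->node_id == target->node_id;
}
```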
  18. 28 Jul, 2013 2 commits