1. 10 Jun, 2014 2 commits
  2. 06 Jun, 2014 1 commit
    • pamid: remove blocking shmem mutex; remove shmem CAS/FOP optimizations · e6ddea13
      Michael Blocksome authored
      The blocking pthread mutex in the shared memory window causes a deadlock
      on bgq - perhaps because the messaging state is not advanced while the
      thread is waiting for the mutex to be released.
      
      Removing this blocking mutex resolves the bgq failures:
      
       - rma/strided_putget_indexed_shared
       - rma/strided_getacc_indexed_shared
      
      With the calls to the blocking mutex removed, the CAS and FOP functions
      are no longer atomic. The solution is to remove the shared memory
      optimization and use the common (network) code path instead.
      
      Removing these shared memory optimizations from CAS/FOP resolves the
      bgq hangs:
      
       - rma/mutex_bench_shared
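
The deadlock hypothesis above can be illustrated with a small single-threaded sketch. It is not the pamid code and not what this commit does (the commit removes the mutex and routes CAS/FOP through the network path). It only shows, under stated assumptions, why a waiter that blocks without advancing messaging state can hang, and how polling the lock while advancing progress avoids that. All names (win_lock, toy_advance_progress, toy_lock_with_progress, pending_events) are hypothetical, and a C11 atomic_flag stands in for the shared-memory pthread mutex.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the shared-window mutex (assumption: a simple test-and-set flag). */
static atomic_flag win_lock = ATOMIC_FLAG_INIT;

/* Simulated messaging events the lock holder needs processed before it unlocks. */
static int pending_events;

/* Hypothetical progress hook: processes one event per call. Once the last
 * event is handled, the simulated peer releases the lock. */
static void toy_advance_progress(void)
{
    if (pending_events > 0 && --pending_events == 0)
        atomic_flag_clear(&win_lock);
}

/* Non-blocking acquire: poll the lock and advance progress between attempts.
 * A blocking wait here would never return in this simulation, because the
 * release depends on toy_advance_progress() being called. */
static int toy_lock_with_progress(void)
{
    int polls = 0;
    while (atomic_flag_test_and_set(&win_lock)) {
        toy_advance_progress();
        polls++;
    }
    return polls;
}

/* Walk-through: the simulated peer holds the lock and needs three events
 * processed before it releases. Returns the number of polls that were needed. */
static int toy_demo_lock(void)
{
    pending_events = 3;
    bool was_held = atomic_flag_test_and_set(&win_lock); /* peer takes the lock */
    assert(!was_held);
    (void)was_held;
    int polls = toy_lock_with_progress();
    atomic_flag_clear(&win_lock);                        /* our own unlock */
    return polls;
}
```

Calling toy_demo_lock() acquires the lock only after the progress hook has run three times, which is the behavior a blocking wait cannot provide when the release itself depends on progress.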
  3. 05 Jun, 2014 2 commits
  4. 04 Jun, 2014 2 commits
  5. 03 Jun, 2014 3 commits
  6. 02 Jun, 2014 7 commits
  7. 31 May, 2014 2 commits
    • Adds two tests to MPI_Comm_idup · 7e44a0f1
      Huiwei Lu authored
      Two multi-threaded tests are added for MPI_Comm_idup in
      test/mpi/threads/comm: ctxidup.c and comm_idup.c. The former passes;
      the latter still fails and is marked as xfail in the testlist.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Fixes MPI_Comm_idup · 05eeccb5
      Huiwei Lu authored
      This patch fixes two related tickets:
      1. MPI_Comm_idup in multithreaded environments
      2. MPI_Comm_idup fails to create multiple communicators
      Because these two tickets are tightly coupled, they are fixed in this
      single patch.
      
      The original code did not consider the multithreaded case and did not
      use the progress engine in the correct order when saving a copy of the
      global mask to the local thread.
      
      The following changes were made to implement MPI_Comm_idup correctly:

      1. It shares the same global flag 'mask_in_use' with other communicator
      functions to protect access to context_mask, and uses the CONTEXTID
      lock to protect critical sections.
      
      2. It uses the same algorithm as multithreaded MPI_Comm_dup (the
      multi-threaded version of MPIR_Get_contextid_sparse_group) to allocate
      a context id, but in a nonblocking way. In the case of conflicts, the
      algorithm needs to retry the allocation. In the nonblocking algorithm,
      1) new entries are appended to the end of the schedule to replace the
      'while' loop in the MPI_Comm_dup algorithm; 2) all arguments passed to
      sched_get_cid_nonblock are saved in gcn_state so that they can be used
      by later calls; 3) in sched_cb_gcn_allocate_cid, if the first try
      fails, it inserts sched_cb_gcn_copy_mask into the schedule again.
      
      3. There is a subtle difference between INTRACOMM and INTERCOMM when
      duplicating a communicator, so they need to be treated differently in
      the current algorithm. Specifically, 1) the parameters passed to
      sched_get_cid_nonblock are different; 2) updating
      newcommp->recvcontext_id in MPIR_Get_intercomm_contextid_nonblock has
      been moved to sched_cb_gcn_bcast because this should happen after
      sched_cb_gcn_allocate_cid has succeeded.
      
      Fixes #1935
      Fixes #1913
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
  8. 30 May, 2014 1 commit
    • pamid: fix request completion race condition · 6063f898
      Michael Blocksome authored
      In a multi-threaded environment where one thread is waiting for a
      request to complete, the waiting thread begins to free the request
      resources the instant a different thread sets the request completion
      counter to zero. The completing thread must therefore not touch any
      memory of the old request object after setting the completion count to
      zero.
      
      See ticket #2103
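
The ordering rule described above can be sketched with C11 atomics. This is a minimal illustration, not the pamid request code: request_t, its fields, and the function names are hypothetical. The completing side finishes every write to the request first and makes the store of zero to the completion counter its last access to the object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical request object; not the pamid request layout. */
typedef struct {
    atomic_int cc; /* completion counter: zero means complete */
    int status;    /* result written by the completing thread */
} request_t;

/* Completing side: write all results, then publish completion last. */
static void complete_request(request_t *req, int status)
{
    req->status = status;
    atomic_store_explicit(&req->cc, 0, memory_order_release);
    /* req must not be dereferenced past this point: the waiting thread
     * may already have observed zero and freed the object. */
}

/* Waiting side: once the counter reads zero, the request is owned by the
 * waiter and may be freed immediately. */
static int wait_and_free(request_t *req)
{
    while (atomic_load_explicit(&req->cc, memory_order_acquire) != 0) {
        /* spin; a real implementation would advance progress here */
    }
    int status = req->status;
    free(req);
    return status;
}

/* Single-threaded walk-through of the two sides, in the safe order. */
static int demo_completion(void)
{
    request_t *req = malloc(sizeof *req);
    assert(req != NULL);
    atomic_init(&req->cc, 1);
    req->status = -1;
    complete_request(req, 42);
    return wait_and_free(req);
}
```

The release store paired with the acquire load ensures that the waiter sees the status written before completion was published. Any access to req placed after the store in complete_request would reintroduce the race.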
  9. 29 May, 2014 2 commits
  10. 28 May, 2014 1 commit
  11. 27 May, 2014 5 commits
  12. 26 May, 2014 1 commit
    • Fix a perl script problem met with perl-5.8.8 · 3d3a4b3a
      Junchao Zhang authored
      When printing something like "$vec[$i]->base_addr", where @vec is an integer array,
      old Perl is confused by "->" and treats it as an operator. In fact, we just want
      to print "->base_addr" literally. Newer Perl (e.g., 5.16.2) is fine with this syntax.
      
      Change to "$vec[$i]"."->base_addr" to avoid this problem.
      
      No review since F08 binding is experimental now.
  13. 23 May, 2014 11 commits