1. 24 Apr, 2015 5 commits
  2. 23 Apr, 2015 3 commits
  3. 22 Apr, 2015 6 commits
    • Fix arbitrary poll count before yielding. · abb56764
      Pavan Balaji authored
      
      
      Instead of polling an arbitrarily chosen number of times in the
      progress engine before yielding, we have moved the yielding
      intelligence to the threading layer.  The threading layer can keep
      track of other threads that are waiting to enter the critical section
      and yield only if another thread is waiting.  This way, if no
      thread is waiting to get the lock, the main thread never yields.  At
      the same time, if another thread is waiting to get the lock, there is no
      delay in yielding.
      
      This change, however, introduces a possible deadlock. If a thread enters
      MPIDI_CH3I_progress with is_blocking unset, it may set the
      MPIDI_CH3I_progress_blocked flag and then yield the critical section.
      Another thread may enter with is_blocking set, find the
      MPIDI_CH3I_progress_blocked flag set, and block on the condition variable.
      The first thread will then wake up and leave the progress engine without
      emitting any signal to wake the second thread, which may sleep forever.
      
      A simple fix is to yield the critical section only if the current thread
      entered the progress engine with is_blocking set.
      Signed-off-by: Halim Amer <aamer@anl.gov>
    • Initial version of the intelligent thread yielding. · b39314a5
      Pavan Balaji authored
      
      
      Instead of a simple thread yield, this patch gives the yield some
      additional information about how many threads are waiting for the lock.
      When a thread tries to acquire a lock, it increments a counter.  When
      a thread needs to yield, it can check this counter to see how many
      threads are waiting to get the lock.  If there are no threads waiting,
      the yield can be skipped.
      
      This patch contains various changes to make that happen:
      
      1. We modify the mutex object to maintain additional information on
      the number of queued threads.
      
      2. We improve the yield call to include the unlock and lock as well,
      since it needs to decide whether to do the unlock/lock based on how
      many other threads are queued up.
      Signed-off-by: Halim Amer <aamer@anl.gov>
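The counter-based yield described in this commit can be sketched as follows. This is a minimal illustration built on POSIX threads and C11 atomics, not the MPICH implementation: the names counted_mutex_t, counted_lock, counted_unlock, and counted_yield are hypothetical, and decrementing the counter once the lock has been acquired is an assumption about how the count is kept accurate.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical mutex wrapper: a pthread mutex plus a count of the
 * threads that are currently trying to acquire it. */
typedef struct {
    pthread_mutex_t lock;
    atomic_int num_queued;      /* threads waiting inside counted_lock() */
} counted_mutex_t;

static void counted_mutex_init(counted_mutex_t *m)
{
    int rc = pthread_mutex_init(&m->lock, NULL);
    assert(rc == 0);
    (void) rc;
    atomic_init(&m->num_queued, 0);
}

static void counted_lock(counted_mutex_t *m)
{
    atomic_fetch_add(&m->num_queued, 1);    /* announce that we are waiting */
    pthread_mutex_lock(&m->lock);
    atomic_fetch_sub(&m->num_queued, 1);    /* assumption: stop counting once we own the lock */
}

static void counted_unlock(counted_mutex_t *m)
{
    pthread_mutex_unlock(&m->lock);
}

/* Called by the current lock holder.  The unlock/lock pair is part of
 * the yield, so it can be skipped entirely when nobody is queued.
 * Returns 1 if the lock was released and re-acquired, 0 if skipped. */
static int counted_yield(counted_mutex_t *m)
{
    if (atomic_load(&m->num_queued) == 0)
        return 0;                           /* no waiters: keep the lock */
    counted_unlock(m);
    sched_yield();                          /* give a waiter the chance to run */
    counted_lock(m);
    return 1;
}
```

A lock holder that calls counted_yield while no thread is queued keeps the lock and gets 0 back; otherwise it unlocks, yields the processor, and re-acquires the lock before returning 1.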
    • Cleanup threaded progress. · f385680e
      Pavan Balaji authored
      
      
      The nemesis progress engine was written so that if one thread
      is inside the progress engine, other threads cannot enter the receive
      progress.  They can enter the send progress in some cases.  There
      doesn't seem to be a good reason for this behavior.  This patch
      unifies the two paths so that threads simply return for nonblocking
      operations and wait for a signal before entering the progress engine
      for blocking operations.
      Signed-off-by: Halim Amer <aamer@anl.gov>
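The entry rule described in this commit (return immediately for nonblocking calls, wait for a signal for blocking calls) can be sketched with a POSIX mutex and condition variable. The names cs, progress_done, progress_busy, progress_enter, and progress_exit are hypothetical and do not come from the nemesis sources; the real engine also does work that is omitted here.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical global state; the caller is assumed to hold `cs`
 * whenever it calls progress_enter() or progress_exit(). */
static pthread_mutex_t cs = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t progress_done = PTHREAD_COND_INITIALIZER;
static int progress_busy = 0;   /* 1 while some thread is inside the engine */

/* Returns 1 if the caller now owns the progress engine, 0 if a
 * nonblocking caller found it busy and simply returned. */
static int progress_enter(int is_blocking)
{
    while (progress_busy) {
        if (!is_blocking)
            return 0;                           /* nonblocking: just return */
        pthread_cond_wait(&progress_done, &cs); /* blocking: wait for a signal */
    }
    progress_busy = 1;
    return 1;
}

/* Leave the engine and wake every thread waiting to enter it. */
static void progress_exit(void)
{
    assert(progress_busy);      /* only the owner may leave the engine */
    progress_busy = 0;
    pthread_cond_broadcast(&progress_done);
}
```

The while loop re-checks progress_busy after every wakeup, which guards against spurious wakeups from pthread_cond_wait.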
    • Fix wrong alias names. · 5fb750b9
      Sangmin Seo authored
      
      
      __attribute__((weak,alias())) should have function names starting with
      PMPI, but some MPIX functions, such as MPIX_Grequest_class_create,
      MPIX_Grequest_class_allocate, MPIX_Grequest_start, MPIX_Mutex_create,
      MPIX_Mutex_free, MPIX_Mutex_lock, and MPIX_Mutex_unlock, had alias
      names identical to the original function names. This patch fixes the
      wrong alias names in __attribute__((weak,alias())) and also fixes some
      wrong alias names in #pragma.
      Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
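The alias rule in this commit can be illustrated with a small example. It is only a sketch: PMPIX_Example_op and MPIX_Example_op are hypothetical functions, not MPICH symbols, and the alias attribute is assumed to be available (GCC or Clang on an ELF platform such as Linux).

```c
#include <assert.h>

/* Hypothetical profiling-layer implementation (name starts with PMPI). */
int PMPIX_Example_op(int x)
{
    return x + 1;
}

/* Correct form: the MPIX_ name is a weak alias whose target is the
 * PMPIX_ function.  The bug described above corresponds to writing
 * alias("MPIX_Example_op") here, i.e. naming the function itself
 * instead of its PMPI counterpart. */
int MPIX_Example_op(int x) __attribute__((weak, alias("PMPIX_Example_op")));

/* Small self-check: both names must reach the same implementation. */
static void check_alias(void)
{
    assert(MPIX_Example_op(41) == PMPIX_Example_op(41));
}
```

Because the MPIX_ symbol is weak, a profiling tool can supply its own strong definition of MPIX_Example_op and still call PMPIX_Example_op to reach the original implementation.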
    • e60c9375
      Antonio J. Pena authored
    • mxm: fix anysource_matched · 1be5fc49
      Kenneth Raffenetti authored
      
      
      The return value of anysource_matched should be the actual result
      of the cancel operation. If the request cannot be canceled, i.e. it has
      already been matched, then CH3 will let the netmod message win and move
      on to the other requests in the queue. When the completion for the
      unsuccessfully canceled message comes in, we process it as usual.
      Reviewed-by: Igor Ivanov <Igor.Ivanov@itseez.com>
      Signed-off-by: Antonio J. Pena <apenya@mcs.anl.gov>
  4. 21 Apr, 2015 2 commits
  5. 20 Apr, 2015 9 commits
  6. 17 Apr, 2015 9 commits
  7. 16 Apr, 2015 2 commits
  8. 15 Apr, 2015 3 commits
    • Increase bcast_full time limit. · 04060d1d
      Pavan Balaji authored
      We increased the number of cases the bcast test runs in
      [e01a20b6].  This is causing it to time out on some platforms, where
      the test now seems to take close to 3 minutes.  The increased timeout
      should be sufficient on those platforms.
      
      No reviewer.
    • OFI Netmod: Add CVAR enhancements for OFI provider selection · 1ed2b434
      Charles J Archer authored
       * Rename MPIR_CVAR_DUMP_PROVIDERS to MPIR_CVAR_OFI_DUMP_PROVIDERS
       * Add MPIR_CVAR_OFI_USE_PROVIDER, which takes a string naming the
         desired provider
    • PAMID: MPI_Allreduce/MPI_Reduce coredump w/ DOUBLE_INT datatype · e87c158f
      Sameh Sharkawi authored
      
      
      This commit includes multiple fixes:
       - Fixes for MPI_IN_PLACE checking. cudaGetPointerAttributes returns
         true on MPI_IN_PLACE, which causes issues. We now check for
         MPI_IN_PLACE before passing the pointer to CUDA.
       - Enabling PAMID geometries (in order to get to PAMID collectives) when
         MP_CUDA_AWARE=yes. This allows CUDA buffers to be intercepted.
       - Disabling FCA when MP_CUDA_AWARE=yes, even if the user enables FCA.
       - Copying the user recv buffer into a temp recv host buffer before the
         collective starts, especially in MPI_IN_PLACE cases.
      
      (ibm) D203255
      Signed-off-by: Tsai-Yang (Alan) Jea <tjea@us.ibm.com>
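The first fix in this commit, checking for MPI_IN_PLACE before handing a pointer to CUDA, follows a common sentinel-guard pattern. The sketch below uses stand-ins so that it compiles without MPI or CUDA headers: EX_IN_PLACE, ex_query_is_device, and ex_is_device_buffer are hypothetical names, and the stand-in query simply reports every pointer as a device pointer to mimic the misbehavior described above.

```c
#include <assert.h>

/* Hypothetical stand-in for the MPI_IN_PLACE sentinel address. */
static char ex_in_place_tag;
#define EX_IN_PLACE ((const void *) &ex_in_place_tag)

static int ex_query_calls = 0;  /* how often the pointer query was used */

/* Hypothetical stand-in for a CUDA pointer-attribute query.  It claims
 * that any pointer is a device pointer, so it must never be handed the
 * sentinel. */
static int ex_query_is_device(const void *ptr)
{
    assert(ptr != EX_IN_PLACE);
    ex_query_calls++;
    return 1;
}

/* Guarded check: test for the sentinel first and only then ask the
 * (stand-in) CUDA query about the pointer. */
static int ex_is_device_buffer(const void *buf)
{
    if (buf == EX_IN_PLACE)
        return 0;               /* a sentinel is never a device buffer */
    return ex_query_is_device(buf);
}
```

With the guard in place, the sentinel is classified without ever reaching the pointer query, so a query that misreports it can no longer affect the result.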
  9. 14 Apr, 2015 1 commit
    • Fixed the Fortran common symbol issue on Mac. · eb0e7712
      Min Si authored
      
      
      The linker on Darwin does not allow common symbols, so libtool adds
      the -fno-common option by default for shared libraries. However,
      common symbols defined in different shared libraries and object files
      still cannot be treated as the same symbol.
      For example:
       - with gfortran, the same common block in the shared libraries and the
         object files gets separate memory locations;
       - with ifort, the same common block in different shared libraries gets
         the same memory location but still gets a different location in the
         object file.
      
      The -Wl,-commons,use_dylibs option asks the linker to check dylibs for
      definitions and use them to replace tentative definitions (commons) from
      object files. This resolves the common symbol mismatch
      between the object file and the dylibs (i.e., by setting the address of
      a common symbol to its location in the first dylib that is linked
      with the object file and contains this symbol). It needs to be added
      only in the linking stage for the final executable file.
      
      The -flat-namespace option allows the linker to unify the same common
      symbols in different dylibs. It needs to be added in the linking stage for
      both the shared library and the final executable file.
      (See man ld for their definitions.)
      
      Although gfortran works fine with only -flat-namespace, and ifort
      works with only -Wl,-commons,use_dylibs, we should add both options
      here as a generic solution to make sure everything is safe.
      Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>