  1. 16 Apr, 2015 2 commits
  2. 15 Apr, 2015 3 commits
    • Increase bcast_full time limit. · 04060d1d
      Pavan Balaji authored
      We increased the number of cases the bcast test was running in
      [e01a20b6].  This is causing it to time out on some platforms, where
      the test now seems to take close to 3 minutes.  The increased time
      limit should be sufficient on those platforms.
      No reviewer.
    • OFI Netmod: Add CVAR enhancements for OFI provider selection · 1ed2b434
      Charles J Archer authored
       * Add MPIR_CVAR_OFI_USE_PROVIDER, which takes a string naming the
         desired provider
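A minimal sketch of how a string-valued setting like this could be consumed, assuming the CVAR is supplied through an environment variable of the same name. The helper names `choose_provider` and `provider_from_env`, and the idea of a fallback provider, are hypothetical and are not taken from the commit.

```c
#include <stdlib.h>

/* Hypothetical helper: prefer the requested provider name when one is
 * given and non-empty, otherwise use the fallback. */
static const char *choose_provider(const char *requested, const char *fallback)
{
    return (requested != NULL && requested[0] != '\0') ? requested : fallback;
}

/* Hypothetical helper: read the request from the environment.
 * Assumption: the CVAR is exposed as an environment variable with the
 * same name as the CVAR itself. */
static const char *provider_from_env(const char *fallback)
{
    return choose_provider(getenv("MPIR_CVAR_OFI_USE_PROVIDER"), fallback);
}
```

Splitting the decision from the environment lookup keeps the selection rule testable without touching the process environment.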
    • PAMID: MPI_Allreduce/MPI_Reduce coredump w/ DOUBLE_INT datatype · e87c158f
      Sameh Sharkawi authored
      This commit includes multiple fixes:
       - Fixes for MPI_IN_PLACE checking. cudaGetPointerAttributes returns
         true on MPI_IN_PLACE, which causes issues. Now we check for
         MPI_IN_PLACE before passing the pointer to CUDA.
       - Enabling PAMID geometries (in order to get to PAMID collectives) when
         MP_CUDA_AWARE=yes. This allows CUDA buffers to be intercepted.
       - Disabling FCA when MP_CUDA_AWARE=yes, even if the user enables FCA.
       - Copying the user recv buffer into a temporary host recv buffer before
         the collective starts, especially in MPI_IN_PLACE cases.
      (ibm) D203255
      Signed-off-by: Tsai-Yang (Alan) Jea <tjea@us.ibm.com>
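The first fix in the commit above (test for the in-place sentinel before asking the device runtime about a pointer) can be sketched as follows. This is an illustration only: `SKETCH_IN_PLACE`, `device_query_fn`, `misreporting_query`, and `buffer_is_device` are hypothetical names, and the query is a stub that models the misreporting behaviour described in the commit message rather than a real CUDA call.

```c
#include <stdbool.h>

/* Hypothetical stand-in for MPI_IN_PLACE: a sentinel address that never
 * refers to a real user buffer. */
static char sketch_in_place_token;
#define SKETCH_IN_PLACE ((const void *)&sketch_in_place_token)

/* Hypothetical device-pointer query. A real build would consult the CUDA
 * runtime here. */
typedef bool (*device_query_fn)(const void *ptr);

/* Stub modelling the problem in the commit message: the query answers
 * "device memory" for anything it is handed, including the sentinel. */
static bool misreporting_query(const void *ptr)
{
    (void)ptr;
    return true;
}

/* Check for the sentinel first, so it never reaches the query. */
static bool buffer_is_device(const void *buf, device_query_fn query)
{
    if (buf == SKETCH_IN_PLACE)
        return false;   /* a marker, not an address */
    return query(buf);
}
```

With the guard in place, the sentinel is classified as a host-side marker even when the underlying query would have misreported it.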
  3. 14 Apr, 2015 2 commits
    • Fixed the Fortran common symbol issue on Mac. · eb0e7712
      Min Si authored
      The linker on Darwin does not allow common symbols, so libtool adds
      the -fno-common option by default for shared libraries. However,
      common symbols defined in different shared libraries and object files
      still cannot be treated as the same symbol.
      For example:
      with gfortran, the same common block gets a separate memory location
      in each shared library and object file;
      with ifort, the same common block in different shared libraries gets
      the same memory location, but still gets a different location in the
      object file.
      The -Wl,-commons,use_dylibs option asks the linker to check dylibs for
      definitions and use them to replace tentative definitions (commons) from
      object files. This resolves the common symbol mismatch
      between the object file and the dylibs (i.e., by setting the address of
      a common symbol to its location in the first dylib that is linked
      with the object file and contains this symbol). It needs to be added
      only in the linking stage for the final executable file.
      The -flat-namespace option allows the linker to unify the same common
      symbols in different dylibs. It needs to be added in the linking stage
      for both the shared library and the final executable file.
      (See man ld for their definitions.)
      Although gfortran works fine with only -flat-namespace, and ifort
      works with only -Wl,-commons,use_dylibs, we should add both options
      here as a generic solution to make sure everything is safe.
      Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
  4. 11 Apr, 2015 1 commit
  5. 10 Apr, 2015 7 commits
    • portals4: tuning · daf29e33
      Kenneth Raffenetti authored
      Changes the value of various static limits in the Portals4 netmod, based
      on experimentation results and suggestions from collaborators.
      1. Bump most ni_limits from 32K to 64K. These limits relate closely to
         queue depth. We can reasonably expect to support a queue depth
         of 64K.
      2. Limit issued origin events to 500. This translates to sending ~250
         operations to Portals at a time, which over IB is roughly the
         saturation point. TODO: turn this into a CVAR.
      3. Limit per-target issued operations to 50. This will give the target a
         better chance to process events without being overwhelmed by a single
         process. TODO: turn this into a CVAR as well.
      4. Allocate more buffer space for incoming control messages. Observed
         results, especially with larger messages, showed that more buffer space
         cuts down on flow-control events.
      Signed-off-by: Antonio J. Pena <apenya@mcs.anl.gov>
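Limits 2 and 3 in the commit above amount to a two-level throttle: a global cap on outstanding origin events and a cap on operations issued to any one target. A minimal sketch of that bookkeeping follows. The values 500 and 50 come from the commit message; `EVENTS_PER_OP` is an assumption inferred from "500 events, ~250 operations", and `NUM_TARGETS`, `throttle_state`, `try_issue`, and `complete_op` are hypothetical names, not the netmod's actual code.

```c
#include <stdbool.h>

#define MAX_ORIGIN_EVENTS   500  /* from the commit message */
#define MAX_PER_TARGET_OPS   50  /* from the commit message */
#define EVENTS_PER_OP         2  /* assumption: 500 events is roughly 250 ops */
#define NUM_TARGETS           8  /* hypothetical job size for this sketch */

typedef struct {
    int outstanding_events;           /* origin events not yet processed */
    int per_target_ops[NUM_TARGETS];  /* operations in flight per target */
} throttle_state;

/* Issue one operation to `target` if both limits allow it. */
static bool try_issue(throttle_state *s, int target)
{
    if (target < 0 || target >= NUM_TARGETS)
        return false;
    if (s->outstanding_events + EVENTS_PER_OP > MAX_ORIGIN_EVENTS)
        return false;   /* global cap reached */
    if (s->per_target_ops[target] >= MAX_PER_TARGET_OPS)
        return false;   /* this target already has enough in flight */
    s->outstanding_events += EVENTS_PER_OP;
    s->per_target_ops[target]++;
    return true;
}

/* Record completion of one operation previously issued to `target`. */
static void complete_op(throttle_state *s, int target)
{
    if (target < 0 || target >= NUM_TARGETS || s->per_target_ops[target] == 0)
        return;
    s->outstanding_events -= EVENTS_PER_OP;
    s->per_target_ops[target]--;
}
```

Under these assumptions a single target saturates at 50 operations, and the origin as a whole saturates at 250 operations spread over five targets, after which further issues wait for completions.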
    • portals4: revert [722d85a4] and [d459c025] · 2f97f429
      Kenneth Raffenetti authored
      The two commits being reverted introduced a "safe" PtlMEAppend function
      that would call MPID_nem_ptl_poll to process some events in case there
      was no space to append the match list entry. However, the poll function
      is not reentrant, which could lead to ordering problems.
      The increased list entry limit from [c6c0d6f6] should prevent
      PTL_NO_SPACE errors from happening, except in the extreme case. If we
      still find we are hitting this error, a proper fix can be done in the
      Rportals layer.
      Signed-off-by: Antonio J. Pena <apenya@mcs.anl.gov>
    • Update .gitignore. · 5addea2c
      Pavan Balaji authored and Kenneth Raffenetti committed
      Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
    • Simplify the bcast test. · e01a20b6
      Pavan Balaji authored and Kenneth Raffenetti committed
      We are currently checking too many combinations, which causes the
      test to take too long on some platforms.  This patch simplifies the
      test by building two versions of it.  In the first version, we run
      only on COMM_WORLD but go through all datatypes.  In the second
      version, we run on all communicators, but go through only a small
      subset of datatypes.
      Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
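The saving from the split described in the commit above is simple arithmetic: the full cross product of communicators and datatypes is replaced by one row plus a few columns of it. A small sketch of the count follows; `test_space`, `combos`, and `split_combos` are hypothetical names, and the sizes in the usage note are made-up illustrations, not figures from the test suite.

```c
#include <stddef.h>

/* Hypothetical description of a test's search space. */
typedef struct {
    size_t n_comms;  /* communicators exercised */
    size_t n_types;  /* datatypes exercised */
} test_space;

/* Combinations checked when every communicator meets every datatype. */
static size_t combos(test_space t)
{
    return t.n_comms * t.n_types;
}

/* Combinations checked by the two-version scheme: one communicator with
 * all datatypes, plus all communicators with a small datatype subset. */
static size_t split_combos(test_space full, size_t subset_types)
{
    test_space world_all_types = { 1, full.n_types };
    test_space all_comms_few_types = { full.n_comms, subset_types };
    return combos(world_all_types) + combos(all_comms_few_types);
}
```

For an illustrative space of 10 communicators and 40 datatypes with a 4-type subset, the full product is 400 combinations while the two versions together cover 80.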
    • Cosmetic changes to the bcast2 test. · be82b6a7
      Pavan Balaji authored and Kenneth Raffenetti committed
      1. Renamed bcast2 to bcast.
      2. Whitespace cleanup in bcast.c.
      Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
    • Get rid of bcast3.c · e7eab9df
      Pavan Balaji authored and Kenneth Raffenetti committed
      This test is exactly the same as bcast2.  Originally these two tests
      were different, but over time they have become essentially the same.
      There's no point testing the same thing twice.
      Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
  6. 09 Apr, 2015 1 commit
  7. 08 Apr, 2015 2 commits
  8. 07 Apr, 2015 14 commits
  9. 06 Apr, 2015 1 commit
  10. 03 Apr, 2015 7 commits