1. 26 Feb, 2015 1 commit
    • Fix async progress problem in NBC I/O. · 6523ad97
      Sangmin Seo authored
      
      
      When the async progress thread blocked the progress engine and yielded
      control, a deadlock happened if another thread started waiting inside a
      wait routine of the NBC I/O implementation, e.g.,
      ADIOI_GEN_iwc_wait_fn.  The waiting thread continuously called MPI_Test
      to make progress, but the progress engine did not make progress because
      it was blocked by the async progress thread.  The async progress thread
      tried to acquire the lock, but the waiting thread did not release the
      lock because it had not finished the wait routine.  This patch fixes the
      deadlock by forcing the waiting thread to yield if the progress engine
      has been blocked by another thread.
      
      Fixes #2202
      Signed-off-by: Rob Latham <robl@mcs.anl.gov>
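The fix described in this commit can be pictured with a small pthreads sketch. This is not the MPICH code: `big_lock`, `engine_blocked`, `request_done`, `async_progress_thread`, `wait_with_yield`, and `run_demo` are hypothetical names, and the simulated async thread simply marks the request complete instead of driving a real progress engine. The sketch only shows the idea that the waiter drops the lock and yields while the engine is blocked, so the other thread can acquire the lock.

```c
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the global lock and progress-engine state. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static bool engine_blocked = true;  /* engine is held by the async thread */
static bool request_done = false;   /* completion flag of the pending request */

/* Simulated async progress thread: it needs big_lock before it can
 * unblock the engine and let the request complete. */
static void *async_progress_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&big_lock);
    engine_blocked = false;
    request_done = true;
    pthread_mutex_unlock(&big_lock);
    return NULL;
}

/* Wait loop with the fix: if the engine is blocked by another thread,
 * release the lock and yield instead of polling while holding it.
 * Returns the number of times the waiter yielded. */
static int wait_with_yield(void)
{
    int yields = 0;
    pthread_mutex_lock(&big_lock);
    while (!request_done) {
        if (engine_blocked) {
            pthread_mutex_unlock(&big_lock);
            sched_yield();
            pthread_mutex_lock(&big_lock);
            yields++;
        }
        /* A real wait routine would poll the progress engine here. */
    }
    pthread_mutex_unlock(&big_lock);
    return yields;
}

/* Runs one waiter against one async thread; returns 1 once the request
 * has completed, 0 if the thread could not be created. */
int run_demo(void)
{
    pthread_t tid;
    int done;
    if (pthread_create(&tid, NULL, async_progress_thread, NULL) != 0)
        return 0;
    (void)wait_with_yield();
    pthread_join(tid, NULL);
    pthread_mutex_lock(&big_lock);
    done = request_done ? 1 : 0;
    pthread_mutex_unlock(&big_lock);
    return done;
}
```

Without the unlock/yield/lock step, the waiter would hold `big_lock` for the whole loop and the simulated async thread could never run, which is the deadlock the commit message describes.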
  2. 14 Jan, 2015 1 commit
  3. 19 Dec, 2014 1 commit
    • barrier in close whenever shared files supported · ef1cf141
      Paul Coffman authored and Rob Latham committed
      
      
      Currently MPI_File_close performs a barrier, right before the code that
      closes the shared file pointer and potentially unlinks the shared file
      itself, whenever the ADIO_SHARED_FP feature is enabled AND the
      ADIO_UNLINK_AFTER_CLOSE feature is disabled.  PE testing on GPFS
      revealed a situation, using the non-collective
      MPI_File_read_shared/MPI_File_write_shared, in which all tasks need to
      wait for all other tasks to complete processing before the shared file
      pointer is unlinked; otherwise the open of the shared file pointer can
      fail.  The simplest example is 2 tasks that each do this:
      MPI_File_open
      MPI_File_set_view
      MPI_File_read_shared
      MPI_File_close
      
      Both tasks call MPI_File_read_shared at the same time.  That call first
      does ADIO_Get_shared_fp, which opens the shared file pointer file in
      create mode.  Only 1 task can actually create the file, so there is a
      race to see which gets it done first.  If task 0 creates it, task 0
      wins and goes on to use it: it reads the file and then calls
      MPI_File_close, which first unlinks the shared file pointer and then
      closes the output file.  Meanwhile, task 1 lost the race to create the
      file and is in error; the error handling in gpfs takes effect and
      task 1 now just tries to open the file that task 0 created.  The
      problem is that this error handling took longer than task 0 took to
      read and close the output file.  When task 0 does the close, it is the
      only process with a link, since task 1 is still in the create-file
      error handling code, so gpfs goes ahead and deletes the shared file
      pointer.  When the error handling code for task 1 does complete and it
      tries the open, the file is no longer there, so the open fails, as does
      the subsequent read of the shared file pointer.
      Currently GPFS has the ADIO_UNLINK_AFTER_CLOSE feature enabled, so the
      fix is to remove the additional condition that ADIO_UNLINK_AFTER_CLOSE
      be disabled for the barrier in the close to be done.  Presumably this
      could be an issue for any parallel file system, so this change is being
      made in the common code.
      
      See ticket #2214
      Signed-off-by: Paul Coffman <pkcoff@us.ibm.com>
      Signed-off-by: Rob Latham <robl@mcs.anl.gov>
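The change in the barrier condition described in this commit can be sketched as a pair of predicates. This is an illustration only: the feature bit values and the function names `barrier_before_close_old` and `barrier_before_close_new` are hypothetical and do not come from the ROMIO sources.

```c
#include <stdbool.h>

/* Hypothetical feature bits; the real ADIO feature values are not shown
 * in the commit message. */
enum {
    FEAT_SHARED_FP          = 0x1,
    FEAT_UNLINK_AFTER_CLOSE = 0x2
};

/* Condition before the change: barrier only when shared file pointers
 * are supported AND unlink-after-close is disabled. */
bool barrier_before_close_old(unsigned features)
{
    return (features & FEAT_SHARED_FP) != 0 &&
           (features & FEAT_UNLINK_AFTER_CLOSE) == 0;
}

/* Condition after the change: barrier whenever shared file pointers are
 * supported, regardless of unlink-after-close. */
bool barrier_before_close_new(unsigned features)
{
    return (features & FEAT_SHARED_FP) != 0;
}
```

For a file system with both features enabled, as the message says GPFS has, the old predicate skips the barrier and the new one performs it.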
  4. 14 Nov, 2014 1 commit
  5. 13 Nov, 2014 1 commit
  6. 24 Oct, 2014 1 commit
  7. 22 Aug, 2014 1 commit
  8. 07 Jul, 2014 1 commit
  9. 23 May, 2014 1 commit
  10. 03 Apr, 2014 1 commit
  11. 15 Mar, 2014 1 commit
  12. 10 Mar, 2014 2 commits
  13. 18 Dec, 2013 1 commit
  14. 10 Dec, 2013 1 commit
  15. 31 Oct, 2013 2 commits
  16. 29 Oct, 2013 1 commit
  17. 04 Oct, 2013 1 commit
  18. 21 Jun, 2013 1 commit
    • Trap unsupported read- and write-conversions. · f8bedf1e
      Joe Ratterman authored and Rob Latham committed
      This change was originally part of an extensive 'PE Code Merge' commit
      that was a squashed commit of approximately 50 defect and feature code
      changes.
      
      (ibm) 17b9e8973047abdb461f0303bc6c15509aef160b
  19. 14 May, 2013 1 commit
    • avoid hang if subset of procs have invalid INFO · 470667cb
      Rob Latham authored
      In response to IBM's integration ticket 1822 (but reworked), turn
      MPIO_CHECK_INFO into a collective macro; exchange result of info
      inspection with all procs.  Now a bogus info on one proc won't cause a
      process hang.
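The idea of exchanging the result of the info inspection can be illustrated without MPI. The sketch below assumes the exchange behaves like a max-reduction over per-process error flags (the commit message does not say which collective is used), and `agreed_error` is a hypothetical helper name.

```c
/* Simulates exchanging per-process check results: given each process's
 * local error flag (0 = info looked fine, 1 = bogus info), return the
 * single verdict every process would see after a max-reduction. */
int agreed_error(const int *local_err, int nprocs)
{
    int agreed = 0;
    for (int i = 0; i < nprocs; i++) {
        if (local_err[i] > agreed)
            agreed = local_err[i];
    }
    return agreed;
}
```

Because every process ends up with the same verdict, either all of them take the error path or none do, so no process is left waiting in a later collective call.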
  20. 12 Apr, 2013 2 commits
  21. 07 Apr, 2013 1 commit
  22. 05 Feb, 2013 1 commit
    • tt#1754: fix warnings in ROMIO external32 code · d8eec549
      Dave Goodell authored
      This doesn't fix any of the serious bugs or inefficiencies present in
      the current external32 implementation.  But it at least fixes some very
      valid warnings related to const-ness and passing incorrect pointer
      types.
      
      References ticket #1754
      
      Reviewed-by: robl
  23. 20 Dec, 2012 3 commits
    • [svn-r10801] ROMIO: error checking for MPI_Comm and MPI_Info objects · 47754bc5
      David Goodell authored
      1). The comm object is error-checked by calling
          MPI_Comm_test_inter(comm, &flag).  If the return value of
          MPI_Comm_test_inter is not MPI_SUCCESS, then the comm is either an
          invalid MPI_Comm handle or an intercommunicator handle.
      2). A new macro MPIO_CHECK_INFO is added to adioi_error.h.  It calls
          MPI_Info_dup.  Unless there is no memory left, MPI_Info_dup returns
          MPI_SUCCESS as long as the info object is valid; otherwise it
          returns an error code.  So checking the return value of
          MPI_Info_dup serves the purpose of checking MPI_Info handles.
      
      Based on patch 0006 from the second round of IBM's error checking
      patches.  Replaces 0009 from the first round and augments r10637.
    • [svn-r10800] ROMIO: check for consistent prealloc sizes · 208d90e6
      David Goodell authored
      Use MPI_Allreduce to get the maximum and minimum of all sizes; when
      the two values are identical, all processes have the same size.  The
      problem with checking the sizes with MPI_Bcast is that the root will
      pass the check while the others may not.
      
      Based on patch 0004 from the second round of IBM's error checking
      patches.
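The difference between the two checks in this commit can be simulated without MPI by treating an array as the per-rank sizes. This is a sketch under assumptions: `sizes_consistent` and `bcast_check_passes` are hypothetical names, rank 0 is assumed to be the broadcast root, and the loops stand in for what the reductions or the broadcast would deliver to each rank.

```c
#include <stdbool.h>
#include <stddef.h>

/* Allreduce-style check: after a min-reduction and a max-reduction every
 * rank holds the same two values, so every rank reaches the same verdict. */
bool sizes_consistent(const long long *sizes, size_t nprocs)
{
    long long lo = sizes[0], hi = sizes[0];
    for (size_t i = 1; i < nprocs; i++) {
        if (sizes[i] < lo) lo = sizes[i];
        if (sizes[i] > hi) hi = sizes[i];
    }
    return lo == hi;
}

/* Bcast-style check as seen by a single rank: it compares its own size
 * with the root's (rank 0 here).  The root always passes. */
bool bcast_check_passes(const long long *sizes, size_t rank)
{
    return sizes[rank] == sizes[0];
}
```

With sizes of 100, 100, and 200 on three ranks, the min/max check fails on every rank, while the broadcast-style check passes on ranks 0 and 1 and fails only on rank 2, so the ranks disagree about whether an error occurred.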
    • [svn-r10799] ROMIO: check whether datatypes are committed · 011f107a
      David Goodell authored
      This only works when ROMIO is built with MPICH.
      
      Based on patch 0001 from the second round of IBM's error checking
      patches.  Replaces 0008 from the first round.
  24. 21 Nov, 2012 1 commit
  25. 20 Nov, 2012 2 commits
  26. 06 Nov, 2012 1 commit
  27. 05 Nov, 2012 2 commits
  28. 24 Oct, 2012 1 commit
  29. 23 Oct, 2012 2 commits
  30. 11 Oct, 2012 1 commit
  31. 10 Oct, 2012 1 commit
  32. 25 Sep, 2012 1 commit