1. 14 Jan, 2015 1 commit
  2. 08 Jan, 2015 1 commit
  3. 19 Dec, 2014 1 commit
    • barrier in close whenever shared files supported · ef1cf141
      Paul Coffman authored and Rob Latham committed
      
      
      Currently MPI_File_close performs a barrier whenever the
      ADIO_SHARED_FP feature is enabled AND the ADIO_UNLINK_AFTER_CLOSE
      feature is disabled, right before the code that closes the shared file
      pointer and potentially unlinks the shared file itself.  PE testing on
      GPFS revealed a situation using the non-collective
      MPI_File_read_shared/MPI_File_write_shared
      where, with this implementation, all tasks need to wait for all
      other tasks to complete processing before the shared file pointer is
      unlinked, or the open of the shared file pointer can fail.  The
      simplest example is 2 tasks that each do this:
      MPI_File_open
      MPI_File_set_view
      MPI_File_read_shared
      MPI_File_close
      
      Both tasks call MPI_File_read_shared at the same time, which first
      calls ADIO_Get_shared_fp, which opens the shared file pointer file in
      create mode.  Only 1 task can actually create the file, so the tasks
      race to create it.  If task 0 creates it, task 0 wins: it uses the
      shared file pointer, reads the file, and then calls MPI_File_close,
      which first unlinks the shared file pointer and then closes the output
      file.  Meanwhile, task 1 lost the race to create the file and gets an
      error; the error handling in GPFS takes effect and task 1 now just
      tries to open the file that task 0 created.  The problem is that this
      error handling took longer than task 0 took to read and close the
      output file.  At the time task 0 does the close, it is the only process
      with a link, since task 1 is still in the create-file error handling
      code, so GPFS goes ahead and deletes the shared file pointer.  When the
      error handling code for task 1 completes and it tries the open, the
      file is no longer there, so the open fails, as does the subsequent read
      of the shared file pointer.
      GPFS currently has the ADIO_UNLINK_AFTER_CLOSE feature enabled, so the
      fix is to remove the additional condition that ADIO_UNLINK_AFTER_CLOSE
      be disabled for the barrier in the close to be done.  Presumably this
      could be an issue for any parallel file system, so the change is made
      in the common code.
      
      See ticket #2214
      Signed-off-by: Paul Coffman <pkcoff@us.ibm.com>
      Signed-off-by: Rob Latham <robl@mcs.anl.gov>
      ef1cf141
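The change to the barrier condition described in this commit can be sketched as follows. This is a minimal illustration, not the ROMIO source: the `features` struct and both function names are hypothetical stand-ins for the ADIO_SHARED_FP and ADIO_UNLINK_AFTER_CLOSE feature checks.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a file system's ADIO feature flags. */
struct features {
    bool shared_fp;           /* models ADIO_SHARED_FP */
    bool unlink_after_close;  /* models ADIO_UNLINK_AFTER_CLOSE */
};

/* Old condition: barrier only when unlink-after-close is NOT supported. */
bool barrier_in_close_old(struct features f)
{
    return f.shared_fp && !f.unlink_after_close;
}

/* New condition: barrier whenever shared file pointers are supported. */
bool barrier_in_close_new(struct features f)
{
    return f.shared_fp;
}
```

With both flags set, as the message says is the case for GPFS, the old condition skips the barrier and the new one performs it, so no task unlinks the shared file pointer while another is still trying to open it.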
  4. 05 Dec, 2014 1 commit
  5. 24 Nov, 2014 2 commits
    • romio gpfs: select correct read buffer · 230c2df3
      Paul Coffman authored and Rob Latham committed
      
      
      ROMIO GPFSMPIO_P2PCONTIG threaded read needs to toggle first read buffer
      
      When using both the GPFSMPIO_P2PCONTIG and GPFSMPIO_PTHREADIO
      optimizations, there was a correctness bug when reading: in the
      first round the read buffer did not toggle to the two-phase buffer for
      the pthread reader, so data was disseminated from the wrong
      buffer.  The fix is to do the toggle after the first read.
      Signed-off-by: Paul Coffman <pkcoff@us.ibm.com>
      Signed-off-by: Rob Latham <robl@mcs.anl.gov>
      230c2df3
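The effect of a missing first toggle can be illustrated with a toy double-buffering model. This is only a sketch under assumptions: the two-slot buffer, the round numbering, and the function `run_rounds` are hypothetical and do not reproduce the ROMIO pthread reader.

```c
#include <stdbool.h>

#define EMPTY (-1)

/*
 * Simulate nrounds of overlapped read + dissemination with two buffers.
 * Each "read" stores the round number in the current read buffer; the data
 * of round r-1 is disseminated from the other buffer while round r is read.
 * out[i] receives the value disseminated for round i.
 * toggle_after_first == false models the reported bug (hypothetical model).
 */
void run_rounds(int nrounds, bool toggle_after_first, int *out)
{
    int buf[2] = { EMPTY, EMPTY };
    int cur = 0;                       /* index of the buffer being read into */

    if (nrounds <= 0)
        return;
    buf[cur] = 0;                      /* first read */
    for (int r = 1; r < nrounds; r++) {
        if (r > 1 || toggle_after_first)
            cur ^= 1;                  /* switch the reader to the other buffer */
        buf[cur] = r;                  /* read for round r */
        out[r - 1] = buf[cur ^ 1];     /* disseminate the previous round */
    }
    out[nrounds - 1] = buf[cur];       /* disseminate the final round */
}
```

In this model, skipping the toggle after the first read lets the second read overwrite the first round's data, and the first dissemination is served from the untouched buffer.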
    • Make ROMIO htmldocs update link file · e645371f
      William Gropp authored and Rob Latham committed
      
      
      Update the use of DOCTEXT to match the rest of MPICH, including adding
      -nolocation (drop the location of the source file from the documentation)
      and ensure that the mpi.cit file contains the I/O routines as well as
      the others (this file can be used to add links to the man pages in
      other documents).
      Signed-off-by: Rob Latham <robl@mcs.anl.gov>
      e645371f
  6. 14 Nov, 2014 1 commit
  7. 13 Nov, 2014 2 commits
  8. 28 Oct, 2014 2 commits
    • Assign large blocks first in ADIOI_GPFS_Calc_file_domains · c16466e3
      Paul Coffman authored and Rob Latham committed
      
      
      For files smaller than a GPFS block there seems to be
      an issue when successive MPI_File_write_at_all calls are made with
      increasing offsets.  In the simple case of 2 aggs, the 2nd agg/fd is used,
      but the initial offset into the 2nd agg is distorted on the 2nd call
      to MPI_File_write_at_all by the negative size of the 1st agg/fd,
      because the offset into the 2nd agg/fd is influenced by the size of the
      first.  The simple solution is to reverse the default large block
      assignment so that when only 1 agg/fd is used it is the first.  By chance,
      in the 2 agg situation this is what the GPFSMPIO_BALANCECONTIG
      optimization does, and it does not have this problem.
      Signed-off-by: Rob Latham <robl@mcs.anl.gov>
      c16466e3
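A "large blocks first" split can be sketched as below. The function and parameter names are illustrative and are not taken from ADIOI_GPFS_Calc_file_domains; the sketch assumes `naggs > 0` and `n_blocks >= 0`, and it only counts blocks per aggregator without computing file offsets.

```c
/*
 * Split n_blocks file system blocks across naggs aggregators.
 * The first (n_blocks % naggs) aggregators get one extra block, so the
 * "large" file domains come first. Names are illustrative, not ROMIO's.
 */
void assign_blocks_large_first(long n_blocks, int naggs, long *counts)
{
    long small = n_blocks / naggs;     /* blocks every aggregator gets */
    long n_large = n_blocks % naggs;   /* aggregators that get one more */

    for (int i = 0; i < naggs; i++)
        counts[i] = small + (i < n_large ? 1 : 0);
}
```

For a file that fits in a single block with 2 aggregators, this ordering gives the one block to the first aggregator and leaves the second empty, which is the case the commit message describes.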
    • MP_IOTASKLIST error checking · 976272a7
      Paul Coffman authored and Rob Latham committed
      
      
      PE users may manually specify MP_IOTASKLIST for explicit aggregator
      selection.  This adds code to verify that all user-specified
      aggregators are valid.
      
      Do our best to maintain the old PE behavior: use as much of the
      correctly specified MP_IOTASKLIST as possible, issue what PE labeled
      error messages (but were really warnings) about the incorrect
      portions, and functionally just ignore those portions.  If none of the
      list is usable, fall back on the default.
      Signed-off-by: Rob Latham <robl@mcs.anl.gov>
      976272a7
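The "use what is valid, warn about the rest, fall back if nothing is usable" behavior can be sketched as follows. The list format here is an assumption: a plain colon-separated list of ranks. The real MP_IOTASKLIST syntax may differ, and `parse_task_list` is a hypothetical helper, not a PE or ROMIO function.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse a colon-separated rank list (assumed format) into aggs[].
 * Out-of-range, malformed, or duplicate entries get a warning and are
 * skipped. Returns the number of usable ranks; 0 means the caller should
 * fall back to the default aggregator selection.
 */
int parse_task_list(const char *list, int nprocs, int *aggs, int max_aggs)
{
    int n = 0;
    const char *p = list;

    while (p != NULL && *p != '\0') {
        char *end;
        long rank = strtol(p, &end, 10);
        bool valid = (end != p) && (*end == ':' || *end == '\0')
                     && rank >= 0 && rank < nprocs;

        for (int i = 0; valid && i < n; i++)
            if (aggs[i] == (int) rank)
                valid = false;                 /* duplicate rank */

        if (valid && n < max_aggs)
            aggs[n++] = (int) rank;
        else
            fprintf(stderr, "warning: ignoring task list entry near \"%s\"\n", p);

        /* Move to the character after the next ':' (or stop at the end). */
        while (*end != ':' && *end != '\0')
            end++;
        p = (*end == ':') ? end + 1 : end;
    }
    return n;
}
```

With 4 processes, the list `0:2:9:x:2` yields aggregators 0 and 2 and three warnings, while `7:8` yields nothing usable and signals the fallback.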
  9. 24 Oct, 2014 1 commit
  10. 20 Oct, 2014 2 commits
    • fix failure to update status in p2pcontig case · c2ce2188
      Paul Coffman authored and Rob Latham committed
      
      
      ADIOI_GPFS_WriteStridedColl and ADIOI_GPFS_ReadStridedColl need to call
      MPIR_Status_set_bytes when GPFSMPIO_P2PCONTIG=1.
      
      When the GPFSMPIO_P2PCONTIG optimization is set, the code path for
      ADIOI_GPFS_WriteStridedColl and ADIOI_GPFS_ReadStridedColl returns
      before MPIR_Status_set_bytes is called.  Duplicate the call to
      MPIR_Status_set_bytes in the GPFSMPIO_P2PCONTIG code path.
      Signed-off-by: Rob Latham <robl@mcs.anl.gov>
      c2ce2188
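The shape of this bug, an early return that skips the status update, can be shown with a small model. Everything here is a stand-in: `io_status`, `status_set_bytes`, and `write_strided_coll` are hypothetical and only mimic the roles of the MPI status, MPIR_Status_set_bytes, and the collective routines named above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an MPI status that records bytes transferred. */
struct io_status {
    long bytes;
};

/* Stand-in for the status update; does nothing if no status was passed. */
static void status_set_bytes(struct io_status *st, long nbytes)
{
    if (st != NULL)
        st->bytes = nbytes;
}

/* Model of a collective write with an optimized early-return path. */
void write_strided_coll(bool p2pcontig, long nbytes, struct io_status *st)
{
    if (p2pcontig) {
        /* ... optimized aggregation would happen here ... */
        status_set_bytes(st, nbytes);   /* the duplicated call on this path */
        return;
    }
    /* ... default two-phase path ... */
    status_set_bytes(st, nbytes);
}
```

Without the call inside the `p2pcontig` branch, a caller inspecting the status after the optimized path would see whatever value it held before the call.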
    • romio: small formatting fix for compiler warnings · 410ba24a
      Rob Latham authored
      Update a debug-only print string to accommodate recent updates to the
      type of the length parameter.
      
      No reviewer
      410ba24a
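The commit does not say which type the length parameter changed to. As an illustration only, assuming it became a 64-bit integer, a format string that matches the argument type avoids the compiler warning; `format_len` is a hypothetical helper.

```c
#include <stdio.h>

/*
 * Format a 64-bit length for a debug message.
 * "%lld" matches the long long argument, so the compiler's format
 * checking is satisfied. Returns snprintf's result.
 */
int format_len(char *buf, size_t bufsize, long long len)
{
    return snprintf(buf, bufsize, "len=%lld", len);
}
```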
  11. 17 Oct, 2014 1 commit
  12. 01 Oct, 2014 2 commits
  13. 26 Sep, 2014 1 commit
  14. 18 Sep, 2014 1 commit
  15. 16 Sep, 2014 3 commits
  16. 03 Sep, 2014 3 commits
  17. 02 Sep, 2014 1 commit
  18. 22 Aug, 2014 3 commits
  19. 11 Aug, 2014 4 commits
  20. 07 Aug, 2014 2 commits
    • HINDEXED_BLOCK is not quite an INDEXED_BLOCK · 0e675b02
      Rob Latham authored
      
      
      Someone (Mohamad Chaarawi <chaarawi@hdfgroup.org>) finally used
      HINDEXED_BLOCK and discovered that ROMIO's HINDEXED_BLOCK
      implementation was... incomplete.  Or at least untested.
      
      - ADIOI_Count_contiguous_blocks simply aborted when fed a type it did
        not know about.  hindexed_block blocks are counted same as
        indexed_block blocks.
      
      - But, the stride between hindexed_block blocks is given by the explicit
        addresses.  indexed_block, on the other hand, computes a stride based
        on type.
      Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
      0e675b02
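The difference between the two displacement conventions can be sketched as follows. In the MPI standard, indexed_block displacements are given in multiples of the old type's extent, while hindexed_block displacements are byte addresses. The helper names below are illustrative and are not ROMIO functions; `ptrdiff_t` stands in for MPI_Aint.

```c
#include <stddef.h>

/* Byte offset of block i for an indexed_block-style type:
 * displacements count multiples of the old type's extent. */
ptrdiff_t indexed_block_offset(const int *displs, int i, ptrdiff_t old_extent)
{
    return (ptrdiff_t) displs[i] * old_extent;
}

/* Byte offset of block i for an hindexed_block-style type:
 * displacements are already byte addresses. */
ptrdiff_t hindexed_block_offset(const ptrdiff_t *byte_displs, int i)
{
    return byte_displs[i];
}
```

Both kinds of type contain the same number of blocks, which is why they can be counted the same way, but only the first needs the extent to locate each block.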
    • Reduce P2PContig communication with local compute · bfc09241
      Paul Coffman authored and Rob Latham committed
      
      
      Additional P2PContig optimizations that improve performance by
      trading some communication during aggregation for local computation;
      most helpful at scale.
      Signed-off-by: Rob Latham <robl@mcs.anl.gov>
      bfc09241
  21. 05 Aug, 2014 2 commits
  22. 28 Jul, 2014 1 commit
  23. 19 Jul, 2014 1 commit
  24. 18 Jul, 2014 1 commit