1. 14 Jan, 2015 2 commits
    • use PATH_MAX instead of magic number · a30a4721
      Rob Latham authored
      
      
      A user on the OpenMPI list wanted to create a file with a
      259-character name.  Shared file pointer name construction used the
      magic '256' value to size the full path to the hidden shared file
      pointer file.  PATH_MAX already exists for this purpose, so use it.
      
      While there, found a few spots checking/setting PATH_MAX, so do that in
      one place
      
      Closes #2212
      Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
    • make ADIOI_Shfp_fname report errors · ed39c901
      Rob Latham authored
      
      
      Right now there's only one error condition: file name too long.  This
      change checks return codes of ADIOI_Strncpy and informs caller.
      Otherwise, really long names result in buffer overruns.
      
      See #2212
      Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
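
      The two ROMIO commits above (size the buffer with PATH_MAX, report an
      error when the name does not fit) can be illustrated with a minimal
      sketch. Everything here is an assumption made for illustration: the
      helper name sketch_build_shfp_name, the "<dir>/.<base>.shfp" layout,
      and the use of snprintf as a stand-in for ADIOI_Strncpy. It is not
      the ROMIO implementation.

```c
#include <limits.h>
#include <stdio.h>

#ifndef PATH_MAX
#define PATH_MAX 4096   /* fallback for platforms that do not define it */
#endif

/* Hypothetical helper: build "<dir>/.<base>.shfp" into out.
 * Returns 0 on success and -1 when the name would not fit, so the
 * caller can report "file name too long" instead of overrunning out. */
int sketch_build_shfp_name(const char *dir, const char *base,
                           char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%s/.%s.shfp", dir, base);

    /* snprintf returns the length it wanted to write; a value that is
     * negative or >= outlen means the result was truncated or failed. */
    if (n < 0 || (size_t) n >= outlen)
        return -1;
    return 0;
}
```

      A caller would pass a buffer declared as char name[PATH_MAX] and
      propagate the -1 upward rather than continuing with a truncated
      name.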
  2. 13 Jan, 2015 1 commit
    • Remove ADI breakage introduced earlier · 6f646ca0
      Wesley Bland authored
      
      
      There was an accidental ADI breakage earlier when MPI level codes would
      query into the dev part of the MPID request object. This commit removes
      that breakage by adding a new macro into the mpiimpl.h file to portably
      check whether a request is anysource. For now, in pamid, this macro
      always evaluates to 0. This can easily be fixed by overwriting it in the
      pamid code, but since pamid doesn't support FT, it won't have any
      functional change either.
      Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
  3. 12 Jan, 2015 7 commits
    • Change MPIDI_CH3I_Comm_AS_enabled to be MPID level · 8cbbcae4
      Wesley Bland authored
      
      
      This macro was used inside CH3 to determine if the communicator could be
      used for anysource communication. With the rewrite of the anysource
      fault tolerance logic, it is now necessary to use it at the MPI level.
      Because it is a macro and not a function, the macro is defined in
      mpiimpl.h as (1) and then overwritten in the ch3 device. Future devices
      can also overwrite it if desired.
      Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
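
      The "default macro, overwritten by the device" pattern described in
      this commit might look roughly like the sketch below. The macro name
      SKETCH_COMM_AS_ENABLED, the struct, and its field are hypothetical
      stand-ins, not the names used in mpiimpl.h or the ch3 device.

```c
/* Stand-in for a communicator object; the field is an assumption. */
typedef struct sketch_comm {
    int anysource_enabled;
} sketch_comm;

/* MPI-level default: anysource receives are always considered enabled. */
#define SKETCH_COMM_AS_ENABLED(comm) (1)

/* A device that tracks failures replaces the default with a real check.
 * In a real tree this override would live in the device's own header. */
#undef SKETCH_COMM_AS_ENABLED
#define SKETCH_COMM_AS_ENABLED(comm) ((comm)->anysource_enabled)

/* MPI-level code only ever goes through the macro. */
int sketch_can_post_anysource(const sketch_comm *comm)
{
    return SKETCH_COMM_AS_ENABLED(comm);
}
```

      Because the check is a macro rather than a function, a device that
      does not override it pays nothing: the default folds to the
      constant 1.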
    • Handle anysource in blocking recv functions · 9f2db553
      Wesley Bland authored
      
      
      If a blocking recv function (MPI_Recv and MPI_Sendrecv) includes an
      MPI_ANY_SOURCE and there is a failure, handle it by cleaning up the
      request and returning MPIX_ERR_PROC_FAILED.
      Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
    • Allow MPIR_Request_complete to take a NULL request · f6cdb3c8
      Wesley Bland authored
      
      
      If the first argument is NULL, don't try to set it to MPI_REQUEST_NULL.
      For blocking functions that want to complete the MPID_Request object,
      this allows them to reuse the code.
      Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
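
      A minimal model of the NULL-tolerant completion path, assuming
      made-up types and names (sketch_request, SKETCH_REQUEST_NULL,
      sketch_request_complete); the real MPIR_Request_complete takes more
      arguments and does more work than this.

```c
#include <stddef.h>

#define SKETCH_REQUEST_NULL (-1)   /* stand-in for MPI_REQUEST_NULL */

/* Stand-in for the internal request object. */
typedef struct sketch_request {
    int released;   /* set once the completion path has run */
} sketch_request;

/* handle may be NULL when the caller is a blocking function that never
 * exposed a request handle to the user. */
void sketch_request_complete(int *handle, sketch_request *req)
{
    req->released = 1;

    /* Only reset the user-visible handle when there is one. */
    if (handle != NULL)
        *handle = SKETCH_REQUEST_NULL;
}
```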
    • Handle anysource in the wait* functions · 0418d495
      Wesley Bland authored
      
      
      If a wait operation involves an anysource request, we first need to
      check that anysource requests haven't been disabled. If they have
      been, convert the wait* function to a test* function to prevent
      deadlocking inside the progress engine.
      Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
    • Break out of progress for anysource failures · 50d85e51
      Wesley Bland authored
      
      
      If a failure is detected, even if no request is actually complete, the
      completion counter will be incremented now as a way to give control back
      to the MPI layer to let it decide whether or not to continue.
      
      This gives the request completion functions a chance to see if they're
      waiting on an MPI_ANY_SOURCE request and if so, to return an error
      indicating that the completion function has a
      MPIX_ERR_PROC_FAILED_PENDING failure that the user needs to acknowledge.
      
      All of these functions should go into the progress engine at least once
      as a way to ensure that even if they will be returning an error, they'll
      at least give MPI a way to make progress and potentially still complete
      the request objects even if the user never acknowledges the failure.
      
      A follow on commit will add the functionality to keep the progress
      engine from getting stuck if a failure is discovered before entering the
      completion function.
      Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
    • Strip out pending ANY_SOURCE request handling · 7a785c84
      Wesley Bland authored
      
      
      The existing way that we handle non-blocking requests involving wildcard
      receive operations is incorrect. We're cancelling request operations and
      trying to recreate them later. In the meantime, it's messing with
      matching and makes it possible (likely?) that some messages that arrive
      will never be matched. A new way of handling this is coming next.
      Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
    • Don't free a request if it still pending · a96ac72e
      Wesley Bland authored
      
      
      If we had a failure that caused a request to be pending, we were freeing
      the request before calling the error handler. That caused segfaults. Now
      we switch the ordering of the two to avoid that.
      
      This also moves the assignment of the status_ptr to be a little earlier
      to avoid another segfault.
      Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
  4. 08 Jan, 2015 1 commit
  5. 19 Dec, 2014 1 commit
    • barrier in close whenever shared files supported · ef1cf141
      Paul Coffman authored and Rob Latham committed
      
      
      Currently MPI_File_close performs a barrier, right before the code
      that closes the shared file pointer and potentially unlinks the
      shared file itself, only when the ADIO_SHARED_FP feature is enabled
      AND the ADIO_UNLINK_AFTER_CLOSE feature is disabled.  PE testing on
      GPFS revealed a situation using the non-collective
      MPI_File_read_shared/MPI_File_write_shared where all tasks need to
      wait for all other tasks to complete processing before the shared
      file pointer is unlinked; otherwise the open of the shared file
      pointer can fail.  The simplest example is 2 tasks that each do this:
      MPI_File_open
      MPI_File_set_view
      MPI_File_read_shared
      MPI_File_close
      
      Both tasks call MPI_File_read_shared at the same time, which first
      calls ADIO_Get_shared_fp, which opens the shared file pointer file
      in create mode.  Only 1 task can actually create the file, so there
      is a race to see who gets it done first.  If task 0 creates it,
      task 0 wins and goes on to use it, read the file, and call
      MPI_File_close, which first unlinks the shared file pointer and then
      closes the output file.  Meanwhile, task 1 lost the race to create
      the file and is in error; the error handling in gpfs goes into
      effect and task 1 now just tries to open the file that task 0
      created.  The problem is that this error handling took longer than
      task 0 took to read and close the output file.  At the time task 0
      does the close it is the only process with a link, since task 1 is
      still in the create-file error handling code, so gpfs goes ahead and
      deletes the shared file pointer.  When the error handling code for
      task 1 completes and it tries to do the open, the file is no longer
      there, so the open fails, as does the subsequent read of the shared
      file pointer.

      Currently GPFS has the ADIO_UNLINK_AFTER_CLOSE feature enabled, so
      the fix is to remove the additional condition that
      ADIO_UNLINK_AFTER_CLOSE be disabled for the barrier in the close to
      be done.  Presumably this could be an issue for any parallel file
      system, so this change is being done in the common code.
      
      See ticket #2214
      Signed-off-by: Paul Coffman <pkcoff@us.ibm.com>
      Signed-off-by: Rob Latham <robl@mcs.anl.gov>
  6. 08 Dec, 2014 1 commit
  7. 05 Dec, 2014 1 commit
  8. 03 Dec, 2014 2 commits
    • Fix typo in error code man page · 8672503d
      Wesley Bland authored
      No reviewer
    • Fix error class buf in MPI_Error_add_code · 422b06d2
      James Dinan authored
      
      
      During error code creation, the error class was erroneously modified
      by applying ERROR_DYN_MASK.  The dynamic bit is already set for
      user-defined error classes, so this bug had no effect in any existing
      MPICH test.  However, when a predefined error class was passed during
      error code creation, it would be incorrectly marked as dynamic,
      resulting in an invalid result when the error class of a returned
      error code was queried via MPI_Error_class.
      Signed-off-by: Wesley Bland <wbland@anl.gov>
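
      Why the bug was invisible for user-defined classes but corrupted
      predefined ones can be shown with a small sketch. The mask value,
      the predefined class number, and the function names below are
      assumptions chosen for illustration; they are not taken from the
      MPICH sources.

```c
#define SKETCH_DYN_MASK        0x40000000  /* hypothetical "dynamic" bit */
#define SKETCH_PREDEFINED_CLASS 15         /* stand-in predefined class */

/* Buggy behavior: the class recorded for a new error code always gets
 * the dynamic bit, even when the caller passed a predefined class. */
int sketch_class_for_new_code_buggy(int errclass)
{
    return errclass | SKETCH_DYN_MASK;
}

/* Fixed behavior: record the class exactly as the caller passed it. */
int sketch_class_for_new_code_fixed(int errclass)
{
    return errclass;
}
```

      For a user-defined class the bit is already set, so both versions
      return the same value; for a predefined class only the fixed version
      returns the class that was passed in.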
  9. 28 Nov, 2014 3 commits
  10. 26 Nov, 2014 2 commits
  11. 24 Nov, 2014 2 commits
    • romio gpfs: select correct read buffer · 230c2df3
      Paul Coffman authored and Rob Latham committed
      
      
      ROMIO GPFSMPIO_P2PCONTIG threaded read needs to toggle first read buffer
      
      When using both the GPFSMPIO_P2PCONTIG and GPFSMPIO_PTHREADIO
      optimizations there was a correctness bug when reading: in the first
      round the read buffer did not toggle to the two-phase buffer for the
      pthread reader, so data was disseminated from the wrong buffer.  The
      fix is to do the toggle after the first read.
      Signed-off-by: Paul Coffman <pkcoff@us.ibm.com>
      Signed-off-by: Rob Latham <robl@mcs.anl.gov>
    • Make ROMIO htmldocs update link file · e645371f
      William Gropp authored and Rob Latham committed
      
      
      Update the use of DOCTEXT to match the rest of MPICH, including adding
      -nolocation (drop the location of the source file from the documentation)
      and ensure that the mpi.cit file contains the I/O routines as well as
      the others (this file can be used to add links to the man pages in
      other documents).
      Signed-off-by: Rob Latham <robl@mcs.anl.gov>
  12. 14 Nov, 2014 1 commit
  13. 13 Nov, 2014 2 commits
  14. 12 Nov, 2014 3 commits
    • Correctly handle errflag in MPI collectives · 47f62b0c
      Wesley Bland authored
      
      
      The MPI collectives get and set the errflag used by the collective
      helper functions (MPIC_*). The possible values of the errflag changed,
      so the collective functions need to appropriately set this value using
      either MPIR_ERR_NONE (MPI_SUCCESS), MPIR_ERR_PROC_FAILED
      (MPIX_ERR_PROC_FAILED), or MPIR_ERR_OTHER (MPI_ERR_OTHER).
      
      This should allow collectives to correctly report process failures when
      they occur now, fixing the FT tests that use collectives (see #1945).
      Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
    • Change errflag to be an enum · 3850e6bf
      Wesley Bland authored
      
      
      The errflag value being used in the MPIC helper functions only
      propagated whether or not an error occurred. It did not contain any
      information about what kind of error occurred, which made returning the
      correct error code after a process failure impossible.
      
      This patch converts the binary value to an enum with three options:
      MPIR_ERR_NONE
      MPIR_ERR_PROC_FAILED
      MPIR_ERR_OTHER
      
      The original values FALSE and TRUE map to MPIR_ERR_NONE and
      MPIR_ERR_OTHER, respectively.
      
      MPIR_ERR_PROC_FAILED indicates that the error occurred
      because of a process failure. It uses the new bit set aside from the tag
      space to track such information between processes.
      
      This change required modifying lots of function signatures and type
      declarations to use the new enum type, but these are actually not very
      intrusive changes and shouldn't be a problem going forward.
      Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
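
      A rough sketch of a three-valued errflag and how it could map back
      to an error class. The SKETCH_ names and the numeric class values
      are placeholders, and the boolean conversion assumes the old flag
      was nonzero when an error had occurred; none of this is copied from
      MPICH.

```c
/* Three-valued replacement for the old boolean errflag. */
typedef enum {
    SKETCH_ERR_NONE = 0,      /* no error seen */
    SKETCH_ERR_PROC_FAILED,   /* error caused by a process failure */
    SKETCH_ERR_OTHER          /* any other error */
} sketch_errflag_t;

/* Placeholder error classes (not real MPI constants). */
enum {
    SKETCH_CLASS_SUCCESS = 0,
    SKETCH_CLASS_OTHER = 1,
    SKETCH_CLASS_PROC_FAILED = 2
};

/* Assumed legacy mapping: nonzero meant "some error", zero meant none. */
sketch_errflag_t sketch_errflag_from_bool(int had_error)
{
    return had_error ? SKETCH_ERR_OTHER : SKETCH_ERR_NONE;
}

/* With the enum, a collective can report what kind of error it saw. */
int sketch_errflag_to_class(sketch_errflag_t flag)
{
    switch (flag) {
    case SKETCH_ERR_NONE:
        return SKETCH_CLASS_SUCCESS;
    case SKETCH_ERR_PROC_FAILED:
        return SKETCH_CLASS_PROC_FAILED;
    default:
        return SKETCH_CLASS_OTHER;
    }
}
```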
    • Take a bit in the tag space for proc failure · 46f59276
      Wesley Bland authored
      
      
      We need to take another bit from the tag space to specify the difference
      between a generic failure and a process failure. This patch modifies the
      macros to handle this situation.
      Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
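
      Reserving bits of the tag for error signalling could be sketched as
      follows. The bit positions, the names, and the choice that a process
      failure also sets the generic error bit are all assumptions for
      illustration, not the actual MPICH macros.

```c
/* Hypothetical layout: two high bits of the tag are reserved. */
#define SKETCH_TAG_ERROR_BIT        (1u << 30)  /* some error occurred */
#define SKETCH_TAG_PROC_FAILURE_BIT (1u << 29)  /* it was a process failure */
#define SKETCH_TAG_FLAG_MASK \
    (SKETCH_TAG_ERROR_BIT | SKETCH_TAG_PROC_FAILURE_BIT)

/* Mark a tag as carrying a generic error. */
unsigned sketch_tag_mark_error(unsigned tag)
{
    return tag | SKETCH_TAG_ERROR_BIT;
}

/* Mark a tag as carrying a process failure (assumed to imply an error). */
unsigned sketch_tag_mark_proc_failure(unsigned tag)
{
    return tag | SKETCH_TAG_FLAG_MASK;
}

int sketch_tag_has_error(unsigned tag)
{
    return (tag & SKETCH_TAG_ERROR_BIT) != 0;
}

int sketch_tag_has_proc_failure(unsigned tag)
{
    return (tag & SKETCH_TAG_PROC_FAILURE_BIT) != 0;
}

/* Strip the reserved bits to recover the tag used for matching. */
unsigned sketch_tag_user_part(unsigned tag)
{
    return tag & ~SKETCH_TAG_FLAG_MASK;
}
```

      The cost of this design is a smaller usable tag range: every bit
      taken for signalling halves the number of tags left for matching.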
  15. 11 Nov, 2014 1 commit
  16. 06 Nov, 2014 2 commits
    • Return request from IRECV even if failure · 5b0cfb3b
      Wesley Bland authored
      
      
      We will now return a request handle from MPI_IRECV even if there is a
      failure, because the ULFM spec says that even if the function returns
      MPIX_ERR_PROC_FAILED_PENDING, it should still provide a valid request
      that can be completed later.
      
      This doesn't cause a problem for other situations because the value of
      the request is undefined in that scenario so it's fine for it to be
      garbage.
      Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
    • Check for pending any source ops · c2be640e
      Wesley Bland authored
      
      
      Before calling the progress engine, make sure none of the operations
      should return an error for MPIX_ERR_PROC_FAILED_PENDING. They would
      cause the progress engine to hang (potentially) so we can't enter it.
      Instead, mark the appropriate error codes and return immediately.
      Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
  17. 05 Nov, 2014 1 commit
  18. 04 Nov, 2014 3 commits
    • Implement true request-based RMA operations. · 3e005f03
      Min Si authored
      
      
      There are two requests associated with each request-based
      operation: one normal internal request (req) and one newly
      added user request (ureq). We return ureq to the user when
      the request-based op call returns.

      The ureq is initialized with its completion counter (CC) set
      to 1 and its ref count set to 2 (one reference is held by CH3
      and the other by the user). If the corresponding op can be
      finished immediately in CH3, the runtime will complete ureq
      in CH3 and let the user's MPI_Wait/Test destroy ureq. If the
      corresponding op cannot be finished immediately, we first
      increment the ref count to 3 (because there are now three
      places that need to reference ureq: user, CH3, progress
      engine). The progress engine will complete ureq when the op
      is completed, then CH3 will release its reference during
      garbage collection, and finally the user's MPI_Wait/Test
      will destroy ureq.
      
      The ureq can be completed in following three ways:
      
      1. If op is issued and completed immediately in CH3
      (req is NULL), we just complete ureq before free op.
      
      2. If op is issued but not completed, we remember the ureq
      handler in req and specify OnDataAvail / OnFinal handlers
      in req to a newly added request handler, which will complete
      user request. The handler is triggered at three places:
         2-a. when progress engine completes a put/acc req;
         2-b. when get/getacc handler completes a get/getacc req;
         2-c. when progress engine completes a get/getacc req;
      
      3. If op is not issued (i.e., wait for lock granted), the 2nd
      way will be eventually performed when such op is issued by
      progress engine.
      Signed-off-by: Xin Zhao <xinzhao3@illinois.edu>
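
      The reference counting described in this commit can be modeled in a
      few lines. The struct and function names are hypothetical, and the
      exact points at which CH3 and the progress engine drop their
      references are assumptions made to keep the counts consistent with
      the description; this is a toy model, not the CH3 code.

```c
/* Toy model of the user request (ureq). */
typedef struct sketch_ureq {
    int cc;          /* completion counter: 1 = pending, 0 = complete */
    int ref_count;   /* number of parties still holding the request */
    int destroyed;   /* set when the last reference is released */
} sketch_ureq;

/* Initial state: CC = 1, two references (CH3 and the user). */
void sketch_ureq_init(sketch_ureq *u)
{
    u->cc = 1;
    u->ref_count = 2;
    u->destroyed = 0;
}

/* Taken when the op cannot finish immediately (progress engine). */
void sketch_ureq_add_ref(sketch_ureq *u)
{
    u->ref_count++;
}

/* Mark the operation as complete. */
void sketch_ureq_complete(sketch_ureq *u)
{
    u->cc = 0;
}

/* Drop one reference; the last holder destroys the request. */
void sketch_ureq_release(sketch_ureq *u)
{
    if (--u->ref_count == 0)
        u->destroyed = 1;
}
```

      In the immediate case the sequence is init, complete, release
      (CH3), release (user's MPI_Wait/Test). In the deferred case it is
      init, add_ref, complete, release (progress engine), release (CH3),
      release (user). In both, the user's release is the one that
      destroys the request.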
    • Rename enum MPICH_WITHIN_MPI to MPICH_IN_INIT · 9ea630d0
      Junchao Zhang authored
      
      
      The new enum name better describes an MPIR_MPI_State_t value that
      says MPICH is in initialization but has not completely finished.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Make MPI_Initialized and friends thread-safe · 435ce800
      Junchao Zhang authored
      Implements MPI-Forum ticket 357 (https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/357)
      
      The ticket will be included in MPI-3.1, which adds thread-safety to MPI_INITIALIZED,
      MPI_FINALIZED, MPI_QUERY_THREAD, MPI_IS_THREAD_MAIN, MPI_GET_VERSION and
      MPI_GET_LIBRARY_VERSION.
      
      In MPICH, we make MPIR_Process.mpich_state atomic. After MPI is fully initialized, i.e.,
      in POST_INIT state, MPI_QUERY_THREAD, MPI_IS_THREAD_MAIN are inherently thread-safe.
      
      Fixes #2137
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
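
      One way to make a library state variable safe to query from any
      thread is to store it atomically. The sketch below uses C11
      <stdatomic.h> as a stand-in for whatever atomics MPICH uses
      internally; the state names other than "in init" and "post init",
      and all function names, are assumptions.

```c
#include <stdatomic.h>

/* Hypothetical library states, in the order they are entered. */
typedef enum {
    SKETCH_PRE_INIT,
    SKETCH_IN_INIT,        /* initialization started but not finished */
    SKETCH_POST_INIT,      /* fully initialized */
    SKETCH_POST_FINALIZED
} sketch_state_t;

/* Single atomic state variable shared by all threads. */
atomic_int sketch_state = SKETCH_PRE_INIT;

void sketch_set_state(sketch_state_t s)
{
    atomic_store(&sketch_state, (int) s);
}

/* Safe from any thread: true once initialization has completed. */
int sketch_initialized(void)
{
    return atomic_load(&sketch_state) >= SKETCH_POST_INIT;
}

/* Safe from any thread: true once finalization has completed. */
int sketch_finalized(void)
{
    return atomic_load(&sketch_state) == SKETCH_POST_FINALIZED;
}
```

      A thread that queries while another thread is still inside
      initialization sees the "in init" state and gets a false answer
      instead of a torn read.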
  19. 03 Nov, 2014 2 commits
    • Add blocking ops / targets aggressively cleanup functions. · 41a365ec
      Xin Zhao authored
      
      
      When we run out of resources for operations and targets,
      we need to make the runtime complete some operations
      so that it can free some resources.

      For RMA operations, we implement this by doing an internal
      FLUSH_LOCAL for one target and waiting for operation
      resources; for RMA targets, we implement this by doing an
      internal FLUSH operation for one target and waiting for
      target resources.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Embedding packet structure into RMA operation structure. · b1685139
      Xin Zhao authored and Pavan Balaji committed
      
      
      We were duplicating information in the operation structure and in the
      packet structure when the message is actually issued.  Since most of
      the information is the same anyway, this patch just embeds a packet
      structure into the operation structure, so that we eliminate the
      unnecessary copy.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
  20. 28 Oct, 2014 2 commits
    • Assign large blocks first in ADIOI_GPFS_Calc_file_domains · c16466e3
      Paul Coffman authored and Rob Latham committed
      
      
      For files that are less than the size of a gpfs block there seems to be
      an issue if successive MPI_File_write_at_all are called with proceeding
      offsets.  Given the simple case of 2 aggs, the 2nd agg/fd will be utilized,
      however the initial offset into the 2nd agg is distorted on the 2nd call
      to MPI_File_write_at_all because of the negative size of the 1st agg/fd:
      the offset into the 2nd agg/fd is influenced by the size of the
      first.  The simple solution is to reverse the default large block assignment so
      in the case where only 1 agg/fd will be used it will be the first.  By chance
      in the 2 agg situation this is what the GPFSMPIO_BALANCECONTIG
      optimization does and it does not have this problem.
      Signed-off-by: Rob Latham <robl@mcs.anl.gov>
    • MP_IOTASKLIST error checking · 976272a7
      Paul Coffman authored and Rob Latham committed
      
      
      PE users may manually specify the MP_IOTASKLIST for explicit aggregator
      selection.  Code needed to be added to verify that the user-specified
      aggregators were all valid.

      Do our best to maintain the old PE behavior: use as much of the
      correctly specified MP_IOTASKLIST as possible, issue what PE labeled
      error messages (but were really warnings) about the incorrect
      portions, and functionally just ignore those portions.  If none of
      the list was usable, fall back on the default.
      Signed-off-by: Rob Latham <robl@mcs.anl.gov>
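
      The "keep what is valid, warn about the rest, fall back if nothing
      is usable" policy might be sketched like this. The function name,
      the already-parsed integer input, and the simple range check are
      assumptions; the real MP_IOTASKLIST syntax and validation rules are
      not shown in the commit message.

```c
#include <stdio.h>

/* Hypothetical filter: copy the valid ranks from requested[] into kept[]
 * and return how many were kept. Invalid entries are reported as
 * warnings and skipped. A return value of 0 tells the caller to fall
 * back on the default aggregator selection. kept[] must have room for
 * nreq entries. */
int sketch_filter_aggregators(const int *requested, int nreq,
                              int nprocs, int *kept)
{
    int nkept = 0;

    for (int i = 0; i < nreq; i++) {
        if (requested[i] >= 0 && requested[i] < nprocs) {
            kept[nkept++] = requested[i];
        } else {
            fprintf(stderr,
                    "warning: ignoring invalid aggregator rank %d\n",
                    requested[i]);
        }
    }
    return nkept;
}
```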