1. 22 May, 2014 9 commits
    • Make handling of request cleanup more uniform · 1e171ff6
      Wesley Bland authored
      There are quite a few places where the request cleanup is done via:
      MPIU_Object_set_ref(req, 0);
      when it should be:
      This makes the handling more uniform so requests are cleaned up by releasing
      references rather than hitting them with the destroy hammer.
      Fixes #1664
      Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
    • pamid: task 0 hang in MPI_Init() if MP_PRINTENV=yes · 6b5993af
      Su Huang authored
      In MPIDI_Print_mpenv(), when calling MPIR_Gather_impl to gather all MP environment variables
      from all tasks in a job, the errflag parameter was not initialized to 0 before it was
      passed to the routine:
             mpi_errno = MPIR_Gather_impl(&sender, sizeof(MPIDI_printenv_t), MPI_BYTE, gatherer,
                                          sizeof(MPIDI_printenv_t),MPI_BYTE, 0,comm_ptr,
                                          (int *) &errflag);
      To process the Gather collective call, each task issued MPIC_Recv, MPIC_Send and MPIC_Wait.
      MPIC_Send() sends a message with MPIR_GATHER_TAG (defined as 0x3). Because the uninitialized
      errflag happened to be non-zero, the condition
          if (*errflag && MPIR_CVAR_ENABLE_COLL_FT_RET)
      was true, so bit 30 of the tag, (1 << 30) (MPIR_TAG_ERROR_BIT), was set to 1. The tag was
      therefore changed from 0x3 to 0x40000003.
      Task 1 sent a message with this modified tag to task 0. When the message arrived at
      task 0, a receive for the original tag 0x3 had already been posted. Because the tag of
      the arrived message differed from the tag of the posted receive, no match was found,
      which was the root cause of the hang.
      MPIR_TAG_SET_ERROR_BIT was added for MPI 3.0 (PE rbrew and beyond), which explains why
      the job does not fail with prior releases.
       (ibm) D197745
      Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>
    • pamid: Allow message sizes greater than 4GB · 98b5e585
      Nysal Jan K.A authored
      Allow message sizes >= 4GB
      Fixes #2076
      Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>
    • Add basic (non)contiguous subarray support · 60059770
      Junchao Zhang authored
      The compile-time constant MPI_SUBARRAYS_SUPPORTED is now set to true.
      To test whether a subarray is contiguous, one may use CFI_is_contiguous(cdesc). Note, however, that
      CFI_is_contiguous(cdesc) only applies to arrays: when cdesc is a descriptor for a scalar,
      CFI_is_contiguous(cdesc) returns false, even though a scalar is contiguous from MPI's viewpoint.
      So we add a check for scalars.
      No review since F08 binding is experimental now.
    • Add the missing mpi_c_interface.F90 in makefile · 59acd452
      Junchao Zhang authored
      No review since F08 binding is experimental now.
    • Comment out c_funptr code due to Cray ftn bugs · 8ad3b137
      Junchao Zhang authored
      A small test case written by Bill Long follows. The error message reported by the Cray compiler is:
      "If the C_PTR_2 argument is specified for the C_ASSOCATED intrinsic, it must have the same type as the C_PTR_1 argument."
      subroutine test (f, g, same)
        use,intrinsic :: iso_c_binding
        external :: f, g
        logical  :: same
        type(c_funptr) :: fp
        type(c_funptr) :: gp
        fp = c_funloc(f)
        gp = c_funloc(g)
        same = c_associated (fp, gp)          ! This gives the error
        same = c_associated (fp, c_funloc(g)) ! This gives the error too.
      end subroutine test
      No review since F08 binding is experimental now.
    • Fix whole array access errs on assumed-size arrays · a20b3882
      Junchao Zhang authored
      No review since F08 binding is experimental now.
    • Revise impl of MPI_IN_PLACE and MPI_BOTTOM · 9008c2f7
      Junchao Zhang authored
      Since Fortran forbids passing a disassociated (e.g., NULL) pointer to a non-pointer dummy argument
      (e.g., an assumed-type, assumed-rank argument), Fortran cannot pass the same MPI_BOTTOM value that C uses.
      So we use another approach.
      See the implementation details in the EuroMPI-2014 paper "Implementing the MPI-3.0 Fortran 2008 Binding"
      No review since F08 binding is experimental now.
    • Code cleanup · 1c2b3d35
      Junchao Zhang authored
      No review since F08 binding is experimental now.
  2. 19 May, 2014 1 commit
    • Remove mpd. · 7f8f982b
      Pavan Balaji authored
      MPD has been deprecated for several major releases.  Now we are having
      to deal with bugs in it that are not worth spending time fixing.  It's
      time to let go of it.
      Signed-off-by: Antonio J. Pena <apenya@mcs.anl.gov>
  3. 13 May, 2014 1 commit
    • pamid: force context post on when using 'per-object' locks · 3e57215e
      Michael Blocksome authored
      The previous code would attempt to run in 'latency optimization mode' if
      the application requested !MPI_THREAD_MULTIPLE and yet was still linked
      with a 'per-object' mpich library, which is optimized for throughput.
      This means that in !MPI_THREAD_MULTIPLE:
      - context post was disabled
      - async progress was disabled
      - using multiple contexts was disabled
      This attempt to give users "better" performance in a fundamentally
      flawed configuration would cause the pamid adi to hang on acquiring a
      context lock. For example, consider:
      1. work function is posted to a context
      2. thread acquires the context lock
      3. thread advances the context
      4. work function invoked, then attempts to initiate communication
      5. WITH CONTEXT POST OFF the thread will attempt to acquire the context
         lock before it directly invokes the communication function
      6. HANG
      A complete solution would be to identify all code paths that might
      result in this situation and instead avoid acquiring a lock that is
      already held.
      It should be noted that this is a bug in the pamid adi and NOT in the
      PAMI library itself. The PAMI interface explicitly states that
      PAMI_Context_lock() is non-recursive, for performance reasons.
  4. 12 May, 2014 4 commits
  5. 09 May, 2014 4 commits
  6. 08 May, 2014 4 commits
    • Got a bit carried away zapping zero-length blocks · 97114ec5
      Rob Latham authored
      A partial revert of a portion of commit
      50f3d580: I did not mean to modify
      anything in the struct case.  I did, though, and that modification
      caused a bug in darray datatypes.  Zero-length blocklens in the
      struct case indicate upper and lower bounds and must be respected.
      Closes: #2089
      No Reviewer
    • update darray_read for test suite conventions · 664ef28d
      Rob Latham authored
      Be less verbose: print only ' No Error\n' if all is well, and return
      a non-zero exit code on failure so I can git-bisect with it.
    • Test case for darray · c3d0d897
      Rob Latham authored
      ROMIO was not handling a particular darray pattern well.  Test case
      taken from the Open MPI mailing list.
      See ticket #2089
    • Actually initialize debug event logging · 5948c2b3
      Rob Latham authored
      ROMIO has DBG_FPRINTF throughout the code, but those DBG_FPRINTF
      statements will not do anything unless ROMIO registers itself with the
      MPICH debug event logging system.
      Still to do: ROMIO logging is far too verbose and needs to implement
      TRACE, DEBUG, and VERBOSE levels.
  7. 06 May, 2014 5 commits
    • ROMIO: Consolidate errno processing · 42056d48
      Rob Latham authored
      Not every file system lives in a POSIX-like environment, but many do.
      For those file systems, open and delete return -1 and set errno.
      The translation from Unix errno to MPI error class was haphazard.  Get
      all applicable file systems to use ADIOI_Err_create_code so there is one
      place to update the error code conversion.
      Closes: #2075
      Signed-off-by: Wei-keng Liao <wkliao@ece.northwestern.edu>
    • Deal with more errno values · e0154ed8
      Rob Latham authored
      In preparation for using ADIOI_Err_create_code everywhere, handle the
      errno values that the fs-specific drivers were handling but
      ADIOI_Err_create_code did not.
      Signed-off-by: Wei-keng Liao <wkliao@ece.northwestern.edu>
    • ROMIO dtype flattening: ignore zero-length counts · 50f3d580
      Rob Latham authored
      Bill Gropp reminds us not to forget this text from MPI-3 (page 84, lines 18-20):
        "Most datatype constructors have replication count or block length
        arguments. Allowed values are non-negative integers. If the value is
        zero, no elements are generated in the type map and there is no effect
        on datatype bounds or extent."
      The ROMIO flattening code was treating blocklen elements of zero as
      The struct case probably has the same bug, but I'm deeply nervous about
      touching too much of this old code with a release imminent.
      Closes: #2073
    • turn datatype test into a function · 41908007
      Rob Latham authored
    • ROMIO test case demonstrating indexed type bugs · 71f1dae1
      Wei-keng Liao authored and Rob Latham committed
      The problem: when a filetype is defined with MPI_Type_indexed and the
      first few elements of the blocklens[] argument are zero, a collective
      write misses some of the data.
      The test program first fills a file with 9 integers, all set to -999.
      It then defines a filetype and writes to the file in parallel from
      user buffers filled with 1s. Finally, the file is read back and its
      contents are checked.
      (It's Wei-keng's test case; I just hooked it into ROMIO's test suite.)
      Signed-off-by: Rob Latham <robl@mcs.anl.gov>
  8. 05 May, 2014 6 commits
  9. 02 May, 2014 5 commits
  10. 01 May, 2014 1 commit