1. 22 May, 2014 3 commits
    • Make handling of request cleanup more uniform · 1e171ff6
      Wesley Bland authored
      
      
      There are quite a few places where the request cleanup is done via:
      
      MPIU_Object_set_ref(req, 0);
      MPIDI_CH3_Request_destroy(req);
      
      when it should be:
      
      MPID_Request_release(req);
      
      This makes the handling more uniform so requests are cleaned up by releasing
      references rather than hitting them with the destroy hammer.
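The difference between the two cleanup idioms can be sketched with a toy reference-counted object. This is a minimal sketch under stated assumptions: `toy_req_t`, `toy_req_release()` and `toy_req_destroy()` are hypothetical stand-ins, not the MPICH definitions of MPID_Request_release() or MPIDI_CH3_Request_destroy().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a request object; not MPICH's MPID_Request. */
typedef struct {
    int ref_count;   /* number of outstanding references */
} toy_req_t;

static int toy_live_requests = 0;  /* requests still allocated */

static toy_req_t *toy_req_create(void)
{
    toy_req_t *req = malloc(sizeof(*req));
    if (req == NULL)
        return NULL;
    req->ref_count = 1;   /* the creator holds one reference */
    toy_live_requests++;
    return req;
}

static void toy_req_addref(toy_req_t *req)
{
    req->ref_count++;
}

/* "Destroy hammer": frees immediately, ignoring any other holders. */
static void toy_req_destroy(toy_req_t *req)
{
    free(req);
    toy_live_requests--;
}

/* Uniform cleanup: drop one reference and free only when the last one goes.
   Returns 1 if the object was freed, 0 otherwise. */
static int toy_req_release(toy_req_t *req)
{
    assert(req->ref_count > 0);
    if (--req->ref_count == 0) {
        toy_req_destroy(req);
        return 1;
    }
    return 0;
}
```

With two holders, the first release leaves the object alive and only the last one frees it; calling the destroy routine directly would free it while the other holder still has a pointer to it.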
      
      Fixes #1664
      Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
    • pamid: task 0 hang in MPI_Init() if MP_PRINTENV=yes · 6b5993af
      Su Huang authored
      
      
      In MPIDI_Print_mpenv(), when calling MPIR_Gather_impl to gather all MP environment variables
      from all tasks in a job, the errflag parameter was not initialized to 0 before it was
      passed to the routine:
             mpi_errno = MPIR_Gather_impl(&sender, sizeof(MPIDI_printenv_t), MPI_BYTE, gatherer,
                                          sizeof(MPIDI_printenv_t),MPI_BYTE, 0,comm_ptr,
                                          (int *) &errflag);
      
      To process the Gather collective call, each task issued MPIC_Recv, MPIC_Send and MPIC_Wait.
      
      MPIC_Send() sends a message with MPIR_GATHER_TAG (defined as 0x3). Since the uninitialized
      errflag passed to the routine happened to be non-zero,
      
          if (*errflag && MPIR_CVAR_ENABLE_COLL_FT_RET)
              MPIR_TAG_SET_ERROR_BIT(tag);
      
      bit 30 of the tag (MPIR_TAG_ERROR_BIT, i.e. 1 << 30) was set to 1, so the tag changed from
      0x3 to 0x40000003.
      
      Task 1 sent a message with this modified tag to task 0, where a receive for the original
      tag 0x3 had already been posted. Because the tag of the arriving message differed from the
      tag of the posted receive, no match was found for the message; this was the root cause of
      the hang.
      
      MPIR_TAG_SET_ERROR_BIT was added for MPI 3.0 (pe rbrew and beyond), which explains why
      the job does not fail with prior releases.
      
       (ibm) D197745
      Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>
    • pamid: Allow message sizes greater than 4GB · 98b5e585
      Nysal Jan K.A authored
      
      
      Allow message sizes >= 4GB
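The commit message does not say where the 4GB limit came from. A common cause of such a limit is storing a byte count in a 32-bit unsigned field; assuming that is the situation here, the sketch below shows how a 32-bit length silently wraps at 4GB while a 64-bit length does not. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: a byte count stored in a 32-bit field keeps only the
   low 32 bits, i.e. it wraps modulo 2^32. */
static uint32_t toy_store_len32(uint64_t bytes)
{
    return (uint32_t)bytes;
}

/* Hypothetical: a 64-bit field keeps the full byte count. */
static uint64_t toy_store_len64(uint64_t bytes)
{
    return bytes;
}

/* Self-checking usage example. */
void toy_len_demo(void)
{
    const uint64_t four_gb = UINT64_C(4) << 30;   /* 2^32 bytes */

    assert(toy_store_len32(four_gb - 1) == UINT32_MAX);  /* largest 32-bit size */
    assert(toy_store_len32(four_gb) == 0);               /* 4GB wraps to 0 */
    assert(toy_store_len32(four_gb + 5) == 5);           /* 4GB + 5 wraps to 5 */
    assert(toy_store_len64(four_gb + 5) == four_gb + 5); /* no loss with 64 bits */
}
```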
      
      Fixes #2076
      Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>
  2. 13 May, 2014 1 commit
    • pamid: force context post on when using 'per-object' locks · 3e57215e
      Michael Blocksome authored
      The previous code would attempt to run in 'latency optimization mode' if
      the application requested !MPI_THREAD_MULTIPLE and yet was still linked
      with a 'per-object' mpich library - which is optimized for throughput.
      This means that in !MPI_THREAD_MULTIPLE:
      - context post was disabled
      - async progress was disabled
      - using multiple contexts was disabled
      
      This attempt to give users "better" performance in a fundamentally
      flawed configuration would cause the pamid adi to hang on acquiring a
      context lock. For example, consider:
      1. work function is posted to a context
      2. thread acquires the context lock
      3. thread advances the context
      4. work function invoked, then attempts to initiate communication
      5. WITH CONTEXT POST OFF the thread will attempt to acquire the context
         lock before it directly invokes the communication function
      6. HANG
      
      A complete solution would be to identify all code paths that might
      result in this situation and instead avoid acquiring a lock that is
      already held.
      
      It should be noted that this is a bug in the pamid adi and NOT in the
      PAMI library itself. The PAMI interface is explicit in that the
      PAMI_Context_lock() is non-recursive - for performance reasons.
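The hang sequence above can be modeled with an error-checking POSIX mutex standing in for the non-recursive context lock. This is a hypothetical sketch: `toy_context_t` and its helpers are not the PAMI or pamid API, and "context post" is reduced to a small queue of work functions drained by the thread that already holds the lock. PTHREAD_MUTEX_ERRORCHECK is used so that the relock in step 5 returns EDEADLK instead of actually hanging.

```c
#define _POSIX_C_SOURCE 200809L  /* for PTHREAD_MUTEX_ERRORCHECK */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical model of a context guarded by a non-recursive lock.
   This is NOT the PAMI API; it only mirrors steps 1-6 above. */
#define TOY_MAX_POSTED 8

typedef void (*toy_work_fn)(void *ctx);

typedef struct {
    pthread_mutex_t lock;               /* non-recursive context lock */
    toy_work_fn posted[TOY_MAX_POSTED]; /* work queued on the context */
    int nposted;
    int post_enabled;                   /* 1 = "context post" on */
    int sends_done;                     /* communication actually issued */
    int deadlocks;                      /* relocks that would have hung */
} toy_context_t;

static void toy_context_init(toy_context_t *ctx, int post_enabled)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    /* ERRORCHECK: a relock by the owner returns EDEADLK instead of hanging. */
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&ctx->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    ctx->nposted = 0;
    ctx->post_enabled = post_enabled;
    ctx->sends_done = 0;
    ctx->deadlocks = 0;
}

static void toy_context_post(toy_context_t *ctx, toy_work_fn fn)
{
    assert(ctx->nposted < TOY_MAX_POSTED);
    ctx->posted[ctx->nposted++] = fn;
}

/* The communication itself; the caller must hold ctx->lock. */
static void toy_send_locked(void *p)
{
    toy_context_t *ctx = p;
    ctx->sends_done++;
}

/* Step 4: a work function that initiates communication. */
static void toy_work_initiate_comm(void *p)
{
    toy_context_t *ctx = p;
    if (ctx->post_enabled) {
        /* Context post on: queue the send for the thread holding the lock. */
        toy_context_post(ctx, toy_send_locked);
        return;
    }
    /* Step 5, context post off: acquire the lock before communicating. */
    if (pthread_mutex_lock(&ctx->lock) == EDEADLK) {
        ctx->deadlocks++;  /* step 6: a plain mutex would hang here */
        return;
    }
    toy_send_locked(ctx);
    pthread_mutex_unlock(&ctx->lock);
}

/* Steps 2-3: take the context lock and advance, running posted work
   (including work posted while advancing). */
static void toy_context_advance(toy_context_t *ctx)
{
    pthread_mutex_lock(&ctx->lock);
    for (int i = 0; i < ctx->nposted; i++)
        ctx->posted[i](ctx);
    ctx->nposted = 0;
    pthread_mutex_unlock(&ctx->lock);
}

/* Self-checking usage: the same work item with context post off and on. */
void toy_post_demo(void)
{
    toy_context_t off, on;

    toy_context_init(&off, 0);
    toy_context_post(&off, toy_work_initiate_comm);  /* step 1 */
    toy_context_advance(&off);
    assert(off.deadlocks == 1 && off.sends_done == 0);

    toy_context_init(&on, 1);
    toy_context_post(&on, toy_work_initiate_comm);
    toy_context_advance(&on);
    assert(on.deadlocks == 0 && on.sends_done == 1);

    pthread_mutex_destroy(&off.lock);
    pthread_mutex_destroy(&on.lock);
}
```

With context post off, the work function tries to relock the context it is already running under; with context post on, the send is queued and executed by the advancing thread while it still holds the lock.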
  3. 09 May, 2014 4 commits
  4. 02 May, 2014 5 commits
  5. 01 May, 2014 1 commit
  6. 30 Apr, 2014 3 commits
    • pamid: mapped datatypes needed to be unmapped · 2487b73e
      Su Huang authored
      
      
      For one-sided communication, a local operation is complete when its completion
      handler, MPIDI_Win_DoneCB, is invoked. The datatype associated with the operation
      cannot be freed before then. In the current code, however, the mapped datatype is
      unmapped (freed) immediately after the routine regains control from the PAMI call,
      which could be PAMI_Send, PAMI_Get, PAMI_Rget, PAMI_Put or PAMI_Rput. PAMI still
      needs the datatype to process the data at that point.
      
      The fix is to unmap the datatype in MPIDI_Win_DoneCB() and remove all
      "unmap" datatype operations from put and get operations.
      Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>
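The lifetime rule behind this fix, releasing per-operation resources only in the completion handler, can be sketched as follows. The names (`toy_rma_op_t`, `toy_map_datatype()`, `toy_rma_done_cb()`) are hypothetical; the callback simply plays the role the commit assigns to MPIDI_Win_DoneCB, and no PAMI call is modeled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model: "mapping" a datatype allocates a layout description
   that the transport keeps reading while the operation is in flight. */
typedef struct {
    int *layout;
} toy_map_t;

typedef struct {
    toy_map_t map;
    int in_flight;
} toy_rma_op_t;

static int toy_maps_live = 0;   /* mapped datatypes not yet released */

static int toy_map_datatype(toy_map_t *map, int nblocks)
{
    map->layout = calloc((size_t)nblocks, sizeof(int));
    if (map->layout == NULL)
        return -1;
    toy_maps_live++;
    return 0;
}

static void toy_unmap_datatype(toy_map_t *map)
{
    free(map->layout);
    map->layout = NULL;
    toy_maps_live--;
}

/* Issue the operation. The map is deliberately NOT released here, because
   the transport may still be using it after this function returns. */
static int toy_rma_start(toy_rma_op_t *op, int nblocks)
{
    op->in_flight = 0;
    if (toy_map_datatype(&op->map, nblocks) != 0)
        return -1;
    op->in_flight = 1;
    return 0;
}

/* Completion handler (the role the commit gives MPIDI_Win_DoneCB):
   the only place the mapped datatype is released. */
static void toy_rma_done_cb(toy_rma_op_t *op)
{
    assert(op->in_flight && op->map.layout != NULL);
    op->in_flight = 0;
    toy_unmap_datatype(&op->map);
}
```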
    • pamid: fix a bug in MPID_Win_free · d2fadf79
      Su Huang authored
      
      
      In MPID_Win_allocate_shared(), the memory could be obtained either via shmget() or,
      when there is only one task on a node, via MPIU_Malloc(). However, MPID_Win_free()
      only freed shared memory.
      
      The fix is to also free the non-shared (MPIU_Malloc'd) memory when it is present.
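A minimal sketch of the fix, assuming the window records how its memory was obtained so that the free routine can use the matching release call. `toy_win_t` and its helpers are hypothetical, and plain malloc() stands in for MPIU_Malloc(); the shared path uses the System V shmget()/shmat()/shmdt()/shmctl() calls.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical window bookkeeping; field names are illustrative only. */
typedef enum { TOY_MEM_NONE, TOY_MEM_SHM, TOY_MEM_MALLOC } toy_mem_kind_t;

typedef struct {
    void *base;           /* start of the window memory */
    int shm_id;           /* valid only when kind == TOY_MEM_SHM */
    toy_mem_kind_t kind;  /* records how base was obtained */
} toy_win_t;

/* Private heap memory when only one task is on the node,
   a System V shared segment otherwise. Returns 0 on success. */
static int toy_win_allocate_shared(toy_win_t *win, size_t size, int tasks_on_node)
{
    win->base = NULL;
    win->shm_id = -1;
    win->kind = TOY_MEM_NONE;

    if (tasks_on_node == 1) {
        win->base = malloc(size);
        if (win->base == NULL)
            return -1;
        win->kind = TOY_MEM_MALLOC;
        return 0;
    }

    win->shm_id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (win->shm_id == -1)
        return -1;
    void *p = shmat(win->shm_id, NULL, 0);
    if (p == (void *)-1) {
        shmctl(win->shm_id, IPC_RMID, NULL);  /* do not leak the segment */
        win->shm_id = -1;
        return -1;
    }
    win->base = p;
    win->kind = TOY_MEM_SHM;
    return 0;
}

/* Release the memory with the call that matches how it was obtained.
   Handling only the TOY_MEM_SHM case would leak the malloc() path. */
static void toy_win_free(toy_win_t *win)
{
    assert(win->kind != TOY_MEM_NONE || win->base == NULL);
    switch (win->kind) {
    case TOY_MEM_SHM:
        shmdt(win->base);
        shmctl(win->shm_id, IPC_RMID, NULL);
        break;
    case TOY_MEM_MALLOC:
        free(win->base);
        break;
    case TOY_MEM_NONE:
        break;
    }
    win->base = NULL;
    win->shm_id = -1;
    win->kind = TOY_MEM_NONE;
}
```

Keeping an explicit allocation kind avoids guessing at free time; the original bug corresponds to a free routine that only handles the shared-memory case.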
      Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>
    • pamid: memory leak on handles for request based RMA operations · 6bef6711
      Su Huang authored
      
      
      Request-based RMA communication operations require that each operation allocate
      a communication request object and associate it with the request handle (the
      request argument), which can be used to wait or test for completion. Neither the
      request handle structure nor the window structure is being freed. The problem is
      that when the request handle is created, its ref_count is set to 2. When
      MPID_Request_release_inline() is called by a request-based RMA operation, the
      ref_count of the handle is decremented to 1, but the allocated memory is freed
      only when ref_count reaches 0.
      
      To fix the problem, set the ref_count to 1 when the request handle is
      created. MPID_Request_release_inline() then decrements the ref_count to 0,
      which causes the memory to be freed.
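The leak follows directly from the reference-count arithmetic. In this hypothetical sketch (`toy_rma_req_t` and its helpers are not MPICH types), a handle created with an initial count of 2 survives the single release performed on it, while one created with a count of 1 is freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical request handle; not MPICH's MPID_Request. */
typedef struct {
    int ref_count;
} toy_rma_req_t;

static int toy_handles_live = 0;   /* handles not yet freed */

static toy_rma_req_t *toy_rma_req_create(int initial_refs)
{
    toy_rma_req_t *req = malloc(sizeof(*req));
    if (req == NULL)
        return NULL;
    req->ref_count = initial_refs;
    toy_handles_live++;
    return req;
}

/* Mirrors the release step: decrement, and free only at zero.
   Returns 1 if the handle was freed. */
static int toy_rma_req_release(toy_rma_req_t *req)
{
    assert(req->ref_count > 0);
    if (--req->ref_count > 0)
        return 0;
    free(req);
    toy_handles_live--;
    return 1;
}
```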
      Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>
  7. 29 Apr, 2014 4 commits
  8. 23 Apr, 2014 1 commit
  9. 11 Apr, 2014 2 commits
  10. 02 Apr, 2014 1 commit
  11. 01 Apr, 2014 2 commits
    • Move symbols to correct libraries. · 9c337914
      Pavan Balaji authored and Kenneth Raffenetti committed
      
      
      Maintain a list of files that go into each library.  If a particular
      binding is not enabled, the list variable still exists, but will just
      be empty.  This simplifies the management of which files/symbols go
      into which library.
      
      Move all MPI_ symbols to the libmpi library and all other symbols to
      the libpmpi library.  All Fortran 77 symbols go into libmpif77.so,
      while C symbols go into libmpi.so.  There are some exceptions, such as
      status_f2c, which are handled by the Fortran code but used in C.  Our
      Fortran 90 build only creates a few symbols and uses the f77 symbols
      for everything else.  These few symbols go into libmpifort.so.
      
      Also update compiler wrappers to link to correct libraries.  mpif77
      should now link with libmpif77.  mpif90 links with both libmpifort and
      libmpif77, since our F90 build still keeps the core Fortran library
      symbols in libmpif77.
      
      We completely ignored the F77 library earlier.  This was OK because
      all of the Fortran symbols were ending up in libmpi.  Now that the
      symbols are separated into the correct libraries, we need to link to
      libmpif77 as well.
      
      Also added inter-library dependencies.
      
      libmpi has a dependency on several internal libraries: libmpl, libopa.
      libmpicxx did not have a dependency on libmpi, added.
      libmpif77 did not have a dependency on libmpi, added.
      libmpifort did not have a dependency on libmpi, added.
      
      This dependency model is sufficient for C and F77, but not for C++ and
      F90.  The C and F77 libraries contain all the symbols the application
      relies on, but the F90 and C++ libraries don't.  In the case of F90,
      symbols such as mpi_bcast are missing and are borrowed from the F77
      library.  In the case of C++, mpicxx.h contains calls directly to C
      functions (such as MPI_Reduce_local), which get embedded into the
      application.
      
      Fixes #2023.
      Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
    • Rename mpich libraries. · 42fe2ccf
      Pavan Balaji authored and Kenneth Raffenetti committed
      
      
      The following library names are used to make the naming consistent
      across the ABI compatibility group:
      
      C libraries: libmpi.* and libpmpi.*
      C++ library: libmpicxx.*
      F77 libraries: libmpif77.*
      F90+ library: libmpifort.*
      
      This patch also gets rid of the FWRAPNAME variable, which is a
      duplicate of MPIFLIBNAME.  Similarly, FCWRAPNAME is removed and a new
      variable MPIFCLIBNAME is added, so it's consistent with the other
      names.
      
      PMPIFLIBNAME, which was unused, is no longer present.
      
      Fixes #2039.
      Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
  12. 27 Mar, 2014 2 commits
  13. 24 Mar, 2014 5 commits
  14. 23 Mar, 2014 1 commit
    • Remove the use of MPIDI_TAG_UB · 055abbd3
      Wesley Bland authored and Pavan Balaji committed
      
      
      The constant MPIDI_TAG_UB is used in only one place at the moment, in the
      initialization of ch3 (source:src/mpid/ch3/src/mpid_init.c@4b35902a#L131). The
      problem is that the value which is being set (MPIR_Process.attrs.tag_ub) is
      set differently in pamid (INT_MAX). This leads to weird results when we set
      apart a bit in the tag space for failure propagation in non-blocking
      collectives (see #2008).
      
      Since this value isn't being referenced anywhere else, there doesn't seem to
      be a use for it and it's just leading to confusion. To avoid this, here we
      remove this value and just set MPIR_Process.attrs.tag_ub to INT_MAX in both
      ch3 and pamid.
      
      See #2009
      Signed-off-by: Pavan Balaji <balaji@mcs.anl.gov>
  15. 19 Mar, 2014 1 commit
    • MPICH test case linked_list_lockall hang in MPI_Win_flush · 4b35902a
      Su Huang authored
      
      
      The scenario of the hang is described as follows:
      
        Assume the job runs with 4 tasks. Task 0 loops over the following RMA operations
        to fetch the displacement; the loop ends once the displacement has been updated.
      
          MPI_Get_accumulate(target rank is task 0)
          MPI_Win_flush(task 0)
      
        Tasks 1 and 3 hang in MPI_Win_flush() waiting for a call to
        MPI_Compare_and_swap() to complete. The target rank for this operation is
        task 0.
      
        Task 2 hangs in MPI_Win_flush() waiting for a call to MPI_Accumulate() to
        complete. The target rank for this operation is task 0 as well.
      
        Task 0 is busy making MPI_Get_accumulate() and MPI_Win_flush() calls to see
        whether the displacement has been updated. The target rank of these operations
        is task 0 itself, so they are local and can complete without a call to the PAMI
        dispatcher. Meanwhile, the other three tasks issue RMA operations targeting
        task 0 and wait for them to complete. Because task 0 only performs local
        operations in its loop, the PAMI dispatcher is never called and no progress is
        made on the remote operations; this is the root cause of the hang.
      
      The fix is to add a call to the PAMI dispatcher in MPI_Win_flush() that is made
      before the completion condition is checked. The current code checks the condition
      first and, if it is already satisfied, never calls the PAMI dispatcher.
      
      The following statement in MPI_Win_flush()
      
        MPID_PROGRESS_WAIT_WHILE(sync->total != sync->complete)
      
      will be replaced by
      
        MPID_PROGRESS_WAIT_DO_WHILE(sync->total != sync->complete)
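The difference between the two loop shapes can be sketched in plain C. This is a hypothetical model, not the MPID_PROGRESS_WAIT_* macros themselves: `toy_progress()` stands in for a PAMI dispatcher call and serves one pending request from another task per call, while task 0's own operations are local and already complete when the flush starts.

```c
#include <assert.h>

/* Hypothetical model; only the loop shapes mirror
   MPID_PROGRESS_WAIT_WHILE and MPID_PROGRESS_WAIT_DO_WHILE. */
typedef struct {
    int total;           /* operations issued by this task */
    int complete;        /* operations completed */
    int remote_pending;  /* requests from other tasks awaiting our progress */
    int progress_calls;  /* number of dispatcher calls made */
} toy_sync_t;

/* Stand-in for a PAMI dispatcher call: serves one incoming remote request
   and completes one of our own outstanding operations, if any. */
static void toy_progress(toy_sync_t *sync)
{
    assert(sync->complete <= sync->total);
    sync->progress_calls++;
    if (sync->remote_pending > 0)
        sync->remote_pending--;
    if (sync->complete < sync->total)
        sync->complete++;
}

/* Old shape: check first, so no progress is made if already complete. */
static void toy_flush_while(toy_sync_t *sync)
{
    while (sync->total != sync->complete)
        toy_progress(sync);
}

/* New shape: always call the dispatcher at least once, then check. */
static void toy_flush_do_while(toy_sync_t *sync)
{
    do {
        toy_progress(sync);
    } while (sync->total != sync->complete);
}
```

Because the condition is already false on entry, the while form never calls the progress function and the remote requests stay pending; the do-while form serves one on every flush.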
      
      (ibm) D196445
      Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>
  16. 13 Mar, 2014 2 commits
  17. 11 Mar, 2014 1 commit
  18. 09 Mar, 2014 1 commit