1. 04 Mar, 2015 3 commits
    • Store window basic attributes into a struct on window. · 9404e953
      Xin Zhao authored
      
      
      In this patch, we gather window basic attributes of other
      processes (base_addr, size, disp_unit, win_handle) using a
      struct called "basic_info_table". By doing this, we can use
      one contiguous memory region to store them.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
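The layout described in this commit can be sketched as follows. This is a minimal illustration only: the type name `basic_info_t`, the helper `basic_info_table_create`, and all field types are assumptions based on the commit message, not MPICH's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-process record. Field names follow the commit message;
 * the types are assumptions. */
typedef struct {
    void *base_addr;   /* start of the remote window */
    size_t size;       /* window size in bytes */
    int disp_unit;     /* displacement unit */
    int win_handle;    /* remote window handle (type assumed) */
} basic_info_t;

/* One contiguous, zero-initialized allocation holds the records of all
 * processes; entry i describes the window of rank i. */
static basic_info_t *basic_info_table_create(int comm_size)
{
    assert(comm_size > 0);
    return calloc((size_t) comm_size, sizeof(basic_info_t));
}
```

Looking up an attribute of rank `r` is then a single indexed access such as `table[r].disp_unit`.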
    • Change name of MPIDI_CH3U_Win_create_gather to MPIDI_CH3U_Win_gather_info. · 131e06ef
      Xin Zhao authored
      
      
      Function MPIDI_CH3U_Win_create_gather exchanges the window
      information among processes. It does not create a new window.
      Here we change the function name to a more suitable one.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Add CH3 APIs and macros to allow channel to implement Alloc_mem/Free_mem. · 03d4c77b
      Xin Zhao authored
      
      
      Originally, MPIDI_Alloc_mem(size, info) and MPIDI_Free_mem(base_ptr)
      in the CH3 layer were implemented by calling MPIU_Malloc(size) and
      MPIU_Free(base_ptr) internally. This prevented the underlying hardware
      layer from providing a specific implementation of Alloc_mem and Free_mem,
      which is necessary when registering memory for RDMA operations.
      
      This patch defines new APIs, MPIDI_CH3I_Alloc_mem(size, info)
      and MPIDI_CH3I_Free_mem(base_ptr), to allow channels to implement
      their own memory allocators. If the channel does not have its own
      implementation, MPICH will fall back to the default implementation
      in the CH3 layer, which uses MPIU_Malloc and MPIU_Free.
      
      Thanks to Steffen Christgau <christgau@cs.uni-potsdam.de> for
      this contribution.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
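The fallback scheme in this commit can be illustrated with a compile-time switch. This is a sketch under stated assumptions: `CHANNEL_HAS_ALLOC_MEM`, `channel_alloc_mem`, `channel_free_mem`, and the `device_*` wrappers are hypothetical names, the `info` argument is omitted for brevity, and `malloc`/`free` stand in for MPIU_Malloc/MPIU_Free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical switch: a channel with its own allocator would define
 * CHANNEL_HAS_ALLOC_MEM and provide the two functions declared below. */
#ifdef CHANNEL_HAS_ALLOC_MEM
void *channel_alloc_mem(size_t size);
void channel_free_mem(void *ptr);
#define DEVICE_ALLOC_MEM(size) channel_alloc_mem(size)
#define DEVICE_FREE_MEM(ptr)   channel_free_mem(ptr)
#else
/* Default path: plain heap allocation. */
#define DEVICE_ALLOC_MEM(size) malloc(size)
#define DEVICE_FREE_MEM(ptr)   free(ptr)
#endif

/* Device-level entry points forward to whichever allocator was selected. */
static void *device_alloc_mem(size_t size)
{
    assert(size > 0);
    return DEVICE_ALLOC_MEM(size);
}

static void device_free_mem(void *ptr)
{
    DEVICE_FREE_MEM(ptr);
}
```

With this shape, callers never need to know whether the memory came from a channel-specific allocator (for example, one that registers it for RDMA) or from the default heap path.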
  2. 03 Mar, 2015 1 commit
  3. 27 Feb, 2015 1 commit
  4. 26 Feb, 2015 5 commits
  5. 13 Feb, 2015 18 commits
    • Don't check for anysource if not recv · 3b04f6c0
      Wesley Bland authored
      
      
      The function to check whether an operation was an anysource receive was
      checking all request kinds, even if they weren't receives. This limits
      that check to only receives to avoid examining an uninitialized
      variable.
      Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
    • Delete comments that no longer make sense. · 21126e9e
      Xin Zhao authored
      
      
      The comments are no longer relevant to the
      new RMA infrastructure.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Delete unnecessary code. · e3ccad1f
      Xin Zhao authored
      
      
      Here req->dev.user_count is used when receiving FOP/CAS response
      data on origin in PktHandler_FOPResp and PktHandler_CASResp. Since
      the count is always 1, we did not set rma_op->result_count, and
      we directly set req->dev.user_count to 1 in packet handlers.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Simplify code of issuing RMA packets. · e3fc7e70
      Xin Zhao authored
      
      
      When issuing RMA packets, we do not need to
      store target_win_handle in the request on the
      origin side; we only need to store source_win_handle,
      because when the response data comes back, we
      only need to use source_win_handle on the origin
      side. This patch simplifies the code in this way.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Remove source_win_handle from GET-like RMA packets. · 80a71e11
      Xin Zhao authored
      
      
      For GET-like RMA packets and response packets (GACC,
      GET, FOP, CAS, GACC_RESP, GET_RESP, FOP_RESP, CAS_RESP),
      we originally carried source_win_handle in the packet struct
      in order to locate the window handle on the origin side in the
      packet handler of response packets. However, this is
      not necessary because source_win_handle can be stored
      in the request on the origin side. This patch deletes
      source_win_handle from those packets to reduce the size
      of the packet union.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Xin Zhao's avatar
    • Bug-fix: use do_accumulate_op function for ACC computation. · c8ecef8d
      Xin Zhao authored
      
      
      do_accumulate_op() does more comprehensive work on ACC
      computation than the OP function. For example, MPI_REPLACE
      is not defined as a predefined computation and is therefore
      not handled by the OP function, but it is safely handled
      in do_accumulate_op(). This patch replaces the OP function
      with do_accumulate_op() on the target side.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
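To illustrate why a dedicated accumulate routine is more complete than a table of reduction operators, here is a minimal sketch. It is not the real do_accumulate_op(): the enum `acc_op_t`, the function `accumulate_int`, and the restriction to `int` data are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation codes for illustration. */
typedef enum { ACC_SUM, ACC_MAX, ACC_REPLACE } acc_op_t;

/* Apply op element-wise to count ints, combining source into target. */
static void accumulate_int(acc_op_t op, const int *source, int *target, size_t count)
{
    size_t i;
    assert(source != NULL && target != NULL);
    for (i = 0; i < count; i++) {
        switch (op) {
        case ACC_SUM:
            target[i] += source[i];
            break;
        case ACC_MAX:
            if (source[i] > target[i])
                target[i] = source[i];
            break;
        case ACC_REPLACE:
            /* Not a reduction: the source simply overwrites the target. */
            target[i] = source[i];
            break;
        }
    }
}
```

The replace case is handled in the same routine as the arithmetic reductions, so the caller does not need a separate code path for it.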
    • Use memcpy for structure assignment. · 59afc29c
      Xin Zhao authored
      
      
      In this patch we replace "=" with the memcpy function
      when assigning the content of one struct to another.
      Using "=" in this case is not compatible with the LLVM
      compiler.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
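The change amounts to the following pattern. The struct `pkt_t` and the helper `pkt_copy` are hypothetical names used only to show the idiom; the actual MPICH packet types are larger.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical packet type, used only for illustration. */
typedef struct {
    int type;
    int count;
    char payload[8];
} pkt_t;

/* Copy a packet byte for byte instead of writing "*dst = *src". */
static void pkt_copy(pkt_t *dst, const pkt_t *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof(*dst));
}
```

Using `sizeof(*dst)` rather than naming the type keeps the copy size correct if the type of `dst` changes later.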
    • Change argument of function finish_op_on_target. · 1b30ab19
      Xin Zhao authored
      
      
      In this patch, we replace one argument of function
      finish_op_on_target, "packet(op) type", with "has_response_data".
      Since finish_op_on_target does not care what specific
      packet(op) type it is processing, but only cares
      whether the current op has response data (like GET/GACC),
      changing the argument in this way simplifies the
      code by avoiding acquiring the packet(op) type every time
      before calling finish_op_on_target.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Add asserts for RMA packet types. · 21479b00
      Xin Zhao authored
      
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Rewrite code of piggybacking IMMED data with RMA packets. · de9d0f21
      Xin Zhao authored
      
      
      Originally we added "immed_data" and "immed_len" areas to RMA packets
      in order to piggyback a small amount of data with the packet header and
      reduce the number of packets (note that "immed_len" is necessary when
      the piggybacked data is not the entire data). However, those areas
      potentially increase the packet union size and degrade two-sided
      communication. This patch fixes this issue.
      
      In this patch, we remove "immed_data" and "immed_len" from the normal
      "MPIDI_CH3_Pkt_XXX_t" operation type (e.g. MPIDI_CH3_Pkt_put_t), and
      we introduce a new "MPIDI_CH3_Pkt_XXX_immed_t" packet type for each
      operation (e.g. MPIDI_CH3_Pkt_put_immed_t).
      
      "MPIDI_CH3_Pkt_XXX_immed_t" is used when (1) both origin and target
      are basic datatypes, AND (2) the data to be sent fits entirely
      into the header. As a result, "MPIDI_CH3_Pkt_XXX_immed_t" needs the
      "immed_data" area but can drop the "immed_len" area. Also, since it only
      works with a basic target datatype, it can drop the "dataloop_size" area
      as well. All operations that do not satisfy (1) or (2) will use the
      normal "MPIDI_CH3_Pkt_XXX_t" type.
      
      Originally we always piggybacked FOP data into the packet header,
      which made the packet size too large. In this patch we split the
      FOP operation into IMMED packets and normal packets.
      
      Because CAS only works with 2 basic datatypes and non-complex
      elements, the data amount is relatively small, so we always piggyback
      the data with the packet header and only use the "MPIDI_CH3_Pkt_XXX_immed_t"
      packet type for CAS.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
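The selection rule between the two packet types can be sketched as below. Everything here is illustrative: `IMMED_CAPACITY`, the two struct layouts, and `choose_pkt_kind` are assumptions, and the real MPIDI_CH3_Pkt_* structs carry more fields.

```c
#include <assert.h>
#include <stddef.h>

#define IMMED_CAPACITY 16   /* assumed inline payload size in bytes */

typedef enum { PKT_NORMAL, PKT_IMMED } pkt_kind_t;

/* Normal packet: no inline data, keeps dataloop_size for derived target types. */
typedef struct {
    int type;
    int count;
    int dataloop_size;
} pkt_put_t;

/* IMMED packet: the whole payload travels inline, so neither an
 * immed_len nor a dataloop_size field is needed. */
typedef struct {
    int type;
    int count;
    char immed_data[IMMED_CAPACITY];
} pkt_put_immed_t;

/* Both conditions from the commit message must hold for the IMMED path. */
static pkt_kind_t choose_pkt_kind(int origin_is_basic, int target_is_basic, size_t data_len)
{
    if (origin_is_basic && target_is_basic && data_len <= IMMED_CAPACITY)
        return PKT_IMMED;
    return PKT_NORMAL;
}
```

Because the inline buffer lives only in the IMMED variant, the normal packet type stays small, which is the point of the split.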
    • Code-refactoring for RMA operations routines. · 3a017faa
      Xin Zhao authored
      
      
      This patch only refactors the RMA operation routines
      to make the code structure clearer. It does not change any
      functionality.
      
      After refactoring, in each operation routine, for non-SHM operations
      we do the work in the following order:
      
      (1) allocate a new op struct;
      (2) fill areas in op struct, except for packet struct in op struct;
      (3) initialize packet struct in op struct, fill areas in packet struct;
      (4) enqueue op to data structure on window.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Remove lock_type and origin_rank areas from RMA packet. · 81e2b274
      Xin Zhao authored
      
      
      Originally we added lock_type and origin_rank areas
      to the RMA packet in order to piggyback passive lock requests
      with RMA operations. However, those areas potentially
      enlarged the packet union size, and they are actually
      not necessary and can be completely avoided.
      
      "Lock_type" is used to remember what type of lock (shared or
      exclusive) the origin wants to acquire on the target. To remove
      it from the RMA packet, we use flags (which already exist in the RMA packet)
      to remember such information.
      
      "Origin_rank" is used to remember which origin has sent a lock
      request to the target, so that when the lock is granted to this
      origin later, the target can send an ack to that origin. Actually
      the target does not need to store origin_rank; it only needs to store
      origin_vc, which is known from the progress engine on the target side.
      Therefore, we can completely remove origin_rank from the RMA packet.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
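Encoding the lock type in existing flag bits, instead of a dedicated field, might look like this. The flag names and values are hypothetical and chosen for illustration; MPICH's real flag definitions may differ.

```c
#include <assert.h>

/* Hypothetical flag bits carried in the packet's existing flags field. */
enum {
    PKT_FLAG_NONE           = 0,
    PKT_FLAG_LOCK_SHARED    = 1 << 0,
    PKT_FLAG_LOCK_EXCLUSIVE = 1 << 1
};

typedef enum { LOCK_NONE, LOCK_SHARED, LOCK_EXCLUSIVE } lock_type_t;

/* Recover the requested lock type on the target from the flags alone. */
static lock_type_t lock_type_from_flags(unsigned flags)
{
    /* A packet must not request both lock types at once. */
    assert(!((flags & PKT_FLAG_LOCK_SHARED) && (flags & PKT_FLAG_LOCK_EXCLUSIVE)));
    if (flags & PKT_FLAG_LOCK_SHARED)
        return LOCK_SHARED;
    if (flags & PKT_FLAG_LOCK_EXCLUSIVE)
        return LOCK_EXCLUSIVE;
    return LOCK_NONE;
}
```

Since the flags field is already part of every RMA packet, this costs no extra bytes in the packet union.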
    • Add comments about RMA packet wrappers. · d46b848a
      Xin Zhao authored
      
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Modify packet wrappers to make them complete. · 064e60ce
      Xin Zhao authored
      
      
      Some packet wrappers did not include all packet types;
      this patch adds the missing packet types to those wrappers.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Re-apply modifications on mpidpkt.h. · fa958833
      Xin Zhao authored
      This patch re-applies the modifications on mpidpkt.h that were
      temporarily reverted in bb3f9623.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Revert "Code-refactor: arrange RMA pkt structure." · 2cbc9180
      Xin Zhao authored
      This reverts commit 389aab16.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Temporarily revert commits for src/mpid/ch3/include/mpidpkt.h · bb3f9623
      Xin Zhao authored
      We are going to revert commit 389aab16 because it re-ordered
      the attributes in the RMA packet structs in mpidpkt.h and broke
      the alignment.
      
      This commit temporarily reverts the following commits, but only
      their modifications to mpidpkt.h made after commit 389aab16.
      
      e36203c3, 45afd1fd, 3a05784f, 87acbbbe, b155e7e0
      
      We will re-apply those modifications after we revert 389aab16.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
  6. 08 Feb, 2015 3 commits
    • Bug-fix: guarantee atomicity for FOP and GACC. · bad898f9
      Xin Zhao authored
      
      
      FOP, CAS and GACC are atomic "read-modify-write" operations,
      which means that when the target window is defined on a SHM region,
      we need an inter-process lock to guarantee the atomicity of the
      entire "read+OP". The current implementation is correct for
      SHM-based RMA operations, but not for AM-based RMA
      operations: for SHM-based operations, it protects the entire
      "read+OP", but for AM-based operations, it only protects the
      "OP" part.
      
      This patch fixes this issue by protecting the memory copy to the
      temporary buffer and the computation together for AM-based operations.
      
      Fix ticket 2226
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
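The fix can be pictured as widening the critical section so that it covers the read as well as the operation. The sketch below is an analogy only: a process-local `pthread_mutex_t` stands in for the inter-process lock on the SHM region, and `fetch_and_add` is a hypothetical stand-in for the target-side handling of FOP.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Stand-in for the inter-process lock protecting the window. */
static pthread_mutex_t win_lock = PTHREAD_MUTEX_INITIALIZER;

/* Fetch-and-add on a target location; returns the value seen before the add. */
static int fetch_and_add(int *target, int operand)
{
    int old;
    assert(target != NULL);
    pthread_mutex_lock(&win_lock);
    old = *target;             /* read: copy the current value to a temporary */
    *target = old + operand;   /* OP: compute and store the new value */
    pthread_mutex_unlock(&win_lock);
    return old;
}
```

If the lock were taken only around the second statement, another process could modify `*target` between the read and the update, and the returned value would no longer match the value the operation was applied to.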
    • Bug-fix: making processes with SHM and without SHM win work correctly. · 8c5cb1e6
      Xin Zhao authored
      
      
      In commit 7d71278, if node_comm is NULL (the process is alone on its
      node), we call allocate_no_shm() in CH3 to allocate the window. If
      node_comm is not NULL (more than one process is on the same node), we
      call allocate_shm() in Nemesis to allocate a SHM window. However,
      the amount of information exchanged (in MPI_Allgather) differs
      between allocate_no_shm() and allocate_shm(), which leads to incorrect execution
      when both SHM and non-SHM windows exist. This patch fixes this issue.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Delete unnecessary code in SHM allocate / free. · 346050ea
      Xin Zhao authored
      We allocate / free SHM regions only when node_comm exists,
      which means there is more than one process on the same
      node. When node_comm is NULL (the process is alone on its
      node), we call the default allocate / free functions in CH3.
      (Please refer to commit f02eed5b.)
      
      Here we delete unnecessary code dealing with node_comm being
      NULL in the SHM allocate / free functions.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
  7. 04 Feb, 2015 2 commits
  8. 03 Feb, 2015 1 commit
  9. 30 Jan, 2015 2 commits
    • Add mpir_errflag_t to MPIDI_Request · 67ec0ab1
      Wesley Bland authored
      
      
      Non-blocking communication requests need a way to track whether an error
      has occurred in a previous part of the NBC schedule. This adds an
      errflag to the request object itself so the tracking is possible.
      Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
    • Refactor MPIC functions to use the MPID objects · 54362c00
      Wesley Bland authored
      
      
      The MPIC helper functions have been using MPI_Comm and MPI_Request
      objects instead of their MPID_* counterparts. This leads to a bunch of
      unnecessary conversions back and forth between the two types of objects
      and makes the work incompatible with other parts of the codebase
      (non-blocking collectives for instance).
      
      This patch converts all of the MPIC_* functions to use MPID_Comm and
      MPID_Request and changes all of the collective calls to use them now
      too.
      Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
  10. 27 Jan, 2015 1 commit
  11. 22 Jan, 2015 2 commits
    • FT: Fixes ref counts in shrink and agree · 93e816cc
      Huiwei Lu authored
      
      
      When a process fails, the fault tolerance scheme takes a different path to
      deal with MPI object reference counts than the existing one. Some
      reference counts were not properly set in the FT path, so when configured
      with --enable-g=all, some ft tests show leaked context ids and dirty
      COMM, GROUP and REQUEST objects on exit.
      
      This patch fixes ft/shrink and ft/agree with "--enable-g=all". Stack
      allocated objects of requests, communicators and groups will be freed by
      FT.
      Signed-off-by: Wesley Bland <wbland@anl.gov>
    • Fix for MPIX_COMM_AGREE to not return incorrect errors · a3dd5f40
      Wesley Bland authored
      
      
      MPIX_Comm_agree should not return errors if the failed processes have
      all been acknowledged. Previously, it was returning errors
      unnecessarily, but this makes sure that the errcode is MPI_SUCCESS when
      appropriate.
      Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
  12. 14 Jan, 2015 1 commit