1. 04 Mar, 2015 15 commits
    • Change name from data_size to buf_size. · 45cdb282
      Xin Zhao authored and Pavan Balaji committed

      When a lock cannot be granted immediately, we queue the
      lock request and the op data in a lock entry queue. The
      lock entry struct used 'data_size' to record the size of
      the buffer that stores the data. Because that buffer size
      is not necessarily type_size*count but may be
      type_extent*count, we rename the field from 'data_size'
      to 'buf_size'.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
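The difference between the two sizes in the buf_size commit can be sketched in C. This is an illustration only: `type_desc_t`, `data_bytes` and `buf_bytes` are hypothetical names, not MPICH identifiers, and the sketch assumes the buffer holds the data in its unpacked layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a datatype: payload bytes vs. stride. */
typedef struct {
    size_t type_size;   /* bytes of actual data in one element */
    size_t type_extent; /* bytes spanned by one element, padding included */
} type_desc_t;

/* Bytes of real data carried by 'count' elements. */
size_t data_bytes(const type_desc_t *t, size_t count)
{
    assert(t != NULL);
    return t->type_size * count;
}

/* Bytes needed to hold 'count' elements in their unpacked layout. */
size_t buf_bytes(const type_desc_t *t, size_t count)
{
    assert(t != NULL);
    return t->type_extent * count;
}
```

For a type with 12 bytes of data and an extent of 16 bytes, four elements carry 48 bytes of data but need a 64-byte buffer, which is why a field named after the data size would be misleading.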
    • Bug-fix: make RMA work correctly with pair basic type. · ce8bc310
      Xin Zhao authored and Pavan Balaji committed

      The original RMA implementation does not consider pair basic
      types (e.g. MPI_FLOAT_INT, MPI_DOUBLE_INT). It only works
      correctly with builtin datatypes (e.g. MPI_INT, MPI_FLOAT).
      This patch makes RMA work correctly with pair basic types.

      There are two bugs: (1) when performing the ACC computation, the
      original implementation uses 'eltype' in the datatype structure,
      which is set when all basic elements in the datatype have the
      same builtin datatype. When basic elements have different builtin
      datatypes, as in pair datatypes, 'eltype' is set to
      MPI_DATATYPE_NULL, so the ACC computation cannot work with pair
      types; (2) for all basic-type data, the original implementation
      assumes the data is contiguous and issues it unpacked, with a
      length equal to the data size (count*type_size). This is incorrect
      for pair datatypes, because most pair datatypes are non-contiguous
      (type_extent != type_size).

      In the previous patch, we already made 'eltype' store the basic
      type instead of the builtin type. This patch fixes the bugs by
      (1) modifying the ACC computation to treat 'eltype' as a basic
      type; (2) using the noncontig API for non-contiguous basic-type
      data, so that it is issued in a packed manner.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
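Why a pair type is usually non-contiguous can be seen from plain C struct layout. The sketch below does not use MPI or the MPICH noncontig API; `double_int_t`, `PAIR_SIZE`, `PAIR_EXTENT` and `pack_pairs` are hypothetical names, and the struct is only assumed to mirror the layout of a double/int pair.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* C layout commonly associated with a double/int pair type. */
typedef struct {
    double val;
    int    loc;
} double_int_t;

/* Payload bytes per element: the two members without padding. */
#define PAIR_SIZE   (sizeof(double) + sizeof(int))
/* Stride between consecutive elements, padding included. */
#define PAIR_EXTENT (sizeof(double_int_t))

/* Pack 'count' pairs into a contiguous buffer of count*PAIR_SIZE bytes. */
size_t pack_pairs(const double_int_t *src, size_t count, unsigned char *dst)
{
    size_t off = 0;
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < count; i++) {
        memcpy(dst + off, &src[i].val, sizeof(double));
        off += sizeof(double);
        memcpy(dst + off, &src[i].loc, sizeof(int));
        off += sizeof(int);
    }
    return off; /* number of packed bytes written */
}
```

On platforms where the struct is padded, `PAIR_EXTENT` exceeds `PAIR_SIZE`, so sending `count*PAIR_SIZE` bytes straight from the array would transmit padding and truncate the last elements. Packing first avoids that.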
    • Make 'eltype' in datatype struct store basic type. · 67b69b2a
      Xin Zhao authored and Pavan Balaji committed

      'eltype' in the datatype struct was originally used to store the
      builtin datatype. However, this is not correct for RMA ACC-like
      operations, since they need to work with the basic type.

      In this patch we make 'eltype' store the basic type.
      Note that (1) whenever we need the builtin type, we should
      call the macro MPID_Datatype_get_basic_type instead of directly
      accessing 'eltype'; (2) 'element_size' and 'n_elements' still
      represent the builtin type, whereas 'eltype' represents the
      basic type.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Modify macro PAIRTYPE_SIZE_EXTENT to accept correct arguments. · 49dd90f4
      Xin Zhao authored and Pavan Balaji committed

      The original implementation of PAIRTYPE_SIZE_EXTENT is not
      correct because it modifies variables from the enclosing scope
      without the caller passing them in. This patch adds those
      variables to the argument list.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
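A macro that names every variable it writes might look like the sketch below. It is not the MPICH definition of PAIRTYPE_SIZE_EXTENT: `PAIR_SIZE_EXTENT`, its parameter list and `float_int_padding` are hypothetical and only illustrate the idea of passing the outputs explicitly.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical macro, not the MPICH definition: every variable it writes
 * is named in the argument list, so the call site shows what changes.
 */
#define PAIR_SIZE_EXTENT(type1_, type2_, size_, extent_)          \
    do {                                                          \
        struct { type1_ a; type2_ b; } pair_layout_;              \
        (size_)   = sizeof(type1_) + sizeof(type2_);              \
        (extent_) = sizeof(pair_layout_);                         \
    } while (0)

/* Example caller: the outputs are ordinary locals passed explicitly. */
size_t float_int_padding(void)
{
    size_t size, extent;
    PAIR_SIZE_EXTENT(float, int, size, extent);
    assert(extent >= size);
    return extent - size;
}
```

A macro that silently assigned to variables called `size` and `extent` in the caller's scope would compile only where those exact names exist, and would hide the side effect from anyone reading the call.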
    • Xin Zhao's avatar
      7899a602
    • Xin Zhao's avatar
    • Simplify code: deleting derived DT code for op piggybacked with LOCK. · 2317b31d
      Xin Zhao authored

      We piggyback the LOCK flag with operations that do not use
      derived datatypes. Therefore, we delete the unnecessary code
      that deals with derived datatypes in the piggyback LOCK code.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Simplify code: not using flag MPIDI_CH3_PKT_FLAG_RMA_IMMED_RESP for GACC/FOP. · 344bf958
      Xin Zhao authored

      The flag MPIDI_CH3_PKT_FLAG_RMA_IMMED_RESP tells the target
      whether the response packet of the current GET, GACC or FOP should
      use the IMMED packet type. We use the IMMED packet type only when
      the origin/target/result datatypes are all basic types.
      Since the target does not know the origin/result datatypes, the
      origin process needs to set a flag to inform the target.

      However, this usage is redundant for GACC and FOP packets. When
      we use the IMMED packet type for GACC/FOP packets, the
      origin/target/result datatypes must all be basic types, so the
      response packets must use the IMMED packet type as well, and
      MPIDI_CH3_PKT_FLAG_RMA_IMMED_RESP and its related code are not
      necessary. In short, the flag MPIDI_CH3_PKT_FLAG_RMA_IMMED_RESP
      is useful only for the GET operation.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Use function hook instead of function pointer for win_free. · 42b5fcf1
      Xin Zhao authored

      The original implementation of win_free is not correct. The
      problem is as follows:

      It uses a function pointer that is initially set to the CH3
      implementation and can be overridden by the channel layer if
      the channel provides a specific implementation. The CH3
      win_free implementation first checks that all RMA
      communication is finished and the epoch states are reset, then
      performs a global barrier, then frees the window resources
      that were allocated in CH3, and finally returns. The Nemesis
      win_free implementation directly frees the window resources
      that were allocated in Nemesis, and calls the CH3 win_free last.
      This is wrong because the window resources are freed before
      we check whether the RMA communication has completed.

      To fix this issue, we add a function hook for the channel layer
      to free its own resources; the function hook is called from
      the CH3 win_free.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
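The ordering that the win_free hook enforces can be sketched as follows. All names here (`win_t`, `win_hooks_t`, `generic_win_free`, `channel_win_free_hook`) are hypothetical stand-ins, and returning an error on pending operations is a simplification for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical window object; the fields are illustrative only. */
typedef struct win {
    int pending_ops;       /* outstanding RMA operations */
    int channel_res_freed; /* set by the channel hook */
    int generic_res_freed; /* set by the generic layer */
} win_t;

/* Hook table filled in by the channel layer; entries may be NULL. */
typedef struct {
    int (*win_free)(win_t *win);
} win_hooks_t;

win_hooks_t win_hooks = { NULL };

/* Channel-specific cleanup, registered as a hook. */
int channel_win_free_hook(win_t *win)
{
    win->channel_res_freed = 1;
    return 0;
}

/* Generic win_free: completion check first, then hook, then own cleanup. */
int generic_win_free(win_t *win)
{
    assert(win != NULL);
    if (win->pending_ops != 0)
        return -1; /* simplification: a real implementation would wait */
    /* (a global barrier would go here) */
    if (win_hooks.win_free != NULL) {
        int rc = win_hooks.win_free(win);
        if (rc != 0)
            return rc;
    }
    win->generic_res_freed = 1;
    return 0;
}
```

Because the channel code runs as a callee of the generic routine rather than as a replacement for it, the channel cannot release its resources before the completion check has passed.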
    • Allow the channel layer to implement Win_gather_info function. · 9dbcae0c
      Xin Zhao authored

      In this patch, we first add a function pointer for Win_gather_info
      in CH3 to allow different channel layers to implement their own
      version of the Win_gather_info function. The function pointer is
      initially set to the default implementation in the CH3 layer. If
      the channel layer provides an implementation of Win_gather_info,
      it overrides the function pointer.

      Secondly, we provide an implementation of Win_gather_info in the
      Nemesis layer. In this implementation, we allocate basic_info_table[]
      in the SHM region, so that processes on the same node can share the
      same basic_info_table[].
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
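The default-plus-override pattern for Win_gather_info can be sketched like this. The function signature, `win_fns_t`, and the placeholder return values are assumptions made for the illustration; they do not reflect the real CH3 function table.

```c
#include <assert.h>
#include <stddef.h>

/* Signature shared by every implementation (hypothetical). */
typedef int (*gather_info_fn)(int comm_size);

/* Default implementation provided by the generic layer. */
int default_gather_info(int comm_size)
{
    return comm_size; /* placeholder result: one private entry per process */
}

/* Channel implementation, e.g. one that shares a table per node. */
int channel_gather_info(int comm_size)
{
    (void) comm_size;
    return 1; /* placeholder result: one shared table */
}

/* Function table; the pointer starts at the default. */
typedef struct {
    gather_info_fn gather_info;
} win_fns_t;

void win_fns_init(win_fns_t *fns)
{
    assert(fns != NULL);
    fns->gather_info = default_gather_info;
}

/* Called by a channel that wants to replace the default. */
void channel_override(win_fns_t *fns)
{
    assert(fns != NULL);
    fns->gather_info = channel_gather_info;
}
```

Callers always go through `fns->gather_info`, so they do not need to know whether a channel replaced the default.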
    • Add a function hook to initialize window attributes in channel layer. · 7c1a8fb1
      Xin Zhao authored

      Some window attributes in the channel layer need to be
      initialized during window creation. In this patch, we first
      add a win_hooks table that contains pointers to the channel's
      implementations of the function hooks. Secondly, we add a
      function hook 'win_init' to allow the channel layer to
      initialize its own attributes. The hook is called from the
      CH3 win_init function.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Reduce size of shm_base_addrs[] from comm_size to node_size. · eddd8b91
      Xin Zhao authored

      For a given process, shm_base_addrs[] stores the base addresses
      (in the address space of this process) of the SHM windows
      of other processes. Its original size is comm_size. However,
      this process can access at most node_size SHM windows, not
      comm_size, so memory is wasted because most slots in the array
      are NULL. In this patch we reduce the size of shm_base_addrs[]
      from comm_size to node_size.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
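Shrinking shm_base_addrs[] to node_size entries implies indexing it by a node-local rank rather than by the communicator rank. The sketch below assumes a rank translation array (`node_rank_of`, a hypothetical name) is available; how MPICH obtains that mapping is not described in the commit message.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-window SHM bookkeeping, indexed by node-local rank. */
typedef struct {
    int    comm_size;
    int    node_size;
    int   *node_rank_of;     /* comm rank -> node-local rank, or -1 if remote */
    void **shm_base_addrs;   /* node_size entries instead of comm_size */
} shm_win_t;

/* node_rank_of must have comm_size entries and outlive the window. */
int shm_win_init(shm_win_t *win, int comm_size, int node_size, int *node_rank_of)
{
    assert(win != NULL && node_rank_of != NULL);
    assert(node_size > 0 && node_size <= comm_size);
    win->comm_size = comm_size;
    win->node_size = node_size;
    win->node_rank_of = node_rank_of;
    win->shm_base_addrs = calloc((size_t) node_size, sizeof(void *));
    return win->shm_base_addrs != NULL ? 0 : -1;
}

/* Look up the base address for a comm rank; NULL if it is on another node. */
void *shm_base_of(const shm_win_t *win, int comm_rank)
{
    assert(comm_rank >= 0 && comm_rank < win->comm_size);
    int local = win->node_rank_of[comm_rank];
    return local < 0 ? NULL : win->shm_base_addrs[local];
}

void shm_win_free(shm_win_t *win)
{
    free(win->shm_base_addrs);
    win->shm_base_addrs = NULL;
}
```

With 4 ranks of which 2 share a node, the table holds 2 pointers instead of 4; the saving grows with the ratio of comm_size to node_size.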
    • Store window basic attributes into a struct on window. · 9404e953
      Xin Zhao authored

      In this patch, we gather the basic window attributes of other
      processes (base_addr, size, disp_unit, win_handle) into a
      table of structs called "basic_info_table". This lets us
      store them in one contiguous memory region.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
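A table of per-process attribute structs could be laid out as in the sketch below. The field names come from the commit message; the field types, `basic_info_t`, `basic_info_table_alloc` and `target_offset` are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Field names follow the commit message; the types are assumptions. */
typedef struct {
    void  *base_addr;
    size_t size;
    int    disp_unit;
    int    win_handle;
} basic_info_t;

/* One allocation holds the attributes of all comm_size processes. */
basic_info_t *basic_info_table_alloc(int comm_size)
{
    assert(comm_size > 0);
    return calloc((size_t) comm_size, sizeof(basic_info_t));
}

/* Byte offset inside the target window for a displacement given in units. */
size_t target_offset(const basic_info_t *table, int target_rank, size_t disp)
{
    assert(table != NULL && target_rank >= 0);
    return disp * (size_t) table[target_rank].disp_unit;
}
```

Keeping the four attributes of a rank next to each other means one lookup touches one table entry, instead of four separate arrays.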
    • Change name of MPIDI_CH3U_Win_create_gather to MPIDI_CH3U_Win_gather_info. · 131e06ef
      Xin Zhao authored

      The function MPIDI_CH3U_Win_create_gather exchanges window
      information among processes. It does not create a new window.
      Here we change the function name to a more suitable one.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Add CH3 APIs and macros to allow channel to implement Alloc_mem/Free_mem. · 03d4c77b
      Xin Zhao authored

      Originally, MPIDI_Alloc_mem(size, info) and MPIDI_Free_mem(base_ptr)
      in the CH3 layer are implemented by calling MPIU_Malloc(size) and
      MPIU_Free(base_ptr) internally. This prevents a channel from
      providing a hardware-specific implementation of Alloc_mem and
      Free_mem, which is necessary when registering memory for RDMA
      operations.

      This patch defines new APIs, MPIDI_CH3I_Alloc_mem(size, info)
      and MPIDI_CH3I_Free_mem(base_ptr), to allow channels to implement
      their own memory allocators. If the channel does not have its own
      implementation, MPICH falls back to the default implementation
      in the CH3 layer, which uses MPIU_Malloc and MPIU_Free.

      Thanks to Steffen Christgau <christgau@cs.uni-potsdam.de> for
      this contribution.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
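One common way to provide such a fallback is a pair of overridable macros, sketched below. The macro names `CHANNEL_ALLOC_MEM` and `CHANNEL_FREE_MEM` and the wrapper functions are hypothetical, the `info` argument is omitted, and plain `malloc`/`free` stand in for MPIU_Malloc/MPIU_Free; the commit message does not show the actual mechanism.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * A channel header could define CHANNEL_ALLOC_MEM / CHANNEL_FREE_MEM
 * (hypothetical names) before this point to plug in its own allocator.
 */
#ifndef CHANNEL_ALLOC_MEM
#define CHANNEL_ALLOC_MEM(size_) malloc(size_)   /* default fallback */
#endif
#ifndef CHANNEL_FREE_MEM
#define CHANNEL_FREE_MEM(ptr_)   free(ptr_)      /* default fallback */
#endif

/* Generic entry points always go through the macros. */
void *generic_alloc_mem(size_t size)
{
    assert(size > 0);
    return CHANNEL_ALLOC_MEM(size);
}

void generic_free_mem(void *ptr)
{
    CHANNEL_FREE_MEM(ptr);
}
```

A channel that needs registered memory for RDMA would define both macros to its own routines; everything else compiles against the defaults unchanged.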
  2. 03 Mar, 2015 1 commit
  3. 27 Feb, 2015 1 commit
  4. 26 Feb, 2015 5 commits
  5. 13 Feb, 2015 18 commits
    • Don't check for anysource if not recv · 3b04f6c0
      Wesley Bland authored

      The function that checks whether an operation is an anysource
      receive was checking all request kinds, even those that are not
      receives. This patch limits the check to receives only, to avoid
      examining an uninitialized variable.
      Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
    • PAMID: Initial CUDA support · d9c15cf3
      Sameh Sharkawi authored

      This is an initial, limited implementation of CUDA support. It is
      not performance optimized and is intended only for testing.

      (ibm) D202477
      Signed-off-by: Su Huang <suhuang@us.ibm.com>
    • Delete comments that no longer make sense. · 21126e9e
      Xin Zhao authored

      The comments are no longer relevant to the
      new RMA infrastructure.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Delete unnecessary code. · e3ccad1f
      Xin Zhao authored

      req->dev.user_count is used when receiving FOP/CAS response
      data on the origin in PktHandler_FOPResp and PktHandler_CASResp.
      Since the count is always 1, we do not set rma_op->result_count;
      instead we set req->dev.user_count to 1 directly in the packet
      handlers.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Simplify code of issuing RMA packets. · e3fc7e70
      Xin Zhao authored

      When issuing RMA packets, we do not need to
      store target_win_handle in the request on the
      origin side; we only need to store source_win_handle,
      because when the response data comes back, we
      only need source_win_handle on the origin
      side. This patch simplifies the code accordingly.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Remove source_win_handle from GET-like RMA packets. · 80a71e11
      Xin Zhao authored

      For GET-like RMA packets and response packets (GACC,
      GET, FOP, CAS, GACC_RESP, GET_RESP, FOP_RESP, CAS_RESP),
      we originally carried source_win_handle in the packet struct
      in order to locate the window handle on the origin side in the
      packet handler of the response packets. However, this is
      not necessary because source_win_handle can be stored
      in the request on the origin side. This patch deletes
      source_win_handle from those packets to reduce the size
      of the packet union.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Xin Zhao's avatar
    • Bug-fix: use do_accumulate_op function for ACC computation. · c8ecef8d
      Xin Zhao authored

      do_accumulate_op() handles more cases of the ACC
      computation than the OP function. For example, MPI_REPLACE
      is not defined as a predefined computation and is therefore
      not handled by the OP function, but it is safely handled
      in do_accumulate_op(). This patch replaces the OP function
      with do_accumulate_op() on the target side.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Use memcpy for structure assignment. · 59afc29c
      Xin Zhao authored

      In this patch we replace "=" with the memcpy function
      when assigning the content of one struct to another.
      Using "=" in this case is not compatible with the llvm
      compiler.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
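The memcpy substitution can be sketched on a made-up struct. `pkt_t` and `pkt_copy` are hypothetical; the commit message does not say which structs were involved or what the incompatibility was, so the sketch shows only the mechanical change.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical packet header, used only for illustration. */
typedef struct {
    int  type;
    int  flags;
    char payload[16];
} pkt_t;

/* Copy one packet into another with memcpy instead of "*dst = *src". */
void pkt_copy(pkt_t *dst, const pkt_t *src)
{
    assert(dst != NULL && src != NULL && dst != src);
    memcpy(dst, src, sizeof(*dst));
}
```

Both forms copy the same bytes for a plain struct like this one. memcpy requires that the two objects do not overlap, which the assertion checks for the trivial case of identical pointers.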
    • Change argument of function finish_op_on_target. · 1b30ab19
      Xin Zhao authored

      In this patch, we replace one argument of the function
      finish_op_on_target, "packet(op) type", with "has_response_data".
      finish_op_on_target does not care which specific
      packet(op) type it is processing; it only cares
      whether the current op has response data (like GET/GACC).
      Changing the argument in this way simplifies the
      code by avoiding the need to look up the packet(op) type
      every time before calling finish_op_on_target.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Add asserts for RMA packet types. · 21479b00
      Xin Zhao authored

      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Rewrite code of piggybacking IMMED data with RMA packets. · de9d0f21
      Xin Zhao authored

      Originally we added "immed_data" and "immed_len" areas to RMA packets
      in order to piggyback a small amount of data with the packet header
      and reduce the number of packets (note that "immed_len" is necessary
      when the piggybacked data is not the entire data). However, those
      areas potentially increase the packet union size and degrade
      two-sided communication. This patch fixes this issue.

      In this patch, we remove "immed_data" and "immed_len" from the normal
      "MPIDI_CH3_Pkt_XXX_t" operation type (e.g. MPIDI_CH3_Pkt_put_t), and
      we introduce a new "MPIDI_CH3_Pkt_XXX_immed_t" packet type for each
      operation (e.g. MPIDI_CH3_Pkt_put_immed_t).

      "MPIDI_CH3_Pkt_XXX_immed_t" is used when (1) both origin and target
      are basic datatypes, AND (2) the data to be sent fits entirely
      into the header. As a result, "MPIDI_CH3_Pkt_XXX_immed_t" needs the
      "immed_data" area but can drop the "immed_len" area. Also, since it
      only works with a basic target datatype, it can drop the
      "dataloop_size" area as well. All operations that fail (1) or (2)
      use the normal "MPIDI_CH3_Pkt_XXX_t" type.

      Originally we always piggybacked FOP data into the packet header,
      which made the packet size too large. In this patch we split the
      FOP operation into IMMED packets and normal packets.

      Because CAS only works with 2 basic datatype and non-complex
      elements, the data amount is relatively small, so we always piggyback
      the data with the packet header and only use the
      "MPIDI_CH3_Pkt_XXX_immed_t" packet type for CAS.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
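The choice between an IMMED packet and a normal packet can be sketched as a small decision routine. For brevity the sketch uses a single header struct with a kind tag, whereas the commit introduces separate packet types; `pkt_hdr_t`, `pkt_prepare` and the 16-byte `IMMED_DATA_BYTES` capacity are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capacity of the immediate-data area in a packet header. */
#define IMMED_DATA_BYTES 16

typedef enum { PKT_NORMAL, PKT_IMMED } pkt_kind_t;

typedef struct {
    pkt_kind_t    kind;
    unsigned char immed_data[IMMED_DATA_BYTES]; /* used only for PKT_IMMED */
} pkt_hdr_t;

/* Returns the number of bytes that still have to be sent after the header. */
size_t pkt_prepare(pkt_hdr_t *hdr, int origin_is_basic, int target_is_basic,
                   const void *data, size_t data_bytes)
{
    assert(hdr != NULL && data != NULL);
    if (origin_is_basic && target_is_basic && data_bytes <= IMMED_DATA_BYTES) {
        /* Conditions (1) and (2) hold: the whole payload rides in the header. */
        hdr->kind = PKT_IMMED;
        memcpy(hdr->immed_data, data, data_bytes);
        return 0;
    }
    /* Otherwise the header carries no data and the payload follows it. */
    hdr->kind = PKT_NORMAL;
    return data_bytes;
}
```

Because an IMMED packet always carries the entire payload, the receiver can derive the length from the count and the basic datatype, which is why no separate length field is needed in that case.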
    • Code-refactoring for RMA operations routines. · 3a017faa
      Xin Zhao authored

      This patch only refactors the RMA operation routines
      to make the code structure clearer. It does not change any
      functionality.

      After the refactoring, each operation routine handles non-SHM
      operations in the following order:

      (1) allocate a new op struct;
      (2) fill areas in op struct, except for packet struct in op struct;
      (3) initialize packet struct in op struct, fill areas in packet struct;
      (4) enqueue op to data structure on window.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Remove lock_type and origin_rank areas from RMA packet. · 81e2b274
      Xin Zhao authored

      Originally we added lock_type and origin_rank areas
      to the RMA packet in order to piggyback passive lock requests
      with RMA operations. However, those areas potentially
      enlarge the packet union size, and they are actually
      not necessary and can be avoided completely.

      "Lock_type" is used to remember what type of lock (shared or
      exclusive) the origin wants to acquire on the target. To remove
      it from the RMA packet, we use the flags (which already exist in
      the RMA packet) to carry this information.

      "Origin_rank" is used to remember which origin has sent a lock
      request to the target, so that when the lock is later granted to
      this origin, the target can send an ack to it. In fact the target
      does not need to store origin_rank; it only needs to store
      origin_vc, which is known from the progress engine on the target
      side. Therefore, we can completely remove origin_rank from the
      RMA packet.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
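Carrying the lock type in an existing flags field could look like the sketch below. The flag names and bit values are hypothetical and are not taken from MPICH; the point is only that two spare bits replace a dedicated lock_type field.

```c
#include <assert.h>

/* Hypothetical flag bits; the values are not taken from MPICH. */
enum {
    PKT_FLAG_NONE           = 0,
    PKT_FLAG_LOCK_SHARED    = 1 << 0,
    PKT_FLAG_LOCK_EXCLUSIVE = 1 << 1,
    PKT_FLAG_UNLOCK         = 1 << 2
};

typedef enum { LOCK_NONE, LOCK_SHARED, LOCK_EXCLUSIVE } lock_type_t;

/* Origin side: fold the requested lock type into the packet flags. */
unsigned set_lock_flag(unsigned flags, lock_type_t type)
{
    assert(type == LOCK_SHARED || type == LOCK_EXCLUSIVE);
    return flags | (unsigned) (type == LOCK_SHARED ? PKT_FLAG_LOCK_SHARED
                                                   : PKT_FLAG_LOCK_EXCLUSIVE);
}

/* Target side: recover the lock type from the flags alone. */
lock_type_t get_lock_type(unsigned flags)
{
    if (flags & PKT_FLAG_LOCK_SHARED)
        return LOCK_SHARED;
    if (flags & PKT_FLAG_LOCK_EXCLUSIVE)
        return LOCK_EXCLUSIVE;
    return LOCK_NONE;
}
```

Other flag bits are left untouched, so a packet can request a lock and carry unrelated flags at the same time.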
    • Add comments about RMA packet wrappers. · d46b848a
      Xin Zhao authored

      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Modify packet wrappers to make them complete. · 064e60ce
      Xin Zhao authored

      Some packet wrappers did not include all packet types;
      this patch adds the missing packet types to those wrappers.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Re-apply modifications on mpidpkt.h. · fa958833
      Xin Zhao authored
      This patch re-applies the modifications to mpidpkt.h that were
      temporarily reverted in bb3f9623.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>
    • Revert "Code-refactor: arrange RMA pkt structure." · 2cbc9180
      Xin Zhao authored
      This reverts commit 389aab16.
      Signed-off-by: Pavan Balaji <balaji@anl.gov>