      Modify SHM_Win_allocate_shared and SHM_Win_free to accommodate global comm.
      MPIDI_CH3I_Win_allocate_shared can be called by both MPI_Win_allocate_shared
      and MPI_Win_allocate. When it is called by MPI_Win_allocate, we need node_comm,
      node_sizes and node_shm_base_addrs to allocate the shm segment region, so we
      copy win_ptr->sizes into node_sizes at the beginning and copy node_shm_base_addrs
      into win_ptr->shm_base_addrs at the end. When it is called by
      MPI_Win_allocate_shared, these copies can be eliminated.
      If there is only one process on this node, node_comm is NULL and we use comm_self instead.
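
      The copy-in/copy-out step described above can be sketched in plain C as
      follows. This is only an illustration, not the code in this commit: the
      helper names, the node_to_global rank map, and the use of size_t in place
      of the window's real size type are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: gather the sizes of the node-local processes out of
 * the window-wide sizes array. node_to_global[i] is assumed to hold the rank,
 * in the window's communicator, of the process with rank i in node_comm. */
static void copy_sizes_to_node(const size_t *win_sizes,
                               const int *node_to_global,
                               int node_size, size_t *node_sizes)
{
    for (int i = 0; i < node_size; i++) {
        assert(node_to_global[i] >= 0);
        node_sizes[i] = win_sizes[node_to_global[i]];
    }
}

/* Hypothetical helper: scatter the node-local base addresses back into the
 * window-wide array. Entries for off-node processes are left untouched. */
static void copy_base_addrs_to_win(void *const *node_base_addrs,
                                   const int *node_to_global,
                                   int node_size, void **win_base_addrs)
{
    for (int i = 0; i < node_size; i++) {
        win_base_addrs[node_to_global[i]] = node_base_addrs[i];
    }
}
```

      When the caller is MPI_Win_allocate_shared, the window communicator is
      already node-local, so in this sketch the map would be the identity and
      both copies could be skipped.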
      Waiting for passive RMA operations to finish in MPIDI_CH3_SHM_Win_free.
      This supports the optimization of allocating a shared memory region in
      MPI_Win_allocate. In this case MPIDI_CH3_SHM_Win_free must first wait
      for passive RMA operations to finish before freeing the shared memory region.
      Note that because MPIDI_CH3_SHM_Win_free calls MPIDI_Win_free at the end,
      and MPIDI_Win_free also calls the inline function that waits for passive
      RMA operations to finish, in this case the inline function will be called
      Refactoring the code that waits for passive RMA operations.
      Moving the code that waits for passive RMA operations to finish from MPIDI_Win_free
      to an inline function in mpidrma.h.
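
      A toy model of the resulting call structure is sketched below. Every name
      in it (toy_win, toy_progress_poke, toy_wait_passive_ops and the two free
      functions) is hypothetical, and the progress engine is reduced to a
      counter; the real helper in mpidrma.h is not reproduced here. The sketch
      only shows that a shared wait helper can be reached from both free paths
      and that the shared memory region is released after the wait.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a window object. */
typedef struct {
    int outstanding_passive_ops; /* passive RMA operations not yet finished */
    bool shm_allocated;          /* true while the shm region is mapped */
} toy_win;

/* Hypothetical stand-in for the progress engine: finishes one operation. */
static inline void toy_progress_poke(toy_win *win)
{
    if (win->outstanding_passive_ops > 0) {
        win->outstanding_passive_ops--;
    }
}

/* Shared inline helper: returns once no passive operation is outstanding.
 * Calling it again on a drained window returns immediately. */
static inline void toy_wait_passive_ops(toy_win *win)
{
    while (win->outstanding_passive_ops > 0) {
        toy_progress_poke(win);
    }
}

/* Generic free path: waits, then would release the remaining resources. */
static void toy_win_free(toy_win *win)
{
    toy_wait_passive_ops(win);
}

/* Shared-memory free path: waits before releasing the region, then
 * delegates to the generic path, which reaches the helper a second time. */
static void toy_shm_win_free(toy_win *win)
{
    toy_wait_passive_ops(win);
    win->shm_allocated = false;
    toy_win_free(win);
}
```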
