1. 26 Mar, 2014 36 commits
    • feature request for pread/pwrite · 3e8d8114
      Rob Latham authored
      as with lustre, pread/pwrite need a feature level newer than
      --enable-strict requests.  autoconf already checked for the function at
      configure time, so we know it's there.  TODO: more robust autoconf
      checks and provide a "my_pwrite" that wraps lseek/write for a fallback.
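The TODO above could be sketched roughly as follows. This is a minimal illustration, not ROMIO code: `my_pwrite` takes its name from the commit message, but its body, the `my_pwrite_demo` helper, and the choice of `_XOPEN_SOURCE 500` as the feature level are all assumptions.

```c
#define _XOPEN_SOURCE 500 /* assumed feature level; exposes pread/pwrite and mkstemp */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fallback: emulate pwrite() with lseek() + write().
 * Unlike the real pwrite(), this is not atomic with respect to the
 * file offset, so it is unsafe if threads share the descriptor. */
static ssize_t my_pwrite(int fd, const void *buf, size_t count, off_t offset)
{
    assert(buf != NULL || count == 0);

    off_t saved = lseek(fd, 0, SEEK_CUR); /* remember the current offset */
    if (saved == (off_t)-1)
        return -1;
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1;

    ssize_t written = write(fd, buf, count);

    /* pwrite() leaves the file offset unchanged, so put it back */
    if (lseek(fd, saved, SEEK_SET) == (off_t)-1)
        return -1;
    return written;
}

/* Small self-check: overwrite two bytes in the middle of a temp file.
 * Returns 0 on success, -1 on any failure. */
static int my_pwrite_demo(void)
{
    char path[] = "/tmp/my_pwrite_XXXXXX";
    char buf[9] = {0};
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    int ok = write(fd, "aaaaaaaa", 8) == 8
          && my_pwrite(fd, "XY", 2, 3) == 2
          && lseek(fd, 0, SEEK_CUR) == 8 /* offset was restored */
          && lseek(fd, 0, SEEK_SET) == 0
          && read(fd, buf, 8) == 8
          && strcmp(buf, "aaaXYaaa") == 0;

    close(fd);
    unlink(path);
    return ok ? 0 : -1;
}
```

The wrapper costs three extra system calls per write, which is why it only makes sense as a fallback when configure cannot find a real pwrite.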
    • clean up P2Pcontig warnings · 3e37d5b4
      Rob Latham authored
      P2Pcontig needed a prototype and clang complained about several shadowed
    • blksize_t not available under enable-strict · f0617bdc
      Rob Latham authored
    • disable aio on BlueGene · e567464f
      Rob Latham authored
      until we figure out what's up with aio routines on blue gene, let's
      just disable it.  the romio aio tests would just hang in aio_suspend
    • Replace ADIOI_GPFS_assert with ADIOI_Assert · 019b4218
      Paul Coffman authored and Rob Latham committed
    • fixed configuration bug so gpfs-only build does not include bluegene files · e9611507
      Paul Coffman authored and Rob Latham committed
    • further lockless removal/doc fixes · b991dc7c
      Paul Coffman authored and Rob Latham committed
      remove gpfs-specific shared fp call
      further lockless removal
      documentation fixups
    • remove ad_bgl and ad_bglockless directories · 8fa2b391
      Paul Coffman authored and Rob Latham committed
    • bg to gpfs new files · 88ccf467
      Paul Coffman authored and Rob Latham committed
    • ad_bg to ad_gpfs major reorganization · d4b3106d
      Paul Coffman authored and Rob Latham committed
      reconfiguration changes from bg to gpfs with platformspec; removal of
    • makefile and autoconf changes · bc1ae637
      Paul Coffman authored and Rob Latham committed
    • move ad_bg to ad_gpfs files and directories · 614819fd
      Paul Coffman authored and Rob Latham committed
    • romio configure: if/fi unbalanced · 283629cd
      Rob Latham authored
      astonishingly, the blue gene L(!) condition lacked a closing 'fi' but we
      never noticed since async I/O never worked on blue gene.  Use the AS_IF
      macro to make this less likely to recur in the future.
    • Significant simplification of ad_bg_open · e8b5dfdb
      Rob Latham authored
      In order to accommodate deferred open, we can't do *any* collective
      operations in ad_bg_open.  Any collectives have to happen one level up
      at ADIOI_GEN_Opencoll.
      We already promoted fs blksize in a prior patch, and simplified
      "scalable sync" in another patch, so when we remove the collective call
      (bcast of blocksize and fs type), we can also remove the "is it ok to
      scalable sync?" (because it will always be ok) and the "are we an
      fsync-aggregator" logic because now only the first io aggregator will be
      such an aggregator.
    • Rework "scalable flush" logic · 030fd0f1
      Rob Latham authored
      If deferred open is enabled, the logic that says if we should do a
      scalable flush and which processes should do the flush won't propagate
      to the non-aggregator processes.  Replace old way of doing things with a
      simpler stat-from-first-aggregator approach.
    • additional broadcast in open for blocksize · 1ce0fe81
      Rob Latham authored
      some file systems (e.g. bluegene) might stat the file and wish to inform
      all processes about some bit of underlying file system information (e.g.
      blocksize).  In the deferred open case, not all processes participate in
      the lowest, fs-specific open, so let's broadcast here in common code.
    • Promote blocksize to ADIOI_FileD struct · fdc4cb6f
      Rob Latham authored
      "file system blocksize" seems like one of those generic-enough values we
      should keep track of in the ADIOI_FileD structure.  This promotion will
      make some deferred-open fixes easier, too.
    • option to read/write to /dev/null · 87102f40
      Rob Latham authored
      Useful for situations like evaluating various collective I/O approaches.
      Reading/writing /dev/null eliminates file system variability.
    • balancecontig: topology-aware aggregator selection · 35d0c5b4
      Paul Coffman authored and Rob Latham committed
      Two features in this change:
      - selection of file domains can result in some i/o nodes with more work
        than others (or some with no work at all), so distribute file domains
        with some awareness of i/o nodes
      - since we have some awareness of I/O nodes, select processes that are
        closest to those i/o nodes.
    • additional logging information · 917af7dc
      Rob Latham authored
      robl's got a one-off logger.  can pass extra information to it with an
      environment variable.  probably not useful in general.
    • subset peer-to-peer two phase · 7ec40e90
      Paul Coffman authored and Rob Latham committed
      For certain workloads, MPI processes will only speak to one aggregator.
      In those cases, we will restrict communication to just point-to-point
      among those processes and their aggregator.  Sometimes called
      "p2pcontig" optimization.
    • deferred open fixup: broadcast from correct root · a19edd23
      Rob Latham authored
      in deferred open case, we will have created an "aggregator communicator"
      consisting of i/o aggregators.  the 'ranklist' enumerates ranks in
      fd->comm, but is not meaningful in the aggregator communicator.
      likewise, we do not simply broadcast from '0' in the no-deferred-open
      case because rank 0 might not be an aggregator.
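The root-selection rule described above might look something like the sketch below. `bcast_root` is a hypothetical helper, not a ROMIO function, and it assumes that the first aggregator is rank 0 of the aggregator communicator; the commit message does not say so explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not ROMIO's actual code.
 * ranklist[] holds the aggregator ranks, numbered as in fd->comm. */
static int bcast_root(int deferred_open, const int *ranklist, size_t naggs)
{
    assert(ranklist != NULL && naggs > 0);

    if (deferred_open) {
        /* Broadcasting over the aggregator communicator: ranklist values
         * are fd->comm ranks and mean nothing here.  Assumed: the first
         * aggregator is rank 0 of that communicator. */
        return 0;
    }

    /* Broadcasting over fd->comm: rank 0 might not be an aggregator,
     * so use the first aggregator's rank instead. */
    return ranklist[0];
}
```

With aggregators at fd->comm ranks 4, 8 and 12, this picks root 4 for a broadcast over fd->comm and root 0 for a broadcast over the aggregator communicator.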
    • romio-timing: even finer-grained timing · dde97df0
      Rob Latham authored
    • Two-phase I/O with threaded write · 0a437100
      Rob Latham authored
      Experimental async-with-pthread I/O approach to hiding some of the I/O
      latency/variability from the two-phase collectives.
      heavily modified from Paul Coffman's (pkcoffman@us.ibm.com) original work
    • coll_perf buffer size too small · d1e292ca
      Rob Latham authored
      crank up the size of coll_perf to something not laughably small
    • remove extraneous locks in bluegene driver · da9d3398
      Rob Latham authored
      The only reason these locks exist is because way back in BGL days someone
      at IBM thought it might be a good idea to have one driver that could
      access both NFS and GPFS.  There was also some concern about a large
      write call getting split up by the i/o forwarder.  fortunately, MPI-IO
      semantics mean applications that would be harmed by such a split already
      face "undefined" behavior.
    • Allocate two-phase buffer outside write path · 5e34974e
      Rob Latham authored
      There are many memory allocations in the write path.  Allocating the
      two-phase intermediate buffer outside of the write path might on some
      systems make a small difference, especially if there are many collective
      I/O calls, or if the system (like Blue Gene) has a small amount of
      memory.  Modified from Paul Coffman <pkcoff@us.ibm.com>'s original idea.
    • remove unneeded barrier · 6ca13e5d
      Rob Latham authored
      For quite some time the barrier here has had the comment 'Why?'.  Since
      no one knows, and there are plenty of other synchronization points in
      this path, remove it.
    • bluegene timing: condense into one set of timers · f3a43a5a
      Rob Latham authored
      bluegene timer code had two "levels" of timing.  that seemed kind of
      pointless so lump it all into one level.
    • use pwrite/pread instead of seek+write/read · 5bc8aedc
      Rob Latham authored
      this "new" system call (part of POSIX-2001) saves us a system call on
      Blue Gene.  Seems to get us back 5 seconds for one workload at small
      (half rack) scales.
    • bg-timing: DO NOT MERGE WITH MASTER: time lockless · c97af627
      Rob Latham authored
      bglockless uses the common read/write routines for contig read/writes, so
      bluegene timing infrastructure wasn't actually timing anything.  Since
      this introduces blue gene bits into common code, please do not merge to
      master.  Instead, we should rework all the timing bits so that it no
      longer times "bluegene" but rather all of ROMIO.  Furthermore, the
      locky bits of 'bg:' driver should be yanked anyway, obviating the need
      for bglockless.
    • dust off old Blue Gene timing infrastructure · 751176bc
      Rob Latham authored
      Protected by an 'ifdef', this BGL-era code bitrotted a bit.  clean it up
      and see if it does anything useful today.
      - Removes preprocessor guards: the counters and timers do nothing
        expensive unless environment variables are set
      - remove the idea of a "level"
      - remove barrier from timing collection.
      - bugfix: MPI_Wtime() does not necessarily start at zero, so properly initialize
        timers for collective read/write
      - report only from I/O aggregators.  when reporting "time spent in i/o"
        vs "time spent communicating" it makes more sense to look only at the
        aggregators.  The non-aggregators are going to skew the results
        because they are spending some communication time actually
        communicating, but some of that time blocked, waiting for aggregators
        to finish.
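The MPI_Wtime() bugfix in the list above comes down to always capturing a start reading rather than assuming the clock began at zero. A minimal sketch of that idea follows; the `io_timer` type, the function names, and the `fake_now` clock are all hypothetical, with `fake_now` standing in for MPI_Wtime() so the example needs no MPI library.

```c
#include <assert.h>

typedef double (*clock_fn)(void); /* stand-in for MPI_Wtime */

typedef struct {
    double start; /* clock reading at the most recent timer_start() */
    double total; /* accumulated elapsed seconds */
} io_timer;

/* Record the current clock reading instead of assuming it is zero. */
static void timer_start(io_timer *t, clock_fn now)
{
    assert(t != 0 && now != 0);
    t->start = now();
}

/* Add only the interval since timer_start() to the running total. */
static void timer_stop(io_timer *t, clock_fn now)
{
    assert(t != 0 && now != 0);
    t->total += now() - t->start;
}

/* Fake clock with an arbitrary nonzero epoch; advances 0.5 s per call. */
static double fake_now(void)
{
    static double t = 1.0e6;
    t += 0.5;
    return t;
}
```

Had `start` been left at zero, the first interval would have been reported as roughly a million seconds instead of half a second.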
    • avoid duplicate data in MPIR_proctable · a4b73a8e
      Kenneth Raffenetti authored and Pavan Balaji committed
      De-dupes executable and host names in the MPIR_proctable by pointing
      to an existing copy. Closes #1821
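The de-duplication described above can be pictured with the sketch below. It is an illustration under assumptions, not the committed code: `proc_entry`, `share_if_equal`, and `dedupe_proctable` are hypothetical names, and the quadratic scan is chosen for brevity only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one process-table entry. */
typedef struct {
    const char *host_name;
    const char *executable_name;
    int pid;
} proc_entry;

/* Point *field at candidate when both hold equal text in different storage.
 * Returns 1 if the pointer was redirected, 0 otherwise. */
static int share_if_equal(const char **field, const char *candidate)
{
    if (*field != candidate && strcmp(*field, candidate) == 0) {
        *field = candidate;
        return 1;
    }
    return 0;
}

/* Redirect duplicate strings to the earliest equal copy in the table.
 * Returns the number of pointers redirected.  Freeing the copies that
 * are no longer referenced is left to the caller. */
static size_t dedupe_proctable(proc_entry *tab, size_t n)
{
    size_t shared = 0;
    assert(tab != NULL || n == 0);

    for (size_t i = 1; i < n; i++) {
        int host_done = 0, exe_done = 0;
        for (size_t j = 0; j < i && !(host_done && exe_done); j++) {
            if (!host_done)
                host_done = share_if_equal(&tab[i].host_name, tab[j].host_name);
            if (!exe_done)
                exe_done = share_if_equal(&tab[i].executable_name,
                                          tab[j].executable_name);
        }
        shared += (size_t)(host_done + exe_done);
    }
    return shared;
}
```

For a job with many processes per node running one binary, nearly every entry ends up pointing at a handful of shared strings.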
      Signed-off-by: Pavan Balaji <balaji@mcs.anl.gov>
    • Revert aclocal_cc.m4 commits that tweak VA_ARGS. · 5108fe17
      Pavan Balaji authored
      The following commits are reverted.
      1. "Better checks for VA_ARGS."; commit
      2. "Warning squash for clang."; commit
      The clang warning this was originally trying to solve has been fixed
      by the newer versions of clang, AFAICT.
      Signed-off-by: Huiwei Lu <huiweilu@mcs.anl.gov>
    • Remove more windows files. · 24a01405
      Pavan Balaji authored
      No reviewer.
  2. 25 Mar, 2014 1 commit
  3. 24 Mar, 2014 3 commits