- 26 Mar, 2014 36 commits
-
Rob Latham authored
as with lustre, pread/pwrite need a feature level newer than what --enable-strict requests. autoconf already checked for the function at configure time, so we know it's there. TODO: more robust autoconf checks and provide a "my_pwrite" that wraps lseek/write for a fallback.
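The "my_pwrite" fallback mentioned in the TODO could look roughly like the sketch below. This is an assumption about its shape, not code from the commit: the signature simply mirrors pwrite(), and the error handling is illustrative.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fallback: emulate pwrite() with lseek() + write().
 * Unlike the real pwrite(), this temporarily moves the file offset and is
 * not atomic, so it is unsafe if other threads share the descriptor. */
ssize_t my_pwrite(int fd, const void *buf, size_t count, off_t offset)
{
    /* remember where the descriptor currently points */
    off_t saved = lseek(fd, 0, SEEK_CUR);
    if (saved == (off_t) -1)
        return -1;
    if (lseek(fd, offset, SEEK_SET) == (off_t) -1)
        return -1;

    ssize_t written = write(fd, buf, count);
    int write_errno = errno;    /* keep write()'s errno across the restore */

    /* put the offset back so the caller sees pwrite-like behavior */
    if (lseek(fd, saved, SEEK_SET) == (off_t) -1)
        return -1;

    errno = write_errno;
    return written;
}
```

Because the seek and the write are separate calls, such a wrapper only makes sense where the descriptor is not shared between threads.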
-
Rob Latham authored
P2Pcontig needed a prototype and clang complained about several shadowed declarations.
-
Rob Latham authored
-
Rob Latham authored
until we figure out what's up with aio routines on blue gene, let's just disable it. the romio aio tests would just hang in aio_suspend
-
remove gpfs-specific shared fp call; further lockless removal; documentation fixups
-
reconfiguration changes from bg to gpfs with platformspec; removal of lockless
-
Rob Latham authored
astonishingly, the blue gene L(!) condition lacked a closing 'fi' but we never noticed since async I/O never worked on blue gene. Use the AS_IF macro to make this less likely to recur in the future.
-
Rob Latham authored
In order to accommodate deferred open, we can't do *any* collective operations in ad_bg_open. Any collectives have to happen one level up at ADIOI_GEN_Opencoll. We already promoted fs blksize in a prior patch, and simplified "scalable sync" in another patch, so when we remove the collective call (bcast of blocksize and fs type), we can also remove the "is it ok to scalable sync?" check (because it will always be ok) and the "are we an fsync-aggregator" logic, because now only the first io aggregator will be such an aggregator.
-
Rob Latham authored
If deferred open is enabled, the logic that says if we should do a scalable flush and which processes should do the flush won't propagate to the non-aggregator processes. Replace old way of doing things with a simpler stat-from-first-aggregator approach.
-
Rob Latham authored
some file systems (e.g. bluegene) might stat the file and wish to inform all processes about some bit of underlying file system information (e.g. blocksize). In the deferred open case, not all processes participate in the lowest, fs-specific open, so let's broadcast here in common code.
-
Rob Latham authored
"file system blocksize" seems like one of those generic-enough values we should keep track of in the ADIOI_FileD structure. This promotion will make some deferred-open fixes easier, too.
-
Rob Latham authored
Useful for situations like evaluating various collective I/O approaches. Reading/writing /dev/null eliminates file system variability.
-
Two features in this change: (1) selection of file domains can result in some i/o nodes with more work than others (or some with no work at all), so distribute file domains with some awareness of i/o nodes; (2) since we have some awareness of I/O nodes, select processes that are closest to those i/o nodes.
-
Rob Latham authored
robl's got a one-off logger. can pass extra information to it with an environment variable. probably not useful in general.
-
For certain workloads, MPI processes will only speak to one aggregator. In those cases, we will restrict communication to just point-to-point among those processes and their aggregator. Sometimes called "p2pcontig" optimization.
-
Rob Latham authored
in deferred open case, we will have created an "aggregator communicator" consisting of i/o aggregators. the 'ranklist' enumerates ranks in fd->comm, but is not meaningful in the aggregator communicator. likewise, we do not simply broadcast from '0' in the no-deferred-open case because rank 0 might not be an aggregator.
-
Rob Latham authored
-
Rob Latham authored
Experimental async-with-pthread I/O approach to hiding some of the I/O latency/variability from the two-phase collectives. heavily modified from Paul Coffman's (pkcoffman@us.ibm.com) original work
-
Rob Latham authored
crank up the size of coll_perf to something not laughably small
-
Rob Latham authored
The only reason these locks exist is because way back in BGL days someone at IBM thought it might be a good idea to have one driver that could access both NFS and GPFS. There was also some concern about a large write call getting split up by the i/o forwarder. Fortunately, MPI-IO semantics mean applications that would be harmed by such a split already face "undefined" behavior.
-
Rob Latham authored
There are many memory allocations in the write path. Allocating the two-phase intermediate buffer outside of the write path might on some systems make a small difference, especially if there are many collective I/O calls, or if the system (like Blue Gene) has a small amount of memory. Modified from Paul Coffman <pkcoff@us.ibm.com>'s original idea.
-
Rob Latham authored
For quite some time the barrier here has had the comment 'Why?'. Since no one knows, and there are plenty of other synchronization points in this path, remove it.
-
Rob Latham authored
bluegene timer code had two "levels" of timing. that seemed kind of pointless so lump it all into one level.
-
Rob Latham authored
this "new" system call (part of POSIX-2001) saves us a system call on Blue Gene. Seems to get us back 5 seconds for one workload at small (half rack) scales.
-
Rob Latham authored
bglockless uses the common read/write routines for contig read/writes, so bluegene timing infrastructure wasn't actually timing anything. Since this introduces blue gene bits into common code, please do not merge to master. Instead, we should rework all the timing bits so that it no longer times "bluegene" but rather all of ROMIO. Furthermore, the locky bits of 'bg:' driver should be yanked anyway, obviating the need for bglockless.
-
Rob Latham authored
Protected by an 'ifdef', this BGL-era code bitrotted a bit. Clean it up and see if it does anything useful today. - Removes preprocessor guards: the counters and timers do nothing expensive unless environment variables are set - remove the idea of a "level" - remove barrier from timing collection - bugfix: MPI_Wtime() does not necessarily start at zero, so properly initialize timers for collective read/write - report only from I/O aggregators. When reporting "time spent in i/o" vs "time spent communicating" it makes more sense to look only at the aggregators. The non-aggregators are going to skew the results because they spend some of their communication time actually communicating, but some of that time blocked, waiting for aggregators to finish.
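The MPI_Wtime() bugfix in the list above comes down to measuring intervals instead of assuming the clock starts at zero. A minimal sketch of that idea follows; the struct and function names are hypothetical and not ROMIO's, and `now` stands in for a value returned by MPI_Wtime().

```c
/* Hypothetical interval timer: names are illustrative, not from ROMIO. */
struct phase_timer {
    double start;   /* clock reading when the current phase began */
    double total;   /* accumulated seconds across all phases */
};

/* Zero the accumulator before the first measurement. */
void phase_timer_init(struct phase_timer *t)
{
    t->start = 0.0;
    t->total = 0.0;
}

/* Record the clock reading at the start of a phase.
 * 'now' would come from MPI_Wtime() in real code. */
void phase_timer_begin(struct phase_timer *t, double now)
{
    t->start = now;
}

/* Add only the elapsed interval, so the clock's arbitrary origin cancels. */
void phase_timer_end(struct phase_timer *t, double now)
{
    t->total += now - t->start;
}
```

Passing the clock value in as a parameter keeps the sketch free of an MPI dependency; a real implementation would call MPI_Wtime() directly.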
-
De-dupes executable and host names in the MPIR_proctable by pointing to an existing copy. Closes #1821 Signed-off-by:
Pavan Balaji <balaji@mcs.anl.gov>
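The de-duplication described in this commit can be illustrated with the sketch below. It is an assumption about the general technique, not the actual patch: `struct proc_entry` is a hypothetical stand-in for a process-table entry, and the linear search is chosen for brevity rather than speed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one process-table entry. */
struct proc_entry {
    char *host_name;
    char *executable_name;
    int pid;
};

/* Heap-allocate a copy of s; returns NULL if allocation fails. */
static char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Fill table[idx], reusing string pointers from entries 0..idx-1 when an
 * identical host or executable name is already stored. Entries must be
 * filled in increasing index order. Returns 0 on success, -1 on failure. */
int proc_table_set(struct proc_entry *table, int idx,
                   const char *host, const char *exe, int pid)
{
    char *h = NULL, *e = NULL;
    int new_host = 0;

    /* look for existing copies among the earlier entries */
    for (int i = 0; i < idx && (h == NULL || e == NULL); i++) {
        if (h == NULL && strcmp(table[i].host_name, host) == 0)
            h = table[i].host_name;
        if (e == NULL && strcmp(table[i].executable_name, exe) == 0)
            e = table[i].executable_name;
    }

    if (h == NULL) {
        if ((h = copy_string(host)) == NULL)
            return -1;
        new_host = 1;
    }
    if (e == NULL && (e = copy_string(exe)) == NULL) {
        if (new_host)
            free(h);    /* do not leak the host copy made above */
        return -1;
    }

    table[idx].host_name = h;
    table[idx].executable_name = e;
    table[idx].pid = pid;
    return 0;
}
```

Since several entries can share one pointer, any cleanup code has to free each distinct string only once.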
-
Pavan Balaji authored
The following commits are reverted. 1. "Better checks for VA_ARGS."; commit de80ec87. 2. "Warning squash for clang."; commit ccf7f70c . The clang warning this was originally trying to solve has been fixed by the newer versions of clang, AFAICT. Signed-off-by:
Huiwei Lu <huiweilu@mcs.anl.gov>
-
Pavan Balaji authored
No reviewer.
-
- 25 Mar, 2014 1 commit
-
Rob Latham authored
ROMIO code assumes all processes will use the same ROMIO driver. We were not reaching the "find a common file system" logic when NFS was enabled and everyone stat-ed the file system without errors, but some processes found a different file system (for example, if some processes are writing to NFS and others to UFS). See discussion beginning here: http://lists.mpich.org/pipermail/discuss/2014-March/002403.html Tested-by:
Jeff Squyres <jsquyres@cisco.com>
-
- 24 Mar, 2014 3 commits
-
Rob Latham authored
Signed-off-by:
Michael Blocksome <blocksom@us.ibm.com>
-
Rob Latham authored
Signed-off-by:
Michael Blocksome <blocksom@us.ibm.com>
-
Michael Blocksome authored
powerpc64-bgq-linux-gcc v4.7.2
-