Commit df801f02 authored by Shane Snyder

Merge branch 'python-package' into 'master'

Python Package: Rename mode switch to dtype for selecting the return datatype. Simplify the log_get_record interface. Include OSTs in Lustre records. Add a library version check.

See merge request !62
parents 2d54cd8e a0524fd9
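For illustration, a minimal before/after sketch of the renamed switch (module name and handle are placeholders; ``log`` would come from ``log_open``)::

    # before: mode= selected the return datatype
    rec = log_get_generic_record(log, "POSIX", "struct darshan_posix_file **", mode="pandas")

    # after: dtype= selects it, via the simplified entry point
    rec = log_get_record(log, "POSIX", dtype="pandas")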
playground/*
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
......@@ -21,6 +23,7 @@ parts/
sdist/
var/
wheels/
wheelhouse/
*.egg-info/
.installed.cfg
*.egg
......
.PHONY: clean clean-test clean-pyc clean-build docs help
clean: clean-build clean-pyc clean-test # remove all build, test, coverage and Python artifacts
clean: clean-build clean-docs clean-pyc clean-test # remove all build, test, coverage and Python artifacts
clean-build: # remove build artifacts
rm -rf build/
rm -rf dist/
rm -rf .eggs/
rm -rf docs/_build
find . -name '*.egg-info' -exec rm -rf {} +
find . -name '*.egg' -exec rm -rf {} +
clean-docs:
rm -rf docs/build
rm -f docs/darshan.rst
rm -f docs/darshan.*.rst
rm -f docs/modules.rst
clean-wheels:
rm -rf wheelhouse
clean-pyc: # remove Python file artifacts
find . -name '*.pyc' -exec rm -f {} +
find . -name '*.pyo' -exec rm -f {} +
......@@ -22,7 +30,18 @@ clean-test: # remove test and coverage artifacts
rm -f .coverage
rm -rf htmlcov/
rm -rf .pytest_cache
rm -rf pkgtest
clean-devenv:
rm -rf devenv/venv
rm -rf devenv/libdarshanutil
rm -rf devenv
devenv:
./devel/build-libdarshanutil.sh
python3 -m venv devenv/venv
source devenv/venv/bin/activate && pip install -r requirements_dev.txt
lint: # check style with flake8
flake8 darshan tests
......@@ -40,29 +59,42 @@ coverage: # check code coverage quickly with the default Python
xdg-open htmlcov/index.html
docs: # generate Sphinx HTML documentation, including API docs
rm -f docs/darshan.rst
rm -f docs/darshan.backend.rst
rm -f docs/darshan.plots.rst
rm -f docs/modules.rst
docs: clean-docs # generate Sphinx HTML documentation, including API docs
sphinx-apidoc -o docs/ darshan
$(MAKE) -C docs clean
$(MAKE) -C docs html
docs-show: docs
xdg-open docs/_build/html/index.html
docs-show:
xdg-open docs/build/html/index.html
servedocs: docs # compile the docs watching for changes
watchmedo shell-command -p '*.rst' -c '$(MAKE) -C docs html' -R -D .
release: dist # package and upload a release
bump-minor:
wheels:
./devel/build-all.sh
release: #dist # package and upload a release
#twine upload --repository testpypi dist/*
twine upload dist/*
dist: clean # builds source and wheel package
python setup.py sdist
python setup.py bdist_wheel
# might want to remove the none-any wheel, but more specific wheels seem to take precedence
# rm -r dist/*none-any.whl
# gather binary wheels
# pip >= 19.0
find ./wheelhouse/manylinux2010* -name '*manylinux*.whl' -exec cp {} ./dist \;
# pip >= 8.1.0, may overwrite previously copied wheels, but these have greater compatibility
find ./wheelhouse/manylinux1* -name '*manylinux*.whl' -exec cp {} ./dist \;
ls -l dist
......
=========
pydarshan
=========
=======================
PyDarshan Documentation
=======================
Python utilities to interact with Darshan log records of HPC applications.
PyDarshan requires darshan-utils (3.2.2+) to be installed.
......@@ -39,22 +39,16 @@ A brief example showing some of the basic functionality is the following::
darshan.enable_experimental()
report.summarize()
# ...
# Generate a timeline from dxt records
report.read_all_dxt_records()
report.create_timeline() # experimental
Installation
------------
To install use either::
To install, in most cases the following will work::
make install
pip install darshan
Or::
python setup.py install
For alternative installation instructions and installation from source, refer to <docs/install.rst>.
Testing
......
Notes on how to release a new version of PyDarshan
2020-06
-----------------------
- Ensure a python dev environment with dev dependencies, if not already present
(- python3 -m venv venv)
(- source venv/bin/activate)
(- pip install -r requirements_dev.txt # deps for packaging, testing, and docs generation)
- Make sure documentation in docs/ is up to date
- commit
- make docs
- upload docs/build/html contents to /mcs/web/research/projects/darshan/docs
- (might eventually connect this with readthedocs to have this automatically uploaded)
- Update CHANGELOG.rst
- commit
- Update version numbers in:
setup.py
setup.cfg
darshan/__init__.py
- Run tests with tox against different python versions
- make test
(may extend to test-all, which would run against different python versions)
(flake8 syntax warnings can be ignored)
- TODO: CI?
- Submit to PyPI using twine:
- make wheels # requires docker, creates ./wheelhouse and builds architecture-specific *.whl that include libdarshan-util
- make dist # gathers relevant wheels built earlier, adds a non-binary wheel and a source distribution (zip/tgz)
- make release # pushes contents of ./dist/* to PyPI
(you will be prompted for username/password)
- Add/update spack package: py-darshan
- add version entry
- add hash of release tar.gz from pypi (because that one should always exist) / or use mcs darshan mirror
- check if new dependencies are required (compare to requirements.txt)
- submit as pull request to https://github.com/spack/spack
- Announce:
- Regular Darshan release: copy the release notes for PyDarshan and attach as a separate section (mailing list, website/blog)
- PyDarshan only: post the release-notes section for PyDarshan (mailing list, website/blog)
Note on version format:
Whenever libdarshan-utils has a version change, PyDarshan is bumped accordingly.
The 4th position of the version number allows PyDarshan to be on a faster release cycle than darshan-utils (e.g., 3.2.1.2 would be the second PyDarshan release against darshan-util 3.2.1).
# -*- coding: utf-8 -*-
"""Top-level package for pydarshan."""
__version__ = '0.1.0'
__version__ = '0.0.6'
__darshanutil_version__ = '3.2.1'
import logging
logger = logging.getLogger(__name__)
options = {
......
if __name__ == "__main__":
import darshan.cli
darshan.cli.main()
from .cli import main
main()
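With this relative import, running ``python -m darshan`` invokes the CLI's ``main()`` directly.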
......@@ -6,6 +6,9 @@ import ctypes
import numpy as np
import pandas as pd
import logging
logger = logging.getLogger(__name__)
from darshan.api_def_c import load_darshan_header
from darshan.discover_darshan import find_utils
......@@ -20,10 +23,39 @@ ffi.cdef(API_def_c)
libdutil = None
libdutil = find_utils(ffi, libdutil)
check_version(ffi, libdutil)
_structdefs = {
"BG/Q": "struct darshan_bgq_record **",
"DXT_MPIIO": "struct dxt_file_record **",
"DXT_POSIX": "struct dxt_file_record **",
"H5F": "struct darshan_hdf5_file **",
"H5D": "struct darshan_hdf5_dataset **",
"LUSTRE": "struct darshan_lustre_record **",
"MPI-IO": "struct darshan_mpiio_file **",
"PNETCDF": "struct darshan_pnetcdf_file **",
"POSIX": "struct darshan_posix_file **",
"STDIO": "struct darshan_stdio_file **",
}
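The table above maps each module name to the C pointer type handed to ``ffi.cast`` when decoding records; a minimal lookup sketch (assuming ``log`` is a handle from ``log_open``)::

    mod = "POSIX"
    mod_type = _structdefs[mod]  # "struct darshan_posix_file **"
    rec = log_get_generic_record(log, mod, mod_type)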
def get_lib_version():
"""
Return the version information hardcoded into the shared library.
Args:
None
Return:
version (str): library version number
"""
ver = libdutil.darshan_log_get_lib_version()
version = ffi.string(ver).decode("utf-8")
return version
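A possible usage sketch (assuming the backend module path ``darshan.backend.cffi_backend``; the version string is illustrative)::

    from darshan.backend.cffi_backend import get_lib_version
    print(get_lib_version())  # e.g. '3.2.1'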
def log_open(filename):
......@@ -217,89 +249,31 @@ def log_lookup_name_records(log, ids=[]):
def log_get_dxt_record(log, mod_name, mod_type, reads=True, writes=True, mode='dict'):
def log_get_record(log, mod, dtype='numpy'):
"""
Returns a dictionary holding a dxt darshan log record.
Standard entry point to fetch records via module name string.
Args:
log: Handle returned by darshan.open
mod_name (str): Name of the Darshan module
mod_type (str): String containing the C type
Return:
dict: generic log record
Example:
The typical darshan log record provides two arrays, one for integer counters
and one for floating point counters:
>>> darshan.log_get_dxt_record(log, "DXT_POSIX", "struct dxt_file_record **")
{'rank': 0, 'read_count': 11, 'read_segments': array([...]), ...}
log record, with the format selected by dtype
"""
modules = log_get_modules(log)
#name_records = log_get_name_records(log)
rec = {}
buf = ffi.new("void **")
r = libdutil.darshan_log_get_record(log['handle'], modules[mod_name]['idx'], buf)
if r < 1:
return None
filerec = ffi.cast(mod_type, buf)
clst = []
rec['id'] = filerec[0].base_rec.id
rec['rank'] = filerec[0].base_rec.rank
rec['hostname'] = ffi.string(filerec[0].hostname).decode("utf-8")
#rec['filename'] = name_records[rec['id']]
wcnt = filerec[0].write_count
rcnt = filerec[0].read_count
rec['write_count'] = wcnt
rec['read_count'] = rcnt
rec['write_segments'] = []
rec['read_segments'] = []
size_of = ffi.sizeof("struct dxt_file_record")
segments = ffi.cast("struct segment_info *", buf[0] + size_of )
for i in range(wcnt):
seg = {
"offset": segments[i].offset,
"length": segments[i].length,
"start_time": segments[i].start_time,
"end_time": segments[i].end_time
}
rec['write_segments'].append(seg)
for i in range(rcnt):
i = i + wcnt
seg = {
"offset": segments[i].offset,
"length": segments[i].length,
"start_time": segments[i].start_time,
"end_time": segments[i].end_time
}
rec['read_segments'].append(seg)
if mode == "pandas":
rec['read_segments'] = pd.DataFrame(rec['read_segments'])
rec['write_segments'] = pd.DataFrame(rec['write_segments'])
if mod in ['LUSTRE']:
rec = _log_get_lustre_record(log, dtype=dtype)
elif mod in ['DXT_POSIX', 'DXT_MPIIO']:
rec = log_get_dxt_record(log, mod, _structdefs[mod], dtype=dtype)
else:
rec = log_get_generic_record(log, mod, _structdefs[mod], dtype=dtype)
return rec
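A usage sketch of the dispatch above (assuming ``log`` was opened via ``log_open`` and the named modules are present in the log)::

    rec = log_get_record(log, "LUSTRE")                # -> _log_get_lustre_record
    rec = log_get_record(log, "DXT_POSIX")             # -> log_get_dxt_record
    rec = log_get_record(log, "MPI-IO", dtype="dict")  # -> log_get_generic_record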
def log_get_generic_record(log, mod_name, mod_type, mode='numpy'):
def log_get_generic_record(log, mod_name, mod_type, dtype='numpy'):
"""
Returns a dictionary holding a generic darshan log record.
......@@ -344,14 +318,35 @@ def log_get_generic_record(log, mod_name, mod_type, mode='numpy'):
rec['fcounters'] = np.array(flst, dtype=np.float64)
fcdict = dict(zip(fcounter_names(mod_name), rec['fcounters']))
if mode == "dict":
rec = {'counters': cdict, 'fcounter': fcdict}
if dtype == "dict":
rec.update({
'counters': cdict,
'fcounters': fcdict
})
if dtype == "pandas":
df_c = pd.DataFrame(cdict, index=[0])
df_fc = pd.DataFrame(fcdict, index=[0])
# reverse the column order; once id and rank are appended below and the order is reversed again, they end up in front
df_c = df_c[df_c.columns[::-1]]
df_fc = df_fc[df_fc.columns[::-1]]
# attach id and rank to counters and fcounters
df_c['id'] = rec['id']
df_c['rank'] = rec['rank']
df_fc['id'] = rec['id']
df_fc['rank'] = rec['rank']
# reverse back, moving rank and id to the front
df_c = df_c[df_c.columns[::-1]]
df_fc = df_fc[df_fc.columns[::-1]]
if mode == "pandas":
rec = {
'counters': pd.DataFrame(cdict, index=[0]),
'fcounters': pd.DataFrame(fcdict, index=[0])
}
rec.update({
'counters': df_c,
'fcounters': df_fc
})
return rec
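For example, a sketch of the resulting pandas layout (``log`` as in the earlier sketches; the leading column order follows from the two reversals above)::

    rec = log_get_generic_record(log, "POSIX", _structdefs["POSIX"], dtype="pandas")
    list(rec['counters'].columns[:2])  # ['rank', 'id'], followed by the POSIX counters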
......@@ -414,35 +409,7 @@ def fcounter_names(mod_name):
return counter_names(mod_name, fcnts=True)
def log_get_bgq_record(log):
"""
Returns a darshan log record for BG/Q.
Args:
log: handle returned by darshan.open
"""
return log_get_generic_record(log, "BG/Q", "struct darshan_bgq_record **")
def log_get_hdf5_file_record(log):
"""
Returns a darshan log record for an HDF5 file.
Args:
log: handle returned by darshan.open
"""
return log_get_generic_record(log, "H5F", "struct darshan_hdf5_file **")
def log_get_hdf5_dataset_record(log):
"""
Returns a darshan log record for an HDF5 dataset.
Args:
log: handle returned by darshan.open
"""
return log_get_generic_record(log, "H5D", "struct darshan_hdf5_dataset **")
def log_get_lustre_record(log):
def _log_get_lustre_record(log, dtype='numpy'):
"""
Returns a darshan log record for Lustre.
......@@ -465,76 +432,128 @@ def log_get_lustre_record(log):
for i in range(0, len(rbuf[0].counters)):
clst.append(rbuf[0].counters[i])
rec['counters'] = np.array(clst, dtype=np.int64)
# counters
cdict = dict(zip(counter_names('LUSTRE'), rec['counters']))
# FIXME
# ost_ids: the OST list is a flexible int64 array at the end of the record,
# after the base record and the counters
sizeof_64 = ffi.sizeof("int64_t")
sizeof_base = ffi.sizeof("struct darshan_base_record")
offset = sizeof_base + sizeof_64 * len(rbuf[0].counters)
offset = offset // sizeof_64  # byte offset -> index into the int64 view
ost_ids = ffi.cast("int64_t *", rbuf[0])
ostlst = []
for i in range(0, cdict['LUSTRE_STRIPE_WIDTH']):
print(rbuf[0].ost_ids[i])
for i in range(offset, cdict['LUSTRE_STRIPE_WIDTH']+offset):
ostlst.append(ost_ids[i])
rec['ost_ids'] = np.array(ostlst, dtype=np.int64)
print(rec['ost_ids'])
sys.exit()
if mode == "dict":
rec = {'counters': cdict, 'fcounter': fcdict}
# dtype conversion
if dtype == "dict":
rec.update({
'counters': cdict,
'ost_ids': ostlst
})
if mode == "pandas":
rec = {
'counters': pd.DataFrame(cdict, index=[0]),
'fcounters': pd.DataFrame(fcdict, index=[0])
}
if dtype == "pandas":
df_c = pd.DataFrame(cdict, index=[0])
# prepend rank and id: reverse, append, reverse back
df_c = df_c[df_c.columns[::-1]]  # flip column order
df_c['id'] = rec['id']
df_c['rank'] = rec['rank']
df_c = df_c[df_c.columns[::-1]]  # flip back
rec.update({
'counters': df_c,
'ost_ids': pd.DataFrame(rec['ost_ids'], columns=['ost_ids']),
})
return rec
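A sketch of the Lustre record with the newly included OSTs (values are illustrative)::

    rec = log_get_record(log, "LUSTRE", dtype="dict")
    rec['counters']['LUSTRE_STRIPE_WIDTH']  # e.g. 4
    rec['ost_ids']                          # e.g. [4, 12, 57, 3], one entry per stripe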
def log_get_mpiio_record(log):
def log_get_dxt_record(log, mod_name, mod_type, reads=True, writes=True, dtype='dict'):
"""
Returns a darshan log record for MPI-IO.
Returns a dictionary holding a dxt darshan log record.
Args:
log: handle returned by darshan.open
log: Handle returned by darshan.open
mod_name (str): Name of the Darshan module
mod_type (str): String containing the C type
Returns:
dict: log record
"""
return log_get_generic_record(log, "MPI-IO", "struct darshan_mpiio_file **")
Return:
dict: generic log record
Example:
def log_get_pnetcdf_record(log):
"""
Returns a darshan log record for PnetCDF.
The typical darshan log record provides two arrays, one for integer counters
and one for floating point counters:
>>> darshan.log_get_dxt_record(log, "DXT_POSIX", "struct dxt_file_record **")
{'rank': 0, 'read_count': 11, 'read_segments': array([...]), ...}
Args:
log: handle returned by darshan.open
Returns:
dict: log record
"""
return log_get_generic_record(log, "PNETCDF", "struct darshan_pnetcdf_file **")
modules = log_get_modules(log)
#name_records = log_get_name_records(log)
def log_get_posix_record(log):
"""
Returns a darshan log record for POSIX.
rec = {}
buf = ffi.new("void **")
r = libdutil.darshan_log_get_record(log['handle'], modules[mod_name]['idx'], buf)
if r < 1:
return None
filerec = ffi.cast(mod_type, buf)
clst = []
Args:
log: handle returned by darshan.open
rec['id'] = filerec[0].base_rec.id
rec['rank'] = filerec[0].base_rec.rank
rec['hostname'] = ffi.string(filerec[0].hostname).decode("utf-8")
#rec['filename'] = name_records[rec['id']]
Returns:
dict: log record
"""
return log_get_generic_record(log, "POSIX", "struct darshan_posix_file **")
wcnt = filerec[0].write_count
rcnt = filerec[0].read_count
rec['write_count'] = wcnt
rec['read_count'] = rcnt
rec['write_segments'] = []
rec['read_segments'] = []
size_of = ffi.sizeof("struct dxt_file_record")
segments = ffi.cast("struct segment_info *", buf[0] + size_of)  # segments start right after the fixed-size record header
for i in range(wcnt):
seg = {
"offset": segments[i].offset,
"length": segments[i].length,
"start_time": segments[i].start_time,
"end_time": segments[i].end_time
}
rec['write_segments'].append(seg)
for i in range(wcnt, wcnt + rcnt):  # read segments follow the write segments
seg = {
"offset": segments[i].offset,
"length": segments[i].length,
"start_time": segments[i].start_time,
"end_time": segments[i].end_time
}
rec['read_segments'].append(seg)
if dtype == "pandas":
rec['read_segments'] = pd.DataFrame(rec['read_segments'])
rec['write_segments'] = pd.DataFrame(rec['write_segments'])
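And a sketch of fetching DXT data through the new entry point (assuming DXT records are present in the log)::

    rec = log_get_record(log, "DXT_POSIX", dtype="pandas")
    rec['write_segments'].head()  # offset, length, start_time, end_time per write
    rec['read_segments'].head()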