- 04 Jan, 2017 1 commit
-
-
Paul Rich authored
A well-timed (or poorly timed, depending on how you look at it) qdel could cause Cobalt to put a node into cleanup but never complete the cleanup, because there was no ALPS backend reservation to clean up. This would clear if no jobs were currently running; otherwise it would hang nodes.
-
- 09 Dec, 2016 5 commits
-
-
Paul Rich authored
-
Paul Rich authored
Adding in build files for the RESERVATION_SUMMARY view for CDB. Adding in templates and build script for the reservation summary view from Gabe West. Closes #52. See merge request !24.
-
Paul Rich authored
Qsub path can now be specified for eLogin qsubs. This prevents us from getting an unwrapped qsub.py, or the qsub being in a different location on the mom than on the eLogin host, when using qsub -I from an eLogin node on Cray systems. Closes #54. See merge request !25.
-
Paul Rich authored
This prevents us from getting an unwrapped qsub.py, or the qsub being in a different location on the mom than on the eLogin host, when using qsub -I from an eLogin node on Cray systems.
-
Paul Rich authored
-
- 08 Dec, 2016 5 commits
-
-
Paul Rich authored
-
Paul Rich authored
Fixing a possible PID leak if 'apbasil' task cleanup is interrupted. There was the possibility of losing a PID if child cleanup was interrupted. This ensures retries until the child process is actually dead. Closes #46. See merge request !21.
-
Paul Rich authored
If the child fetch succeeds but cleanup fails, make sure we use the initially fetched data, rather than replacing it with the now potentially lost child data.
-
Paul Rich authored
-
Paul Rich authored
Resolve "Restart failure with system_script_forker using old statefile". Closes #53. See merge request !22.
-
- 05 Dec, 2016 1 commit
-
-
Paul Rich authored
Old versions of the forker do not have use_stdout_string, which can cause _wait() to fail. Getting out of this would require deleting the statefile and restarting clean. To prevent that, startup is being modified to add those key variables and initialize them to "unused".
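
The backfill described above can be sketched roughly as follows. This is a minimal illustration, not the Cobalt code: the function name `upgrade_child_state`, the dict-shaped child record, and the choice of `False` to mean "unused" are all assumptions. Only the name `use_stdout_string` comes from the commit message.

```python
import copy

# Hypothetical defaults. Only use_stdout_string is named in the commit
# message; using False to mean "unused" is an assumption of this sketch.
LEGACY_DEFAULTS = {"use_stdout_string": False}


def upgrade_child_state(state):
    """Return a copy of a restored child record with missing keys backfilled.

    Keys already present (records written by a newer forker) are left alone.
    """
    upgraded = copy.deepcopy(state)
    for key, default in LEGACY_DEFAULTS.items():
        upgraded.setdefault(key, default)
    return upgraded


# A record as an older forker might have saved it (hypothetical shape).
old_record = {"pid": 1234}
new_record = upgrade_child_state(old_record)
```

Running the upgrade once at startup means later code can read the key unconditionally, so the statefile does not have to be deleted.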
-
- 28 Nov, 2016 6 commits
-
-
Paul Rich authored
There was the possibility of losing a PID if child cleanup was interrupted. This ensures retries until the child process is actually dead.
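
One way to picture "retry until the child process is actually dead" is the generic POSIX sketch below. It is not the forker's implementation: `reap_child`, its parameters, and the polling strategy are hypothetical, and it assumes a POSIX system where `os.waitpid` and `os.WNOHANG` are available.

```python
import os
import time


def reap_child(pid, poll_interval=0.05, max_attempts=200):
    """Poll waitpid until the child is confirmed gone, retrying on interruption.

    Returns the raw wait status, or None if the child was already reaped.
    """
    for _ in range(max_attempts):
        try:
            reaped, status = os.waitpid(pid, os.WNOHANG)
        except InterruptedError:
            # Interrupted by a signal: try again instead of forgetting the pid.
            continue
        except ChildProcessError:
            # No such child: something else already collected it.
            return None
        if reaped == pid:
            return status
        # Child is still running; wait a little and check again.
        time.sleep(poll_interval)
    raise TimeoutError("child %d still alive after %d attempts" % (pid, max_attempts))
```

The point of the loop is that the pid stays tracked until `waitpid` reports the child as reaped, rather than being dropped on the first failed attempt.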
-
Paul Rich authored
Fixing possible race condition, reported in ticket #45. Adding a final "read the stdout redirect" after the exit status is collected. This addresses a potential issue spotted in the CRAY port work merge. Closes #45. See merge request !20.
-
Paul Rich authored
-
Paul Rich authored
-
Paul Rich authored
-
Paul Rich authored
Adding a final "read the stdout redirect" after the exit status is collected. This addresses a potential issue spotted in the CRAY port work merge.
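
The race being closed here is that a child can write output just before exiting, after the parent's last read but before the exit status is collected. A minimal sketch of the "final read" idea, assuming a pipe rather than whatever redirect the forker actually uses, and with a hypothetical helper name `run_and_collect` (POSIX only, since it uses `select` on a pipe):

```python
import os
import select
import subprocess
import sys


def run_and_collect(argv, poll_interval=0.05):
    """Run argv, gathering stdout while it runs plus one final drain after exit."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    fd = proc.stdout.fileno()
    chunks = []
    # Read whatever is available while the child is still running.
    while proc.poll() is None:
        ready, _, _ = select.select([fd], [], [], poll_interval)
        if ready:
            data = os.read(fd, 4096)
            if data:
                chunks.append(data)
    # Exit status is now collected, but output written just before exit may
    # still be sitting in the pipe. Drain it until end-of-file.
    while True:
        data = os.read(fd, 4096)
        if not data:
            break
        chunks.append(data)
    proc.stdout.close()
    return proc.returncode, b"".join(chunks)


returncode, output = run_and_collect([sys.executable, "-c", "print('done')"])
```

Without the second loop, a short-lived child could exit between two polls and its last output would be silently dropped.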
-
- 23 Nov, 2016 7 commits
- 22 Nov, 2016 1 commit
-
-
Paul Rich authored
-
- 18 Nov, 2016 2 commits
- 16 Nov, 2016 2 commits
- 15 Nov, 2016 2 commits
- 11 Nov, 2016 2 commits
- 09 Nov, 2016 1 commit
-
-
Paul Rich authored
Conflicts:
    CHANGES
    man/setres.8
    setup.py
    src/lib/Components/base.py
    src/lib/Components/base_forker.py
    src/lib/Components/system/AlpsBridge.py
    src/lib/Components/system/CraySystem.py
    src/lib/client_utils.py
    testsuite/TestCobalt/TestComponents/test_cray.py
-
- 08 Nov, 2016 5 commits