- 11 Aug, 2017 3 commits
- 31 Jul, 2017 1 commit
Paul Rich authored
- 28 Jul, 2017 1 commit
Paul Rich authored
- 27 Jul, 2017 1 commit
Paul Rich authored
Slight reorganization. Also added some changes so that the presentation elsewhere in Cobalt is more consistent with the rest of Cray's stack.
- 10 Jul, 2017 1 commit
Paul Rich authored
- 03 Jul, 2017 2 commits
Paul Rich authored
In light of this bug, adding checks to make sure we don't accidentally add bad values to reservations again.
Paul Rich authored
This was traced to a call that could add a non-string key to the alps_reservation dictionary, resulting in one version of the reservation with an integer jobid key and a second with a string jobid key. These should be keyed with strings. As further mitigation, added a check for an integer version of a key being cleaned; if one exists, notify that it happened and clean that one, too. The triggering condition is an interactive job whose initial ALPS reservation times out.
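A minimal Python sketch of the keying rule and the extra cleanup check described above, assuming alps_reservations is a plain dict; the helper names add_alps_reservation and clean_alps_reservation are hypothetical and are not Cobalt's actual functions.

```python
import logging

logger = logging.getLogger(__name__)


def add_alps_reservation(alps_reservations, jobid, reservation):
    """Store a reservation under the string form of jobid, never an int (hypothetical helper)."""
    alps_reservations[str(jobid)] = reservation


def clean_alps_reservation(alps_reservations, jobid):
    """Remove the string-keyed reservation and any stray integer-keyed copy.

    Returns the keys that were removed, in removal order.
    """
    removed = []
    key = str(jobid)
    if key in alps_reservations:
        del alps_reservations[key]
        removed.append(key)
    # Mitigation: an older code path could have stored the same job under an int key.
    try:
        int_key = int(key)
    except ValueError:
        int_key = None  # non-numeric jobid, so no int-keyed copy can exist
    if int_key is not None and int_key in alps_reservations:
        logger.warning("Cleaning integer-keyed ALPS reservation for job %s", key)
        del alps_reservations[int_key]
        removed.append(int_key)
    return removed
```

Normalizing with str() at the single insertion point keeps new entries consistent, while the cleanup path still catches duplicates left behind by older state.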
- 30 Jun, 2017 1 commit
Paul Rich authored
- 27 Jun, 2017 2 commits
- 23 Jun, 2017 1 commit
Paul Rich authored
This reverts merge request !43
- 19 Jun, 2017 3 commits
- 08 Jun, 2017 1 commit
Paul Rich authored
- 18 May, 2017 1 commit
Paul Rich authored
- 01 May, 2017 1 commit
Benjamin Allen authored
Instead of enumerating all groups on the system and comparing, check the members of specific queue groups. This change makes CQM compatible with the sssd.conf setting enumerate = False and does less work overall.
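A minimal Python sketch of the targeted lookup, using the standard library's grp.getgrnam and pwd.getpwnam instead of enumerating with grp.getgrall; the function name user_in_queue_groups and the shape of queue_groups (a list of group names) are assumptions, not CQM's actual code.

```python
import grp
import pwd


def user_in_queue_groups(username, queue_groups):
    """Return True if username belongs to any group named in queue_groups (hypothetical helper).

    Only the named groups are looked up, so this still works when sssd
    is configured with enumerate = False.
    """
    try:
        primary_gid = pwd.getpwnam(username).pw_gid
    except KeyError:
        return False  # unknown user
    for group_name in queue_groups:
        try:
            group = grp.getgrnam(group_name)
        except KeyError:
            continue  # group does not exist on this system
        # gr_mem lists supplementary members; primary membership comes from pw_gid.
        if username in group.gr_mem or group.gr_gid == primary_gid:
            return True
    return False
```

Checking pw_gid as well as gr_mem matters because a user's primary group usually does not list that user in gr_mem.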
- 14 Apr, 2017 1 commit
Paul Rich authored
- 13 Apr, 2017 2 commits
- 12 Apr, 2017 1 commit
Paul Rich authored
Adding a better validator to prevent issues with users typing bad NUMA/MCDRAM modes. Also adding a default setting if none is provided.
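A minimal Python sketch of a mode validator with defaults. The mode names are modeled on the KNL settings Cray exposes for NUMA and MCDRAM, but the exact sets, the defaults, and the function name validate_knl_modes are assumptions rather than Cobalt's actual values.

```python
# Mode names modeled on Cray's KNL settings; treat the sets and defaults as assumptions.
VALID_NUMA_MODES = frozenset(["a2a", "hemi", "quad", "snc2", "snc4"])
VALID_MCDRAM_MODES = frozenset(["cache", "equal", "flat", "split"])
DEFAULT_NUMA_MODE = "quad"     # hypothetical default
DEFAULT_MCDRAM_MODE = "cache"  # hypothetical default


def _check_mode(kind, value, valid, default):
    """Normalize one mode string, substituting the default when none is given."""
    if value is None or not value.strip():
        return default
    mode = value.strip().lower()
    if mode not in valid:
        raise ValueError("invalid %s mode %r; expected one of: %s"
                         % (kind, value, ", ".join(sorted(valid))))
    return mode


def validate_knl_modes(numa=None, mcdram=None):
    """Return a validated (numa, mcdram) pair, filling in defaults (hypothetical helper)."""
    return (_check_mode("NUMA", numa, VALID_NUMA_MODES, DEFAULT_NUMA_MODE),
            _check_mode("MCDRAM", mcdram, VALID_MCDRAM_MODES, DEFAULT_MCDRAM_MODE))
```

Normalizing case and whitespace before the membership check accepts harmless variations such as "SNC4" while still rejecting typos such as "quadd".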
- 11 Apr, 2017 2 commits
- 10 Apr, 2017 1 commit
Paul Rich authored
Only set the timers/emit messages once, despite retry attempts.
- 07 Apr, 2017 1 commit
Paul Rich authored
start.
- 24 Jan, 2017 2 commits
- 11 Jan, 2017 1 commit
Paul Rich authored
- 05 Jan, 2017 1 commit
Paul Rich authored
- 04 Jan, 2017 1 commit
Paul Rich authored
A well-timed (or poorly timed, depending on how you look at it) qdel could cause Cobalt to put a node into cleanup but never complete the cleanup, because there was no ALPS backend reservation to clean up. This would clear if no jobs were currently running; otherwise, it would hang nodes.
- 08 Dec, 2016 2 commits
Paul Rich authored
If the child fetch succeeds but cleanup fails, make sure we use the initially fetched data, rather than replacing it with the now potentially lost child data.
Paul Rich authored
After discussion, the current algorithm for determining backfill time needs to be replaced and needs to depend on which blocks are selected for draining. This commit covers the current algorithm's optimistic and pessimistic backfill modes.
- 06 Dec, 2016 1 commit
Paul Rich authored
New tests are pending, but the optimistic-mode backfiller does appear to be working properly. The old behavior is preserved and can be enabled by setting the mode to pessimistic.
- 05 Dec, 2016 1 commit
Paul Rich authored
Old versions of the forker do not have use_stdout_string, and its absence can cause _wait() to fail. Getting out of this would require deleting the statefile and restarting clean. To prevent that, startup is being modified to add those key variables and initialize them as "unused".
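A minimal Python sketch of the startup fix-up, assuming restored child records are dicts keyed by child id. The commit names only use_stdout_string, so the defaults table lists just that key, and treating "unused" as False for that boolean flag is an assumption; upgrade_restored_children is a hypothetical name.

```python
# Assumed defaults: the commit says the fields are initialized as "unused";
# for the boolean flag use_stdout_string that is taken here to mean False.
CHILD_KEY_DEFAULTS = {
    "use_stdout_string": False,
}


def upgrade_restored_children(children):
    """Add keys missing from child records loaded from an older statefile (hypothetical helper).

    children maps child id -> dict of attributes (assumed layout).
    Values already present are never overwritten.
    """
    for child in children.values():
        for key, default in CHILD_KEY_DEFAULTS.items():
            child.setdefault(key, default)
    return children
```

Using dict.setdefault means records written by a newer forker pass through unchanged, so the fix-up is safe to run on every startup.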
- 02 Dec, 2016 1 commit
Paul Rich authored
- 28 Nov, 2016 3 commits