- 03 May, 2018 1 commit
Paul Rich authored
- 27 Apr, 2018 1 commit
Paul Rich authored
The initial setting of 1 second is too close to the update interval, so jitter between threads can make this test fail even when there is no actual problem. Extending it to 3 seconds should let the test progress reliably through the excessive-reboot failure sequence.
- 09 Apr, 2018 2 commits
Paul Rich authored
The reservation would have been marked as dying even if the apids had not been harvested. Also, the data for the BASIL check was not constructed correctly: although the right thing was happening (an application_id with a non-BASIL cmd was being returned), the test itself was wrong. This has now been fixed.
Paul Rich authored
- 26 Mar, 2018 1 commit
Wrapped Proxy errors so that passwords are sanitized. Fixed spelling mistakes in logging. Added extract_traceback to format exceptions. Added get_current_thread_identifier to help identify threads. Added get_caller to help identify the caller. Added cray_messaging to aid in debugging and allow importing. Fixed a bug that allowed a fork to be retried. Added the debugging tools slp_hammer.py and slp_nail.py. Updated the simulator to simulate background tasks.
- 23 Mar, 2018 1 commit
Paul Rich authored
- 15 Mar, 2018 1 commit
Paul Rich authored
- 05 Mar, 2018 2 commits
- 26 Feb, 2018 1 commit
Paul Rich authored
- 15 Feb, 2018 2 commits
Eric Pershey authored
Paul Rich authored
- 14 Feb, 2018 1 commit
Paul Rich authored
- 07 Feb, 2018 1 commit
Paul Rich authored
- 02 Feb, 2018 3 commits
Paul Rich authored
queue is no longer needed for cleanup, and the docstrings should reflect the changed behavior.
Paul Rich authored
These are the test cases for the bug that precipitated this hunt. They should largely be redundant with the other single-equivalence-class cases (because they now are single-equivalence-class cases), but they also verify that drains are correctly cleared and times are correctly reset.
Paul Rich authored
After discussion, we concluded that the current find_queue_equivalence_classes for this system complicates the codebase for very little actual gain. After this change, the system will always have exactly one equivalence class, consisting of all active queues assigned to nodes and all active reservations. This simplification ensures that find_job_location is called only twice: once for reservations, which ignore drain times, and immediately afterward for normal "production" queue jobs, which do set drain times. In both cases we can simply clear drain times across the whole machine. Beyond testing (with more tests coming for the case that prompted this examination), we know this works because, under the old code, any system with a queue or set of overlapping queues spanning all resources on the machine already forms a single equivalence class.
- 24 Jan, 2018 1 commit
Paul Rich authored
- 03 Jan, 2018 1 commit
Paul Rich authored
- 13 Nov, 2017 1 commit
Paul Rich authored
- 03 Oct, 2017 1 commit
Paul Rich authored
- 21 Sep, 2017 1 commit
Paul Rich authored
- 02 Aug, 2017 1 commit
Paul Rich authored
- 03 Jul, 2017 2 commits
- 17 Apr, 2017 1 commit
Paul Rich authored
Attrs is showing up more often because of the code that allows filters and the validator to set attributes on a job.
- 07 Apr, 2017 1 commit
Paul Rich authored
start.
- 24 Jan, 2017 2 commits
- 08 Dec, 2016 1 commit
Paul Rich authored
After discussion, it was decided that the current algorithm for determining backfill time needs to be replaced and should depend on which blocks are selected for draining. This commit captures the current algorithm's optimistic and pessimistic backfill modes.
- 06 Dec, 2016 1 commit
Paul Rich authored
New tests are pending, but the optimistic-mode backfiller appears to be working properly. The old behavior is preserved and can be enabled by setting the mode to pessimistic.
- 23 Nov, 2016 3 commits
- 08 Nov, 2016 6 commits