1. 03 May, 2018 1 commit
  2. 27 Apr, 2018 1 commit
    • Extending timeout on test_excessive_reboots to allow progress · e2193660
      Paul Rich authored
      The initial setting of 1 second is too close to the update interval,
      so jitter between threads makes this test fail when there is no
      actual problem.  Extending this to 3 seconds should allow the test to
      reliably progress through the excessive reboot failure sequence.
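As a rough illustration of why the margin matters: a test that waits for a state change driven by a background thread needs a deadline comfortably larger than that thread's update interval. The sketch below is not taken from the test suite; `wait_for` and its parameters are hypothetical names used only to show the pattern.

```python
import time


def wait_for(predicate, timeout, poll_interval=0.05):
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Hypothetical helper: the real test may wait differently.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(poll_interval)
    # One last check so a change that lands right at the deadline still counts.
    return predicate()
```

With an update interval near 1 second, `wait_for(check, timeout=1.0)` can expire just before the update lands, while `timeout=3.0` leaves room for several update cycles.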
  3. 09 Apr, 2018 2 commits
  4. 26 Mar, 2018 1 commit
    • Added instrumented logging to many of the components. · b9de5d5e
      Eric Pershey authored and Eric Pershey committed
      Wrapped Proxy errors with sanatize password.
      Cleaned up logging spelling mistakes.
      Added extract_traceback to format exceptions.
      Added get_current_thread_identifier to help identify threads.
      Added get_caller to help identify the caller.
      Added cray_messaging to aid in debugging and allow importing.
      Fixed a bug that would allow retrying of a fork.
      Added the debugging tools slp_hammer.py and slp_nail.py.
      Updated the simulator to simulate background tasks.
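The three logging helpers named in this commit could look roughly like the following. Only the function names come from the commit message; the signatures, return formats, and the `depth` parameter are assumptions made for illustration, and the project's actual implementations may differ.

```python
import inspect
import threading
import traceback


def extract_traceback(exc):
    """Return the formatted traceback of `exc` as a list of text lines."""
    return traceback.format_exception(type(exc), exc, exc.__traceback__)


def get_current_thread_identifier():
    """Return a short "name:ident" label for the calling thread."""
    thread = threading.current_thread()
    return "%s:%s" % (thread.name, thread.ident)


def get_caller(depth=1):
    """Return the name of the function `depth` frames above our caller.

    stack()[0] is get_caller itself and stack()[1] is the function that
    called it, so depth=1 names the function that called *that* function.
    """
    return inspect.stack()[depth + 1].function
```

A log line might then be prefixed with `get_current_thread_identifier()` and `get_caller()` so that interleaved messages from several threads can be told apart.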
  5. 23 Mar, 2018 1 commit
  6. 15 Mar, 2018 1 commit
  7. 05 Mar, 2018 2 commits
  8. 26 Feb, 2018 1 commit
  9. 15 Feb, 2018 2 commits
  10. 14 Feb, 2018 1 commit
  11. 07 Feb, 2018 1 commit
  12. 02 Feb, 2018 3 commits
    • Fixing docstrings and removing defunct variable · 281a9c68
      Paul Rich authored
      queue is no longer needed for cleanup, and the docstrings should
      reflect the changed behavior.
    • Bringing in tests for reservations + disjoint queues · 3c00ba6f
      Paul Rich authored
      These are the test cases for the bug that precipitated this hunt.  These
      should largely be degenerate with the other single-equivalence cases
      (because they are now single-equivalence cases), but do things like
      make sure that drains are getting correctly cleared and times are
      getting correctly reset.
    • Refactor find_queue_equivalence_classes and drain clear code · 2cda012d
      Paul Rich authored
      After discussions, the current find_queue_equivalence_classes for this
      system really only complicates the codebase for very little actual gain.
      After this, the system will have only one equivalence class at all times
      consisting of all active queues assigned to nodes and all active
      reservations.
      
      This simplification allows us to ensure that find_job_location only gets
      called twice, once for reservations, which ignore drain times, and then
      immediately after for the normal "production" queue jobs, which do set
      drain times.  In both cases we can just clear drain times across the
      machine.
      
      In addition to testing (and more tests coming for the case that caused
      this examination to begin with), we know that this works, as any system
      with a queue or set of overlapping queues across all resources on the
      machine forms a single equivalence class under the old code.
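The closing argument of this commit message can be illustrated with a small sketch: if queues are grouped whenever their node sets overlap, directly or transitively, then any queue covering every node pulls all queues into one class. This is not the project's find_queue_equivalence_classes; the function name `find_equivalence_classes`, the dict-of-sets input, and the queue and node names are assumptions chosen for the example.

```python
def find_equivalence_classes(queue_nodes):
    """Group queues whose node sets overlap, directly or transitively.

    `queue_nodes` maps a queue name to the set of nodes it may run on.
    Returns a list of (queue names, nodes) pairs with disjoint node sets.
    """
    classes = []
    for name, nodes in queue_nodes.items():
        merged_queues, merged_nodes = {name}, set(nodes)
        remaining = []
        for queues, class_nodes in classes:
            if class_nodes & merged_nodes:
                # Shares at least one node: fold this class into the new one.
                merged_queues |= queues
                merged_nodes |= class_nodes
            else:
                remaining.append((queues, class_nodes))
        remaining.append((merged_queues, merged_nodes))
        classes = remaining
    return classes


# Two disjoint queues form two classes...
disjoint = {"q1": {"n1", "n2"}, "q2": {"n3"}}
two_classes = find_equivalence_classes(disjoint)

# ...but adding a queue that spans every node collapses them into one.
spanning = dict(disjoint, default={"n1", "n2", "n3"})
one_class = find_equivalence_classes(spanning)
```

Under these assumptions `two_classes` has two entries and `one_class` has a single entry containing all three queues, which is the situation the commit message describes for a machine with a queue across all resources.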
  13. 24 Jan, 2018 1 commit
  14. 03 Jan, 2018 1 commit
  15. 13 Nov, 2017 1 commit
  16. 03 Oct, 2017 1 commit
  17. 21 Sep, 2017 1 commit
  18. 02 Aug, 2017 1 commit
  19. 03 Jul, 2017 2 commits
  20. 17 Apr, 2017 1 commit
  21. 07 Apr, 2017 1 commit
  22. 24 Jan, 2017 2 commits
  23. 08 Dec, 2016 1 commit
    • Committing work to date on backfiller revisions for BG/Q · d0a678e6
      Paul Rich authored
      After discussion, the current algorithm for determining backfill time
      needs to be replaced and needs to depend on which blocks are selected
      for draining.  This is a commit for the current algorithm's optimistic
      and pessimistic backfill modes.
  24. 06 Dec, 2016 1 commit
  25. 23 Nov, 2016 3 commits
  26. 08 Nov, 2016 6 commits