1. 29 Jul, 2019 1 commit
  2. 24 Jul, 2019 1 commit
    • Lock doesn't need to be held around this state fetch · d658914d
      Paul Rich authored
      This is to improve Cobalt responsiveness during state update.  Combined
      with the fix for aig/cobalt#177, this should allow for much faster
      scheduling cadences as well and should significantly reduce the delay in
      a number of commands.
      
      There is an additional fix here where the check for a starting job
      during the cleanup update wasn't working.  This was due to a hidden type
      mismatch.  This was exposed by both speedups put together.
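A minimal sketch of the pattern the commit title describes: do the slow state fetch without holding the lock, then take the lock only long enough to publish the result. The class name `NodeStateCache`, the `fetch_state` callable, and the node-status dictionary are hypothetical and are not taken from the Cobalt source.

```python
import threading


class NodeStateCache:
    """Hypothetical cache of node state shared between threads."""

    def __init__(self, fetch_state):
        self._lock = threading.Lock()
        self._fetch_state = fetch_state  # assumed slow call to the system
        self._nodes = {}

    def update(self):
        # The slow fetch runs without the lock, so readers are not blocked.
        new_state = self._fetch_state()
        # The lock is held only for the quick swap of the cached state.
        with self._lock:
            self._nodes = new_state

    def status(self, node_id):
        with self._lock:
            return self._nodes.get(node_id)


cache = NodeStateCache(lambda: {"n1": "idle"})
cache.update()
print(cache.status("n1"))
```

With this shape, commands that only read the cached state wait for the swap, not for the fetch itself.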
  3. 19 Dec, 2018 1 commit
  4. 29 Nov, 2018 1 commit
  5. 10 Sep, 2018 1 commit
  6. 26 Jun, 2018 1 commit
  7. 25 Jun, 2018 1 commit
  8. 21 Jun, 2018 1 commit
    • Prevent the system component from trying to start a second PG on retry · 5f313b7c
      Paul Rich authored
      If we are retrying a process group startup, we needed to check to see if
      there was an existing process group from a past call to
      add_process_groups, where the execution happened, but the return failed,
      or from another badly timed exception in the queue-manager.
      
      If a process group has been added, this will retrieve the prior PG for a
      job.  It will not invoke the startup a second time unless the job was
      never actually started on the initial pass.
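The retry behavior described above can be sketched as an idempotent add: look up any process group already recorded for the job, and only start it if it never actually started. This is an illustration under stated assumptions, not the Cobalt implementation. `ProcessGroupManager`, its dictionary records, and the `starts` counter are hypothetical; only the name `add_process_groups` comes from the commit message.

```python
class ProcessGroupManager:
    """Hypothetical manager that tolerates a retried startup call."""

    def __init__(self):
        self._groups = {}  # jobid -> process group record
        self._next_id = 1
        self.starts = 0    # counts real startups, for demonstration only

    def add_process_group(self, jobid):
        existing = self._groups.get(jobid)
        if existing is not None:
            # A prior call already created the group; its reply may have
            # been lost. Start it only if it never actually started.
            if not existing["started"]:
                self._start(existing)
            return existing
        group = {"id": self._next_id, "jobid": jobid, "started": False}
        self._next_id += 1
        self._groups[jobid] = group
        self._start(group)
        return group

    def _start(self, group):
        self.starts += 1
        group["started"] = True


manager = ProcessGroupManager()
first = manager.add_process_group(7)
retry = manager.add_process_group(7)  # simulated retry after a lost reply
print(first is retry, manager.starts)
```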
  9. 03 May, 2018 1 commit
  10. 02 May, 2018 1 commit
  11. 26 Apr, 2018 1 commit
  12. 10 Apr, 2018 1 commit
  13. 09 Apr, 2018 2 commits
  14. 26 Mar, 2018 1 commit
    • Added instrumented logging to many of the components. · b9de5d5e
      Eric Pershey authored
      Wrapped Proxy errors with sanatize password.
      Cleaned up logging spelling mistakes.
      Added extract_traceback to format the exceptions
      Added get_current_thread_identifier to help identify threads
      Added get_caller to help identify the caller.
      Added cray_messaging to aid in debugging and allow importing.
      Fixed a bug that would allow retrying of a fork.
      Added debugging tools of slp_hammer.py and slp_nail.py
      Updated the simulator to simulate background tasks.
  15. 15 Mar, 2018 1 commit
  16. 02 Mar, 2018 1 commit
  17. 14 Feb, 2018 2 commits
  18. 02 Feb, 2018 2 commits
    • Fixing docstrings and removing defunct variable · 281a9c68
      Paul Rich authored
      queue is no longer needed for cleanup, and the docstrings should
      reflect the changed behavior.
    • Refactor find_queue_equivalence_classes and drain clear code · 2cda012d
      Paul Rich authored
      After discussions, the current find_queue_equivalence_classes for this
      system really only complicates the codebase for very little actual gain.
      After this, the system will have only one equivalence class at all times
      consisting of all active queues assigned to nodes and all active
      reservations.
      
      This simplification allows us to ensure that find_job_location only gets
      called twice, once for reservations, which ignore drain times, and then
      immediately after for the normal "production" queue jobs, which do set
      drain times.  In both cases we can just clear drain times across the
      machine.
      
      In addition to testing (and more tests coming for the case that caused
      this examination to begin with), we know that this works, as any system
      with a queue or set of overlapping queues across all resources on the
      machine forms a single equivalence class under the old code.
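To illustrate the last claim, here is a rough sketch of grouping queues into equivalence classes by shared nodes. It is not the old `find_queue_equivalence_classes` code: the function `equivalence_classes`, the queue names, and the node names are all hypothetical. It only shows why queues that overlap across the machine collapse into one class.

```python
def equivalence_classes(queue_nodes):
    """Group queue names whose node sets overlap, directly or transitively."""
    classes = []  # list of (queue names, nodes); node sets stay disjoint
    for name, nodes in queue_nodes.items():
        merged_names, merged_nodes = {name}, set(nodes)
        remaining = []
        for names, class_nodes in classes:
            if class_nodes & merged_nodes:
                # Shares at least one node: fold this class into the new one.
                merged_names |= names
                merged_nodes |= class_nodes
            else:
                remaining.append((names, class_nodes))
        remaining.append((merged_names, merged_nodes))
        classes = remaining
    return [names for names, _ in classes]


overlapping = {
    "default": {"n1", "n2"},
    "debug": {"n2", "n3"},
    "backfill": {"n3", "n4"},
}
print(len(equivalence_classes(overlapping)))
```

In this example the three queues chain together through `n2` and `n3`, so they end up in a single class, which is the situation the commit message says already held under the old code.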
  19. 31 Jan, 2018 1 commit
  20. 06 Nov, 2017 1 commit
  21. 03 Nov, 2017 1 commit
  22. 03 Oct, 2017 1 commit
  23. 02 Oct, 2017 1 commit
  24. 21 Sep, 2017 2 commits
  25. 18 Sep, 2017 1 commit
  26. 14 Sep, 2017 1 commit
    • All forkers will no longer be incremented. Proper forker selected. · d66de8fd
      Paul Rich authored
      I don't even know how I missed this in preliminary testing.  The forker
      increment is now fixed when using multiple forkers.  Additionally, the
      first forker in the list was always getting used regardless of status,
      which kind of defeats the entire point of this patch.
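The two fixes described here can be sketched as follows: skip forkers that are not available instead of always taking the first one, and increment only the forker that was actually chosen. The commit does not say what is incremented or how status is tracked, so the `alive` flag, the `launches` counter, and the first-available policy are assumptions made for illustration.

```python
def select_forker(forkers):
    """Pick the first available forker and count the launch on it alone."""
    for forker in forkers:
        if forker["alive"]:
            # Only the selected forker is incremented.
            forker["launches"] += 1
            return forker["name"]
    raise RuntimeError("no forker available")


forkers = [
    {"name": "forker0", "alive": False, "launches": 0},
    {"name": "forker1", "alive": True, "launches": 0},
    {"name": "forker2", "alive": True, "launches": 0},
]
print(select_forker(forkers))
```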
  27. 13 Sep, 2017 1 commit
  28. 23 Aug, 2017 2 commits
  29. 18 Aug, 2017 3 commits
  30. 17 Aug, 2017 1 commit
  31. 16 Aug, 2017 1 commit
  32. 11 Aug, 2017 2 commits