  1. 02 Feb, 2018 1 commit
    • Refactor find_queue_equivalence_classes and drain clear code · 2cda012d
      Paul Rich authored
      After discussions, the current find_queue_equivalence_classes for this
      system really only complicates the codebase for very little actual gain.
      After this, the system will have only one equivalence class at all times
      consisting of all active queues assigned to nodes and all active
      reservations.
      
      This simplification allows us to ensure that find_job_location only gets
      called twice, once for reservations, which ignore drain times, and then
      immediately after for the normal "production" queue jobs, which do set
      drain times.  In both cases we can just clear drain times across the
      machine.
      
      In addition to testing (and more tests coming for the case that caused
      this examination to begin with), we know that this works, as any system
      with a queue or set of overlapping queues across all resources on the
      machine forms a single equivalence class under the old code.
  2. 31 Jan, 2018 1 commit
  3. 06 Nov, 2017 1 commit
  4. 18 Sep, 2017 1 commit
  5. 13 Sep, 2017 1 commit
  6. 23 Aug, 2017 1 commit
  7. 18 Aug, 2017 2 commits
  8. 16 Aug, 2017 1 commit
  9. 11 Aug, 2017 2 commits
  10. 03 Aug, 2017 1 commit
  11. 28 Jul, 2017 1 commit
  12. 10 Jul, 2017 1 commit
  13. 03 Jul, 2017 2 commits
    • Adding in test cases for _ALPS_reserve_resources · bedcb951
      Paul Rich authored
      In light of the double-reservation bug, adding checks to make sure that
      we don't accidentally add bad values to reservations again.
    • Fix for double-reservation entry · d72c6774
      Paul Rich authored
      This was traced to a call that could cause a non-string key to be added
      to the alps_reservation dictionary, resulting in a version of the
      reservation with an integer jobid key and a second with a string jobid
      key.  These should be keyed with strings.
      
      As further mitigation, added a check for an integer version of a key
      to clean.  If there is one, then notify that it happened and clean
      that one, too.
      
      The triggering condition is an interactive job where the initial ALPS
      reservation times out.
  14. 30 Jun, 2017 1 commit
  15. 27 Jun, 2017 2 commits
  16. 23 Jun, 2017 1 commit
  17. 19 Jun, 2017 1 commit
  18. 14 Apr, 2017 1 commit
  19. 13 Apr, 2017 2 commits
  20. 12 Apr, 2017 1 commit
  21. 11 Apr, 2017 1 commit
  22. 04 Jan, 2017 1 commit
    • Fix for nodes getting hung up in cleanup-pending state · 5f751a1a
      Paul Rich authored
      A well-timed (or poorly timed, depending on how you look at it) qdel
      could cause Cobalt to put a node into cleanup but never complete the
      cleanup, because there was no ALPS backend reservation to clean up.
      This would clear if there were no jobs currently running; otherwise,
      it would hang nodes.
  23. 23 Nov, 2016 2 commits
  24. 22 Nov, 2016 1 commit
  25. 15 Nov, 2016 1 commit
  26. 11 Nov, 2016 1 commit
  27. 03 Nov, 2016 1 commit
  28. 06 Oct, 2016 1 commit
  29. 26 Sep, 2016 2 commits
  30. 23 Sep, 2016 1 commit
  31. 19 Sep, 2016 1 commit
  32. 16 Sep, 2016 1 commit
    • Draining and backfilling basics operational. · 7856d38b
      Paul Rich authored
      Draining and backfilling are passing basic tests.  Need to add more test
      cases to the automated suite and test corner cases around
      queues/reservations/locations list.
      
      Also need to add backfill time display to nodelist/nodeadm -l.
  33. 14 Sep, 2016 1 commit
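
The double-reservation fix described in d72c6774 can be illustrated with a minimal sketch. This is not the Cobalt implementation: the function names, the plain dict standing in for the alps_reservation dictionary, and the logging callback are all hypothetical. It only shows why an integer jobid key and a string jobid key coexist as two separate entries in a Python dict, and how a cleanup step might normalize to string keys while also removing a stray integer-keyed copy.

```python
def normalize_jobid(jobid):
    """Return the job id as a string, the key type the commit says to use."""
    return str(jobid)


def add_reservation(reservations, jobid, reservation):
    """Store a reservation under a string key, whatever type the caller passed."""
    reservations[normalize_jobid(jobid)] = reservation


def clean_reservation(reservations, jobid, log=print):
    """Remove the reservation for jobid, plus any integer-keyed duplicate.

    Returns the list of keys that were actually removed.
    """
    removed = []
    str_key = normalize_jobid(jobid)
    if str_key in reservations:
        del reservations[str_key]
        removed.append(str_key)

    # Mitigation: an earlier bad insert may have left an integer-keyed copy.
    try:
        int_key = int(str_key)
    except ValueError:
        int_key = None  # non-numeric id, so no integer variant can exist
    if int_key is not None and int_key in reservations:
        log("integer key %r found for job %s; cleaning it too" % (int_key, str_key))
        del reservations[int_key]
        removed.append(int_key)
    return removed


# Hypothetical usage: simulate the bug by inserting an integer key directly.
reservations = {}
add_reservation(reservations, 42, {"nodes": [1, 2]})
reservations[42] = {"nodes": [1, 2]}  # the buggy, non-string insert
clean_reservation(reservations, "42")
```

Normalizing at insertion time addresses the cause; the extra integer-key check at cleanup corresponds to the "further mitigation" in the commit message, for entries that were already stored with the wrong key type.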