1. 11 Apr, 2017 1 commit
  2. 04 Jan, 2017 1 commit
      Fix for nodes getting hung up in cleanup-pending state · 5f751a1a
      Paul Rich authored
      A well-timed (or poorly timed, depending on how you look at it) qdel
      could cause Cobalt to put a node into cleanup but never complete the
      cleanup, because there was no ALPS backend reservation to clean up.
      This would clear if no jobs were currently running; otherwise, the
      nodes would hang.
  3. 23 Nov, 2016 2 commits
  4. 22 Nov, 2016 1 commit
  5. 15 Nov, 2016 1 commit
  6. 11 Nov, 2016 1 commit
  7. 03 Nov, 2016 1 commit
  8. 06 Oct, 2016 1 commit
  9. 26 Sep, 2016 2 commits
  10. 23 Sep, 2016 1 commit
  11. 19 Sep, 2016 1 commit
  12. 16 Sep, 2016 1 commit
      Draining and backfilling basics operational. · 7856d38b
      Paul Rich authored
      Draining and backfilling are passing basic tests.  Need to add more test
      cases to the automated suite and test corner cases around
      queues/reservations/locations list.
      
      Also need to add backfill time display to nodelist/nodeadm -l.
  13. 14 Sep, 2016 1 commit
  14. 13 Sep, 2016 3 commits
  15. 08 Sep, 2016 1 commit
  16. 07 Sep, 2016 1 commit
  17. 01 Sep, 2016 2 commits
  18. 24 Aug, 2016 5 commits
  19. 15 Aug, 2016 1 commit
  20. 11 Aug, 2016 2 commits
      Fix for the aggressive cleanup · 06c5d122
      Paul Rich authored
      The apid fetch wasn't restricting itself to the actual ALPS reservation.
      This was causing everything to get killed.
      Fixed error in recovering pgroups. · d9595cc8
      Paul Rich authored
      System component restart on the fly should be safe again.  We recover
      the process groups properly now.  Found this while testing other changes
      in the fix for aggressive cleanup.
  21. 08 Aug, 2016 1 commit
  22. 06 Aug, 2016 1 commit
  23. 03 Aug, 2016 2 commits
  24. 01 Aug, 2016 1 commit
  25. 31 Jul, 2016 1 commit
      Admin down was not getting properly detected. · 93155c72
      Paul Rich authored
      Update node state was resetting an admin down.  Added an additional flag
      so we can differentiate between admin down and hardware down.
      
      If a node is marked down with an admin command, then no matter what, it
      will remain marked down.
  26. 29 Jul, 2016 1 commit
  27. 27 Jul, 2016 1 commit
      apkill support added · 0fcbb56e
      Paul Rich authored
      Support for apkill added to kill the user's ALPS instance in
      interactive jobs.  Kachina testing pending.
  28. 18 Jul, 2016 1 commit
      Interactive cleanup now working. · 6af7cebf
      Paul Rich authored
      Resources for interactive jobs are now appropriately released.  There is
      still a known issue with currently running aprun instances.  That will
      be addressed in a further patch.
  29. 06 Jul, 2016 1 commit