1. 12 Apr, 2017 1 commit
  2. 11 Apr, 2017 1 commit
  3. 04 Jan, 2017 1 commit
    • Fix for nodes getting hung up in cleanup-pending state · 5f751a1a
      Paul Rich authored
      A well-timed (or poorly timed, depending on how you look at it) qdel
      could cause Cobalt to put a node into cleanup but never complete the
      cleanup, because there was no ALPS backend reservation to clean up.
      This would clear if no jobs were currently running; otherwise, it would
      hang the nodes.
  4. 23 Nov, 2016 2 commits
  5. 22 Nov, 2016 1 commit
  6. 15 Nov, 2016 1 commit
  7. 11 Nov, 2016 1 commit
  8. 03 Nov, 2016 1 commit
  9. 06 Oct, 2016 1 commit
  10. 26 Sep, 2016 2 commits
  11. 23 Sep, 2016 1 commit
  12. 19 Sep, 2016 1 commit
  13. 16 Sep, 2016 1 commit
    • Draining and backfilling basics operational. · 7856d38b
      Paul Rich authored
      Draining and backfilling are passing basic tests.  Need to add more test
      cases to the automated suite and test corner cases around
      queues/reservations/locations list.
      
      Also need to add backfill time display to nodelist/nodeadm -l.
  14. 14 Sep, 2016 1 commit
  15. 13 Sep, 2016 3 commits
  16. 08 Sep, 2016 1 commit
  17. 07 Sep, 2016 1 commit
  18. 01 Sep, 2016 2 commits
  19. 24 Aug, 2016 5 commits
  20. 15 Aug, 2016 1 commit
  21. 11 Aug, 2016 2 commits
    • Fix for the aggressive cleanup · 06c5d122
      Paul Rich authored
      The apid fetch wasn't restricting itself to the actual ALPS reservation.
      This was causing everything to get killed.
    • Fixed error in recovering pgroups. · d9595cc8
      Paul Rich authored
      System component restart on the fly should be safe again.  We recover
      the process groups properly now.  Found this while testing other changes
      in the fix for aggressive cleanup.
  22. 08 Aug, 2016 1 commit
  23. 06 Aug, 2016 1 commit
  24. 03 Aug, 2016 2 commits
  25. 01 Aug, 2016 1 commit
  26. 31 Jul, 2016 1 commit
    • Admin down was not getting properly detected. · 93155c72
      Paul Rich authored
      Update node state was resetting an admin down.  Added an additional flag
      so we can differentiate between admin down and hardware down.
      
      If a node is marked down with an admin command, then no matter what, it
      will remain marked down.
  27. 29 Jul, 2016 1 commit
  28. 27 Jul, 2016 1 commit
    • apkill support added · 0fcbb56e
      Paul Rich authored
      Support for apkill added to kill user ALPS instances in interactive jobs.
      Kachina testing pending.
  29. 18 Jul, 2016 1 commit
    • Interactive cleanup now working. · 6af7cebf
      Paul Rich authored
      Resources for interactive jobs are now appropriately released.  There is
      still a known issue with currently running aprun instances.  That will
      be addressed in a later patch.