  1. 24 Aug, 2016 3 commits
  2. 08 Aug, 2016 1 commit
  3. 06 Aug, 2016 1 commit
  4. 03 Aug, 2016 2 commits
  5. 01 Aug, 2016 4 commits
  6. 31 Jul, 2016 1 commit
    • Admin down was not getting properly detected. · 93155c72
      Paul Rich authored
      Update node state was resetting an admin down.  Added an additional flag
      so we can differentiate between admin down and hardware down.
      
      If a node is marked down with an admin command, then no matter what, it
      will remain marked down.
  7. 29 Jul, 2016 1 commit
  8. 27 Jul, 2016 1 commit
    • apkill support added · 0fcbb56e
      Paul Rich authored
      Support for apkill added to kill user alps instance in interactive jobs.
      Kachina testing pending.
  9. 18 Jul, 2016 1 commit
    • Interactive cleanup now working. · 6af7cebf
      Paul Rich authored
      Resources for interactive jobs are now appropriately released.  There is
      still a known issue with currently running aprun instances.  That will
      be addressed in a further patch.
  10. 06 Jul, 2016 1 commit
  11. 24 Jun, 2016 2 commits
  12. 23 Jun, 2016 1 commit
  13. 13 Jun, 2016 1 commit
  14. 10 Jun, 2016 1 commit
  15. 02 Jun, 2016 2 commits
    • Merge branch '19-singleton-queue' into 'master' · a1643dae
      Paul Rich authored
      maxtotaljobs limit added.
      
      This adds the limiter for maximum jobs overall running in queue.  Useful
      for profiling machines with noisy network environments.  This also adds
      output to cqadm for this information, and an entry in the cqadm manpage.
      
      See merge request !10
    • maxtotaljobs limit added. · 26963d8b
      Paul Rich authored
      This adds the limiter for maximum jobs overall running in queue.  Useful
      for profiling machines with noisy network environments.  This also adds
      output to cqadm for this information, and an entry in the cqadm manpage.
  16. 25 May, 2016 1 commit
  17. 24 May, 2016 1 commit
  18. 23 May, 2016 1 commit
  19. 20 May, 2016 1 commit
  20. 11 May, 2016 1 commit
  21. 09 May, 2016 1 commit
  22. 04 May, 2016 3 commits
  23. 03 May, 2016 1 commit
    • Fixed process group w/o process startup error · 8aeec73f
      Paul Rich authored
      On restart, if cobalt was shut down abruptly (like with a power failure
      or a kill -9), there was a way to lose the forker child process of a
      process group.  The process group would never finish cleaning up, and
      the associated resources would keep being put into cleanup-pending by
      the reserve_resources_until code.
      
      Now the orphaned process group(s) are cleaned up automatically.  CQM
      jobs that reference these should get back an error stating that the
      underlying task no longer exists/cannot be found.
      
      This circumstance should be rare in production (I hope), but I could
      see this scenario being triggered during abnormal operations (like a
      facility power/cooling failure).
  24. 26 Apr, 2016 1 commit
  25. 22 Apr, 2016 3 commits
  26. 21 Apr, 2016 1 commit
  27. 20 Apr, 2016 1 commit
    • Fix for negative nodes. · 923f6691
      Paul Rich authored
      There was a bug that counted each active reservation node as 2 nodes for
      the purposes of determining how many nodes were left in the
      non-reservation queue.
  28. 19 Apr, 2016 1 commit