1. 31 Jul, 2016 1 commit
    • Admin down was not getting properly detected. · 93155c72
      Paul Rich authored
      Update node state was resetting an admin down.  Added an additional flag
      so we can differentiate between admin down and hardware down.
      
      If a node is marked down with an admin command, then no matter what, it
      will remain marked down.
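The admin-down behavior described in this commit could be sketched roughly as follows. This is a minimal illustration, not Cobalt's actual code: the `Node` class, the `admin_down` attribute, and both function names are hypothetical, chosen only to show a separate flag that keeps a hardware-driven state update from clearing an administrative down.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical node record; Cobalt's real node class is not shown in the commit."""
    name: str
    status: str = "idle"
    admin_down: bool = False  # assumed flag: set only by an admin command


def mark_admin_down(node):
    """Mark a node down on behalf of an administrator."""
    node.admin_down = True
    node.status = "down"


def update_node_state(node, hardware_status):
    """Apply the hardware-reported status unless the node is held down by an admin."""
    if node.admin_down:
        # An admin down always wins over whatever the hardware reports.
        node.status = "down"
        return
    node.status = hardware_status


node = Node("nid00001")
mark_admin_down(node)
update_node_state(node, "idle")  # hardware says the node is healthy
print(node.status)
```

Keeping the administrative state in its own flag, rather than inferring it from `status`, is what lets the update path tell an admin down apart from a hardware down.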
  2. 24 Jun, 2016 2 commits
  3. 23 Jun, 2016 1 commit
  4. 13 Jun, 2016 1 commit
  5. 10 Jun, 2016 1 commit
  6. 02 Jun, 2016 2 commits
    • Merge branch '19-singleton-queue' into 'master' · a1643dae
      Paul Rich authored
      maxtotaljobs limit added.
      
      This adds the limiter for maximum jobs overall running in queue.  Useful
      for profiling machines with noisy network environments.  This also adds
      output to cqadm for this information, and an entry in the cqadm manpage.
      
      See merge request !10
    • maxtotaljobs limit added. · 26963d8b
      Paul Rich authored
      This adds the limiter for maximum jobs overall running in queue.  Useful
      for profiling machines with noisy network environments.  This also adds
      output to cqadm for this information, and an entry in the cqadm manpage.
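A queue-wide running-job limit of the kind described here might be checked along these lines. This is only a sketch under assumptions: the function name, its arguments, and the convention that `None` means "no limit" are hypothetical and are not taken from Cobalt's source.

```python
def can_start_job(running_in_queue, maxtotaljobs):
    """Return True if one more job may start in the queue.

    running_in_queue: number of jobs currently running in the queue.
    maxtotaljobs: the queue-wide cap, or None for no cap (assumed convention).
    """
    if maxtotaljobs is None:
        return True
    return running_in_queue < maxtotaljobs


# With a cap of 1 the queue behaves as a singleton: one job at a time.
print(can_start_job(0, 1))
print(can_start_job(1, 1))
```

Setting the cap to 1 gives the "singleton queue" behavior the branch name suggests, which is what makes it useful for profiling on a noisy network.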
  7. 25 May, 2016 1 commit
  8. 24 May, 2016 1 commit
  9. 23 May, 2016 1 commit
  10. 20 May, 2016 1 commit
  11. 11 May, 2016 1 commit
  12. 09 May, 2016 1 commit
  13. 04 May, 2016 3 commits
  14. 03 May, 2016 1 commit
    • Fixed process group w/o process startup error · 8aeec73f
      Paul Rich authored
      On restart, if Cobalt was shut down abruptly (like with a power failure
      or a kill -9), there was a way to lose the forker child process of a
      process group.  The process group would never finish cleaning up, and
      the associated resources would keep being put into cleanup-pending by
      the reserve_resources_until code.
      
      Now the orphaned process group(s) are cleaned up automatically.  CQM
      jobs that reference these should get back an error stating that the
      underlying task no longer exists/cannot be found.
      
      This circumstance should be rare in production (I hope), but I could
      see this scenario being triggered during abnormal operations (like a
      facility power/cooling failure).
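The orphan cleanup described in this commit can be pictured with a small sketch. Everything here is hypothetical (the dictionary layout, the function names, and the use of `LookupError`); it only illustrates dropping process groups whose forker child no longer exists and reporting an error to callers that still reference them.

```python
def reap_orphaned_process_groups(process_groups, live_child_ids):
    """Remove process groups whose forker child is gone.

    process_groups: dict mapping process group id -> forker child id.
    live_child_ids: set of child ids the forker still knows about.
    Returns the ids of the process groups that were removed.
    """
    orphaned = [pg_id for pg_id, child_id in process_groups.items()
                if child_id not in live_child_ids]
    for pg_id in orphaned:
        del process_groups[pg_id]
    return orphaned


def get_process_group(process_groups, pg_id):
    """Look up a process group, raising if the underlying task no longer exists."""
    if pg_id not in process_groups:
        raise LookupError("process group %s no longer exists" % pg_id)
    return process_groups[pg_id]


groups = {1: 101, 2: 102}
print(reap_orphaned_process_groups(groups, {101}))
```

Once the orphaned entry is gone, nothing keeps re-reserving its resources, so they can leave cleanup-pending.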
  15. 26 Apr, 2016 1 commit
  16. 22 Apr, 2016 3 commits
  17. 21 Apr, 2016 1 commit
  18. 20 Apr, 2016 1 commit
    • Fix for negative nodes. · 923f6691
      Paul Rich authored
      There was a bug that was counting active reservation nodes as 2 nodes for
      the purposes of determining how many nodes were left in the
      non-reservation queue.
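The commit does not say where the double count came from, so the following is only an illustration of how such a count can go negative and one way to avoid it. The function name and the node-set arguments are hypothetical. The idea is to take the union of unavailable nodes so that a node that is both reserved and busy is subtracted once.

```python
def nodes_left_for_queue(all_nodes, reserved_nodes, busy_nodes):
    """Count nodes available to the non-reservation queue.

    A node that is both in an active reservation and busy is counted once,
    because the unavailable set is a union rather than a sum of two counts.
    """
    unavailable = set(reserved_nodes) | set(busy_nodes)
    return len(set(all_nodes) - unavailable)


all_nodes = {"n0", "n1", "n2", "n3"}
reserved = {"n0", "n1", "n2"}
busy = {"n0", "n1", "n2"}  # jobs running inside the reservation

# Subtracting the two counts separately gives 4 - 3 - 3 = -2.
print(len(all_nodes) - len(reserved) - len(busy))
# The union-based count cannot drop below zero.
print(nodes_left_for_queue(all_nodes, reserved, busy))
```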
  19. 19 Apr, 2016 2 commits
  20. 18 Apr, 2016 2 commits
  21. 14 Apr, 2016 2 commits
    • Interim checkin of reservation additions · 428a3b6f
      Paul Rich authored
      This is an interim checkin of reservation handling.  This code is not
      yet functional.  Saving prior to branch switch.
    • Merge branch 'Fix-16-node-cleanup' into 'master' · 647c954c
      Paul Rich authored
      Fix 16 node cleanup
      
      Merging in a change to fix a bug where nodes were not cleaning up as long as any reservations were on the system.  This also fixes Cobalt ignoring node roles.  It will now only try to schedule on batch nodes.
      
      See merge request !6
  22. 13 Apr, 2016 1 commit
  23. 12 Apr, 2016 1 commit
  24. 08 Apr, 2016 2 commits
    • Merge branch 'Enh-14-use-system-reservednodes' into 'master' · e73f7cc1
      Paul Rich authored
      Enh 14 use system reservednodes
      
      Smaller system query added.  RESERVENODES support will be added in a later ticket.  Also has fixes from first encounters with Kachina.
      
      See merge request !5
    • Now using smaller system query. · 6bb85c2d
      Paul Rich authored
      This should significantly reduce the overhead of the system inventory
      for updating state.  Also gets memory statuses.
      
      Need to add dynamic attribute update for running systems.
  25. 07 Apr, 2016 1 commit
  26. 04 Apr, 2016 1 commit
  27. 01 Apr, 2016 1 commit
    • Fix for 1.7 issues when putting job together. · a349758c
      Paul Rich authored
      Cray's documentation on what depth and nppn do isn't all that clear.
      Apparently this arrangement will actually reserve proper numbers of
      nodes.
      
      Full allocation now works reliably.
  28. 29 Mar, 2016 1 commit
  29. 15 Mar, 2016 2 commits