- 01 Sep, 2016 5 commits
-
-
Paul Rich authored
-
Paul Rich authored
Fixing nodelist so it shows active reservations. Node list commands should show which nodes have an active reservation attached to them. Broke when going over to compact-notation. Closes #28. See merge request !15.
-
Paul Rich authored
Node list commands should show which nodes have an active reservation attached to them. Broke when going over to compact-notation.
-
Paul Rich authored
Resolve "Nodeadm -l/nodelist fetch is slow". Closes #29. Major speedup achieved by doing two things: 1) bypassing the large amount of recursion in the XMLRPC marshaler by converting the data to send into a JSON string (a flag controls whether dictionary data is sent in this form); 2) adding a parameter restriction so you can request only specific fields. Used in nodeadm -l and nodelist to reduce the data being sent. See merge request !14.
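The two changes described above can be sketched roughly as follows. This is an illustration only, not Cobalt's code: `get_nodes`, its `as_json` and `fields` parameters, and the `NODES` data are all hypothetical names, and the sketch uses the Python 3 standard library.

```python
import json
import xmlrpc.client

# Hypothetical node data; the real structures are larger and more deeply nested.
NODES = {
    "1": {"status": "idle", "queues": ["default"], "reserved": False},
    "2": {"status": "busy", "queues": ["default"], "reserved": True},
}

def get_nodes(nodes, as_json=False, fields=None):
    """Hypothetical fetch: optionally keep only some fields, optionally return JSON."""
    if fields is not None:
        wanted = set(fields)
        nodes = {nid: {k: v for k, v in info.items() if k in wanted}
                 for nid, info in nodes.items()}
    # As a JSON string, the XML-RPC marshaler sees one string value
    # instead of walking a nested dictionary element by element.
    return json.dumps(nodes) if as_json else nodes

payload = get_nodes(NODES, as_json=True, fields=["status"])
wire = xmlrpc.client.dumps((payload,), methodresponse=True)

# Receiving side: unmarshal the single string, then decode the JSON.
(received,), _ = xmlrpc.client.loads(wire)
restored = json.loads(received)
```

The receiver pays for one `json.loads` call, and restricting the fields shrinks the payload before it is serialized at all.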
-
Paul Rich authored
Mocked up by replicating node data in testing.
-
- 29 Aug, 2016 1 commit
-
-
Paul Rich authored
-
- 24 Aug, 2016 10 commits
-
-
Paul Rich authored
Resolve "Reservation-location interaction". Closes #25. This resolves a number of issues in _assemble_queue_data with types and methods of reservation avoidance. It also fixes issues with reservations and --attrs location= being used together. See merge request !13.
-
Paul Rich authored
Make sure that an update cannot change this list midflight for a job. Caller also holds this lock at this point in time. The node_lock must be reentrant!
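Why the lock has to be reentrant can be shown with a minimal sketch. The names here (`node_lock`, `node_list`, `assign_nodes`, `update_and_assign`) are assumptions for illustration, not the actual Cobalt functions.

```python
import threading

# A reentrant lock: the thread that holds it may acquire it again.
node_lock = threading.RLock()
node_list = ["1", "2", "3"]

def assign_nodes(job_nodes):
    """Hypothetical callee: takes the lock so an update cannot change the list midflight."""
    with node_lock:
        return [n for n in node_list if n in job_nodes]

def update_and_assign(job_nodes):
    """Hypothetical caller: already holds the lock when it calls assign_nodes."""
    with node_lock:
        return assign_nodes(job_nodes)

result = update_and_assign({"1", "3"})
```

With a plain `threading.Lock` the inner acquire would block forever, because the same thread already holds the lock.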
-
Paul Rich authored
-
Paul Rich authored
Thanks Eric! Duplicated nids are now avoided.
-
Paul Rich authored
-
Paul Rich authored
Resolve "Setres Slow". Closes #27. Now doing a single call to verify_locations instead of one per node requested. See merge request !12.
-
Paul Rich authored
Reducing the calls to verify locations should significantly speed up setres on Cray systems.
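The difference between one call per node and a single batched call can be sketched as below. The `verify_locations` here is a stand-in with an assumed signature (a list of locations in, the valid ones out); the call counter exists only to make the difference visible.

```python
CALLS = {"count": 0}
KNOWN = {"1", "2", "3", "4"}  # hypothetical set of valid node locations

def verify_locations(locations):
    """Stand-in for the remote call; each invocation represents one round trip."""
    CALLS["count"] += 1
    return [loc for loc in locations if loc in KNOWN]

def check_per_node(requested):
    """Old pattern: one call for every node requested."""
    good = []
    for loc in requested:
        good.extend(verify_locations([loc]))
    return good

def check_batched(requested):
    """New pattern: a single call covering all nodes requested."""
    return verify_locations(list(requested))

requested = ["1", "2", "9", "4"]
per_node = check_per_node(requested)
per_node_calls = CALLS["count"]

CALLS["count"] = 0
batched = check_batched(requested)
batched_calls = CALLS["count"]
```

Both patterns return the same valid locations; the batched one makes a single call no matter how many nodes are requested.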
-
Paul Rich authored
Fixes attrs location evading cobalt admin down on nodes.
-
Paul Rich authored
Non-idle nodes are now fully respected. Consistently get string nid lists out of this. ValueError doesn't get raised if the attrs location exists straddling a reservation (still in the queue, but not available due to the reservation).
-
Paul Rich authored
-
- 23 Aug, 2016 1 commit
-
-
Paul Rich authored
-
- 22 Aug, 2016 1 commit
-
-
Paul Rich authored
Including this until a later upgrade can be done that has Cobalt directly send a SIGKILL 5 minutes after we try to signal an aprun at cleanup.
-
- 15 Aug, 2016 2 commits
- 11 Aug, 2016 2 commits
-
-
Paul Rich authored
The apid fetch wasn't restricting itself to the actual ALPS reservation. This was causing everything to get killed.
-
Paul Rich authored
System component restart on the fly should be safe again. We recover the process groups properly now. Found this while testing other changes in the fix for aggressive cleanup.
-
- 08 Aug, 2016 1 commit
-
-
Paul Rich authored
Fixing a situation where locations, when set in a reservation job, cause issues.
-
- 06 Aug, 2016 1 commit
-
-
Paul Rich authored
There was one further step needed for running jobs. Also fixing a potential statefile issue with prior versions.
-
- 03 Aug, 2016 2 commits
- 01 Aug, 2016 5 commits
- 31 Jul, 2016 1 commit
-
-
Paul Rich authored
Update node state was resetting an admin down. Added an additional flag so we can differentiate between admin down and hardware down. If a node is marked down with an admin command, then no matter what, it will remain marked down.
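A minimal sketch of the idea, assuming a hypothetical `Node` class with an `admin_down` flag; the real attribute and method names in Cobalt may differ.

```python
from dataclasses import dataclass

@dataclass
class Node:
    status: str = "idle"
    admin_down: bool = False  # set only by an admin command (hypothetical flag name)

def admin_mark_down(node):
    """Admin command: mark the node down and remember that an admin did it."""
    node.status = "down"
    node.admin_down = True

def update_from_hardware(node, hw_status):
    """Periodic state update: never overrides an admin down."""
    if node.admin_down:
        return
    node.status = hw_status

node = Node()
admin_mark_down(node)
update_from_hardware(node, "idle")  # hardware reports the node as healthy

other = Node()
update_from_hardware(other, "down")  # hardware down, no admin involvement
update_from_hardware(other, "idle")  # recovers once the hardware comes back
```

Here `node` stays down after the hardware update, while `other`, which was only hardware down, returns to idle.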
-
- 29 Jul, 2016 1 commit
-
-
Paul Rich authored
-
- 27 Jul, 2016 1 commit
-
-
Paul Rich authored
Support for apkill added to kill the user's ALPS instance in interactive jobs. Kachina testing pending.
-
- 18 Jul, 2016 1 commit
-
-
Paul Rich authored
Resources for interactive jobs are now appropriately released. There is still a known issue with currently running aprun instances. That will be addressed in a further patch.
-
- 06 Jul, 2016 1 commit
-
-
Paul Rich authored
-
- 24 Jun, 2016 2 commits
- 23 Jun, 2016 1 commit
-
-
Paul Rich authored
Rereservations were broken for long (>5 min) startups. This should allow the CAPMC scripts to do their thing.
-
- 17 Jun, 2016 1 commit
-
-
Paul Rich authored
logging.
-