- 09 Sep, 2016 6 commits
-
-
Paul Rich authored
Conflicts: src/lib/Components/system/CraySystem.py
-
Paul Rich authored
Resolve "Doc Update for Cray Systems" Closes #32 See merge request !17
-
Paul Rich authored
-
Paul Rich authored
Fixed a bad default path for apbasil. The new default should work on a default Cray install.
-
Paul Rich authored
This was data that needed to be added to the manpages for Cray systems.
-
Paul Rich authored
This should effectively end the startup race condition. This should get rid of the bulk of the 1234567 exit statuses. Forces a timeout. The timeout goes away when the job is started. This should fix the process group initialization/start gap. Closes #31 See merge request !16
-
- 08 Sep, 2016 1 commit
-
-
Paul Rich authored
This should get rid of the bulk of the 1234567 exit statuses. Forces a timeout. The timeout goes away when the job is started. This should fix the process group initialization/start gap.
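The timeout behaviour described above could look roughly like the following. This is only a sketch of the general pattern (a startup timer that is cancelled once the start arrives), not Cobalt's actual code. `ProcessGroupStub`, its state names, and the timeout values are all hypothetical.

```python
import threading


class ProcessGroupStub:
    """Hypothetical stand-in for a process group awaiting its start signal."""

    def __init__(self, startup_timeout):
        self.state = "initialized"
        self._lock = threading.Lock()
        # The timer covers the gap between initialization and start.
        self._timer = threading.Timer(startup_timeout, self._expire)
        self._timer.start()

    def _expire(self):
        # Only a group that never started is marked as timed out.
        with self._lock:
            if self.state == "initialized":
                self.state = "timed_out"

    def start(self):
        with self._lock:
            if self.state != "initialized":
                return False
            self.state = "running"
        # The timeout goes away once the job has started.
        self._timer.cancel()
        return True

    def wait_for_timer(self):
        # Block until the timer thread has fired or been cancelled.
        self._timer.join()


started = ProcessGroupStub(startup_timeout=5.0)
started.start()
started.wait_for_timer()

abandoned = ProcessGroupStub(startup_timeout=0.05)
abandoned.wait_for_timer()
```

Here `started` ends up running with its timer cancelled, while `abandoned` is never started and is marked as timed out instead of waiting indefinitely.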
-
- 07 Sep, 2016 1 commit
-
-
Paul Rich authored
Checking in fixes for find queue equivalence classes that impact draining. Drain-status-clear now working. Stub for drain selection.
-
- 01 Sep, 2016 6 commits
-
-
Paul Rich authored
-
Paul Rich authored
-
Paul Rich authored
Fixing nodelist so it shows active reservations. Node list commands should show which nodes have an active reservation attached to them. This broke when going over to compact-notation. Closes #28 See merge request !15
-
Paul Rich authored
Node list commands should show which nodes have an active reservation attached to them. This broke when going over to compact-notation.
-
Paul Rich authored
Resolve "Nodeadm -l/nodelist fetch is slow" Closes #29
Major speedup achieved by doing two things:
1) Bypassing the large amount of recursion in the XMLRPC marshaler by converting the data to send into a JSON string (a flag controls whether dictionary data is returned in this form).
2) Adding a parameter restriction so that only specific fields can be requested. Used in nodeadm -l and nodelist to reduce the data being sent.
See merge request !14
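The two changes in this commit can be illustrated with a minimal sketch. It is not the Cobalt implementation: `get_nodes`, its `as_json` and `params` arguments, and the node fields are hypothetical names chosen for illustration.

```python
import json

# Hypothetical node records; the field names are illustrative only.
NODES = {
    "1": {"name": "nid00001", "status": "idle", "queues": ["default"], "reserved": False},
    "2": {"name": "nid00002", "status": "busy", "queues": ["default", "debug"], "reserved": True},
}


def get_nodes(as_json=False, params=None):
    """Return node data, optionally restricted to `params` and JSON-encoded."""
    if params is None:
        selected = {nid: dict(info) for nid, info in NODES.items()}
    else:
        # Only the requested fields are copied, so less data is sent.
        selected = {
            nid: {key: info[key] for key in params if key in info}
            for nid, info in NODES.items()
        }
    # A single string is one value for an XML-RPC marshaller to encode,
    # rather than a nested structure it has to walk element by element.
    return json.dumps(selected) if as_json else selected


payload = get_nodes(as_json=True, params=["name", "status"])
nodes = json.loads(payload)  # the client decodes the string back into dicts
```

The caller asks for the JSON form and for just the fields it displays, then decodes the string on its side.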
-
Paul Rich authored
Mocked up by replicating node data in testing.
-
- 29 Aug, 2016 1 commit
-
-
Paul Rich authored
-
- 24 Aug, 2016 10 commits
-
-
Paul Rich authored
Resolve "Reservation-location interaction" Closes #25 This resolves a number of issues with _assemble_queue_data with types and methods of reservation avoidance. It also fixes issues with reservations and --attrs location= being used together. See merge request !13
-
Paul Rich authored
Make sure that an update cannot change this list midflight for a job. Caller also holds this lock at this point in time. The node_lock must be reentrant!
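The reentrancy requirement can be shown with Python's standard `threading.RLock`. In this sketch, only the name `node_lock` comes from the commit message. The functions and the node table are hypothetical.

```python
import threading

node_lock = threading.RLock()  # reentrant: the owning thread may acquire it again
nodes = {"1": "idle", "2": "idle"}  # hypothetical node table


def reserve_nodes(node_ids):
    """Mark nodes busy; takes node_lock even if the caller already holds it."""
    with node_lock:
        for node_id in node_ids:
            nodes[node_id] = "busy"
        return sorted(node_ids)


def start_job(node_ids):
    """Hold node_lock across the whole operation so no update can interleave."""
    with node_lock:
        # With a plain threading.Lock this nested acquire would deadlock.
        return reserve_nodes(node_ids)


assigned = start_job(["2", "1"])
```

Because the caller already holds the lock when the inner function takes it again, a non-reentrant lock would block forever on the second acquire in the same thread.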
-
Paul Rich authored
-
Paul Rich authored
Thanks Eric! Duplicated nids are now avoided.
-
Paul Rich authored
-
Paul Rich authored
Resolve "Setres Slow" Closes #27 Now doing a single call to verify_locations instead of one per node requested. See merge request !12
-
Paul Rich authored
Reducing the calls to verify locations should significantly speed up setres on Cray systems.
-
Paul Rich authored
Fixes an attrs location request evading the Cobalt admin-down status on nodes.
-
Paul Rich authored
Non-idle nodes are now fully respected. Consistently get string nid lists out of this. ValueError doesn't get raised if the attrs location exists straddling a reservation (still in the queue, but not available due to the reservation).
-
Paul Rich authored
-
- 23 Aug, 2016 1 commit
-
-
Paul Rich authored
-
- 22 Aug, 2016 1 commit
-
-
Paul Rich authored
Including this until a later upgrade can be done that has Cobalt directly send a SIGKILL 5 minutes after we try to signal an aprun at cleanup.
-
- 15 Aug, 2016 2 commits
- 11 Aug, 2016 2 commits
-
-
Paul Rich authored
The apid fetch wasn't restricting itself to the actual ALPS reservation. This was causing everything to get killed.
-
Paul Rich authored
System component restart on the fly should be safe again. We recover the process groups properly now. Found this while testing other changes in the fix for aggressive cleanup.
-
- 08 Aug, 2016 1 commit
-
-
Paul Rich authored
Fixing a situation where locations set in a reservation job cause issues.
-
- 06 Aug, 2016 1 commit
-
-
Paul Rich authored
There was one further step needed for running jobs. Also fixing a potential statefile issue with prior versions.
-
- 03 Aug, 2016 2 commits
- 01 Aug, 2016 5 commits