- 08 Dec, 2016 1 commit
-
-
Paul Rich authored
If the child fetch succeeds but cleanup fails, make sure we use the initially fetched data, rather than replacing it with the now potentially lost child data.
-
- 28 Nov, 2016 1 commit
-
-
Paul Rich authored
There was the possibility of losing a PID if child cleanup was interrupted. This ensures retries until the child process is actually dead.
-
- 28 Oct, 2016 1 commit
-
-
Paul Rich authored
In the end, the buffer size had to be increased to avoid timing issues. Added further try-except safety checks to prevent system component issues if this runs long again.
-
- 06 Oct, 2016 1 commit
-
-
Paul Rich authored
-
- 18 Jul, 2016 1 commit
-
-
Paul Rich authored
Resources for interactive jobs are now appropriately released. There is still a known issue with currently running aprun instances. That will be addressed in a further patch.
-
- 12 Apr, 2016 1 commit
-
-
Paul Rich authored
Resources weren't actually exiting the cleanup state when there were other reservations on the system. The check to mark nodes as idle was not actually occurring when a reservation existed.
-
- 08 Apr, 2016 1 commit
-
-
Paul Rich authored
This should significantly reduce the overhead of the system inventory for updating state. Also gets memory statuses. Dynamic attribute updates for running systems still need to be added.
-
- 07 Apr, 2016 1 commit
-
-
Paul Rich authored
We can now get and properly display node attributes via the system type query.
-
- 01 Apr, 2016 1 commit
-
-
Paul Rich authored
Cray's documentation on what depth and nppn do isn't all that clear. Apparently this arrangement will actually reserve the proper number of nodes. Full allocation now works reliably.
-
- 14 Mar, 2016 1 commit
-
-
Paul Rich authored
This ALPS query provides more detailed information about node memory. Supported in versions later than 1.4.
-
- 10 Mar, 2016 1 commit
-
-
Paul Rich authored
-
- 04 Mar, 2016 1 commit
-
-
Paul Rich authored
This also necessitated adding in queues at a basic level. Multiple queues are now supported. Orthogonal queues appear to be working correctly. Nodeadm now lists nodes and can set queues on a list of nids.
-
- 18 Feb, 2016 1 commit
-
-
Paul Rich authored
This corrects a bug where stale apbridge invocations could linger in the system script forker on system component initialization, causing system state initialization to fail due to bad behavior on the part of the now-dead children.
-
- 17 Feb, 2016 1 commit
-
-
Paul Rich authored
This allows a number of constants to be set in the Cobalt config file. We can also now set attributes like width and depth from qsub using --attrs. This also adds the hooks for using attrs=location:xxxyyy for a resource reservation.
-
- 29 Jan, 2016 1 commit
-
-
Paul Rich authored
I've gotten a test job to run end to end and release resources. ALPS reservation set confirmed released, node added to pool, run and released. Lots of cleanup is needed. Code is at prototype stage. Not production ready, but a lot closer to it.
-
- 20 Jan, 2016 1 commit
-
-
Paul Rich authored
A modified version of the user script forker is needed so that we can confirm the ALPS reservation and set the pg_id from the child. This will let apruns from a user script run.
-
- 18 Jan, 2016 1 commit
-
-
Paul Rich authored
Adds find job location and reserve resources until, along with helper functions. Tests pending. Draining and backfilling pending. Docs pending. Config options pending.
-
- 18 Dec, 2015 1 commit
-
-
Paul Rich authored
alpssystem can now be invoked as a system component. It will initialize, query, and update nodes. This adds in the update behavior for nodes. Persistence isn't supported yet and will be coming soon. This is currently using a base process manager.
-
- 25 Nov, 2015 1 commit
-
-
Paul Rich authored
Cray-specific node class added. Cray node manager added. Can initialize from the ALPS bridge. Core of alps bridge added.
-