Commit e2e99594 authored by Paul Rich

Fix for draining flap.

A node was getting added to the candidate list twice when looking over
currently running jobs, resulting in a duplicate node and an inflated
count, which triggered an early break. When this list is later
converted to a set, the duplicates are removed, leaving a set of nodes
too small for the job to use for draining, so nodes never get drained
for the job.
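
For illustration only, here is a minimal sketch of the failure mode; the
names (drain_candidates, per_job_nodes, dedup) are invented and this is
not the actual CraySystem code. Without a uniqueness guard, the duplicate
inflates len(candidate_list) enough to trigger the early break, while the
later set() conversion collapses the duplicates and leaves too few
distinct nodes:

```python
# Hypothetical model of the candidate selection loop, for illustration only.
def drain_candidates(per_job_nodes, draining, nodes_needed, dedup=True):
    candidate_list = []
    for running_nodes in per_job_nodes:          # nodes of each running job
        candidate_list.extend([nid for nid in running_nodes
                               if draining[nid]
                               and (not dedup or nid not in candidate_list)])
        if len(candidate_list) >= nodes_needed:
            break                                # count looks big enough; stop early
    return set(candidate_list)                   # duplicates collapse here

draining = {1: True, 2: True, 3: True}
jobs = [[1, 2], [1], [3]]                        # node 1 shows up under two jobs
drain_candidates(jobs, draining, 3, dedup=False) # {1, 2}: too few distinct nodes, drain flaps
drain_candidates(jobs, draining, 3, dedup=True)  # {1, 2, 3}: enough nodes, job drains
```

The fix below takes the same approach as the dedup branch of the sketch:
guard the extend() so the length check only ever counts distinct nodes.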

Added a test case for the drain flap; it now passes because a check
prevents a duplicate node from being added to the candidate list.
parent 76b052c7
@@ -1133,8 +1133,8 @@ class CraySystem(BaseSystem):
         # 1. idle nodes that are already marked for draining.
         # 2. Nodes that are in an in-use status (busy, allocated).
         # 3. Nodes marked for cleanup that are not allocated to a real
-        # jobid. CLEANING_ID is a sentiel jobid value so we can set
-        # a drain window on cleaning nodes easiliy. Not sure if this
+        # jobid. CLEANING_ID is a sentinel jobid value so we can set
+        # a drain window on cleaning nodes easily. Not sure if this
         # is the right thing to do. --PMR
         candidate_list = []
         candidate_list = [nid for nid in node_id_list
@@ -1158,8 +1158,9 @@ class CraySystem(BaseSystem):
                 if (self.nodes[str(nid)].status != 'down' and
                         self.nodes[str(nid)].managed):
                     self.nodes[str(nid)].set_drain(loc_time[1], job['jobid'])
+            # It's a list not a set, need to ensure uniqueness
             candidate_list.extend([nid for nid in running_nodes if
-                self.nodes[str(nid)].draining])
+                self.nodes[str(nid)].draining and nid not in candidate_list])
             candidate_drain_time = int(loc_time[1])
             if len(candidate_list) >= int(job['nodes']):
                 # Enough nodes have been found to drain for this job