Commit 6dff7632 authored by Paul Rich

Merge branch 'cray-merge-doc' into 'develop'

Documentation changes for Cobalt Cray Port.

Documentation changes.  If approved, do not merge until we have a decision on what we want to do with history.  This will lose history otherwise.

See merge request !6
parents 03e7c0d9 675792d6
= Changes from previous Cobalt Versions =
== Changes to 1.0.6 ==
* Backfill now supported on Cray systems.
* Major performance improvement to nodelist and nodeadm -l.
* Fix for an error that can cause loss of job tracking during job startup
resulting in a reported exit status of "1234567".
* Fix for an error in system_script_forker that can cause an error loop
if using stdout to string functionality.
* Fix for an error that caused jobs with a "terminating" status to cause
qstat to fail.
== Changes to 1.0.5 ==
* Fix for an issue that would cause jobs with their location set to not
run if they were submitted to a reservation.
* Performance of setres improved significantly.
* Fixed an issue where reservations and soft-down status on nodes may not
be respected by job placement.
* An enhanced_apkill script is now available to improve job cleanup and
ensure that jobs that ignore signals are otherwise properly terminated.
== Changes to 1.0.4 ==
* Fix for apid not being properly checked when apkill was being invoked.
== Changes to 1.0.3 ==
* Fix for an issue for nodes coming out of disabled state and suddenly
appearing in ALPS.
* Interactive job support
* Support for CAPMC memory management scripts added. Cobalt now gracefully
handles long startup times if these are being used.
* Improved job cleanup using apkill.
== Changes to 1.0.2 ==
* Singleton job filter (maxtotaljobs) now supported
* Multiple alps_script_forkers running simultaneously is now supported. Forkers
are chosen for execution based on which forker has the lowest number of jobs
currently assigned to it.
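
The forker selection described for 1.0.2 amounts to a least-loaded choice. A minimal sketch in Python, assuming a hypothetical mapping of forker names to their current job counts; the function name, the forker names, and the tie-breaking rule are illustrative and are not taken from Cobalt's source:

```python
def pick_forker(job_counts):
    """Return the name of the forker with the fewest assigned jobs.

    job_counts is a hypothetical dict mapping forker name -> number of
    jobs currently assigned. Ties are broken by name so the result is
    deterministic; the real tie-breaking rule is not documented here.
    """
    if not job_counts:
        raise ValueError("no forkers available")
    # min() returns the first minimal element, so iterating over the
    # sorted names makes the alphabetically first forker win a tie.
    return min(sorted(job_counts), key=lambda name: job_counts[name])


forkers = {"alps_script_forker_0": 3, "alps_script_forker_1": 1}
chosen = pick_forker(forkers)
```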
== Changes to 1.0.0 ==
* Initial support of Cray systems
== Changes to 0.99.42 ==
* Update to database writer to handle attrs fields that are
improperly formatted.
......
# Cobalt: Component Based Lightweight Toolkit #
Cobalt is a scheduler and resource manager for HPC systems as well as general
clusters. Supported systems include generic x86 clusters, the IBM BlueGene
platform and Cray systems using ALPS. It uses a highly customizable and
extensible scheduler that allows for great flexibility in job priorities and
queueing policies.
......@@ -228,6 +228,14 @@ platforms, all jobs are run as script jobs.
.SH "NOTE"
The only thing printed to STDOUT is the jobid; any error or informational messages are printed to STDERR.
.SS "Cray Systems"
On Cray systems, the "location" attribute may be specified by a comma-delimited
list of node ids. Runs of node ids may be compacted to a hyphenated, inclusive
pair, e.g. 1-4 would expand to 1, 2, 3, 4. All nodes specified in location must
exist on the system. This list format is compatible with the values returned by
Cray's
.BR cnselect (1)
command.
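
The list format described above can be expanded mechanically. The following is a minimal sketch in Python, not Cobalt's own parser; the function name expand_node_list is hypothetical, and the check that every node exists on the system is left out:

```python
def expand_node_list(spec):
    """Expand a string such as "1-4,7" into [1, 2, 3, 4, 7].

    Items are comma-delimited; a hyphenated pair is an inclusive range.
    """
    nodes = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            first, last = (int(part) for part in item.split("-", 1))
            if first > last:
                raise ValueError("descending range: %s" % item)
            # Ranges are inclusive on both ends.
            nodes.extend(range(first, last + 1))
        else:
            nodes.append(int(item))
    return nodes
```

For example, expand_node_list("1-4") returns [1, 2, 3, 4].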
.SH "SEE ALSO"
.BR qstat (1),
.BR qdel (1),
......
......@@ -503,6 +503,59 @@ command. The default is 300 seconds.
.B bgtype
The type of BlueGene being run on. For BlueGene/Q this should be set to 'bgq'.
.SS "CRAY SECTIONS"
.SS "[alps]"
.TP
.B basil
The path to Cray's apbasil command. The default path is
/opt/cray/default/alps/bin/apbasil
.TP
.B apkill
The path to Cray's apkill command. The default path is
/opt/cray/alps/default/bin/apkill
.TP
.B default_depth
The default processors per node. This should be set to the number of KNL cores
on each node for XC40 systems. The default value is 72.
.SS [alpssystem]
.TP
.B pgroup_startup_timeout
The time to allow for process group startup in seconds. The default is 120
seconds.
.TP
.B save_me_interval
The minimum interval that Cobalt will wait between saving statefiles for this
component, in seconds. By default the interval is 10.0 seconds. Under periods
of high load on the component, the interval between statefiles may be longer.
.TP
.B temp_reservation_time
The default time for the temporary allocation reservation for starting jobs in
seconds. The default is 300 seconds.
.TP
.B update_thread_timeout
The polling interval for state updates from ALPS in seconds. The default is
10 seconds.
.SS [system]
.TP
.B backfill_epsillon
Set the amount of time to subtract from the remaining drain window, in seconds,
when placing backfill jobs. This allows time for cleanup of backfill jobs to
complete prior to the exit time of the job causing the drain to occur. The default is
120 seconds.
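
The arithmetic behind this option can be sketched as follows. This is an illustration in Python under the assumption that a backfill candidate is accepted when its walltime fits in the drain window after the epsilon has been subtracted; the function names are hypothetical and the scheduler's actual placement logic may apply further checks:

```python
DEFAULT_EPSILON = 120  # seconds, the documented default


def usable_backfill_window(drain_window_remaining, epsilon=DEFAULT_EPSILON):
    """Seconds left for a backfill job once cleanup time is set aside."""
    return max(0, drain_window_remaining - epsilon)


def fits_in_backfill(walltime, drain_window_remaining, epsilon=DEFAULT_EPSILON):
    """True if a job of the given walltime (seconds) fits the window."""
    return walltime <= usable_backfill_window(drain_window_remaining, epsilon)
```

With the default of 120 seconds, a 600 second drain window leaves 480 seconds for backfill jobs.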
.TP
.B cleanup_drain_window
Set the draining time to use for nodes in cleanup states. The time is in
seconds. The default time is 300 seconds.
.TP
.B drain_mode
Set the draining algorithm to use. This may be
.I backfill
or
.I first-fit.
The default is
.I first-fit.
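
As an illustration of how the Cray sections fit together, the sketch below parses a sample configuration with Python's standard configparser module. The sample text simply restates the documented defaults (with drain_mode switched to backfill), and the fallback handling is an assumption made for this example rather than a description of how Cobalt itself reads its configuration:

```python
import configparser

# Sample fragment using the option names and defaults documented above.
SAMPLE = """
[alps]
basil = /opt/cray/default/alps/bin/apbasil
apkill = /opt/cray/alps/default/bin/apkill
default_depth = 72

[alpssystem]
pgroup_startup_timeout = 120
save_me_interval = 10.0
temp_reservation_time = 300
update_thread_timeout = 10

[system]
backfill_epsillon = 120
cleanup_drain_window = 300
drain_mode = backfill
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

# Fall back to the documented defaults when an option is absent.
drain_mode = config.get("system", "drain_mode", fallback="first-fit")
if drain_mode not in ("backfill", "first-fit"):
    raise ValueError("unknown drain_mode: %s" % drain_mode)

default_depth = config.getint("alps", "default_depth", fallback=72)
save_me_interval = config.getfloat("alpssystem", "save_me_interval", fallback=10.0)
```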
.SH "ENVIRONMENT"
......
......@@ -133,6 +133,9 @@ Send an email at the start and stop of every job run through the specified queue
.B maxrunning=x
The maximum number of jobs a user is allowed to have running in the queue.
.TP
.B maxtotaljobs=x
The maximum number of jobs a queue is allowed to run at once.
.TP
.B maxusernodes=x
The maximum number of nodes a user is allowed to have allocated with running jobs in the queue.
.TP
......
.TH "nodesadm" 8
.TH "nodeadm" 8
.SH "NAME"
nodeadm is the administrative interface for cluster systems.
.SH "SYNOPSIS"
.B nodeadm [-l] [--down part1 part2] [--up part1 part2]
.B nodeadm
.R [options] [list\ of\ nodes]
.SH "DESCRIPTION"
.TP
Allows one to mark resource as being down, or back up if there are schedulable again and it handles queue-resource associations.
Allows one to mark resources as being down, or as back up if they are schedulable
again. It also handles queue-resource associations.
.SH "OPTIONS"
.TP
.B \-b\ [list\ of\ nodes]
.B [Cray Only]
Print detailed information for all nodes in the list of node ids.
This will accept hyphenated ranges as well. All ranges are inclusive.
.TP
.B \-d \-\-debug
Turn on communication debugging.
......@@ -22,17 +28,23 @@ Displays the usage and a brief descriptions of the options
.B \-\-version
Displays client revision and Cobalt version
.TP
.B \-\-down
mark nodes as down
.B \-\-down [list\ of\ nodes]
Mark nodes as down
.TP
.B \-\-up
mark nodes as up (even if allocated)
.B \-\-up\ [list\ of\ nodes]
Mark nodes as up. If the node is not in a usable state or is allocated,
this may cause unexpected behavior.
.TP
.B \-\-queue
set queue associations
.B \-\-queue\ [queue1:queue2:...:queueN]\ [list\ of\ nodes]
Set queue associations. The list of queues to set on a node is ':'-delimited.
.TP
.B \-l \-\-list
list node states
.SH "NOTES"
.SS "Cray Systems"
On Cray systems nodes are referenced by their integer node id. Nodes may be
specified as a comma-delimited list. Ranges of node ids may be compacted with a
hyphen to an inclusive range, e.g. 1-4 will expand to 1,2,3,4.
.SH "SEE ALSO"
.BR nodelist (1)
......
......@@ -11,7 +11,11 @@ nodelist
List resources on the cluster system.
.SH "OPTIONS"
.TP
.B \-b <list of node ids>
.B [Cray Only]
Print detailed information for all nodes in the list of node ids.
This will accept hyphenated ranges as well. All ranges are inclusive.
.TP
.B \-d \-\-debug
Turn on communication debugging.
......
......@@ -2,75 +2,125 @@
.SH "NAME"
setres \- Create or modify a cobalt scheduler reservation
.SH "SYNOPSIS"
.B setres [modify or create reservation options] partition1 [ partition2 ... partitionN ]
.B setres [id changing options]
.SH "DESCRIPTION"
.BR setres
\fB-n\ \fIname\fR\ [\fB-m\fR]\ [\fB-A\ \fIproject\fR]\ [\fB-c\fR]
[\fB--allow_passthrough\fR] [\fB--block_passthrough\fR]\ [\fB-D\fR]
[\fB-d \ \fIduration\fR]\ [\fB--debug\fR] [\fB-p \ \fIlocation\fR]
[\fB-q\ \fIqueue\fR]\ [\fB-s\ \fIstarttime\fR] [\fB-u\ \fIuser-list\fR]
[\fIreservation locations\fR]
.TP
This program creates or modifies a scheduler reservation.
.SH "OPTIONS TO MODIFY OR CREATE RESERVATION (partition arguments required)"
.B setres
[\fB--res_id\ \fIid\fR]\ [\fB--cycle_id\ \fIid\fR]\ [\fB--force_id\fR]
.TP
.B \-\-debug
Turn on communication debugging.
.BR setres\ -h
.TP
.B \-h \-\-help
Displays the usage and a brief descriptions of the options
.BR setres\ --version
.SH "DESCRIPTION"
Creates or modifies a scheduler reservation. Reservation and cycle ids may also
be reset.
.SH "OPTIONS"
.TP
.B \-\-version
Displays client revision and Cobalt version
.B \-A \-\-project \fIproject
Set project name to associate with the reservation.
.TP
.B \-A \-\-project
Set project name
.B \-\-allow_passthrough
Allow pass through connection on systems with interconnects that allow
passthrough communication.
.TP
.B \-D \-\-defer
Defer current (or next) iteration of recurring reservation (must be
used with -m)
.B \-\-block_passthrough
Block pass through connections on systems with interconnects that allow
passthrough communication.
.TP
.B \-c \-\-cycletime
.B \-c \-\-cycletime \fItime
Set the cycle time (in minutes or DD:HH:MM:SS). This is the amount of
time from reservation start until it is automatically renewed. This
can be used to create repeating reservations.
.TP
.B \-d \-\-duration
.B \-d \-\-duration \fIduration
Set duration (in minutes or HH:MM:SS)
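
Both the cycle time and the duration accept either a bare number of minutes or a colon-separated form. A minimal Python sketch of converting such a value to seconds follows; the function name to_seconds is hypothetical, and rejecting two-field values is a choice made for this example rather than documented setres behavior:

```python
def to_seconds(value):
    """Convert "90", "HH:MM:SS" or "DD:HH:MM:SS" to a number of seconds."""
    fields = value.split(":")
    if len(fields) == 1:
        # A bare number is interpreted as minutes.
        return int(fields[0]) * 60
    if len(fields) not in (3, 4):
        raise ValueError("expected minutes, HH:MM:SS or DD:HH:MM:SS: %r" % value)
    # Align the fields with the rightmost units: days, hours, minutes, seconds.
    units = (86400, 3600, 60, 1)[-len(fields):]
    return sum(int(field) * unit for field, unit in zip(fields, units))
```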
.TP
.B \-m \-\-modify
Modify an existing reservation, specified with -n.
.B \-D \-\-defer
Defer current (or next) iteration of recurring reservation. This must be used
with the
.B -m
flag.
.TP
.B \-n \-\-name
Set reservation name
.B \-\-debug
Turn on communication debugging.
.TP
.B \-p \-\-partition
Set use partition. Now optional
.B \-h \-\-help
Displays the usage and brief descriptions of the options
.TP
.B \-q \-\-queue
Set the queue name, if something other than the standard reservation naming convention is desired.
.B \-m \-\-modify
Modify an existing reservation. The target reservation is specified with
.BR -n .
.TP
.B \-s \-\-starttime
Set start time (in format YYYY_MM_DD-HH:MM)
.B \-n \-\-name
Set reservation name. Names must be unique for all pending and active
reservations on a system.
.TP
.B \-p \-\-partition \fIlocation
Set the location to use for a reservation. This may be used instead of positional arguments
for locations. All locations in a reservation must exist and must
be managed by the system component at the time the reservation is set and active.
.TP
.B \-q \-\-queue \fIqueue
Set the queue name. Optional. Queues may already exist and have jobs in them.
Jobs currently running in a target queue will not be affected by applying a
reservation to the queue. Jobs that are queued in the target queue will not
start until the reservation becomes active. Jobs in a reservation against an
existing queue will be permitted to run on all nodes in that queue.
If this option isn't specified, a queue "R.name" will be created where name is the
reservation name specified by the
.B -n
argument.
.TP
.B \-s \-\-starttime \fIstarttime
Set start time (supported formats include YYYY-MM-DD-HH:MM or
YYYY_MM_DD-HH:MM). The \fIstarttime\fR may also be "now", which will set the
reservation starttime to the current time and the reservation will immediately
activate for the next scheduling iteration.
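
The accepted start time forms can be illustrated with a short Python sketch. It assumes only the two documented formats plus the literal "now"; the function name parse_starttime is hypothetical, time zones are not handled, and setres may accept other formats that are not listed here:

```python
from datetime import datetime

# The two formats named in the option description.
STARTTIME_FORMATS = ("%Y-%m-%d-%H:%M", "%Y_%m_%d-%H:%M")


def parse_starttime(text, now=None):
    """Return a datetime for a start time string, or the current time for "now".

    The optional now argument exists only so the example can be tested
    with a fixed clock.
    """
    if text == "now":
        return now if now is not None else datetime.now()
    for fmt in STARTTIME_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("unsupported start time: %r" % text)
```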
.TP
.B \-u \-\-user \fIuser-list
Set user(s) for reservation. Set to "*" for all users. User names may be
provided as a colon-delimited (:) list. User names must be valid on the node where
.BR setres (8)
is running.
.TP
.B \-u \-\-user
Set user for reservation. Set to "*" for all users.
.B \-\-version
Displays client revision and Cobalt version
.TP
.B \-\-allow_passthrough
Allow pass through connection
.B \-\-cycle_id \fIid
Set the integer cycle id. Without \-\-force_id this must be a larger value
than the current maximum cycle id. This may not be used with any option other than
\-\-force_id.
.TP
.B \-\-block_passthrough
Block pass through connections
.B \-\-force_id
Only used with \-\-res_id or \-\-cycle_id options. Will force the id generator
to start with the specified value. Improper use of this option may cause
non-unique reservation ids and cycle ids to occur.
.TP
.B \-\-res_id \fIid
Set the integer reservation id. Without \-\-force_id this must be a larger value
than the current maximum reservation id. This may not be used with any option other than
\-\-force_id.
.SH "ID CHANGING OPTIONS (no partition arguments)"
.SH "NOTES"
At a minimum all reservation creation requires use of the
.B -n, -s
and
.B -d
flags. Partitions and nodes must be specified as positional arguments or via the
.B -p
flag.
.P
On Cray systems nodes are referenced by their integer node id. Nodes may be
specified as a comma-delimited list. Ranges of node ids may be compacted with a
hyphen to an inclusive range, e.g. 1-4 will expand to 1,2,3,4.
.TP
.B \-\-cycle_id
set cycle id
.TP
.B \-\-force_id
only used with \-\-res_id or \-\-cycle_id options
.TP
.B \-\-res_id
reservation id (int)
.SH "SEE ALSO"
.BR showres (1),
.BR releaseres (8)
......