Commit 70332624 authored by Michael Salim

Merge remote-tracking branch 'origin/develop'

parents c3e0845e badd48f6
.*
!.gitignore
*.pyc
exe
log
argojobs
balsamjobs
argobalsam_env
db.sqlite3
argo/migrations
balsam/migrations
balsam/argo/migrations/*.py
!balsam/argo/migrations/__init__.py
balsam/service/migrations/*.py
!balsam/service/migrations/__init__.py
experiments
docs/_build/*
docs/_static/*
*.egg-info
default_balsamdb
testdb
nohup.out
** INSTALLATION
# install virtualenv
https://pypi.python.org/pypi/virtualenv
# set up an installation directory
export BASEDIR=/path/to/installation
# set up virtual env
mkdir $BASEDIR
cd $BASEDIR
virtualenv balsam_env
. balsam_env/bin/activate
# install prerequisites
pip install django
pip install south
pip install pika
# install balsam_core
# (extract the tar file in some directory other than $BASEDIR)
tar xzf balsam_core.tgz
cd balsam_core
python setup.py install --prefix=$BASEDIR/balsam_env
cd $BASEDIR
django-admin.py startproject balsam_deploy
#cp settings.py $BASEDIR/balsam_deploy/balsam_deploy
# edit $BASEDIR/balsam_deploy/balsam_deploy/settings.py
# - set the database filename
# - add balsam_core and south to INSTALLED_APPS
# - add 'from argo_settings import *' at end
# edit argo_settings.py and change the following:
- in the DATABASES section, set the NAME field to the full path to an sqlite3 file, preferably in $BASEDIR/balsam_deploy
- set BALSAM_DEPLOYMENT_DIRECTORY to $BASEDIR/balsam_deploy
- set BALSAM_WORK_DIRECTORY. A work directory will be created here for each job, and the job will be run in this directory.
- set BALSAM_SCHEDULER_SUBMIT_EXE, BALSAM_SCHEDULER_STATUS_EXE appropriately
- set BALSAM_GLOBUS_URL_COPY_EXE, BALSAM_GRID_PROXY_INIT_EXE appropriately
- set BALSAM_ALLOWED_EXECUTABLE_DIRECTORY. Only executables from this directory will be executed.
cd balsam_deploy
python manage.py syncdb --noinput
** START BALSAM SERVICE
The service fetches jobs from the message queues, adds them to the database, and periodically updates the job status in the database. It operates on a period defined by BALSAM_FETCH_DELAY in settings.py.
. balsam_env/bin/activate
cd $BASEDIR/balsam_deploy
python manage.py balsam_service
** START BALSAM DAEMON
The daemon queries the local database for jobs to be run and manages them over their lifetime. It operates on a period defined by BALSAM_EXECUTION_DELAY in settings.py.
. balsam_env/bin/activate
cd $BASEDIR/balsam_deploy
python manage.py balsam_daemon
** ADD TEST JOB TO MESSAGE QUEUE
. balsam_env/bin/activate
cd $BASEDIR/rabbitmq
./newjob testjob
** INTEGRATION POINTS
- in settings.py, BALSAM_SCHEDULER_SUBMIT_EXE, BALSAM_SCHEDULER_STATUS_EXE identify the qsub and qstat executables, respectively
- in settings.py, BALSAM_GLOBUS_URL_COPY_EXE, BALSAM_GRID_PROXY_INIT_EXE identify the globus-url-copy and grid-proxy-init executables, respectively
- balsam_env/lib/python2.6/site-packages/balsam_core/scheduler.py is where qsub and qstat are called
- balsam_env/lib/python2.6/site-packages/balsam_core/management/commands/balsam_service.py is where qstat output is parsed for updating job status
- the queue for job submission is hard-coded in scheduler.py
include README.md
include LICENSE.md
include docs
# HPC Edge Service
**Authors:** J. Taylor Childers (Argonne National Laboratory), Tom Uram (Argonne National Laboratory), Doug Benjamin (Duke University)
An HPC Edge Service to manage remote job submission. The goal of this service is to provide a secure interface for submitting jobs to large computing resources.
# HPC Edge Service and Workflow Management System
**Authors:** J. Taylor Childers (Argonne National Laboratory), Tom Uram (Argonne National Laboratory), Doug Benjamin (Duke University), Misha Salim (Argonne National Laboratory)
# Prerequisites
This Edge Service uses [RabbitMQ](https://www.rabbitmq.com/) to communicate between the outside (Argo) and inside (Balsam) services. This service must be running on an accessible server machine to use this Edge Service.
The user is responsible for providing an environment with Python 3.6 and mpi4py, because the installation is
system-dependent.
## Prerequisites on Cooley
An easy approach is to use Anaconda:
```
soft add +anaconda
conda config --add channels intel
conda create --name balsam intelpython3_full python=3
source activate balsam
```
On Cooley, mpi4py just works with this environment.
The following instructions assume the appropriate environment for Balsam is set up and loaded!
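
One way to confirm that the loaded environment meets the stated prerequisites (Python 3.6 and mpi4py) is a small check script. This is a minimal sketch, not part of Balsam; the function name `check_environment` and its parameters are hypothetical.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 6), required_modules=("mpi4py",)):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper: only verifies the interpreter version and that each
    top-level module can be located, without importing it.
    """
    problems = []
    if sys.version_info < min_version:
        problems.append(
            "Python %d.%d or newer is required, found %d.%d"
            % (min_version + tuple(sys.version_info[:2]))
        )
    for name in required_modules:
        # find_spec returns None when a top-level module cannot be found
        if importlib.util.find_spec(name) is None:
            problems.append("module %r is not importable" % name)
    return problems
```

Calling `check_environment()` inside the activated `balsam` environment should return an empty list if both prerequisites are in place.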
# Installation
# Check out the latest release of Balsam
```
git clone git@github.com:hep-cce/hpc-edge-service.git
git clone git@xgitlab.cels.anl.gov:turam/hpc-edge-service.git
cd hpc-edge-service
virtualenv argobalsam_env
source argobalsam_env/bin/activate
pip install pip --upgrade
pip install django
pip install pika
pip install future
export ARGOBALSAM_INSTALL_PATH=$PWD
mkdir log argojobs balsamjobs exe
git checkout release0.1
```
# Install Balsam
```
pip install -e .
```
# Try it out!
The launcher pulls jobs from the database and invokes MPI to run the jobs.
To try it out interactively, grab a couple nodes on Cooley:
```
qsub -A datascience -n 2 -q debug -t 30 -I
soft add +anaconda
source activate balsam
```
# Configure Databases
You can find many settings to change. There are Django specific settings in `argobalsam/settings.py` and Edge Service settings in `user_settings.py`.
The **balsam** command-line tool will have been added to your path.
There are a number of commands to try:
```
balsam --help
balsam ls --help
balsam ls # no jobs in DB yet
```
To create and initialize the default sqlite3 database without password protections do:
Now let's create a couple dummy jobs and see them listed in
the database:
```
./manage.py makemigrations argo
./manage.py makemigrations balsam
./manage.py migrate
./manage -h
balsam qsub "echo hello world" --name hello -t 0
balsam make_dummies 2
balsam ls --hist
```
Finally, run the launcher. Useful log messages will be sent to the log/ directory in real time.
You can change the verbosity, and many other Balsam runtime parameters, in balsam/user_settings.py.
```
balsam launcher --consume --time 0.5 # run for 30 seconds
balsam ls --hist # jobs are now done
balsam rm jobs --all
```
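
The comment on `--time 0.5` (30 seconds) suggests the time limit is given in minutes. The idea of consuming jobs until either the queue is empty or the time limit expires can be sketched as follows. This is an illustration under stated assumptions, not the launcher's implementation: `consume` is a hypothetical name, and an in-memory queue of callables stands in for database rows and MPI invocations.

```python
import time
from collections import deque


def consume(jobs, time_limit_minutes, clock=time.monotonic):
    """Run queued callables until the queue empties or the time limit passes.

    jobs: a deque of zero-argument callables (hypothetical stand-ins for jobs).
    Returns the results of the jobs that were run, in order.
    """
    deadline = clock() + time_limit_minutes * 60.0
    results = []
    # A job is only started while time remains; a running job is never interrupted.
    while jobs and clock() < deadline:
        job = jobs.popleft()
        results.append(job())
    return results
```

Jobs left in the queue when the deadline passes stay there, so a later call can pick them up.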
# Run a comprehensive test suite
The **balsam-test** command-line tool invokes tests in the tests/ directory.
You can run specific tests by passing the test module names, or run all of
them by calling **balsam-test** with no arguments.
```
balsam-test tests.test_dag # this should be quick
balsam-test # the test_functional module might take over 10 minutes!
```
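
The "named modules or everything" behaviour described above can be reproduced with the standard `unittest` loader. The sketch below only illustrates that pattern; how **balsam-test** is actually implemented is not shown in this commit, and `run_tests` and its parameters are hypothetical.

```python
import unittest


def run_tests(module_names=None, start_dir="tests"):
    """Run the named test modules, or discover everything under start_dir.

    Hypothetical helper illustrating the pattern, not Balsam's test runner.
    """
    loader = unittest.defaultTestLoader
    if module_names:
        # Dotted module names such as "tests.test_dag"
        suite = loader.loadTestsFromNames(module_names)
    else:
        suite = loader.discover(start_dir)
    return unittest.TextTestRunner(verbosity=1).run(suite)
```

The returned `TestResult` exposes `testsRun` and `wasSuccessful()`, which a command-line wrapper could turn into an exit code.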
import common.Serializer as Serializer
import balsam.common.Serializer as Serializer
class ArgoJobStatus:
def __init__(self):
......@@ -13,4 +13,4 @@ class ArgoJobStatus:
def get_from_message(message):
tmp = ArgoJobStatus()
tmp.__dict__ = Serializer.deserialize(message)
return tmp
\ No newline at end of file
return tmp
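
The `get_from_message` pattern above rebuilds a status object by assigning a deserialized dictionary to `__dict__`. A self-contained sketch of that round trip follows. It assumes a JSON serializer, since the contents of `balsam.common.Serializer` are not shown in this diff; the class `JobStatus` and its fields are hypothetical.

```python
import json


class JobStatus:
    """Hypothetical stand-in for ArgoJobStatus, with JSON as the assumed wire format."""

    def __init__(self):
        self.state = None
        self.job_id = None
        self.message = None

    def get_serialized_message(self):
        # Serialize all instance attributes in one step
        return json.dumps(self.__dict__)

    @staticmethod
    def get_from_message(message):
        tmp = JobStatus()
        # Replace the instance attributes wholesale with the decoded dictionary
        tmp.__dict__ = json.loads(message)
        return tmp
```

Because `__dict__` is replaced wholesale, any attribute missing from the message is also missing from the resulting object, so receivers need to tolerate absent fields.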
......@@ -6,9 +6,9 @@ from django.db.utils import load_backend
from django.conf import settings
from common import MessageReceiver
from argo import QueueMessage
from argo.models import ArgoJob,ArgoSubJob,BALSAM_JOB_TO_SUBJOB_STATE_MAP
from balsam import BalsamJobStatus,models
from balsam.argo import QueueMessage
from balsam.argo.models import ArgoJob,ArgoSubJob,BALSAM_JOB_TO_SUBJOB_STATE_MAP
from balsam.service import BalsamJobStatus,models
class JobStatusReceiver(MessageReceiver.MessageReceiver):
''' subscribes to the balsam job status queue and updates a job state '''
......@@ -58,7 +58,7 @@ class JobStatusReceiver(MessageReceiver.MessageReceiver):
# get the argo job for this subjob
try:
argojob = ArgoJob.objects.get(job_id=subjob.job_id)
argojob = ArgoJob.objects.get(job_id=subjob.job_id) # BUG !
except Exception as e:
logger.error(' exception received while retrieving ArgoJob with id = ' + str(subjob.job_id) + ': ' + str(e))
# acknowledge message
......@@ -72,9 +72,9 @@ class JobStatusReceiver(MessageReceiver.MessageReceiver):
# get the deserialized balsam job
try:
balsam_job = models.BalsamJob()
statusMsg.get_job(balsam_job)
statusMsg.get_job(balsam_job) # statusMsg.serialzed_job gets loaded into balsam_job
logger.debug('balsam_job = ' + str(balsam_job))
except BalsamJobStatus.DeserializeFailed,e:
except BalsamJobStatus.DeserializeFailed as e:
logger.error('Failed to deserialize BalsamJob for BalsamJobStatus message for argojob: ' + str(argojob.job_id) + ' subjob_id: ' + str(subjob.job_id))
# acknowledge message
channel.basic_ack(method_frame.delivery_tag)
......@@ -84,7 +84,8 @@ class JobStatusReceiver(MessageReceiver.MessageReceiver):
QueueMessage.JobStatusReceiverRetrieveArgoJobFailed))
return
# parse balsam_job into subjob and argojob
# parse balsam_job (just received from balsam, new status) into
# subjob and argojob (need to be synced)
if balsam_job is not None:
# copy scheduler id to subjob
......@@ -98,7 +99,7 @@ class JobStatusReceiver(MessageReceiver.MessageReceiver):
try:
argojob.state = BALSAM_JOB_TO_SUBJOB_STATE_MAP[balsam_job.state].name
logger.debug(' received subjob state = ' + subjob.state + ' setting argo job state to ' + argojob.state)
except KeyError,e:
except KeyError as e:
logger.error(' could not map balsam_job state: ' + str(balsam_job.state) + ' to an ArgoJob state for job id: ' + str(argojob.job_id))
# acknowledge message
channel.basic_ack(method_frame.delivery_tag)
......
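
The hunks above replace the Python 2 form `except KeyError,e:` with `except KeyError as e:`. The comma form is a syntax error in Python 3, so the change is required for the Python 3.6 environment the README calls for. A minimal sketch of the state-mapping lookup with the Python 3 syntax follows. The state names and the `map_state` helper are hypothetical, and an `Enum` stands in for the values of `BALSAM_JOB_TO_SUBJOB_STATE_MAP`, which are only known here to have a `.name` attribute.

```python
import logging
from enum import Enum

logger = logging.getLogger(__name__)


class SubJobState(Enum):
    # Hypothetical states, for illustration only
    SUBJOB_QUEUED = 1
    SUBJOB_RUNNING = 2
    SUBJOB_FINISHED = 3


# Hypothetical mapping from a Balsam job state to a subjob state
STATE_MAP = {
    "QUEUED": SubJobState.SUBJOB_QUEUED,
    "RUNNING": SubJobState.SUBJOB_RUNNING,
    "FINISHED": SubJobState.SUBJOB_FINISHED,
}


def map_state(balsam_state):
    """Return the mapped state name, or None if the state is unknown."""
    try:
        return STATE_MAP[balsam_state].name
    except KeyError as e:  # Python 3 syntax; "except KeyError, e" no longer parses
        logger.error("could not map balsam_job state: %s", e)
        return None
```

Returning `None` here is a design choice for the sketch. The receiver in the diff instead logs the error, acknowledges the message, and continues.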
......@@ -5,9 +5,9 @@ from django.db import connections,DEFAULT_DB_ALIAS
from django.db.utils import load_backend
from django.conf import settings
from argo import models,QueueMessage
from common import db_tools
from common import MessageReceiver,Serializer
from balsam.argo import models,QueueMessage
from balsam.common import db_tools
from balsam.common import MessageReceiver,Serializer
def CreateWorkingPath(job_id):
path = os.path.join(settings.ARGO_WORK_DIRECTORY,str(job_id))
......
......@@ -4,4 +4,4 @@ from django.apps import AppConfig
class ArgoCoreConfig(AppConfig):
name = 'argo'
name = 'balsam.argo'
......@@ -18,7 +18,7 @@ import warnings
from django import forms
from django.forms.widgets import CheckboxInput
from argo import models
from balsam.argo import models
import logging
logger = logging.getLogger(__name__)
......