1. 12 Dec, 2018 3 commits
    • Valentin Reis's avatar
      [fix] Sending a raw 'kill' message to upstream when `cmd` receives SIGINT · 5a3b9fc8
      Valentin Reis authored
      `cmd` now sends a container kill message to the upstream API and exits
      whenever it receives SIGINT, for instance via C-c.
      5a3b9fc8
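A minimal sketch of this behavior, using only the standard library. The `send` callable stands in for the upstream API socket, and the message fields (`command`, `container_uuid`) are assumptions made for illustration, not the actual NRM format:

```python
import json
import signal
import sys


def make_sigint_handler(send, container_uuid):
    """Build a SIGINT handler that asks upstream to kill the container, then exits.

    `send` is a hypothetical callable taking the encoded message; the
    message fields below are assumptions, not the real NRM format.
    """
    def handler(signum, frame):
        msg = {"command": "kill", "container_uuid": container_uuid}
        send(json.dumps(msg))
        sys.exit(130)  # conventional shell exit status for SIGINT (128 + 2)
    return handler


def install(send, container_uuid):
    # Register the handler so that C-c triggers the kill message.
    signal.signal(signal.SIGINT, make_sigint_handler(send, container_uuid))
```

Building the handler with a closure keeps the socket and the uuid out of global state, which also makes the handler callable directly in a test.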
    • Valentin Reis's avatar
      [Feature] Adds configuration management and environment variables · 25443c64
      Valentin Reis authored
      This commit adds a command-line interface to `daemon`:
      ```
      usage: daemon [-h] [-c FILE] [-d] [--nrm_log NRM_LOG] [--hwloc HWLOC]
                    [--argo_nodeos_config ARGO_NODEOS_CONFIG] [--perf PERF]
                    [--argo_perf_wrapper ARGO_PERF_WRAPPER]
      
      optional arguments:
        -h, --help            show this help message and exit
        -c FILE, --configuration FILE
                              Specify a JSON-formatted configuration file to
                              override any of the available CLI options. If an
                              option is actually provided on the command-line, it
                              overrides its corresponding value from the
                              configuration file.
        -d, --print_defaults  Print the default configuration file.
        --nrm_log NRM_LOG     Main log file. Override default with the NRM_LOG
                              environment variable.
        --hwloc HWLOC         Path to the hwloc to use. This path can be relative
                              and makes use of the $PATH if necessary. Override
                              default with the HWLOC environment variable.
        --argo_nodeos_config ARGO_NODEOS_CONFIG
                              Path to the argo_nodeos_config to use. This path can
                              be relative and makes use of the $PATH if necessary.
                              Override default with the ARGO_NODEOS_CONFIG
                              environment variable.
        --perf PERF           Path to the linux perf tool to use. This path can be
                              relative and makes use of the $PATH if necessary.
                              Override default with the PERF environment variable.
        --argo_perf_wrapper ARGO_PERF_WRAPPER
                              Path to the linux perf tool to use. This path can be
                              relative and makes use of the $PATH if necessary.
                              Override default with the PERFWRAPPER environment
                              variable.
      ```
      25443c64
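The layering described in the help text can be sketched with `argparse` as follows. This is an illustration under assumptions, not the daemon's code: the default values are placeholders, only three of the options are shown, and the ordering between environment variables and the configuration file (environment first, file second) is a guess, since the help text only states that the command line overrides the file and that the environment overrides the default.

```python
import argparse
import json
import os

# Built-in defaults; the values are placeholders, not the daemon's real ones.
DEFAULTS = {"nrm_log": "/tmp/nrm.log", "hwloc": "hwloc", "perf": "perf"}
# Option name -> environment variable, as named in the help text.
ENV_VARS = {"nrm_log": "NRM_LOG", "hwloc": "HWLOC", "perf": "PERF"}


def resolve_config(argv, environ=os.environ):
    parser = argparse.ArgumentParser(prog="daemon")
    parser.add_argument("-c", "--configuration", metavar="FILE")
    for name in DEFAULTS:
        # default=None lets us tell "not given on the CLI" apart from a value.
        parser.add_argument("--" + name, default=None)
    args = parser.parse_args(argv)

    config = dict(DEFAULTS)
    # 1. Environment variables override the built-in defaults.
    for name, var in ENV_VARS.items():
        if var in environ:
            config[name] = environ[var]
    # 2. The JSON configuration file overrides those (assumed ordering).
    if args.configuration:
        with open(args.configuration) as f:
            config.update(json.load(f))
    # 3. Options actually given on the command line win over everything.
    for name in DEFAULTS:
        value = getattr(args, name)
        if value is not None:
            config[name] = value
    return config
```

For example, `resolve_config(["--perf", "/opt/perf"])` would keep the defaults (or environment values) for everything except `perf`.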
  2. 10 Dec, 2018 1 commit
    • Valentin Reis's avatar
      Small work session with Swann: · 5b550e0b
      Valentin Reis authored
      - added correct SIGINT/process ending handling to cmd
      - fixed kill/list containers
      - added ZMQ_LINGER 0 to the socket options.
      5b550e0b
  3. 28 Nov, 2018 4 commits
    • Swann Perarnau's avatar
      [feature/fix] add listen command for upstream pub · e626053c
      Swann Perarnau authored
      Add a listen command to get access to the event stream of the upstream
      pub/sub API.
      
      This patch gives back access from the command line to the power
      information of a container, including filtering the event stream to only
      have events relevant to this container.
      
      This changes the workflow a little bit for users, but should result in a
      cleaner access to profiling data in the future.
      
      Related to #18.
      e626053c
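The filtering step of such a listen command might look like the sketch below. It assumes events arrive as JSON-encoded strings and carry a `container_uuid` field; both the field name and the event layout are hypothetical.

```python
import json


def filter_events(raw_messages, container_uuid):
    """Yield decoded events that belong to the given container.

    The `container_uuid` field name is an assumption for illustration.
    """
    for raw in raw_messages:
        event = json.loads(raw)
        if event.get("container_uuid") == container_uuid:
            yield event
```

Writing it as a generator lets the caller consume an unbounded event stream one message at a time.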
    • Swann Perarnau's avatar
      [refactor] move container start/exit to up_pub · f81b95e0
      Swann Perarnau authored
      Move the container start/exit events to the upstream pub/sub event
      stream. As these are more of a global event now that we support multiple
      commands in the same container, it makes sense to move them to the more
      general event stream.
      
      This patch also removes the code in cmd waiting for container start or
      exit, making the cmd temporarily unable to report power metrics. We
      will fix that in a later commit.
      
      This patch fixes complicated issues we had when a job ran multiple
      commands in the same container and not all of them waited for the end
      of the container: now none of them do.
      f81b95e0
    • Swann Perarnau's avatar
      [refactor/fix] always send process events for run · 6e0c1e7a
      Swann Perarnau authored
      Current code sends start/exit events when a container is created and
      process_start/process_exit when it's already there. Instead, have the
      container start/exit only care about container stuff, and always send
      the process start/exit events around. That makes the cmd run fsm easier
      to work out.
      
      Changes the message format a tiny bit.
      Fixes some missing stdout/stderr issues we had before.
      6e0c1e7a
    • Swann Perarnau's avatar
      [fix] have command provide a default uuid · 2344824c
      Swann Perarnau authored
      Previous merges let the cmd send an empty container uuid, resulting in
      some issues when the user doesn't provide one. Restore the previous
      behavior.
      2344824c
  4. 21 Oct, 2018 1 commit
    • Swann Perarnau's avatar
      [refactor] replace upstream comms with msg layer · 0b0ab966
      Swann Perarnau authored
      Replace the fragile upstream communications with the new messaging
      layer, improving the stability and performance of this API.
      
      NOTE: this breaks previous clients
      NOTE: this patch is missing client tracking, to handle children signals.
      0b0ab966
  5. 17 Oct, 2018 1 commit
    • Sridutt Bhalachandra's avatar
      [Feature] Multi-node and multi-process support · 5a41baba
      Sridutt Bhalachandra authored
      Added multi-node and multi-process support, which allows launching
      multiple processes within a container. This is important for enabling
      use of NRM with MPI applications that run multiple processes in a
      container, and thus for enabling multi-node executions.
      
      See Issue #17
      5a41baba
  6. 19 Dec, 2017 4 commits
  7. 18 Dec, 2017 1 commit
    • Swann Perarnau's avatar
      [refactor] daemon should always bind on sockets · 1391a197
      Swann Perarnau authored
      The way 0MQ works on PUB/SUB sockets, publishers might drop
      messages if subscribers are not detected fast enough. One way to fix
      it is to have the "server" always bind sockets, and the "client" use
      connect. This way, the handshake is initiated properly, and the client
      can publish as soon as the connection is done.
      
      This patch makes the daemon bind on the upstream API and the CLI connect,
      fixing in the process the message dropping we were experiencing before.
      
      Long term, we might think about using 2 types of sockets for the
      upstream API: pub/sub for actual events published from the daemon, and
      a REQ/REP or ROUTER/DEALER pair for "commands".
      1391a197
  8. 15 Dec, 2017 1 commit
    • Swann Perarnau's avatar
      [feature] Properly handle run events in order · 957deb8d
      Swann Perarnau authored
      This patch implements a small finite state machine on the cmd side to be
      able to run a command, wait for all of its output, and then exit.
      
      As the daemon can send those messages in any order, we need to wait for
      them properly, in particular for the closing of stdout/stderr before exiting.
      
      This patch also fixes the read_until_close callback creation to ensure
      that the stream EOF is handled as a distinct message.
      957deb8d
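A simplified sketch of such a state machine, where the state is just the set of messages seen so far. The message names (`stdout_closed`, `stderr_closed`, `process_exit`) are hypothetical; the point is only that the command may exit once all of them have arrived, whatever the order.

```python
class RunTracker:
    """Track the messages a `run` command must see before it may exit.

    Message names are hypothetical; they may arrive in any order and the
    command exits only once all of them have been seen.
    """

    REQUIRED = frozenset({"stdout_closed", "stderr_closed", "process_exit"})

    def __init__(self):
        self.seen = set()
        self.exit_status = None

    @property
    def done(self):
        return self.seen == self.REQUIRED

    def on_message(self, kind, status=None):
        """Record one message; return True once the command may exit."""
        if kind not in self.REQUIRED:
            raise ValueError("unexpected message: %r" % (kind,))
        self.seen.add(kind)
        if kind == "process_exit":
            self.exit_status = status
        return self.done
```

With this shape, a `process_exit` that arrives before the output streams are closed is remembered rather than acted on, so no trailing output is lost.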
  9. 14 Dec, 2017 4 commits
    • Swann Perarnau's avatar
      [feature] Add stdout/stderr streaming · 78f63cd4
      Swann Perarnau authored
      This patch adds stdout/stderr streaming capabilities, based on partial
      evaluation of a tornado.iostream callback. The bin/cmd CLI is updated to
      wait until an exit message, although that doesn't guarantee anything on
      message ordering...
      
      The next step is obviously to figure out a message flow that allows the
      CLI to send and receive the command IO properly, in order...
      78f63cd4
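The partial-evaluation idea can be illustrated without tornado. An I/O library typically calls a read callback with the data only, so the stream name and container have to be bound beforehand; `functools.partial` does that. The function names and the message layout here are hypothetical.

```python
from functools import partial


def on_output(stream_name, container_uuid, data):
    """Wrap a chunk of output with the stream and container it came from."""
    return {
        "container_uuid": container_uuid,
        "stream": stream_name,
        "payload": data,
    }


def make_callbacks(container_uuid):
    # One single-argument callback per stream, with the context pre-bound.
    return {
        name: partial(on_output, name, container_uuid)
        for name in ("stdout", "stderr")
    }
```

Each value in the returned dictionary takes just the data chunk, which is the shape a stream-reading callback is expected to have.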
    • Swann Perarnau's avatar
      [fix] Fix missing logging changes · ed33ef6d
      Swann Perarnau authored
      The logging improvement patch missed a few calls.
      ed33ef6d
    • Swann Perarnau's avatar
      [style] pep8 and other small style fixes · 3054b894
      Swann Perarnau authored
      Remove unused import, commas at the end of dictionaries.
      3054b894
    • Swann Perarnau's avatar
      [refactor] Use globally configured logger · b66c88ec
      Swann Perarnau authored
      The logging module allows us to configure logging facilities once per
      process using basicConfig, and then to use globally defined, named,
      logger objects. This simplifies access to logger objects and their
      configuration, and removes logger pointers from all objects.
      
      This patch refactors all the logging calls to use a single 'nrm' logger
      object, using those facilities.
      b66c88ec
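The pattern looks roughly like this with the standard `logging` module. The `PowerPolicy` class and the log format are made up for the example; only the `'nrm'` logger name comes from the commit message.

```python
import logging

# Looked up by name: every module asking for "nrm" gets the same object.
logger = logging.getLogger("nrm")


def configure_logging(level=logging.INFO):
    # Called once per process, typically from the entry-point script.
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )


class PowerPolicy:
    """Hypothetical component: note that it stores no logger attribute."""

    def apply(self, limit):
        logger.info("setting power limit to %s", limit)
```

Because `logging.getLogger` returns the same object for the same name, no logger needs to be passed through constructors.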
  10. 13 Dec, 2017 2 commits
    • Swann Perarnau's avatar
      [feature] Add kill command · 63c2dea8
      Swann Perarnau authored
      This patch adds a command to kill the parent process of a container
      based on the container uuid, triggering the death of the container.
      
      The os.kill command interacts pretty badly with the custom built
      children handling, causing us to catch unwanted exceptions in an effort
      to keep the code running. The waitpid code was also missing a bit about
      catching children exiting because of signals, so we fixed that.
      
      At this point, two things should be paid attention to:
        - we don't distinguish properly between a container and a command.
        This will probably cause issues later, as it should be possible to
        launch multiple programs in the same container, and for partitions to
        survive the death of the parent process.
        - the message format is growing more complex, but without any
        component having strong ownership over it. This will probably cause
        stability issues in the long term, as the format complexifies and we
        lose track of the fields expected from everyone.
      63c2dea8
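For the waitpid part, a sketch of how a wait status can be classified so that children killed by a signal are not missed. This uses only POSIX calls from the `os` module; the function names are illustrative and not taken from the NRM code.

```python
import os


def describe_exit(status):
    """Classify a wait status returned by os.waitpid()."""
    if os.WIFSIGNALED(status):
        # The child was terminated by a signal, e.g. after os.kill().
        return ("signaled", os.WTERMSIG(status))
    if os.WIFEXITED(status):
        return ("exited", os.WEXITSTATUS(status))
    return ("other", status)


def reap_children():
    """Collect every child that has already terminated, without blocking."""
    finished = {}
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children left at all
        if pid == 0:
            break  # children exist, but none has terminated yet
        finished[pid] = describe_exit(status)
    return finished
```

Checking `WIFSIGNALED` before `WIFEXITED` is not required, since the two are mutually exclusive, but it makes the signal case hard to overlook.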
    • Swann Perarnau's avatar
      [feature] Add command to list containers · 2f470afb
      Swann Perarnau authored
      This patch adds a very simple command to list the containers currently
      known by the NRM. There's no history or state tracking on the NRM, so
      the code is pretty simple.
      
      We expect that some of the container tracking doesn't need to be sent
      for such a command, so the listing also filters some of the fields.
      
      This patch also adds an 'event' field to container messages, as it would
      probably be needed later for other kinds of operations.
      2f470afb
  11. 11 Dec, 2017 4 commits
    • Swann Perarnau's avatar
      [feature] Pull the Argus code into the NRM · 92290b22
      Swann Perarnau authored
      The Argus (globalos) launcher had prototype code to read a container
      manifest, create a container using Judi's code, and map resources using
      hwloc.
      
      This patch brings that code, almost intact, into the NRM repo. This code
      is quite ugly, and the resource mapping crashes if the kernel
      configuration isn't right. But it's still a good starting point, and we
      should be able to improve things little by little.
      
      One part in particular needs attention: SIGCHLD handling. We should
      think of using ioloop-provided facilities to avoid this mess.
      
      The patch also contains the associated CLI changes.
      
      Note: the messaging format is starting to be difficult to keep in check,
      as there are conversions and field checks all over the code. See #3 for
      a possible solution.
      92290b22
    • Swann Perarnau's avatar
      [feature] Add container run skeleton · 5f6f9415
      Swann Perarnau authored
      This is the first step in a series of patches to integrate the container
      launching code from Argus (globalos) into the NRM infrastructure.
      
      This patch creates a valid command on the CLI, and sends the necessary
      info to the NRM. We still need to take care of the actual container
      creation.
      
      Note that the CLI waits for an event indicating that the container was
      launched, and that at this point the event is never generated by the NRM.
      5f6f9415
    • Swann Perarnau's avatar
      [refactor] Improve power message format · 63db906e
      Swann Perarnau authored
      This commit changes the message format for the upstream API, to use a
      json-encoded dictionary. While the format is not set in stone at this
      point, the goal is to slowly move into a proper protocol, with
      well-defined fields to the messages, and proper mechanisms to send
      commands and receive notification of their completion.
      
      The only current user of this API is the power management piece, and
      this change breaks the GRM code maintained outside of this repo. We will
      need to reconcile the two implementations once the message protocol gets
      more stable.
      
      Related to #1 and #6.
      63db906e
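One way to keep a JSON-encoded dictionary format in check is to route every message through a single encode/decode pair that validates the fields. The sketch below is an assumption about how that could look; the message types and field names are hypothetical and do not describe the actual upstream API.

```python
import json

# Hypothetical schema: message type -> fields it must carry.
REQUIRED_FIELDS = {
    "power": {"limit"},
    "container": {"uuid", "event"},
}


def _check(msg_type, fields):
    if msg_type not in REQUIRED_FIELDS:
        raise ValueError("unknown message type: %r" % (msg_type,))
    missing = REQUIRED_FIELDS[msg_type] - set(fields)
    if missing:
        raise ValueError("missing fields: %s" % ", ".join(sorted(missing)))


def encode(msg_type, **fields):
    """Validate the fields, then serialize the message as JSON."""
    _check(msg_type, fields)
    return json.dumps(dict(fields, type=msg_type))


def decode(raw):
    """Parse a JSON message and validate it against the schema."""
    msg = json.loads(raw)
    _check(msg.get("type"), msg)
    return msg
```

Keeping the schema in one table gives the format a single owner, so senders and receivers fail early on a malformed message instead of checking fields ad hoc.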
    • Swann Perarnau's avatar
      [feature] Implement basic CLI · bc1b7fd2
      Swann Perarnau authored
      Only supports setpower for now, and while it should work in theory, the
      current code doesn't have a way to check if the command was received, as
      the daemon never advertises the current limit.
      
      We need to change the protocol at this point.
      
      This also fixes a bug in the daemon code, which was expecting a single
      string as a message instead of the list of parts that zmqstream always
      delivers.
      bc1b7fd2
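The bug class mentioned here can be shown with a small standalone handler. pyzmq's `ZMQStream.on_recv` passes each message to its callback as a list of frames, even when there is only one; the helper below (a hypothetical name, not part of pyzmq or the NRM) decodes such a list and rejects a bare string.

```python
def decode_parts(parts):
    """Decode a multipart message given as a list of byte frames."""
    if isinstance(parts, (bytes, str)):
        # Treating the message as one string is exactly the mistake to avoid.
        raise TypeError("expected a list of frames, got a single string")
    return [frame.decode("utf-8") for frame in parts]
```

A callback registered on a stream would then call `decode_parts(msg)` first and work on the resulting list of strings.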
  12. 29 Aug, 2017 1 commit
  13. 25 Apr, 2017 1 commit
    • Swann Perarnau's avatar
      [refactor] Python rewrite of the software · 86409f88
      Swann Perarnau authored
      We chose to rewrite the entire thing in python. The language should make
      it easy to interact with all the moving parts of the Argo landscape, and
      easy to prototype various control schemes.
      
      The communication protocol is exactly the same, but implemented with
      ZeroMQ + tornado.
      
      Power readings are not integrated yet, we are targeting using the Coolr
      project for that.
      
      This is a rough draft, all the code is in binary scripts instead of
      the package, and there are no unit tests. Nevertheless, it should be
      a decent starting point for future development.
      86409f88