1. 17 Jun, 2019 1 commit
  2. 13 Jun, 2019 1 commit
  3. 13 May, 2019 1 commit
    • [feature] moves the message formats to json schema. · cd1d86b3
      Valentin Reis authored
      Adds the nrm/schemas repository which defines the communication schemas
      for the upstream and downstream APIs. The messaging.py file now uses
      decorators and two added python dependencies (jsonschema and warlock).
      This commit also adds the .envrc direnv configuration file for
      nix-based development.
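
As a rough illustration of the decorator-plus-schema idea described above, here is a minimal sketch using only the standard library. The commit itself relies on jsonschema and warlock; the tiny `validate` helper, the `message` decorator, the `REGISTRY` table, and the `Run` schema below are all hypothetical stand-ins, not the contents of messaging.py.

```python
import json

# Hypothetical registry mapping a message type name to its class.
REGISTRY = {}

# Subset of JSON Schema type names understood by this sketch.
TYPES = {"string": str, "integer": int, "object": dict}


def validate(schema, data):
    """Tiny stand-in for a schema validator: checks 'required' and 'type' only."""
    for key in schema.get("required", []):
        if key not in data:
            raise ValueError("missing field: " + key)
    for key, spec in schema.get("properties", {}).items():
        if key in data and not isinstance(data[key], TYPES[spec["type"]]):
            raise ValueError("bad type for field: " + key)


def message(schema):
    """Class decorator that attaches a schema and registers the class."""
    def wrap(cls):
        cls.schema = schema
        REGISTRY[cls.__name__.lower()] = cls
        return cls
    return wrap


class Message:
    def __init__(self, **fields):
        # Reject malformed messages at construction time.
        validate(self.schema, fields)
        self.fields = fields

    def dumps(self):
        return json.dumps(self.fields, sort_keys=True)


@message({
    "type": "object",
    "required": ["container_uuid", "path"],
    "properties": {
        "container_uuid": {"type": "string"},
        "path": {"type": "string"},
    },
})
class Run(Message):
    pass


msg = Run(container_uuid="c1", path="/bin/true")
print(msg.dumps())
```

Keeping the schema next to the class means a malformed message fails when it is built rather than when it is received on the other side.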
  4. 03 May, 2019 2 commits
    • [fix] export downstream_event_uri in container · 0ad81e0b
      Swann Perarnau authored
      If monitoring is active, make sure to export the
      ARGO_NRM_DOWNSTREAM_EVENT_URI variable required by libnrm.
      
      Fix #49.
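
The conditional export can be pictured with a short sketch. Only the variable name ARGO_NRM_DOWNSTREAM_EVENT_URI comes from the commit; the function name, its parameters, and the example URI are assumptions made for illustration.

```python
def container_environment(base_env, monitoring, event_uri):
    """Return the environment to pass to a command started in a container.

    Hypothetical helper: the real daemon code is organized differently.
    """
    env = dict(base_env)  # copy so the caller's mapping is left untouched
    if monitoring:
        # Variable required by libnrm, exported only when monitoring is active.
        env["ARGO_NRM_DOWNSTREAM_EVENT_URI"] = event_uri
    return env


# The URI below is a made-up example value.
env = container_environment(
    {"PATH": "/usr/bin"}, True, "ipc:///tmp/nrm-downstream-event"
)
print(env)
```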
    • [feature] add nrmd-wide support for singularity · e0d0abb4
      Florence Monna authored
      The daemon can be configured to launch singularity containers. In that
      case, the manifest must contain an image section.
      
      Note that this doesn't support any resource management with singularity,
      since resource management is only available as root. We will add a second
      container runtime option to support it later.
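
A minimal sketch of the manifest requirement, assuming a manifest represented as a dictionary with an `image` section holding a `path` key. That layout and the function name are hypothetical; only the `singularity exec <image> <command>` invocation form is taken from the singularity CLI.

```python
def singularity_argv(manifest, command, args=()):
    """Build the argv used to run a command inside a singularity image."""
    image = manifest.get("image")
    if not image:
        # The manifest must contain an image section in singularity mode.
        raise ValueError("manifest has no image section")
    return ["singularity", "exec", image["path"], command] + list(args)


argv = singularity_argv({"image": {"path": "app.sif"}}, "hostname")
print(argv)
```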
  5. 01 May, 2019 1 commit
  6. 19 Apr, 2019 1 commit
  7. 27 Feb, 2019 1 commit
  8. 15 Feb, 2019 1 commit
  9. 07 Feb, 2019 1 commit
  10. 06 Feb, 2019 2 commits
    • [refactor] add an abstraction for container runtimes · c8ca4fa5
      Swann Perarnau authored
      Extract the container runtime interface from the container manager, and
      use a hierarchy of classes to enforce a runtime interface that makes
      sense.
      
      This will allow us to create alternative runtime implementations without
      major changes to the container manager code.
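
One way to picture such a class hierarchy is an abstract base class that the container manager depends on. This is a sketch only: the class and method names (`ContainerRuntime`, `create`, `execute`, `delete`, `DummyRuntime`) are assumptions, not the names used in the repository.

```python
from abc import ABC, abstractmethod


class ContainerRuntime(ABC):
    """Interface every runtime implementation has to provide (hypothetical)."""

    @abstractmethod
    def create(self, name, manifest):
        """Create the container described by the manifest."""

    @abstractmethod
    def execute(self, name, argv):
        """Run a command inside an existing container."""

    @abstractmethod
    def delete(self, name):
        """Destroy the container."""


class DummyRuntime(ContainerRuntime):
    """In-memory runtime, useful only to show that the manager is agnostic."""

    def __init__(self):
        self.containers = {}

    def create(self, name, manifest):
        self.containers[name] = []

    def execute(self, name, argv):
        self.containers[name].append(list(argv))
        return list(argv)

    def delete(self, name):
        del self.containers[name]


class ContainerManager:
    """Talks to the runtime only through the abstract interface."""

    def __init__(self, runtime):
        self.runtime = runtime

    def run(self, name, manifest, argv):
        self.runtime.create(name, manifest)
        return self.runtime.execute(name, argv)


manager = ContainerManager(DummyRuntime())
print(manager.run("c1", {}, ["true"]))
```

Because the abstract methods are enforced at instantiation time, an incomplete runtime fails early instead of at the first missing call.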
    • [fix] restore the use of resources tuple · dca6debb
      Swann Perarnau authored
      !27 replaced the resource field of the `container` named tuple with a
      dictionary. Restore the old type, a `resources` named tuple, and make
      sure to propagate this across all the code.
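
For readers unfamiliar with the pattern, a nested named tuple looks roughly like the sketch below. The field names (`uuid`, `manifest`, `cpus`, `mems`) are assumptions chosen for illustration; only the `container` and `resources` tuple names come from the commit message.

```python
from collections import namedtuple

# Hypothetical field names; the real tuples may carry different fields.
Resources = namedtuple("Resources", ["cpus", "mems"])
Container = namedtuple("Container", ["uuid", "manifest", "resources"])

c = Container(uuid="c1", manifest={}, resources=Resources(cpus=[0, 1], mems=[0]))

# Attribute access instead of dictionary lookups such as c.resources["cpus"].
print(c.resources.cpus)
```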
  11. 23 Jan, 2019 1 commit
    • [refactor] messaging style + cmd_listen application_uuid relaxing · 27a3fdec
      Valentin Reis authored
      This commit does two things:
      - re-indents the message schema to be more readable
      - lets `cmd listen --filter` print any incoming message, without
        discriminating on container_uuid. This makes cmd listen usable until
        proper application_uuid management is written into nrm.
  12. 21 Jan, 2019 4 commits
    • [fix] stylecheck issues · 5c83d096
      Swann Perarnau authored
      Some inconsistencies in the CI let a merge request go through without
      stylechecking.
    • [fix] Aggregative downstream & new msg layer · f3c53106
      Sridutt Bhalachandra authored
      Made the fixes necessary for the aggregative downstream API integration
      to work with the new downstream messaging layer.
      
      Also fixed the case where the daemon crashed when an application message
      (from libnrm using PMPI) was received after the container was killed.
      
      Removed run_policy on all containers, as the controller no longer has
      application manager info.
      
      Includes other refactoring and fixes as required (see the merge request
      discussion).
      
      See Issues #13, #20 and Merge !41
    • [fix] Multi-node support and msg layer interaction · 33316192
      Sridutt Bhalachandra authored
      Fixed the interaction of the multi-node support feature (#17) with the new
      messaging layer feature. Also added the other fixes required to make
      libnrm work with the Aggregative downstream API.
    • [feature] Aggregative Downstream API integration · a501c976
      Sridutt Bhalachandra authored
      Adds support for aggregation of phase context information for an
      application. The damper value (in nanoseconds, set in the manifest file)
      sets the minimum phase length for which phase context information is
      sent to the NRM (implemented in the 'libnrm' repo [See Issue 2]). This
      limits the number of messages sent to the NRM.
      
      See Issue #13
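
The damper mechanism described above can be sketched as an accumulator that only reports once enough time has been aggregated. The actual logic lives in libnrm; the Python class below, its name, and the `totaltime`/`computetime` fields are hypothetical and only illustrate the thresholding idea.

```python
class PhaseAggregator:
    """Accumulate phase contexts until the damper (in nanoseconds) is reached."""

    def __init__(self, damper_ns):
        self.damper_ns = damper_ns
        self.total_ns = 0
        self.compute_ns = 0

    def add_phase(self, duration_ns, compute_ns):
        """Record one phase; return an aggregate report, or None if damped."""
        self.total_ns += duration_ns
        self.compute_ns += compute_ns
        if self.total_ns < self.damper_ns:
            return None  # phase too short so far: nothing is sent
        report = {"totaltime": self.total_ns, "computetime": self.compute_ns}
        self.total_ns = 0
        self.compute_ns = 0
        return report


agg = PhaseAggregator(damper_ns=1000)
first = agg.add_phase(400, 300)   # below the damper, aggregated silently
second = agg.add_phase(700, 500)  # crosses the damper, one message goes out
print(first, second)
```

A larger damper trades reporting granularity for fewer messages on the downstream API.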
  13. 04 Jan, 2019 2 commits
  14. 21 Dec, 2018 2 commits
  15. 18 Dec, 2018 1 commit
  16. 12 Dec, 2018 1 commit
    • [Feature] Adds configuration management and environment variables · 25443c64
      Valentin Reis authored
      This commit adds a command-line interface to `daemon`:
      ```
      usage: daemon [-h] [-c FILE] [-d] [--nrm_log NRM_LOG] [--hwloc HWLOC]
                    [--argo_nodeos_config ARGO_NODEOS_CONFIG] [--perf PERF]
                    [--argo_perf_wrapper ARGO_PERF_WRAPPER]
      
      optional arguments:
        -h, --help            show this help message and exit
        -c FILE, --configuration FILE
                              Specify a JSON-formatted config file to
                              override any of the available CLI options. If an
                              option is actually provided on the command-line, it
                              overrides its corresponding value from the
                              configuration file.
        -d, --print_defaults  Print the default configuration file.
        --nrm_log NRM_LOG     Main log file. Override default with the NRM_LOG
                              environment variable.
        --hwloc HWLOC         Path to the hwloc to use. This path can be relative
                              and makes use of the $PATH if necessary. Override
                              default with the HWLOC environment variable.
        --argo_nodeos_config ARGO_NODEOS_CONFIG
                              Path to the argo_nodeos_config to use. This path can
                              be relative and makes use of the $PATH if necessary.
                              Override default with the ARGO_NODEOS_CONFIG
                              environment variable.
        --perf PERF           Path to the linux perf tool to use. This path can be
                              relative and makes use of the $PATH if necessary.
                              Override default with the PERF environment variable.
        --argo_perf_wrapper ARGO_PERF_WRAPPER
                              Path to the argo_perf_wrapper to use. This path can be
                              relative and makes use of the $PATH if necessary.
                              Override default with the PERFWRAPPER environment
                              variable.
      ```
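
The override rules in the help text can be sketched as a small resolution function. This is an illustration under stated assumptions: the built-in default values are invented, the configuration is passed as JSON text rather than read from a file, and the ordering between the configuration file and environment variables (environment replaces the default, then the file, then the command line) is a guess, since the help text only states that the command line wins over the file.

```python
import argparse
import json
import os

# Hypothetical built-in defaults; the real daemon defines its own.
DEFAULTS = {"nrm_log": "/tmp/nrm.log", "hwloc": "hwloc", "perf": "perf"}


def resolve(argv, environ, config_text=None):
    """Resolve each option: CLI > config file > environment > built-in default."""
    parser = argparse.ArgumentParser(prog="daemon")
    for name in DEFAULTS:
        # default=None lets us tell "not given" apart from an explicit value.
        parser.add_argument("--" + name, default=None)
    cli = vars(parser.parse_args(argv))

    config = json.loads(config_text) if config_text else {}
    resolved = {}
    for name, builtin in DEFAULTS.items():
        value = environ.get(name.upper(), builtin)  # e.g. HWLOC, PERF, NRM_LOG
        value = config.get(name, value)
        if cli[name] is not None:
            value = cli[name]
        resolved[name] = value
    return resolved


print(resolve(["--perf", "/opt/perf"], dict(os.environ)))
```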
  17. 10 Dec, 2018 1 commit
  18. 28 Nov, 2018 4 commits
    • [fix] remove container ownership concept · a47ec65f
      Swann Perarnau authored
      Make it so that the daemon will delete containers when all commands it
      is aware of are finished, instead of relying on a single owner that
      needs to be tracked.
      
      This simplifies the handling of multiple commands in the same container,
      and should not impact the rest.
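
The ownerless lifecycle amounts to counting the commands known for each container. A minimal sketch follows, with a hypothetical `ContainerTable` class that is not the daemon's actual data structure.

```python
class ContainerTable:
    """Track running commands per container; no single owner is recorded."""

    def __init__(self):
        self.commands = {}  # container uuid -> set of running process ids

    def start_command(self, uuid, pid):
        self.commands.setdefault(uuid, set()).add(pid)

    def finish_command(self, uuid, pid):
        """Return True if this was the last command and the container is deleted."""
        pids = self.commands[uuid]
        pids.discard(pid)
        if pids:
            return False
        del self.commands[uuid]
        return True


table = ContainerTable()
table.start_command("c1", 100)
table.start_command("c1", 101)
print(table.finish_command("c1", 100))  # another command is still running
print(table.finish_command("c1", 101))  # last command done, container removed
```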
    • [refactor] move container start/exit to up_pub · f81b95e0
      Swann Perarnau authored
      Move the container start/exit events to the upstream pub/sub event
      stream. As these are more of a global event now that we support multiple
      commands in the same container, it makes sense to move them to the more
      general event stream.
      
      This patch also removes the code in cmd that waits for container start or
      exit, making cmd temporarily unable to report power metrics. We
      will fix that in a later commit.
      
      This patch fixes complicated issues where the commands of a job running
      multiple commands in the same container might not all wait for the end of
      the container: now none of them do.
    • [fix] ensure container has single owner · 93ae9144
      Swann Perarnau authored
      Ensure that the client that created the container is considered as the
      one owning it, with the consequence that if its command exits, the
      container is destroyed. Also deals with the race issue we had on the cmd
      side.
    • [refactor/fix] always send process events for run · 6e0c1e7a
      Swann Perarnau authored
      Current code sends start/exit events when a container is created and
      process_start/process_exit when it's already there. Instead, have the
      container start/exit only care about container stuff, and always send
      the process start/exit events. That makes the cmd run fsm easier
      to work out.
      
      Changes the message format a tiny bit.
      Fixes some missing stdout/stderr issues we had before.
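
The new event ordering can be illustrated as follows. The event names `start` and `process_start` come from the commit message; the dictionary layout, field names, and the function itself are assumptions for illustration.

```python
def run_events(container_exists, container_uuid, pid):
    """List the events emitted when a run request is handled."""
    events = []
    if not container_exists:
        # Container-level event, sent only when the container is created.
        events.append({"type": "start", "container_uuid": container_uuid})
    # Process-level event, now sent in every case.
    events.append(
        {"type": "process_start", "container_uuid": container_uuid, "pid": pid}
    )
    return events


print([e["type"] for e in run_events(False, "c1", 100)])
print([e["type"] for e in run_events(True, "c1", 101)])
```

With this ordering a client can drive its state machine from process events alone, whether or not the container already existed.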
  19. 23 Oct, 2018 1 commit
  20. 21 Oct, 2018 1 commit
    • [refactor] replace upstream comms with msg layer · 0b0ab966
      Swann Perarnau authored
      Replace the fragile upstream communications with the new messaging
      layer, improving the stability and performance of this API.
      
      NOTE: this breaks previous clients
      NOTE: this patch is missing client tracking, to handle children signals.
  21. 17 Oct, 2018 1 commit
    • [Feature] Multi-node and process support · 5a41baba
      Sridutt Bhalachandra authored
      Added multi-node and multi-process support, which allows launching
      multiple processes within a container. This is important for enabling
      use of NRM with MPI applications that run multiple processes in a
      container, and thus for enabling multi-node executions.
      
      See Issue #17
  22. 15 Aug, 2018 1 commit
  23. 14 Aug, 2018 1 commit
  24. 10 Aug, 2018 4 commits
  25. 16 Jul, 2018 1 commit
  26. 21 Dec, 2017 1 commit
  27. 20 Dec, 2017 1 commit
    • [feature] Add PowerActuator and update control · 26e9c239
      Swann Perarnau authored
      This patch adds a PowerActuator based on RAPL settings available through
      the sensor manager. Adding this actuator forces us to use a list of
      actuators in the controller, slightly changing the structure of the code.
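
A controller holding a list of actuators might look like the sketch below. Only the PowerActuator name and its reliance on the sensor manager come from the commit; `FakeSensorManager`, `set_powerlimit`, `RecordingActuator`, and the `execute`/`apply` methods are hypothetical.

```python
class FakeSensorManager:
    """Hypothetical stand-in for the sensor manager's RAPL access."""

    def __init__(self):
        self.power_limit = None

    def set_powerlimit(self, watts):
        self.power_limit = watts


class PowerActuator:
    """Applies a power limit through the sensor manager."""

    def __init__(self, sensor_manager):
        self.sensor_manager = sensor_manager

    def execute(self, watts):
        self.sensor_manager.set_powerlimit(watts)


class RecordingActuator:
    """Second actuator type, present only to show the list in use."""

    def __init__(self):
        self.calls = []

    def execute(self, value):
        self.calls.append(value)


class Controller:
    """Holds a list of actuators instead of a single one."""

    def __init__(self, actuators):
        self.actuators = list(actuators)

    def apply(self, decisions):
        """Apply (actuator index, value) pairs to the matching actuators."""
        for index, value in decisions:
            self.actuators[index].execute(value)


sensors = FakeSensorManager()
recorder = RecordingActuator()
controller = Controller([PowerActuator(sensors), recorder])
controller.apply([(0, 80), (1, "noop")])
print(sensors.power_limit, recorder.calls)
```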