1. 28 Jan, 2019 1 commit
    • [fix] Reverting API breakage from the last merge · 250198e4
      Valentin Reis authored
      The last merge changed the API visual style to increase readability.
      
      The manual merge was not done properly, however, and it reverted some of
      the changes from previous merges. This commit restores them.
  2. 23 Jan, 2019 1 commit
    • [refactor] messaging style + cmd_listen application_uuid relaxing · 27a3fdec
      Valentin Reis authored
      This commit does two things:
      - re-indents the message schema to be more readable
      - lets `cmd listen --filter` print any incoming message, without
        discriminating on container_uuid. This keeps cmd listen usable until
        proper application_uuid management is written into nrm.
  3. 21 Jan, 2019 7 commits
    • [fix] stylecheck issues · 5c83d096
      Swann Perarnau authored
      Some inconsistencies in the CI let a merge request go through without
      stylechecking.
    • [fix] Aggregative downstream & new msg layer · f3c53106
      Sridutt Bhalachandra authored
      Made the necessary fixes for the aggregative downstream API integration
      to work with the new downstream messaging layer.
      
      Also fixed the case where the daemon crashed when an application message
      (from libnrm using PMPI) was received after the container was killed.
      
      Removed run_policy on all containers, as the controller no longer has
      application manager info.
      
      Other refactoring and fixes as required (see the merge request
      discussion).
      
      See Issues #13, #20 and Merge !41
    • [feature] Support for Process/Task pinning · 402a2524
      Sridutt Bhalachandra authored
      Added support for pinning a process/task to a core. This matters for power
      policies that use contextual information from one application phase to
      compute frequency levels for the next phase. Without process/task pinning,
      that information has no value: it is not representative of application
      phase behavior on a core, because processes and tasks can migrate during
      the next phase.
      
      See Issue #20
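As a rough illustration of what pinning a process to a core involves, here is a minimal sketch using Python's standard `os.sched_setaffinity` (available on Linux). The helper name `pin_to_core` is hypothetical, and this is not how NRM itself performs the pinning.

```python
import os


def pin_to_core(core, pid=0):
    """Restrict `pid` (0 means the calling process) to a single core.

    Hypothetical helper for illustration; relies on the Linux-only
    os.sched_setaffinity / os.sched_getaffinity calls.
    """
    os.sched_setaffinity(pid, {core})
    # Return the resulting affinity mask so callers can check it.
    return os.sched_getaffinity(pid)


if __name__ == "__main__" and hasattr(os, "sched_setaffinity"):
    allowed = os.sched_getaffinity(0)
    # Pick the lowest-numbered core the process is allowed to use.
    print(pin_to_core(min(allowed)))
```

Once a process is pinned, per-core measurements taken in one phase stay attributable to the same process in the next phase, which is the property the commit message relies on.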
    • [fix] Ensure that first N resources are always returned · 442d31f6
      Sridutt Bhalachandra authored
      Fixes NRM not returning the first N resources (CPU and memory). This
      is important for reproducibility and for reducing variation.
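The idea of always handing out the first N resources can be sketched as a deterministic selection over the available IDs. The function name `first_n_resources` and the use of plain integer IDs are assumptions made for illustration, not NRM's actual resource manager.

```python
def first_n_resources(available, n):
    """Return the n lowest-numbered resource IDs from `available`.

    Sorting first makes the result independent of the container's
    iteration order, so repeated runs get the same resources.
    """
    ordered = sorted(available)
    if n > len(ordered):
        raise ValueError("requested %d resources, only %d available"
                         % (n, len(ordered)))
    return ordered[:n]


if __name__ == "__main__":
    print(first_n_resources({5, 2, 9, 0}, 2))
```

Returning the same resources for the same request is what makes runs comparable across experiments.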
    • [fix] Multi-node support and msg layer interaction · 33316192
      Sridutt Bhalachandra authored
      Fixed the interaction of the multi-node support feature (#17) with the new
      messaging layer feature. Also added the other fixes required to make
      libnrm work with the aggregative downstream API.
    • [feature] Aggregative Downstream API integration · a501c976
      Sridutt Bhalachandra authored
      Adds support for aggregation of phase context information for an
      application. The damper value (in nanoseconds in the manifest file)
      decides the minimum phase length for which the phase context
      information is sent to the NRM (implemented in 'libnrm' repo
      [See Issue 2]). This limits the number of messages sent to the NRM.
      
      See Issue #13
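One possible reading of the damper mechanism is sketched below: phase durations are accumulated and a message is emitted only once the aggregated time reaches the damper value. The class `PhaseAggregator`, its message fields, and the exact threshold semantics are assumptions; the real logic lives in libnrm and may differ.

```python
class PhaseAggregator:
    """Hypothetical sketch of damper-based aggregation of phase contexts."""

    def __init__(self, damper_ns, send):
        self.damper_ns = damper_ns  # minimum aggregated phase length, in ns
        self.send = send            # callback that delivers one message
        self.aggregated_ns = 0
        self.phase_count = 0

    def end_phase(self, duration_ns):
        # Accumulate the phase that just finished.
        self.aggregated_ns += duration_ns
        self.phase_count += 1
        # Only emit once enough time has been aggregated, then reset.
        if self.aggregated_ns >= self.damper_ns:
            self.send({"aggregated_ns": self.aggregated_ns,
                       "phase_count": self.phase_count})
            self.aggregated_ns = 0
            self.phase_count = 0


if __name__ == "__main__":
    sent = []
    aggregator = PhaseAggregator(damper_ns=1000, send=sent.append)
    for duration in (400, 400, 400, 100):
        aggregator.end_phase(duration)
    print(sent)
```

With many short phases, a larger damper trades message volume for coarser-grained context information.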
    • [refactor] Diff calculation without coolr changes · 5d292f9f
      Sridutt Bhalachandra authored
      Refactored diff calculation code to work without needing changes in
      the coolr module (Patch for Commit 36401a84)
  4. 09 Jan, 2019 1 commit
  5. 04 Jan, 2019 2 commits
  6. 21 Dec, 2018 3 commits
    • [fix] disable application actuator · 4bca474a
      Swann Perarnau authored
      Doesn't work with the new downstream API.
    • [refactor] use downstream messaging layer · 75df2004
      Swann Perarnau authored
      Replace the downstream API handling by the new messaging layer. Note that
      we don't have a clean way to deal with dynamic concurrency control using
      this API, so we disable the handling of it for now.
    • [feature] add downstream rpc to messaging layer · 0bae924d
      Swann Perarnau authored
      Add downstream RPC client/server classes that are the same as the
      upstream ones.
      
      This is part of a series of changes to downstream to allow for more
      reliable communications between the daemon and applications. At this
      time, the daemon never replies, so the RPC_REQ is basically used as a
      way to publish events to the daemon.
  7. 18 Dec, 2018 1 commit
  8. 12 Dec, 2018 2 commits
    • [fix] Fixes the process kill call. · ec503ffa
      Valentin Reis authored
      Fixes a bug introduced by the 'progress-report' branch in a recent
      commit. The process object is the result of a tornado spawn, so
      the call has to be slightly different from what was there.
    • [Feature] Adds configuration management and environment variables · 25443c64
      Valentin Reis authored
      This commit adds a command-line interface to `daemon`:
      ```
      usage: daemon [-h] [-c FILE] [-d] [--nrm_log NRM_LOG] [--hwloc HWLOC]
                    [--argo_nodeos_config ARGO_NODEOS_CONFIG] [--perf PERF]
                    [--argo_perf_wrapper ARGO_PERF_WRAPPER]
      
      optional arguments:
        -h, --help            show this help message and exit
        -c FILE, --configuration FILE
                              Specify a config json-formatted config file to
                              override any of the available CLI options. If an
                              option is actually provided on the command-line, it
                              overrides its corresponding value from the
                              configuration file.
        -d, --print_defaults  Print the default configuration file.
        --nrm_log NRM_LOG     Main log file. Override default with the NRM_LOG.
                              environment variable
        --hwloc HWLOC         Path to the hwloc to use. This path can be relative
                              and makes uses of the $PATH if necessary. Override
                              default with the HWLOC environment variable.
        --argo_nodeos_config ARGO_NODEOS_CONFIG
                              Path to the argo_nodeos_config to use. This path can
                              be relative and makes uses of the $PATH if necessary.
                              Override default with the ARGO_NODEOS_CONFIG
                              environment variable.
        --perf PERF           Path to the linux perf tool to use. This path can be
                              relative and makes uses of the $PATH if necessary.
                              Override default with the PERF environment variable.
        --argo_perf_wrapper ARGO_PERF_WRAPPER
                              Path to the linux perf tool to use. This path can be
                              relative and makes uses of the $PATH if necessary.
                              Override default with the PERFWRAPPER environment
                              variable.
      ```
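The precedence described in the help text can be sketched as follows, assuming the order is: explicit command-line option, then configuration file, then environment variable, then built-in default. The default values, the function name `resolve_config`, and the subset of options shown are hypothetical; only the option names come from the usage text.

```python
import argparse
import json

# Hypothetical built-in defaults; only the option names come from the help text.
DEFAULTS = {"nrm_log": "/tmp/nrm.log", "hwloc": "hwloc", "perf": "perf"}


def resolve_config(argv, environ, file_text):
    """Merge settings: explicit CLI > config file > environment > defaults."""
    parser = argparse.ArgumentParser(prog="daemon")
    for name in DEFAULTS:
        # SUPPRESS keeps options that were not given out of the namespace,
        # so only explicitly provided values override the file.
        parser.add_argument("--" + name, default=argparse.SUPPRESS)
    explicit = vars(parser.parse_args(argv))

    config = dict(DEFAULTS)
    for name in DEFAULTS:
        env_value = environ.get(name.upper())
        if env_value is not None:
            config[name] = env_value
    file_config = json.loads(file_text)
    config.update({k: v for k, v in file_config.items() if k in DEFAULTS})
    config.update(explicit)
    return config


if __name__ == "__main__":
    print(resolve_config(["--perf", "/opt/perf"],
                         {"NRM_LOG": "/var/log/nrm.log"},
                         '{"hwloc": "/opt/hwloc"}'))
```

Using `argparse.SUPPRESS` as the default is one way to tell "not given on the command line" apart from "given with the default value", which the override rule needs.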
  9. 10 Dec, 2018 2 commits
  10. 28 Nov, 2018 5 commits
    • [fix] remove container ownership concept · a47ec65f
      Swann Perarnau authored
      Make it so that the daemon will delete containers when all commands it
      is aware of are finished, instead of relying on a single owner that
      needs to be tracked.
      
      This simplifies the handling of multiple commands in the same container,
      and should not impact the rest.
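The ownership-free lifetime rule above amounts to tracking the set of running commands per container and deleting the container when that set becomes empty. The sketch below illustrates the idea; the class `ContainerTracker` and its method names are hypothetical and do not reflect the daemon's actual code.

```python
class ContainerTracker:
    """Hypothetical sketch: delete a container once all its commands finish."""

    def __init__(self):
        # container_uuid -> set of command ids still running in it
        self._commands = {}

    def start_command(self, container_uuid, command_id):
        self._commands.setdefault(container_uuid, set()).add(command_id)

    def finish_command(self, container_uuid, command_id):
        """Record a finished command; return True if the container was deleted."""
        running = self._commands.get(container_uuid)
        if running is None:
            return False
        running.discard(command_id)
        if running:
            return False  # other commands are still running
        del self._commands[container_uuid]
        return True

    def containers(self):
        return sorted(self._commands)


if __name__ == "__main__":
    tracker = ContainerTracker()
    tracker.start_command("c1", "cmd-a")
    tracker.start_command("c1", "cmd-b")
    print(tracker.finish_command("c1", "cmd-a"))
    print(tracker.finish_command("c1", "cmd-b"))
```

No command is special in this scheme, so the order in which commands exit does not matter.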
    • [refactor] move container start/exit to up_pub · f81b95e0
      Swann Perarnau authored
      Move the container start/exit events to the upstream pub/sub event
      stream. As these are more of a global event now that we support multiple
      commands in the same container, it makes sense to move them to the more
      general event stream.
      
      This patch also removes the code in cmd that waits for container start
      or exit, which temporarily leaves cmd unable to report power metrics. We
      will fix that in a later commit.
      
      This patch fixes complicated issues where the commands of a job running
      several commands in one container might not all wait for the end of the
      container: now none of them do.
    • [feature] add messaging class for pub client · c4e50535
      Swann Perarnau authored
      Add an upstream pub client, to be able to listen to messages coming from
      the daemon on the upstream pub/sub channel.
      
      Doesn't support any fancy filter, as that's not used by the daemon so
      far.
    • [fix] ensure container has single owner · 93ae9144
      Swann Perarnau authored
      Ensure that the client that created the container is considered as the
      one owning it, with the consequence that if its command exits, the
      container is destroyed. Also deals with the race issue we had on the cmd
      side.
    • [refactor/fix] always send process events for run · 6e0c1e7a
      Swann Perarnau authored
      Current code sends start/exit events when a container is created and
      process_start/process_exit when it's already there. Instead, have the
      container start/exit only care about container stuff, and always send
      the process start/exit events around. That makes the cmd run fsm easier
      to work out.
      
      Changes the message format a tiny bit.
      Fixes some missing stdout/stderr issues we had before.
  11. 23 Oct, 2018 1 commit
  12. 21 Oct, 2018 2 commits
    • [refactor] replace upstream comms with msg layer · 0b0ab966
      Swann Perarnau authored
      Replace the fragile upstream communications with the new messaging
      layer, improving the stability and performance of this API.
      
      NOTE: this breaks previous clients
      NOTE: this patch is missing client tracking, to handle children signals.
    • [feature] add messaging layer for upstream API · c29ed7ea
      Swann Perarnau authored
      Abstracts away the exact wire format and client/server details, while
      changing the RPC side to work over ROUTER/DEALER sockets, so as to avoid
      the lost-message issues we've been having with PUB/SUB for RPC.
  13. 17 Oct, 2018 1 commit
    • [Feature] Multi-node and process support · 5a41baba
      Sridutt Bhalachandra authored
      Added multi-node and multi-process support that allows launching
      multiple processes within a container. This is important for using NRM
      with MPI applications that run multiple processes in a container, and
      thus for enabling multi-node executions.
      
      See Issue #17
  14. 15 Aug, 2018 2 commits
  15. 14 Aug, 2018 1 commit
  16. 10 Aug, 2018 5 commits
  17. 09 Aug, 2018 1 commit
    • Pass environment explicitly · 3fcf2f50
      Kamil Iskra authored
      When invoking 'argo_nodeos_config run', we were passing the job
      environment implicitly.  This wasn't very clean and was also causing
      problems with variables such as LD_PRELOAD, which were being filtered
      out because argo_nodeos_config is suid root.
  18. 25 Jul, 2018 2 commits