1. 10 Dec, 2018 2 commits
  2. 28 Nov, 2018 5 commits
    • [fix] remove container ownership concept · a47ec65f
      Swann Perarnau authored
      Make it so that the daemon will delete containers when all commands it
      is aware of are finished, instead of relying on a single owner that
      needs to be tracked.
      
      This simplifies the handling of multiple commands in the same container,
      and should not impact the rest.
    • [refactor] move container start/exit to up_pub · f81b95e0
      Swann Perarnau authored
      Move the container start/exit events to the upstream pub/sub event
      stream. As these are more of a global event now that we support multiple
      commands in the same container, it makes sense to move them to the more
      general event stream.
      
      This patch also removes the code in cmd waiting for container start or
      exit, temporarily making the cmd unable to report power metrics. We
      will fix that in a later commit.
      
      This patch fixes complicated issues we had with how a job running
      multiple commands in the container might not all wait for the end of the
      container: now none of them do.
    • [feature] add messaging class for pub client · c4e50535
      Swann Perarnau authored
      Add an upstream pub client, to be able to listen to messages coming from
      the daemon on the upstream pub/sub channel.
      
      Doesn't support any fancy filter, as that's not used by the daemon so
      far.
    • [fix] ensure container has single owner · 93ae9144
      Swann Perarnau authored
      Ensure that the client that created the container is considered as the
      one owning it, with the consequence that if its command exits, the
      container is destroyed. Also deals with the race issue we had on the cmd
      side.
    • [refactor/fix] always send process events for run · 6e0c1e7a
      Swann Perarnau authored
      Current code sends start/exit events when a container is created and
      process_start/process_exit when it's already there. Instead, have the
      container start/exit only care about container stuff, and always send
      the process start/exit events. That makes the cmd run fsm easier
      to work out.
      
      Changes the message format a tiny bit.
      Fixes some missing stdout/stderr issues we had before.
  3. 23 Oct, 2018 1 commit
  4. 21 Oct, 2018 2 commits
    • [refactor] replace upstream comms with msg layer · 0b0ab966
      Swann Perarnau authored
      Replace the fragile upstream communications with the new messaging
      layer, improving the stability and performance of this API.
      
      NOTE: this breaks previous clients
      NOTE: this patch is missing client tracking, to handle children signals.
    • [feature] add messaging layer for upstream API · c29ed7ea
      Swann Perarnau authored
      Abstracts away the exact wire format and client/server details, while
      changing the RPC side to work over ROUTER/DEALER sockets, so as to avoid
      the lost-message issues we've been having with PUB/SUB for RPC.
  5. 17 Oct, 2018 1 commit
    • [Feature] Multi-node and process support · 5a41baba
      Sridutt Bhalachandra authored
      Added multi-node and process support that will allow launching of
      multiple processes within a container. This is important for enabling
      use of NRM with MPI applications with multiple processes in a container
      and thus enabling multi-node executions.
      
      See Issue #17
  6. 15 Aug, 2018 2 commits
  7. 14 Aug, 2018 1 commit
  8. 10 Aug, 2018 5 commits
  9. 09 Aug, 2018 1 commit
    • Pass environment explicitly · 3fcf2f50
      Kamil Iskra authored
      When invoking 'argo_nodeos_config run', we were passing the job
      environment implicitly.  This wasn't very clean and was also causing
      problems with variables such as LD_PRELOAD, which were being filtered
      out because argo_nodeos_config is suid root.
  10. 25 Jul, 2018 2 commits
  11. 19 Jul, 2018 2 commits
  12. 17 Jul, 2018 2 commits
  13. 16 Jul, 2018 1 commit
  14. 03 Jul, 2018 1 commit
  15. 21 Dec, 2017 1 commit
  16. 20 Dec, 2017 3 commits
    • [feature] Add actuator logic for decreasing power · 36206879
      Swann Perarnau authored
      Change the PowerActuator to be able to lower the power limit. Because
      RAPL doesn't provide an actual lower limit, we use 0 as the minimal
      power.
    • [feature] Add PowerActuator and update control · 26e9c239
      Swann Perarnau authored
      This patch adds a PowerActuator based on RAPL settings available through
      the sensor manager. Adding this actuator forces us to use a list of
      actuators in the controller, changing a bit the structure of the code.
    • [feature] Add actuator to the controller logic · cbbf2354
      Swann Perarnau authored
      This patch introduces one more level of abstraction to the controller:
      an actuator. Actuators will act as the middleman between specific
      managers and the controller, while providing enough info to implement
      actual models on top.
      
      For now, we only have the application threads actuator.
  17. 19 Dec, 2017 7 commits
    • Improve formatting and commentary · 41a91901
      Kamil Iskra authored
    • [refactor] Move control scheme to its own module · 246edb75
      Swann Perarnau authored
      The "control" part of the NRM is bound to change and become more complex
      in the near future, so move it into its own module.
      
      This refactor also introduces some controller logic. Control is split
      into 3 steps: planning, execution and updates. The goal is to use this
      new code organization as a way to abstract different control policies
      that could be implemented later.
      
      Note that we might at some point move into a "control manager" and a
      bunch of "policies" and "actuators", as a way of matching typical
      control theory vocabulary.
    • Configure perf-wrapper using the manifest · b666f1c2
      Kamil Iskra authored
    • [fix] Wrong streaming_callback on stderr · dec31967
      Swann Perarnau authored
      Fixes a copy/paste mistake on the name of the callback to trigger on
      stderr events.
    • [fix] Use proper env variable for container uuid · 90157c2a
      Swann Perarnau authored
      This patch fixes the daemon code to include the container uuid in the
      environment of the command, while changing that environment variable to
      use a better suited name.
    • [feature] Replace client with dummy application · 66e4c85d
      Swann Perarnau authored
      This patch replaces the client code (bin/client and nrm/client) by a new
      application code that integrates progress reports and uses the new
      downstream API.
      
      While git is reporting that both codes are different, the app code is
      basically a refactoring and adaptation of the client code.
      
      This is directly related to issue #2.
    • [feature] Implement Application Manager · f43a38d3
      Swann Perarnau authored
      This patch moves the tracking of application clients of the downstream
      API into an ApplicationManager, that is able to track progress and thread
      management.
      
      This change is necessary in the long term to build a comprehensive
      downstream API and centralize the management of application tracking.
      
      Note that this tracking is currently independent of the container and
      pid tracking, and that might be a problem in the long term.
  18. 18 Dec, 2017 1 commit
    • [feature] Implement skeleton downstream API · 19c9eb54
      Swann Perarnau authored
      This patch refactors the downstream API to use a pub/sub socket pair, like
      the upstream API. This is part of the effort to improve the downstream
      API. See #2.
      
      This patch doesn't touch the client module, which will be adapted in
      future commits.
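
Commit a47ec65f in the list above describes the daemon deleting a container once every command it is aware of has finished, rather than tracking a single owner. The sketch below illustrates that bookkeeping only. It is not the daemon's code: the ContainerTracker class, its method names, and the string ids are all hypothetical, chosen for illustration.

```python
class ContainerTracker:
    """Hypothetical helper: tracks the commands still running per container."""

    def __init__(self):
        self.commands = {}  # container uuid -> set of running command ids
        self.deleted = []   # uuids of containers that have been torn down

    def command_started(self, container_uuid, cmd_id):
        # Register a command; the container entry is created on first use.
        self.commands.setdefault(container_uuid, set()).add(cmd_id)

    def command_exited(self, container_uuid, cmd_id):
        # Returns True when this exit caused the container to be deleted.
        running = self.commands.get(container_uuid)
        if running is None:
            return False
        running.discard(cmd_id)
        if running:
            # Other known commands are still running: keep the container.
            return False
        # No known command is left, so the container can go away.
        del self.commands[container_uuid]
        self.deleted.append(container_uuid)
        return True


tracker = ContainerTracker()
tracker.command_started("c1", "cmd-a")
tracker.command_started("c1", "cmd-b")
first = tracker.command_exited("c1", "cmd-a")   # one command still running
second = tracker.command_exited("c1", "cmd-b")  # last command has exited
```

With no owner to single out, the exit of any command is handled the same way, and only the last exit triggers deletion.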