  1. 09 Aug, 2018 1 commit
    • Pass environment explicitly · 3fcf2f50
      Kamil Iskra authored
      When invoking 'argo_nodeos_config run', we were passing the job
      environment implicitly.  This wasn't very clean and was also causing
      problems with variables such as LD_PRELOAD, which were being filtered
      out because argo_nodeos_config is suid root.
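A minimal sketch of the idea in Python, not the code from this commit: the job environment is serialized into explicit `NAME=value` strings that travel with the command, so it no longer depends on what the suid-root binary inherits. The helper name `env_to_args` and the sample values are hypothetical, and how `argo_nodeos_config run` actually accepts the environment is not shown in this log.

```python
def env_to_args(env):
    """Serialize an environment mapping into sorted NAME=value strings."""
    return ["{}={}".format(name, value) for name, value in sorted(env.items())]


# Hypothetical job environment; LD_PRELOAD is the kind of variable that is
# filtered out when it is only inherited implicitly by a suid-root program.
job_env = {"PATH": "/usr/bin", "LD_PRELOAD": "/opt/lib/libexample.so"}
explicit_env = env_to_args(job_env)
print(explicit_env)
```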
  2. 25 Jul, 2018 2 commits
  3. 19 Jul, 2018 2 commits
  4. 17 Jul, 2018 2 commits
  5. 16 Jul, 2018 1 commit
  6. 03 Jul, 2018 1 commit
  7. 21 Dec, 2017 1 commit
  8. 20 Dec, 2017 3 commits
    • [feature] Add actuator logic for decreasing power · 36206879
      Swann Perarnau authored
      Change the PowerActuator to be able to lower the power limit. Because
      RAPL doesn't provide an actual lower limit, we use 0 as the minimal
      power.
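A small sketch of the clamping described here, assuming a hypothetical `PowerLimit` class with an `adjust` method (neither name comes from the NRM code). It only illustrates using 0 as the floor when lowering the limit.

```python
class PowerLimit:
    """Hypothetical stand-in for a RAPL-backed power limit."""

    MIN_POWER = 0  # RAPL provides no actual lower limit, so 0 is the floor

    def __init__(self, current, maximum):
        self.current = current
        self.maximum = maximum

    def adjust(self, delta):
        """Apply delta (in watts), clamped to [MIN_POWER, maximum]."""
        self.current = max(self.MIN_POWER,
                           min(self.maximum, self.current + delta))
        return self.current


limit = PowerLimit(current=15, maximum=100)
steps = [limit.adjust(-10) for _ in range(3)]
```

Repeated decreases stop at the floor instead of going negative.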
    • [feature] Add PowerActuator and update control · 26e9c239
      Swann Perarnau authored
      This patch adds a PowerActuator based on RAPL settings available through
      the sensor manager. Adding this actuator forces us to use a list of
      actuators in the controller, which changes the structure of the code a bit.
    • [feature] Add actuator to the controller logic · cbbf2354
      Swann Perarnau authored
      This patch introduces one more level of abstraction to the controller:
      an actuator. Actuators will act as the middleman between specific
      managers and the controller, while providing enough info to implement
      actual models on top.
      
      For now, we only have the application threads actuator.
  9. 19 Dec, 2017 7 commits
    • Improve formatting and commentary · 41a91901
      Kamil Iskra authored
    • [refactor] Move control scheme to its own module · 246edb75
      Swann Perarnau authored
      The "control" part of the NRM is bound to change and become more complex
      in the near future, so move it into its own module.
      
      This refactor also introduces some controller logic. Control is split
      into 3 steps: planning, execution and updates. The goal is to use this
      new code organization as a way to abstract different control policies
      that could be implemented later.
      
      Note that we might at some point move into a "control manager" and a
      bunch of "policies" and "actuators", as a way of matching typical
      control theory vocabulary.
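The three-step split can be sketched as follows. This is an illustration under stated assumptions, not the NRM controller: the `Action` record, the `ThreadsActuator` class, and the method names are all hypothetical, chosen only to show planning, execution and update as separate steps.

```python
from collections import namedtuple

# Hypothetical action record: which target to change, and by how much.
Action = namedtuple("Action", ["target", "delta"])


class ThreadsActuator:
    """Hypothetical actuator that tracks a thread count per application."""

    def __init__(self, threads):
        self.threads = dict(threads)
        self.sent = []  # stands in for commands sent to the applications

    def available_actions(self, direction):
        delta = 1 if direction == "increase" else -1
        return [Action(app, delta)
                for app, count in sorted(self.threads.items())
                if count + delta >= 1]

    def execute(self, action):
        self.sent.append(action)

    def update(self, action):
        self.threads[action.target] += action.delta


class Controller:
    """Splits one control iteration into planning, execution and update."""

    def __init__(self, actuators):
        self.actuators = actuators

    def plan(self, direction):
        for actuator in self.actuators:
            actions = actuator.available_actions(direction)
            if actions:
                return actions[0], actuator
        return None, None

    def step(self, direction):
        action, actuator = self.plan(direction)   # 1. planning
        if action is None:
            return None
        actuator.execute(action)                  # 2. execution
        actuator.update(action)                   # 3. update
        return action


threads = ThreadsActuator({"app-1": 2})
controller = Controller([threads])
first = controller.step("decrease")
second = controller.step("decrease")  # nothing left to lower: no action
```

Keeping the steps separate means a different policy only needs to replace `plan`, while the actuators keep their own bookkeeping.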
    • Configure perf-wrapper using the manifest · b666f1c2
      Kamil Iskra authored
    • [fix] Wrong streaming_callback on stderr · dec31967
      Swann Perarnau authored
      Fixes a copy/paste mistake on the name of the callback to trigger on
      stderr events.
    • [fix] Use proper env variable for container uuid · 90157c2a
      Swann Perarnau authored
      This patch fixes the daemon code to include the container uuid in the
      environment of the command, while changing that environment variable to
      use a better suited name.
    • [feature] Replace client with dummy application · 66e4c85d
      Swann Perarnau authored
      This patch replaces the client code (bin/client and nrm/client) with a new
      application code that integrates progress reports and uses the new
      downstream API.
      
      While git is reporting that both codes are different, the app code is
      basically a refactoring and adaptation of the client code.
      
      This is directly related to issue #2.
    • [feature] Implement Application Manager · f43a38d3
      Swann Perarnau authored
      This patch moves the tracking of application clients of the downstream
      API into an ApplicationManager, which is able to track progress and manage
      threads.
      
      This change is necessary in the long term to build a comprehensive
      downstream API and centralize the management of application tracking.
      
      Note that this tracking is currently independent of the container and
      pid tracking, and that might be a problem in the long term.
  10. 18 Dec, 2017 3 commits
    • [feature] Implement skeleton downstream API · 19c9eb54
      Swann Perarnau authored
      This patch refactors the downstream API to use a pub/sub socket pair, like
      the upstream API. This is part of the effort to improve the downstream
      API. See #2.
      
      This patch doesn't touch the client module, which will be adapted in
      future commits.
    • [refactor] daemon should always bind on sockets · 1391a197
      Swann Perarnau authored
      Because of the way 0MQ PUB/SUB sockets work, publishers might drop
      messages if subscribers are not detected fast enough. One way to fix
      it is to have the "server" always bind sockets, and the "client" use
      connect. This way, the handshake is initiated properly, and the client
      can publish as soon as the connection is done.
      
      This patch makes the daemon bind on the upstream API and the CLI connect,
      fixing in the process the message dropping we were experiencing before.
      
      Long term, we might think about using 2 types of sockets for the
      upstream API: pub/sub for actual events published from the daemon, and
      a REQ/REP or ROUTER/DEALER pair for "commands".
    • Partial Revert of powercap API update · ec563afb
      Swann Perarnau authored
      Previous commit 0c93ce6a broke the
      sample code used by the daemon, by reverting the sample function to a
      json message generator. This is due to inconsistencies between the coolr
      code and the NRM import: we removed json generation from coolr, to push
      it on the messaging side, while upstream still does it on sensor
      reading.
      
      This commit fixes that, but doesn't touch the new test code embedded in
      clr_rapl.py. We will move that to the test infrastructure later.
  11. 17 Dec, 2017 1 commit
  12. 15 Dec, 2017 2 commits
    • [feature] Properly handle run events in order · 957deb8d
      Swann Perarnau authored
      This patch implements a small finite state machine on the cmd side to be
      able to run a command, wait for all of its output, and then exit.
      
      As the daemon can send those messages in any order, we need to wait for
      them properly, in particular for the closing of stdout/stderr before exiting.
      
      This patch also fixes the read_until_close callback creation to ensure
      that the stream EOF is handled as a distinct message.
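One way to picture such a state machine is to let the state be the set of events still awaited. The sketch below is illustrative only: the class name `RunTracker` and the event names (`stdout_eof`, `stderr_eof`, `exit`) are assumptions, not the message fields used by the daemon.

```python
class RunTracker:
    """Tiny state machine: the state is the set of events still awaited."""

    # Hypothetical event names; the real message fields are not shown here.
    EXPECTED = frozenset({"stdout_eof", "stderr_eof", "exit"})

    def __init__(self):
        self.pending = set(self.EXPECTED)
        self.output = []

    def handle(self, event, payload=None):
        """Consume one event and report whether the CLI may exit."""
        if event in ("stdout", "stderr"):
            self.output.append((event, payload))
        else:
            self.pending.discard(event)
        return not self.pending


tracker = RunTracker()
# The daemon may deliver these in any order, e.g. 'exit' before the output.
events = [("exit", 0), ("stdout", "hello"),
          ("stderr_eof", None), ("stdout_eof", None)]
results = [tracker.handle(name, payload) for name, payload in events]
```

Treating EOF as its own event is what lets the tracker tell "no more output" apart from "no output yet".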
    • [refactor] Only track container inside the CM · f2bc8b80
      Swann Perarnau authored
      The daemon code was maintaining its own container tracker using pids,
      instead of using the one in the container manager. This patch removes
      this additional tracking, and lets the daemon side deal with an actual
      namedtuple.
  13. 14 Dec, 2017 7 commits
    • [feature] Add stdout/stderr streaming · 78f63cd4
      Swann Perarnau authored
      This patch adds stdout/stderr streaming capabilities, based on partial
      evaluation of a tornado.iostream callback. The bin/cmd CLI is updated to
      wait until an exit message, although that doesn't guarantee anything on
      message ordering...
      
      The next step is obviously to figure out a message flow that allows the
      CLI to send and receive the command IO properly, in order...
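Partial evaluation of a callback can be sketched with `functools.partial`: everything except the data chunk is bound in advance, so the stream layer only has to call the callback with the bytes it read. The function name `stream_callback`, the message fields, and the list standing in for the socket are assumptions; the tornado stream itself is modeled here by direct calls.

```python
import functools


def stream_callback(sink, container_uuid, stream_name, data):
    """Tag a chunk of output with its origin before forwarding it."""
    sink.append({"uuid": container_uuid,
                 "stream": stream_name,
                 "payload": data})


published = []  # stands in for the upstream socket
# Bind everything except `data`; the stream layer then calls cb(data).
on_stdout = functools.partial(stream_callback, published, "c-1", "stdout")
on_stderr = functools.partial(stream_callback, published, "c-1", "stderr")

on_stdout(b"hello\n")
on_stderr(b"warning\n")
```

Binding the stream name per callback is also where a copy/paste slip (stdout's callback registered for stderr) would show up as mislabeled output.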
    • [refactor] Fix container namedtuple · 9afe59c7
      Swann Perarnau authored
      This patch propagates the process object into the container namedtuple,
      fixes a couple of bad function calls and adapts the run command handler to
      use that process object instead of just its pid.
    • [feature] Use argo_nodeos_config --exec · edeb413b
      Swann Perarnau authored
      Use the new argo_nodeos_config --exec feature in development.
      This allows us to delegate fork+attach+exec to argo_nodeos_config,
      simplifying the create command as a result.
      
      We use tornado.process to wrap this command, as we want to be able to
      stream stdout/stderr in the future.
      
      This patch also misuses the 'pid' field of the container namedtuple to
      save the tornado.process.Subprocess object itself, so some functions
      need to be adapted.
    • [fix] Fix missing logging changes · ed33ef6d
      Swann Perarnau authored
      The logging improvement patch missed a few calls.
    • [style] pep8 and other small style fixes · 3054b894
      Swann Perarnau authored
      Remove unused import, commas at the end of dictionaries.
    • [refactor] Use globally configured logger · b66c88ec
      Swann Perarnau authored
      The logging module allows us to configure logging facilities once per
      process using basicConfig, and then to use globally defined, named
      logger objects. This simplifies access to logger objects and their
      configuration, and removes pointers from all objects.
      
      This patch refactors all the logging calls to use a single 'nrm' logger
      object, using those facilities.
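The pattern described here uses only the standard `logging` module. In the sketch below the `SensorManager` class and the format string are hypothetical; `basicConfig` and `getLogger('nrm')` are the facilities the message refers to.

```python
import logging

# Configure logging once per process...
logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(name)s %(levelname)s %(message)s")

# ...then fetch the shared logger by name wherever it is needed.
logger = logging.getLogger("nrm")


class SensorManager:
    """Hypothetical component: it no longer stores its own logger."""

    def start(self):
        logger.info("starting sensors")
        return True


started = SensorManager().start()
```

Because `getLogger` returns the same object for the same name, no logger has to be passed through constructors.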
    • [refactor] Allow updates in resource tracking · d5f88a14
      Swann Perarnau authored
      Implement an update allocation function to be able to update resource
      tracking when containers are created and deleted.
      
      The commit should make it easier to improve the resource manager later
      on.
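An update-style allocation function might look like the sketch below. The `ResourceTracker` class, its attributes, and the CPU-only model are assumptions made for illustration; the actual resource manager is not shown in this log.

```python
class ResourceTracker:
    """Hypothetical tracker of which CPUs each container holds."""

    def __init__(self, cpus):
        self.available = set(cpus)
        self.allocations = {}

    def update(self, uuid, cpus=None):
        """Replace the allocation of `uuid`; cpus=None releases it."""
        wanted = set(cpus or ())
        # CPUs already held by this container count as free for its update.
        free = self.available | self.allocations.get(uuid, set())
        if not wanted <= free:
            raise ValueError("CPUs not available: %s" % sorted(wanted - free))
        self.available = free - wanted
        if wanted:
            self.allocations[uuid] = wanted
        else:
            self.allocations.pop(uuid, None)


tracker = ResourceTracker(range(4))
tracker.update("c-1", [0, 1])      # container created
after_create = set(tracker.available)
tracker.update("c-1")              # container deleted
after_delete = set(tracker.available)
```

A single `update` entry point covers creation, resizing and deletion, which keeps the bookkeeping in one place.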
  14. 13 Dec, 2017 3 commits
    • [feature] Add kill command · 63c2dea8
      Swann Perarnau authored
      This patch adds a command to kill the parent process of a container
      based on the container uuid, triggering the death of the container.
      
      The os.kill command interacts pretty badly with the custom built
      children handling, causing us to catch unwanted exceptions in an effort
      to keep the code running. The waitpid code was also missing a bit about
      catching children exiting because of signals, so we fixed that.
      
      At this point, two things deserve attention:
        - we don't distinguish properly between a container and a command.
        This will probably cause issues later, as it should be possible to
        launch multiple programs in the same container, and for partitions to
        survive the death of the parent process.
        - the message format is growing more complex, but without any
        component having strong ownership over it. This will probably cause
        stability issues in the long term, as the format grows more complex and we
        lose track of the fields expected from everyone.
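The waitpid point above can be illustrated with the standard `os` status helpers. This is a POSIX-only sketch, not the daemon's child handling: the helper `describe_status` is hypothetical, and a forked child that just sleeps stands in for the parent process of a container.

```python
import os
import signal
import time


def describe_status(status):
    """Decode a waitpid() status, covering death by signal as well as exit."""
    if os.WIFEXITED(status):
        return ("exited", os.WEXITSTATUS(status))
    if os.WIFSIGNALED(status):
        return ("signaled", os.WTERMSIG(status))
    return ("other", status)


pid = os.fork()
if pid == 0:
    # Child: stands in for the parent process of a container.
    time.sleep(60)
    os._exit(0)

os.kill(pid, signal.SIGKILL)       # what a kill command would trigger
_, status = os.waitpid(pid, 0)     # reap the child
result = describe_status(status)
```

Checking only `WIFEXITED` would miss this child entirely, since it never exits on its own.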
    • [feature] Add command to list containers · 2f470afb
      Swann Perarnau authored
      This patch adds a very simple command to list the containers currently
      known by the NRM. There's no history or state tracking on the NRM, so
      the code is pretty simple.
      
      We expect that some of the container tracking doesn't need to be sent
      for such a command, so the listing also filters some of the fields.
      
      This patch also adds an 'event' field to container messages, as it would
      probably be needed later for other kinds of operations.
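A sketch of a filtered listing, assuming a hypothetical `Container` namedtuple and message layout. Only the presence of an 'event' field comes from the commit message; the other field names and values are made up for illustration.

```python
from collections import namedtuple

# Hypothetical container record; the real namedtuple fields may differ.
Container = namedtuple("Container", ["uuid", "manifest", "pid", "resources"])


def list_containers(containers, fields=("uuid", "pid")):
    """Build a 'list' message that exposes only selected fields."""
    payload = [{name: getattr(c, name) for name in fields}
               for c in containers.values()]
    return {"type": "container", "event": "list", "payload": payload}


known = {"c-1": Container("c-1", "/tmp/manifest.json", 4242, {"cpus": [0, 1]})}
message = list_containers(known)
```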
    • [feature] Implement simple RM for containers · 1c4645cc
      Swann Perarnau authored
      This patch refactors the resource management and hwloc code into a
      working, albeit very simple scheduling policy. Indeed, the previous code
      contained strong assumptions about the output of hwloc matching an Argo
      NodeOS configuration used during the previous phase of the project, that
      always contained enough CPUs and Mems to perform exclusive scheduling.
      
      The current version is simpler, but should work on more regular systems.
      The patch also improves code organization so that introducing more
      complex scheduling algorithms will be simpler.
      
      The testing of this code resulted in the discovery of simple bugs in the
      daemon children handling code, which should work now.
  15. 11 Dec, 2017 4 commits
    • [feature] Pull the Argus code into the NRM · 92290b22
      Swann Perarnau authored
      The Argus (globalos) launcher had prototype code to read a container
      manifest, create a container using Judi's code, and map resources using
      hwloc.
      
      This patch brings that code, almost intact, into the NRM repo. This code
      is quite ugly, and the resource mapping crashes if the kernel
      configuration isn't right. But it's still a good starting point, and we
      should be able to improve things little by little.
      
      One part in particular needs attention: SIGCHLD handling. We should
      think of using ioloop-provided facilities to avoid this mess.
      
      The patch also contains the associated CLI changes.
      
      Note: the messaging format is starting to be difficult to keep in check,
      as there are conversions and field checks all over the code. See #3 for
      a possible solution.
    • [feature] Add container run skeleton · 5f6f9415
      Swann Perarnau authored
      This is the first step in a series of patches to integrate the container
      launching code from Argus (globalos) into the NRM infrastructure.
      
      This patch creates a valid command on the CLI, and sends the necessary
      info to the NRM. We still need to take care of the actual container
      creation.
      
      Note that the CLI waits for an event indicating that the container was
      launched, and that at this point the event is never generated by the NRM.
    • [refactor] Improve power message format · 63db906e
      Swann Perarnau authored
      This commit changes the message format for the upstream API, to use a
      json-encoded dictionary. While the format is not set in stone at this
      point, the goal is to slowly move into a proper protocol, with
      well-defined fields to the messages, and proper mechanisms to send
      commands and receive notification of their completion.
      
      The only current user of this API is the power management piece, and
      this change breaks the GRM code maintained outside of this repo. We will
      need to reconcile the two implementations once the message protocol gets
      more stable.
      
      Related to #1 and #6.
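A json-encoded dictionary message can be sketched with the standard `json` module. The field names `type` and `limit` and the helper names are assumptions; the commit message states that the format was not yet fixed.

```python
import json

REQUIRED = ("type", "limit")  # hypothetical field names


def encode_power(limit):
    """Encode a power message as a JSON dictionary."""
    return json.dumps({"type": "power", "limit": limit})


def decode(raw):
    """Decode a message and check for the fields the receiver relies on."""
    message = json.loads(raw)
    missing = [name for name in REQUIRED if name not in message]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return message


wire = encode_power(95)
message = decode(wire)
```

Checking required fields in one decoder is a small step toward the "well-defined fields" the message asks for.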
    • [feature] Implement basic CLI · bc1b7fd2
      Swann Perarnau authored
      Only supports setpower for now, and while it should work in theory, the
      current code doesn't have a way to check if the command was received, as
      the daemon never advertises the current limit.
      
      We need to change the protocol at this point.
      
      This also fixes a bug in the daemon code, that was expecting a single
      string as a message, instead of a list of parts, as zmqstream always
      receives.
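The bug described above comes down to treating a list of message parts as a single string. The sketch below shows a handler written for a list of frames, as a zmqstream receive callback gets. The function name `parse_command` and the `"<command> <value>"` wire format are hypothetical, and no 0MQ socket is involved in the example.

```python
def parse_command(parts):
    """Parse a message delivered as a list of frames (bytes objects)."""
    if isinstance(parts, (bytes, str)):
        raise TypeError("expected a list of message parts, not a single string")
    # Hypothetical wire format: one frame holding "<command> <value>".
    command, value = parts[0].decode("utf-8").split()
    return command, int(value)


result = parse_command([b"setpower 100"])
```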