1. 17 Oct, 2018 1 commit
    • [Feature] Multi-node and multi-process support · 5a41baba
      Sridutt Bhalachandra authored
      Added multi-node and multi-process support, which allows launching
      multiple processes within a container. This is needed to use NRM with
      MPI applications that run multiple processes in a container, and thus
      enables multi-node executions.
      
      See Issue #17
  2. 15 Aug, 2018 1 commit
  3. 14 Aug, 2018 1 commit
  4. 10 Aug, 2018 3 commits
  5. 17 Jul, 2018 1 commit
  6. 21 Dec, 2017 1 commit
  7. 19 Dec, 2017 3 commits
  8. 15 Dec, 2017 1 commit
    • [refactor] Only track container inside the CM · f2bc8b80
      Swann Perarnau authored
      The daemon code was maintaining its own container tracker using pids,
      instead of using the one in the container manager. This patch removes
      this additional tracking and lets the daemon side deal with an actual
      namedtuple.
  9. 14 Dec, 2017 5 commits
    • [refactor] Fix container namedtuple · 9afe59c7
      Swann Perarnau authored
      This patch propagates the process object into the container namedtuple,
      fixes a couple of bad function calls, and adapts the run command handler
      to use that process object instead of just its pid.
    • [feature] Use argo_nodeos_config --exec · edeb413b
      Swann Perarnau authored
      Use the new argo_nodeos_config --exec feature in development.
      This allows us to delegate fork+attach+exec to argo_nodeos_config,
      which simplifies the create command.
      
      We use tornado.process to wrap this command, as we want to be able to
      stream stdout/stderr in the future.
      
      This patch also misuses the 'pid' field of the container namedtuple to
      save the tornado.process.Subprocess object itself, so some functions
      need to be adapted.
    • [fix] Fix missing logging changes · ed33ef6d
      Swann Perarnau authored
      The logging improvement patch missed a few calls.
    • [refactor] Use globally configured logger · b66c88ec
      Swann Perarnau authored
      The logging module allows us to configure logging facilities once per
      process using basicConfig, and then to use globally defined, named
      logger objects. This simplifies access to logger objects and their
      configuration, and removes logger pointers from all objects.
      
      This patch refactors all the logging calls to use a single 'nrm' logger
      object, using those facilities.
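The pattern described in this commit can be sketched as follows. This is a minimal illustration, not the NRM code: only the 'nrm' logger name and the use of basicConfig come from the commit message, while the `ContainerManager` class, its `create` method, and the log format are hypothetical.

```python
import logging

# Named logger, looked up once at module level. Any module can call
# logging.getLogger('nrm') and receive this same object.
logger = logging.getLogger('nrm')


class ContainerManager:
    """Hypothetical component: note that it stores no logger attribute."""

    def create(self, uuid):
        logger.info("creating container %s", uuid)
        return uuid


def main():
    # Configure logging once per process; every named logger inherits it.
    logging.basicConfig(level=logging.DEBUG,
                        format="%(name)s:%(levelname)s:%(message)s")
    ContainerManager().create("demo")


if __name__ == "__main__":
    main()
```

Because the logger is resolved by name, objects no longer need a logger passed to their constructors.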
    • [refactor] Allow updates in resource tracking · d5f88a14
      Swann Perarnau authored
      Implement an update allocation function to be able to update resource
      tracking when containers are created and deleted.
      
      The commit should make it easier to improve the resource manager later
      on.
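One way to picture an update function of this kind is sketched below. The `ResourceTracker` class, its attribute names, and the convention that an empty allocation means "container deleted" are all assumptions made for illustration; the commit message does not describe the actual interface.

```python
class ResourceTracker:
    """Hypothetical tracker of free CPUs and memory nodes."""

    def __init__(self, cpus, mems):
        self.available_cpus = set(cpus)
        self.available_mems = set(mems)
        self.allocations = {}  # container uuid -> (cpus, mems)

    def update(self, uuid, cpus=(), mems=()):
        """Replace the allocation of `uuid`; an empty one releases it."""
        new_cpus, new_mems = frozenset(cpus), frozenset(mems)
        old_cpus, old_mems = self.allocations.get(
            uuid, (frozenset(), frozenset()))
        # Resources this container may use: free ones plus its own.
        free_cpus = self.available_cpus | old_cpus
        free_mems = self.available_mems | old_mems
        if not (new_cpus <= free_cpus and new_mems <= free_mems):
            raise ValueError("resources requested for %s are not free" % uuid)
        self.available_cpus = free_cpus - new_cpus
        self.available_mems = free_mems - new_mems
        if new_cpus or new_mems:
            self.allocations[uuid] = (new_cpus, new_mems)
        else:
            self.allocations.pop(uuid, None)
```

Under these assumptions, container creation calls `update(uuid, cpus, mems)` and deletion calls `update(uuid)`, so both paths go through a single bookkeeping function.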
  10. 13 Dec, 2017 3 commits
    • [feature] Add kill command · 63c2dea8
      Swann Perarnau authored
      This patch adds a command to kill the parent process of a container
      based on the container uuid, triggering the death of the container.
      
      The os.kill command interacts pretty badly with the custom-built
      child handling, forcing us to catch unwanted exceptions in an effort
      to keep the code running. The waitpid code also failed to catch
      children exiting because of signals, so we fixed that.
      
      At this point, two things should be paid attention to:
        - we don't distinguish properly between a container and a command.
        This will probably cause issues later, as it should be possible to
        launch multiple programs in the same container, and for partitions to
        survive the death of the parent process.
        - the message format is growing more complex, but without any
        component having strong ownership over it. This will probably cause
        stability issues in the long term, as the format keeps growing and we
        lose track of the fields expected from everyone.
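The interaction between os.kill and waitpid mentioned in this commit can be illustrated with the standard library alone. The sketch below is POSIX-only and is not the daemon's code: the helper names `describe_exit`, `kill_and_reap`, and `spawn_sleeper` are hypothetical, and a blocking waitpid is used for brevity where a daemon would normally reap children from its event loop.

```python
import os
import signal
import subprocess
import sys


def describe_exit(status):
    """Decode a raw waitpid() status into a (kind, value) pair."""
    if os.WIFSIGNALED(status):
        # Child was terminated by a signal: the case that is easy to miss.
        return ("signaled", os.WTERMSIG(status))
    if os.WIFEXITED(status):
        return ("exited", os.WEXITSTATUS(status))
    return ("unknown", status)


def kill_and_reap(pid, sig=signal.SIGTERM):
    """Send `sig` to a child process, then wait for it and decode its status."""
    try:
        os.kill(pid, sig)
    except ProcessLookupError:
        pass  # the process no longer exists
    try:
        _, status = os.waitpid(pid, 0)
    except ChildProcessError:
        return ("unknown", None)  # already reaped elsewhere
    return describe_exit(status)


def spawn_sleeper():
    """Start a child that sleeps, standing in for a container's parent process."""
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"])
```

Checking `os.WIFSIGNALED` before `os.WIFEXITED` makes the killed-by-signal case explicit, which is the case the commit reports as previously unhandled.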
    • [feature] Add command to list containers · 2f470afb
      Swann Perarnau authored
      This patch adds a very simple command to list the containers currently
      known to the NRM. There's no history or state tracking on the NRM, so
      the code is pretty simple.
      
      We expect that some of the container tracking doesn't need to be sent
      for such a command, so the listing also filters some of the fields.
      
      This patch also adds an 'event' field to container messages, as it will
      probably be needed later for other kinds of operations.
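A listing that filters fields might look like the sketch below. Only the use of a container namedtuple and the 'event' field are taken from the commit messages; the namedtuple's field names, the `LISTED_FIELDS` selection, and the other message keys are assumptions for illustration.

```python
import json
from collections import namedtuple

# Hypothetical shape of the tracked container record.
Container = namedtuple("Container",
                       ["uuid", "manifest", "resources", "process"])

# Fields worth sending to a client. The process object, for instance,
# is internal state and is not JSON-serializable.
LISTED_FIELDS = ("uuid", "resources")


def build_list_message(containers):
    """Build the reply to a 'list' command from a uuid -> Container mapping."""
    payload = [{field: getattr(c, field) for field in LISTED_FIELDS}
               for c in containers.values()]
    return {"type": "container", "event": "list", "payload": payload}


if __name__ == "__main__":
    known = {"c1": Container("c1", "manifest.json", {"cpus": [0, 1]}, object())}
    print(json.dumps(build_list_message(known)))
```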
    • [feature] Implement simple RM for containers · 1c4645cc
      Swann Perarnau authored
      This patch refactors the resource management and hwloc code into a
      working, albeit very simple, scheduling policy. The previous code
      assumed that the output of hwloc matched an Argo NodeOS configuration
      used during the previous phase of the project, one that always
      contained enough CPUs and Mems to perform exclusive scheduling.
      
      The current version is simpler, but should work on more regular systems.
      The patch also improves code organization so that introducing more
      complex scheduling algorithms will be simpler.
      
      Testing this code uncovered simple bugs in the daemon's child handling
      code, which should work now.
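As an illustration of what a very simple scheduling policy can look like, here is a first-fit sketch. It is an assumption, not the policy implemented in this commit: the `schedule` function, the `Allocation` namedtuple, and the choice of lowest-numbered free resources are all hypothetical.

```python
from collections import namedtuple

Allocation = namedtuple("Allocation", ["cpus", "mems"])


def schedule(free_cpus, free_mems, ncpus, nmems):
    """First-fit: take the lowest-numbered free CPUs and memory nodes.

    Returns None when the request cannot be satisfied, rather than
    assuming the machine always has enough resources.
    """
    cpus = sorted(free_cpus)[:ncpus]
    mems = sorted(free_mems)[:nmems]
    if len(cpus) < ncpus or len(mems) < nmems:
        return None
    return Allocation(cpus, mems)
```

Keeping the policy in one pure function is one way to leave room for swapping in a more complex algorithm later.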
  11. 11 Dec, 2017 1 commit
    • [feature] Pull the Argus code into the NRM · 92290b22
      Swann Perarnau authored
      The Argus (globalos) launcher had prototype code to read a container
      manifest, create a container using Judi's code, and map resources using
      hwloc.
      
      This patch brings that code, almost intact, into the NRM repo. This code
      is quite ugly, and the resource mapping crashes if the kernel
      configuration isn't right. But it's still a good starting point, and we
      should be able to improve things little by little.
      
      One part in particular needs attention: SIGCHLD handling. We should
      think of using ioloop-provided facilities to avoid this mess.
      
      The patch also contains the associated CLI changes.
      
      Note: the messaging format is starting to be difficult to keep in check,
      as there are conversions and field checks all over the code. See #3 for
      a possible solution.
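To make the concern about scattered field checks concrete, the sketch below gathers them in a single place. It is only an illustration and is not claimed to be the solution discussed in #3: the command names, field names, `REQUIRED_FIELDS` table, and `validate` function are all hypothetical.

```python
# Single table describing which fields each command must carry
# (hypothetical names; the real message format is not shown in this log).
REQUIRED_FIELDS = {
    "run": {"uuid", "manifest", "file", "args"},
    "kill": {"uuid"},
    "list": set(),
}


class MessageError(ValueError):
    """Raised when a message does not match the expected format."""


def validate(msg):
    """Check a decoded message once, at the boundary, and return it."""
    command = msg.get("command")
    if command not in REQUIRED_FIELDS:
        raise MessageError("unknown command: %r" % (command,))
    missing = REQUIRED_FIELDS[command] - set(msg)
    if missing:
        raise MessageError("%s message is missing fields: %s"
                           % (command, ", ".join(sorted(missing))))
    return msg
```

With a table like this, both the daemon and the CLI could call `validate` on receipt instead of each checking fields locally.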