- 08 Feb, 2019 2 commits
-
-
Swann Perarnau authored
For some reason this seems to trigger a leak of an open file descriptor, even though it should be garbage collected. As for why it's there... well, there was a time when I had hoped to implement support for stdin.
-
Swann Perarnau authored
pyzmq recommends the use of a single context for all objects in the same application: https://pyzmq.readthedocs.io/en/latest/api/zmq.html#zmq.Context.instance More importantly, each context creates at least one thread, resulting in a lot of threads with a lot of open file descriptors. This patch reduces that number by quite a bit.
-
- 07 Feb, 2019 1 commit
-
-
Swann Perarnau authored
-
- 06 Feb, 2019 6 commits
-
-
Swann Perarnau authored
Add a container runtime that doesn't create containers, but can still launch commands. That should make it possible, once we figure out how to change it in the configuration, to actually run the NRM without a setuid binary.
-
Swann Perarnau authored
Extract the container runtime interface from the container manager, and use a hierarchy of classes to enforce a runtime interface that makes sense. This will allow us to create alternative runtime implementations without major changes to the container manager code.
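A minimal sketch of what such a class hierarchy could look like, assuming an abstract base class that fixes the runtime interface. The names `ContainerRuntime`, `DirectRuntime`, and the `create`/`execute`/`delete` methods are hypothetical and are not taken from the NRM code.

```python
from abc import ABC, abstractmethod
import subprocess
import sys


class ContainerRuntime(ABC):
    """Hypothetical interface that every runtime implementation must provide."""

    @abstractmethod
    def create(self, name):
        """Prepare a container called `name`."""

    @abstractmethod
    def execute(self, name, argv):
        """Run `argv` in container `name` and return its exit code."""

    @abstractmethod
    def delete(self, name):
        """Tear down container `name`."""


class DirectRuntime(ContainerRuntime):
    """Illustrative alternative runtime: tracks names only, runs commands directly."""

    def __init__(self):
        self.containers = set()

    def create(self, name):
        self.containers.add(name)

    def execute(self, name, argv):
        if name not in self.containers:
            raise KeyError(name)
        # No isolation here: the command is launched as a plain child process.
        return subprocess.run(argv).returncode

    def delete(self, name):
        self.containers.discard(name)
```

With this shape, a container manager only depends on `ContainerRuntime`, and a subclass that forgets one of the methods fails at instantiation rather than at call time.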
-
Swann Perarnau authored
Missing checks in aci resulted in features absent from the manifest being counted as enabled.
-
Swann Perarnau authored
Just to make it easy to debug binding issues.
-
Swann Perarnau authored
!27 replaced the resource field of the `container` named tuple with a dictionary. Restore the old type, a `resources` named tuple, and make sure to propagate this across all the code.
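As an illustration of the nested named tuple layout described here, a small sketch follows. The field names (`cpus`, `mems`, `uuid`, `manifest`) are assumptions chosen for the example, not the actual NRM definitions.

```python
from collections import namedtuple

# Field names below are illustrative assumptions.
Resources = namedtuple("Resources", ["cpus", "mems"])
Container = namedtuple("Container", ["uuid", "manifest", "resources"])

container = Container(
    uuid="c1",
    manifest=None,
    resources=Resources(cpus=[0, 1], mems=[0]),
)

# Attribute access replaces dictionary lookups such as resources["cpus"].
first_cpu = container.resources.cpus[0]
```

Compared with a dictionary, a misspelled field raises `AttributeError` immediately, which is one reason to prefer the named tuple.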
-
Swann Perarnau authored
Refactor the container creation code to isolate the building/retrieving of a container namedtuple from the core container creation code. This considerably simplifies the different branches of this code, and makes the core create method almost entirely dedicated to launching a command.
-
- 05 Feb, 2019 4 commits
-
-
Swann Perarnau authored
Some of these command prefixes were in a bit of a weird order. In particular, it makes sense to have perfwrapper at the end.
-
Swann Perarnau authored
-
Swann Perarnau authored
!27 caused the perfwrapper to be activated only for the command that creates a container, and never for the other commands launched in an already running container. This patch restores this feature.
-
Swann Perarnau authored
The container creation code was starting to repeat the same logic for testing whether a container manifest enables a specific feature. Instead, we add a method to the image manifest to check for that. Also adds tests for that method.
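A sketch of such a helper, under the assumption that the manifest exposes features as a mapping from feature name to a settings dictionary with an `enabled` flag. The class name, method name, and data layout are hypothetical.

```python
class Manifest:
    """Hypothetical stand-in for the image manifest."""

    def __init__(self, features):
        # Assumed layout: {"perfwrapper": {"enabled": True}, ...}
        self.features = features

    def is_feature_enabled(self, name):
        """Return True only if the feature is present and explicitly enabled."""
        settings = self.features.get(name)
        if settings is None:
            # A feature absent from the manifest is never counted as enabled.
            return False
        return settings.get("enabled", False) is True


manifest = Manifest({"perfwrapper": {"enabled": True}, "power": {"enabled": False}})
```

Callers can then write `if manifest.is_feature_enabled("perfwrapper"):` instead of repeating the lookup in every branch.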
-
- 04 Feb, 2019 1 commit
-
-
Valentin Reis authored
-
- 28 Jan, 2019 1 commit
-
-
Valentin Reis authored
The last merge changed the API visual style to increase readability. However, the manual merge was not done properly, and some of the changes from previous merges were reverted. This commit fixes this.
-
- 23 Jan, 2019 1 commit
-
-
Valentin Reis authored
This commit does two things:
  - re-indents the message schema to be more readable
  - lets `cmd listen --filter` print any incoming message, without discriminating on container_uuid. This makes cmd listen usable until a proper application_uuid management is written into nrm.
-
- 21 Jan, 2019 7 commits
-
-
Swann Perarnau authored
Some inconsistencies in the CI let a merge request go through without stylechecking.
-
Sridutt Bhalachandra authored
Made the fixes necessary for the aggregative downstream API integration to work with the new downstream messaging layer. Also fixed the case where the daemon crashed when an application message (from libnrm using PMPI) was received after the container was killed. Removed run_policy on all containers, as the controller no longer has application manager info. Includes other refactoring and fixes as required (check the merge request discussion). See Issues #13, #20 and Merge !41
-
Sridutt Bhalachandra authored
Added support for pinning a process/task to a core. This is important for power policies that use contextual information from one application phase to compute frequency levels for the next phase. Without process/task pinning, the contextual information has no value: it is not representative of application phase behavior on a core, because processes and tasks can migrate during the next phase. See Issue #20
-
Sridutt Bhalachandra authored
Fixes NRM not returning the first N resources (cpu and memory). This is important for reproducibility and for reducing variation.
-
Sridutt Bhalachandra authored
Fixed the interaction of the multi-node support feature (#17) with the new messaging layer feature. Also added other fixes required to make libnrm work with the aggregative downstream API.
-
Sridutt Bhalachandra authored
Adds support for aggregation of phase context information for an application. The damper value (in nanoseconds, set in the manifest file) decides the minimum phase length for which the phase context information is sent to the NRM (implemented in the 'libnrm' repo [See Issue 2]). This limits the number of messages sent to the NRM. See Issue #13
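One possible reading of the damper mechanism, sketched in Python for illustration only (the real implementation lives in libnrm): short phases are accumulated, and a single aggregated message is emitted once the accumulated duration reaches the damper value. The class name, method name, and summary fields are assumptions.

```python
class PhaseAggregator:
    """Hypothetical aggregator; `damper_ns` is the minimum reported phase length."""

    def __init__(self, damper_ns):
        self.damper_ns = damper_ns
        self.total_ns = 0
        self.phases = 0

    def add_phase(self, duration_ns):
        """Accumulate one phase; return a summary once the damper is reached, else None."""
        self.total_ns += duration_ns
        self.phases += 1
        if self.total_ns < self.damper_ns:
            return None  # too short so far: nothing is sent
        summary = {"phases": self.phases, "total_ns": self.total_ns}
        # Reset the accumulator for the next aggregated message.
        self.total_ns = 0
        self.phases = 0
        return summary
```

With a damper of 1000 ns and phases of 400 ns each, only every third phase would produce a message in this sketch.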
-
Sridutt Bhalachandra authored
Refactored diff calculation code to work without needing changes in coolr module (Patch for Commit 36401a84)
-
- 09 Jan, 2019 1 commit
-
-
Valentin Reis authored
The gitlab-ci.yml file now points to argotest/gitlab/basic.yml on master.
-
- 04 Jan, 2019 2 commits
-
-
Valentin Reis authored
-
Valentin Reis authored
This includes renaming "progress" to "performance" in argo_perf_wrapper. There are two distinct keywords in the messaging layer: "performance" for all things related to hardware, and "progress" for all things related to the application.
-
- 21 Dec, 2018 3 commits
-
-
Swann Perarnau authored
Doesn't work with the new downstream API.
-
Swann Perarnau authored
Replace the downstream API handling with the new messaging layer. Note that we don't have a clean way to deal with dynamic concurrency control using this API, so we disable the handling of it for now.
-
Swann Perarnau authored
Add downstream RPC client/server classes that are the same as the upstream ones. This is part of a series of changes to downstream to allow for more reliable communications between the daemon and applications. At this time, the daemon never replies, so the RPC_REQ is basically used as a way to publish events to the daemon.
-
- 18 Dec, 2018 1 commit
-
-
Swann Perarnau authored
Add a config option to specify the location of the PMPI LD_PRELOAD library available in libnrm. This should make it easier to use this library.
-
- 12 Dec, 2018 2 commits
-
-
Valentin Reis authored
Fixes a bug introduced by the 'progress-report' branch in a recent commit. The process object is the result of a tornado spawn, so the call has to be slightly different from what was there.
-
Valentin Reis authored
This commit adds a command-line interface to `daemon`:

```
usage: daemon [-h] [-c FILE] [-d] [--nrm_log NRM_LOG] [--hwloc HWLOC]
              [--argo_nodeos_config ARGO_NODEOS_CONFIG] [--perf PERF]
              [--argo_perf_wrapper ARGO_PERF_WRAPPER]

optional arguments:
  -h, --help            show this help message and exit
  -c FILE, --configuration FILE
                        Specify a config json-formatted config file to
                        override any of the available CLI options. If an
                        option is actually provided on the command-line, it
                        overrides its corresponding value from the
                        configuration file.
  -d, --print_defaults  Print the default configuration file.
  --nrm_log NRM_LOG     Main log file. Override default with the NRM_LOG.
                        environment variable
  --hwloc HWLOC         Path to the hwloc to use. This path can be relative
                        and makes uses of the $PATH if necessary. Override
                        default with the HWLOC environment variable.
  --argo_nodeos_config ARGO_NODEOS_CONFIG
                        Path to the argo_nodeos_config to use. This path can
                        be relative and makes uses of the $PATH if necessary.
                        Override default with the ARGO_NODEOS_CONFIG
                        environment variable.
  --perf PERF           Path to the linux perf tool to use. This path can be
                        relative and makes uses of the $PATH if necessary.
                        Override default with the PERF environment variable.
  --argo_perf_wrapper ARGO_PERF_WRAPPER
                        Path to the linux perf tool to use. This path can be
                        relative and makes uses of the $PATH if necessary.
                        Override default with the PERFWRAPPER environment
                        variable.
```
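The precedence described in the help text (environment variable over built-in default, configuration file over that, explicit command-line option over everything) could be implemented along these lines. This is a sketch covering only two of the options; the default values and the `resolve` helper are assumptions, not the daemon's actual code.

```python
import argparse
import json
import os

# Assumed built-in defaults, for illustration only.
DEFAULTS = {"nrm_log": "/tmp/nrm.log", "hwloc": "hwloc"}


def resolve(argv, environ=None):
    """Return the effective configuration for the given command line."""
    environ = os.environ if environ is None else environ

    parser = argparse.ArgumentParser(prog="daemon")
    parser.add_argument("-c", "--configuration", metavar="FILE")
    # No argparse defaults: None means "not given on the command line".
    parser.add_argument("--nrm_log")
    parser.add_argument("--hwloc")
    args = parser.parse_args(argv)

    config = dict(DEFAULTS)
    # 1. Environment variables (NRM_LOG, HWLOC) override built-in defaults.
    for key in DEFAULTS:
        if key.upper() in environ:
            config[key] = environ[key.upper()]
    # 2. Values from the JSON configuration file override those.
    if args.configuration:
        with open(args.configuration) as handle:
            config.update(json.load(handle))
    # 3. Options actually provided on the command line win.
    for key in DEFAULTS:
        value = getattr(args, key)
        if value is not None:
            config[key] = value
    return config
```

Leaving the argparse defaults at `None` is what makes it possible to tell "option not given" apart from "option given with the default value".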
-
- 10 Dec, 2018 2 commits
-
-
Valentin Reis authored
  - added correct SIGINT/process ending handling to cmd
  - fixed kill/list containers
  - added ZMQ_LINGER 0 to the socket options.
-
Valentin Reis authored
Related to #22
-
- 28 Nov, 2018 5 commits
-
-
Swann Perarnau authored
Make the daemon delete containers when all commands it is aware of are finished, instead of relying on a single owner that needs to be tracked. This simplifies the handling of multiple commands in the same container, and should not impact the rest.
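The bookkeeping this implies can be sketched as a map from container to the set of commands still running in it; the container is deleted when that set becomes empty. The class and method names are hypothetical and only illustrate the idea.

```python
class ContainerTracker:
    """Hypothetical tracker: a container lives as long as it has running commands."""

    def __init__(self):
        self.running = {}  # container uuid -> set of pids still running

    def command_started(self, uuid, pid):
        self.running.setdefault(uuid, set()).add(pid)

    def command_exited(self, uuid, pid):
        """Record an exit; return True if the container should now be deleted."""
        pids = self.running.get(uuid, set())
        pids.discard(pid)
        if pids:
            return False
        # Last known command is gone: forget the container.
        self.running.pop(uuid, None)
        return True
```

No command is special in this scheme, so the order in which commands exit does not matter.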
-
Swann Perarnau authored
Move the container start/exit events to the upstream pub/sub event stream. As these are more of a global event now that we support multiple commands in the same container, it makes sense to move them to the more general event stream. This patch also removes the code in cmd that waits for container start or exit, temporarily making cmd unable to report power metrics. We will fix that in a later commit. This patch fixes complicated issues we had where a job running multiple commands in the container might not have all of them wait for the end of the container: now none of them do.
-
Swann Perarnau authored
Add an upstream pub client, to be able to listen to messages coming from the daemon on the upstream pub/sub channel. Doesn't support any fancy filter, as that's not used by the daemon so far.
-
Swann Perarnau authored
Ensure that the client that created the container is considered as the one owning it, with the consequence that if its command exits, the container is destroyed. Also deals with the race issue we had on the cmd side.
-
Swann Perarnau authored
Current code sends start/exit events when a container is created and process_start/process_exit when it's already there. Instead, have the container start/exit only care about container stuff, and always send the process start/exit events around. That makes the cmd run fsm easier to work out. Changes the message format a tiny bit. Fixes some missing stdout/stderr issues we had before.
-
- 23 Oct, 2018 1 commit
-
-
Sridutt Bhalachandra authored
Handles containers with no power profiling enabled in the manifest file. In such cases, the 'exit' response on process termination would raise a TypeError.
-