- 08 Aug, 2018 3 commits
-
-
Sridutt Bhalachandra authored
Added initialization of the power policy from the manifest file and resolved the corresponding TODOs in the `PowerPolicyManager` (PPM) class using container information (except for adding new power policies). See Issue #10.
-
Sridutt Bhalachandra authored
`ContainerManager` class: the container creation information added to the `ContainerManager` class can be used to initialize power policy parameters now, and other parameters in the future. See Issue #10.
-
Sridutt Bhalachandra authored
Added damper and slowdown parameters to the manifest file; they can be used to initialize power policy parameters. See Issue #10.
-
- 25 Jul, 2018 2 commits
-
-
Sridutt Bhalachandra authored
The Power Policy Manager will allow the NRM to invoke all power policies. See Issue #11.
-
Sridutt Bhalachandra authored
The DDCM-based power policy aims to mitigate workload imbalance in parallel applications that use barrier synchronization (e.g., MPI). It reduces the duty cycle of CPUs that are not on the critical path of execution, thereby reducing energy consumption with little or no adverse impact on performance. See Issue #11.
-
- 19 Jul, 2018 2 commits
-
-
Sridutt Bhalachandra authored
Added a Duty Cycle module to set, reset, and check the duty cycle of a CPU. This module makes use of the MSR module. See Issue #11.
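To illustrate what such a module has to compute, here is a minimal sketch that encodes and decodes a duty-cycle level for Intel's IA32_CLOCK_MODULATION register (MSR 0x19A), where bit 4 enables on-demand clock modulation and bits 3:1 hold the level in 12.5% steps. The function names are hypothetical and not taken from the NRM code, and the extended encoding available on newer CPUs (bit 0) is ignored.

```python
IA32_CLOCK_MODULATION = 0x19A        # MSR address (Intel SDM)
ENABLE_BIT = 1 << 4                  # on-demand clock modulation enable
LEVEL_SHIFT = 1                      # duty-cycle level lives in bits 3:1
LEVEL_MASK = 0b111 << LEVEL_SHIFT


def encode_duty_cycle(level):
    """Return the register value for a duty-cycle level in 1..7 (12.5% steps)."""
    if not 1 <= level <= 7:
        raise ValueError("level must be between 1 and 7")
    return ENABLE_BIT | (level << LEVEL_SHIFT)


def decode_duty_cycle(value):
    """Return the effective duty cycle as a fraction of full speed."""
    if not value & ENABLE_BIT:
        return 1.0  # modulation disabled: the CPU runs at full duty cycle
    level = (value & LEVEL_MASK) >> LEVEL_SHIFT
    if level == 0:
        raise ValueError("level 0 is a reserved encoding")
    return level / 8.0


# "Reset" in this sketch simply means writing a value with the enable bit cleared.
RESET_VALUE = 0
```

For example, `encode_duty_cycle(4)` yields a value that `decode_duty_cycle` maps back to half speed.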
-
Sridutt Bhalachandra authored
Added a Model Specific Register (MSR) module to allow access to MSRs. This module provides the interfaces to read and write MSRs through the msr_safe kernel module. See Issue #11.
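As a rough sketch of how such an interface can work: msr_safe exposes one device file per CPU, and a register is accessed by reading or writing 8 bytes at a file offset equal to the MSR address. The helper names below are hypothetical and are not the module's actual API; the device path is passed in explicitly so the functions can be exercised against an ordinary file.

```python
import os
import struct


def msr_safe_path(cpu):
    """Return the msr_safe device file for a given CPU number."""
    return "/dev/cpu/%d/msr_safe" % cpu


def read_msr(path, address):
    """Read one 64-bit register value at the given MSR address."""
    fd = os.open(path, os.O_RDONLY)
    try:
        raw = os.pread(fd, 8, address)
    finally:
        os.close(fd)
    if len(raw) != 8:
        raise IOError("short read at MSR 0x%x" % address)
    return struct.unpack("<Q", raw)[0]  # x86 registers are little-endian


def write_msr(path, address, value):
    """Write one 64-bit register value at the given MSR address."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.pwrite(fd, struct.pack("<Q", value), address)
    finally:
        os.close(fd)
```

On a real node the call would look like `read_msr(msr_safe_path(0), 0x19A)`, and it only succeeds if the msr_safe allowlist permits that register.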
-
- 17 Jul, 2018 2 commits
-
-
Sridutt Bhalachandra authored
When the power policy manifest option is enabled and the policy parameter is set to any valid value (except "NONE"), the application library providing contextual information is loaded using LD_PRELOAD. See Issue #10.
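The rule described here can be sketched as a small function that builds the environment for the launched command. The function name, its parameters, and the library path in the usage note are assumptions made for illustration, not the names used in the NRM code.

```python
import os


def build_env(enabled, policy, lib_path, base_env=None):
    """Return a command environment, adding LD_PRELOAD when a policy is active."""
    env = dict(os.environ if base_env is None else base_env)
    if enabled and policy and policy.upper() != "NONE":
        existing = env.get("LD_PRELOAD")
        # Put our library first and keep anything the user already preloads.
        env["LD_PRELOAD"] = lib_path if not existing else lib_path + ":" + existing
    return env
```

For example, `build_env(True, "DDCM", "/path/to/libcontext.so", {})` returns an environment whose `LD_PRELOAD` points at the (hypothetical) library, while a policy of "NONE" leaves the environment untouched.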
-
Sridutt Bhalachandra authored
-
- 16 Jul, 2018 1 commit
-
-
Sridutt Bhalachandra authored
-
- 03 Jul, 2018 1 commit
-
-
Swann Perarnau authored
Trivial style corrections.
-
- 21 Dec, 2017 1 commit
-
-
Swann Perarnau authored
Small fixes to correct for wrong actions on power control over a real load.
-
- 20 Dec, 2017 3 commits
-
-
Swann Perarnau authored
Change the PowerActuator to be able to lower the power limit. Because RAPL doesn't provide an actual lower limit, we use 0 as the minimal power.
-
Swann Perarnau authored
This patch adds a power actuator based on the RAPL settings available through the sensor manager. Adding this actuator forces us to use a list of actuators in the controller, which changes the structure of the code a bit.
-
Swann Perarnau authored
This patch introduces one more level of abstraction to the controller: an actuator. Actuators will act as the middleman between specific managers and the controller, while providing enough info to implement actual models on top. For now, we only have the application threads actuator.
-
- 19 Dec, 2017 7 commits
-
-
Kamil Iskra authored
-
Swann Perarnau authored
The "control" part of the NRM is bound to change and become more complex in the near future, so move it into its own module. This refactor also introduces some controller logic. Control is split into 3 steps: planning, execution, and updates. The goal is to use this new code organization as a way to abstract different control policies that could be implemented later. Note that we might at some point move to a "control manager" and a bunch of "policies" and "actuators", as a way of matching typical control theory vocabulary.
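A minimal sketch of the three-step split, assuming a single hypothetical actuator that adjusts a thread count. The class and method names are illustrative only and do not reproduce the NRM controller.

```python
class ThreadActuator(object):
    """Hypothetical actuator holding a thread count the controller can change."""

    def __init__(self, threads, max_threads):
        self.threads = threads
        self.max_threads = max_threads

    def available_actions(self, target):
        # Offer one step up or down, depending on the requested direction.
        if target == "faster" and self.threads < self.max_threads:
            return [+1]
        if target == "slower" and self.threads > 1:
            return [-1]
        return []

    def execute(self, action):
        self.threads += action


class Controller(object):
    """Control loop split into planning, execution, and update."""

    def __init__(self, actuators):
        self.actuators = actuators
        self.history = []

    def plan(self, target):
        # Planning: pick the first actuator that can act on the target.
        for actuator in self.actuators:
            actions = actuator.available_actions(target)
            if actions:
                return actions[0], actuator
        return None, None

    def execute(self, action, actuator):
        # Execution: apply the chosen action, if any.
        if action is not None:
            actuator.execute(action)

    def update(self, action, actuator):
        # Update: record what was done so later plans can take it into account.
        if action is not None:
            self.history.append(action)
```

Keeping the steps as separate methods means a different policy only has to replace `plan`, while execution and bookkeeping stay the same.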
-
Kamil Iskra authored
-
Swann Perarnau authored
Fixes a copy/paste mistake on the name of the callback to trigger on stderr events.
-
Swann Perarnau authored
This patch fixes the daemon code to include the container uuid in the environment of the command, while changing that environment variable to use a better suited name.
-
Swann Perarnau authored
This patch replaces the client code (bin/client and nrm/client) with new application code that integrates progress reports and uses the new downstream API. While git reports that the two codes are different, the app code is basically a refactoring and adaptation of the client code. This is directly related to issue #2.
-
Swann Perarnau authored
This patch moves the tracking of application clients of the downstream API into an ApplicationManager, which is able to track progress and handle thread management. This change is necessary in the long term to build a comprehensive downstream API and centralize the management of application tracking. Note that this tracking is currently independent of the container and pid tracking, and that might be a problem in the long term.
-
- 18 Dec, 2017 3 commits
-
-
Swann Perarnau authored
This patch refactors the downstream API to use a pub/sub socket pair, like the upstream API. This is part of the effort to improve the downstream API. See #2. This patch doesn't touch the client module, which will be adapted in future commits.
-
Swann Perarnau authored
Because of the way 0MQ PUB/SUB sockets work, publishers might drop messages if subscribers are not detected fast enough. One way to fix it is to have the "server" always bind sockets, and the "client" use connect. This way, the handshake is initiated properly, and the client can publish as soon as the connection is done. This patch makes the daemon bind on the upstream API and the CLI connect, fixing in the process the message dropping we were experiencing before. Long term, we might think about using 2 types of sockets for the upstream API: pub/sub for actual events published from the daemon, and a REQ/REP or ROUTER/DEALER pair for "commands".
-
Swann Perarnau authored
Previous commit 0c93ce6a broke the sample code used by the daemon, by reverting the sample function to a json message generator. This is due to inconsistencies between the coolr code and the NRM import: we removed json generation from coolr, to push it on the messaging side, while upstream still does it on sensor reading. This commit fixes that, but doesn't touch the new test code embedded in clr_rapl.py. We will move that to the test infrastructure later.
-
- 17 Dec, 2017 1 commit
-
-
Kazutomo Yoshii authored
-
- 15 Dec, 2017 2 commits
-
-
Swann Perarnau authored
This patch implements a small finite state machine on the cmd side to be able to run a command, wait for all of its output, and then exit. As the daemon can send those messages in any order, we need to wait for them properly, in particular for the closing of stdout/stderr before exiting. This patch also fixes the read_until_close callback creation to ensure that the stream EOF is handled as a distinct message.
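The idea can be sketched as a tracker that only reports completion once the exit event and both stream-close events have been seen, whatever their order. The event names and the class are hypothetical; the actual message format used by bin/cmd is not shown in this log.

```python
class CommandTracker(object):
    """Track the three terminal events of a command, in any arrival order."""

    EXPECTED = frozenset(["exit", "stdout_closed", "stderr_closed"])

    def __init__(self):
        self.seen = set()
        self.exit_status = None

    @property
    def done(self):
        # Finished only when every terminal event has been observed.
        return self.seen == self.EXPECTED

    def handle(self, event, status=None):
        """Record an event and return True once it is safe to exit."""
        if event in self.EXPECTED:
            self.seen.add(event)
            if event == "exit":
                self.exit_status = status
        # Any other event (ordinary output) leaves the state unchanged.
        return self.done
```

A CLI loop would call `handle` for each incoming message and leave the loop, using `exit_status` as its return code, when the call returns True.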
-
Swann Perarnau authored
The daemon code was maintaining its own container tracker using pids, instead of using the one in the container manager. This patch removes this additional tracking, and lets the daemon side deal with an actual namedtuple.
-
- 14 Dec, 2017 7 commits
-
-
Swann Perarnau authored
This patch adds stdout/stderr streaming capabilities, based on partial evaluation of a tornado.iostream callback. The bin/cmd CLI is updated to wait until an exit message, although that doesn't guarantee anything about message ordering. The next step is obviously to figure out a message flow that allows the CLI to send and receive the command IO properly, in order.
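Partial evaluation here means binding the context (which container, which stream) ahead of time, so that the resulting callable only needs the data handed over by the stream. A minimal sketch using `functools.partial`, with a plain list standing in for the upstream socket; the function name, the uuid, and the message fields are made up for illustration.

```python
import functools


def on_stream_data(container_uuid, stream_name, sink, data):
    """Wrap raw stream data in a message that says where it came from."""
    sink.append({"uuid": container_uuid, "stream": stream_name, "payload": data})


published = []  # stand-in for the socket the daemon would publish on

# Bind everything except the data; a stream callback would receive only that.
stdout_cb = functools.partial(on_stream_data, "uuid-1234", "stdout", published)
stderr_cb = functools.partial(on_stream_data, "uuid-1234", "stderr", published)

stdout_cb(b"hello\n")
stderr_cb(b"warning\n")
```

The same handler serves both streams, and the callbacks can be handed to any API that expects a one-argument function.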
-
Swann Perarnau authored
This patch propagates the process object into the container namedtuple, fixes a couple of bad function calls, and adapts the run command handler to use that process object instead of just its pid.
-
Swann Perarnau authored
Use the new argo_nodeos_config --exec feature in development. This allows us to delegate fork+attach+exec to argo_nodeos_config, simplifying the create command as a result. We use tornado.process to wrap this command, as we want to be able to stream stdout/stderr in the future. This patch also misuses the 'pid' field of the container namedtuple to save the tornado.process.Subprocess object itself, so some functions need to be adapted.
-
Swann Perarnau authored
The logging improvement patch missed a few calls.
-
Swann Perarnau authored
Remove unused import, commas at the end of dictionaries.
-
Swann Perarnau authored
The logging module allows us to configure logging facilities once per process using basicConfig, and then to use globally defined, named logger objects. This simplifies access to logger objects and their configuration, and removes logger pointers from all objects. This patch refactors all the logging calls to use a single 'nrm' logger object, using those facilities.
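A short sketch of the pattern with the standard logging module: configure once at process start, then fetch the same named logger from anywhere. Only the logger name 'nrm' comes from the commit message; the class and the format string are illustrative.

```python
import logging

# Done once per process, typically in the entry point.
logging.basicConfig(level=logging.INFO,
                    format="%(name)s:%(levelname)s:%(message)s")

# Module-level logger; getLogger returns the same object for the same name.
logger = logging.getLogger("nrm")


class SensorManager(object):
    """Example component: no logger is passed in or stored on the object."""

    def start(self):
        logger.info("sensor manager started")
        return True


SensorManager().start()
```

Because `logging.getLogger("nrm")` always returns the same instance, every module can declare its own module-level `logger` without any shared state being threaded through constructors.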
-
Swann Perarnau authored
Implement an update allocation function to be able to update resource tracking when containers are created and deleted. The commit should make it easier to improve the resource manager later on.
-
- 13 Dec, 2017 3 commits
-
-
Swann Perarnau authored
This patch adds a command to kill the parent process of a container based on the container uuid, triggering the death of the container. The os.kill command interacts pretty badly with the custom-built children handling, causing us to catch unwanted exceptions in an effort to keep the code running. The waitpid code was also missing the handling of children that exit because of signals, so we fixed that. At this point, two things should be paid attention to. First, we don't distinguish properly between a container and a command. This will probably cause issues later, as it should be possible to launch multiple programs in the same container, and for partitions to survive the death of the parent process. Second, the message format is growing more complex, but without any component having strong ownership over it. This will probably cause stability issues in the long term, as the format grows more complex and we lose track of the fields expected from everyone.
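The waitpid point can be sketched as follows: a status returned by `os.waitpid` has to be checked for both a normal exit and termination by a signal, otherwise a killed child is never accounted for. This is a generic illustration with hypothetical function names, not the daemon's actual handler.

```python
import os


def describe_exit(status):
    """Classify a status word returned by os.waitpid()."""
    if os.WIFEXITED(status):
        return ("exited", os.WEXITSTATUS(status))
    if os.WIFSIGNALED(status):
        # The case that is easy to forget: the child was killed by a signal.
        return ("signaled", os.WTERMSIG(status))
    return ("other", status)


def reap_children():
    """Collect every terminated child without blocking."""
    results = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except OSError:
            break  # no children left to wait for
        if pid == 0:
            break  # children exist, but none has terminated yet
        results.append((pid,) + describe_exit(status))
    return results
```

Calling `reap_children()` from a SIGCHLD handler or a periodic callback returns one tuple per finished child, so the container bookkeeping can tell a clean exit from a kill.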
-
Swann Perarnau authored
This patch adds a very simple command to list the containers currently known by the NRM. There's no history or state tracking on the NRM, so the code is pretty simple. We expect that some of the container tracking information doesn't need to be sent for such a command, so the listing also filters some of the fields. This patch also adds an 'event' field to container messages, as it will probably be needed later for other kinds of operations.
-
Swann Perarnau authored
This patch refactors the resource management and hwloc code into a working, albeit very simple, scheduling policy. Indeed, the previous code contained strong assumptions about the output of hwloc matching an Argo NodeOS configuration used during the previous phase of the project, which always contained enough CPUs and Mems to perform exclusive scheduling. The current version is simpler, but should work on more regular systems. The patch also improves code organization so that introducing more complex scheduling algorithms will be simpler. The testing of this code resulted in the discovery of simple bugs in the daemon children handling code, which should work now.
-
- 11 Dec, 2017 2 commits
-
-
Swann Perarnau authored
The Argus (globalos) launcher had prototype code to read a container manifest, create a container using Judi's code, and map resources using hwloc. This patch brings that code, almost intact, into the NRM repo. This code is quite ugly, and the resource mapping crashes if the kernel configuration isn't right. But it's still a good starting point, and we should be able to improve things little by little. One part in particular needs attention: SIGCHLD handling. We should think of using ioloop-provided facilities to avoid this mess. The patch also contains the associated CLI changes. Note: the messaging format is starting to be difficult to keep in check, as there are conversions and field checks all over the code. See #3 for a possible solution.
-
Swann Perarnau authored
This is the first step in a series of patches to integrate the container launching code from Argus (globalos) into the NRM infrastructure. This patch creates a valid command on the CLI, and sends the necessary info to the NRM. We still need to take care of the actual container creation. Note that the CLI waits for an event indicating that the container was launched, and that at this point the event is never generated by the NRM.
-