- 13 Dec, 2017 2 commits
-
-
Swann Perarnau authored
Container launching implementation See merge request !3
-
Swann Perarnau authored
This patch refactors the resource management and hwloc code into a working, albeit very simple, scheduling policy. The previous code assumed that the output of hwloc matched an Argo NodeOS configuration used during the previous phase of the project, one that always contained enough CPUs and Mems to perform exclusive scheduling. The current version is simpler, but should work on more regular systems. The patch also improves code organization so that introducing more complex scheduling algorithms will be easier. Testing this code uncovered simple bugs in the daemon's child-handling code, which should work now.
-
- 11 Dec, 2017 5 commits
-
-
Swann Perarnau authored
The Argus (globalos) launcher had prototype code to read a container manifest, create a container using Judi's code, and map resources using hwloc. This patch brings that code, almost intact, into the NRM repo. This code is quite ugly, and the resource mapping crashes if the kernel configuration isn't right. But it's still a good starting point, and we should be able to improve things little by little. One part in particular needs attention: SIGCHLD handling. We should think of using ioloop-provided facilities to avoid this mess. The patch also contains the associated CLI changes. Note: the messaging format is becoming difficult to keep in check, as there are conversions and field checks all over the code. See #3 for a possible solution.
-
Swann Perarnau authored
This is the first step in a series of patches to integrate the container launching code from Argus (globalos) into the NRM infrastructure. This patch creates a valid command on the CLI, and sends the necessary info to the NRM. We still need to take care of the actual container creation. Note that the CLI waits for an event indicating that the container was launched, and that at this point the event is never generated by the NRM.
-
Swann Perarnau authored
Basic implementation of the command line interface, power API improvements See merge request !2
-
Swann Perarnau authored
This commit changes the message format for the upstream API to use a JSON-encoded dictionary. While the format is not set in stone at this point, the goal is to slowly move toward a proper protocol, with well-defined message fields and proper mechanisms to send commands and receive notification of their completion. The only current user of this API is the power management piece, and this change breaks the GRM code maintained outside of this repo. We will need to reconcile the two implementations once the message protocol gets more stable. Related to #1 and #6.
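As an illustration of what a JSON-encoded dictionary message could look like, here is a minimal sketch. The field names (`command`, `limit`) and the helper functions are hypothetical and are not taken from the NRM code; the commit itself notes that the format is not final.

```python
import json


def encode_message(command, **fields):
    """Serialize a command and its fields as a JSON-encoded dictionary.

    The "command" key is an assumption made for this sketch.
    """
    msg = {"command": command}
    msg.update(fields)
    return json.dumps(msg).encode("utf-8")


def decode_message(raw):
    """Parse a JSON-encoded dictionary and check the one required field."""
    msg = json.loads(raw.decode("utf-8"))
    if not isinstance(msg, dict) or "command" not in msg:
        raise ValueError("malformed message: %r" % (msg,))
    return msg


# Hypothetical round trip for a power command.
raw = encode_message("setpower", limit=120)
msg = decode_message(raw)
```

Checking fields in a single decode function is one way to avoid scattering conversions and field checks across the code.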
-
Swann Perarnau authored
Only supports setpower for now, and while it should work in theory, the current code doesn't have a way to check whether the command was received, as the daemon never advertises the current limit. We need to change the protocol at this point. This also fixes a bug in the daemon code, which was expecting a single string as a message instead of a list of parts, which is what zmqstream always delivers.
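The single-string versus list-of-parts mismatch can be sketched as follows. A pyzmq `ZMQStream.on_recv` callback receives a list of frames, so a handler has to treat its argument as a list. The function name and the "first frame is the command" layout are assumptions for this sketch, not the daemon's actual code.

```python
def parse_parts(parts):
    """Decode a multipart message delivered as a list of byte frames.

    Returns (command, args). Treating the first frame as the command
    is an assumption made for this illustration.
    """
    if isinstance(parts, (bytes, str)):
        # Tolerate a caller that still passes a single string.
        parts = [parts]
    frames = [p.decode("utf-8") if isinstance(p, bytes) else p for p in parts]
    if not frames:
        raise ValueError("empty message")
    return frames[0], frames[1:]


# A stream callback would be handed a list such as this one.
command, args = parse_parts([b"setpower", b"120"])
```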
-
- 08 Dec, 2017 8 commits
-
-
Swann Perarnau authored
Remove unneeded module, clean up calls to the logger.
-
Swann Perarnau authored
GRM/NRM Integration See merge request !1
-
Swann Perarnau authored
-
Swann Perarnau authored
The previous commit added pub/sub communications in the wrong places, creating synchronizations in an asynchronous event loop. This commit fixes those issues, adding the upstream (GRM/Flux) flow to the event loop, and renaming objects here and there for clarity.
-
-
-
-
Swann Perarnau authored
The previous assert could never hold, causing the test to fail every time.
-
- 06 Sep, 2017 1 commit
-
-
Srinivasan Ramesh authored
-
- 05 Sep, 2017 2 commits
-
-
Srinivasan Ramesh authored
-
Srinivasan Ramesh authored
-
- 30 Aug, 2017 7 commits
-
-
Swann Perarnau authored
This patch is the last link between coolr and the daemon. We now create a dictionary of machine information inside the sensor manager, and give this dictionary back to the daemon. The daemon can then use the real data for control. We still need to receive the target power from somewhere, and that will come later.
-
Swann Perarnau authored
Based on the very useful github.com/github/gitignore repository.
-
Swann Perarnau authored
clr_hwmon was also merging sampling data generation and formatting that data into a json string. This patch removes the formatting, for the same reason as the previous patch. Also removes __main__ code from the module.
-
Swann Perarnau authored
clr_rapl was merging sampling data generation and formatting that same data into a json string. This patch removes the json formatting, to let users of the module use the data in a python structure. The patch also removes __main__ code from clr_rapl, as it is unnecessary here. We might end up reusing some of this code in unit tests later, but not right now.
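The separation described here, sampling into a Python structure and leaving any JSON formatting to the caller, might look like the sketch below. The function names and the sample layout are hypothetical and do not reflect the clr_rapl API.

```python
import json
import time


def sample(read_energy):
    """Return one sample as a plain dict; no formatting happens here.

    `read_energy` is a hypothetical callable standing in for the
    actual sensor reads.
    """
    return {"time": time.time(), "energy": read_energy()}


def to_json(sample_dict):
    """Formatting is a separate step that callers may choose to skip."""
    return json.dumps(sample_dict, sort_keys=True)


# Callers that want control data use the dict directly.
reading = sample(lambda: {"package-0": 42.0})
```

With this split, a daemon can use `reading` for control decisions, while a logging tool can still call `to_json(reading)`.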
-
Swann Perarnau authored
This is a partial import of github.com/coolr-hpc/pycoolr from master branch, commit id: 67e7aa4b89b67744922b5926cd1459adf650013b. Coolr will provide us the capability to read power, topology and msr-based sensors. This patch links the sensor module with the coolr code as it is, which doesn't really work. The core issue is that the current coolr code is meant to be stand-alone, and the main functions do both the sampling and the formatting of the data in json, by hand. We will solve that in the next commit, removing the json-specific code from the sample function to create a dict of values instead.
-
Swann Perarnau authored
Create a `sensor` module that will handle the interaction with coolr and return updated machine information regularly. The code is as dumb as the previous one, but the structure is improved, which should help for the next round of updates with coolr.
-
Swann Perarnau authored
Given the deadlines we have, it's probably a good idea to keep things simple and keep the application-facing protocol exactly as it is, and restrict it to just applications. This way we can keep using the argobots tests as valid benchmarks, and at the same time start building a decent communication protocol on a different socket, with a better interface. To clarify that, the daemon now uses the word application to refer to clients connecting on the "legacy" interface. We'll add a different socket and start building a real protocol in future commits.
-
- 29 Aug, 2017 1 commit
-
-
Swann Perarnau authored
The previous code was entirely inside the bin directory, which is not a good idea in the long term. This patch moves everything inside the nrm package, so that we can start building a proper codebase.
-
- 25 Apr, 2017 1 commit
-
-
Swann Perarnau authored
We chose to rewrite the entire thing in python. The language should make it easy to interact with all the moving parts of the Argo landscape, and easy to prototype various control schemes. The communication protocol is exactly the same, but implemented with ZeroMQ + tornado. Power readings are not integrated yet; we are targeting the Coolr project for that. This is a rough draft: all the code is in binary scripts instead of the package, and there are no unit tests. Nevertheless, it should be a decent starting point for future development.
-
- 14 Apr, 2017 1 commit
-
-
Swann Perarnau authored
The SC15/Chameleon experiments were based on a simple power management scheme built with the help of beacon for transport, RAPL for monitoring, and socket-based communications between the NRM and the argobots runtime. This is an import of the working code we had, in the state it was in at the time. It is quite obviously a one-time hack, and it probably doesn't work without the exact Chameleon setup.
-