- 23 Jan, 2019 1 commit
-
-
Valentin Reis authored
This commit does two things:
- re-indents the message schema to be more readable
- lets `cmd listen --filter` print any incoming message, without discriminating on container_uuid.
This makes `cmd listen` usable until proper application_uuid management is written into nrm.
-
- 09 Jan, 2019 2 commits
-
-
Valentin Reis authored
-
Valentin Reis authored
The option accepts a message type and prints the values in CSV format:
- msgtype, time, payload if the --filter option is recognized
- msgtype, time otherwise
The output is force-flushed on stdout. This switches CI to the "refactored" CI identifier at argotests/tests.
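The two CSV shapes described above can be sketched as follows. This is a minimal illustration, not the code from the commit: the helper names `format_event` and `print_event` are hypothetical, and it assumes the payload column is simply left out when no payload is available for the message type.

```python
import csv
import io
import sys
import time


def format_event(msgtype, payload=None, now=None):
    """Return one CSV line: 'msgtype,time,payload' or 'msgtype,time'."""
    timestamp = time.time() if now is None else now
    row = [msgtype, timestamp]
    if payload is not None:
        # Assumption: the payload column is only emitted when one is known.
        row.append(payload)
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(row)
    return buf.getvalue()


def print_event(msgtype, payload=None, out=sys.stdout):
    """Write the CSV line and flush at once, so piped consumers see it."""
    out.write(format_event(msgtype, payload))
    out.flush()
```

Flushing after every line matters when the output is piped into another tool, since stdout is block-buffered when it is not a terminal.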
-
- 04 Jan, 2019 3 commits
-
-
Valentin Reis authored
-
Valentin Reis authored
-
Swann Perarnau authored
The option is called uuid, not container
-
- 14 Dec, 2018 1 commit
-
-
Valentin Reis authored
This is not as good as passing part of the manifest options forward, but it still fixes some of the practical problems when using the components together. The code makes sure that the manifest exists, though.
-
- 12 Dec, 2018 1 commit
-
-
Valentin Reis authored
`cmd` now sends a container kill message to the upstream api and exits whenever it receives SIGINT, via C-c for instance.
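A minimal sketch of this behavior, under stated assumptions: `send_kill` stands in for whatever function publishes a message to the upstream API, and the message fields (`command`, `uuid`) are hypothetical, not the actual nrm schema.

```python
import signal
import sys


def make_sigint_handler(send_kill, container_uuid, exit_fn=sys.exit):
    """Build a SIGINT handler that asks for the container to be killed,
    then exits. `send_kill` and the message layout are assumptions."""
    def handler(signum, frame):
        send_kill({"command": "kill", "uuid": container_uuid})
        # 130 = 128 + SIGINT, the conventional shell exit code for C-c.
        exit_fn(130)
    return handler


def install_sigint_handler(send_kill, container_uuid):
    """Register the handler; must be called from the main thread."""
    handler = make_sigint_handler(send_kill, container_uuid)
    return signal.signal(signal.SIGINT, handler)
```

Keeping the handler factory separate from the `signal.signal` call makes the kill-then-exit logic easy to exercise without delivering a real signal.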
-
- 10 Dec, 2018 1 commit
-
-
Valentin Reis authored
- added correct SIGINT/process ending handling to cmd
- fixed kill/list containers
- added ZMQ_LINGER 0 to the socket options.
-
- 28 Nov, 2018 4 commits
-
-
Swann Perarnau authored
Add a listen command to get access to the event stream of the upstream pub/sub API. This patch gives back access from the command line to the power information of a container, including filtering the event stream to only have events relevant to this container. This changes the workflow a little bit for users, but should result in cleaner access to profiling data in the future. Related to #18.
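The per-container filtering could look roughly like this. It is only a sketch: events are assumed to be dictionaries carrying a `container_uuid` key, which is a hypothetical field name rather than the documented message schema.

```python
def filter_events(events, container_uuid=None):
    """Yield events for one container, or every event if no uuid is given.

    Assumption: each event is a dict with an optional 'container_uuid' key.
    """
    for event in events:
        if container_uuid is None or event.get("container_uuid") == container_uuid:
            yield event
```

Writing the filter as a generator lets it sit directly on top of a live event stream without buffering.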
-
Swann Perarnau authored
Move the container start/exit events to the upstream pub/sub event stream. As these are more of a global event now that we support multiple commands in the same container, it makes sense to move them to the more general event stream. This patch also removes the code in cmd waiting for container start or exit, making the cmd (temporarily) unable to report power metrics. We will fix that in a later commit. This patch fixes complicated issues we had with how a job running multiple commands in the container might not all wait for the end of the container: now none of them do.
-
Swann Perarnau authored
Current code sends start/exit events when a container is created and process_start/process_exit when it's already there. Instead, have the container start/exit events only care about container stuff, and always send the process start/exit events around. That makes the cmd run fsm easier to work out. Changes the message format a tiny bit. Fixes some missing stdout/stderr issues we had before.
-
Swann Perarnau authored
Previous merges let the cmd send an empty container uuid, resulting in some issues when the user doesn't provide one. Restore the previous behavior.
-
- 21 Oct, 2018 1 commit
-
-
Swann Perarnau authored
Replace the fragile upstream communications with the new messaging layer, improving the stability and performance of this API.
NOTE: this breaks previous clients.
NOTE: this patch is missing client tracking, to handle children signals.
-
- 17 Oct, 2018 1 commit
-
-
Sridutt Bhalachandra authored
Added multi-node and multi-process support that allows launching multiple processes within a container. This is important for enabling use of NRM with MPI applications that run multiple processes in a container, and thus for enabling multi-node executions. See Issue #17.
-
- 18 Dec, 2017 1 commit
-
-
Swann Perarnau authored
The way 0MQ works on PUB/SUB sockets, publishers might drop messages if subscribers are not detected fast enough. One way to fix it is to have the "server" always bind sockets, and the "client" use connect. This way, the handshake is initiated properly, and the client can publish as soon as the connection is done. This patch makes the daemon bind on the upstream API and the CLI connect, fixing in the process the message dropping we were experiencing before. Long term, we might think about using 2 types of sockets for the upstream API: pub/sub for actual events published from the daemon, and a REQ/REP or ROUTER/DEALER pair for "commands".
-
- 15 Dec, 2017 1 commit
-
-
Swann Perarnau authored
This patch implements a small finite state machine on the cmd side to be able to run a command, wait for all of its output, and then exit. As the daemon can send those messages in any order, we need to wait for them properly, in particular the closing of stdout/stderr before exiting. This patch also fixes the read_until_close callback creation to ensure that the stream EOF is handled as a distinct message.
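One way to picture such a state machine is sketched below. The class and event names (`CommandRunFSM`, `stdout_closed`, `stderr_closed`, `process_exit`) are hypothetical; the only property taken from the commit message is that the events may arrive in any order and that the CLI must see all of them before exiting.

```python
class CommandRunFSM:
    """Track the events that must all arrive before the CLI may exit.

    Event names are illustrative assumptions, not the nrm message types.
    """

    REQUIRED = frozenset({"stdout_closed", "stderr_closed", "process_exit"})

    def __init__(self):
        self.seen = set()
        self.exit_status = None

    @property
    def done(self):
        # Finished only once every required event has been observed.
        return self.seen == self.REQUIRED

    def handle(self, event, status=None):
        """Record one event, in any order; return True when it is safe to exit."""
        if event not in self.REQUIRED:
            raise ValueError("unexpected event: %r" % (event,))
        self.seen.add(event)
        if event == "process_exit":
            self.exit_status = status
        return self.done
```

Because the state is a set of observed events rather than a fixed sequence, the process exit arriving before the last chunk of output does not cause output to be lost.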
-
- 14 Dec, 2017 3 commits
-
-
Swann Perarnau authored
This patch adds stdout/stderr streaming capabilities, based on partial evaluation of a tornado.iostream callback. The bin/cmd CLI is updated to wait until an exit message, although that doesn't guarantee anything on message ordering... The next step is obviously to figure out a message flow that allows the CLI to send and receive the command IO properly, in order...
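"Partial evaluation" here most likely refers to binding some arguments of a callback ahead of time, as `functools.partial` does. The sketch below shows the idea without depending on tornado; `forward_output`, `make_stream_callbacks`, and the message fields are all hypothetical names chosen for illustration.

```python
from functools import partial


def forward_output(send, container_uuid, stream_name, data):
    """Wrap a chunk of command output in a message and hand it to `send`."""
    send({"uuid": container_uuid, "stream": stream_name, "payload": data})


def make_stream_callbacks(send, container_uuid):
    """Return one single-argument callback per stream.

    Each callback only takes the data chunk, which is the shape an I/O
    stream library typically expects; the other arguments are pre-bound.
    """
    return {
        name: partial(forward_output, send, container_uuid, name)
        for name in ("stdout", "stderr")
    }
```

The same function then serves both streams, and the receiving side can tell them apart from the `stream` field.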
-
Swann Perarnau authored
The logging improvement patch missed a few calls.
-
Swann Perarnau authored
The logging module allows us to configure logging facilities once per process using basicConfig, and then to use globally defined, named logger objects. This simplifies access to logger objects and their configuration, and removes logger pointers from all objects. This patch refactors all the logging calls to use a single 'nrm' logger object, using those facilities.
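The pattern described above, sketched with the standard library only. The logger name 'nrm' comes from the commit message; the `Component` class and the format string are illustrative assumptions.

```python
import logging

# Module-level named logger: every module that calls getLogger('nrm')
# gets the same object, so nothing needs to be passed around.
logger = logging.getLogger('nrm')


def configure_logging(level=logging.INFO):
    """Configure the root handler once per process (e.g. in the entry point)."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )


class Component:
    """Hypothetical object: it holds no logger attribute of its own."""

    def start(self):
        logger.info("component started")
```

Since `basicConfig` does nothing when the root logger already has handlers, calling `configure_logging` more than once is harmless.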
-
- 13 Dec, 2017 2 commits
-
-
Swann Perarnau authored
This patch adds a command to kill the parent process of a container based on the container uuid, triggering the death of the container. The os.kill command interacts pretty badly with the custom-built children handling, causing us to catch unwanted exceptions in an effort to keep the code running. The waitpid code was also missing a bit about catching children exiting because of signals, so we fixed that. At this point, two things should be paid attention to:
- we don't distinguish properly between a container and a command. This will probably cause issues later, as it should be possible to launch multiple programs in the same container, and for partitions to survive the death of the parent process.
- the message format is growing more complex, but without any component having strong ownership over it. This will probably cause stability issues in the long term, as the format gets more complicated and we lose track of the fields expected from everyone.
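For the waitpid part, telling a normal exit apart from death by signal can be sketched as below (POSIX only). The function names are hypothetical and this is not the daemon's actual child handling; it only uses the standard `os` status macros.

```python
import os


def describe_wait_status(status):
    """Decode a waitpid() status into ('exited', code) or ('signaled', signum)."""
    if os.WIFEXITED(status):
        return ("exited", os.WEXITSTATUS(status))
    if os.WIFSIGNALED(status):
        # The case that is easy to forget: the child was killed by a signal.
        return ("signaled", os.WTERMSIG(status))
    return ("other", status)


def reap_children():
    """Collect every child that has already terminated, without blocking."""
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children left at all
        if pid == 0:
            break  # children exist, but none has terminated yet
        reaped.append((pid,) + describe_wait_status(status))
    return reaped
```

A loop like `reap_children` is typically run from a SIGCHLD handler, because several children may terminate before the handler gets a chance to run.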
-
Swann Perarnau authored
This patch adds a very simple command to list the containers currently known by the NRM. There's no history or state tracking on the NRM, so the code is pretty simple. We expect that some of the container tracking doesn't need to be sent for such a command, so the listing also filters some of the fields. This patch also adds an 'event' field to container messages, as it will probably be needed later for other kinds of operations.
-
- 11 Dec, 2017 4 commits
-
-
Swann Perarnau authored
The Argus (globalos) launcher had prototype code to read a container manifest, create a container using Judi's code, and map resources using hwloc. This patch brings that code, almost intact, into the NRM repo. This code is quite ugly, and the resource mapping crashes if the kernel configuration isn't right. But it's still a good starting point, and we should be able to improve things little by little. One part in particular needs attention: SIGCHLD handling. We should think of using ioloop-provided facilities to avoid this mess. The patch also contains the associated CLI changes. Note: the messaging format is starting to be difficult to keep in check, as there are conversions and field checks all over the code. See #3 for a possible solution.
-
Swann Perarnau authored
This is the first step in a series of patches to integrate the container launching code from Argus (globalos) into the NRM infrastructure. This patch creates a valid command on the CLI, and sends the necessary info to the NRM. We still need to take care of the actual container creation. Note that the CLI waits for an event indicating that the container was launched, and that at this point the event is never generated by the NRM.
-
Swann Perarnau authored
This commit changes the message format for the upstream API, to use a json-encoded dictionary. While the format is not set in stone at this point, the goal is to slowly move into a proper protocol, with well-defined fields in the messages, and proper mechanisms to send commands and receive notification of their completion. The only current user of this API is the power management piece, and this change breaks the GRM code maintained outside of this repo. We will need to reconcile the two implementations once the message protocol gets more stable. Related to #1 and #6.
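A json-encoded dictionary message can be sketched as follows. The field names (`command`, `limit`) are hypothetical and chosen only for illustration, since the commit message itself notes that the format was not yet fixed.

```python
import json


def encode_message(command, **fields):
    """Serialize a command and its fields as a UTF-8 JSON object."""
    msg = {"command": command}
    msg.update(fields)
    return json.dumps(msg, sort_keys=True).encode("utf-8")


def decode_message(raw):
    """Parse a message and check the one field every message must carry.

    Assumption: 'command' is the mandatory key; real messages may differ.
    """
    msg = json.loads(raw.decode("utf-8"))
    if not isinstance(msg, dict) or "command" not in msg:
        raise ValueError("malformed message: %r" % (raw,))
    return msg
```

Centralizing the encoding and the field check in one pair of functions is one way to avoid conversions and ad hoc validation being spread across the code.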
-
Swann Perarnau authored
Only supports setpower for now, and while it should work in theory, the current code doesn't have a way to check if the command was received, as the daemon never advertises the current limit. We need to change the protocol at this point. This also fixes a bug in the daemon code, which was expecting a single string as a message instead of a list of parts, as zmqstream always receives.
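The bug mentioned above comes from the fact that a ZMQStream receive callback is handed a list of message frames, even for a single-part message. A sketch of a handler written for that shape, with no dependency on pyzmq itself; the space-separated "setpower 150" wire format is an assumption made for the example, not the format used by the daemon.

```python
def handle_upstream(parts):
    """Parse the frames of one multipart message into (command, args) pairs.

    `parts` is a list of bytes frames, the shape a ZMQStream on_recv
    callback receives. The text format inside each frame is hypothetical.
    """
    if isinstance(parts, (bytes, str)):
        # The original bug: treating the message as one string.
        raise TypeError("expected a list of message frames, got a single string")
    commands = []
    for frame in parts:
        words = frame.decode("utf-8").split()
        if words:
            commands.append((words[0], words[1:]))
    return commands
```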
-