Multi-node and multi-process support for NRM
The current version of NRM supports only multi-threading; it cannot launch multiple independent processes within a container. This is a problem in particular for MPI applications that need to run multiple ranks (processes) within a container.
For NRM to support multiple processes within a container, the following changes are needed:
- Allow a user to name containers. The name will be used to attach incoming processes once the first process has created the container.
- For MPI applications to work, the environment of the MPI launcher needs to be passed to each process in the container.
- `cmd` will need to handle individual processes in a container as separate clients. This is required to display the output of individual processes.
- `ContainerManager` will need to check for existing containers before requesting resources, allow processes to attach to an existing container, and clean up on termination.
- `Daemon` needs to support not only creating containers but also attaching processes to existing containers, and must ensure a graceful cleanup on termination by sending the right messages to `cmd`.
- To support multi-node execution with MPI, the ZeroMQ library needs to be updated to version >= 4.2.5, and MPICH 3.2 needs to be built with PMI PORT and a patch.
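
The create-or-attach behaviour described in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not NRM's actual code: `Container`, `ContainerRegistry`, `create_or_attach`, `process_exited`, and the `EXAMPLE_RANK` variable are hypothetical names invented for this sketch. It shows a named container being created by the first process, later processes attaching to it without a second resource request, each process keeping its own copy of the launcher environment, and the container being cleaned up once its last process exits.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Container:
    """Hypothetical record of a named container and its attached processes."""
    name: str
    resources: str
    # Maps a process id to the launcher environment captured for that process.
    processes: Dict[int, Dict[str, str]] = field(default_factory=dict)


class ContainerRegistry:
    """Toy stand-in for the create-or-attach logic (not NRM's ContainerManager)."""

    def __init__(self) -> None:
        self.containers: Dict[str, Container] = {}
        self.allocations = 0  # how many times resources were requested

    def _request_resources(self, name: str) -> str:
        # Placeholder for a real resource request.
        self.allocations += 1
        return "resources-for-" + name

    def create_or_attach(self, name: str, pid: int, environ: Dict[str, str]) -> bool:
        """Attach pid to the container called name, creating it if needed.

        Returns True if this call created the container.
        """
        container = self.containers.get(name)
        created = container is None
        if created:
            # Only the first process triggers a resource request.
            container = Container(name, self._request_resources(name))
            self.containers[name] = container
        # Each process is tracked separately, with its own environment copy.
        container.processes[pid] = dict(environ)
        return created

    def process_exited(self, name: str, pid: int) -> bool:
        """Detach pid; return True if the container was removed as a result."""
        container = self.containers[name]
        del container.processes[pid]
        if not container.processes:
            # Last process gone: release the container.
            del self.containers[name]
            return True
        return False


if __name__ == "__main__":
    registry = ContainerRegistry()
    # EXAMPLE_RANK is a made-up variable standing in for launcher-provided state.
    registry.create_or_attach("mpi-job", 101, {"EXAMPLE_RANK": "0"})
    registry.create_or_attach("mpi-job", 102, {"EXAMPLE_RANK": "1"})
    registry.process_exited("mpi-job", 101)
    registry.process_exited("mpi-job", 102)
```

Keying the registry on the user-supplied container name is what lets a second rank find the container created by the first. A real implementation would also need to handle concurrent requests and notify each client when its process terminates.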