
Closed
Open
Opened Oct 17, 2018 by Sridutt Bhalachandra (@sriduttb)
6 of 6 tasks completed

Multi-node and multi-process support for NRM

The current version of NRM supports only multi-threaded applications: it cannot launch multiple independent processes within a single container. This is a problem especially for MPI applications that need to run multiple ranks (processes) within one container.

For NRM to support multiple processes within a container, we need to:

  • Allow users to name containers. The name is used to attach incoming processes to a container once the first process has created it.
  • Pass the environment of the MPI launcher to each process in the container, which MPI applications need in order to work.
  • Have cmd handle the individual processes in a container as separate clients. This is required to display the output of each process.
  • Have ContainerManager check for an existing container before requesting resources, allow processes to be attached to a container, and clean up on termination.
  • Have the daemon support attaching processes to existing containers in addition to creating containers, and ensure a graceful cleanup on termination by sending the right messages to cmd.
  • For multi-node execution with MPI, update the ZeroMQ library to version >= 4.2.5 and build MPICH 3.2 with PMI PORT and a patch.
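
The create-or-attach behaviour described in the list above could be sketched roughly as follows. This is only an illustration under stated assumptions, not NRM's actual code: the class `ContainerRegistry`, its methods `launch` and `reap`, and the environment variable `DEMO_RANK` are hypothetical names, and a "container" here is just a named list of processes with no resource allocation behind it.

```python
import os
import subprocess
import sys


class ContainerRegistry:
    """Hypothetical stand-in for the bookkeeping ContainerManager would do."""

    def __init__(self):
        # Maps a user-chosen container name to its running processes.
        self.containers = {}

    def launch(self, name, argv, env=None):
        """Create container `name` if it does not exist, then attach a process.

        Returns (created, proc), where `created` is True only for the first
        process, i.e. the one that caused the container to be created.
        """
        created = name not in self.containers
        if created:
            # A real manager would request resources at this point.
            self.containers[name] = []
        # Forward the launcher's environment to the child unchanged.
        child_env = dict(os.environ if env is None else env)
        # Each process gets its own pipe so its output can be shown separately.
        proc = subprocess.Popen(argv, env=child_env, stdout=subprocess.PIPE)
        self.containers[name].append(proc)
        return created, proc

    def reap(self, name, proc):
        """Wait for one process and drop the container once it is empty.

        Returns (output, container_removed).
        """
        output, _ = proc.communicate()
        procs = self.containers[name]
        procs.remove(proc)
        if not procs:
            # Last process gone: release the container.
            del self.containers[name]
            return output, True
        return output, False


if __name__ == "__main__":
    registry = ContainerRegistry()
    child = [sys.executable, "-c", "import os; print(os.environ['DEMO_RANK'])"]
    handles = []
    for rank in ("0", "1"):
        # DEMO_RANK is a made-up variable standing in for launcher state.
        env = dict(os.environ, DEMO_RANK=rank)
        handles.append(registry.launch("job", child, env=env))
    for created, proc in handles:
        output, removed = registry.reap("job", proc)
        print(created, output.strip(), removed)
```

Keying the registry on the container name is what lets a second process find the container created by the first, and counting the attached processes is one simple way to decide when cleanup is safe.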
Edited Oct 17, 2018 by Sridutt Bhalachandra
Reference: argo/nrm#17