1. 30 Nov, 2012 1 commit
  2. 10 Oct, 2012 1 commit
  3. 12 Jan, 2011 2 commits
  4. 29 Nov, 2010 1 commit
  5. 24 Nov, 2010 1 commit
  6. 29 Apr, 2010 1 commit
  7. 28 Apr, 2010 2 commits
    • [svn-r6578] For checkpointing code, use PMI_PORT instead of PMI_FD. · 5160e94f
      Pavan Balaji authored
      With BLCR, since all the processes are restarted by the BLCR library,
      and not by Hydra directly, we cannot provide a new PMI_FD for the MPI
      processes to use. So, we instead use the PMI_PORT mechanism and ask
      the MPI processes to connect back.
      
      This commit also contains a cleanup of the socket fd maintenance
      within the code to distinguish between uninitialized sockets and
      closed sockets. While this part is independent, parts of it overlapped
      with getting the combined PMI_PORT/PMI_FD code working.
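The connect-back idea described in this commit can be sketched as follows. This is a minimal illustration, not Hydra's code: the function name `open_pmi_channel`, the `host:port` format assumed for `PMI_PORT`, and the choice to prefer `PMI_PORT` over an inherited `PMI_FD` when both are set are all assumptions made for the example.

```python
import os
import socket


def open_pmi_channel(env=os.environ):
    """Return a socket connected to the process manager (hypothetical helper).

    Assumption: PMI_PORT holds "host:port" and PMI_FD holds an inherited
    socket descriptor. PMI_PORT is checked first because a restarted
    process cannot rely on a descriptor from before the checkpoint.
    """
    port_spec = env.get("PMI_PORT")
    if port_spec is not None:
        # Connect back to the address the launcher is listening on.
        host, _, port = port_spec.rpartition(":")
        return socket.create_connection((host, int(port)))

    fd = env.get("PMI_FD")
    if fd is not None:
        # Wrap the descriptor inherited from the launcher.
        return socket.socket(fileno=int(fd))

    raise RuntimeError("neither PMI_PORT nor PMI_FD is set")
```

With only `PMI_PORT` in the environment, the process opens a fresh connection, which is what makes the scheme usable after a restart that Hydra did not perform itself.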
    • 35929ad8
  8. 24 Apr, 2010 1 commit
  9. 29 Dec, 2009 2 commits
  10. 22 Dec, 2009 1 commit
  11. 02 Dec, 2009 3 commits
  12. 15 Oct, 2009 2 commits
  13. 29 Jul, 2009 1 commit
  14. 01 Jul, 2009 1 commit
  15. 27 Mar, 2009 1 commit
    • [svn-r4213] # Initial cut of distributed proxies support in hydra · e39297d8
      Jayesh Krishna authored
        - launch/shutdown proxies using job launcher
        - launch jobs using the standalone proxy
      # DMX engine can now handle user contexts. The user registers the context when registering the fd, and the DMX engine passes the context back in the callback.
      # Limitations (will be fixed soon...)
        - Code is a bit hackish... FIXMEs should cover a lot of them
        - Works only on localhost - debugging multiple hosts
        - Does not support MPMD
        - Supports only one job at a time
        - Need to provide complete path to executable - to be fixed soon
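The user-context mechanism mentioned for the DMX engine can be illustrated with a small demultiplexer. This is a sketch under assumptions, not the Hydra DMX API: the class name `Demux` and the methods `register`, `deregister`, and `wait_once` are hypothetical, and `select` stands in for whatever polling call the real engine uses.

```python
import select
import socket


class Demux:
    """Tiny fd demultiplexer that hands a user context back to callbacks."""

    def __init__(self):
        # fd number -> (socket, callback, user context)
        self._handlers = {}

    def register(self, sock, callback, ctx):
        # The context is stored at registration time, next to the callback.
        self._handlers[sock.fileno()] = (sock, callback, ctx)

    def deregister(self, sock):
        self._handlers.pop(sock.fileno(), None)

    def wait_once(self, timeout=None):
        """Wait for readable sockets and dispatch; return how many were ready."""
        socks = [entry[0] for entry in self._handlers.values()]
        ready, _, _ = select.select(socks, [], [], timeout)
        for sock in ready:
            entry = self._handlers.get(sock.fileno())
            if entry is None:
                continue  # deregistered by an earlier callback in this pass
            _, callback, ctx = entry
            callback(sock, ctx)  # the stored context is passed back here
        return len(ready)
```

Because the context travels with the registration, one callback can serve many descriptors and still know which proxy or job each event belongs to.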
  16. 22 Mar, 2009 1 commit
  17. 20 Mar, 2009 1 commit
  18. 16 Mar, 2009 2 commits
  19. 12 Mar, 2009 1 commit
    • [svn-r4019] Added a connection setup between the job launcher and the proxies · b5aad9ba
      Pavan Balaji authored
      Added a connection setup between the job launcher and the proxies in
      case of an abnormal exit, so that the runaway processes can be cleaned
      up properly (based on the PID, instead of the executable name).
      
      Algorithm: Each proxy keeps track of the PIDs of the processes it
      launches and listens on a socket for incoming connections from the job
      launcher. If the exit is clean, this socket is not used at all. However,
      if the job launcher wants to kill the application (due to a timeout,
      or an abort by another process in the application), a connection is
      established on this socket and a message sent to the proxy to kill its
      corresponding processes. We only support one command right now (KILLALL).
      
      This should resolve ticket #447.
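The algorithm in this commit message can be sketched as below. This is an illustrative model only: the names `Proxy`, `launch`, `serve_one`, and `request_killall` are hypothetical, the newline-terminated text framing of `KILLALL` is an assumption, and `subprocess.Popen` handles stand in for the raw PID list that the proxy is described as keeping.

```python
import socket
import subprocess
import sys
import threading


class Proxy:
    """Tracks the processes it launches and listens for a kill request."""

    def __init__(self):
        self.children = []  # one handle per launched process (tracks its PID)
        self.listener = socket.socket()
        self.listener.bind(("127.0.0.1", 0))  # any free port
        self.listener.listen(1)
        self.port = self.listener.getsockname()[1]

    def launch(self, argv):
        child = subprocess.Popen(argv)
        self.children.append(child)
        return child.pid

    def serve_one(self):
        """Accept one launcher connection and act on its command."""
        conn, _ = self.listener.accept()
        with conn:
            command = conn.makefile("r").readline().strip()
        if command == "KILLALL":  # the only supported command
            for child in self.children:
                if child.poll() is None:
                    child.kill()  # signals by PID, not by executable name
            for child in self.children:
                child.wait()
        return command


def request_killall(host, port):
    """Launcher side: connect only when the job must be torn down."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(b"KILLALL\n")


if __name__ == "__main__":
    proxy = Proxy()
    proxy.launch([sys.executable, "-c", "import time; time.sleep(60)"])
    worker = threading.Thread(target=proxy.serve_one)
    worker.start()
    request_killall("127.0.0.1", proxy.port)
    worker.join()
    proxy.listener.close()
```

On a clean exit `request_killall` is never called, so the listening socket stays idle, matching the description that the socket is not used at all in that case.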
  20. 10 Mar, 2009 2 commits
  21. 09 Mar, 2009 2 commits
  22. 14 Nov, 2008 1 commit
  23. 13 Nov, 2008 1 commit
  24. 12 Nov, 2008 2 commits
  25. 01 Nov, 2008 2 commits
  26. 29 Oct, 2008 2 commits
  27. 24 Oct, 2008 1 commit