Commit 6d795e18 authored by Rob Latham

writeup and job script for the "multi rail" challenge

parent b9628c2a
# Multi-Rail (mrail) on Summit
The Summit nodes have two InfiniBand cards. It's possible for one process to drive both of those cards, but we haven't figured it out yet. Perhaps you can figure it out, or maybe you already know?
## Procedure
1. Observe inter-node performance. libfabric ships with `fi_pingpong`, but the libfabric maintainers prefer the `fi_rma` test from libfabric's 'fabtests'.
2. Switch to the `ofi_mrail` provider.
3. The [`fi_mrail` man page][1] mentions several ways to express which ports to use. See if you can figure out which one works on Summit.

[1]: https://ofiwg.github.io/libfabric/master/man/fi_mrail.7.html
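
As a possible starting point for steps 2 and 3, here is a minimal, untested sketch of one approach the man page describes: listing the rails in the `FI_OFI_MRAIL_ADDR` environment variable. The interface names `ib0` and `ib1` are assumptions, not verified Summit names, and the libfabric 1.8.1 build on Summit may accept fewer address forms than the current man page lists.

```shell
# Sketch only: not verified on Summit.
# Interface names are hypothetical; check a compute node for the real ones.
RAILS="ib0 ib1"

# FI_OFI_MRAIL_ADDR is a comma-separated list of rail addresses.
# Turn the space-separated list above into that form.
FI_OFI_MRAIL_ADDR=$(printf '%s' "$RAILS" | tr ' ' ',')
export FI_OFI_MRAIL_ADDR

echo "FI_OFI_MRAIL_ADDR=$FI_OFI_MRAIL_ADDR"

# In the job script, the provider argument would then change from
# 'verbs' to 'ofi_mrail' on both the server and the client, e.g.:
#   ${FABTESTS}/bin/fi_rma_bw -S all -p 'ofi_mrail'
```

If the provider rejects interface names, the man page's other address forms (an `FI_ADDR_STR` string, host name, or IP address per rail) are the next things to try.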
## Resources
- The `fabtest.lsf` script shows how we ran the `fi_rma` test on Summit between two nodes over one InfiniBand link. Probably a good starting point for your experiments.
- Several ORNL people published a paper at SC 2019 about Summit's networking: <https://dl.acm.org/doi/10.1145/3295500.3356166>
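
The host lookup in `fabtest.lsf` can be tried outside a batch job. The sketch below runs the same `grep`/`head` pipeline against a stand-in host file. The file layout (a batch node entry followed by one line per slot on each compute node) and the host names are assumptions made for illustration.

```shell
# Stand-in for $LSB_DJOB_HOSTFILE; host names here are made up.
HOSTFILE=$(mktemp)
cat > "$HOSTFILE" <<'EOF'
batch2
a01n01
a01n01
a01n02
a01n02
EOF

# Same pipeline as the job script: drop the batch (launch) node
# and keep the first compute node as the server host.
HOST=$(grep -v 'batch' "$HOSTFILE" | head -n 1)
echo "host: $HOST"

rm -f "$HOSTFILE"
```

With this sample file the script prints `host: a01n01`, which is the name the client would pass to `-s`.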
#!/bin/sh
#BSUB -P csc332
#BSUB -W 0:5
#BSUB -nnodes 2
#BSUB -step_cgroup n
#BSUB -J fabtest-rma
#export FI_LOG_LEVEL=debug
# you can use ~robl/soft/fabtests-1.8.1 if you have not installed it yourself.
# Unfortunately, fabtests is not installed with the rest of the libfabric spack
# package
FABTESTS=${HOME}/soft/fabtests-1.8.1
# output of 'fi_info'
#
#verbs:
# version: 1.0
#ofi_rxm:
# version: 1.0
#shm:
# version: 1.1
#ofi_perf_hook:
# version: 1.0
#ofi_noop_hook:
# version: 1.0
#ofi_mrail:
# version: 1.0
# the fabtest client needs a hostname to contact, so we have to do a little bit
# of legwork
HOST=$(grep -v 'batch' "$LSB_DJOB_HOSTFILE" | head -n 1)
echo "host: $HOST"
jsrun -n 1 -r 1 -c ALL_CPUS -g ALL_GPUS ${FABTESTS}/bin/fi_rma_bw -S all -p 'verbs' &
# give libfabric a chance to set up
sleep 3
jsrun -x $HOST -n 1 -r 1 -c ALL_CPUS -g ALL_GPUS ${FABTESTS}/bin/fi_rma_bw -m -s $HOST -S all -p 'verbs'