Case Study: Optimizing a Scale-Out Network Topology by Analyzing NCCL-Tests Logs
Background: All-Reduce and the Ring Algorithm
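Parallel GPU computing has to synchronize parameter gradients across compute nodes at large scale, which generates a large volume of collective communication traffic. To optimize that traffic, the industry has developed a range of collective communication libraries (xCCL). At their core they all implement All-Reduce, the dominant communication pattern in distributed training.
An All-Reduce in LLM training generally breaks down into three steps:
- Split the data on each node into N chunks;
- Run reduce-scatter, so that each node ends up holding one fully reduced chunk (1/N of the result);
- Run all-gather, so that every node ends up holding all N fully reduced chunks.
Given this traffic pattern, the Ring algorithm is one of the most common basic algorithms for implementing the operation.
As the name suggests, the Ring algorithm arranges the GPUs in a ring: each node's data is split into N chunks that travel around the ring, and each GPU communicates only with its neighbors. This pipelined pattern makes full use of every node's send and receive bandwidth, reduces the time GPUs spend idle waiting for data, and improves throughput and latency jitter when moving large blocks. (For small transfers, however, the Ring algorithm can show relatively high latency and low efficiency.)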
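The two phases can be traced with a small simulation. The sketch below is only an illustration of the reduce-scatter and all-gather schedule on a ring, not NCCL's implementation; the function name `ring_all_reduce` is hypothetical, and each chunk is reduced to a single number to keep the example short.

```python
def ring_all_reduce(node_data):
    """Simulate a sum all-reduce over a ring of n nodes.

    node_data[i] is node i's vector, already split into n chunks
    (one number per chunk here, for brevity).
    """
    n = len(node_data)
    data = [list(v) for v in node_data]  # work on a copy

    # Phase 1: reduce-scatter. In step s, node i sends chunk (i - s) % n
    # to its right neighbour, which adds it to its own copy of that chunk.
    for s in range(n - 1):
        sends = [(i, (i - s) % n, data[i][(i - s) % n]) for i in range(n)]
        for i, chunk, value in sends:
            data[(i + 1) % n][chunk] += value

    # Node i now holds the fully reduced chunk (i + 1) % n.

    # Phase 2: all-gather. In step s, node i forwards chunk (i + 1 - s) % n,
    # and the right neighbour overwrites its stale copy with it.
    for s in range(n - 1):
        sends = [(i, (i + 1 - s) % n, data[i][(i + 1 - s) % n]) for i in range(n)]
        for i, chunk, value in sends:
            data[(i + 1) % n][chunk] = value

    return data


result = ring_all_reduce([
    [1, 2, 3, 4],
    [10, 20, 30, 40],
    [100, 200, 300, 400],
    [1000, 2000, 3000, 4000],
])
print(result[0])  # [1111, 2222, 3333, 4444]
```

Each node sends 2(N-1) chunks of 1/N of its data in total, which is where the 2(N-1)/N factor in the bus bandwidth figure reported by NCCL-Tests comes from.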
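Tool: NCCL-Tests
NCCL, from NVIDIA, is the de facto standard for collective communication in AI workloads. NCCL-Tests is NVIDIA's open-source benchmark suite, available from the official GitHub repository, and it can measure performance under different algorithms (for example ring and tree). This test evaluates All-Reduce with the ring algorithm.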
root@bm-2204kzq:~# /usr/local/openmpi/bin/mpirun # multi-node cluster tests are launched through MPI
--allow-run-as-root
-bind-to none # do not bind processes to specific CPU cores
-H 172.17.0.215:8,172.17.0.81:8 # host list; the number after ":" is how many GPUs to use on that host
-np 16 # number of processes to launch, equal to the total number of GPUs
-x NCCL_SOCKET_NTHREADS=16
-mca btl_tcp_if_include bond0
-mca pml ^ucx -mca btl ^openib # exclude the ucx PML and the openib BTL
-x NCCL_DEBUG=INFO # set the NCCL debug level to INFO
-x NCCL_IB_GID_INDEX=3
-x NCCL_IB_HCA=mlx5_0:1,mlx5_2:1,mlx5_3:1,mlx5_4:1
-x NCCL_SOCKET_IFNAME=bond0 # network interface NCCL uses
-x UCX_TLS=sm,ud # transports used by MPI
-x LD_LIBRARY_PATH -x PATH
-x NCCL_IBEXT_DISABLE=1 # on a RoCE network, IBext should be disabled here
-x NCCL_ALGO=ring
/root/nccl-tests/build/all_reduce_perf -b 512 -e 18G -f 2 -g 1 # run the all-reduce test
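Common NCCL-Tests parameters
- Number of GPUs
  -t, --nthreads <num threads>: number of threads per process. Default: 1.
  -g, --ngpus <GPUs per thread>: number of GPUs per thread. Default: 1.
- Data size
  -b, --minbytes <min size in bytes>: minimum size to start with. Default: 32M.
  -e, --maxbytes <max size in bytes>: maximum size to end at. Default: 32M.
- Size step
  -i, --stepbytes <increment size>: fixed increment between sizes. Default: 1M.
  -f, --stepfactor <increment factor>: multiplication factor between sizes. Default: disabled.
- NCCL operation
  -o, --op <sum/prod/min/max/avg/all>: reduction operation to perform; only relevant for reduction collectives such as AllReduce, Reduce, or ReduceScatter. Default: sum.
  -d, --datatype <nccltype/all>: data type to use. Default: float.
- Performance
  -n, --iters <iteration count>: number of iterations for each size. Default: 20.
  -w, --warmup_iters <warmup iteration count>: number of warmup iterations (not timed). Default: 5.
  -m, --agg_iters <aggregation count>: number of operations to aggregate together in each iteration. Default: 1.
  -a, --average <0/1/2/3>: how to report performance across all ranks (MPI=1 builds only); 0=rank 0, 1=average, 2=min, 3=max. Default: 1.
- Test behavior
  -p, --parallel_init <0/1>: use threads to initialize NCCL in parallel. Default: 0.
  -c, --check <0/1>: check the correctness of results; this can be very slow on a large number of GPUs. Default: 1.
  -z, --blocking <0/1>: make NCCL collectives blocking, i.e. have the CPU wait and synchronize after each collective. Default: 0.
  -G, --cudagraph <num graph launches>: capture the iterations as a CUDA graph and replay it the given number of times. Default: 0.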
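To see which message sizes a given -b / -e / -f combination produces, the sweep can be reproduced in a few lines. This is a sketch under two assumptions: that the K/M/G suffixes are binary multiples (1024-based), which agrees with the sizes in the result table of this test, and that the size is multiplied by the step factor until it exceeds the maximum. The helper names `parse_size` and `sweep_sizes` are made up for this example.

```python
import re

# Assumption: suffixes are binary multiples, as the sizes in the result table suggest.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text):
    """Turn a size such as '512' or '18G' into a number of bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def sweep_sizes(min_bytes, max_bytes, factor):
    """List the sizes visited when growing from min to max by a constant factor."""
    sizes = []
    size = parse_size(min_bytes)
    limit = parse_size(max_bytes)
    while size <= limit:
        sizes.append(size)
        size *= factor
    return sizes


sizes = sweep_sizes("512", "18G", 2)
print(len(sizes), sizes[0], sizes[-1])  # 26 512 17179869184
```

With -b 512 -e 18G -f 2 the sweep stops at 16 GiB, because the next doubling (32 GiB) would exceed the 18G limit.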
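Case study: optimizing the GPU interconnect topology
The figure below shows the unoptimized test topology: two servers, each with eight H20 GPUs.
Following the usual practice in CPU cloud data centers, all NICs of a server are connected to the same switch, and the two switches are joined by four 400G links. The switches under test are Asterfusion CX732Q-N (32 x 400GE QSFP-DD, 2 x 10GE SFP+).
NCCL-Tests performance results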
out-of-place in-place
# size count type redop root time algbw busbw #wrong time algbw busbw #wrong
# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s)
512 128 float sum -1 56.12 0.01 0.02 0 54.54 0.01 0.02 0
1024 256 float sum -1 55.09 0.02 0.03 0 53.85 0.02 0.04 0
2048 512 float sum -1 55.67 0.04 0.07 0 54.84 0.04 0.07 0
4096 1024 float sum -1 55.70 0.07 0.14 0 55.05 0.07 0.14 0
8192 2048 float sum -1 56.36 0.15 0.27 0 56.53 0.14 0.27 0
16384 4096 float sum -1 57.21 0.29 0.54 0 57.02 0.29 0.54 0
32768 8192 float sum -1 60.74 0.54 1.01 0 59.87 0.55 1.03 0
65536 16384 float sum -1 67.42 0.97 1.82 0 68.41 0.96 1.80 0
131072 32768 float sum -1 109.6 1.20 2.24 0 108.8 1.20 2.26 0
262144 65536 float sum -1 108.3 2.42 4.54 0 108.3 2.42 4.54 0
524288 131072 float sum -1 115.0 4.56 8.55 0 112.8 4.65 8.72 0
1048576 262144 float sum -1 135.0 7.77 14.57 0 129.4 8.10 15.19 0
2097152 524288 float sum -1 144.6 14.51 27.20 0 142.9 14.67 27.51 0
4194304 1048576 float sum -1 222.0 18.89 35.43 0 220.0 19.07 35.75 0
8388608 2097152 float sum -1 396.5 21.15 39.66 0 392.1 21.40 40.12 0
16777216 4194304 float sum -1 736.3 22.78 42.72 0 904.7 18.55 34.77 0
33554432 8388608 float sum -1 1405.5 23.87 44.76 0 1542.0 21.76 40.80 0
67108864 16777216 float sum -1 2679.0 25.05 46.97 0 2721.0 24.66 46.24 0
134217728 33554432 float sum -1 5490.1 24.45 45.84 0 5291.6 25.36 47.56 0
268435456 67108864 float sum -1 10436 25.72 48.23 0 11788 22.77 42.70 0
536870912 134217728 float sum -1 25853 20.77 38.94 0 23436 22.91 42.95 0
1073741824 268435456 float sum -1 47974 22.38 41.97 0 54979 19.53 36.62 0
2147483648 536870912 float sum -1 117645 18.25 34.23 0 117423 18.29 34.29 0
4294967296 1073741824 float sum -1 248208 17.30 32.44 0 229171 18.74 35.14 0
8589934592 2147483648 float sum -1 474132 18.12 33.97 0 476988 18.01 33.77 0
17179869184 4294967296 float sum -1 949191 18.10 33.94 0 965703 17.79 33.36 0
# Out of bounds values : 0 OK
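The columns are:
- size (B): amount of data the operation processes, in bytes;
- count (elements): number of elements the operation processes;
- type: data type of the elements;
- redop: reduction operation used;
- root: root rank; -1 means the operation has no root (an all-reduce involves every rank);
- time (us): execution time of the operation, in microseconds;
- algbw (GB/s): algorithm bandwidth, in GB/s;
- busbw (GB/s): bus bandwidth, in GB/s;
- #wrong: number of incorrect values; anything other than 0 suggests that errors occurred.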
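The two bandwidth columns can be recomputed from size and time. The sketch below uses the definitions documented for NCCL-Tests: algorithm bandwidth is size divided by time, and for all-reduce the bus bandwidth scales it by 2(n-1)/n, where n is the number of ranks. The function name `all_reduce_bandwidth` is made up for this example; the input values are the 256 MiB out-of-place row of the table above.

```python
def all_reduce_bandwidth(size_bytes, time_us, n_ranks):
    """Return (algbw, busbw) in GB/s for an all-reduce of size_bytes."""
    seconds = time_us * 1e-6
    algbw = size_bytes / seconds / 1e9           # GB/s, with 1 GB = 1e9 bytes
    busbw = algbw * 2 * (n_ranks - 1) / n_ranks  # all-reduce correction factor
    return algbw, busbw


# 268435456 bytes in 10436 us across 16 ranks (out-of-place row of the table).
algbw, busbw = all_reduce_bandwidth(268435456, 10436, 16)
print(f"algbw = {algbw:.2f} GB/s, busbw = {busbw:.2f} GB/s")
# algbw = 25.72 GB/s, busbw = 48.23 GB/s
```

Because the factor removes the dependence on the number of ranks, busbw is the column to compare against the hardware line rate.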
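When reading the results, watch for the following:
- Whether bandwidth drops as the data size grows (a marked drop is not expected);
- The peak bandwidth of each run; looking at either the in-place or the out-of-place columns is enough;
- The average, which may not reflect the final result when the data size is swept upward;
- Whether the data size is large enough to reach the bandwidth ceiling (adjust the -b, -e, or -n options).
The results above show an average bus bandwidth of only about 22 GB/s. Bus bandwidth peaks at roughly 47 to 48 GB/s in the 64 MiB to 256 MiB range and then falls as the data size keeps growing, down to about 34 GB/s at 16 GiB, far from the expected value.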
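These checks are easy to script. The sketch below parses result rows in the column layout shown above and reports the peak, the value at the largest size, and the relative drop between them. It is an illustrative helper rather than part of NCCL-Tests; `parse_rows` and `summarize` are hypothetical names, and only four rows from the table are embedded to keep the example short.

```python
SAMPLE = """\
# size count type redop root time algbw busbw #wrong time algbw busbw #wrong
67108864 16777216 float sum -1 2679.0 25.05 46.97 0 2721.0 24.66 46.24 0
268435456 67108864 float sum -1 10436 25.72 48.23 0 11788 22.77 42.70 0
2147483648 536870912 float sum -1 117645 18.25 34.23 0 117423 18.29 34.29 0
17179869184 4294967296 float sum -1 949191 18.10 33.94 0 965703 17.79 33.36 0
"""


def parse_rows(text):
    """Extract size and the two busbw columns from NCCL-Tests result rows."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 13 whitespace-separated fields and start with the size.
        if len(fields) != 13 or not fields[0].isdigit():
            continue
        rows.append({
            "size": int(fields[0]),
            "oop_busbw": float(fields[7]),   # out-of-place bus bandwidth
            "ip_busbw": float(fields[11]),   # in-place bus bandwidth
        })
    return rows


def summarize(rows, key="oop_busbw"):
    """Report average, peak, and how far the largest size falls below the peak."""
    peak = max(rows, key=lambda r: r[key])
    largest = max(rows, key=lambda r: r["size"])
    return {
        "avg": sum(r[key] for r in rows) / len(rows),
        "peak": peak[key],
        "peak_size": peak["size"],
        "at_largest_size": largest[key],
        "drop_after_peak": 1 - largest[key] / peak[key],
    }


summary = summarize(parse_rows(SAMPLE))
print(summary["peak_size"], summary["peak"], f"{summary['drop_after_peak']:.0%}")
```

On these four rows the peak is 48.23 GB/s at 256 MiB and the 16 GiB value is about 30% lower. Feeding in all 26 rows of the table gives an out-of-place average of about 22.3 GB/s, in line with the average quoted above.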
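Intra-node topology analysis
The device topology inside each server can be read from nvidia-smi topo -m.
Redrawn as a diagram, the table above looks like this:
NCCL communication path analysis
NCCL uses the concept of a channel to represent a communication path. During initialization it detects the topology automatically and computes the best communication paths. To make better use of the available bandwidth and NICs through concurrent communication, NCCL uses multiple channels. The NCCL-Tests log lists 16 channels: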
### ChannelNum:16
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 00/16 : 0 7 5 6 4 3 1 2 8 15 13 14 12 11 9 10
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 01/16 : 0 7 5 6 4 3 1 10 8 15 13 14 12 11 9 2
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 02/16 : 0 7 5 6 12 11 9 10 8 15 13 14 4 3 1 2
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 03/16 : 0 7 5 14 12 11 9 10 8 15 13 6 4 3 1 2
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 04/16 : 0 7 5 6 4 3 1 2 8 15 13 14 12 11 9 10
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 05/16 : 0 7 5 6 4 3 1 10 8 15 13 14 12 11 9 2
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 06/16 : 0 7 5 6 12 11 9 10 8 15 13 14 4 3 1 2
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 07/16 : 0 7 5 14 12 11 9 10 8 15 13 6 4 3 1 2
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 08/16 : 0 7 5 6 4 3 1 2 8 15 13 14 12 11 9 10
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 09/16 : 0 7 5 6 4 3 1 10 8 15 13 14 12 11 9 2
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 10/16 : 0 7 5 6 12 11 9 10 8 15 13 14 4 3 1 2
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 11/16 : 0 7 5 14 12 11 9 10 8 15 13 6 4 3 1 2
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 12/16 : 0 7 5 6 4 3 1 2 8 15 13 14 12 11 9 10
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 13/16 : 0 7 5 6 4 3 1 10 8 15 13 14 12 11 9 2
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 14/16 : 0 7 5 6 12 11 9 10 8 15 13 14 4 3 1 2
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 15/16 : 0 7 5 14 12 11 9 10 8 15 13 6 4 3 1 2
The device map shows that ranks 0-7 are on one server and ranks 8-15 on the other:
### Device maps
## GPU map
# Rank 0 Group 0 Pid 252978 on bm-2204kzq device 0 [0x0f] NVIDIA H20
# Rank 1 Group 0 Pid 252979 on bm-2204kzq device 1 [0x34] NVIDIA H20
# Rank 2 Group 0 Pid 252980 on bm-2204kzq device 2 [0x48] NVIDIA H20
# Rank 3 Group 0 Pid 252981 on bm-2204kzq device 3 [0x5a] NVIDIA H20
# Rank 4 Group 0 Pid 252982 on bm-2204kzq device 4 [0x87] NVIDIA H20
# Rank 5 Group 0 Pid 252983 on bm-2204kzq device 5 [0xae] NVIDIA H20
# Rank 6 Group 0 Pid 252984 on bm-2204kzq device 6 [0xc2] NVIDIA H20
# Rank 7 Group 0 Pid 252985 on bm-2204kzq device 7 [0xd7] NVIDIA H20
# Rank 8 Group 0 Pid 253834 on bm-2204qhn device 0 [0x0f] NVIDIA H20
# Rank 9 Group 0 Pid 253835 on bm-2204qhn device 1 [0x34] NVIDIA H20
# Rank 10 Group 0 Pid 253836 on bm-2204qhn device 2 [0x48] NVIDIA H20
# Rank 11 Group 0 Pid 253837 on bm-2204qhn device 3 [0x5a] NVIDIA H20
# Rank 12 Group 0 Pid 253838 on bm-2204qhn device 4 [0x87] NVIDIA H20
# Rank 13 Group 0 Pid 253839 on bm-2204qhn device 5 [0xae] NVIDIA H20
# Rank 14 Group 0 Pid 253840 on bm-2204qhn device 6 [0xc2] NVIDIA H20
# Rank 15 Group 0 Pid 253841 on bm-2204qhn device 7 [0xd7] NVIDIA H20
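The ring orders and the device map are enough to work out which hops leave a server. The sketch below takes the four distinct ring orders from the channel list (channels 04 to 15 repeat channels 00 to 03) and assumes, as the device map reports, that ranks 0-7 sit on one server and ranks 8-15 on the other. The function name `cross_node_hops` is made up for this example.

```python
# The four distinct ring orders from the channel list; later channels repeat them.
RINGS = {
    0: [0, 7, 5, 6, 4, 3, 1, 2, 8, 15, 13, 14, 12, 11, 9, 10],
    1: [0, 7, 5, 6, 4, 3, 1, 10, 8, 15, 13, 14, 12, 11, 9, 2],
    2: [0, 7, 5, 6, 12, 11, 9, 10, 8, 15, 13, 14, 4, 3, 1, 2],
    3: [0, 7, 5, 14, 12, 11, 9, 10, 8, 15, 13, 6, 4, 3, 1, 2],
}
GPUS_PER_NODE = 8  # ranks 0-7 on one server, ranks 8-15 on the other


def cross_node_hops(ring, gpus_per_node=GPUS_PER_NODE):
    """Return the (src, dst) hops of a ring that cross between servers."""
    hops = []
    for i, src in enumerate(ring):
        dst = ring[(i + 1) % len(ring)]  # the ring wraps from the last rank to the first
        if src // gpus_per_node != dst // gpus_per_node:
            hops.append((src, dst))
    return hops


for channel, ring in RINGS.items():
    print(channel, cross_node_hops(ring))
# 0 [(2, 8), (10, 0)]
# 1 [(1, 10), (9, 2)]
# 2 [(6, 12), (14, 4)]
# 3 [(5, 14), (13, 6)]
```

Every ring crosses the network exactly twice, once in each direction, and all other hops stay inside a server.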
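Combined with the per-channel path details (see the appendix), the inter-node traffic across all 16 channels comes down to just eight fixed rank pairs: 10-0, 2-8, 1-10, 9-2, 6-12, 14-4, 5-14, and 13-6. Correspondingly, the only NICs that carry traffic are: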
<bm-2204kzq>         <bm-2204qhn>
NIC0: mlx5_0  <--->  NIC0: mlx5_0
NIC2: mlx5_2  <--->  NIC2: mlx5_2
NIC3: mlx5_3  <--->  NIC3: mlx5_3
NIC4: mlx5_4  <--->  NIC4: mlx5_4
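The NIC used by each inter-node hop can be pulled out of the path lines in the appendix. The sketch below assumes lines in exactly the format shown there, including the trailing NIC name such as mlx5_0:1/RoCE; the regular expression and the function name `nic_pairs` are written for this example and may need adjusting for logs from other NCCL versions.

```python
import re

# Matches the inter-node [send]/[receive] lines in the format used in the appendix.
HOP = re.compile(
    r"^(?P<host>[^:]+):\d+:\d+ \[\d+\] NCCL INFO Channel (?P<ch>\d+)/\d+ : "
    r"(?P<src>\d+)\[\d+\] -> (?P<dst>\d+)\[\d+\] \[(?P<role>send|receive)\] "
    r"via NET/IB/\S+ (?P<nic>mlx5_\d+):\d+/RoCE$"
)

SAMPLE_LOG = """\
bm-2204kzq:252979:253119 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM
bm-2204kzq:252980:253120 [2] NCCL INFO Channel 00/0 : 2[2] -> 8[0] [send] via NET/IB/3(0)/GDRDMA mlx5_0:1/RoCE
bm-2204qhn:253834:253970 [0] NCCL INFO Channel 00/0 : 2[2] -> 8[0] [receive] via NET/IB/0/GDRDMA mlx5_0:1/RoCE
bm-2204kzq:252979:253119 [1] NCCL INFO Channel 01/0 : 1[1] -> 10[2] [send] via NET/IB/0(2)/GDRDMA mlx5_2:1/RoCE
bm-2204qhn:253836:253974 [2] NCCL INFO Channel 01/0 : 1[1] -> 10[2] [receive] via NET/IB/1/GDRDMA mlx5_2:1/RoCE
"""


def nic_pairs(log_text):
    """Map each inter-node (src, dst) rank pair to its sending and receiving NIC."""
    hops = {}
    for line in log_text.splitlines():
        m = HOP.match(line.strip())
        if m is None:
            continue  # intra-node P2P lines and anything else are ignored
        key = (int(m["src"]), int(m["dst"]))
        hops.setdefault(key, {})[m["role"]] = (m["host"], m["nic"])
    return hops


for pair, ends in nic_pairs(SAMPLE_LOG).items():
    print(pair, ends["send"], "->", ends["receive"])
```

On the sample lines, the hop 2 -> 8 uses mlx5_0 on both servers and the hop 1 -> 10 uses mlx5_2 on both servers, so each hop stays on the same-named NIC at both ends.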
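Performance before optimization was poor because all concurrent inter-node flows had to cross between the two switches and be load-balanced over the four interconnect links, and the existing ECMP load balancing does not handle large flows well, which created a bottleneck.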
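A toy model shows why a handful of large flows rarely spread evenly under hash-based balancing. The sketch below assumes four equal flows in one direction (one per NIC pair), four inter-switch links, and a hash that places each flow on a link uniformly and independently. These are simplifying assumptions for illustration, not a description of the switch's actual ECMP hash. It enumerates every possible placement and counts how many flows land on the busiest link.

```python
from collections import Counter
from itertools import product


def max_load_distribution(n_flows, n_links):
    """Count, over all flow-to-link placements, how loaded the busiest link is."""
    dist = Counter()
    for placement in product(range(n_links), repeat=n_flows):
        busiest = max(Counter(placement).values())
        dist[busiest] += 1
    return dist


dist = max_load_distribution(4, 4)
total = sum(dist.values())
for load in sorted(dist):
    print(f"busiest link carries {load} flow(s): {dist[load]}/{total}")
# busiest link carries 1 flow(s): 24/256
# busiest link carries 2 flow(s): 180/256
# busiest link carries 3 flow(s): 48/256
# busiest link carries 4 flow(s): 4/256
```

Under these assumptions only 24 of 256 placements (about 9%) give every flow its own link. In every other case at least two line-rate flows share one 400G link while another link sits idle.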
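So when designing a scale-out network topology, all NICs on the same rail across the cluster should be connected to the same switch, so that the cluster reaches its best performance.
After re-cabling this way, the bus bandwidth measured across the RoCE switch (CX732Q-N) with four cards per server was about 195 GB/s, close to the figure obtained with the NICs directly connected.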
Appendix
## NIC map
bm-2204kzq:252982:252982 [*] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0
bm-2204kzq:252982:252982 [*] NCCL INFO Bootstrap : Using bond0:172.17.0.215<0>
bm-2204kzq:252982:252982 [*] NCCL INFO NCCL version 2.22.3+cuda12.6
bm-2204kzq:252985:253055 [*] NCCL INFO NET/IB : Using [0]mlx5_2:1/RoCE [1]mlx5_3:1/RoCE [2]mlx5_4:1/RoCE [3]mlx5_0:1/RoCE [RO]; OOB bond0:172.17.0.215<0>
bm-2204qhn:253837:253837 [*] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0
bm-2204qhn:253837:253837 [*] NCCL INFO Bootstrap : Using bond0:172.17.0.81<0>
bm-2204qhn:253837:253837 [*] NCCL INFO NCCL version 2.22.3+cuda12.6
bm-2204qhn:253840:253908 [*] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_2:1/RoCE [2]mlx5_3:1/RoCE [3]mlx5_4:1/RoCE [RO]; OOB bond0:172.17.0.81<0>
## Channel C0
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 00/16 : 0 7 5 6 4 3 1 2 8 15 13 14 12 11 9 10
bm-2204kzq:
[0]mlx5_2:1/RoCE
[1]mlx5_3:1/RoCE
[2]mlx5_4:1/RoCE
[3]mlx5_0:1/RoCE
bm-2204qhn:
[0]mlx5_0:1/RoCE
[1]mlx5_2:1/RoCE
[2]mlx5_3:1/RoCE
[3]mlx5_4:1/RoCE
bm-2204kzq:252978:253115 [0] NCCL INFO Channel 00/0 : 0[0] -> 7[7] via P2P/CUMEM
bm-2204kzq:252985:253113 [7] NCCL INFO Channel 00/0 : 7[7] -> 5[5] via P2P/CUMEM
bm-2204kzq:252983:253114 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/CUMEM
bm-2204kzq:252984:253117 [6] NCCL INFO Channel 00/0 : 6[6] -> 4[4] via P2P/CUMEM
bm-2204kzq:252982:253118 [4] NCCL INFO Channel 00/0 : 4[4] -> 3[3] via P2P/CUMEM
bm-2204kzq:252981:253116 [3] NCCL INFO Channel 00/0 : 3[3] -> 1[1] via P2P/CUMEM
bm-2204kzq:252979:253119 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM
bm-2204kzq:252980:253120 [2] NCCL INFO Channel 00/0 : 2[2] -> 8[0] [send] via NET/IB/3(0)/GDRDMA mlx5_0:1/RoCE
bm-2204qhn:253834:253970 [0] NCCL INFO Channel 00/0 : 2[2] -> 8[0] [receive] via NET/IB/0/GDRDMA mlx5_0:1/RoCE
bm-2204qhn:253834:253970 [0] NCCL INFO Channel 00/0 : 8[0] -> 15[7] via P2P/CUMEM
bm-2204qhn:253841:253968 [7] NCCL INFO Channel 00/0 : 15[7] -> 13[5] via P2P/CUMEM
bm-2204qhn:253839:253969 [5] NCCL INFO Channel 00/0 : 13[5] -> 14[6] via P2P/CUMEM
bm-2204qhn:253840:253973 [6] NCCL INFO Channel 00/0 : 14[6] -> 12[4] via P2P/CUMEM
bm-2204qhn:253838:253972 [4] NCCL INFO Channel 00/0 : 12[4] -> 11[3] via P2P/CUMEM
bm-2204qhn:253837:253967 [3] NCCL INFO Channel 00/0 : 11[3] -> 9[1] via P2P/CUMEM
bm-2204qhn:253835:253971 [1] NCCL INFO Channel 00/0 : 9[1] -> 10[2] via P2P/CUMEM
bm-2204qhn:253836:253974 [2] NCCL INFO Channel 00/0 : 10[2] -> 0[0] [send] via NET/IB/0(8)/GDRDMA mlx5_0:1/RoCE
bm-2204kzq:252978:253115 [0] NCCL INFO Channel 00/0 : 10[2] -> 0[0] [receive] via NET/IB/3/GDRDMA mlx5_0:1/RoCE
## Channel C1
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 01/16 : 0 7 5 6 4 3 1 10 8 15 13 14 12 11 9 2
bm-2204kzq:
[0]mlx5_2:1/RoCE
[1]mlx5_3:1/RoCE
[2]mlx5_4:1/RoCE
[3]mlx5_0:1/RoCE
bm-2204qhn:
[0]mlx5_0:1/RoCE
[1]mlx5_2:1/RoCE
[2]mlx5_3:1/RoCE
[3]mlx5_4:1/RoCE
bm-2204kzq:252978:253115 [0] NCCL INFO Channel 01/0 : 0[0] -> 7[7] via P2P/CUMEM
bm-2204kzq:252985:253113 [7] NCCL INFO Channel 01/0 : 7[7] -> 5[5] via P2P/CUMEM
bm-2204kzq:252983:253114 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/CUMEM
bm-2204kzq:252984:253117 [6] NCCL INFO Channel 01/0 : 6[6] -> 4[4] via P2P/CUMEM
bm-2204kzq:252982:253118 [4] NCCL INFO Channel 01/0 : 4[4] -> 3[3] via P2P/CUMEM
bm-2204kzq:252981:253116 [3] NCCL INFO Channel 01/0 : 3[3] -> 1[1] via P2P/CUMEM
bm-2204kzq:252979:253119 [1] NCCL INFO Channel 01/0 : 1[1] -> 10[2] [send] via NET/IB/0(2)/GDRDMA mlx5_2:1/RoCE
bm-2204qhn:253836:253974 [2] NCCL INFO Channel 01/0 : 1[1] -> 10[2] [receive] via NET/IB/1/GDRDMA mlx5_2:1/RoCE
bm-2204qhn:253836:253974 [2] NCCL INFO Channel 01/0 : 10[2] -> 8[0] via P2P/CUMEM
bm-2204qhn:253834:253970 [0] NCCL INFO Channel 01/0 : 8[0] -> 15[7] via P2P/CUMEM
bm-2204qhn:253841:253968 [7] NCCL INFO Channel 01/0 : 15[7] -> 13[5] via P2P/CUMEM
bm-2204qhn:253839:253969 [5] NCCL INFO Channel 01/0 : 13[5] -> 14[6] via P2P/CUMEM
bm-2204qhn:253840:253973 [6] NCCL INFO Channel 01/0 : 14[6] -> 12[4] via P2P/CUMEM
bm-2204qhn:253838:253972 [4] NCCL INFO Channel 01/0 : 12[4] -> 11[3] via P2P/CUMEM
bm-2204qhn:253837:253967 [3] NCCL INFO Channel 01/0 : 11[3] -> 9[1] via P2P/CUMEM
bm-2204qhn:253835:253971 [1] NCCL INFO Channel 01/0 : 9[1] -> 2[2] [send] via NET/IB/1(10)/GDRDMA mlx5_2:1/RoCE
bm-2204kzq:252980:253120 [2] NCCL INFO Channel 01/0 : 9[1] -> 2[2] [receive] via NET/IB/0/GDRDMA mlx5_2:1/RoCE
bm-2204kzq:252980:253120 [2] NCCL INFO Channel 01/0 : 2[2] -> 0[0] via P2P/CUMEM
## Channel C2
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 02/16 : 0 7 5 6 12 11 9 10 8 15 13 14 4 3 1 2
bm-2204kzq:
[0]mlx5_2:1/RoCE
[1]mlx5_3:1/RoCE
[2]mlx5_4:1/RoCE
[3]mlx5_0:1/RoCE
bm-2204qhn:
[0]mlx5_0:1/RoCE
[1]mlx5_2:1/RoCE
[2]mlx5_3:1/RoCE
[3]mlx5_4:1/RoCE
bm-2204kzq:252978:253115 [0] NCCL INFO Channel 02/0 : 0[0] -> 7[7] via P2P/CUMEM
bm-2204kzq:252985:253113 [7] NCCL INFO Channel 02/0 : 7[7] -> 5[5] via P2P/CUMEM
bm-2204kzq:252983:253114 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/CUMEM
bm-2204kzq:252984:253117 [6] NCCL INFO Channel 02/0 : 6[6] -> 12[4] [send] via NET/IB/1(4)/GDRDMA mlx5_3:1/RoCE
bm-2204qhn:253838:253972 [4] NCCL INFO Channel 02/0 : 6[6] -> 12[4] [receive] via NET/IB/2/GDRDMA mlx5_3:1/RoCE
bm-2204qhn:253838:253972 [4] NCCL INFO Channel 02/0 : 12[4] -> 11[3] via P2P/CUMEM
bm-2204qhn:253837:253967 [3] NCCL INFO Channel 02/0 : 11[3] -> 9[1] via P2P/CUMEM
bm-2204qhn:253835:253971 [1] NCCL INFO Channel 02/0 : 9[1] -> 10[2] via P2P/CUMEM
bm-2204qhn:253836:253974 [2] NCCL INFO Channel 02/0 : 10[2] -> 8[0] via P2P/CUMEM
bm-2204qhn:253834:253970 [0] NCCL INFO Channel 02/0 : 8[0] -> 15[7] via P2P/CUMEM
bm-2204qhn:253841:253968 [7] NCCL INFO Channel 02/0 : 15[7] -> 13[5] via P2P/CUMEM
bm-2204qhn:253839:253969 [5] NCCL INFO Channel 02/0 : 13[5] -> 14[6] via P2P/CUMEM
bm-2204qhn:253840:253973 [6] NCCL INFO Channel 02/0 : 14[6] -> 4[4] [send] via NET/IB/2(12)/GDRDMA mlx5_3:1/RoCE
bm-2204kzq:252982:253118 [4] NCCL INFO Channel 02/0 : 14[6] -> 4[4] [receive] via NET/IB/1/GDRDMA mlx5_3:1/RoCE
bm-2204kzq:252982:253118 [4] NCCL INFO Channel 02/0 : 4[4] -> 3[3] via P2P/CUMEM
bm-2204kzq:252981:253116 [3] NCCL INFO Channel 02/0 : 3[3] -> 1[1] via P2P/CUMEM
bm-2204kzq:252979:253119 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/CUMEM
bm-2204kzq:252980:253120 [2] NCCL INFO Channel 02/0 : 2[2] -> 0[0] via P2P/CUMEM
## Channel C3
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 03/16 : 0 7 5 14 12 11 9 10 8 15 13 6 4 3 1 2
bm-2204kzq:
[0]mlx5_2:1/RoCE
[1]mlx5_3:1/RoCE
[2]mlx5_4:1/RoCE
[3]mlx5_0:1/RoCE
bm-2204qhn:
[0]mlx5_0:1/RoCE
[1]mlx5_2:1/RoCE
[2]mlx5_3:1/RoCE
[3]mlx5_4:1/RoCE
bm-2204kzq:252978:253115 [0] NCCL INFO Channel 03/0 : 0[0] -> 7[7] via P2P/CUMEM
bm-2204kzq:252985:253113 [7] NCCL INFO Channel 03/0 : 7[7] -> 5[5] via P2P/CUMEM
bm-2204kzq:252983:253114 [5] NCCL INFO Channel 03/0 : 5[5] -> 14[6] [send] via NET/IB/2(6)/GDRDMA mlx5_4:1/RoCE
bm-2204qhn:253840:253973 [6] NCCL INFO Channel 03/0 : 5[5] -> 14[6] [receive] via NET/IB/3/GDRDMA mlx5_4:1/RoCE
bm-2204qhn:253840:253973 [6] NCCL INFO Channel 03/0 : 14[6] -> 12[4] via P2P/CUMEM
bm-2204qhn:253838:253972 [4] NCCL INFO Channel 03/0 : 12[4] -> 11[3] via P2P/CUMEM
bm-2204qhn:253837:253967 [3] NCCL INFO Channel 03/0 : 11[3] -> 9[1] via P2P/CUMEM
bm-2204qhn:253835:253971 [1] NCCL INFO Channel 03/0 : 9[1] -> 10[2] via P2P/CUMEM
bm-2204qhn:253836:253974 [2] NCCL INFO Channel 03/0 : 10[2] -> 8[0] via P2P/CUMEM
bm-2204qhn:253834:253970 [0] NCCL INFO Channel 03/0 : 8[0] -> 15[7] via P2P/CUMEM
bm-2204qhn:253841:253968 [7] NCCL INFO Channel 03/0 : 15[7] -> 13[5] via P2P/CUMEM
bm-2204qhn:253839:253969 [5] NCCL INFO Channel 03/0 : 13[5] -> 6[6] [send] via NET/IB/3(14)/GDRDMA mlx5_4:1/RoCE
bm-2204kzq:252984:253117 [6] NCCL INFO Channel 03/0 : 13[5] -> 6[6] [receive] via NET/IB/2/GDRDMA mlx5_4:1/RoCE
bm-2204kzq:252984:253117 [6] NCCL INFO Channel 03/0 : 6[6] -> 4[4] via P2P/CUMEM
bm-2204kzq:252982:253118 [4] NCCL INFO Channel 03/0 : 4[4] -> 3[3] via P2P/CUMEM
bm-2204kzq:252981:253116 [3] NCCL INFO Channel 03/0 : 3[3] -> 1[1] via P2P/CUMEM
bm-2204kzq:252979:253119 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/CUMEM
bm-2204kzq:252980:253120 [2] NCCL INFO Channel 03/0 : 2[2] -> 0[0] via P2P/CUMEM
## Channel C4
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 04/16 : 0 7 5 6 4 3 1 2 8 15 13 14 12 11 9 10
bm-2204kzq:
[0]mlx5_2:1/RoCE
[1]mlx5_3:1/RoCE
[2]mlx5_4:1/RoCE
[3]mlx5_0:1/RoCE
bm-2204qhn:
[0]mlx5_0:1/RoCE
[1]mlx5_2:1/RoCE
[2]mlx5_3:1/RoCE
[3]mlx5_4:1/RoCE
bm-2204kzq:252978:253115 [0] NCCL INFO Channel 04/0 : 0[0] -> 7[7] via P2P/CUMEM
bm-2204kzq:252985:253113 [7] NCCL INFO Channel 04/0 : 7[7] -> 5[5] via P2P/CUMEM
bm-2204kzq:252983:253114 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/CUMEM
bm-2204kzq:252984:253117 [6] NCCL INFO Channel 04/0 : 6[6] -> 4[4] via P2P/CUMEM
bm-2204kzq:252982:253118 [4] NCCL INFO Channel 04/0 : 4[4] -> 3[3] via P2P/CUMEM
bm-2204kzq:252981:253116 [3] NCCL INFO Channel 04/0 : 3[3] -> 1[1] via P2P/CUMEM
bm-2204kzq:252979:253119 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/CUMEM
bm-2204kzq:252980:253120 [2] NCCL INFO Channel 04/0 : 2[2] -> 8[0] [send] via NET/IB/3(0)/GDRDMA mlx5_0:1/RoCE
bm-2204qhn:253834:253970 [0] NCCL INFO Channel 04/0 : 2[2] -> 8[0] [receive] via NET/IB/0/GDRDMA mlx5_0:1/RoCE
bm-2204qhn:253834:253970 [0] NCCL INFO Channel 04/0 : 8[0] -> 15[7] via P2P/CUMEM
bm-2204qhn:253841:253968 [7] NCCL INFO Channel 04/0 : 15[7] -> 13[5] via P2P/CUMEM
bm-2204qhn:253839:253969 [5] NCCL INFO Channel 04/0 : 13[5] -> 14[6] via P2P/CUMEM
bm-2204qhn:253840:253973 [6] NCCL INFO Channel 04/0 : 14[6] -> 12[4] via P2P/CUMEM
bm-2204qhn:253838:253972 [4] NCCL INFO Channel 04/0 : 12[4] -> 11[3] via P2P/CUMEM
bm-2204qhn:253837:253967 [3] NCCL INFO Channel 04/0 : 11[3] -> 9[1] via P2P/CUMEM
bm-2204qhn:253835:253971 [1] NCCL INFO Channel 04/0 : 9[1] -> 10[2] via P2P/CUMEM
bm-2204qhn:253836:253974 [2] NCCL INFO Channel 04/0 : 10[2] -> 0[0] [send] via NET/IB/0(8)/GDRDMA mlx5_0:1/RoCE
bm-2204kzq:252978:253115 [0] NCCL INFO Channel 04/0 : 10[2] -> 0[0] [receive] via NET/IB/3/GDRDMA mlx5_0:1/RoCE
## Channel C5
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 05/16 : 0 7 5 6 4 3 1 10 8 15 13 14 12 11 9 2
bm-2204kzq:
[0]mlx5_2:1/RoCE
[1]mlx5_3:1/RoCE
[2]mlx5_4:1/RoCE
[3]mlx5_0:1/RoCE
bm-2204qhn:
[0]mlx5_0:1/RoCE
[1]mlx5_2:1/RoCE
[2]mlx5_3:1/RoCE
[3]mlx5_4:1/RoCE
bm-2204kzq:252978:253115 [0] NCCL INFO Channel 05/0 : 0[0] -> 7[7] via P2P/CUMEM
bm-2204kzq:252985:253113 [7] NCCL INFO Channel 05/0 : 7[7] -> 5[5] via P2P/CUMEM
bm-2204kzq:252983:253114 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/CUMEM
bm-2204kzq:252984:253117 [6] NCCL INFO Channel 05/0 : 6[6] -> 4[4] via P2P/CUMEM
bm-2204kzq:252982:253118 [4] NCCL INFO Channel 05/0 : 4[4] -> 3[3] via P2P/CUMEM
bm-2204kzq:252981:253116 [3] NCCL INFO Channel 05/0 : 3[3] -> 1[1] via P2P/CUMEM
bm-2204kzq:252979:253119 [1] NCCL INFO Channel 05/0 : 1[1] -> 10[2] [send] via NET/IB/0(2)/GDRDMA mlx5_2:1/RoCE
bm-2204qhn:253836:253974 [2] NCCL INFO Channel 05/0 : 1[1] -> 10[2] [receive] via NET/IB/1/GDRDMA mlx5_2:1/RoCE
bm-2204qhn:253836:253974 [2] NCCL INFO Channel 05/0 : 10[2] -> 8[0] via P2P/CUMEM
bm-2204qhn:253834:253970 [0] NCCL INFO Channel 05/0 : 8[0] -> 15[7] via P2P/CUMEM
bm-2204qhn:253841:253968 [7] NCCL INFO Channel 05/0 : 15[7] -> 13[5] via P2P/CUMEM
bm-2204qhn:253839:253969 [5] NCCL INFO Channel 05/0 : 13[5] -> 14[6] via P2P/CUMEM
bm-2204qhn:253840:253973 [6] NCCL INFO Channel 05/0 : 14[6] -> 12[4] via P2P/CUMEM
bm-2204qhn:253838:253972 [4] NCCL INFO Channel 05/0 : 12[4] -> 11[3] via P2P/CUMEM
bm-2204qhn:253837:253967 [3] NCCL INFO Channel 05/0 : 11[3] -> 9[1] via P2P/CUMEM
bm-2204qhn:253835:253971 [1] NCCL INFO Channel 05/0 : 9[1] -> 2[2] [send] via NET/IB/1(10)/GDRDMA mlx5_2:1/RoCE
bm-2204kzq:252980:253120 [2] NCCL INFO Channel 05/0 : 9[1] -> 2[2] [receive] via NET/IB/0/GDRDMA mlx5_2:1/RoCE
bm-2204kzq:252980:253120 [2] NCCL INFO Channel 05/0 : 2[2] -> 0[0] via P2P/CUMEM
## Channel C6
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 06/16 : 0 7 5 6 12 11 9 10 8 15 13 14 4 3 1 2
bm-2204kzq:
[0]mlx5_2:1/RoCE
[1]mlx5_3:1/RoCE
[2]mlx5_4:1/RoCE
[3]mlx5_0:1/RoCE
bm-2204qhn:
[0]mlx5_0:1/RoCE
[1]mlx5_2:1/RoCE
[2]mlx5_3:1/RoCE
[3]mlx5_4:1/RoCE
bm-2204kzq:252978:253115 [0] NCCL INFO Channel 06/0 : 0[0] -> 7[7] via P2P/CUMEM
bm-2204kzq:252985:253113 [7] NCCL INFO Channel 06/0 : 7[7] -> 5[5] via P2P/CUMEM
bm-2204kzq:252983:253114 [5] NCCL INFO Channel 06/0 : 5[5] -> 6[6] via P2P/CUMEM
bm-2204kzq:252984:253117 [6] NCCL INFO Channel 06/0 : 6[6] -> 12[4] [send] via NET/IB/1(4)/GDRDMA mlx5_3:1/RoCE
bm-2204qhn:253838:253972 [4] NCCL INFO Channel 06/0 : 6[6] -> 12[4] [receive] via NET/IB/2/GDRDMA mlx5_3:1/RoCE
bm-2204qhn:253838:253972 [4] NCCL INFO Channel 06/0 : 12[4] -> 11[3] via P2P/CUMEM
bm-2204qhn:253837:253967 [3] NCCL INFO Channel 06/0 : 11[3] -> 9[1] via P2P/CUMEM
bm-2204qhn:253835:253971 [1] NCCL INFO Channel 06/0 : 9[1] -> 10[2] via P2P/CUMEM
bm-2204qhn:253836:253974 [2] NCCL INFO Channel 06/0 : 10[2] -> 8[0] via P2P/CUMEM
bm-2204qhn:253834:253970 [0] NCCL INFO Channel 06/0 : 8[0] -> 15[7] via P2P/CUMEM
bm-2204qhn:253841:253968 [7] NCCL INFO Channel 06/0 : 15[7] -> 13[5] via P2P/CUMEM
bm-2204qhn:253839:253969 [5] NCCL INFO Channel 06/0 : 13[5] -> 14[6] via P2P/CUMEM
bm-2204qhn:253840:253973 [6] NCCL INFO Channel 06/0 : 14[6] -> 4[4] [send] via NET/IB/2(12)/GDRDMA mlx5_3:1/RoCE
bm-2204kzq:252982:253118 [4] NCCL INFO Channel 06/0 : 14[6] -> 4[4] [receive] via NET/IB/1/GDRDMA mlx5_3:1/RoCE
bm-2204kzq:252982:253118 [4] NCCL INFO Channel 06/0 : 4[4] -> 3[3] via P2P/CUMEM
bm-2204kzq:252981:253116 [3] NCCL INFO Channel 06/0 : 3[3] -> 1[1] via P2P/CUMEM
bm-2204kzq:252979:253119 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/CUMEM
bm-2204kzq:252980:253120 [2] NCCL INFO Channel 06/0 : 2[2] -> 0[0] via P2P/CUMEM
## Channel C7
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 07/16 : 0 7 5 14 12 11 9 10 8 15 13 6 4 3 1 2
bm-2204kzq:
[0]mlx5_2:1/RoCE
[1]mlx5_3:1/RoCE
[2]mlx5_4:1/RoCE
[3]mlx5_0:1/RoCE
bm-2204qhn:
[0]mlx5_0:1/RoCE
[1]mlx5_2:1/RoCE
[2]mlx5_3:1/RoCE
[3]mlx5_4:1/RoCE
bm-2204kzq:252978:253115 [0] NCCL INFO Channel 07/0 : 0[0] -> 7[7] via P2P/CUMEM
bm-2204kzq:252985:253113 [7] NCCL INFO Channel 07/0 : 7[7] -> 5[5] via P2P/CUMEM
bm-2204kzq:252983:253114 [5] NCCL INFO Channel 07/0 : 5[5] -> 14[6] [send] via NET/IB/2(6)/GDRDMA mlx5_4:1/RoCE
bm-2204qhn:253840:253973 [6] NCCL INFO Channel 07/0 : 5[5] -> 14[6] [receive] via NET/IB/3/GDRDMA mlx5_4:1/RoCE
bm-2204qhn:253840:253973 [6] NCCL INFO Channel 07/0 : 14[6] -> 12[4] via P2P/CUMEM
bm-2204qhn:253838:253972 [4] NCCL INFO Channel 07/0 : 12[4] -> 11[3] via P2P/CUMEM
bm-2204qhn:253837:253967 [3] NCCL INFO Channel 07/0 : 11[3] -> 9[1] via P2P/CUMEM
bm-2204qhn:253835:253971 [1] NCCL INFO Channel 07/0 : 9[1] -> 10[2] via P2P/CUMEM
bm-2204qhn:253836:253974 [2] NCCL INFO Channel 07/0 : 10[2] -> 8[0] via P2P/CUMEM
bm-2204qhn:253834:253970 [0] NCCL INFO Channel 07/0 : 8[0] -> 15[7] via P2P/CUMEM
bm-2204qhn:253841:253968 [7] NCCL INFO Channel 07/0 : 15[7] -> 13[5] via P2P/CUMEM
bm-2204qhn:253839:253969 [5] NCCL INFO Channel 07/0 : 13[5] -> 6[6] [send] via NET/IB/3(14)/GDRDMA mlx5_4:1/RoCE
bm-2204kzq:252984:253117 [6] NCCL INFO Channel 07/0 : 13[5] -> 6[6] [receive] via NET/IB/2/GDRDMA mlx5_4:1/RoCE
bm-2204kzq:252984:253117 [6] NCCL INFO Channel 07/0 : 6[6] -> 4[4] via P2P/CUMEM
bm-2204kzq:252982:253118 [4] NCCL INFO Channel 07/0 : 4[4] -> 3[3] via P2P/CUMEM
bm-2204kzq:252981:253116 [3] NCCL INFO Channel 07/0 : 3[3] -> 1[1] via P2P/CUMEM
bm-2204kzq:252979:253119 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/CUMEM
bm-2204kzq:252980:253120 [2] NCCL INFO Channel 07/0 : 2[2] -> 0[0] via P2P/CUMEM
## Channel C8
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 08/16 : 0 7 5 6 4 3 1 2 8 15 13 14 12 11 9 10
bm-2204kzq:
[0]mlx5_2:1/RoCE
[1]mlx5_3:1/RoCE
[2]mlx5_4:1/RoCE
[3]mlx5_0:1/RoCE
bm-2204qhn:
[0]mlx5_0:1/RoCE
[1]mlx5_2:1/RoCE
[2]mlx5_3:1/RoCE
[3]mlx5_4:1/RoCE
bm-2204kzq:252978:253115 [0] NCCL INFO Channel 08/0 : 0[0] -> 7[7] via P2P/CUMEM
bm-2204kzq:252985:253113 [7] NCCL INFO Channel 08/0 : 7[7] -> 5[5] via P2P/CUMEM
bm-2204kzq:252983:253114 [5] NCCL INFO Channel 08/0 : 5[5] -> 6[6] via P2P/CUMEM
bm-2204kzq:252984:253117 [6] NCCL INFO Channel 08/0 : 6[6] -> 4[4] via P2P/CUMEM
bm-2204kzq:252982:253118 [4] NCCL INFO Channel 08/0 : 4[4] -> 3[3] via P2P/CUMEM
bm-2204kzq:252981:253116 [3] NCCL INFO Channel 08/0 : 3[3] -> 1[1] via P2P/CUMEM
bm-2204kzq:252979:253119 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/CUMEM
bm-2204kzq:252980:253120 [2] NCCL INFO Channel 08/0 : 2[2] -> 8[0] [send] via NET/IB/3(0)/GDRDMA mlx5_0:1/RoCE
bm-2204qhn:253834:253970 [0] NCCL INFO Channel 08/0 : 2[2] -> 8[0] [receive] via NET/IB/0/GDRDMA mlx5_0:1/RoCE
bm-2204qhn:253834:253970 [0] NCCL INFO Channel 08/0 : 8[0] -> 15[7] via P2P/CUMEM
bm-2204qhn:253841:253968 [7] NCCL INFO Channel 08/0 : 15[7] -> 13[5] via P2P/CUMEM
bm-2204qhn:253839:253969 [5] NCCL INFO Channel 08/0 : 13[5] -> 14[6] via P2P/CUMEM
bm-2204qhn:253840:253973 [6] NCCL INFO Channel 08/0 : 14[6] -> 12[4] via P2P/CUMEM
bm-2204qhn:253838:253972 [4] NCCL INFO Channel 08/0 : 12[4] -> 11[3] via P2P/CUMEM
bm-2204qhn:253837:253967 [3] NCCL INFO Channel 08/0 : 11[3] -> 9[1] via P2P/CUMEM
bm-2204qhn:253835:253971 [1] NCCL INFO Channel 08/0 : 9[1] -> 10[2] via P2P/CUMEM
bm-2204qhn:253836:253974 [2] NCCL INFO Channel 08/0 : 10[2] -> 0[0] [send] via NET/IB/0(8)/GDRDMA mlx5_0:1/RoCE
bm-2204kzq:252978:253115 [0] NCCL INFO Channel 08/0 : 10[2] -> 0[0] [receive] via NET/IB/3/GDRDMA mlx5_0:1/RoCE
## Channel C9
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 09/16 : 0 7 5 6 4 3 1 10 8 15 13 14 12 11 9 2
bm-2204kzq:
[0]mlx5_2:1/RoCE
[1]mlx5_3:1/RoCE
[2]mlx5_4:1/RoCE
[3]mlx5_0:1/RoCE
bm-2204qhn:
[0]mlx5_0:1/RoCE
[1]mlx5_2:1/RoCE
[2]mlx5_3:1/RoCE
[3]mlx5_4:1/RoCE
bm-2204kzq:252978:253115 [0] NCCL INFO Channel 09/0 : 0[0] -> 7[7] via P2P/CUMEM
bm-2204kzq:252985:253113 [7] NCCL INFO Channel 09/0 : 7[7] -> 5[5] via P2P/CUMEM
bm-2204kzq:252983:253114 [5] NCCL INFO Channel 09/0 : 5[5] -> 6[6] via P2P/CUMEM
bm-2204kzq:252984:253117 [6] NCCL INFO Channel 09/0 : 6[6] -> 4[4] via P2P/CUMEM
bm-2204kzq:252982:253118 [4] NCCL INFO Channel 09/0 : 4[4] -> 3[3] via P2P/CUMEM
bm-2204kzq:252981:253116 [3] NCCL INFO Channel 09/0 : 3[3] -> 1[1] via P2P/CUMEM
bm-2204kzq:252979:253119 [1] NCCL INFO Channel 09/0 : 1[1] -> 10[2] [send] via NET/IB/0(2)/GDRDMA mlx5_2:1/RoCE
bm-2204qhn:253836:253974 [2] NCCL INFO Channel 09/0 : 1[1] -> 10[2] [receive] via NET/IB/1/GDRDMA mlx5_2:1/RoCE
bm-2204qhn:253836:253974 [2] NCCL INFO Channel 09/0 : 10[2] -> 8[0] via P2P/CUMEM
bm-2204qhn:253834:253970 [0] NCCL INFO Channel 09/0 : 8[0] -> 15[7] via P2P/CUMEM
bm-2204qhn:253841:253968 [7] NCCL INFO Channel 09/0 : 15[7] -> 13[5] via P2P/CUMEM
bm-2204qhn:253839:253969 [5] NCCL INFO Channel 09/0 : 13[5] -> 14[6] via P2P/CUMEM
bm-2204qhn:253840:253973 [6] NCCL INFO Channel 09/0 : 14[6] -> 12[4] via P2P/CUMEM
bm-2204qhn:253838:253972 [4] NCCL INFO Channel 09/0 : 12[4] -> 11[3] via P2P/CUMEM
bm-2204qhn:253837:253967 [3] NCCL INFO Channel 09/0 : 11[3] -> 9[1] via P2P/CUMEM
bm-2204qhn:253835:253971 [1] NCCL INFO Channel 09/0 : 9[1] -> 2[2] [send] via NET/IB/1(10)/GDRDMA mlx5_2:1/RoCE
bm-2204kzq:252980:253120 [2] NCCL INFO Channel 09/0 : 9[1] -> 2[2] [receive] via NET/IB/0/GDRDMA mlx5_2:1/RoCE
bm-2204kzq:252980:253120 [2] NCCL INFO Channel 09/0 : 2[2] -> 0[0] via P2P/CUMEM
## Channel C10
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 10/16 : 0 7 5 6 12 11 9 10 8 15 13 14 4 3 1 2
bm-2204kzq:
[0]mlx5_2:1/RoCE
[1]mlx5_3:1/RoCE
[2]mlx5_4:1/RoCE
[3]mlx5_0:1/RoCE
bm-2204qhn:
[0]mlx5_0:1/RoCE
[1]mlx5_2:1/RoCE
[2]mlx5_3:1/RoCE
[3]mlx5_4:1/RoCE
bm-2204kzq:252978:253115 [0] NCCL INFO Channel 10/0 : 0[0] -> 7[7] via P2P/CUMEM
bm-2204kzq:252985:253113 [7] NCCL INFO Channel 10/0 : 7[7] -> 5[5] via P2P/CUMEM
bm-2204kzq:252983:253114 [5] NCCL INFO Channel 10/0 : 5[5] -> 6[6] via P2P/CUMEM
bm-2204kzq:252984:253117 [6] NCCL INFO Channel 10/0 : 6[6] -> 12[4] [send] via NET/IB/1(4)/GDRDMA mlx5_3:1/RoCE
bm-2204qhn:253838:253972 [4] NCCL INFO Channel 10/0 : 6[6] -> 12[4] [receive] via NET/IB/2/GDRDMA mlx5_3:1/RoCE
bm-2204qhn:253838:253972 [4] NCCL INFO Channel 10/0 : 12[4] -> 11[3] via P2P/CUMEM
bm-2204qhn:253837:253967 [3] NCCL INFO Channel 10/0 : 11[3] -> 9[1] via P2P/CUMEM
bm-2204qhn:253835:253971 [1] NCCL INFO Channel 10/0 : 9[1] -> 10[2] via P2P/CUMEM
bm-2204qhn:253836:253974 [2] NCCL INFO Channel 10/0 : 10[2] -> 8[0] via P2P/CUMEM
bm-2204qhn:253834:253970 [0] NCCL INFO Channel 10/0 : 8[0] -> 15[7] via P2P/CUMEM
bm-2204qhn:253841:253968 [7] NCCL INFO Channel 10/0 : 15[7] -> 13[5] via P2P/CUMEM
bm-2204qhn:253839:253969 [5] NCCL INFO Channel 10/0 : 13[5] -> 14[6] via P2P/CUMEM
bm-2204qhn:253840:253973 [6] NCCL INFO Channel 10/0 : 14[6] -> 4[4] [send] via NET/IB/2(12)/GDRDMA mlx5_3:1/RoCE
bm-2204kzq:252982:253118 [4] NCCL INFO Channel 10/0 : 14[6] -> 4[4] [receive] via NET/IB/1/GDRDMA mlx5_3:1/RoCE
bm-2204kzq:252982:253118 [4] NCCL INFO Channel 10/0 : 4[4] -> 3[3] via P2P/CUMEM
bm-2204kzq:252981:253116 [3] NCCL INFO Channel 10/0 : 3[3] -> 1[1] via P2P/CUMEM
bm-2204kzq:252979:253119 [1] NCCL INFO Channel 10/0 : 1[1] -> 2[2] via P2P/CUMEM
bm-2204kzq:252980:253120 [2] NCCL INFO Channel 10/0 : 2[2] -> 0[0] via P2P/CUMEM
## Channel C11
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 11/16 : 0 7 5 14 12 11 9 10 8 15 13 6 4 3 1 2
bm-2204kzq:
[0]mlx5_2:1/RoCE
[1]mlx5_3:1/RoCE
[2]mlx5_4:1/RoCE
[3]mlx5_0:1/RoCE
bm-2204qhn:
[0]mlx5_0:1/RoCE
[1]mlx5_2:1/RoCE
[2]mlx5_3:1/RoCE
[3]mlx5_4:1/RoCE
bm-2204kzq:252978:253115 [0] NCCL INFO Channel 11/0 : 0[0] -> 7[7] via P2P/CUMEM
bm-2204kzq:252985:253113 [7] NCCL INFO Channel 11/0 : 7[7] -> 5[5] via P2P/CUMEM
bm-2204kzq:252983:253114 [5] NCCL INFO Channel 11/0 : 5[5] -> 14[6] [send] via NET/IB/2(6)/GDRDMA mlx5_4:1/RoCE
bm-2204qhn:253840:253973 [6] NCCL INFO Channel 11/0 : 5[5] -> 14[6] [receive] via NET/IB/3/GDRDMA mlx5_4:1/RoCE
bm-2204qhn:253840:253973 [6] NCCL INFO Channel 11/0 : 14[6] -> 12[4] via P2P/CUMEM
bm-2204qhn:253838:253972 [4] NCCL INFO Channel 11/0 : 12[4] -> 11[3] via P2P/CUMEM
bm-2204qhn:253837:253967 [3] NCCL INFO Channel 11/0 : 11[3] -> 9[1] via P2P/CUMEM
bm-2204qhn:253835:253971 [1] NCCL INFO Channel 11/0 : 9[1] -> 10[2] via P2P/CUMEM
bm-2204qhn:253836:253974 [2] NCCL INFO Channel 11/0 : 10[2] -> 8[0] via P2P/CUMEM
bm-2204qhn:253834:253970 [0] NCCL INFO Channel 11/0 : 8[0] -> 15[7] via P2P/CUMEM
bm-2204qhn:253841:253968 [7] NCCL INFO Channel 11/0 : 15[7] -> 13[5] via P2P/CUMEM
bm-2204qhn:253839:253969 [5] NCCL INFO Channel 11/0 : 13[5] -> 6[6] [send] via NET/IB/3(14)/GDRDMA mlx5_4:1/RoCE
bm-2204kzq:252984:253117 [6] NCCL INFO Channel 11/0 : 13[5] -> 6[6] [receive] via NET/IB/2/GDRDMA mlx5_4:1/RoCE
bm-2204kzq:252984:253117 [6] NCCL INFO Channel 11/0 : 6[6] -> 4[4] via P2P/CUMEM
bm-2204kzq:252982:253118 [4] NCCL INFO Channel 11/0 : 4[4] -> 3[3] via P2P/CUMEM
bm-2204kzq:252981:253116 [3] NCCL INFO Channel 11/0 : 3[3] -> 1[1] via P2P/CUMEM
bm-2204kzq:252979:253119 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/CUMEM
bm-2204kzq:252980:253120 [2] NCCL INFO Channel 11/0 : 2[2] -> 0[0] via P2P/CUMEM
## Channel C12
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 12/16 : 0 7 5 6 4 3 1 2 8 15 13 14 12 11 9 10
bm-2204kzq:
[0]mlx5_2:1/RoCE
[1]mlx5_3:1/RoCE
[2]mlx5_4:1/RoCE
[3]mlx5_0:1/RoCE
bm-2204qhn:
[0]mlx5_0:1/RoCE
[1]mlx5_2:1/RoCE
[2]mlx5_3:1/RoCE
[3]mlx5_4:1/RoCE
bm-2204kzq:252978:253115 [0] NCCL INFO Channel 12/0 : 0[0] -> 7[7] via P2P/CUMEM
bm-2204kzq:252985:253113 [7] NCCL INFO Channel 12/0 : 7[7] -> 5[5] via P2P/CUMEM
bm-2204kzq:252983:253114 [5] NCCL INFO Channel 12/0 : 5[5] -> 6[6] via P2P/CUMEM
bm-2204kzq:252984:253117 [6] NCCL INFO Channel 12/0 : 6[6] -> 4[4] via P2P/CUMEM
bm-2204kzq:252982:253118 [4] NCCL INFO Channel 12/0 : 4[4] -> 3[3] via P2P/CUMEM
bm-2204kzq:252981:253116 [3] NCCL INFO Channel 12/0 : 3[3] -> 1[1] via P2P/CUMEM
bm-2204kzq:252979:253119 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/CUMEM
bm-2204kzq:252980:253120 [2] NCCL INFO Channel 12/0 : 2[2] -> 8[0] [send] via NET/IB/3(0)/GDRDMA mlx5_0:1/RoCE
bm-2204qhn:253834:253970 [0] NCCL INFO Channel 12/0 : 2[2] -> 8[0] [receive] via NET/IB/0/GDRDMA mlx5_0:1/RoCE
bm-2204qhn:253834:253970 [0] NCCL INFO Channel 12/0 : 8[0] -> 15[7] via P2P/CUMEM
bm-2204qhn:253841:253968 [7] NCCL INFO Channel 12/0 : 15[7] -> 13[5] via P2P/CUMEM
bm-2204qhn:253839:253969 [5] NCCL INFO Channel 12/0 : 13[5] -> 14[6] via P2P/CUMEM
bm-2204qhn:253840:253973 [6] NCCL INFO Channel 12/0 : 14[6] -> 12[4] via P2P/CUMEM
bm-2204qhn:253838:253972 [4] NCCL INFO Channel 12/0 : 12[4] -> 11[3] via P2P/CUMEM
bm-2204qhn:253837:253967 [3] NCCL INFO Channel 12/0 : 11[3] -> 9[1] via P2P/CUMEM
bm-2204qhn:253835:253971 [1] NCCL INFO Channel 12/0 : 9[1] -> 10[2] via P2P/CUMEM
bm-2204qhn:253836:253974 [2] NCCL INFO Channel 12/0 : 10[2] -> 0[0] [send] via NET/IB/0(8)/GDRDMA mlx5_0:1/RoCE
bm-2204kzq:252978:253115 [0] NCCL INFO Channel 12/0 : 10[2] -> 0[0] [receive] via NET/IB/3/GDRDMA mlx5_0:1/RoCE
## Channel C13
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 13/16 : 0 7 5 6 4 3 1 10 8 15 13 14 12 11 9 2
bm-2204kzq:
[0]mlx5_2:1/RoCE
[1]mlx5_3:1/RoCE
[2]mlx5_4:1/RoCE
[3]mlx5_0:1/RoCE
bm-2204qhn:
[0]mlx5_0:1/RoCE
[1]mlx5_2:1/RoCE
[2]mlx5_3:1/RoCE
[3]mlx5_4:1/RoCE
bm-2204kzq:252980:253120 [2] NCCL INFO Channel 13/0 : 2[2] -> 0[0] via P2P/CUMEM
bm-2204kzq:252978:253115 [0] NCCL INFO Channel 13/0 : 0[0] -> 7[7] via P2P/CUMEM
bm-2204kzq:252985:253113 [7] NCCL INFO Channel 13/0 : 7[7] -> 5[5] via P2P/CUMEM
bm-2204kzq:252983:253114 [5] NCCL INFO Channel 13/0 : 5[5] -> 6[6] via P2P/CUMEM
bm-2204kzq:252984:253117 [6] NCCL INFO Channel 13/0 : 6[6] -> 4[4] via P2P/CUMEM
bm-2204kzq:252982:253118 [4] NCCL INFO Channel 13/0 : 4[4] -> 3[3] via P2P/CUMEM
bm-2204kzq:252981:253116 [3] NCCL INFO Channel 13/0 : 3[3] -> 1[1] via P2P/CUMEM
bm-2204kzq:252979:253119 [1] NCCL INFO Channel 13/0 : 1[1] -> 10[2] [send] via NET/IB/0(2)/GDRDMA mlx5_2:1/RoCE
bm-2204qhn:253836:253974 [2] NCCL INFO Channel 13/0 : 1[1] -> 10[2] [receive] via NET/IB/1/GDRDMA mlx5_2:1/RoCE
bm-2204qhn:253836:253974 [2] NCCL INFO Channel 13/0 : 10[2] -> 8[0] via P2P/CUMEM
bm-2204qhn:253834:253970 [0] NCCL INFO Channel 13/0 : 8[0] -> 15[7] via P2P/CUMEM
bm-2204qhn:253841:253968 [7] NCCL INFO Channel 13/0 : 15[7] -> 13[5] via P2P/CUMEM
bm-2204qhn:253839:253969 [5] NCCL INFO Channel 13/0 : 13[5] -> 14[6] via P2P/CUMEM
bm-2204qhn:253840:253973 [6] NCCL INFO Channel 13/0 : 14[6] -> 12[4] via P2P/CUMEM
bm-2204qhn:253838:253972 [4] NCCL INFO Channel 13/0 : 12[4] -> 11[3] via P2P/CUMEM
bm-2204qhn:253837:253967 [3] NCCL INFO Channel 13/0 : 11[3] -> 9[1] via P2P/CUMEM
bm-2204qhn:253835:253971 [1] NCCL INFO Channel 13/0 : 9[1] -> 2[2] [send] via NET/IB/1(10)/GDRDMA mlx5_2:1/RoCE
bm-2204kzq:252980:253120 [2] NCCL INFO Channel 13/0 : 9[1] -> 2[2] [receive] via NET/IB/0/GDRDMA mlx5_2:1/RoCE
## Channel C14
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 14/16 : 0 7 5 6 12 11 9 10 8 15 13 14 4 3 1 2
bm-2204kzq:
[0]mlx5_2:1/RoCE
[1]mlx5_3:1/RoCE
[2]mlx5_4:1/RoCE
[3]mlx5_0:1/RoCE
bm-2204qhn:
[0]mlx5_0:1/RoCE
[1]mlx5_2:1/RoCE
[2]mlx5_3:1/RoCE
[3]mlx5_4:1/RoCE
bm-2204kzq:252978:253115 [0] NCCL INFO Channel 14/0 : 0[0] -> 7[7] via P2P/CUMEM
bm-2204kzq:252985:253113 [7] NCCL INFO Channel 14/0 : 7[7] -> 5[5] via P2P/CUMEM
bm-2204kzq:252983:253114 [5] NCCL INFO Channel 14/0 : 5[5] -> 6[6] via P2P/CUMEM
bm-2204kzq:252984:253117 [6] NCCL INFO Channel 14/0 : 6[6] -> 12[4] [send] via NET/IB/1(4)/GDRDMA mlx5_3:1/RoCE
bm-2204qhn:253838:253972 [4] NCCL INFO Channel 14/0 : 6[6] -> 12[4] [receive] via NET/IB/2/GDRDMA mlx5_3:1/RoCE
bm-2204qhn:253838:253972 [4] NCCL INFO Channel 14/0 : 12[4] -> 11[3] via P2P/CUMEM
bm-2204qhn:253837:253967 [3] NCCL INFO Channel 14/0 : 11[3] -> 9[1] via P2P/CUMEM
bm-2204qhn:253835:253971 [1] NCCL INFO Channel 14/0 : 9[1] -> 10[2] via P2P/CUMEM
bm-2204qhn:253836:253974 [2] NCCL INFO Channel 14/0 : 10[2] -> 8[0] via P2P/CUMEM
bm-2204qhn:253834:253970 [0] NCCL INFO Channel 14/0 : 8[0] -> 15[7] via P2P/CUMEM
bm-2204qhn:253841:253968 [7] NCCL INFO Channel 14/0 : 15[7] -> 13[5] via P2P/CUMEM
bm-2204qhn:253839:253969 [5] NCCL INFO Channel 14/0 : 13[5] -> 14[6] via P2P/CUMEM
bm-2204qhn:253840:253973 [6] NCCL INFO Channel 14/0 : 14[6] -> 4[4] [send] via NET/IB/2(12)/GDRDMA mlx5_3:1/RoCE
bm-2204kzq:252982:253118 [4] NCCL INFO Channel 14/0 : 14[6] -> 4[4] [receive] via NET/IB/1/GDRDMA mlx5_3:1/RoCE
bm-2204kzq:252982:253118 [4] NCCL INFO Channel 14/0 : 4[4] -> 3[3] via P2P/CUMEM
bm-2204kzq:252981:253116 [3] NCCL INFO Channel 14/0 : 3[3] -> 1[1] via P2P/CUMEM
bm-2204kzq:252979:253119 [1] NCCL INFO Channel 14/0 : 1[1] -> 2[2] via P2P/CUMEM
bm-2204kzq:252980:253120 [2] NCCL INFO Channel 14/0 : 2[2] -> 0[0] via P2P/CUMEM
## Channel C15
bm-2204kzq:252978:253054 [0] NCCL INFO Channel 15/16 : 0 7 5 14 12 11 9 10 8 15 13 6 4 3 1 2
bm-2204kzq:
[0]mlx5_2:1/RoCE
[1]mlx5_3:1/RoCE
[2]mlx5_4:1/RoCE
[3]mlx5_0:1/RoCE
bm-2204qhn:
[0]mlx5_0:1/RoCE
[1]mlx5_2:1/RoCE
[2]mlx5_3:1/RoCE
[3]mlx5_4:1/RoCE
bm-2204kzq:252978:253115 [0] NCCL INFO Channel 15/0 : 0[0] -> 7[7] via P2P/CUMEM
bm-2204kzq:252985:253113 [7] NCCL INFO Channel 15/0 : 7[7] -> 5[5] via P2P/CUMEM
bm-2204kzq:252983:253114 [5] NCCL INFO Channel 15/0 : 5[5] -> 14[6] [send] via NET/IB/2(6)/GDRDMA mlx5_4:1/RoCE
bm-2204qhn:253840:253973 [6] NCCL INFO Channel 15/0 : 5[5] -> 14[6] [receive] via NET/IB/3/GDRDMA mlx5_4:1/RoCE
bm-2204qhn:253840:253973 [6] NCCL INFO Channel 15/0 : 14[6] -> 12[4] via P2P/CUMEM
bm-2204qhn:253838:253972 [4] NCCL INFO Channel 15/0 : 12[4] -> 11[3] via P2P/CUMEM
bm-2204qhn:253837:253967 [3] NCCL INFO Channel 15/0 : 11[3] -> 9[1] via P2P/CUMEM
bm-2204qhn:253835:253971 [1] NCCL INFO Channel 15/0 : 9[1] -> 10[2] via P2P/CUMEM
bm-2204qhn:253836:253974 [2] NCCL INFO Channel 15/0 : 10[2] -> 8[0] via P2P/CUMEM
bm-2204qhn:253834:253970 [0] NCCL INFO Channel 15/0 : 8[0] -> 15[7] via P2P/CUMEM
bm-2204qhn:253841:253968 [7] NCCL INFO Channel 15/0 : 15[7] -> 13[5] via P2P/CUMEM
bm-2204qhn:253839:253969 [5] NCCL INFO Channel 15/0 : 13[5] -> 6[6] [send] via NET/IB/3(14)/GDRDMA mlx5_4:1/RoCE
bm-2204kzq:252984:253117 [6] NCCL INFO Channel 15/0 : 13[5] -> 6[6] [receive] via NET/IB/2/GDRDMA mlx5_4:1/RoCE
bm-2204kzq:252984:253117 [6] NCCL INFO Channel 15/0 : 6[6] -> 4[4] via P2P/CUMEM
bm-2204kzq:252982:253118 [4] NCCL INFO Channel 15/0 : 4[4] -> 3[3] via P2P/CUMEM
bm-2204kzq:252981:253116 [3] NCCL INFO Channel 15/0 : 3[3] -> 1[1] via P2P/CUMEM
bm-2204kzq:252979:253119 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/CUMEM
bm-2204kzq:252980:253120 [2] NCCL INFO Channel 15/0 : 2[2] -> 0[0] via P2P/CUMEM
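
Reading the per-channel listings by eye gets tedious, so the cross-node hops (the lines marked `[send]` or `[receive]` via `NET/IB`) can be pulled out with a short script. The following is a minimal sketch, assuming every log line follows exactly the format shown above. The names `HOP_RE`, `inter_node_hops`, and `nics_per_channel` are illustrative only and are not part of NCCL or NCCL-Tests. The sample lines are copied from channels 11 and 12 of the log.

```python
import re
from collections import defaultdict

# Matches only cross-node hops, i.e. lines carrying [send]/[receive] via NET/IB.
# Intra-node "via P2P/CUMEM" lines and ring-order lines do not match.
HOP_RE = re.compile(
    r"^(?P<host>\S+?):\d+:\d+ \[\d+\] NCCL INFO Channel (?P<channel>\d+)/\d+ : "
    r"(?P<src>\d+)\[\d+\] -> (?P<dst>\d+)\[\d+\] \[(?P<direction>send|receive)\] "
    r"via NET/IB/\d+(?:\(\d+\))?/GDRDMA (?P<nic>\S+)/RoCE$"
)

# Sample lines copied from the log above (channels 11 and 12).
SAMPLE_LOG = """\
bm-2204kzq:252978:253115 [0] NCCL INFO Channel 11/0 : 0[0] -> 7[7] via P2P/CUMEM
bm-2204kzq:252983:253114 [5] NCCL INFO Channel 11/0 : 5[5] -> 14[6] [send] via NET/IB/2(6)/GDRDMA mlx5_4:1/RoCE
bm-2204qhn:253840:253973 [6] NCCL INFO Channel 11/0 : 5[5] -> 14[6] [receive] via NET/IB/3/GDRDMA mlx5_4:1/RoCE
bm-2204qhn:253839:253969 [5] NCCL INFO Channel 11/0 : 13[5] -> 6[6] [send] via NET/IB/3(14)/GDRDMA mlx5_4:1/RoCE
bm-2204kzq:252984:253117 [6] NCCL INFO Channel 11/0 : 13[5] -> 6[6] [receive] via NET/IB/2/GDRDMA mlx5_4:1/RoCE
bm-2204kzq:252980:253120 [2] NCCL INFO Channel 12/0 : 2[2] -> 8[0] [send] via NET/IB/3(0)/GDRDMA mlx5_0:1/RoCE
bm-2204qhn:253834:253970 [0] NCCL INFO Channel 12/0 : 2[2] -> 8[0] [receive] via NET/IB/0/GDRDMA mlx5_0:1/RoCE
"""


def inter_node_hops(log_text):
    """Return one dict per cross-node send/receive line found in log_text."""
    hops = []
    for line in log_text.splitlines():
        match = HOP_RE.match(line.strip())
        if match is None:
            continue  # skip P2P hops, ring-order lines, and annotations
        hops.append({
            "host": match["host"],
            "channel": int(match["channel"]),
            "src": int(match["src"]),
            "dst": int(match["dst"]),
            "direction": match["direction"],
            "nic": match["nic"],
        })
    return hops


def nics_per_channel(hops):
    """Map each channel number to the sorted list of NIC ports it uses."""
    usage = defaultdict(set)
    for hop in hops:
        usage[hop["channel"]].add(hop["nic"])
    return {channel: sorted(nics) for channel, nics in usage.items()}


if __name__ == "__main__":
    for channel, nics in sorted(nics_per_channel(inter_node_hops(SAMPLE_LOG)).items()):
        print(f"Channel {channel}: {', '.join(nics)}")
```

Run against the full log, a summary of this kind makes it easier to see which NIC port each channel uses for its cross-node hops, for example that channels 11 and 15 both cross over `mlx5_4:1`.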