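Configuration guide: storage performance metrics and common test tools

1 Storage scenarios
2 Common test tools
3 Interpreting storage performance metrics
4 Test workflow and software used
5 Using fio and reading its results
Appendix: documentation for common test tools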

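1 Storage scenarios

By product or architecture, storage can be divided into NAS, SAN, and software-defined storage. Software-defined storage can be further divided, by business scenario and architecture, into distributed storage, hyper-converged, and database appliance architectures, as shown in the diagrams below.

Figure 1: Distributed storage architecture

Figure 2: Hyper-converged architecture

Figure 3: Database appliance architecture

From the end user's point of view, storage can be divided into block (virtual disks for virtual machines, databases, and so on), file (AI, HPC, big data, and so on), and object (massive data storage). Storage performance metrics and common test tools are best understood along this end-user classification; the three architecture diagrams above are background information only.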

2 Common test tools

Scenario	Tool
Basic performance / block	dd, fio, iostat
File system	filebench, iozone, mdtest
Object storage	cosbench
Database	swingbench, hammerdb
Cloud environment	vdbench

Table 1: Common test tools by scenario

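3 Interpreting storage performance metrics

Storage performance tests fall into two dimensions overall, IO latency and IOPS, and each dimension is tested separately by read/write and by block size. One IO is a single read or write request. IO latency is the time from issuing a request to receiving the storage system's response; IOPS is the number of IO requests the storage system can process per second.

IO size also directly affects storage performance. An IO that moves a small amount of data, such as 1K, 4K, or 8K, is called a small IO; one that moves a larger amount, such as 32K, 64K, or more, is called a large IO. In general, larger IOs deliver higher throughput and smaller IOs deliver higher IOPS. In most real workloads, IO sizes are mixed.

IO is also either sequential or random. Because of the storage controller's read/write cache policy, read-ahead mechanisms, and the way the storage medium reads and writes, random IO usually performs far worse than sequential IO, and writes far worse than reads. Sequential IO means a large number of IO requests target contiguous, adjacent data blocks; typical workloads are logging, backup and restore, and streaming media, and sequential IO performance is usually the best case. Random IO means the requests target data blocks scattered across the storage medium, for example highly concurrent reads and writes of many small files, which lowers both IOPS and throughput; typical workloads are OLTP, swap partitions, and operating systems, and random IO performance is usually the worst case.

Next, consider a real storage performance test result. A Chinese database appliance vendor built its network with a Mellanox SB7700 and with an Asterfusion CX532P-N, and in each case tested the appliance's storage system with fio. The results are shown in the table below.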

	Mellanox SB7700 100G IB switch	Asterfusion CX532P-N low-latency Ethernet switch
latr (latency test, 4k random read)	141.79us	132.84us
latw (latency test, 4k random write)	79.67us	71.6us
latr-8k (latency test, 8k random read)	150.64us	145.83us
latw-8k (latency test, 8k random write)	80.89us	73.89us
4kr, 1 load server (IOPS)	1239k	1275k
4kw, 1 load server (IOPS)	493k	453k
8kr, 1 load server (IOPS)	1007k	939k
8kw, 1 load server (IOPS)	330k	310k
1024kr, 1 load server (IOPS)	11.7k	11.0k
1024kw, 1 load server (IOPS)	3709	3669
4kr, 2 load servers (IOPS)	2548k	2633k
4kw, 2 load servers (IOPS)	850k	916k
8kr, 2 load servers (IOPS)	1992k	1877k
8kw, 2 load servers (IOPS)	535k	591k
1024kr, 2 load servers (IOPS)	17474	21.2k
1024kw, 2 load servers (IOPS)	3673	4820

Table 2: Storage performance test report

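Latency was tested in a 1v1 setup (one load server against the storage system), and IOPS was stress-tested in both 1v1 and 2v1 setups. When judging a storage system, lower latency is better, since latency reflects how quickly the system responds; higher IOPS is better, and the peak value of IOPS x IO size is the system's maximum throughput.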
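The relationship between IOPS, IO size, and throughput can be checked with a few lines of Python. This is a minimal sketch: the function name `throughput_mib_s` is made up for illustration, and the inputs are the single-load-server read figures from the Mellanox column of Table 2.

```python
def throughput_mib_s(iops: float, io_size_kib: float) -> float:
    """Throughput in MiB/s for a given IOPS and IO size in KiB."""
    return iops * io_size_kib / 1024


# Single load server, Mellanox SB7700 column of Table 2: (IOPS, IO size in KiB).
samples = {
    "4k read": (1_239_000, 4),
    "8k read": (1_007_000, 8),
    "1024k read": (11_700, 1024),
}

for name, (iops, size_kib) in samples.items():
    print(f"{name}: {throughput_mib_s(iops, size_kib):,.0f} MiB/s")
```

The 1024k row has by far the lowest IOPS but the highest throughput, which is the small-IO versus large-IO trade-off described above.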

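4 Test workflow and software used

In storage scenarios, tests that involve the network usually follow three steps:

First, test the storage network itself, focusing on single-link throughput and latency. Common tools are iperf, ib_read_bw/ib_write_bw, and ib_read_lat/ib_write_lat.

Second, test the baseline performance of the storage system, focusing on its latency and throughput. The common tool is fio.

Third, run business-level compatibility, stability, and performance tests. Compatibility testing mainly checks whether the switch's API meets the requirements of the business system; stability testing covers high availability at the network device and link level; performance testing uses tools specific to the business scenario, for example swingbench and hammerdb for database appliances and cosbench for object storage.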

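5 Using fio and reading its results

5.1 About the tool

fio stands for Flexible IO Tester. It was written by Jens Axboe, who is also well known as the maintainer of the Linux kernel's block IO subsystem. fio is the Swiss Army knife of storage testing: its many tunable parameters can be combined into a very wide range of test cases, and it is still actively maintained, so it keeps up with new developments in storage.

5.2 Parameters

This demonstration measures a server's performance in an assumed small-IO workload (100% random, 70% read, 30% write, IO size 4K).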

[root@server ~]# fio \
-filename=/root/randrw_70read_4k.fio \
-direct=1 \
-iodepth=1 \
-thread \
-rw=randrw \
-rwmixread=70 \
-ioengine=psync \
-bs=4k \
-size=5G \
-numjobs=8 \
-runtime=300 \
-group_reporting \
-name=randrw_70read_4k_local

-filename=/root/randrw_70read_4k.fio

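Accepts a file, a raw disk, or an RBD image. This test targets a file system, so filename=<file name>; for an RBD image, filename=<image name>; for a raw disk, filename=<device name>. Several devices or files can be specified at once, separated by colons: -filename=/dev/vdc:/dev/vdd.

-direct=1

direct=1 uses direct (non-buffered) IO for both reads and writes, bypassing the operating system's page cache.

-iodepth=1

iodepth sets the IO queue depth, that is, how many IO requests a single thread keeps in flight at once. With a synchronous engine, the effective iodepth of a thread is always 1. With an asynchronous engine, iodepth can be raised so that a batch of IOs is submitted together, which lets the underlying IO scheduler merge requests; 32 or 64 is typical. Increase the queue depth only while response time stays acceptable: a deeper queue means IOs wait longer in the queue, so response time grows, and the two must be balanced. This guide uses 1 for single-stream IO tests and 32 for multi-stream IO tests.

-thread

By default fio creates jobs with fork, that is, as separate processes. With thread specified, jobs are created as POSIX threads via pthread_create().

-rw=randrw

Sets the IO pattern: write (sequential write), read (sequential read), rw (mixed sequential read and write), randwrite (random write), randread (random read), randrw (mixed random read and write).

-rwmixread=70

Sets the read percentage of a mixed workload. In this test, reads make up 70% of all IOs and writes 30%.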

-ioengine=psync

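Sets how fio issues IO. Options include sync (synchronous IO using read and write), psync (synchronous IO using pread and pwrite, which, unlike read and write, do not move the file position pointer), libaio (Linux native asynchronous IO; Linux supports queued asynchronous behavior only for non-buffered IO, so direct must be set to 1), posixaio (POSIX asynchronous IO, implemented by glibc in user space with its own threads, which is resource-hungry and scales poorly), rados (tests RADOS-layer IO directly through the librados interface), and rbd (tests RBD image IO directly through the librbd interface). This test uses the psync engine.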

-bs=4k

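bs is the block size, that is, the amount of data in each IO. For database workloads, small blocks such as 4k or 8k are typical and IOPS is the main metric; for large-file workloads such as video storage and archiving, large blocks such as 1m or 4m are typical and bandwidth is the main metric. With fio's default kb_base=1024, size suffixes are not case sensitive and use base 1024, so 4k and 4K both mean 4096 bytes and 1m = 1024k. This guide uses 4K for random read/write tests and 1M for sequential throughput tests.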

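-size=5G

The total amount of data each job reads and writes. This parameter and runtime both bound the run: fio stops as soon as either limit is reached.
For performance tests, set it fairly large, such as 2g, 5g, 10g, or more. When testing on a file system with a per-file size limit (for example 4 GiB on FAT32), -size must stay below that limit.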

-numjobs=8

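The number of threads or processes that run this job concurrently; whether they are threads or processes is controlled by the thread parameter described above.

-runtime=300

The total run time in seconds. Together with size, it bounds how long fio runs. For general performance tests, make it fairly long, such as 5 or 10 minutes.

-group_reporting

With multiple jobs, results are reported separately per job by default; this parameter aggregates the results of all jobs into one report.

-name=randrw_70read_4k_local

The name of this test job.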
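fio can also read the same options from an INI-style job file instead of the command line. The following is a minimal sketch that renders the parameters from this section as job-file text; the helper name `render_job` and the idea of saving the output under a name such as randrw_70read_4k.job are illustrative assumptions, not part of the original test.

```python
# Parameters from the command line in section 5.2.
# A value of None marks a flag that takes no argument.
params = {
    "filename": "/root/randrw_70read_4k.fio",
    "direct": "1",
    "iodepth": "1",
    "thread": None,
    "rw": "randrw",
    "rwmixread": "70",
    "ioengine": "psync",
    "bs": "4k",
    "size": "5G",
    "numjobs": "8",
    "runtime": "300",
    "group_reporting": None,
}


def render_job(name, options):
    """Render one fio job section: a [name] header followed by key=value lines."""
    lines = [f"[{name}]"]
    for key, value in options.items():
        lines.append(key if value is None else f"{key}={value}")
    return "\n".join(lines) + "\n"


job_text = render_job("randrw_70read_4k_local", params)
print(job_text)
```

Saved to a file, the text would be passed to fio as its only argument, which keeps a test definition under version control more easily than a long command line.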

5.3 Test results

Figure 4: fio performance test results

5.4 Reading the results

Line 16~22
Software version, run parameters, job name, and progress output.

randrw_70read_4k_local: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
...
fio-3.7
Starting 8 threads
randrw_70read_4k_local: Laying out IO file (1 file / 5120MiB)
Jobs: 8 (f=8): [m(8)][100.0%][r=404KiB/s,w=164KiB/s][r=101,w=41 IOPS][eta 00m:00s]
randrw_70read_4k_local: (groupid=0, jobs=8): err= 0: pid=49066: Wed Mar  8 14:33:21 2023

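Line 23~33

This part is the read result. Total IO latency lat = submission latency slat + completion latency clat. slat (submission latency) is the time spent submitting the IO, from fio creating the IO to the kernel starting to process it. For each latency, fio reports the minimum, maximum, average, and standard deviation. Because synchronous IO has no submission queue, slat is not shown when a synchronous IO engine is selected. clat (completion latency) is the time from the kernel starting to process the IO until the IO completes, excluding submission time.

This part also reports the latency distribution as percentiles. For example, 99.99th=[ 1020] means that 99.99% of IOs completed within 1020 ms (the unit, msec, is given in the percentiles header). The last two lines are the bandwidth and IOPS results for reads.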

# Units: 1 kB = 1000 bytes; 1 KiB = 1024 bytes.
# Time units:
# 1 second (s) = 1000 milliseconds (ms)
# 1 millisecond (ms) = 1000 microseconds (us)
# 1 microsecond (us) = 1000 nanoseconds (ns)
# 1 nanosecond (ns) = 1000 picoseconds (ps)
# Read performance
   read: IOPS=96, BW=387KiB/s (396kB/s)(113MiB/300047msec)
    clat (usec): min=159, max=1206.1k, avg=81519.63, stdev=87349.89
     lat (usec): min=159, max=1206.1k, avg=81519.97, stdev=87349.89
    clat percentiles (msec):
     |  1.00th=[    4],  5.00th=[    8], 10.00th=[   14], 20.00th=[   21],
     | 30.00th=[   30], 40.00th=[   41], 50.00th=[   54], 60.00th=[   70],
     | 70.00th=[   93], 80.00th=[  127], 90.00th=[  184], 95.00th=[  245],
     | 99.00th=[  405], 99.50th=[  493], 99.90th=[  835], 99.95th=[  885],
     | 99.99th=[ 1020]
   bw (  KiB/s): min=    7, max=  143, per=12.66%, avg=48.99, stdev=20.31, samples=4730
   iops        : min=    1, max=   35, avg=12.17, stdev= 5.09, samples=4730

Line 34~44

This part is the write result. The fields have the same meaning as in the read result above.

# Write performance
  write: IOPS=42, BW=169KiB/s (173kB/s)(49.5MiB/300047msec)
    clat (usec): min=155, max=956586, avg=2619.71, stdev=32750.22
     lat (usec): min=156, max=956586, avg=2620.25, stdev=32750.24
    clat percentiles (usec):
     |  1.00th=[   208],  5.00th=[   233], 10.00th=[   247], 20.00th=[   306],
     | 30.00th=[   330], 40.00th=[   453], 50.00th=[   529], 60.00th=[   857],
     | 70.00th=[   971], 80.00th=[  1156], 90.00th=[  1614], 95.00th=[  4047],
     | 99.00th=[ 14877], 99.50th=[ 18744], 99.90th=[750781], 99.95th=[817890],
     | 99.99th=[918553]
   bw (  KiB/s): min=    7, max=  120, per=14.85%, avg=24.95, stdev=15.98, samples=4044
   iops        : min=    1, max=   30, avg= 6.16, stdev= 4.00, samples=4044
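fio prints each bandwidth twice, once in KiB/s and once in kB/s. The short sketch below redoes that conversion for the read and write bandwidths in this report; the function name `kib_to_kb` is made up for illustration.

```python
def kib_to_kb(kib_per_s: float) -> float:
    """Convert KiB/s (1024-byte units) to kB/s (1000-byte units)."""
    return kib_per_s * 1024 / 1000


# BW values from the "read:" and "write:" summary lines of the report.
for label, bw_kib in (("read", 387), ("write", 169)):
    print(f"{label}: {bw_kib} KiB/s = {kib_to_kb(bw_kib):.0f} kB/s")
```

The results round to 396 kB/s and 173 kB/s, matching the values fio prints in parentheses.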

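Line 45~47

This part is the overall latency distribution. 250=3.28% means that 3.28% of IOs had a latency of 0 us to 250 us. Each entry is the share of IOs between the previous bound and its own, so the distribution in this test is: 0 us to 250 us 3.28%, 250 us to 500 us 11.27%, ..., 500 ms to 750 ms 0.24%, 750 ms to 1000 ms 0.13%.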

lat (usec)   : 250=3.28%, 500=11.27%, 750=2.09%, 1000=5.49%
lat (msec)   : 2=6.16%, 4=1.48%, 10=3.85%, 20=9.96%, 50=19.76%
lat (msec)   : 100=17.51%, 250=15.85%, 500=2.91%, 750=0.24%, 1000=0.13%
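To make the bucket boundaries explicit, the sketch below turns the three lat lines into (lower bound, upper bound, share, cumulative share) rows. The numbers are copied from the report; the variable names are illustrative.

```python
# (upper bound in microseconds, share of IOs in percent), copied from the
# "lat (usec)" and "lat (msec)" lines of the report.
buckets = [
    (250, 3.28), (500, 11.27), (750, 2.09), (1000, 5.49),
    (2_000, 6.16), (4_000, 1.48), (10_000, 3.85), (20_000, 9.96),
    (50_000, 19.76), (100_000, 17.51), (250_000, 15.85),
    (500_000, 2.91), (750_000, 0.24), (1_000_000, 0.13),
]

rows = []
lower = 0
cumulative = 0.0
for upper, share in buckets:
    cumulative += share
    rows.append((lower, upper, share, cumulative))
    lower = upper  # the next bucket starts where this one ends

for lower, upper, share, cumulative in rows:
    print(f"{lower:>7} us to {upper:>7} us: {share:5.2f}%  (cumulative {cumulative:6.2f}%)")
```

Read this way, about 22% of IOs finished within 1 ms, and the listed buckets add up to 99.98%; the small remainder is not covered by the lines shown here.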

Line 48

This part is CPU usage: user-mode CPU utilization, kernel-mode CPU utilization, number of context switches, major page faults, and minor page faults.

  cpu          : usr=0.01%, sys=0.05%, ctx=41717, majf=0, minf=380

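Line 49~53

This part is the distribution of IO depths, that is, how many IOs were in flight over the life of the job. This test used a depth of 1, so the result shows 1=100%.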

  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=29034,12664,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
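The issued rwts line gives the number of read and write IOs actually issued, which allows a quick cross-check against the configured 70/30 mix and the reported IOPS. This is a sketch using only numbers from the report; the variable names are illustrative.

```python
reads, writes = 29034, 12664   # issued rwts: total=29034,12664,...
runtime_s = 300047 / 1000      # run=300047msec in the run status summary

read_share = reads / (reads + writes)
read_iops = reads / runtime_s
write_iops = writes / runtime_s

print(f"read share: {read_share:.1%}")
print(f"read IOPS:  {read_iops:.1f}")
print(f"write IOPS: {write_iops:.1f}")
```

The read share comes out just under 70%, consistent with rwmixread=70, and the computed rates agree with the IOPS=96 and IOPS=42 shown in the read and write summaries.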

Line 55~57

This part summarizes read and write bandwidth: bandwidth (bw), total IO data volume (io), and run time (run).

Run status group 0 (all jobs):
   READ: bw=387KiB/s (396kB/s), 387KiB/s-387KiB/s (396kB/s-396kB/s), io=113MiB (119MB), run=300047-300047msec
  WRITE: bw=169KiB/s (173kB/s), 169KiB/s-169KiB/s (173kB/s-173kB/s), io=49.5MiB (51.9MB), run=300047-300047msec

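Line 59~62

This part shows how the server's block devices were used during the test: device name, total IO count (ios, read ios before the '/' and write ios after it), IOs merged by the IO scheduler (merge, read merges/write merges), ticks the device was kept busy (ticks, read ticks/write ticks), total time spent in the device queue (in_queue), and device utilization (util).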

Disk stats (read/write):
    dm-0: ios=29013/12724, merge=0/0, ticks=2364321/50184, in_queue=2415058, util=100.00%, aggrios=14517/6380, aggrmerge=0/2, aggrticks=1183183/24592, aggrin_queue=1207762, aggrutil=100.00%
  sdc: ios=29034/12684, merge=0/1, ticks=2366367/48619, in_queue=2414960, util=100.00%
  sda: ios=0/76, merge=0/4, ticks=0/565, in_queue=565, util=0.06%
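Dividing ticks by ios gives a rough average service time per IO at the device. The sketch below does this for sdc. It assumes that ticks are reported in milliseconds, as in the Linux block-device statistics that fio reads; that unit is an assumption here, not something the report itself states.

```python
# ios and ticks for sdc, copied from the "Disk stats" section (read/write).
ios = {"read": 29034, "write": 12684}
ticks_ms = {"read": 2366367, "write": 48619}  # assumed to be milliseconds

avg_ms = {op: ticks_ms[op] / ios[op] for op in ios}
for op, value in avg_ms.items():
    print(f"sdc average {op} time per IO: {value:.2f} ms")
```

Under that assumption the read figure, about 81.5 ms, is close to the average read clat of 81519.63 usec reported by fio, which suggests the read latency is dominated by the disk itself.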

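Across the whole report, Line 23~33 and Line 34~44 summarize the read and write results. These two parts are usually enough to judge the storage system's performance: the lower the overall latency and the higher the IOPS, the better the storage performs.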
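Latency and IOPS are linked through concurrency. As a sanity check, the sketch below applies Little's law (IOs in flight = IOPS x average latency), which is general queueing theory rather than something the report states, to the numbers from this run. The variable names are illustrative.

```python
runtime_s = 300047 / 1000               # run=300047msec

# (issued IOs, average total latency in seconds) from the report.
read = (29034, 81519.97e-6)             # lat (usec) avg=81519.97
write = (12664, 2620.25e-6)             # lat (usec) avg=2620.25

in_flight = sum(count / runtime_s * avg_lat for count, avg_lat in (read, write))
print(f"average IOs in flight: {in_flight:.2f}")
```

The result is very close to 8, which matches the test setup of numjobs=8 synchronous threads at iodepth=1: each thread always has exactly one IO outstanding, so at a fixed thread count IOPS can only rise if latency falls.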

Appendix: documentation for common test tools

【1】dd.md

【2】fio.md

【3】iostat.md

【4】HammerDB.md


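Configuration guide: building a distributed GPU computing test environment from 0 to 1

1 Hardware preparation
- 1.1 GPU server selection
- 1.2 High-performance compute network selection
2 Software preparation
- 2.1 RoCE v2 switch
- 2.2 Basic GPU server configuration
- 2.3 Installing the GPU driver and collective communication library
3 Experimental testing
4 Deployment and usage Q&A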

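With the rapid development of AI and large models, traditional centralized computing can no longer keep up with the surge in data processing demand. Distributed computing splits a computing task into subtasks, has multiple compute nodes work on them in parallel, and combines the partial results into the final result. It handles large-scale data and complex computing tasks more efficiently, more reliably, and more flexibly, and is widely used across industries.

So how do you build a distributed computing environment from scratch? This article covers hardware selection, basic server-side configuration, GPU driver installation and collective communication library configuration, enabling lossless Ethernet, and finally importing a large model and running training tests, walking through the full process of building a distributed training environment.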

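1 Hardware preparation

1.1 GPU server selection

A GPU has a large number of compute cores and can process many data tasks at once, which makes it a key piece of hardware in an intelligent computing center.

In the overall design of an intelligent computing center, the GPU server cluster and the storage server cluster are connected through the compute network (scale-out network) and the storage network respectively. Of the two management networks, the business management network interconnects the GPU servers for AIOS management-plane communication, and the out-of-band management network connects every device in the center for operations access.

Figure 1: High-level design topology of the intelligent computing center

With the overall design clear, we compare the internal hardware topology of a general-purpose compute server with that of a GPU server to understand the logic behind GPU server selection:

Figure 2 (top): Internal hardware topology of a general-purpose compute server

Figure 3 (bottom): Internal hardware topology of a GPU server

Figure 2 shows the internal hardware topology of a general-purpose compute server. Its core is two AMD EPYC CPUs, whose IO chiplets provide a number of interfaces that help the CPUs deliver their full general-purpose computing capability.

Figure 3 shows the internal hardware topology of a GPU server. It has 8 A100 GPUs, 8 RDMA NICs for compute communication, and 2 RDMA NICs for storage communication. Every IO component is designed to let the 8 GPUs deliver their full compute power.

The two topologies show that general-purpose servers and GPU servers differ greatly even in basic hardware construction: one is built around general-purpose CPUs, the other around GPUs. This difference matters at hardware selection time. A general-purpose server usually cannot be converted into a high-performance GPU server, because its number of PCIe interfaces, chassis space, cooling design, and power supply all fall short.

Once the compute task determines the compute requirement, and therefore the GPU model and quantity, planning can move on to the network for the whole GPU cluster.

Because of resource constraints, this experiment uses three lightly modified general-purpose servers for the parallel training and inference tests that follow.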

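The compute nodes are configured as follows:

CPU: Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50GHz * 2

GPU: NVIDIA GeForce RTX 4060 Ti 16G * 1

Memory: 128G

Storage: 10T HDD * 2

NIC: MGMT, CX5

Other items:

Cooling: the GPU is a full-height card but the server is only 2U, so the top cover has to be removed.

Power: general-purpose servers usually do not have enough spare power connectors, so the GPU needs extra power from an external supply.

The power supply chosen is a Great Wall X6 rated at 650 W. It can power 3 GPUs at once (the RTX 4060 Ti needs 150 W of external power) and provides three 8-pin connectors, one for each GPU.

Figure 4: Power supply selection

Figure 5: Photo of the GPU and RDMA NIC installed in the server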

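1.2 High-performance compute network selection

The management network of an intelligent computing center differs little from that of a traditional general-purpose data center. The scale-out compute network and the storage network are the special ones: the traffic they carry determines what the switches must provide, namely RDMA support, low latency, and high throughput.

As the figure below shows, the cabling also differs. GPUs are grouped (#L0~7 in one group, #L8~15 in another) into one-hop high-bandwidth interconnect domains (HB domains), which are connected through rail switches optimized for intelligent computing, giving efficient data transfer and compute coordination.

Figure 6: Network connection diagram

In this experiment, the compute network switch is from the Asterfusion®️ CX-N series of ultra-low-latency switches, specifically the CX308P-48Y-N.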

Model	Service ports	Switching capacity
CX864E-N	64 x 800GE OSFP, 2 x 10GE SFP+	102.4Tbps
CX732Q-N	32 x 400GE QSFP-DD, 2 x 10GE SFP+	25.6Tbps
CX664D-N	64 x 200GE QSFP56, 2 x 10GE SFP+	25.6Tbps
CX564P-N	64 x 100GE QSFP28, 2 x 10GE SFP+	12.8Tbps
CX532P-N	32 x 100GE QSFP28, 2 x 10GE SFP+	6.4Tbps
CX308P-48Y-N	48 x 25GE SFP28, 8 x 100GE QSFP28	4.0Tbps

Table 1: Model specifications
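The switching capacity column can be reproduced from the port counts. The sketch below assumes that capacity is counted in both directions (port bandwidth x 2) and that the 2 x 10GE SFP+ ports are not included; both are assumptions inferred from the figures in the table, not a statement of how the vendor defines the number.

```python
# model -> (list of (port count, speed in Gbps), capacity in Tbps from the table)
models = {
    "CX864E-N": ([(64, 800)], 102.4),
    "CX732Q-N": ([(32, 400)], 25.6),
    "CX664D-N": ([(64, 200)], 25.6),
    "CX564P-N": ([(64, 100)], 12.8),
    "CX532P-N": ([(32, 100)], 6.4),
    "CX308P-48Y-N": ([(48, 25), (8, 100)], 4.0),
}


def switching_capacity_tbps(ports):
    """Sum of port bandwidth, counted in both directions, in Tbps."""
    one_way_gbps = sum(count * speed for count, speed in ports)
    return one_way_gbps * 2 / 1000


for name, (ports, listed) in models.items():
    print(f"{name}: computed {switching_capacity_tbps(ports):.1f} Tbps, listed {listed} Tbps")
```

For the CX308P-48Y-N used in this experiment, 48 x 25GE plus 8 x 100GE is 2.0 Tbps one way, or 4.0 Tbps in both directions.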

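Improving large-model training efficiency

The per-device forwarding latency of CX-N data center switches (400 ns) is as low as 1/4 to 1/5 of the industry average, which minimizes the network's share of end-to-end latency in AI/ML applications. A multi-dimensional high-availability design is intended to keep the network up at all times, helping cut large-model training time substantially and improve overall efficiency.

RoCEv2 as standard across the series

Unlike the tiered license model of traditional vendors, CX-N data center switches carry the same license entitlement for every application scenario. The whole series supports RoCEv2 as standard and provides production-grade network enhancements such as PFC, ECN, and Easy RoCE, so users pay no extra network build-out cost for these advanced features and get a higher ROI.

An open, neutral AI/ML network

The openness of Asterfusion's AI/ML network solution lets users manage the network with the systems they already run (K8s, Prometheus, and so on) without repeated investment. Asterfusion takes part in the AI ecosystem as a neutral network vendor, providing professional network solutions and helping users avoid full-stack vendor lock-in.

The final network topology and basic configuration for the experiment are shown below.

Figure 7: Experiment topology and basic configuration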

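2 Software preparation

With hardware selection complete, we move on to software configuration: deploying the RoCEv2 switch, configuring the GPU servers, and installing the GPU driver and collective communication library.

2.1 RoCEv2 switch

Figure 8: The CX308P-48Y-N

The parallel training environment has few devices, so the network is simple:

1. Connect the 25GE service port of each CX5 NIC to the CX308P.

2. Enable the global lossless RoCE configuration on the switch in one step.

3. Put the three 25G service ports in one VLAN to form a layer-2 network.

As noted earlier, all CX-N data center switches support RoCEv2 as standard. With Asterfusion's AsterNOS network operating system, two commands configure all the necessary QoS rules and parameters: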

noone@MacBook-Air ~ % ssh admin@10.230.1.17
Linux AsterNOS 5.10.0-8-2-amd64 #1 SMP Debian 5.10.46-4 (2021-08-03) x86_64
    _          _                _   _   ___   ____  
   / \    ___ | |_   ___  _ __ | \ | | / _ \ / ___| 
  / _ \  / __|| __| / _ \| '__||  \| || | | |\___ \ 
 / ___ \ \__ \| |_ |  __/| |   | |\  || |_| | ___) |
/_/   \_\|___/ \__| \___||_|   |_| \_| \___/ |____/ 

------- Asterfusion Network Operating System -------

Help:    http://www.asterfusion.com/

Last login: Sun Sep 29 17:10:46 2024 from 172.16.20.241

AsterNOS# configure terminal 
AsterNOS(config)# qos roce lossless   
AsterNOS(config)# qos service-policy roce_lossless 
AsterNOS(config)# end
AsterNOS# show qos roce
                    operational    description
------------------  -------------  ---------------------------------------------------
status              bind           qos roce binding status
mode                lossless       Roce Mode
cable-length        40m            Cable Length(in meters) for Roce Lossless Config
congestion-control
- congestion-mode   ECN            congestion-control
- enabled-tc        3,4            Congestion config enabled Traffic Class
- max-threshold     750000         Congestion config max-threshold
- min-threshold     15360          Congestion config max-threshold
pfc
- pfc-priority      3,4            switch-prio on which PFC is enabled
- rx-enabled        enable         PFC Rx Enabled status
- tx-enabled        enable         PFC Tx Enabled status
trust
- trust-mode        dscp           Trust Setting on the port for packet classification

 RoCE DSCP->SP mapping configurations
==========================================
dscp                       switch-prio
-----------------------  -------------
0,1,2,3,4,5,6,7                      0
10,11,12,13,14,15,8,9                1
16,17,18,19,20,21,22,23              2
24,25,26,27,28,29,30,31              3
32,33,34,35,36,37,38,39              4
40,41,42,43,44,45,46,47              5
48,49,50,51,52,53,54,55              6
56,57,58,59,60,61,62,63              7

 RoCE SP->TC mapping and ETS configurations
================================================
  switch-prio  mode    weight
-------------  ------  --------
            6  SP
            7  SP

 RoCE pool config
======================
name                     switch-prio
-----------------------  -------------
egress_lossy_profile     0 1 2 5 6
ingress_lossy_profile    0 1 2 5 6
egress_lossless_profile  3 4
roce_lossless_profile    3 4
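The DSCP->SP table in the output above maps each block of eight consecutive DSCP values to one switch priority, which is the same as an integer division by 8. The sketch below reproduces that mapping and lists the DSCP values that land in the lossless priorities 3 and 4; the function and constant names are illustrative, and the mapping is taken from this output only.

```python
LOSSLESS_PRIORITIES = {3, 4}  # pfc-priority and enabled-tc in the output above


def switch_priority(dscp: int) -> int:
    """Map a 6-bit DSCP value to a switch priority (blocks of eight values)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp >> 3  # same as dscp // 8


lossless_dscps = [d for d in range(64) if switch_priority(d) in LOSSLESS_PRIORITIES]
print(f"lossless DSCP range: {lossless_dscps[0]}-{lossless_dscps[-1]}")
```

Under this mapping, traffic has to carry a DSCP value from 24 to 39 to be classified into one of the PFC-enabled priorities, so the DSCP marking on the servers' NICs should be checked against this range.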

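2.2 Basic GPU server configuration

All of the following steps must be performed on all three servers; this document uses server3 as the example.

2.2.1 Disable the firewall and SELinux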

[root@server3 ~]# systemctl stop firewalld
[root@server3 ~]# systemctl disable firewalld
[root@server3 ~]# setenforce 0
[root@server3 ~]# sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config

2.2.2 Configure passwordless SSH between servers

[root@server3 ~]# ssh-keygen
[root@server3 ~]# ssh-copy-id root@server1
[root@server3 ~]# ssh-copy-id root@server2

2.2.3 Configure software repositories

[root@server3 ~]# ll /etc/yum.repos.d/
总用量 80
-rw-r--r-- 1 root root 2278 9月  19 08:00 CentOS-Base.repo
-rw-r--r-- 1 root root  232 9月  19 08:00 cuda-rhel7.repo
-rw-r--r-- 1 root root  210 9月  19 08:00 cudnn-local-rhel7-8.9.7.29.repo
drwxr-xr-x 2 root root 4096 9月  19 07:58 disable.d
-rw-r--r-- 1 root root  664 9月  19 08:00 epel.repo
-rw-r--r-- 1 root root  381 9月  19 08:00 hashicorp.repo
-rw-r--r-- 1 root root  218 9月  19 08:00 kubernetes.repo
-rw-r--r-- 1 root root  152 9月  19 08:00 MariaDB.repo
-rw-r--r-- 1 root root  855 9月  19 08:00 remi-modular.repo
-rw-r--r-- 1 root root  456 9月  19 08:00 remi-php54.repo
-rw-r--r-- 1 root root 1314 9月  19 08:00 remi-php70.repo
-rw-r--r-- 1 root root 1314 9月  19 08:00 remi-php71.repo
-rw-r--r-- 1 root root 1314 9月  19 08:00 remi-php72.repo
-rw-r--r-- 1 root root 1314 9月  19 08:00 remi-php73.repo
-rw-r--r-- 1 root root 1314 9月  19 08:00 remi-php74.repo
-rw-r--r-- 1 root root 1314 9月  19 08:00 remi-php80.repo
-rw-r--r-- 1 root root 1314 9月  19 08:00 remi-php81.repo
-rw-r--r-- 1 root root 1314 9月  19 08:00 remi-php82.repo
-rw-r--r-- 1 root root 2605 9月  19 08:00 remi.repo
-rw-r--r-- 1 root root  750 9月  19 08:00 remi-safe.repo
[root@server3 ~]# more /etc/yum.repos.d/*.repo
::::::::::::::
/etc/yum.repos.d/CentOS-Base.repo
::::::::::::::
# CentOS-Base.repo
#
# The mirror system uses the connecting IP address of the client and the
# update status of each mirror to pick mirrors that are updated to and
# geographically close to the client.  You should use this for CentOS updates
# unless you are manually picking other mirrors.
#
# If the mirrorlist= does not work for you, as a fall back you can try the 
# remarked out baseurl= line instead.
#
#
 
[base]
name=CentOS-7 - Base - mirrors.aliyun.com
failovermethod=priority
baseurl=http://mirrors.aliyun.com/centos/7/os/x86_64/
        http://mirrors.aliyuncs.com/centos/7/os/x86_64/
        http://mirrors.cloud.aliyuncs.com/centos/7/os/x86_64/
gpgcheck=1
gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
 
#released updates 
[updates]
name=CentOS-7 - Updates - mirrors.aliyun.com
failovermethod=priority
baseurl=http://mirrors.aliyun.com/centos/7/updates/x86_64/
        http://mirrors.aliyuncs.com/centos/7/updates/x86_64/
        http://mirrors.cloud.aliyuncs.com/centos/7/updates/x86_64/
gpgcheck=1
gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
 
#additional packages that may be useful
[extras]
name=CentOS-7 - Extras - mirrors.aliyun.com
failovermethod=priority
baseurl=http://mirrors.aliyun.com/centos/7/extras/x86_64/
        http://mirrors.aliyuncs.com/centos/7/extras/x86_64/
        http://mirrors.cloud.aliyuncs.com/centos/7/extras/x86_64/
gpgcheck=1
gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
 
#additional packages that extend functionality of existing packages
[centosplus]
name=CentOS-7 - Plus - mirrors.aliyun.com
failovermethod=priority
baseurl=http://mirrors.aliyun.com/centos/7/centosplus/x86_64/
        http://mirrors.aliyuncs.com/centos/7/centosplus/x86_64/
        http://mirrors.cloud.aliyuncs.com/centos/7/centosplus/x86_64/
gpgcheck=1
enabled=0
gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
 
#contrib - packages by Centos Users
[contrib]
name=CentOS-7 - Contrib - mirrors.aliyun.com
failovermethod=priority
baseurl=http://mirrors.aliyun.com/centos/7/contrib/x86_64/
        http://mirrors.aliyuncs.com/centos/7/contrib/x86_64/
        http://mirrors.cloud.aliyuncs.com/centos/7/contrib/x86_64/
gpgcheck=1
enabled=0
gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
::::::::::::::
/etc/yum.repos.d/cuda-rhel7.repo
::::::::::::::
[cuda-rhel7-x86_64]
name=cuda-rhel7-x86_64
baseurl=https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64
enabled=1
gpgcheck=1
gpgkey=https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/D42D0685.pub
::::::::::::::
/etc/yum.repos.d/cudnn-local-rhel7-8.9.7.29.repo
::::::::::::::
[cudnn-local-rhel7-8.9.7.29]
name=cudnn-local-rhel7-8.9.7.29
baseurl=file:///var/cudnn-local-repo-rhel7-8.9.7.29
enabled=1
gpgcheck=1
gpgkey=file:///var/cudnn-local-repo-rhel7-8.9.7.29/90F10142.pub
obsoletes=0
::::::::::::::
/etc/yum.repos.d/epel.repo
::::::::::::::
[epel]
name=Extra Packages for Enterprise Linux 7 - $basearch
baseurl=http://mirrors.aliyun.com/epel/7/$basearch
failovermethod=priority
enabled=1
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
 
[epel-debuginfo]
name=Extra Packages for Enterprise Linux 7 - $basearch - Debug
baseurl=http://mirrors.aliyun.com/epel/7/$basearch/debug
failovermethod=priority
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
gpgcheck=0
 
[epel-source]
name=Extra Packages for Enterprise Linux 7 - $basearch - Source
baseurl=http://mirrors.aliyun.com/epel/7/SRPMS
failovermethod=priority
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
gpgcheck=0
::::::::::::::
/etc/yum.repos.d/hashicorp.repo
::::::::::::::
[hashicorp]
name=Hashicorp Stable - $basearch
baseurl=https://rpm.releases.hashicorp.com/RHEL/$releasever/$basearch/stable
enabled=0
gpgcheck=1
gpgkey=https://rpm.releases.hashicorp.com/gpg

[hashicorp-test]
name=Hashicorp Test - $basearch
baseurl=https://rpm.releases.hashicorp.com/RHEL/$releasever/$basearch/test
enabled=0
gpgcheck=1
gpgkey=https://rpm.releases.hashicorp.com/gpg
::::::::::::::
/etc/yum.repos.d/kubernetes.repo
::::::::::::::
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.28/rpm/
enabled=1
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.28/rpm/repodata/repomd.xml.key
::::::::::::::
/etc/yum.repos.d/MariaDB.repo
::::::::::::::
[mariadb]
name = MariaDB
baseurl = https://mirror.mariadb.org/yum/11.2/centos74-amd64
gpgkey = https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck = 0
::::::::::::::
/etc/yum.repos.d/remi-modular.repo
::::::::::::::
# Repository: https://rpms.remirepo.net/
# Blog:       https://blog.remirepo.net/
# Forum:      https://forum.remirepo.net/

[remi-modular]
name=Remi's Modular repository for Enterprise Linux 7 - $basearch
#baseurl=http://rpms.remirepo.net/enterprise/7/modular/$basearch/
#mirrorlist=https://rpms.remirepo.net/enterprise/7/modular/httpsmirror
mirrorlist=http://cdn.remirepo.net/enterprise/7/modular/mirror
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-modular-test]
name=Remi's Modular testing repository for Enterprise Linux 7 - $basearch
#baseurl=http://rpms.remirepo.net/enterprise/7/modular-test/$basearch/
#mirrorlist=https://rpms.remirepo.net/enterprise/7/modular-test/httpsmirror
mirrorlist=http://cdn.remirepo.net/enterprise/7/modular-test/mirror
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

::::::::::::::
/etc/yum.repos.d/remi-php54.repo
::::::::::::::
# This repository only provides PHP 5.4 and its extensions
# NOTICE: common dependencies are in "remi-safe"

[remi-php54]
name=Remi's PHP 5.4 RPM repository for Enterprise Linux 7 - $basearch
#baseurl=http://rpms.remirepo.net/enterprise/7/php54/$basearch/
#mirrorlist=https://rpms.remirepo.net/enterprise/7/php54/httpsmirror
mirrorlist=http://cdn.remirepo.net/enterprise/7/php54/mirror
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

::::::::::::::
/etc/yum.repos.d/remi-php70.repo
::::::::::::::
# This repository only provides PHP 7.0 and its extensions
# NOTICE: common dependencies are in "remi-safe"

[remi-php70]
name=Remi's PHP 7.0 RPM repository for Enterprise Linux 7 - $basearch
#baseurl=http://rpms.remirepo.net/enterprise/7/php70/$basearch/
#mirrorlist=https://rpms.remirepo.net/enterprise/7/php70/httpsmirror
mirrorlist=http://cdn.remirepo.net/enterprise/7/php70/mirror
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php70-debuginfo]
name=Remi's PHP 7.0 RPM repository for Enterprise Linux 7 - $basearch - debuginfo
baseurl=http://rpms.remirepo.net/enterprise/7/debug-php70/$basearch/
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php70-test]
name=Remi's PHP 7.0 test RPM repository for Enterprise Linux 7 - $basearch
#baseurl=http://rpms.remirepo.net/enterprise/7/test70/$basearch/
#mirrorlist=https://rpms.remirepo.net/enterprise/7/test70/httpsmirror
mirrorlist=http://cdn.remirepo.net/enterprise/7/test70/mirror
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php70-test-debuginfo]
name=Remi's PHP 7.0 test RPM repository for Enterprise Linux 7 - $basearch - debuginfo
baseurl=http://rpms.remirepo.net/enterprise/7/debug-test70/$basearch/
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi
::::::::::::::
/etc/yum.repos.d/remi-php71.repo
::::::::::::::
# This repository only provides PHP 7.1 and its extensions
# NOTICE: common dependencies are in "remi-safe"

[remi-php71]
name=Remi's PHP 7.1 RPM repository for Enterprise Linux 7 - $basearch
#baseurl=http://rpms.remirepo.net/enterprise/7/php71/$basearch/
#mirrorlist=https://rpms.remirepo.net/enterprise/7/php71/httpsmirror
mirrorlist=http://cdn.remirepo.net/enterprise/7/php71/mirror
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php71-debuginfo]
name=Remi's PHP 7.1 RPM repository for Enterprise Linux 7 - $basearch - debuginfo
baseurl=http://rpms.remirepo.net/enterprise/7/debug-php71/$basearch/
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php71-test]
name=Remi's PHP 7.1 test RPM repository for Enterprise Linux 7 - $basearch
#baseurl=http://rpms.remirepo.net/enterprise/7/test71/$basearch/
#mirrorlist=https://rpms.remirepo.net/enterprise/7/test71/httpsmirror
mirrorlist=http://cdn.remirepo.net/enterprise/7/test71/mirror
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php71-test-debuginfo]
name=Remi's PHP 7.1 test RPM repository for Enterprise Linux 7 - $basearch - debuginfo
baseurl=http://rpms.remirepo.net/enterprise/7/debug-test71/$basearch/
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi
::::::::::::::
/etc/yum.repos.d/remi-php72.repo
::::::::::::::
# This repository only provides PHP 7.2 and its extensions
# NOTICE: common dependencies are in "remi-safe"

[remi-php72]
name=Remi's PHP 7.2 RPM repository for Enterprise Linux 7 - $basearch
#baseurl=http://rpms.remirepo.net/enterprise/7/php72/$basearch/
#mirrorlist=https://rpms.remirepo.net/enterprise/7/php72/httpsmirror
mirrorlist=http://cdn.remirepo.net/enterprise/7/php72/mirror
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php72-debuginfo]
name=Remi's PHP 7.2 RPM repository for Enterprise Linux 7 - $basearch - debuginfo
baseurl=http://rpms.remirepo.net/enterprise/7/debug-php72/$basearch/
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php72-test]
name=Remi's PHP 7.2 test RPM repository for Enterprise Linux 7 - $basearch
#baseurl=http://rpms.remirepo.net/enterprise/7/test72/$basearch/
#mirrorlist=https://rpms.remirepo.net/enterprise/7/test72/httpsmirror
mirrorlist=http://cdn.remirepo.net/enterprise/7/test72/mirror
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php72-test-debuginfo]
name=Remi's PHP 7.2 test RPM repository for Enterprise Linux 7 - $basearch - debuginfo
baseurl=http://rpms.remirepo.net/enterprise/7/debug-test72/$basearch/
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi
::::::::::::::
/etc/yum.repos.d/remi-php73.repo
::::::::::::::
# This repository only provides PHP 7.3 and its extensions
# NOTICE: common dependencies are in "remi-safe"

[remi-php73]
name=Remi's PHP 7.3 RPM repository for Enterprise Linux 7 - $basearch
#baseurl=http://rpms.remirepo.net/enterprise/7/php73/$basearch/
#mirrorlist=https://rpms.remirepo.net/enterprise/7/php73/httpsmirror
mirrorlist=http://cdn.remirepo.net/enterprise/7/php73/mirror
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php73-debuginfo]
name=Remi's PHP 7.3 RPM repository for Enterprise Linux 7 - $basearch - debuginfo
baseurl=http://rpms.remirepo.net/enterprise/7/debug-php73/$basearch/
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php73-test]
name=Remi's PHP 7.3 test RPM repository for Enterprise Linux 7 - $basearch
#baseurl=http://rpms.remirepo.net/enterprise/7/test73/$basearch/
#mirrorlist=https://rpms.remirepo.net/enterprise/7/test73/httpsmirror
mirrorlist=http://cdn.remirepo.net/enterprise/7/test73/mirror
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php73-test-debuginfo]
name=Remi's PHP 7.3 test RPM repository for Enterprise Linux 7 - $basearch - debuginfo
baseurl=http://rpms.remirepo.net/enterprise/7/debug-test73/$basearch/
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi
::::::::::::::
/etc/yum.repos.d/remi-php74.repo
::::::::::::::
# This repository only provides PHP 7.4 and its extensions
# NOTICE: common dependencies are in "remi-safe"

[remi-php74]
name=Remi's PHP 7.4 RPM repository for Enterprise Linux 7 - $basearch
#baseurl=http://rpms.remirepo.net/enterprise/7/php74/$basearch/
#mirrorlist=https://rpms.remirepo.net/enterprise/7/php74/httpsmirror
mirrorlist=http://cdn.remirepo.net/enterprise/7/php74/mirror
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php74-debuginfo]
name=Remi's PHP 7.4 RPM repository for Enterprise Linux 7 - $basearch - debuginfo
baseurl=http://rpms.remirepo.net/enterprise/7/debug-php74/$basearch/
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php74-test]
name=Remi's PHP 7.4 test RPM repository for Enterprise Linux 7 - $basearch
#baseurl=http://rpms.remirepo.net/enterprise/7/test74/$basearch/
#mirrorlist=https://rpms.remirepo.net/enterprise/7/test74/httpsmirror
mirrorlist=http://cdn.remirepo.net/enterprise/7/test74/mirror
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php74-test-debuginfo]
name=Remi's PHP 7.4 test RPM repository for Enterprise Linux 7 - $basearch - debuginfo
baseurl=http://rpms.remirepo.net/enterprise/7/debug-test74/$basearch/
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi
::::::::::::::
/etc/yum.repos.d/remi-php80.repo
::::::::::::::
# This repository only provides PHP 8.0 and its extensions
# NOTICE: common dependencies are in "remi-safe"

[remi-php80]
name=Remi's PHP 8.0 RPM repository for Enterprise Linux 7 - $basearch
#baseurl=http://rpms.remirepo.net/enterprise/7/php80/$basearch/
#mirrorlist=https://rpms.remirepo.net/enterprise/7/php80/httpsmirror
mirrorlist=http://cdn.remirepo.net/enterprise/7/php80/mirror
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php80-debuginfo]
name=Remi's PHP 8.0 RPM repository for Enterprise Linux 7 - $basearch - debuginfo
baseurl=http://rpms.remirepo.net/enterprise/7/debug-php80/$basearch/
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php80-test]
name=Remi's PHP 8.0 test RPM repository for Enterprise Linux 7 - $basearch
#baseurl=http://rpms.remirepo.net/enterprise/7/test80/$basearch/
#mirrorlist=https://rpms.remirepo.net/enterprise/7/test80/httpsmirror
mirrorlist=http://cdn.remirepo.net/enterprise/7/test80/mirror
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php80-test-debuginfo]
name=Remi's PHP 8.0 test RPM repository for Enterprise Linux 7 - $basearch - debuginfo
baseurl=http://rpms.remirepo.net/enterprise/7/debug-test80/$basearch/
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi
::::::::::::::
/etc/yum.repos.d/remi-php81.repo
::::::::::::::
# This repository only provides PHP 8.1 and its extensions
# NOTICE: common dependencies are in "remi-safe"

[remi-php81]
name=Remi's PHP 8.1 RPM repository for Enterprise Linux 7 - $basearch
#baseurl=http://rpms.remirepo.net/enterprise/7/php81/$basearch/
#mirrorlist=https://rpms.remirepo.net/enterprise/7/php81/httpsmirror
mirrorlist=http://cdn.remirepo.net/enterprise/7/php81/mirror
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php81-debuginfo]
name=Remi's PHP 8.1 RPM repository for Enterprise Linux 7 - $basearch - debuginfo
baseurl=http://rpms.remirepo.net/enterprise/7/debug-php81/$basearch/
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php81-test]
name=Remi's PHP 8.1 test RPM repository for Enterprise Linux 7 - $basearch
#baseurl=http://rpms.remirepo.net/enterprise/7/test81/$basearch/
#mirrorlist=https://rpms.remirepo.net/enterprise/7/test81/httpsmirror
mirrorlist=http://cdn.remirepo.net/enterprise/7/test81/mirror
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php81-test-debuginfo]
name=Remi's PHP 8.1 test RPM repository for Enterprise Linux 7 - $basearch - debuginfo
baseurl=http://rpms.remirepo.net/enterprise/7/debug-test81/$basearch/
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi
::::::::::::::
/etc/yum.repos.d/remi-php82.repo
::::::::::::::
# This repository only provides PHP 8.2 and its extensions
# NOTICE: common dependencies are in "remi-safe"

[remi-php82]
name=Remi's PHP 8.2 RPM repository for Enterprise Linux 7 - $basearch
#baseurl=http://rpms.remirepo.net/enterprise/7/php82/$basearch/
#mirrorlist=https://rpms.remirepo.net/enterprise/7/php82/httpsmirror
mirrorlist=http://cdn.remirepo.net/enterprise/7/php82/mirror
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php82-debuginfo]
name=Remi's PHP 8.2 RPM repository for Enterprise Linux 7 - $basearch - debuginfo
baseurl=http://rpms.remirepo.net/enterprise/7/debug-php82/$basearch/
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php82-test]
name=Remi's PHP 8.2 test RPM repository for Enterprise Linux 7 - $basearch
#baseurl=http://rpms.remirepo.net/enterprise/7/test82/$basearch/
#mirrorlist=https://rpms.remirepo.net/enterprise/7/test82/httpsmirror
mirrorlist=http://cdn.remirepo.net/enterprise/7/test82/mirror
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php82-test-debuginfo]
name=Remi's PHP 8.2 test RPM repository for Enterprise Linux 7 - $basearch - debuginfo
baseurl=http://rpms.remirepo.net/enterprise/7/debug-test82/$basearch/
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi
::::::::::::::
/etc/yum.repos.d/remi.repo
::::::::::::::
# Repository: http://rpms.remirepo.net/
# Blog:       http://blog.remirepo.net/
# Forum:      http://forum.remirepo.net/

[remi]
name=Remi's RPM repository for Enterprise Linux 7 - $basearch
#baseurl=http://rpms.remirepo.net/enterprise/7/remi/$basearch/
#mirrorlist=https://rpms.remirepo.net/enterprise/7/remi/httpsmirror
mirrorlist=http://cdn.remirepo.net/enterprise/7/remi/mirror
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php55]
name=Remi's PHP 5.5 RPM repository for Enterprise Linux 7 - $basearch
#baseurl=http://rpms.remirepo.net/enterprise/7/php55/$basearch/
#mirrorlist=https://rpms.remirepo.net/enterprise/7/php55/httpsmirror
mirrorlist=http://cdn.remirepo.net/enterprise/7/php55/mirror
# NOTICE: common dependencies are in "remi-safe"
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php56]
name=Remi's PHP 5.6 RPM repository for Enterprise Linux 7 - $basearch
#baseurl=http://rpms.remirepo.net/enterprise/7/php56/$basearch/
#mirrorlist=https://rpms.remirepo.net/enterprise/7/php56/httpsmirror
mirrorlist=http://cdn.remirepo.net/enterprise/7/php56/mirror
# NOTICE: common dependencies are in "remi-safe"
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-test]
name=Remi's test RPM repository for Enterprise Linux 7 - $basearch
#baseurl=http://rpms.remirepo.net/enterprise/7/test/$basearch/
#mirrorlist=https://rpms.remirepo.net/enterprise/7/test/mirror
mirrorlist=http://cdn.remirepo.net/enterprise/7/test/mirror
# WARNING: If you enable this repository, you must also enable "remi"
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-debuginfo]
name=Remi's RPM repository for Enterprise Linux 7 - $basearch - debuginfo
baseurl=http://rpms.remirepo.net/enterprise/7/debug-remi/$basearch/
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php55-debuginfo]
name=Remi's PHP 5.5 RPM repository for Enterprise Linux 7 - $basearch - debuginfo
baseurl=http://rpms.remirepo.net/enterprise/7/debug-php55/$basearch/
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-php56-debuginfo]
name=Remi's PHP 5.6 RPM repository for Enterprise Linux 7 - $basearch - debuginfo
baseurl=http://rpms.remirepo.net/enterprise/7/debug-php56/$basearch/
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-test-debuginfo]
name=Remi's test RPM repository for Enterprise Linux 7 - $basearch - debuginfo
baseurl=http://rpms.remirepo.net/enterprise/7/debug-test/$basearch/
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

::::::::::::::
/etc/yum.repos.d/remi-safe.repo
::::::::::::::
# This repository is safe to use with RHEL/CentOS base repository
# it only provides additional packages for the PHP stack
# all dependencies are in base repository or in EPEL

[remi-safe]
name=Safe Remi's RPM repository for Enterprise Linux 7 - $basearch
#baseurl=http://rpms.remirepo.net/enterprise/7/safe/$basearch/
#mirrorlist=https://rpms.remirepo.net/enterprise/7/safe/httpsmirror
mirrorlist=http://cdn.remirepo.net/enterprise/7/safe/mirror
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi

[remi-safe-debuginfo]
name=Remi's RPM repository for Enterprise Linux 7 - $basearch - debuginfo
baseurl=http://rpms.remirepo.net/enterprise/7/debug-remi/$basearch/
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi
[root@server3 ~]#

2.2.4 Install Python3

Prepare a working directory
[root@server3 lichao]# mkdir AIGC
[root@server3 lichao]# cd AIGC/

Install Python3

Install the build environment and dependency packages
[root@server3 AIGC]# yum install wget gcc openssl-devel bzip2-devel libffi-devel
[root@server3 AIGC]# yum install openssl11 openssl11-devel openssl-devel
Extract the source package
[root@server3 AIGC]# tar xvf Python-3.11.9.tar.xz
[root@server3 AIGC]# cd Python-3.11.9
[root@server3 Python-3.11.9]#
Set the environment variables so that the build uses OpenSSL 1.1
[root@server3 Python-3.11.9]# export CFLAGS=$(pkg-config --cflags openssl11)
[root@server3 Python-3.11.9]# export LDFLAGS=$(pkg-config --libs openssl11)
Build and install
[root@server3 Python-3.11.9]# mkdir -p /home/lichao/opt/python3.11.9
[root@server3 Python-3.11.9]# ./configure --prefix=/home/lichao/opt/python3.11.9
[root@server3 Python-3.11.9]# make && make install
Create symbolic links so that python3 and pip3 are available system-wide
[root@server3 Python-3.11.9]# cd /home/lichao/opt/python3.11.9/
[root@server3 python3.11.9]# ln -s /home/lichao/opt/python3.11.9/bin/python3 /usr/bin/python3
[root@server3 python3.11.9]# ln -s /home/lichao/opt/python3.11.9/bin/pip3 /usr/bin/pip3
[root@server3 python3.11.9]# ll /usr/bin/python3
lrwxrwxrwx 1 root root 41 5月  16 08:32 /usr/bin/python3 -> /home/lichao/opt/python3.11.9/bin/python3
[root@server3 python3.11.9]# ll /usr/bin/pip3
lrwxrwxrwx 1 root root 38 5月  16 08:32 /usr/bin/pip3 -> /home/lichao/opt/python3.11.9/bin/pip3
Verify the installation
[root@server3 python3.11.9]# python3
Python 3.11.9 (main, May 16 2024, 08:23:00) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
[root@server3 python3.11.9]#
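
Python 3.10 and later need OpenSSL 1.1.1 or newer for the ssl module, which is why the openssl11 packages and the CFLAGS/LDFLAGS settings are used above. One way to confirm that the build picked them up is to ask the interpreter which OpenSSL it is linked against. The following is a minimal sketch; the function name check_python_ssl is hypothetical and not part of any tool used in this guide.

```shell
# Minimal sketch: confirm that an interpreter has a working ssl module
# linked against OpenSSL 1.1.1 or newer. The function name is illustrative.
check_python_ssl() {
    py="${1:-python3}"   # interpreter to test; defaults to python3 on PATH
    # Prints the linked OpenSSL version. Exits non-zero if the ssl module
    # cannot be imported or the linked OpenSSL is older than 1.1.1.
    "$py" -c 'import ssl, sys
print(ssl.OPENSSL_VERSION)
sys.exit(0 if ssl.OPENSSL_VERSION_INFO >= (1, 1, 1) else 1)'
}
```

For example, `check_python_ssl /usr/bin/python3` could be run right after creating the symbolic links. If it fails, pip3 will generally be unable to download packages over HTTPS, so it is worth checking before installing anything else.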

2.2.5 Install the MLNX NIC driver

The following uses CentOS 7 as an example to describe how to install the MLNX_OFED driver for Mellanox NICs and upgrade the firmware.

The driver version downloaded for this setup is MLNX_OFED_LINUX-4.7-3.2.9.0-rhel7.6-x86_64.tgz.

Extract the downloaded Mellanox driver package
[root@server3 ~]# tar -zxvf MLNX_OFED_LINUX-4.7-3.2.9.0-rhel7.6-x86_64.tgz
[root@server3 ~]# cd MLNX_OFED_LINUX-4.7-3.2.9.0-rhel7.6-x86_64
Check the kernel version of the running system
[root@server3 MLNX_OFED_LINUX-4.7-3.2.9.0-rhel7.6-x86_64]# uname -r
3.10.0-957.el7.x86_64
Check the kernel versions supported by the driver package
[root@server3 MLNX_OFED_LINUX-4.7-3.2.9.0-rhel7.6-x86_64]# cat .supported_kernels
3.10.0-957.el7.x86_64
Note: the output above shows that the downloaded driver supports the current kernel version.
If the running kernel is not in the supported list, build a driver for it manually. Before building, install the gcc toolchain and the kernel development package
[root@server3 MLNX_OFED_LINUX-4.7-3.2.9.0-rhel7.6-x86_64]# yum install gcc gcc-c++ libstdc++-devel kernel-devel
Add driver support for the current kernel version
[root@server3 MLNX_OFED_LINUX-4.7-3.2.9.0-rhel7.6-x86_64]# ./mlnx_add_kernel_support.sh -m /root/MLNX_OFED_LINUX-4.7-3.2.9.0-rhel7.6-x86_64 -v
Note: the generated driver package is placed in the /tmp directory
[root@server3 MLNX_OFED_LINUX-4.7-3.2.9.0-rhel7.6-x86_64]# ls -l /tmp/MLNX_OFED_LINUX-4.7-3.2.9.0-rhel7.6-x86_64-ext.tgz
-rw-r--r-- 1 root root 282193833 Dec 23 09:49 /tmp/MLNX_OFED_LINUX-4.7-3.2.9.0-rhel7.6-x86_64-ext.tgz
Install the driver
[root@server3 tmp]# tar xzvf MLNX_OFED_LINUX-4.7-3.2.9.0-rhel7.6-x86_64-ext.tgz
[root@server3 tmp]# cd MLNX_OFED_LINUX-4.7-3.2.9.0-rhel7.6-x86_64-ext
[root@server3 MLNX_OFED_LINUX-4.7-3.2.9.0-rhel7.6-x86_64-ext]# ./mlnxofedinstall
Finally, start the openibd service and enable it at boot
[root@server3 ~]# /etc/init.d/openibd start
[root@server3 ~]# chkconfig openibd on
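
The choice between installing the stock driver and rebuilding it depends only on whether the running kernel appears in .supported_kernels, so the comparison can be scripted. The following is a minimal sketch; the function name kernel_supported is hypothetical.

```shell
# Minimal sketch: check whether a kernel release is listed in a
# .supported_kernels file. The function name is illustrative.
kernel_supported() {
    kernel="$1"   # e.g. the output of `uname -r`
    list="$2"     # path to the .supported_kernels file
    # Compare against the first whitespace-delimited field so that
    # trailing spaces in the file do not matter.
    awk -v k="$kernel" '$1 == k { found = 1 } END { exit !found }' "$list"
}

# Example (run inside the extracted MLNX_OFED directory):
#   if kernel_supported "$(uname -r)" .supported_kernels; then
#       echo "kernel is supported; run ./mlnxofedinstall directly"
#   else
#       echo "rebuild with mlnx_add_kernel_support.sh first"
#   fi
```

Matching on the first field rather than the whole line is a deliberate choice here, since the listing above suggests the file may contain trailing whitespace.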

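2.3 Install the GPU driver and the collective communication library

2.3.1 Installation and configuration

2.3.1.1 Install the GPU driver, CUDA, and cuDNN

Before starting, download the packages that match your GPU model and operating system version from the NVIDIA website.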

[root@server3 AIGC]# ll
总用量 1733448
-rw-r--r--  1 root root 1430373861 5月  16 08:55 cudnn-local-repo-rhel7-8.9.7.29-1.0-1.x86_64.rpm
drwxr-xr-x  7 root root        141 5月  17 13:45 nccl-tests
-rwxr-xr-x  1 root root  306736632 5月  16 08:43 NVIDIA-Linux-x86_64-550.67.run
drwxrwxr-x 10 1000 1000       4096 5月  17 13:21 openmpi-4.1.6
-rw-r--r--  1 root root   17751702 9月  30 2023 openmpi-4.1.6.tar.gz
drwxr-xr-x 17 root root       4096 5月  16 08:23 Python-3.11.9
-rw-r--r--  1 root root   20175816 4月   2 13:11 Python-3.11.9.tar.xz
[root@server3 AIGC]# ./NVIDIA-Linux-x86_64-550.67.run
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 550.67...................

[root@server3 AIGC]# yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
已加载插件：fastestmirror, nvidia
adding repo from: https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
grabbing file https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo to /etc/yum.repos.d/cuda-rhel7.repo
repo saved to /etc/yum.repos.d/cuda-rhel7.repo
[root@server3 AIGC]# yum install libnccl-2.21.5-1+cuda12.4 libnccl-devel-2.21.5-1+cuda12.4 libnccl-static-2.21.5-1+cuda12.4
[root@server3 AIGC]# yum install cudnn-local-repo-rhel7-8.9.7.29-1.0-1.x86_64.rpm

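After installation, use nvidia-smi to check the driver and CUDA versions. If the versions do not match, the command reports an error. The CUDA version shown in the header is the highest version the driver supports, which is not necessarily the version of the toolkit that is installed.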

[root@server3 AIGC]# nvidia-smi 
Mon Jun  3 11:59:36 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67                 Driver Version: 550.67         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti     Off |   00000000:02:00.0 Off |                  N/A |
|  0%   34C    P0             27W /  165W |       1MiB /  16380MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
[root@server3 AIGC]#
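
When the same check has to be repeated on all three servers, the table output is awkward to compare. nvidia-smi also offers a CSV query mode, and its output can be reduced to one compact line per GPU. The following is a minimal sketch; the function name parse_gpu_csv is hypothetical.

```shell
# Minimal sketch: turn nvidia-smi CSV query lines (read from stdin) into
# "name|driver|memory_MiB". The function name is illustrative.
parse_gpu_csv() {
    awk -F', ' '{ sub(/ MiB$/, "", $3); print $1 "|" $2 "|" $3 }'
}

# On a GPU node:
#   nvidia-smi --query-gpu=name,driver_version,memory.total \
#              --format=csv,noheader | parse_gpu_csv
```

Running this on each node and comparing the lines gives a quick indication of whether the nodes have the same GPU model and driver version before any multi-node test is started.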

2.3.1.2 Build and install OpenMPI

[root@server3 AIGC]# tar xvf openmpi-4.1.6.tar.gz
[root@server3 AIGC]# cd openmpi-4.1.6
[root@server3 openmpi-4.1.6]# mkdir -p /home/lichao/opt/openmpi
[root@server3 openmpi-4.1.6]# ./configure --prefix=/home/lichao/opt/openmpi --with-cuda=/usr/local/cuda-12.4 --with-nccl=/usr/lib64

Open MPI configuration:
-----------------------
Version: 4.1.6
Build MPI C bindings: yes
Build MPI C++ bindings (deprecated): no
Build MPI Fortran bindings: mpif.h, use mpi
MPI Build Java bindings (experimental): no
Build Open SHMEM support: yes
Debug build: no
Platform file: (none)

Miscellaneous
-----------------------
CUDA support: yes
HWLOC support: internal
Libevent support: internal
Open UCC: no
PMIx support: Internal
 
Transports
-----------------------
Cisco usNIC: no
Cray uGNI (Gemini/Aries): no
Intel Omnipath (PSM2): no
Intel TrueScale (PSM): no
Mellanox MXM: no
Open UCX: yes
OpenFabrics OFI Libfabric: no
OpenFabrics Verbs: yes
Portals4: no
Shared memory/copy in+copy out: yes
Shared memory/Linux CMA: yes
Shared memory/Linux KNEM: no
Shared memory/XPMEM: no
TCP: yes
 
Resource Managers
-----------------------
Cray Alps: no
Grid Engine: no
LSF: no
Moab: no
Slurm: yes
ssh/rsh: yes
Torque: no
 
OMPIO File Systems
-----------------------
DDN Infinite Memory Engine: no
Generic Unix FS: yes
IBM Spectrum Scale/GPFS: no
Lustre: no
PVFS2/OrangeFS: no
 
[root@server3 openmpi-4.1.6]#
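
The transcript ends at the configure summary. The build and install step that normally follows (make, then make install) is not shown, and the later commands in this guide expect mpirun under /home/lichao/opt/openmpi/bin. The following is a minimal sketch of a check that an install prefix really contains a runnable mpirun; the function name verify_mpi_prefix is hypothetical.

```shell
# Minimal sketch: after `make && make install`, confirm that an Open MPI
# install prefix contains a runnable mpirun. The function name is illustrative.
verify_mpi_prefix() {
    prefix="$1"   # the directory that was passed to ./configure --prefix
    if [ ! -x "$prefix/bin/mpirun" ]; then
        echo "mpirun not found under $prefix/bin" >&2
        return 1
    fi
    "$prefix/bin/mpirun" --version
}

# Typical sequence in the source directory (the parallel job count is an
# assumption, adjust it to the machine):
#   make -j"$(nproc)" && make install
#   verify_mpi_prefix /home/lichao/opt/openmpi
```

Checking the prefix explicitly, rather than relying on whatever mpirun is first on PATH, helps catch a mismatch between the configure prefix and the path used by later scripts.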

2.3.1.3 Build and install NCCL-Tests

[root@server3 lichao]# cd AIGC/
[root@server3 AIGC]# git clone https://github.com/NVIDIA/nccl-tests.git
[root@server3 AIGC]# cd nccl-tests/
[root@server3 nccl-tests]# make clean
[root@server3 nccl-tests]# make MPI=1 MPI_HOME=/home/lichao/opt/openmpi/ CUDA_HOME=/usr/local/cuda-12.4/ NCCL_HOME=/usr/lib64/

2.3.2 Collective communication performance test (all_reduce)

[root@server1 lichao]# cat run_nccl-test.sh 
/home/lichao/opt/openmpi/bin/mpirun --allow-run-as-root \
-np 3 \
-host "server1,server2,server3" \
-mca btl ^openib \
-x NCCL_DEBUG=INFO \
-x NCCL_ALGO=ring \
-x NCCL_IB_DISABLE=0 \
-x NCCL_IB_GID_INDEX=3 \
-x NCCL_SOCKET_IFNAME=ens11f1 \
-x NCCL_IB_HCA=mlx5_1:1 \
/home/lichao/AIGC/nccl-tests/build/all_reduce_perf -b 128 -e 8G -f 2 -g 1
[root@server1 lichao]# ./run_nccl-test.sh 
# nThread 1 nGpus 1 minBytes 128 maxBytes 8589934592 step: 2(factor) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
#
# Using devices
#  Rank  0 Group  0 Pid  18697 on    server1 device  0 [0x02] NVIDIA GeForce RTX 4060 Ti
#  Rank  1 Group  0 Pid  20893 on    server2 device  0 [0x02] NVIDIA GeForce RTX 4060 Ti
#  Rank  2 Group  0 Pid   2458 on    server3 device  0 [0x02] NVIDIA GeForce RTX 4060 Ti
#
# Reducing maxBytes to 5261099008 due to memory limitation
server1:18697:18697 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ens11f1
server1:18697:18697 [0] NCCL INFO Bootstrap : Using ens11f1:172.16.0.11<0>
server1:18697:18697 [0] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so)
server1:18697:18697 [0] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so
server1:18697:18697 [0] NCCL INFO NET/Plugin: Using internal network plugin.
server2:20893:20893 [0] NCCL INFO cudaDriverVersion 12040
server2:20893:20893 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ens11f1
server2:20893:20893 [0] NCCL INFO Bootstrap : Using ens11f1:172.16.0.12<0>
server2:20893:20893 [0] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so)
server2:20893:20893 [0] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so
server2:20893:20893 [0] NCCL INFO NET/Plugin: Using internal network plugin.
server1:18697:18697 [0] NCCL INFO cudaDriverVersion 12040
NCCL version 2.21.5+cuda12.4
server3:2458:2458 [0] NCCL INFO cudaDriverVersion 12040
server3:2458:2458 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ens11f1
server3:2458:2458 [0] NCCL INFO Bootstrap : Using ens11f1:172.16.0.13<0>
server3:2458:2458 [0] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so)
server3:2458:2458 [0] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so
server3:2458:2458 [0] NCCL INFO NET/Plugin: Using internal network plugin.
server2:20893:20907 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
server2:20893:20907 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ens11f1
server2:20893:20907 [0] NCCL INFO NCCL_IB_HCA set to mlx5_1:1
server2:20893:20907 [0] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE [RO]; OOB ens11f1:172.16.0.12<0>
server2:20893:20907 [0] NCCL INFO Using non-device net plugin version 0
server2:20893:20907 [0] NCCL INFO Using network IB
server3:2458:2473 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
server3:2458:2473 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ens11f1
server3:2458:2473 [0] NCCL INFO NCCL_IB_HCA set to mlx5_1:1
server1:18697:18712 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
server1:18697:18712 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ens11f1
server3:2458:2473 [0] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE [RO]; OOB ens11f1:172.16.0.13<0>
server1:18697:18712 [0] NCCL INFO NCCL_IB_HCA set to mlx5_1:1
server3:2458:2473 [0] NCCL INFO Using non-device net plugin version 0
server3:2458:2473 [0] NCCL INFO Using network IB
server1:18697:18712 [0] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE [RO]; OOB ens11f1:172.16.0.11<0>
server1:18697:18712 [0] NCCL INFO Using non-device net plugin version 0
server1:18697:18712 [0] NCCL INFO Using network IB
server1:18697:18712 [0] NCCL INFO ncclCommInitRank comm 0x23622c0 rank 0 nranks 3 cudaDev 0 nvmlDev 0 busId 2000 commId 0x35491327c8228dd0 - Init START
server3:2458:2473 [0] NCCL INFO ncclCommInitRank comm 0x346ffc0 rank 2 nranks 3 cudaDev 0 nvmlDev 0 busId 2000 commId 0x35491327c8228dd0 - Init START
server2:20893:20907 [0] NCCL INFO ncclCommInitRank comm 0x2a1af20 rank 1 nranks 3 cudaDev 0 nvmlDev 0 busId 2000 commId 0x35491327c8228dd0 - Init START
server3:2458:2473 [0] NCCL INFO Setting affinity for GPU 0 to 0f,ff000fff
server2:20893:20907 [0] NCCL INFO Setting affinity for GPU 0 to 0f,ff000fff
server1:18697:18712 [0] NCCL INFO Setting affinity for GPU 0 to 0f,ff000fff
server1:18697:18712 [0] NCCL INFO comm 0x23622c0 rank 0 nRanks 3 nNodes 3 localRanks 1 localRank 0 MNNVL 0
server1:18697:18712 [0] NCCL INFO Channel 00/02 :    0   1   2
server1:18697:18712 [0] NCCL INFO Channel 01/02 :    0   1   2
server1:18697:18712 [0] NCCL INFO Trees [0] 2/-1/-1->0->-1 [1] 2/-1/-1->0->1
server1:18697:18712 [0] NCCL INFO P2P Chunksize set to 131072
server3:2458:2473 [0] NCCL INFO comm 0x346ffc0 rank 2 nRanks 3 nNodes 3 localRanks 1 localRank 0 MNNVL 0
server2:20893:20907 [0] NCCL INFO comm 0x2a1af20 rank 1 nRanks 3 nNodes 3 localRanks 1 localRank 0 MNNVL 0
server3:2458:2473 [0] NCCL INFO Trees [0] 1/-1/-1->2->0 [1] -1/-1/-1->2->0
server3:2458:2473 [0] NCCL INFO P2P Chunksize set to 131072
server2:20893:20907 [0] NCCL INFO Trees [0] -1/-1/-1->1->2 [1] 0/-1/-1->1->-1
server2:20893:20907 [0] NCCL INFO P2P Chunksize set to 131072
server3:2458:2473 [0] NCCL INFO Channel 00/0 : 1[0] -> 2[0] [receive] via NET/IB/0
server3:2458:2473 [0] NCCL INFO Channel 01/0 : 1[0] -> 2[0] [receive] via NET/IB/0
server3:2458:2473 [0] NCCL INFO Channel 00/0 : 2[0] -> 0[0] [send] via NET/IB/0
server3:2458:2473 [0] NCCL INFO Channel 01/0 : 2[0] -> 0[0] [send] via NET/IB/0
server2:20893:20907 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[0] [receive] via NET/IB/0
server2:20893:20907 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[0] [receive] via NET/IB/0
server2:20893:20907 [0] NCCL INFO Channel 00/0 : 1[0] -> 2[0] [send] via NET/IB/0
server2:20893:20907 [0] NCCL INFO Channel 01/0 : 1[0] -> 2[0] [send] via NET/IB/0
server1:18697:18712 [0] NCCL INFO Channel 00/0 : 2[0] -> 0[0] [receive] via NET/IB/0
server1:18697:18712 [0] NCCL INFO Channel 01/0 : 2[0] -> 0[0] [receive] via NET/IB/0
server1:18697:18712 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[0] [send] via NET/IB/0
server1:18697:18712 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[0] [send] via NET/IB/0
server3:2458:2475 [0] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3.
server1:18697:18714 [0] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3.
server2:20893:20909 [0] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3.
server1:18697:18712 [0] NCCL INFO Connected all rings
server1:18697:18712 [0] NCCL INFO Channel 01/0 : 1[0] -> 0[0] [receive] via NET/IB/0
server3:2458:2473 [0] NCCL INFO Connected all rings
server2:20893:20907 [0] NCCL INFO Connected all rings
server1:18697:18712 [0] NCCL INFO Channel 00/0 : 0[0] -> 2[0] [send] via NET/IB/0
server2:20893:20907 [0] NCCL INFO Channel 00/0 : 2[0] -> 1[0] [receive] via NET/IB/0
server1:18697:18712 [0] NCCL INFO Channel 01/0 : 0[0] -> 2[0] [send] via NET/IB/0
server3:2458:2473 [0] NCCL INFO Channel 00/0 : 0[0] -> 2[0] [receive] via NET/IB/0
server2:20893:20907 [0] NCCL INFO Channel 01/0 : 1[0] -> 0[0] [send] via NET/IB/0
server3:2458:2473 [0] NCCL INFO Channel 01/0 : 0[0] -> 2[0] [receive] via NET/IB/0
server3:2458:2473 [0] NCCL INFO Channel 00/0 : 2[0] -> 1[0] [send] via NET/IB/0
server3:2458:2473 [0] NCCL INFO Connected all trees
server1:18697:18712 [0] NCCL INFO Connected all trees
server1:18697:18712 [0] NCCL INFO NCCL_ALGO set by environment to ring
server3:2458:2473 [0] NCCL INFO NCCL_ALGO set by environment to ring
server3:2458:2473 [0] NCCL INFO threadThresholds 8/8/64 | 24/8/64 | 512 | 512
server3:2458:2473 [0] NCCL INFO 2 coll channels, 2 collnet channels, 0 nvls channels, 2 p2p channels, 2 p2p channels per peer
server2:20893:20907 [0] NCCL INFO Connected all trees
server2:20893:20907 [0] NCCL INFO NCCL_ALGO set by environment to ring
server2:20893:20907 [0] NCCL INFO threadThresholds 8/8/64 | 24/8/64 | 512 | 512
server2:20893:20907 [0] NCCL INFO 2 coll channels, 2 collnet channels, 0 nvls channels, 2 p2p channels, 2 p2p channels per peer
server1:18697:18712 [0] NCCL INFO threadThresholds 8/8/64 | 24/8/64 | 512 | 512
server1:18697:18712 [0] NCCL INFO 2 coll channels, 2 collnet channels, 0 nvls channels, 2 p2p channels, 2 p2p channels per peer
server2:20893:20907 [0] NCCL INFO TUNER/Plugin: Plugin load returned 11 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so
server2:20893:20907 [0] NCCL INFO TUNER/Plugin: Using internal tuner plugin.
server2:20893:20907 [0] NCCL INFO ncclCommInitRank comm 0x2a1af20 rank 1 nranks 3 cudaDev 0 nvmlDev 0 busId 2000 commId 0x35491327c8228dd0 - Init COMPLETE
server3:2458:2473 [0] NCCL INFO TUNER/Plugin: Plugin load returned 11 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so
server3:2458:2473 [0] NCCL INFO TUNER/Plugin: Using internal tuner plugin.
server3:2458:2473 [0] NCCL INFO ncclCommInitRank comm 0x346ffc0 rank 2 nranks 3 cudaDev 0 nvmlDev 0 busId 2000 commId 0x35491327c8228dd0 - Init COMPLETE
server1:18697:18712 [0] NCCL INFO TUNER/Plugin: Plugin load returned 11 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so
server1:18697:18712 [0] NCCL INFO TUNER/Plugin: Using internal tuner plugin.
server1:18697:18712 [0] NCCL INFO ncclCommInitRank comm 0x23622c0 rank 0 nranks 3 cudaDev 0 nvmlDev 0 busId 2000 commId 0x35491327c8228dd0 - Init COMPLETE
#
#                                                              out-of-place                       in-place          
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)       
         128            32     float     sum      -1    28.39    0.00    0.01      0    27.35    0.00    0.01      0
         256            64     float     sum      -1    29.44    0.01    0.01      0    28.54    0.01    0.01      0
         512           128     float     sum      -1    29.99    0.02    0.02      0    29.66    0.02    0.02      0
        1024           256     float     sum      -1    32.89    0.03    0.04      0    30.64    0.03    0.04      0
        2048           512     float     sum      -1    34.81    0.06    0.08      0    31.87    0.06    0.09      0
        4096          1024     float     sum      -1    37.32    0.11    0.15      0    36.09    0.11    0.15      0
        8192          2048     float     sum      -1    45.11    0.18    0.24      0    43.12    0.19    0.25      0
       16384          4096     float     sum      -1    57.92    0.28    0.38      0    56.98    0.29    0.38      0
       32768          8192     float     sum      -1    72.68    0.45    0.60      0    70.79    0.46    0.62      0
       65536         16384     float     sum      -1    95.77    0.68    0.91      0    93.73    0.70    0.93      0
      131072         32768     float     sum      -1    162.7    0.81    1.07      0    161.5    0.81    1.08      0
      262144         65536     float     sum      -1    177.3    1.48    1.97      0    177.4    1.48    1.97      0
      524288        131072     float     sum      -1    301.4    1.74    2.32      0    302.0    1.74    2.31      0
     1048576        262144     float     sum      -1    557.9    1.88    2.51      0    559.2    1.88    2.50      0
     2097152        524288     float     sum      -1   1089.8    1.92    2.57      0   1092.2    1.92    2.56      0
     4194304       1048576     float     sum      -1   2165.7    1.94    2.58      0   2166.6    1.94    2.58      0
     8388608       2097152     float     sum      -1   4315.7    1.94    2.59      0   4316.1    1.94    2.59      0
    16777216       4194304     float     sum      -1   8528.8    1.97    2.62      0   8529.3    1.97    2.62      0
    33554432       8388608     float     sum      -1    16622    2.02    2.69      0    16610    2.02    2.69      0
    67108864      16777216     float     sum      -1    32602    2.06    2.74      0    32542    2.06    2.75      0
   134217728      33554432     float     sum      -1    63946    2.10    2.80      0    63831    2.10    2.80      0
   268435456      67108864     float     sum      -1   126529    2.12    2.83      0   126412    2.12    2.83      0
   536870912     134217728     float     sum      -1   251599    2.13    2.85      0   251327    2.14    2.85      0
  1073741824     268435456     float     sum      -1   500664    2.14    2.86      0   501911    2.14    2.85      0
  2147483648     536870912     float     sum      -1  1001415    2.14    2.86      0  1000178    2.15    2.86      0
  4294967296    1073741824     float     sum      -1  1999361    2.15    2.86      0  1997380    2.15    2.87      0
server1:18697:18697 [0] NCCL INFO comm 0x23622c0 rank 0 nranks 3 cudaDev 0 busId 2000 - Destroy COMPLETE
server2:20893:20893 [0] NCCL INFO comm 0x2a1af20 rank 1 nranks 3 cudaDev 0 busId 2000 - Destroy COMPLETE
server3:2458:2458 [0] NCCL INFO comm 0x346ffc0 rank 2 nranks 3 cudaDev 0 busId 2000 - Destroy COMPLETE
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 1.66163 
#

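Explanation of the results:

– size (B): size of the data processed by the operation, in bytes;

– count (elements): number of elements processed by the operation;

– type: data type of the elements;

– redop: reduction operation used;

– root: for rooted operations (such as reduce and broadcast), the rank of the root; -1 means the operation has no root (all-reduce involves every rank);

– time (us): execution time of the operation, in microseconds;

– algbw (GB/s): algorithm bandwidth, that is, data size divided by time, in gigabytes per second;

– busbw (GB/s): bus bandwidth, that is, algbw scaled by a factor that depends on the collective and the number of ranks so that it reflects how heavily the hardware links are used, in gigabytes per second;

– #wrong: number of incorrect results; any value other than 0 indicates that an error occurred.

In this example both bandwidths rise as the message size grows, because the fixed per-operation latency is amortized over more data. They level off at about 2.15 GB/s (algbw) and 2.86 GB/s (busbw) for messages of 1 GB and larger.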
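
The two bandwidth columns are related by formulas documented in the nccl-tests project: algbw is size divided by time, and for all-reduce busbw is algbw multiplied by 2(n-1)/n, where n is the number of ranks. With the three ranks used here the factor is 4/3. The following is a minimal sketch that recomputes the last out-of-place row of the table; the function name allreduce_bw is hypothetical.

```shell
# Minimal sketch: recompute algbw and busbw for one all-reduce result row.
# The function name is illustrative.
allreduce_bw() {  # args: size_bytes time_us nranks
    awk -v s="$1" -v t="$2" -v n="$3" 'BEGIN {
        algbw = s / t / 1000              # bytes per us is MB/s; /1000 gives GB/s
        busbw = algbw * 2 * (n - 1) / n   # all-reduce correction factor
        printf "%.2f %.2f\n", algbw, busbw
    }'
}

# Last out-of-place row above: 4294967296 bytes in 1999361 us on 3 ranks.
allreduce_bw 4294967296 1999361 3   # prints: 2.15 2.86
```

The printed values match the algbw and busbw columns of that row, which is a useful sanity check when comparing runs with different numbers of ranks: busbw is comparable across rank counts, algbw is not.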

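When reading the results, pay attention to the following:

1. Whether bandwidth drops as the data size increases (a clear drop is not expected);

2. Focus on the peak bandwidth; for each run it is enough to look at either the in-place or the out-of-place columns;

3. The average (Avg bus bandwidth) is taken over all message sizes, so when sizes increase step by step it can understate what the network achieves (here 1.66 GB/s, against a peak of 2.87 GB/s);

4. Make sure the data size is large enough to reach the bandwidth ceiling (adjust the -b, -e, or -n options).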
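
Because the peak matters more than the average, it can be convenient to extract it from the output automatically. The following is a minimal sketch that assumes the 13-column result layout shown in the table above; the function name peak_busbw is hypothetical.

```shell
# Minimal sketch: read nccl-tests output on stdin and print the highest
# busbw value (GB/s) across the out-of-place and in-place columns.
# The function name is illustrative.
peak_busbw() {
    awk '
        # Result rows start with the size in bytes and have 13 columns;
        # busbw is column 8 (out-of-place) and column 12 (in-place).
        $1 ~ /^[0-9]+$/ && NF == 13 {
            if ($8 + 0 > max)  max = $8 + 0
            if ($12 + 0 > max) max = $12 + 0
        }
        END { printf "%.2f\n", max }
    '
}

# Example: ./run_nccl-test.sh | peak_busbw
```

Header lines and NCCL INFO log lines are skipped because their first field is not a plain integer, so the function can be fed the complete output of a run.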

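2.3.3 Common parameters

– Number of GPUs

    – -t, --nthreads <num threads>: number of threads per process, default: 1;

    – -g, --ngpus <GPUs per thread>: number of GPUs per thread, default: 1;

– Data sizes

    – -b, --minbytes <min size in bytes>: minimum data size to start with, default: 32M;

    – -e, --maxbytes <max size in bytes>: maximum data size to end at, default: 32M;

– Step between sizes

    – -i, --stepbytes <increment size>: fixed amount added at each step, default: 1M;

    – -f, --stepfactor <increment factor>: multiplication factor applied at each step, default: disabled;

– NCCL operation settings

    – -o, --op <sum/prod/min/max/avg/all>: reduction operation to use; only applies to reduction collectives such as Allreduce, Reduce, or ReduceScatter, default: sum;

    – -d, --datatype <nccltype/all>: data type to use, default: float;

– Performance settings

    – -n, --iters <iteration count>: number of timed iterations for each size, default: 20;

    – -w, --warmup_iters <warmup iteration count>: number of warm-up iterations (not timed), default: 5;

    – -m, --agg_iters <aggregation count>: number of operations aggregated together in each iteration, default: 1;

    – -a, --average <0/1/2/3>: how the final result is aggregated across all ranks (MPI=1 only); 0=Rank0, 1=Avg, 2=Min, 3=Max, default: 1;

– Test settings

    – -p, --parallel_init <0/1>: use threads to initialize NCCL in parallel, default: 0;

    – -c, --check <0/1>: check the correctness of the results; this can be very slow on a large number of GPUs, default: 1;

    – -z, --blocking <0/1>: make NCCL collectives blocking, that is, have the CPU wait and synchronize after each collective, default: 0;

    – -G, --cudagraph <num graph launches>: capture the iterations as a CUDA graph and replay it the specified number of times, default: 0.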
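
To see how -b, -e, and -f interact, it helps to list the sizes a sweep will visit: starting from the minimum, the size is multiplied by the step factor until it exceeds the maximum. The following is a minimal sketch of that progression; the function name sweep_sizes is hypothetical.

```shell
# Minimal sketch: list the message sizes (bytes) visited when sweeping from
# a minimum to a maximum with a multiplication factor, as -b/-e/-f do.
# The function name is illustrative.
sweep_sizes() {  # args: min_bytes max_bytes factor
    awk -v lo="$1" -v hi="$2" -v f="$3" 'BEGIN {
        if (f <= 1) exit 1   # a factor of 1 or less would never terminate
        for (s = lo; s <= hi; s *= f) printf "%.0f\n", s
    }'
}

# The all_reduce run in 2.3.2 asked for -b 128 -e 8G -f 2, and nccl-tests
# reduced the maximum to 5261099008 bytes because of GPU memory limits.
sweep_sizes 128 5261099008 2 | tail -n 1   # last size tested: 4294967296
```

With these inputs the sweep produces 26 sizes, from 128 bytes to 4 GiB, which matches the 26 result rows in the all_reduce output.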

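3 Experiment

With the hardware and software selected and configured, the next step is hands-on testing.

3.1.1 Obtain the LLaMA-Factory source package

Because of network restrictions, pulling the repository directly with git clone is often unreliable. It is easier to download the archive elsewhere and upload it to the server: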

noone@MacBook-Air Downloads % scp LLaMA-Factory-0.8.3.zip root@10.230.1.13:/tmp

[root@server3 AIGC]# pwd
/home/lichao/AIGC
[root@server3 AIGC]# cp /tmp/LLaMA-Factory-0.8.3.zip ./
[root@server3 AIGC]# unzip LLaMA-Factory-0.8.3.zip
[root@server3 AIGC]# cd LLaMA-Factory-0.8.3
[root@server3 LLaMA-Factory-0.8.3]# ll
总用量 128
drwxr-xr-x  2 root root    83 9月  13 05:04 assets
drwxr-xr-x  2 root root   122 9月   6 08:26 cache
-rw-r--r--  1 root root  1378 7月  18 19:36 CITATION.cff
drwxr-xr-x  6 root root  4096 9月  13 05:03 data
drwxr-xr-x  4 root root    43 7月  18 19:36 docker
drwxr-xr-x  5 root root    44 7月  18 19:36 evaluation
drwxr-xr-x 10 root root   182 7月  18 19:36 examples
-rw-r--r--  1 root root 11324 7月  18 19:36 LICENSE
-rw-r--r--  1 root root   242 7月  18 19:36 Makefile
-rw-r--r--  1 root root    33 7月  18 19:36 MANIFEST.in
-rw-r--r--  1 root root   645 7月  18 19:36 pyproject.toml
-rw-r--r--  1 root root 44424 7月  18 19:36 README.md
-rw-r--r--  1 root root 44093 7月  18 19:36 README_zh.md
-rw-r--r--  1 root root   245 7月  18 19:36 requirements.txt
drwxr-xr-x  3 root root    16 9月   6 18:48 saves
drwxr-xr-x  2 root root   219 7月  18 19:36 scripts
-rw-r--r--  1 root root  3361 7月  18 19:36 setup.py
drwxr-xr-x  4 root root   101 9月   6 08:22 src
drwxr-xr-x  5 root root    43 7月  18 19:36 tests
[root@server3 LLaMA-Factory-0.8.3]#

3.1.2 Install LLaMA-Factory and verify the installation

[root@server3 LLaMA-Factory-0.8.3]# pip install -e ".[torch,metrics]"
[root@server3 LLaMA-Factory-0.8.3]# llamafactory-cli version
[2024-09-23 08:51:28,722] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
----------------------------------------------------------
| Welcome to LLaMA Factory, version 0.8.3                |
|                                                        |
| Project page: https://github.com/hiyouga/LLaMA-Factory |
----------------------------------------------------------
[root@server3 LLaMA-Factory-0.8.3]#

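3.1.3 Download the pretrained model and datasets required for training

Choose a training method, model, and datasets that fit the GPU hardware installed in the servers.

GPU model: NVIDIA GeForce RTX 4060 Ti 16GB

Pretrained model: Qwen/Qwen1.5-0.5B-Chat

Datasets: identity, alpaca_zh_demo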

# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://hf-mirror.com/Qwen/Qwen1.5-0.5B-Chat
# If you want to clone without large files - just their pointers
GIT_LFS_SKIP_SMUDGE=1 git clone https://hf-mirror.com/Qwen/Qwen1.5-0.5B-Chat

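Because of network restrictions, direct downloads from the command line often fail. Here the pretrained model is pulled from hf-mirror.com, a Hugging Face mirror site in China, with the variable GIT_LFS_SKIP_SMUDGE=1 set so that large files are skipped; those files are then downloaded manually and uploaded to the server.

If that is too cumbersome, the Hugging Face command-line tool can be installed and used to download the pretrained model and datasets instead. After installing it, some environment variables must likewise be set (pointing at the mirror site hf-mirror.com) to work around the network restrictions.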
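
The command-line route can be sketched as follows. HF_ENDPOINT is the environment variable that the huggingface_hub tooling reads to select an alternative endpoint. The function name hf_mirror_download and the HF_CLI override variable are hypothetical and exist only so that the command can be dry-run without network access.

```shell
# Minimal sketch: download a Hugging Face repository through the mirror.
# HF_ENDPOINT selects the endpoint used by the huggingface_hub tooling;
# HF_CLI is a hypothetical override that allows a dry run (for example echo).
hf_mirror_download() {  # args: repo_id local_dir
    HF_ENDPOINT="${HF_ENDPOINT:-https://hf-mirror.com}" \
        ${HF_CLI:-huggingface-cli} download "$1" --local-dir "$2"
}

# Example (assumes the CLI was installed with: pip3 install -U huggingface_hub):
#   hf_mirror_download Qwen/Qwen1.5-0.5B-Chat /home/lichao/AIGC/models/Qwen1.5-0.5B-Chat
```

Setting the endpoint only for the single command, rather than exporting it globally, keeps the mirror choice local to the download and leaves the rest of the shell session unchanged.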

Download the pretrained model:
[root@server3 AIGC]# mkdir models
[root@server3 AIGC]# cd models/
[root@server3 models]# GIT_LFS_SKIP_SMUDGE=1 git clone https://hf-mirror.com/Qwen/Qwen1.5-0.5B-Chat
[root@server3 models]# tree -h Qwen1.5-0.5B-Chat/
Qwen1.5-0.5B-Chat/
├── [ 656]  config.json
├── [ 661]  config.json.raw
├── [ 206]  generation_config.json
├── [7.1K]  LICENSE
├── [1.6M]  merges.txt
├── [1.2G]  model.safetensors
├── [4.2K]  README.md
├── [1.3K]  tokenizer_config.json
├── [6.7M]  tokenizer.json
└── [2.6M]  vocab.json

0 directories, 10 files
[root@server3 models]# 
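
When a repository is cloned with GIT_LFS_SKIP_SMUDGE=1, each large file is left as a small text pointer until the real file is put in its place. In the listing above, model.safetensors is 1.2G, so it has already been replaced. The following is a minimal sketch for spotting files that are still pointers; the function name is_lfs_pointer is hypothetical.

```shell
# Minimal sketch: report whether a file is still a Git LFS pointer rather
# than the real content. The function name is illustrative.
is_lfs_pointer() {
    # Pointer files are small text files whose first line names the LFS spec.
    head -c 64 "$1" | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Example: list files in the model directory that still need to be uploaded.
#   for f in Qwen1.5-0.5B-Chat/*; do
#       is_lfs_pointer "$f" && echo "still a pointer: $f"
#   done
```

Only the first 64 bytes are read, so the check stays fast even on multi-gigabyte weight files.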

Download the datasets: by default, the data directory of the LLaMA-Factory project already ships with some local datasets that can be used directly.
[root@server3 LLaMA-Factory-0.8.3]# tree -h data/
data/
├── [841K]  alpaca_en_demo.json
├── [621K]  alpaca_zh_demo.json
├── [  32]  belle_multiturn
│   └── [2.7K]  belle_multiturn.py
├── [733K]  c4_demo.json
├── [ 13K]  dataset_info.json
├── [1.5M]  dpo_en_demo.json
├── [833K]  dpo_zh_demo.json
├── [722K]  glaive_toolcall_en_demo.json
├── [665K]  glaive_toolcall_zh_demo.json
├── [  27]  hh_rlhf_en
│   └── [3.3K]  hh_rlhf_en.py
├── [ 20K]  identity.json
├── [892K]  kto_en_demo.json
├── [  45]  mllm_demo_data
│   ├── [ 12K]  1.jpg
│   ├── [ 22K]  2.jpg
│   └── [ 16K]  3.jpg
├── [3.1K]  mllm_demo.json
├── [9.8K]  README.md
├── [9.2K]  README_zh.md
├── [  27]  ultra_chat
│   └── [2.3K]  ultra_chat.py
└── [1004K]  wiki_demo.txt

4 directories, 20 files
[root@server3 LLaMA-Factory-0.8.3]#

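3.1.4 Run a single-node training test with the prepared model and datasets

LLaMA-Factory supports fine-tuning large language models through a WebUI. After installation, the WebUI can be used for a quick functional check; once that works, the command-line tool is used for multi-node distributed training.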

[root@server3 LLaMA-Factory-0.8.3]# llamafactory-cli webui
[2024-09-23 17:54:45,786] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Running on local URL:  http://0.0.0.0:7861

To create a public link, set `share=True` in `launch()`.

3.1.5 Run a multi-node distributed training job from the command line

1. Prepare the directories
[root@server3 LLaMA-Factory-0.8.3]# mkdir asterun
[root@server3 LLaMA-Factory-0.8.3]# mkdir -p asterun/saves/qwen/full/sft
2. Prepare the distributed training configuration file according to the cluster environment and the training task
[root@server3 LLaMA-Factory-0.8.3]# cat asterun/qwen_full_sft_ds2.yaml 
### model
model_name_or_path: /home/lichao/AIGC/models/Qwen1.5-0.5B-Chat

### method
stage: sft
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z2_config.json

### dataset
dataset: identity,alpaca_zh_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: asterun/saves/qwen/full/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

report_to: tensorboard
logging_dir: asterun/saves/qwen/full/sft/runs


### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
[root@server3 LLaMA-Factory-0.8.3]# 
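3. Prepare the runtime environment on Server1 and Server2 in the same way
Steps omitted.
4. Start the distributed training job on each of the three GPU nodes in the cluster in turn
Master node (rank 0):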
[root@server3 LLaMA-Factory-0.8.3]# FORCE_TORCHRUN=1 NNODES=3 RANK=0 MASTER_ADDR=172.16.0.13 MASTER_PORT=29500 llamafactory-cli train asterun/qwen_full_sft_ds2.yaml
Worker node (rank 1):
[root@server2 LLaMA-Factory-0.8.3]# FORCE_TORCHRUN=1 NNODES=3 RANK=1 MASTER_ADDR=172.16.0.13 MASTER_PORT=29500 llamafactory-cli train asterun/qwen_full_sft_ds2.yaml
Worker node (rank 2):
[root@server1 LLaMA-Factory-0.8.3]# FORCE_TORCHRUN=1 NNODES=3 RANK=2 MASTER_ADDR=172.16.0.13 MASTER_PORT=29500 llamafactory-cli train asterun/qwen_full_sft_ds2.yaml
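The settings in the configuration file and the number of nodes together determine the global batch size, the number of optimization steps, and the learning-rate curve that appear in the training log. The sketch below works through that arithmetic. It is an approximation of how the Hugging Face Trainer appears to derive these values in the version used here, not a copy of its code. The inputs are taken from this guide: 981 training examples (the "Num examples" value in the log), 3 nodes with one GPU each, per_device_train_batch_size 1, gradient_accumulation_steps 2, 3 epochs, warmup_ratio 0.1, and learning_rate 1.0e-4. The function names are illustrative.

```python
import math


def training_plan(num_examples, nnodes, gpus_per_node, per_device_batch,
                  grad_accum, epochs, warmup_ratio):
    """Return (global batch size, total optimization steps, warmup steps)."""
    world_size = nnodes * gpus_per_node
    total_batch = per_device_batch * world_size * grad_accum
    # Each rank iterates over ceil(N / (per-device batch * world size)) micro-batches per epoch.
    batches_per_epoch = math.ceil(num_examples / (per_device_batch * world_size))
    # One optimizer update happens every `grad_accum` micro-batches.
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    total_steps = math.ceil(epochs * updates_per_epoch)
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return total_batch, total_steps, warmup_steps


def cosine_lr(step, base_lr, warmup_steps, total_steps):
    """Linear warmup to base_lr, then half-cosine decay towards zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


batch, steps, warmup = training_plan(981, 3, 1, 1, 2, 3.0, 0.1)
print(batch, steps, warmup)               # 6 489 49
print(cosine_lr(10, 1e-4, warmup, steps))  # about 2.04e-05
```

With these inputs the sketch yields a global batch size of 6 and 489 optimization steps, which agrees with the "Total train batch size" and "Total optimization steps" lines in the log, and the learning rate at step 10 agrees with the first logged value (about 2.04e-05).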

Appendix: terminal log of the complete distributed training run:

[root@server3 LLaMA-Factory-0.8.3]# FORCE_TORCHRUN=1 NNODES=3 RANK=0 MASTER_ADDR=172.16.0.13 MASTER_PORT=29500 llamafactory-cli train asterun/qwen_full_sft_ds2.yaml 
[2024-09-23 10:01:33,036] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
09/23/2024 10:01:37 - INFO - llamafactory.cli - Initializing distributed tasks at: 172.16.0.13:29500
[2024-09-23 10:01:52,891] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-09-23 10:01:56,575] [INFO] [comm.py:652:init_distributed] cdb=None
[2024-09-23 10:01:56,575] [INFO] [comm.py:683:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
09/23/2024 10:01:56 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
[INFO|tokenization_utils_base.py:2267] 2024-09-23 10:01:56,613 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2267] 2024-09-23 10:01:56,613 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2267] 2024-09-23 10:01:56,613 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2267] 2024-09-23 10:01:56,614 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2267] 2024-09-23 10:01:56,614 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2267] 2024-09-23 10:01:56,614 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2513] 2024-09-23 10:01:56,941 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
09/23/2024 10:01:56 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
09/23/2024 10:01:56 - WARNING - llamafactory.data.template - New tokens have been added, make sure `resize_vocab` is True.
09/23/2024 10:01:56 - INFO - llamafactory.data.loader - Loading dataset identity.json...
Converting format of dataset (num_proc=16): 100%|█████████████████████████████████████████████████████████████████████████████████| 91/91 [00:00<00:00, 347.58 examples/s]
09/23/2024 10:01:58 - INFO - llamafactory.data.loader - Loading dataset alpaca_zh_demo.json...
Converting format of dataset (num_proc=16): 100%|████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:00<00:00, 4042.14 examples/s]
Running tokenizer on dataset (num_proc=16): 100%|█████████████████████████████████████████████████████████████████████████████| 1091/1091 [00:02<00:00, 476.63 examples/s]
training example:
input_ids:
[27, 91, 2468, 8757, 842, 91, 29, 872, 27, 91, 408, 8757, 842, 91, 1339, 6023, 151646, 27, 91, 2468, 8757, 842, 91, 29, 77091, 27, 91, 408, 8757, 842, 91, 1339, 9707, 0, 358, 1079, 5867, 606, 38154, 458, 15235, 17847, 7881, 553, 5867, 3094, 3417, 13, 2585, 646, 358, 7789, 498, 3351, 30, 151646]
inputs:
<|start_header_id|>user<|end_header_id|>

hi<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Hello! I am {{name}}, an AI assistant developed by {{author}}. How can I assist you today?<|eot_id|>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 9707, 0, 358, 1079, 5867, 606, 38154, 458, 15235, 17847, 7881, 553, 5867, 3094, 3417, 13, 2585, 646, 358, 7789, 498, 3351, 30, 151646]
labels:
Hello! I am {{name}}, an AI assistant developed by {{author}}. How can I assist you today?<|eot_id|>
[INFO|configuration_utils.py:731] 2024-09-23 10:02:03,983 >> loading configuration file /home/lichao/AIGC/models/Qwen1.5-0.5B-Chat/config.json
[INFO|configuration_utils.py:800] 2024-09-23 10:02:03,986 >> Model config Qwen2Config {
  "_name_or_path": "/home/lichao/AIGC/models/Qwen1.5-0.5B-Chat",
  "architectures": [
    "Qwen2Config"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 2816,
  "max_position_embeddings": 32768,
  "max_window_layers": 21,
  "model_type": "qwen2",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "num_key_value_heads": 16,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.45.0.dev0",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}

[INFO|modeling_utils.py:3654] 2024-09-23 10:02:04,036 >> loading weights file /home/lichao/AIGC/models/Qwen1.5-0.5B-Chat/model.safetensors
[INFO|modeling_utils.py:1585] 2024-09-23 10:02:04,058 >> Instantiating Qwen2ForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:1038] 2024-09-23 10:02:04,062 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151645
}

[INFO|modeling_utils.py:4489] 2024-09-23 10:02:05,417 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.

[INFO|modeling_utils.py:4497] 2024-09-23 10:02:05,417 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at /home/lichao/AIGC/models/Qwen1.5-0.5B-Chat.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
[INFO|configuration_utils.py:991] 2024-09-23 10:02:05,421 >> loading configuration file /home/lichao/AIGC/models/Qwen1.5-0.5B-Chat/generation_config.json
[INFO|configuration_utils.py:1038] 2024-09-23 10:02:05,421 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "repetition_penalty": 1.1,
  "top_p": 0.8
}

09/23/2024 10:02:05 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
09/23/2024 10:02:05 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
09/23/2024 10:02:05 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
09/23/2024 10:02:05 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
09/23/2024 10:02:05 - INFO - llamafactory.model.loader - trainable params: 463,987,712 || all params: 463,987,712 || trainable%: 100.0000
Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[INFO|trainer.py:655] 2024-09-23 10:02:05,593 >> Using auto half precision backend
[2024-09-23 10:02:06,167] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.15.1, git-hash=unknown, git-branch=unknown
[2024-09-23 10:02:06,167] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 3
[2024-09-23 10:02:06,406] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2024-09-23 10:02:06,408] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2024-09-23 10:02:06,408] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
[2024-09-23 10:02:06,424] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW
[2024-09-23 10:02:06,424] [INFO] [utils.py:59:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=<class 'torch.optim.adamw.AdamW'>
[2024-09-23 10:02:06,424] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 2 optimizer
[2024-09-23 10:02:06,424] [INFO] [stage_1_and_2.py:148:__init__] Reduce bucket size 500000000
[2024-09-23 10:02:06,424] [INFO] [stage_1_and_2.py:149:__init__] Allgather bucket size 500000000
[2024-09-23 10:02:06,424] [INFO] [stage_1_and_2.py:150:__init__] CPU Offload: False
[2024-09-23 10:02:06,424] [INFO] [stage_1_and_2.py:151:__init__] Round robin gradient partitioning: True
[2024-09-23 10:02:08,342] [INFO] [utils.py:781:see_memory_usage] Before initializing optimizer states
[2024-09-23 10:02:08,343] [INFO] [utils.py:782:see_memory_usage] MA 1.63 GB         Max_MA 1.63 GB         CA 1.75 GB         Max_CA 2 GB 
[2024-09-23 10:02:08,343] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 6.67 GB, percent = 5.3%
[2024-09-23 10:02:08,568] [INFO] [utils.py:781:see_memory_usage] After initializing optimizer states
[2024-09-23 10:02:08,569] [INFO] [utils.py:782:see_memory_usage] MA 1.63 GB         Max_MA 2.2 GB         CA 2.33 GB         Max_CA 2 GB 
[2024-09-23 10:02:08,570] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 6.67 GB, percent = 5.3%
[2024-09-23 10:02:08,570] [INFO] [stage_1_and_2.py:543:__init__] optimizer state initialized
[2024-09-23 10:02:08,792] [INFO] [utils.py:781:see_memory_usage] After initializing ZeRO optimizer
[2024-09-23 10:02:08,793] [INFO] [utils.py:782:see_memory_usage] MA 1.63 GB         Max_MA 1.63 GB         CA 2.33 GB         Max_CA 2 GB 
[2024-09-23 10:02:08,793] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 6.67 GB, percent = 5.3%
[2024-09-23 10:02:08,794] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = DeepSpeedZeroOptimizer
[2024-09-23 10:02:08,794] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using configured LR scheduler = None
[2024-09-23 10:02:08,794] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = None
[2024-09-23 10:02:08,795] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2024-09-23 10:02:08,796] [INFO] [config.py:999:print] DeepSpeedEngine configuration:
[2024-09-23 10:02:08,796] [INFO] [config.py:1003:print]   activation_checkpointing_config  {
    "partition_activations": false, 
    "contiguous_memory_optimization": false, 
    "cpu_checkpointing": false, 
    "number_checkpoints": null, 
    "synchronize_checkpoint_boundary": false, 
    "profile": false
}
[2024-09-23 10:02:08,796] [INFO] [config.py:1003:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True, 'use_gds': False}
[2024-09-23 10:02:08,796] [INFO] [config.py:1003:print]   amp_enabled .................. False
[2024-09-23 10:02:08,796] [INFO] [config.py:1003:print]   amp_params ................... False
[2024-09-23 10:02:08,797] [INFO] [config.py:1003:print]   autotuning_config ............ {
    "enabled": false, 
    "start_step": null, 
    "end_step": null, 
    "metric_path": null, 
    "arg_mappings": null, 
    "metric": "throughput", 
    "model_info": null, 
    "results_dir": "autotuning_results", 
    "exps_dir": "autotuning_exps", 
    "overwrite": true, 
    "fast": true, 
    "start_profile_step": 3, 
    "end_profile_step": 5, 
    "tuner_type": "gridsearch", 
    "tuner_early_stopping": 5, 
    "tuner_num_trials": 50, 
    "model_info_path": null, 
    "mp_size": 1, 
    "max_train_batch_size": null, 
    "min_train_batch_size": 1, 
    "max_train_micro_batch_size_per_gpu": 1.024000e+03, 
    "min_train_micro_batch_size_per_gpu": 1, 
    "num_tuning_micro_batch_sizes": 3
}
[2024-09-23 10:02:08,797] [INFO] [config.py:1003:print]   bfloat16_enabled ............. True
[2024-09-23 10:02:08,797] [INFO] [config.py:1003:print]   bfloat16_immediate_grad_update  False
[2024-09-23 10:02:08,797] [INFO] [config.py:1003:print]   checkpoint_parallel_write_pipeline  False
[2024-09-23 10:02:08,797] [INFO] [config.py:1003:print]   checkpoint_tag_validation_enabled  True
[2024-09-23 10:02:08,797] [INFO] [config.py:1003:print]   checkpoint_tag_validation_fail  False
[2024-09-23 10:02:08,797] [INFO] [config.py:1003:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f0d52b5d3d0>
[2024-09-23 10:02:08,797] [INFO] [config.py:1003:print]   communication_data_type ...... None
[2024-09-23 10:02:08,797] [INFO] [config.py:1003:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2024-09-23 10:02:08,797] [INFO] [config.py:1003:print]   curriculum_enabled_legacy .... False
[2024-09-23 10:02:08,797] [INFO] [config.py:1003:print]   curriculum_params_legacy ..... False
[2024-09-23 10:02:08,797] [INFO] [config.py:1003:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2024-09-23 10:02:08,797] [INFO] [config.py:1003:print]   data_efficiency_enabled ...... False
[2024-09-23 10:02:08,797] [INFO] [config.py:1003:print]   dataloader_drop_last ......... False
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   disable_allgather ............ False
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   dump_state ................... False
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   dynamic_loss_scale_args ...... None
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   eigenvalue_enabled ........... False
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   eigenvalue_gas_boundary_resolution  1
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   eigenvalue_layer_num ......... 0
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   eigenvalue_max_iter .......... 100
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   eigenvalue_stability ......... 1e-06
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   eigenvalue_tol ............... 0.01
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   eigenvalue_verbose ........... False
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   elasticity_enabled ........... False
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   flops_profiler_config ........ {
    "enabled": false, 
    "recompute_fwd_factor": 0.0, 
    "profile_step": 1, 
    "module_depth": -1, 
    "top_modules": 1, 
    "detailed": true, 
    "output_file": null
}
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   fp16_auto_cast ............... None
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   fp16_enabled ................. False
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   fp16_master_weights_and_gradients  False
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   global_rank .................. 0
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   grad_accum_dtype ............. None
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   gradient_accumulation_steps .. 2
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   gradient_clipping ............ 1.0
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   gradient_predivide_factor .... 1.0
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   graph_harvesting ............. False
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   initial_dynamic_scale ........ 1
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   load_universal_checkpoint .... False
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   loss_scale ................... 1.0
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   memory_breakdown ............. False
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   mics_hierarchial_params_gather  False
[2024-09-23 10:02:08,798] [INFO] [config.py:1003:print]   mics_shard_size .............. -1
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') comet=CometConfig(enabled=False, samples_log_interval=100, project=None, workspace=None, api_key=None, experiment_name=None, experiment_key=None, online=None, mode=None) wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName')
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   nebula_config ................ {
    "enabled": false, 
    "persistent_storage_path": null, 
    "persistent_time_interval": 100, 
    "num_of_version_in_retention": 2, 
    "enable_nebula_load": true, 
    "load_path": null
}
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   optimizer_legacy_fusion ...... False
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   optimizer_name ............... None
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   optimizer_params ............. None
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   pld_enabled .................. False
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   pld_params ................... False
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   prescale_gradients ........... False
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   scheduler_name ............... None
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   scheduler_params ............. None
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   seq_parallel_communication_data_type  torch.float32
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   sparse_attention ............. None
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   sparse_gradients_enabled ..... False
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   steps_per_print .............. inf
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   timers_config ................ enabled=True synchronized=True
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   train_batch_size ............. 6
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   train_micro_batch_size_per_gpu  1
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   use_data_before_expert_parallel_  False
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   use_node_local_storage ....... False
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   wall_clock_breakdown ......... False
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   weight_quantization_config ... None
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   world_size ................... 3
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   zero_allow_untested_optimizer  True
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False use_all_reduce_for_fetch_params=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=True zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   zero_enabled ................. True
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   zero_force_ds_cpu_optimizer .. True
[2024-09-23 10:02:08,799] [INFO] [config.py:1003:print]   zero_optimization_stage ...... 2
[2024-09-23 10:02:08,800] [INFO] [config.py:989:print_user_config]   json = {
    "train_batch_size": 6, 
    "train_micro_batch_size_per_gpu": 1, 
    "gradient_accumulation_steps": 2, 
    "gradient_clipping": 1.0, 
    "zero_allow_untested_optimizer": true, 
    "fp16": {
        "enabled": false, 
        "loss_scale": 0, 
        "loss_scale_window": 1000, 
        "initial_scale_power": 16, 
        "hysteresis": 2, 
        "min_loss_scale": 1
    }, 
    "bf16": {
        "enabled": true
    }, 
    "zero_optimization": {
        "stage": 2, 
        "allgather_partitions": true, 
        "allgather_bucket_size": 5.000000e+08, 
        "overlap_comm": true, 
        "reduce_scatter": true, 
        "reduce_bucket_size": 5.000000e+08, 
        "contiguous_gradients": true, 
        "round_robin_gradients": true
    }, 
    "steps_per_print": inf
}
[INFO|trainer.py:2141] 2024-09-23 10:02:08,800 >> ***** Running training *****
[INFO|trainer.py:2142] 2024-09-23 10:02:08,800 >>   Num examples = 981
[INFO|trainer.py:2143] 2024-09-23 10:02:08,800 >>   Num Epochs = 3
[INFO|trainer.py:2144] 2024-09-23 10:02:08,800 >>   Instantaneous batch size per device = 1
[INFO|trainer.py:2147] 2024-09-23 10:02:08,800 >>   Total train batch size (w. parallel, distributed & accumulation) = 6
[INFO|trainer.py:2148] 2024-09-23 10:02:08,800 >>   Gradient Accumulation steps = 2
[INFO|trainer.py:2149] 2024-09-23 10:02:08,800 >>   Total optimization steps = 489
[INFO|trainer.py:2150] 2024-09-23 10:02:08,801 >>   Number of trainable parameters = 463,987,712
  0%|                                                                                                                                             | 0/489 [00:00<?, ?it/s]
/home/lichao/opt/python3.11.9/lib/python3.11/site-packages/torch/utils/checkpoint.py:295: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]
{'loss': 2.3658, 'grad_norm': 25.19988250732422, 'learning_rate': 2.0408163265306123e-05, 'epoch': 0.06}                                                                  
{'loss': 2.6136, 'grad_norm': 9.38448429107666, 'learning_rate': 4.0816326530612245e-05, 'epoch': 0.12}                                                                   
{'loss': 2.2796, 'grad_norm': 13.728240013122559, 'learning_rate': 6.122448979591838e-05, 'epoch': 0.18}                                                                  
{'loss': 2.1511, 'grad_norm': 18.125511169433594, 'learning_rate': 8.163265306122449e-05, 'epoch': 0.24}                                                                  
{'loss': 2.3712, 'grad_norm': 22.641611099243164, 'learning_rate': 9.999872552137497e-05, 'epoch': 0.31}                                                                  
{'loss': 2.3982, 'grad_norm': 19.40285301208496, 'learning_rate': 9.98458666866564e-05, 'epoch': 0.37}                                                                    
{'loss': 2.5063, 'grad_norm': 11.834580421447754, 'learning_rate': 9.943900474099748e-05, 'epoch': 0.43}                                                                  
{'loss': 2.4219, 'grad_norm': 11.096634864807129, 'learning_rate': 9.878021295961217e-05, 'epoch': 0.49}                                                                  
{'loss': 2.5318, 'grad_norm': 11.01838493347168, 'learning_rate': 9.787284839440982e-05, 'epoch': 0.55}                                                                   
{'loss': 2.6357, 'grad_norm': 15.102975845336914, 'learning_rate': 9.672153476722816e-05, 'epoch': 0.61}                                                                  
{'loss': 2.5858, 'grad_norm': 11.936942100524902, 'learning_rate': 9.533213890840657e-05, 'epoch': 0.67}                                                                  
{'loss': 2.3013, 'grad_norm': 10.956372261047363, 'learning_rate': 9.371174086076363e-05, 'epoch': 0.73}                                                                  
{'loss': 2.443, 'grad_norm': 11.979649543762207, 'learning_rate': 9.186859780132164e-05, 'epoch': 0.8}                                                                    
{'loss': 2.4357, 'grad_norm': 7.360419273376465, 'learning_rate': 8.981210196462533e-05, 'epoch': 0.86}                                                                   
{'loss': 2.5534, 'grad_norm': 14.005857467651367, 'learning_rate': 8.755273278206749e-05, 'epoch': 0.92}                                                                  
{'loss': 2.5753, 'grad_norm': 9.832633018493652, 'learning_rate': 8.510200348110868e-05, 'epoch': 0.98}                                                                   
{'loss': 1.7594, 'grad_norm': 10.028552055358887, 'learning_rate': 8.247240241650918e-05, 'epoch': 1.04}                                                                  
{'loss': 1.4025, 'grad_norm': 12.267614364624023, 'learning_rate': 7.967732943253571e-05, 'epoch': 1.1}                                                                   
{'loss': 1.1433, 'grad_norm': 7.551489353179932, 'learning_rate': 7.673102758042653e-05, 'epoch': 1.16}                                                                   
{'loss': 1.2479, 'grad_norm': 8.397479057312012, 'learning_rate': 7.364851053906718e-05, 'epoch': 1.22}                                                                   
{'loss': 1.1978, 'grad_norm': 9.697928428649902, 'learning_rate': 7.044548610872434e-05, 'epoch': 1.28}                                                                   
{'loss': 1.1877, 'grad_norm': 14.016590118408203, 'learning_rate': 6.713827616769614e-05, 'epoch': 1.35}                                                                  
{'loss': 1.2349, 'grad_norm': 11.697397232055664, 'learning_rate': 6.374373349976169e-05, 'epoch': 1.41}                                                                  
{'loss': 1.214, 'grad_norm': 8.01415729522705, 'learning_rate': 6.027915591625804e-05, 'epoch': 1.47}                                                                     
{'loss': 1.1724, 'grad_norm': 8.013666152954102, 'learning_rate': 5.6762198110398444e-05, 'epoch': 1.53}                                                                  
{'loss': 1.2709, 'grad_norm': 10.372663497924805, 'learning_rate': 5.3210781693002754e-05, 'epoch': 1.59}                                                                 
{'loss': 1.1069, 'grad_norm': 14.193530082702637, 'learning_rate': 4.964300386807653e-05, 'epoch': 1.65}                                                                  
{'loss': 1.3013, 'grad_norm': 14.019328117370605, 'learning_rate': 4.607704521360776e-05, 'epoch': 1.71}                                                                  
{'loss': 1.2138, 'grad_norm': 11.885704040527344, 'learning_rate': 4.253107703750875e-05, 'epoch': 1.77}                                                                  
{'loss': 1.1027, 'grad_norm': 8.35533332824707, 'learning_rate': 3.9023168780796294e-05, 'epoch': 1.83}                                                                   
{'loss': 1.1346, 'grad_norm': 12.683867454528809, 'learning_rate': 3.557119593986208e-05, 'epoch': 1.9}                                                                   
{'loss': 1.0305, 'grad_norm': 7.334381580352783, 'learning_rate': 3.219274897704053e-05, 'epoch': 1.96}                                                                   
{'loss': 0.9327, 'grad_norm': 4.699033737182617, 'learning_rate': 2.8905043683644872e-05, 'epoch': 2.02}                                                                  
{'loss': 0.5392, 'grad_norm': 5.634421348571777, 'learning_rate': 2.5724833452240792e-05, 'epoch': 2.08}                                                                  
{'loss': 0.5446, 'grad_norm': 5.442759990692139, 'learning_rate': 2.2668323905198108e-05, 'epoch': 2.14}                                                                  
{'loss': 0.4084, 'grad_norm': 5.1523966789245605, 'learning_rate': 1.9751090314553878e-05, 'epoch': 2.2}                                                                  
{'loss': 0.4885, 'grad_norm': 6.668193340301514, 'learning_rate': 1.698799823399628e-05, 'epoch': 2.26}                                                                   
{'loss': 0.4697, 'grad_norm': 5.780378818511963, 'learning_rate': 1.4393127747410417e-05, 'epoch': 2.32}                                                                  
{'loss': 0.4652, 'grad_norm': 4.824888706207275, 'learning_rate': 1.1979701719998453e-05, 'epoch': 2.39}                                                                  
{'loss': 0.4356, 'grad_norm': 12.217597961425781, 'learning_rate': 9.760018417589334e-06, 'epoch': 2.45}                                                                  
{'loss': 0.4252, 'grad_norm': 5.763933181762695, 'learning_rate': 7.745388837495188e-06, 'epoch': 2.51}                                                                   
{'loss': 0.4486, 'grad_norm': 8.276981353759766, 'learning_rate': 5.946079070261773e-06, 'epoch': 2.57}                                                                   
{'loss': 0.4308, 'grad_norm': 12.236105918884277, 'learning_rate': 4.371257986024202e-06, 'epoch': 2.63}                                                                  
{'loss': 0.4139, 'grad_norm': 5.1657185554504395, 'learning_rate': 3.0289505120464743e-06, 'epoch': 2.69}                                                                 
{'loss': 0.3718, 'grad_norm': 6.259467124938965, 'learning_rate': 1.925996739531577e-06, 'epoch': 2.75}                                                                   
{'loss': 0.3833, 'grad_norm': 8.667612075805664, 'learning_rate': 1.0680170680846259e-06, 'epoch': 2.81}                                                                  
{'loss': 0.4498, 'grad_norm': 7.922170639038086, 'learning_rate': 4.593835654447709e-07, 'epoch': 2.87}                                                                   
{'loss': 0.4422, 'grad_norm': 5.631829261779785, 'learning_rate': 1.0319768843018996e-07, 'epoch': 2.94}                                                                  
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 489/489 [26:28<00:00,  3.26s/it]
[INFO|trainer.py:3510] 2024-09-23 10:28:37,461 >> Saving model checkpoint to asterun/saves/qwen/full/sft/checkpoint-489
[INFO|configuration_utils.py:472] 2024-09-23 10:28:37,464 >> Configuration saved in asterun/saves/qwen/full/sft/checkpoint-489/config.json
[INFO|configuration_utils.py:807] 2024-09-23 10:28:37,464 >> Configuration saved in asterun/saves/qwen/full/sft/checkpoint-489/generation_config.json
[INFO|modeling_utils.py:2778] 2024-09-23 10:28:43,244 >> Model weights saved in asterun/saves/qwen/full/sft/checkpoint-489/model.safetensors
[INFO|tokenization_utils_base.py:2684] 2024-09-23 10:28:43,251 >> tokenizer config file saved in asterun/saves/qwen/full/sft/checkpoint-489/tokenizer_config.json
[INFO|tokenization_utils_base.py:2693] 2024-09-23 10:28:43,252 >> Special tokens file saved in asterun/saves/qwen/full/sft/checkpoint-489/special_tokens_map.json
[2024-09-23 10:28:43,459] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step489 is about to be saved!
[2024-09-23 10:28:43,470] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: asterun/saves/qwen/full/sft/checkpoint-489/global_step489/mp_rank_00_model_states.pt
[2024-09-23 10:28:43,470] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving asterun/saves/qwen/full/sft/checkpoint-489/global_step489/mp_rank_00_model_states.pt...
[2024-09-23 10:28:48,175] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved asterun/saves/qwen/full/sft/checkpoint-489/global_step489/mp_rank_00_model_states.pt.
[2024-09-23 10:28:48,178] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving asterun/saves/qwen/full/sft/checkpoint-489/global_step489/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-09-23 10:28:57,930] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved asterun/saves/qwen/full/sft/checkpoint-489/global_step489/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-09-23 10:28:57,931] [INFO] [engine.py:3536:_save_zero_checkpoint] zero checkpoint saved asterun/saves/qwen/full/sft/checkpoint-489/global_step489/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-09-23 10:28:57,931] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step489 is ready now!
[INFO|trainer.py:2401] 2024-09-23 10:28:57,940 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


{'train_runtime': 1609.1394, 'train_samples_per_second': 1.829, 'train_steps_per_second': 0.304, 'train_loss': 1.3682080348820287, 'epoch': 2.99}                         
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 489/489 [26:49<00:00,  3.29s/it]
[INFO|trainer.py:3510] 2024-09-23 10:28:58,466 >> Saving model checkpoint to asterun/saves/qwen/full/sft
[INFO|configuration_utils.py:472] 2024-09-23 10:28:58,470 >> Configuration saved in asterun/saves/qwen/full/sft/config.json
[INFO|configuration_utils.py:807] 2024-09-23 10:28:58,470 >> Configuration saved in asterun/saves/qwen/full/sft/generation_config.json
[INFO|modeling_utils.py:2778] 2024-09-23 10:29:04,536 >> Model weights saved in asterun/saves/qwen/full/sft/model.safetensors
[INFO|tokenization_utils_base.py:2684] 2024-09-23 10:29:04,552 >> tokenizer config file saved in asterun/saves/qwen/full/sft/tokenizer_config.json
[INFO|tokenization_utils_base.py:2693] 2024-09-23 10:29:04,552 >> Special tokens file saved in asterun/saves/qwen/full/sft/special_tokens_map.json
***** train metrics *****
  epoch                    =     2.9908
  total_flos               =   772542GF
  train_loss               =     1.3682
  train_runtime            = 0:26:49.13
  train_samples_per_second =      1.829
  train_steps_per_second   =      0.304
Figure saved at: asterun/saves/qwen/full/sft/training_loss.png
09/23/2024 10:29:05 - WARNING - llamafactory.extras.ploting - No metric eval_loss to plot.
09/23/2024 10:29:05 - WARNING - llamafactory.extras.ploting - No metric eval_accuracy to plot.
[INFO|trainer.py:3826] 2024-09-23 10:29:05,042 >> 
***** Running Evaluation *****
[INFO|trainer.py:3828] 2024-09-23 10:29:05,042 >>   Num examples = 110
[INFO|trainer.py:3831] 2024-09-23 10:29:05,042 >>   Batch size = 1
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 37/37 [00:01<00:00, 19.78it/s]
***** eval metrics *****
  epoch                   =     2.9908
  eval_loss               =     2.7517
  eval_runtime            = 0:00:01.92
  eval_samples_per_second =     57.029
  eval_steps_per_second   =     19.182
[INFO|modelcard.py:449] 2024-09-23 10:29:06,975 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
[root@server3 LLaMA-Factory-0.8.3]#
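
The per-step records printed by the trainer above are single-line Python dict literals, so the loss curve can be recovered from a saved copy of the console output. The following is a minimal sketch, assuming the log has been captured as text; the function names `parse_loss_records` and `mean_loss_by_epoch` are illustrative and not part of LLaMA-Factory, and the sample lines are copied from the log above.

```python
import ast


def parse_loss_records(log_text):
    """Return the per-step metric dicts (those with a 'loss' key) found in log_text."""
    records = []
    for line in log_text.splitlines():
        line = line.strip()
        # Per-step records are printed as single-line dict literals.
        if not (line.startswith("{") and line.endswith("}")):
            continue
        try:
            record = ast.literal_eval(line)
        except (ValueError, SyntaxError):
            continue  # not a plain literal, e.g. a truncated line
        if isinstance(record, dict) and "loss" in record:
            records.append(record)
    return records


def mean_loss_by_epoch(records):
    """Average the logged loss, grouped by the integer part of the epoch counter."""
    buckets = {}
    for record in records:
        buckets.setdefault(int(record["epoch"]), []).append(record["loss"])
    return {epoch: sum(v) / len(v) for epoch, v in sorted(buckets.items())}


sample = """
{'loss': 1.0305, 'grad_norm': 7.334381580352783, 'learning_rate': 3.219274897704053e-05, 'epoch': 1.96}
{'loss': 0.9327, 'grad_norm': 4.699033737182617, 'learning_rate': 2.8905043683644872e-05, 'epoch': 2.02}
{'loss': 0.5392, 'grad_norm': 5.634421348571777, 'learning_rate': 2.5724833452240792e-05, 'epoch': 2.08}
{'train_runtime': 1609.1394, 'train_samples_per_second': 1.829, 'train_steps_per_second': 0.304, 'train_loss': 1.3682080348820287, 'epoch': 2.99}
"""

records = parse_loss_records(sample)
print(mean_loss_by_epoch(records))
```

The final summary line has no `loss` key (only `train_loss`), so it is skipped and only the per-step records are averaged.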

3.1.6 Inference test

Installing the GGUF library

Download the llama.cpp source package to the server and extract it into the working directory
[root@server3 AIGC]# unzip llama.cpp-master.zip
[root@server3 AIGC]# cd llama.cpp-master
[root@server3 llama.cpp-master]# ll
总用量 576
-rw-r--r--  1 root root  33717 9月  26 11:38 AUTHORS
drwxr-xr-x  2 root root     37 9月  26 11:38 ci
drwxr-xr-x  2 root root    164 9月  26 11:38 cmake
-rw-r--r--  1 root root   6591 9月  26 11:38 CMakeLists.txt
-rw-r--r--  1 root root   3164 9月  26 11:38 CMakePresets.json
drwxr-xr-x  3 root root   4096 9月  26 11:38 common
-rw-r--r--  1 root root   2256 9月  26 11:38 CONTRIBUTING.md
-rwxr-xr-x  1 root root 199470 9月  26 11:38 convert_hf_to_gguf.py
-rwxr-xr-x  1 root root  15993 9月  26 11:38 convert_hf_to_gguf_update.py
-rwxr-xr-x  1 root root  19106 9月  26 11:38 convert_llama_ggml_to_gguf.py
-rwxr-xr-x  1 root root  14901 9月  26 11:38 convert_lora_to_gguf.py
drwxr-xr-x  4 root root    109 9月  26 11:38 docs
drwxr-xr-x 43 root root   4096 9月  26 11:38 examples
-rw-r--r--  1 root root   1556 9月  26 11:38 flake.lock
-rw-r--r--  1 root root   7469 9月  26 11:38 flake.nix
drwxr-xr-x  5 root root     85 9月  26 11:38 ggml
drwxr-xr-x  6 root root    116 9月  26 11:38 gguf-py
drwxr-xr-x  2 root root    154 9月  26 11:38 grammars
drwxr-xr-x  2 root root     21 9月  26 11:38 include
-rw-r--r--  1 root root   1078 9月  26 11:38 LICENSE
-rw-r--r--  1 root root  50865 9月  26 11:38 Makefile
drwxr-xr-x  2 root root    163 9月  26 11:38 media
drwxr-xr-x  2 root root   4096 9月  26 11:38 models
-rw-r--r--  1 root root    163 9月  26 11:38 mypy.ini
-rw-r--r--  1 root root   2044 9月  26 11:38 Package.swift
drwxr-xr-x  3 root root     40 9月  26 11:38 pocs
-rw-r--r--  1 root root 124786 9月  26 11:38 poetry.lock
drwxr-xr-x  2 root root   4096 9月  26 11:38 prompts
-rw-r--r--  1 root root   1280 9月  26 11:38 pyproject.toml
-rw-r--r--  1 root root    528 9月  26 11:38 pyrightconfig.json
-rw-r--r--  1 root root  28481 9月  26 11:38 README.md
drwxr-xr-x  2 root root   4096 9月  26 11:38 requirements
-rw-r--r--  1 root root    505 9月  26 11:38 requirements.txt
drwxr-xr-x  2 root root   4096 9月  26 11:38 scripts
-rw-r--r--  1 root root   5090 9月  26 11:38 SECURITY.md
drwxr-xr-x  2 root root     97 9月  26 11:38 spm-headers
drwxr-xr-x  2 root root    289 9月  26 11:38 src
drwxr-xr-x  2 root root   4096 9月  26 11:38 tests
[root@server3 llama.cpp-master]# 

Change into the gguf-py subdirectory and install the GGUF library
[root@server3 llama.cpp-master]# cd gguf-py
[root@server3 gguf-py]# ll
总用量 12
drwxr-xr-x 2 root root   40 9月  26 11:38 examples
drwxr-xr-x 2 root root  230 9月  26 11:38 gguf
-rw-r--r-- 1 root root 1072 9月  26 11:38 LICENSE
-rw-r--r-- 1 root root 1049 9月  26 11:38 pyproject.toml
-rw-r--r-- 1 root root 2719 9月  26 11:38 README.md
drwxr-xr-x 2 root root  151 9月  26 11:38 scripts
drwxr-xr-x 2 root root   71 9月  26 11:38 tests
[root@server3 gguf-py]# pip install --editable .
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Obtaining file:///home/lichao/AIGC/llama.cpp-master/gguf-py
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Requirement already satisfied: numpy>=1.17 in /home/lichao/opt/python3.11.9/lib/python3.11/site-packages (from gguf==0.10.0) (1.26.4)
Requirement already satisfied: pyyaml>=5.1 in /home/lichao/opt/python3.11.9/lib/python3.11/site-packages (from gguf==0.10.0) (6.0.2)
Requirement already satisfied: sentencepiece<=0.2.0,>=0.1.98 in /home/lichao/opt/python3.11.9/lib/python3.11/site-packages (from gguf==0.10.0) (0.2.0)
Requirement already satisfied: tqdm>=4.27 in /home/lichao/opt/python3.11.9/lib/python3.11/site-packages (from gguf==0.10.0) (4.66.5)
Building wheels for collected packages: gguf
  Building editable for gguf (pyproject.toml) ... done
  Created wheel for gguf: filename=gguf-0.10.0-py3-none-any.whl size=3403 sha256=4a0851426e263076c64c9854be9dfe95493844062484d001fddb08c1be5fa2ca
  Stored in directory: /tmp/pip-ephem-wheel-cache-iiq8ofh3/wheels/80/80/9b/c6c23d750f4bd20fc0c2c75e51253d89c61a2369247fb694db
Successfully built gguf
Installing collected packages: gguf
Successfully installed gguf-0.10.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
[root@server3 gguf-py]#
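
To confirm that the editable install is visible to the interpreter that will run the conversion script, the package metadata can be queried. This is a minimal sketch using only the Python standard library; the helper name `installed_version` is illustrative. On the server shown above, the pip output suggests it should report version 0.10.0.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("gguf")
if version is None:
    print("gguf is not installed in this interpreter")
else:
    print(f"gguf {version} is installed")
```

Run it with the same `python3` that is used for `convert_hf_to_gguf.py`, since a different interpreter may not see the package.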

Model format conversion

Convert the safetensors-format model produced by the earlier fine-tuning run to GGUF format
[root@server3 gguf-py]# cd .. 
[root@server3 llama.cpp-master]# python3 convert_hf_to_gguf.py /home/lichao/AIGC/LLaMA-Factory-0.8.3/asterun/saves/qwen/full/sft
INFO:hf-to-gguf:Loading model: sft
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
INFO:hf-to-gguf:output.weight,             torch.bfloat16 --> F16, shape = {1024, 151936}
INFO:hf-to-gguf:token_embd.weight,         torch.bfloat16 --> F16, shape = {1024, 151936}
INFO:hf-to-gguf:blk.0.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.0.ffn_down.weight,     torch.bfloat16 --> F16, shape = {2816, 1024}
INFO:hf-to-gguf:blk.0.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.0.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.0.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.0.attn_k.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.0.attn_k.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.0.attn_output.weight,  torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.0.attn_q.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.0.attn_q.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.0.attn_v.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.0.attn_v.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.1.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.1.ffn_down.weight,     torch.bfloat16 --> F16, shape = {2816, 1024}
INFO:hf-to-gguf:blk.1.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.1.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.1.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.1.attn_k.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.1.attn_k.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.1.attn_output.weight,  torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.1.attn_q.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.1.attn_q.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.1.attn_v.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.1.attn_v.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.10.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.10.ffn_down.weight,    torch.bfloat16 --> F16, shape = {2816, 1024}
INFO:hf-to-gguf:blk.10.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.10.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.10.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.10.attn_k.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.10.attn_k.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.10.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.10.attn_q.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.10.attn_q.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.10.attn_v.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.10.attn_v.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.11.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.11.ffn_down.weight,    torch.bfloat16 --> F16, shape = {2816, 1024}
INFO:hf-to-gguf:blk.11.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.11.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.11.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.11.attn_k.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.11.attn_k.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.11.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.11.attn_q.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.11.attn_q.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.11.attn_v.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.11.attn_v.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.12.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.12.ffn_down.weight,    torch.bfloat16 --> F16, shape = {2816, 1024}
INFO:hf-to-gguf:blk.12.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.12.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.12.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.12.attn_k.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.12.attn_k.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.12.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.12.attn_q.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.12.attn_q.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.12.attn_v.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.12.attn_v.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.13.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.13.ffn_down.weight,    torch.bfloat16 --> F16, shape = {2816, 1024}
INFO:hf-to-gguf:blk.13.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.13.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.13.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.13.attn_k.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.13.attn_k.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.13.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.13.attn_q.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.13.attn_q.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.13.attn_v.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.13.attn_v.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.14.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.14.ffn_down.weight,    torch.bfloat16 --> F16, shape = {2816, 1024}
INFO:hf-to-gguf:blk.14.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.14.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.14.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.14.attn_k.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.14.attn_k.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.14.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.14.attn_q.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.14.attn_q.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.14.attn_v.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.14.attn_v.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.15.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.15.ffn_down.weight,    torch.bfloat16 --> F16, shape = {2816, 1024}
INFO:hf-to-gguf:blk.15.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.15.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.15.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.15.attn_k.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.15.attn_k.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.15.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.15.attn_q.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.15.attn_q.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.15.attn_v.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.15.attn_v.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.16.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.16.ffn_down.weight,    torch.bfloat16 --> F16, shape = {2816, 1024}
INFO:hf-to-gguf:blk.16.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.16.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.16.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.16.attn_k.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.16.attn_k.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.16.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.16.attn_q.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.16.attn_q.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.16.attn_v.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.16.attn_v.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.17.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.17.ffn_down.weight,    torch.bfloat16 --> F16, shape = {2816, 1024}
INFO:hf-to-gguf:blk.17.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.17.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.17.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.17.attn_k.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.17.attn_k.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.17.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.17.attn_q.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.17.attn_q.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.17.attn_v.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.17.attn_v.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.18.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.18.ffn_down.weight,    torch.bfloat16 --> F16, shape = {2816, 1024}
INFO:hf-to-gguf:blk.18.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.18.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.18.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.18.attn_k.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.18.attn_k.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.18.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.18.attn_q.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.18.attn_q.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.18.attn_v.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.18.attn_v.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.19.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.19.ffn_down.weight,    torch.bfloat16 --> F16, shape = {2816, 1024}
INFO:hf-to-gguf:blk.19.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.19.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.19.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.19.attn_k.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.19.attn_k.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.19.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.19.attn_q.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.19.attn_q.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.19.attn_v.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.19.attn_v.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.2.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.2.ffn_down.weight,     torch.bfloat16 --> F16, shape = {2816, 1024}
INFO:hf-to-gguf:blk.2.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.2.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.2.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.2.attn_k.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.2.attn_k.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.2.attn_output.weight,  torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.2.attn_q.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.2.attn_q.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.2.attn_v.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.2.attn_v.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.20.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.20.ffn_down.weight,    torch.bfloat16 --> F16, shape = {2816, 1024}
INFO:hf-to-gguf:blk.20.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.20.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.20.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.20.attn_k.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.20.attn_k.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.20.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.20.attn_q.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.20.attn_q.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.20.attn_v.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.20.attn_v.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.21.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.21.ffn_down.weight,    torch.bfloat16 --> F16, shape = {2816, 1024}
INFO:hf-to-gguf:blk.21.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.21.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.21.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.21.attn_k.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.21.attn_k.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.21.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.21.attn_q.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.21.attn_q.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.21.attn_v.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.21.attn_v.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.22.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.22.ffn_down.weight,    torch.bfloat16 --> F16, shape = {2816, 1024}
INFO:hf-to-gguf:blk.22.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.22.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.22.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.22.attn_k.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.22.attn_k.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.22.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.22.attn_q.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.22.attn_q.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.22.attn_v.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.22.attn_v.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.23.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.23.ffn_down.weight,    torch.bfloat16 --> F16, shape = {2816, 1024}
INFO:hf-to-gguf:blk.23.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.23.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.23.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.23.attn_k.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.23.attn_k.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.23.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.23.attn_q.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.23.attn_q.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.23.attn_v.bias,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.23.attn_v.weight,      torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.3.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.3.ffn_down.weight,     torch.bfloat16 --> F16, shape = {2816, 1024}
INFO:hf-to-gguf:blk.3.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.3.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.3.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.3.attn_k.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.3.attn_k.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.3.attn_output.weight,  torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.3.attn_q.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.3.attn_q.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.3.attn_v.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.3.attn_v.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.4.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.4.ffn_down.weight,     torch.bfloat16 --> F16, shape = {2816, 1024}
INFO:hf-to-gguf:blk.4.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.4.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.4.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.4.attn_k.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.4.attn_k.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.4.attn_output.weight,  torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.4.attn_q.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.4.attn_q.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.4.attn_v.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.4.attn_v.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.5.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.5.ffn_down.weight,     torch.bfloat16 --> F16, shape = {2816, 1024}
INFO:hf-to-gguf:blk.5.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.5.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.5.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.5.attn_k.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.5.attn_k.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.5.attn_output.weight,  torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.5.attn_q.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.5.attn_q.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.5.attn_v.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.5.attn_v.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.6.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.6.ffn_down.weight,     torch.bfloat16 --> F16, shape = {2816, 1024}
INFO:hf-to-gguf:blk.6.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.6.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.6.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.6.attn_k.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.6.attn_k.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.6.attn_output.weight,  torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.6.attn_q.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.6.attn_q.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.6.attn_v.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.6.attn_v.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.7.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.7.ffn_down.weight,     torch.bfloat16 --> F16, shape = {2816, 1024}
INFO:hf-to-gguf:blk.7.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.7.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.7.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.7.attn_k.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.7.attn_k.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.7.attn_output.weight,  torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.7.attn_q.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.7.attn_q.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.7.attn_v.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.7.attn_v.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.8.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.8.ffn_down.weight,     torch.bfloat16 --> F16, shape = {2816, 1024}
INFO:hf-to-gguf:blk.8.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.8.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.8.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.8.attn_k.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.8.attn_k.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.8.attn_output.weight,  torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.8.attn_q.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.8.attn_q.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.8.attn_v.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.8.attn_v.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.9.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.9.ffn_down.weight,     torch.bfloat16 --> F16, shape = {2816, 1024}
INFO:hf-to-gguf:blk.9.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.9.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1024, 2816}
INFO:hf-to-gguf:blk.9.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.9.attn_k.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.9.attn_k.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.9.attn_output.weight,  torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.9.attn_q.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.9.attn_q.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:blk.9.attn_v.bias,         torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:blk.9.attn_v.weight,       torch.bfloat16 --> F16, shape = {1024, 1024}
INFO:hf-to-gguf:output_norm.weight,        torch.bfloat16 --> F32, shape = {1024}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 32768
INFO:hf-to-gguf:gguf: embedding length = 1024
INFO:hf-to-gguf:gguf: feed forward length = 2816
INFO:hf-to-gguf:gguf: head count = 16
INFO:hf-to-gguf:gguf: key-value head count = 16
INFO:hf-to-gguf:gguf: rope theta = 1000000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-06
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Adding 151387 merge(s).
INFO:gguf.vocab:Setting special token type eos to 151646
INFO:gguf.vocab:Setting special token type pad to 151643
INFO:gguf.vocab:Setting special token type bos to 151643
INFO:gguf.vocab:Setting chat_template to {% if messages[0]['role'] == 'system' %}{% set system_message = messages[0]['content'] %}{% endif %}{% if system_message is defined %}{{ '<|start_header_id|>system<|end_header_id|>

' + system_message + '<|eot_id|>' }}{% endif %}{% for message in messages %}{% set content = message['content'] %}{% if message['role'] == 'user' %}{{ '<|start_header_id|>user<|end_header_id|>

' + content + '<|eot_id|><|start_header_id|>assistant<|end_header_id|>

' }}{% elif message['role'] == 'assistant' %}{{ content + '<|eot_id|>' }}{% endif %}{% endfor %}
INFO:hf-to-gguf:Set model quantization version
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/home/lichao/AIGC/LLaMA-Factory-0.8.3/asterun/saves/qwen/full/sft/Sft-620M-F16.gguf: n_tensors = 291, total_size = 1.2G
Writing: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.24G/1.24G [00:03<00:00, 338Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to /home/lichao/AIGC/LLaMA-Factory-0.8.3/asterun/saves/qwen/full/sft/Sft-620M-F16.gguf
[root@server3 llama.cpp-master]# cd /home/lichao/AIGC/LLaMA-Factory-0.8.3/asterun/saves/qwen/full/sft
After the conversion succeeds, rename the GGUF model file so that it is easier to identify later.
[root@server3 sft]# ll
总用量 2883588
-rw-r--r-- 1 root root        104 9月  23 10:29 added_tokens.json
-rw-r--r-- 1 root root        358 9月  23 10:29 all_results.json
drwxr-xr-x 3 root root       4096 9月  19 09:59 checkpoint-1000
drwxr-xr-x 3 root root       4096 9月  19 10:05 checkpoint-1470
drwxr-xr-x 3 root root       4096 9月  13 11:02 checkpoint-489
drwxr-xr-x 3 root root       4096 9月  19 09:51 checkpoint-500
-rw-r--r-- 1 root root        731 9月  23 10:28 config.json
-rw-r--r-- 1 root root        175 9月  23 10:29 eval_results.json
-rw-r--r-- 1 root root        210 9月  23 10:28 generation_config.json
-rw-r--r-- 1 root root    1671853 9月  23 10:29 merges.txt
-rw-r--r-- 1 root root 1239173352 9月  23 10:28 model.safetensors
-rw-r--r-- 1 root root       1398 9月  23 10:29 README.md
drwxr-xr-x 2 root root        222 9月  23 10:29 runs
-rw-r--r-- 1 root root 1245334112 9月  26 11:58 Sft-620M-F16.gguf
-rw-r--r-- 1 root root        367 9月  23 10:29 special_tokens_map.json
-rw-r--r-- 1 root root       1720 9月  23 10:29 tokenizer_config.json
-rw-r--r-- 1 root root    7028230 9月  23 10:29 tokenizer.json
-rw-r--r-- 1 root root      11984 9月  23 10:28 trainer_log.jsonl
-rw-r--r-- 1 root root       9284 9月  23 10:29 trainer_state.json
-rw-r--r-- 1 root root       6584 9月  23 10:29 training_args.bin
-rw-r--r-- 1 root root      38333 9月  19 10:06 training_eval_loss.png
-rw-r--r-- 1 root root      37022 9月  23 10:29 training_loss.png
-rw-r--r-- 1 root root        218 9月  23 10:29 train_results.json
-rw-r--r-- 1 root root    2776833 9月  23 10:29 vocab.json
[root@server3 sft]# mv Sft-620M-F16.gguf qwen-sft-620M-F16.gguf
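
As an optional sanity check after the rename, the file can be tested for the 4-byte ASCII magic "GGUF" that GGUF files start with. The sketch below is not part of the original procedure; the helper name is_gguf is hypothetical, and the check confirms only the file signature, not that the tensors are valid.

```bash
# Return success if the file starts with the 4-byte ASCII magic "GGUF".
# is_gguf is a hypothetical helper name used only for this illustration.
is_gguf() {
    [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]
}

model="qwen-sft-620M-F16.gguf"
if is_gguf "${model}"; then
    echo "${model}: GGUF magic found"
else
    echo "${model}: missing or not a GGUF file"
fi
```

Run it from the directory that holds the renamed model file.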

Installing Ollama

Download the Ollama release package (ollama-linux-amd64.tgz) to the server and extract it under /usr.
[root@server3 AIGC]# tar -C /usr -xzf ollama-linux-amd64.tgz
Start the Ollama service from the command line.
[root@server3 AIGC]# ollama serve
Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is: 

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAILZVS+rUG5x5wd6issBvGuj3YYzMnPUUOmVbEz4iZFCt

2024/09/26 12:04:20 routes.go:1153: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2024-09-26T12:04:20.753+02:00 level=INFO source=images.go:753 msg="total blobs: 0"
time=2024-09-26T12:04:20.754+02:00 level=INFO source=images.go:760 msg="total unused blobs removed: 0"
time=2024-09-26T12:04:20.754+02:00 level=INFO source=routes.go:1200 msg="Listening on 127.0.0.1:11434 (version 0.3.12)"
time=2024-09-26T12:04:20.755+02:00 level=INFO source=common.go:135 msg="extracting embedded files" dir=/tmp/ollama316805737/runners
time=2024-09-26T12:04:39.145+02:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu cpu_avx cpu_avx2 cuda_v11 cuda_v12 rocm_v60102]"
time=2024-09-26T12:04:39.145+02:00 level=INFO source=gpu.go:199 msg="looking for compatible GPUs"
time=2024-09-26T12:04:39.283+02:00 level=INFO source=types.go:107 msg="inference compute" id=GPU-2d337ad0-020d-0464-2d00-715b0d00c7ba library=cuda variant=v12 compute=8.9 driver=12.4 name="NVIDIA GeForce RTX 4060 Ti" total="15.7 GiB" available="15.6 GiB"

Registering the model

Open a new terminal.

[root@server3 AIGC]# cd LLaMA-Factory-0.8.3/asterun/
[root@server3 asterun]# ll
总用量 4
-rw-r--r-- 1 root root 817 9月  19 09:33 qwen_full_sft_ds2.yaml
drwxr-xr-x 3 root root  18 9月  13 10:28 saves
Create the Modelfile for the model.
[root@server3 asterun]# touch qwen_full_sft_ds2.ollama.Modelfile
[root@server3 asterun]# vim qwen_full_sft_ds2.ollama.Modelfile 
[root@server3 asterun]# cat qwen_full_sft_ds2.ollama.Modelfile 
FROM /home/lichao/AIGC/LLaMA-Factory-0.8.3/asterun/saves/qwen/full/sft/qwen-sft-620M-F16.gguf
[root@server3 asterun]# cd ../..
Register the model using the Modelfile.
[root@server3 AIGC]# ollama create qwen-full-sft -f ./LLaMA-Factory-0.8.3/asterun/qwen_full_sft_ds2.ollama.Modelfile 
transferring model data 100% 
using existing layer sha256:19d794be57081c1a5aa7e03c4045a0fdc5b8a40f080f0c550ab38033cf0d5d58 
creating new layer sha256:c33681b055686143e7d6e0bb0f1054c9910c05c3f4ab16932fbc567a8961929a 
writing manifest 
success 
[root@server3 AIGC]#
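
The Modelfile above contains only a FROM line. Ollama Modelfiles also accept PARAMETER directives, for example to declare a stop token. The sketch below writes such a variant to a temporary directory. It assumes the fine-tuned model ends its turns with the ChatML-style token <|im_end|>; that is an assumption, not something this guide verifies, and the extra line was not part of the original setup.

```bash
# Write a Modelfile variant with an explicit stop token (assumed, see above).
# The file goes to a temporary directory so the original Modelfile is untouched.
modelfile="$(mktemp -d)/qwen_full_sft_ds2.ollama.Modelfile"
cat > "${modelfile}" <<'EOF'
FROM /home/lichao/AIGC/LLaMA-Factory-0.8.3/asterun/saves/qwen/full/sft/qwen-sft-620M-F16.gguf
PARAMETER stop "<|im_end|>"
EOF
cat "${modelfile}"
```

The resulting file can be passed to "ollama create" with -f in the same way as the original Modelfile.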

Inference test

Run inference with the registered model.
[root@server3 AIGC]# ollama run qwen-full-sft
>>> who are you?
<|im_end|>
我是 {{name}}，一个由 {{author}} 开发的人工智能助手，我可以帮助用户查询信息、安排日程、提供建议等。

>>> can you speak english?
 I am an AI assistant developed by {{author}}.

>>> 好吧，用中文交流吧。
没问题。

>>> 你喜欢中国哪个城市？
每个城市都有其独特的魅力，各具特色，比如：
成都：美食之都，生活悠闲。
北京：历史悠久，文化丰富。
杭州：风景优美，以西湖闻名。
上海：现代化大都市，经济繁荣。

>>> 感谢，再见
好的，我是个人工智能助手，很高兴见到您。

>>> exit
[root@server3 AIGC]#
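
Besides the interactive prompt, the running Ollama service can be queried over its HTTP API on the address shown in the startup log (127.0.0.1:11434). The following is a minimal sketch, not part of the original test: build_payload is a hypothetical helper that does no JSON escaping, so it only suits simple prompts without quotes or backslashes.

```bash
OLLAMA_URL="http://127.0.0.1:11434"

# Build a non-streaming request body for /api/generate.
# Hypothetical helper: no JSON escaping is performed on the arguments.
build_payload() {
    printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"
}

payload=$(build_payload qwen-full-sft "who are you?")

# Send the request only if the server answers on /api/tags (model list).
if curl -sf "${OLLAMA_URL}/api/tags" >/dev/null 2>&1; then
    curl -s "${OLLAMA_URL}/api/generate" -d "${payload}"
else
    echo "Ollama is not reachable at ${OLLAMA_URL}"
fi
```

With "stream" set to false the server returns a single JSON object instead of a stream of partial responses.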

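This completes the setup and testing of the distributed computing environment.

4 Deployment and usage Q&A

Question 1:

When nccl-tests is run on a single node with the parameters below, Open MPI prints "No OpenFabrics connection schemes reported that they were able to be used on a specific port. As such, the openib BTL (OpenFabrics support) will be disabled for this port." The test still runs to completion, and the impact of the warning is not yet clear.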

[root@server3 ~]# /home/lichao/opt/openmpi/bin/mpirun --allow-run-as-root -np 1 /home/lichao/AIGC/nccl-tests/build/all_reduce_perf -b 512 -e 8G -f 2 -g 1
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:           server3
  Local device:         mlx5_0
  Local port:           1
  CPCs attempted:       rdmacm, udcm
--------------------------------------------------------------------------
# nThread 1 nGpus 1 minBytes 512 maxBytes 8589934592 step: 2(factor) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
#
# Using devices
#  Rank  0 Group  0 Pid   8080 on    server3 device  0 [0x02] NVIDIA GeForce RTX 4060 Ti
#
# Reducing maxBytes to 5261099008 due to memory limitation
#
#                                                              out-of-place                       in-place          
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)       
         512           128     float     sum      -1     3.77    0.14    0.00      0     0.34    1.50    0.00      0
        1024           256     float     sum      -1     3.96    0.26    0.00      0     0.34    3.04    0.00      0
        2048           512     float     sum      -1     3.63    0.56    0.00      0     0.34    6.03    0.00      0
        4096          1024     float     sum      -1     3.63    1.13    0.00      0     0.34   12.06    0.00      0
        8192          2048     float     sum      -1     3.65    2.25    0.00      0     0.34   24.17    0.00      0
       16384          4096     float     sum      -1     3.63    4.51    0.00      0     0.34   48.23    0.00      0
       32768          8192     float     sum      -1     3.61    9.08    0.00      0     0.34   97.21    0.00      0
       65536         16384     float     sum      -1     3.60   18.18    0.00      0     0.34  193.52    0.00      0
      131072         32768     float     sum      -1     3.67   35.72    0.00      0     0.34  389.86    0.00      0
      262144         65536     float     sum      -1     3.66   71.54    0.00      0     0.35  757.97    0.00      0
      524288        131072     float     sum      -1     4.38  119.60    0.00      0     0.34  1542.25    0.00      0
     1048576        262144     float     sum      -1     6.66  157.41    0.00      0     0.33  3164.08    0.00      0
     2097152        524288     float     sum      -1    15.73  133.29    0.00      0     0.34  6233.18    0.00      0
     4194304       1048576     float     sum      -1    31.38  133.66    0.00      0     0.34  12457.10    0.00      0
     8388608       2097152     float     sum      -1    65.34  128.37    0.00      0     0.34  24467.28    0.00      0
    16777216       4194304     float     sum      -1    132.4  126.70    0.00      0     0.34  49156.80    0.00      0
    33554432       8388608     float     sum      -1    275.5  121.81    0.00      0     0.34  99258.78    0.00      0
    67108864      16777216     float     sum      -1    549.5  122.13    0.00      0     0.34  199728.76    0.00      0
   134217728      33554432     float     sum      -1   1101.8  121.81    0.00      0     0.34  398863.98    0.00      0
   268435456      67108864     float     sum      -1   2203.6  121.81    0.00      0     0.34  785128.56    0.00      0
   536870912     134217728     float     sum      -1   4414.9  121.60    0.00      0     0.34  1567735.18    0.00      0
  1073741824     268435456     float     sum      -1   8819.1  121.75    0.00      0     0.34  3121342.51    0.00      0
  2147483648     536870912     float     sum      -1    17639  121.75    0.00      0     0.35  6218281.88    0.00      0
  4294967296    1073741824     float     sum      -1    35280  121.74    0.00      0     0.30  14144466.64    0.00      0
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 0 
#

[server3:08076] 1 more process has sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port
[server3:08076] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[root@server3 ~]#

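Cause analysis / solution

Adding the parameter "-mca btl '^openib'" to the mpirun command line resolves the warning. The value '^openib' sets the BTL selection to exclude the openib component (the leading ^ means "exclude"), so Open MPI no longer tries to initialize it.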

[root@server3 ~]# /home/lichao/opt/openmpi/bin/mpirun --allow-run-as-root -np 1 -mca btl '^openib' /home/lichao/AIGC/nccl-tests/build/all_reduce_perf -b 512 -e 8G -f 2 -g 1
# nThread 1 nGpus 1 minBytes 512 maxBytes 8589934592 step: 2(factor) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
#
# Using devices
#  Rank  0 Group  0 Pid   8106 on    server3 device  0 [0x02] NVIDIA GeForce RTX 4060 Ti
#
# Reducing maxBytes to 5261099008 due to memory limitation
#
#                                                              out-of-place                       in-place          
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)       
         512           128     float     sum      -1     3.43    0.15    0.00      0     0.31    1.64    0.00      0
        1024           256     float     sum      -1     6.29    0.16    0.00      0     0.30    3.39    0.00      0
        2048           512     float     sum      -1     4.07    0.50    0.00      0     0.32    6.36    0.00      0
        4096          1024     float     sum      -1     4.00    1.02    0.00      0     0.33   12.59    0.00      0
        8192          2048     float     sum      -1     3.97    2.06    0.00      0     0.32   25.24    0.00      0
       16384          4096     float     sum      -1     3.97    4.13    0.00      0     0.30   54.30    0.00      0
       32768          8192     float     sum      -1     4.00    8.20    0.00      0     0.30  108.49    0.00      0
       65536         16384     float     sum      -1     3.94   16.64    0.00      0     0.30  215.22    0.00      0
      131072         32768     float     sum      -1     4.64   28.23    0.00      0     0.31  424.32    0.00      0
      262144         65536     float     sum      -1     4.12   63.65    0.00      0     0.31  848.09    0.00      0
      524288        131072     float     sum      -1     4.36  120.27    0.00      0     0.30  1719.26    0.00      0
     1048576        262144     float     sum      -1     6.44  162.86    0.00      0     0.30  3451.53    0.00      0
     2097152        524288     float     sum      -1    15.74  133.21    0.00      0     0.30  6880.42    0.00      0
     4194304       1048576     float     sum      -1    31.58  132.83    0.00      0     0.31  13688.98    0.00      0
     8388608       2097152     float     sum      -1    64.95  129.15    0.00      0     0.30  27799.86    0.00      0
    16777216       4194304     float     sum      -1    132.0  127.09    0.00      0     0.30  55849.59    0.00      0
    33554432       8388608     float     sum      -1    274.4  122.29    0.00      0     0.31  109834.47    0.00      0
    67108864      16777216     float     sum      -1    550.3  121.94    0.00      0     0.31  218845.15    0.00      0
   134217728      33554432     float     sum      -1   1101.1  121.89    0.00      0     0.31  439409.82    0.00      0
   268435456      67108864     float     sum      -1   2204.8  121.75    0.00      0     0.31  867459.87    0.00      0
   536870912     134217728     float     sum      -1   4411.4  121.70    0.00      0     0.31  1728774.47    0.00      0
  1073741824     268435456     float     sum      -1   8822.3  121.71    0.00      0     0.31  3515278.52    0.00      0
  2147483648     536870912     float     sum      -1    17639  121.75    0.00      0     0.31  6842388.56    0.00      0
  4294967296    1073741824     float     sum      -1    35284  121.73    0.00      0     0.31  13942435.63    0.00      0
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 0 
#

[root@server3 ~]#
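
Open MPI also reads MCA parameters from environment variables named OMPI_MCA_<parameter>, so the same setting can be applied without editing every mpirun command line. A minimal sketch, assuming the variable is exported in the shell that later launches mpirun:

```bash
# Equivalent of "-mca btl '^openib'" for every mpirun started from this shell.
export OMPI_MCA_btl='^openib'
echo "OMPI_MCA_btl=${OMPI_MCA_btl}"
```

For a per-user persistent setting, the line "btl = ^openib" can instead be placed in $HOME/.openmpi/mca-params.conf.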

References:

https://www.open-mpi.org/video/internals/Sandia_BrianBarrett-1up.pdf

https://github.com/open-mpi/ompi/issues/11063

https://www.open-mpi.org/doc/v4.1/man1/mpirun.1.php

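Question 2:

Running the multi-node nccl-tests job across three nodes reports a routing-related error, and the job hangs in the initialization phase.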

[root@server1 lichao]# ./run_nccl-test.sh 
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:           server1
  Local device:         mlx5_1
  Local port:           1
  CPCs attempted:       rdmacm, udcm
--------------------------------------------------------------------------
[1716789553.453110] [server1:7255 :0]            sock.c:325  UCX  ERROR   connect(fd=54, dest_addr=200.200.0.2:49112) failed: No route to host

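Cause analysis / solution

Checking the network configuration on the three nodes showed that server3 had an extra Mellanox interface enabled with an address in the 200.200.0.0 network, whereas the nccl-tests traffic uses the 172.16.0.0 network. As a result, during job initialization server1 and server2 tried to reach the 200.200.0.x address, had no route to it, and the communication test failed.

Adding "-x NCCL_SOCKET_IFNAME=ens11f1 -x NCCL_IB_HCA=mlx5_1:1" to pin the interface did not help; the error about the missing route to the 200.200.0.0 network persisted. Shutting down interface ens11f0 and rerunning the test restored normal operation.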

[root@server3 ~]# ibdev2netdev 
mlx5_0 port 1 ==> ens11f0 (Up)
mlx5_1 port 1 ==> ens11f1 (Up)
[root@server3 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether ac:1f:6b:dd:1b:f2 brd ff:ff:ff:ff:ff:ff
    inet 10.230.1.13/24 brd 10.230.1.255 scope global eno1
       valid_lft forever preferred_lft forever
    inet6 fe80::ae1f:6bff:fedd:1bf2/64 scope link 
       valid_lft forever preferred_lft forever
3: eno2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether ac:1f:6b:dd:1b:f3 brd ff:ff:ff:ff:ff:ff
6: ens11f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether b8:59:9f:3b:57:b6 brd ff:ff:ff:ff:ff:ff
    inet 200.200.0.2/30 brd 200.200.0.3 scope global ens11f0
       valid_lft forever preferred_lft forever
    inet6 fe80::ba59:9fff:fe3b:57b6/64 scope link 
       valid_lft forever preferred_lft forever
7: ens11f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether b8:59:9f:3b:57:b7 brd ff:ff:ff:ff:ff:ff
    inet 172.16.0.13/24 brd 172.16.0.255 scope global ens11f1
       valid_lft forever preferred_lft forever
    inet6 fe80::ba59:9fff:fe3b:57b7/64 scope link 
       valid_lft forever preferred_lft forever
[root@server3 ~]#
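
The manual inspection above can be sketched as a small filter that lists interfaces carrying an IPv4 address outside the network intended for the test. This is an illustrative helper, not part of the original troubleshooting: unexpected_ifaces is a hypothetical name, and it does a plain string-prefix comparison rather than real subnet arithmetic.

```bash
# List interfaces whose IPv4 address does not start with the given prefix.
# Reads the one-line-per-address output of "ip -o -4 addr show" on stdin.
# $1 is a plain string prefix such as "172.16.0." (no netmask handling).
unexpected_ifaces() {
    awk -v prefix="$1" \
        '$3 == "inet" && $2 != "lo" && index($4, prefix) != 1 { print $2, $4 }'
}

# Example: report everything that is not in the 172.16.0.x test network.
if command -v ip >/dev/null 2>&1; then
    ip -o -4 addr show | unexpected_ifaces "172.16.0."
fi
```

On server3 this would list ens11f0 with 200.200.0.2/30, and also the management interface eno1, which is expected to sit in a different network. The error line in this case is emitted by UCX rather than by NCCL, which may be why the NCCL_* variables alone did not change the behavior.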

Question 3:

The log shows "NET/Plugin: No plugin found (libnccl-net.so)".

server1:41185:41185 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ens11f1
server1:41185:41185 [0] NCCL INFO Bootstrap : Using ens11f1:172.16.0.11<0>
server1:41185:41185 [0] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so)
server1:41185:41185 [0] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so
server1:41185:41185 [0] NCCL INFO NET/Plugin: Using internal network plugin.
server1:41185:41185 [0] NCCL INFO cudaDriverVersion 12040
NCCL version 2.21.5+cuda12.4

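Cause analysis / solution

This is expected behavior. NCCL supports external network plugins, which lets third-party vendors provide their own network transport plugins for NCCL, for example https://github.com/aws/aws-ofi-nccl. The message does not affect normal operation.

After this message, another INFO message appears, "NET/Plugin: Using internal network plugin", which indicates that NCCL has fallen back to its internal network transport.

References:

https://github.com/NVIDIA/nccl/issues/162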

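Question 4:

After the GPU driver and the related acceleration libraries were installed, the NVIDIA tools and the nccl-tests collective communication tests all worked. After the server was rebooted, however, nvidia-smi reported a driver/library version mismatch.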

[root@server3 ~]# nvidia-smi 
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 550.67
[root@server3 ~]#

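Cause analysis / solution

Judging from the error message, some component was most likely overwritten with a different version when other software was installed later.

Checking the components one by one showed that a GPU driver installed through yum was indeed present, "nvidia-driver-latest-dkms-NVML 550.54.15-1.el7", which does not match the reported "NVML library version: 550.67". After removing the yum packages and reinstalling the driver from the binary (.run) package, nvidia-smi worked again.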

[root@server3 ~]# yum remove nvidia* libnvidia*
已加载插件：fastestmirror, nvidia
参数 libnvidia* 没有匹配
正在解决依赖关系
--> 正在检查事务
---> 软件包 nvidia-driver-latest-dkms.x86_64.3.550.54.15-1.el7 将被 删除
---> 软件包 nvidia-driver-latest-dkms-NVML.x86_64.3.550.54.15-1.el7 将被 删除
---> 软件包 nvidia-driver-latest-dkms-NvFBCOpenGL.x86_64.3.550.54.15-1.el7 将被 删除
---> 软件包 nvidia-driver-latest-dkms-cuda.x86_64.3.550.54.15-1.el7 将被 删除
---> 软件包 nvidia-driver-latest-dkms-cuda-libs.x86_64.3.550.54.15-1.el7 将被 删除
---> 软件包 nvidia-driver-latest-dkms-devel.x86_64.3.550.54.15-1.el7 将被 删除
---> 软件包 nvidia-driver-latest-dkms-libs.x86_64.3.550.54.15-1.el7 将被 删除
---> 软件包 nvidia-kmod-common.x86_64.3.550.54.15-1.el7 将被 删除
--> 正在处理依赖关系 nvidia-kmod-common = 3:550.54.15，它被软件包 3:kmod-nvidia-open-dkms-550.54.15-1.el7.x86_64 需要
--> 正在处理依赖关系 nvidia-kmod-common = 3:550.54.15，它被软件包 3:kmod-nvidia-open-dkms-550.54.15-1.el7.x86_64 需要
---> 软件包 nvidia-modprobe-latest-dkms.x86_64.3.550.54.15-1.el7 将被 删除
---> 软件包 nvidia-persistenced-latest-dkms.x86_64.3.550.54.15-1.el7 将被 删除
---> 软件包 nvidia-xconfig-latest-dkms.x86_64.3.550.54.15-1.el7 将被 删除
--> 正在检查事务
---> 软件包 kmod-nvidia-open-dkms.x86_64.3.550.54.15-1.el7 将被 删除
--> 解决依赖关系完成

依赖关系解决

==========================================================================================================================================================
 Package                                               架构                   版本                               源                                  大小
==========================================================================================================================================================
正在删除:
 nvidia-driver-latest-dkms                             x86_64                 3:550.54.15-1.el7                  @cuda-rhel7-x86_64                 175 M
 nvidia-driver-latest-dkms-NVML                        x86_64                 3:550.54.15-1.el7                  @cuda-rhel7-x86_64                 2.0 M
 nvidia-driver-latest-dkms-NvFBCOpenGL                 x86_64                 3:550.54.15-1.el7                  @cuda-rhel7-x86_64                 135 k
 nvidia-driver-latest-dkms-cuda                        x86_64                 3:550.54.15-1.el7                  @cuda-rhel7-x86_64                 1.3 M
 nvidia-driver-latest-dkms-cuda-libs                   x86_64                 3:550.54.15-1.el7                  @cuda-rhel7-x86_64                 222 M
 nvidia-driver-latest-dkms-devel                       x86_64                 3:550.54.15-1.el7                  @cuda-rhel7-x86_64                 0.0  
 nvidia-driver-latest-dkms-libs                        x86_64                 3:550.54.15-1.el7                  @cuda-rhel7-x86_64                 305 M
 nvidia-kmod-common                                    x86_64                 3:550.54.15-1.el7                  @cuda-rhel7-x86_64                 1.3 k
 nvidia-modprobe-latest-dkms                           x86_64                 3:550.54.15-1.el7                  @cuda-rhel7-x86_64                  70 k
 nvidia-persistenced-latest-dkms                       x86_64                 3:550.54.15-1.el7                  @cuda-rhel7-x86_64                  65 k
 nvidia-xconfig-latest-dkms                            x86_64                 3:550.54.15-1.el7                  @cuda-rhel7-x86_64                 222 k
为依赖而移除:
 kmod-nvidia-open-dkms                                 x86_64                 3:550.54.15-1.el7                  @cuda-rhel7-x86_64                  21 M

事务概要
==========================================================================================================================================================
移除  11 软件包 (+1 依赖软件包)

安装大小：727 M
是否继续？[y/N]：y

[root@server3 ~]# cd /home/lichao/AIGC/
[root@server3 AIGC]# sh NVIDIA-Linux-x86_64-550.67.run 
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 550.67...
[root@server3 AIGC]# nvidia-smi
Thu May 16 09:28:11 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67                 Driver Version: 550.67         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti     Off |   00000000:02:00.0 Off |                  N/A |
|  0%   36C    P8              5W /  165W |       2MiB /  16380MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
[root@server3 AIGC]#
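
A mismatch of this kind can also be spotted by comparing the version of the loaded kernel module, which the driver reports in /proc/driver/nvidia/version, with the driver packages known to the package manager. The sketch below is an illustration rather than the procedure used above; first_version is a hypothetical helper that prints the first dotted version number it finds on stdin.

```bash
# Print the first dotted version number (e.g. 550.67 or 550.54.15) on stdin.
# first_version is a hypothetical helper name used only for this illustration.
first_version() {
    grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Version of the kernel module that is currently loaded, if any.
if [ -r /proc/driver/nvidia/version ]; then
    echo "kernel module: $(first_version < /proc/driver/nvidia/version)"
fi

# Driver packages installed through yum/rpm, if rpm is available.
if command -v rpm >/dev/null 2>&1; then
    rpm -qa | grep -i nvidia || true
fi
```

If the kernel module version differs from the NVML library version printed by nvidia-smi, two driver installations are probably mixed on the system.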

Feature verification: AsterNOS support for PXE with the MC-LAG Fallback mechanism

PXE overview
PXE and SONiC LAG Fallback
PXE and AsterNOS MC-LAG Fallback
Verifying the AsterNOS MC-LAG Fallback function

1. PXE overview

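PXE (Preboot Execution Environment) is a network boot protocol that lets a computer fetch an operating system image from a remote server over the network and install it. The basic principle of PXE installation is as follows. When the server powers on, it requests an IP address and the PXE-related configuration over the network (via DHCP). It then downloads the boot file from the PXE server over TFTP (Trivial File Transfer Protocol), and the boot file drives the rest of the operating system installation. Finally, the operating system image is transferred to the server over the network and the installation completes.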
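
The DHCP step described above is typically where the client learns the TFTP server and the boot file name. The following sketch writes an ISC dhcpd configuration fragment to a temporary file to show the two relevant statements, next-server and filename. All addresses and the boot file name pxelinux.0 are hypothetical values chosen for illustration; they are not taken from the environment in this guide.

```bash
# Write an illustrative dhcpd.conf fragment for PXE clients.
# Every address and the boot file name below are hypothetical examples.
dhcpd_conf="$(mktemp)"
cat > "${dhcpd_conf}" <<'EOF'
subnet 192.168.100.0 netmask 255.255.255.0 {
    range 192.168.100.100 192.168.100.200;
    option routers 192.168.100.1;
    next-server 192.168.100.10;   # TFTP server that holds the boot file
    filename "pxelinux.0";        # boot file the client downloads via TFTP
}
EOF
cat "${dhcpd_conf}"
```

The client takes its IP address from the range, then contacts next-server over TFTP and requests the file named by filename.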

2. PXE and SONiC LAG Fallback

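While a server boots via PXE it has no operating system, so it cannot establish a LAG with the switch or send LACP packets. The LAG member ports on the switch therefore all stay in the Inactive state and do not forward the DHCP Discover broadcast, and the PXE process cannot proceed.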

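SONiC LAG Fallback addresses this problem. With Fallback enabled on a LAG, one member port of the LAG is set to Active when no LACP packets are received, so that the PXE boot process can complete. Once LACP packets are received, the LAG leaves the Fallback state automatically.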
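
In community SONiC, fallback is enabled when the port channel is created, using the --fallback option of "config portchannel add". The sketch below only prints the commands rather than executing them, since they must be run on the switch. The helper name lag_fallback_cmds and the names PortChannel0001, Ethernet0 and Ethernet4 are hypothetical, and the AsterNOS command line may differ from the community SONiC syntax shown here.

```bash
# Print (do not execute) the SONiC commands for a fallback-enabled LAG.
# Usage: lag_fallback_cmds <portchannel> <member-port>...
lag_fallback_cmds() {
    local pc="$1"
    shift
    echo "sudo config portchannel add ${pc} --fallback true"
    local port
    for port in "$@"; do
        echo "sudo config portchannel member add ${pc} ${port}"
    done
}

# Hypothetical port channel and member names, for illustration only.
lag_fallback_cmds PortChannel0001 Ethernet0 Ethernet4
```

The printed commands can be reviewed and then pasted into the switch shell.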

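3. PXE and AsterNOS MC-LAG Fallback

MC-LAG (Multi-Chassis Link Aggregation Group) is a mechanism for link aggregation across devices: one device forms an aggregated link with two other devices, which keeps all the benefits of ordinary link aggregation while adding device-level redundancy.

MC-LAG presents two physical devices as a single logical device, and this virtual "single device" forms a one-to-one link aggregation with its upstream or downstream device. The problem that PXE installation runs into in the LAG scenario therefore also exists with MC-LAG. AsterNOS currently supports the Fallback function in both the LAG and the MC-LAG scenarios.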

4. Verifying the AsterNOS MC-LAG Fallback function

Complete the mode 4 (802.3ad) bond configuration and the DHCP server configuration on Centos76-1:

[root@server1 dhcp]# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 0c:0a:0e:54:00:01
Active Aggregator Info:
        Aggregator ID: 3
        Number of ports: 2
        Actor Key: 9
        Partner Key: 0
        Partner Mac Address: 52:54:00:12:34:56

Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 0c:0a:0e:54:00:01
Slave queue ID: 0
Aggregator ID: 3
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 0c:0a:0e:54:00:01
    port key: 9
    port priority: 255
    port number: 1
    port state: 61
details partner lacp pdu:
    system priority: 65535
    system mac address: 52:54:00:12:34:56
    oper key: 0
    port priority: 255
    port number: 2
    port state: 63

Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 0c:0a:0e:54:00:02
Slave queue ID: 0
Aggregator ID: 3
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 0c:0a:0e:54:00:01
    port key: 9
    port priority: 255
    port number: 2
    port state: 61
details partner lacp pdu:
    system priority: 65535
    system mac address: 52:54:00:12:34:56
    oper key: 0
    port priority: 255
    port number: 2
    port state: 63
[root@server1 dhcp]# 
[root@server1 dhcp]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 0c:0a:0e:54:00:00 brd ff:ff:ff:ff:ff:ff
    inet 10.240.3.121/24 brd 10.240.3.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::e0a:eff:fe54:0/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP group default qlen 1000
    link/ether 0c:0a:0e:54:00:01 brd ff:ff:ff:ff:ff:ff
4: eth2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP group default qlen 1000
    link/ether 0c:0a:0e:54:00:01 brd ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 0c:0a:0e:54:00:03 brd ff:ff:ff:ff:ff:ff
6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 0c:0a:0e:54:00:01 brd ff:ff:ff:ff:ff:ff
    inet 172.16.10.1/24 brd 172.16.10.255 scope global bond0
       valid_lft forever preferred_lft forever
    inet6 fe80::e0a:eff:fe54:1/64 scope link 
       valid_lft forever preferred_lft forever
[root@server1 dhcp]# cat dhcpd.conf 
#
# DHCP Server Configuration file.
#   see /usr/share/doc/dhcp*/dhcpd.conf.example
#   see dhcpd.conf(5) man page
#

subnet 172.16.10.0 netmask 255.255.255.0 {
  range 172.16.10.100 172.16.10.200;
  #option routers 172.16.10.254;
  #option domain-name-servers 223.5.5.5;
}
[root@server1 dhcp]#
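As a quick sanity check of the dhcpd.conf above, the address pool can be validated with Python's standard ipaddress module. This is a minimal sketch; the helper names `pool_size` and `pool_is_valid` are made up for the example, and the server address is the bond0 address shown in the `ip a` output.

```python
import ipaddress

# Values copied from dhcpd.conf and from bond0 on server1.
subnet = ipaddress.ip_network("172.16.10.0/255.255.255.0")
range_start = ipaddress.ip_address("172.16.10.100")
range_end = ipaddress.ip_address("172.16.10.200")
server_ip = ipaddress.ip_address("172.16.10.1")


def pool_size(start, end):
    """Number of addresses in the inclusive range start..end."""
    return int(end) - int(start) + 1


def pool_is_valid(net, start, end, server):
    """The pool must lie inside the subnet and must not contain the server."""
    inside = start in net and end in net and start <= end
    server_outside_pool = not (start <= server <= end)
    return inside and server in net and server_outside_pool


print(pool_size(range_start, range_end))
print(pool_is_valid(subnet, range_start, range_end, server_ip))
```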

Configure MC-LAG on Leaf1 and Leaf2 and confirm that its state is normal:

leaf1# show mclag state 
The MCLAG's keepalive is: OK
MCLAG info sync is: completed
Domain id: 1
MCLAG session Channel: Primary channel
VRF Name: default
consistency Check Action: idle
Local Ip: 12.12.12.1
Peer Ip: 12.12.12.2
Dad Local Ip: 
Dad Peer Ip: 
Peer Link Interface: lag 99
Keepalive time: 1
Dad Detection Delay: 15
Dad Recovery Delay Mlag Intf: 60
Dad Recovery Delay Non Mlag Intf: 0
Dad VRF Name: default
Dad Status: disable
session Timeout : 15
Peer Link Mac: 52:54:00:12:34:56 
Admin Role: None
Role: Active
MCLAG Interface: lag 2,lag 1
Loglevel: NOTICE   
leaf1# show link-aggregation summary
Flags: A - active, I - inactive, Up - up, Dw - Down, N/A - not available,
       S - selected, D - deselected, * - not synced
  No.  Team Dev         Protocol     Ports          Description
-----  ---------------  -----------  -------------  -------------
 0001  lag 1            LACP(A)(Up)  0/1      (S)   N/A
 0002  lag 2            LACP(A)(Dw)  0/2      (D)   N/A
 0099  lag 99           LACP(A)(Up)  0/9      (S)   N/A
                                     0/8      (S)
leaf1# 

leaf2# show mclag state 
The MCLAG's keepalive is: OK
MCLAG info sync is: completed
Domain id: 1
MCLAG session Channel: Primary channel
VRF Name: default
consistency Check Action: idle
Local Ip: 12.12.12.2
Peer Ip: 12.12.12.1
Dad Local Ip: 
Dad Peer Ip: 
Peer Link Interface: lag 99
Keepalive time: 1
Dad Detection Delay: 15
Dad Recovery Delay Mlag Intf: 60
Dad Recovery Delay Non Mlag Intf: 0
Dad VRF Name: default
Dad Status: disable
session Timeout : 15
Peer Link Mac: 52:54:00:12:34:57 
Admin Role: None
Role: Standby
MCLAG Interface: lag 2,lag 1
Loglevel: NOTICE
leaf2# show link-aggregation summary
Flags: A - active, I - inactive, Up - up, Dw - Down, N/A - not available,
       S - selected, D - deselected, * - not synced
  No.  Team Dev         Protocol     Ports          Description
-----  ---------------  -----------  -------------  -------------
 0001  lag 1            LACP(A)(Up)  0/1      (S)   N/A
 0002  lag 2            LACP(A)(Dw)  0/2      (D)   N/A
 0099  lag 99           LACP(A)(Up)  0/9      (S)   N/A
                                     0/8      (S)
leaf2#

Neither of the two service ports on Centos76-2 can obtain an IP address through DHCP:

[root@server2 ~]# ifup eth1

Determining IP information for eth1... done.
[root@server2 ~]# ifup eth2

Determining IP information for eth2... done.
[root@server2 network-scripts]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 0c:a8:80:2f:00:00 brd ff:ff:ff:ff:ff:ff
    inet 10.240.3.122/24 brd 10.240.3.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::ea8:80ff:fe2f:0/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 0c:a8:80:2f:00:01 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::ea8:80ff:fe2f:1/64 scope link 
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 0c:a8:80:2f:00:02 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::ea8:80ff:fe2f:2/64 scope link 
       valid_lft forever preferred_lft forever
5: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 0c:a8:80:2f:00:03 brd ff:ff:ff:ff:ff:ff
[root@server2 network-scripts]#

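Enable Fallback on the LAG 2 interface of Leaf1 and Leaf2. AsterNOS temporarily keeps the port on one side active and returns to dynamic aggregation mode once LACP negotiation packets are received: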

leaf1# show link-aggregation summary 
Flags: A - active, I - inactive, Up - up, Dw - Down, N/A - not available,
       S - selected, D - deselected, * - not synced
  No.  Team Dev         Protocol     Ports          Description
-----  ---------------  -----------  -------------  -------------
 0001  lag 1            LACP(A)(Up)  0/1      (S)   N/A
 0002  lag 2            LACP(A)(Dw)  0/2      (D)   N/A
 0099  lag 99           LACP(A)(Up)  0/8      (S)   N/A
                                     0/9      (S)
Enable Fallback:
leaf1# configure terminal 
leaf1(config)# interface link-aggregation 2
leaf1(config-lagif-2)# show this
!
interface link-aggregation 2
 lacp fallback
 lacp fast-rate
 commit
 switchport access vlan 512
leaf1(config-lagif-2)# end
leaf1# show link-aggregation summary 
Flags: A - active, I - inactive, Up - up, Dw - Down, N/A - not available,
       S - selected, D - deselected, * - not synced
  No.  Team Dev         Protocol     Ports          Description
-----  ---------------  -----------  -------------  -------------
 0001  lag 1            LACP(A)(Up)  0/1      (S)   N/A
 0002  lag 2            LACP(A)(Up)  0/2      (S)   N/A
 0099  lag 99           LACP(A)(Up)  0/9      (S)   N/A
                                     0/8      (S)
leaf1# 


leaf2# show link-aggregation summary 
Flags: A - active, I - inactive, Up - up, Dw - Down, N/A - not available,
       S - selected, D - deselected, * - not synced
  No.  Team Dev         Protocol     Ports          Description
-----  ---------------  -----------  -------------  -------------
 0001  lag 1            LACP(A)(Up)  0/1      (S)   N/A
 0002  lag 2            LACP(A)(Dw)  0/2      (D)   N/A
 0099  lag 99           LACP(A)(Up)  0/8      (S)   N/A
                                     0/9      (S)
Enable Fallback:
leaf2# configure terminal 
leaf2(config)# interface link-aggregation 2
leaf2(config-lagif-2)# show this
!
interface link-aggregation 2
 lacp fallback
 lacp fast-rate
 commit
 switchport access vlan 512
leaf2(config-lagif-2)# end
leaf2# show link-aggregation summary 
Flags: A - active, I - inactive, Up - up, Dw - Down, N/A - not available,
       S - selected, D - deselected, * - not synced
  No.  Team Dev         Protocol     Ports          Description
-----  ---------------  -----------  -------------  -------------
 0001  lag 1            LACP(A)(Up)  0/1      (S)   N/A
 0002  lag 2            LACP(A)(Dw)  0/2      (D)   N/A
 0099  lag 99           LACP(A)(Up)  0/8      (S)   N/A
                                     0/9      (S)
leaf2#

Of the two service ports on Centos76-2, one can now obtain an IP address through DHCP:

[root@server2 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 0c:a8:80:2f:00:00 brd ff:ff:ff:ff:ff:ff
    inet 10.240.3.122/24 brd 10.240.3.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::ea8:80ff:fe2f:0/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 0c:a8:80:2f:00:01 brd ff:ff:ff:ff:ff:ff
4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 0c:a8:80:2f:00:02 brd ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 0c:a8:80:2f:00:03 brd ff:ff:ff:ff:ff:ff
[root@server2 ~]# ifup eth1

Determining IP information for eth1... done.
[root@server2 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 0c:a8:80:2f:00:00 brd ff:ff:ff:ff:ff:ff
    inet 10.240.3.122/24 brd 10.240.3.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::ea8:80ff:fe2f:0/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 0c:a8:80:2f:00:01 brd ff:ff:ff:ff:ff:ff
    inet 172.16.10.100/24 brd 172.16.10.255 scope global dynamic eth1
       valid_lft 43197sec preferred_lft 43197sec
    inet6 fe80::ea8:80ff:fe2f:1/64 scope link 
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 0c:a8:80:2f:00:02 brd ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 0c:a8:80:2f:00:03 brd ff:ff:ff:ff:ff:ff
[root@server2 ~]# ifup eth2

Determining IP information for eth2... done.
[root@server2 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 0c:a8:80:2f:00:00 brd ff:ff:ff:ff:ff:ff
    inet 10.240.3.122/24 brd 10.240.3.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::ea8:80ff:fe2f:0/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 0c:a8:80:2f:00:01 brd ff:ff:ff:ff:ff:ff
    inet 172.16.10.100/24 brd 172.16.10.255 scope global dynamic eth1
       valid_lft 42370sec preferred_lft 42370sec
    inet6 fe80::ea8:80ff:fe2f:1/64 scope link 
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 0c:a8:80:2f:00:02 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::ea8:80ff:fe2f:2/64 scope link 
       valid_lft forever preferred_lft forever
5: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 0c:a8:80:2f:00:03 brd ff:ff:ff:ff:ff:ff
[root@server2 ~]#

The lease is visible on the DHCP server:

[root@server1 dhcp]# cat /var/lib/dhcpd/dhcpd.leases
# The format of this file is documented in the dhcpd.leases(5) manual page.
# This lease file was written by isc-dhcp-4.2.5

server-duid "\000\001\000\001.,\333g\014\012\016T\000\001";

lease 172.16.10.100 {
  starts 5 2024/07/19 08:08:19;
  ends 5 2024/07/19 20:08:19;
  cltt 5 2024/07/19 08:08:19;
  binding state active;
  next binding state free;
  rewind binding state free;
  hardware ethernet 0c:a8:80:2f:00:01;
  client-hostname "server2";
}
[root@server1 dhcp]# systemctl status dhcpd
● dhcpd.service - DHCPv4 Server Daemon
   Loaded: loaded (/usr/lib/systemd/system/dhcpd.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2024-07-19 08:11:09 UTC; 1h 13min ago
     Docs: man:dhcpd(8)
           man:dhcpd.conf(5)
 Main PID: 4036 (dhcpd)
   Status: "Dispatching packets..."
   CGroup: /system.slice/dhcpd.service
           └─4036 /usr/sbin/dhcpd -f -cf /etc/dhcp/dhcpd.conf -user dhcpd -group dhcpd --no-pid

Jul 19 08:11:09 server1 dhcpd[4036]: 
Jul 19 08:11:09 server1 dhcpd[4036]: No subnet declaration for eth0 (10.240.3.121).
Jul 19 08:11:09 server1 dhcpd[4036]: ** Ignoring requests on eth0.  If this is not what
Jul 19 08:11:09 server1 dhcpd[4036]:    you want, please write a subnet declaration
Jul 19 08:11:09 server1 dhcpd[4036]:    in your dhcpd.conf file for the network segment
Jul 19 08:11:09 server1 dhcpd[4036]:    to which interface eth0 is attached. **
Jul 19 08:11:09 server1 dhcpd[4036]: 
Jul 19 08:11:09 server1 dhcpd[4036]: Sending on   Socket/fallback/fallback-net
Jul 19 08:11:58 server1 dhcpd[4036]: DHCPREQUEST for 172.16.10.100 from 0c:a8:80:2f:00:01 (server2) via bond0
Jul 19 08:11:58 server1 dhcpd[4036]: DHCPACK on 172.16.10.100 to 0c:a8:80:2f:00:01 (server2) via bond0
[root@server1 dhcp]#
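When many clients are installed through PXE, reading dhcpd.leases by eye becomes tedious. The sketch below extracts the IP address, MAC address, binding state, and hostname from lease blocks like the one shown above. It is a simplified parser written for this example (the `parse_leases` name and the returned dictionary layout are assumptions), not a complete implementation of the dhcpd.leases(5) format.

```python
import re

# Lease block copied from the dhcpd.leases output above.
LEASE_TEXT = '''
lease 172.16.10.100 {
  starts 5 2024/07/19 08:08:19;
  ends 5 2024/07/19 20:08:19;
  cltt 5 2024/07/19 08:08:19;
  binding state active;
  next binding state free;
  rewind binding state free;
  hardware ethernet 0c:a8:80:2f:00:01;
  client-hostname "server2";
}
'''

LEASE_RE = re.compile(r"lease\s+(\S+)\s*\{(.*?)\}", re.DOTALL)


def parse_leases(text):
    """Return one dict per lease block found in text."""
    leases = []
    for ip, body in LEASE_RE.findall(text):
        mac = re.search(r"hardware ethernet\s+([0-9a-fA-F:]+);", body)
        # Anchor at line start so "next binding state" is not matched.
        state = re.search(r"^\s*binding state\s+(\w+);", body, re.MULTILINE)
        host = re.search(r'client-hostname\s+"([^"]*)";', body)
        leases.append({
            "ip": ip,
            "mac": mac.group(1) if mac else None,
            "state": state.group(1) if state else None,
            "hostname": host.group(1) if host else None,
        })
    return leases


for lease in parse_leases(LEASE_TEXT):
    print(lease)
```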

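Bond the two service ports on Centos76-2. The LAG 2 member ports on both Leaf1 and Leaf2 now enter the Active state, which confirms that Fallback works as intended and that LAG 2 has returned to dynamic aggregation mode: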

[root@server2 network-scripts]# ifup bond0
[root@server2 network-scripts]# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 0c:a8:80:2f:00:01
Active Aggregator Info:
        Aggregator ID: 2
        Number of ports: 2
        Actor Key: 9
        Partner Key: 0
        Partner Mac Address: 52:54:00:12:34:56

Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 0c:a8:80:2f:00:02
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 0c:a8:80:2f:00:01
    port key: 9
    port priority: 255
    port number: 2
    port state: 61
details partner lacp pdu:
    system priority: 65535
    system mac address: 52:54:00:12:34:56
    oper key: 0
    port priority: 255
    port number: 3
    port state: 63

Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 0c:a8:80:2f:00:01
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: monitoring
Partner Churn State: monitoring
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 0c:a8:80:2f:00:01
    port key: 9
    port priority: 255
    port number: 3
    port state: 61
details partner lacp pdu:
    system priority: 65535
    system mac address: 52:54:00:12:34:56
    oper key: 0
    port priority: 255
    port number: 3
    port state: 63
[root@server2 network-scripts]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 0c:a8:80:2f:00:00 brd ff:ff:ff:ff:ff:ff
    inet 10.240.3.122/24 brd 10.240.3.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::ea8:80ff:fe2f:0/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP group default qlen 1000
    link/ether 0c:a8:80:2f:00:01 brd ff:ff:ff:ff:ff:ff
4: eth2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP group default qlen 1000
    link/ether 0c:a8:80:2f:00:01 brd ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 0c:a8:80:2f:00:03 brd ff:ff:ff:ff:ff:ff
6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 0c:a8:80:2f:00:01 brd ff:ff:ff:ff:ff:ff
    inet 172.16.10.101/24 brd 172.16.10.255 scope global bond0
       valid_lft forever preferred_lft forever
    inet6 fe80::ea8:80ff:fe2f:1/64 scope link 
       valid_lft forever preferred_lft forever
[root@server2 network-scripts]# ping 172.16.10.1 -c 4
PING 172.16.10.1 (172.16.10.1) 56(84) bytes of data.
64 bytes from 172.16.10.1: icmp_seq=1 ttl=64 time=5.38 ms
64 bytes from 172.16.10.1: icmp_seq=2 ttl=64 time=3.29 ms
64 bytes from 172.16.10.1: icmp_seq=3 ttl=64 time=3.97 ms
64 bytes from 172.16.10.1: icmp_seq=4 ttl=64 time=3.11 ms

--- 172.16.10.1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3005ms
rtt min/avg/max/mdev = 3.115/3.943/5.389/0.895 ms
[root@server2 network-scripts]# 
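The "port state" values in the /proc/net/bonding/bond0 output above (61 for the actor, 63 for the partner) are bit fields defined by the LACP standard (IEEE 802.3ad). The sketch below decodes them; the `decode_port_state` helper is a name chosen for this example.

```python
# LACP port state bits, least significant bit first (IEEE 802.3ad).
LACP_STATE_BITS = [
    "activity",         # bit 0: 1 = active LACP, 0 = passive
    "timeout",          # bit 1: 1 = short timeout (fast rate), 0 = long
    "aggregation",      # bit 2: link can be aggregated
    "synchronization",  # bit 3: port is in sync with its aggregator
    "collecting",       # bit 4: receiving frames is enabled
    "distributing",     # bit 5: sending frames is enabled
    "defaulted",        # bit 6: using default partner information
    "expired",          # bit 7: partner information has expired
]


def decode_port_state(value):
    """Return the names of the flags that are set in an LACP port state."""
    return [name for bit, name in enumerate(LACP_STATE_BITS) if value & (1 << bit)]


print(61, decode_port_state(61))
print(63, decode_port_state(63))
```

The only difference between the two values is the timeout bit: the server side (actor, 61) uses the long timeout, matching "LACP rate: slow", while the switch side (partner, 63) uses the short timeout, matching the `lacp fast-rate` line in the LAG 2 configuration. Both sides report collecting and distributing, so the bond is carrying traffic.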

leaf1# show link-aggregation summary 
Flags: A - active, I - inactive, Up - up, Dw - Down, N/A - not available,
       S - selected, D - deselected, * - not synced
  No.  Team Dev         Protocol     Ports          Description
-----  ---------------  -----------  -------------  -------------
 0001  lag 1            LACP(A)(Up)  0/1      (S)   N/A
 0002  lag 2            LACP(A)(Up)  0/2      (S)   N/A
 0099  lag 99           LACP(A)(Up)  0/9      (S)   N/A
                                     0/8      (S)
leaf1# 
leaf2# show link-aggregation summary 
Flags: A - active, I - inactive, Up - up, Dw - Down, N/A - not available,
       S - selected, D - deselected, * - not synced
  No.  Team Dev         Protocol     Ports          Description
-----  ---------------  -----------  -------------  -------------
 0001  lag 1            LACP(A)(Up)  0/1      (S)   N/A
 0002  lag 2            LACP(A)(Up)  0/2      (S)   N/A
 0099  lag 99           LACP(A)(Up)  0/8      (S)   N/A
                                     0/9      (S)
leaf2#

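Configuration Guide: Manually Deploying a Kubernetes Cluster from Binary Packages

1 Goals and Physical Network Topology
2 Hardware and Software Environment
3 Introduction to Kubernetes
4 Installation Steps
- 4.1 Prepare the Environment
- 4.2 Deploy the Etcd Cluster
5 Verifying the Result
6 References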

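Kubernetes Deployment Plan

1 Goals

A K8S cluster can be deployed in several ways: kubeadm, minikube, or binary packages. The first two are automated and simplify deployment, but they hide many details and give little insight into the individual components. This guide therefore deploys the cluster manually from binary packages.

The devices, interfaces, and management IP addresses involved in the deployment are listed in the following table: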

Device name	IP address	Components
master	192.168.4.154	etcd kube-apiserver kube-controller-manager kube-scheduler
node1	192.168.4.155	etcd kubelet kube-proxy docker
node2	192.168.4.156	etcd kubelet kube-proxy docker

Table 1: Management IP addresses and components of each device

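2 Hardware and Software Environment

The hardware and software used in the deployment are listed in Table 2 and Table 3:

Name	Model	Hardware specification	Quantity	Notes
Server		CPU: at least 2 cores; Memory: at least 2; Disk: at least 20G	3

Table 2: Hardware environment

Software	Version	Notes
Operating system	Centos7.6	Select the Compute Node mode during installation
Kubernetes	1.18.0
Docker	Docker-ce19.03
Etcd	3.3.11

Table 3: Software environment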

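3 Introduction to Kubernetes

Kubernetes (K8S for short) is a leading distributed-architecture solution based on container technology. It is a container cluster management system open-sourced by Google, used to deploy, scale, and manage containerized applications. K8S provides container orchestration, resource scheduling, elastic scaling, deployment management, service discovery, and related capabilities.

Kubernetes Cluster Architecture

Kubernetes Core Concepts

cluster

A cluster is a collection of compute, storage, and network resources that k8s uses to run container-based applications.

master

The master is the brain of the cluster. Its main responsibility is scheduling, that is, deciding where each application runs. The master runs Linux and can be a physical or virtual machine. Multiple masters can be run for high availability.

node

A node runs the containerized applications. Nodes are managed by the master: each node monitors and reports the state of its containers and manages their lifecycle as instructed by the master. A node runs Linux and can be a physical or virtual machine.

pod

A pod is the smallest unit of work in k8s. Each pod contains one or more containers, and the containers in a pod are scheduled by the master onto a node as a single unit.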

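controller

k8s usually does not create pods directly but manages them through controllers. A controller defines how a pod is deployed, for example how many replicas it has and on what kind of node it runs. To cover different workloads, k8s provides several controllers, including deployment, replicaset, daemonset, statefulset, and job.

1) deployment
The most commonly used controller. A deployment manages multiple replicas of a pod and makes sure the pods run in the desired state.

2) replicaset
Implements multi-replica management of pods. A replicaset is created automatically when a deployment is used, that is, a deployment manages the pod replicas through a replicaset, so there is usually no need to use a replicaset directly.

3) daemonset
Used when each node should run at most one replica of a pod. As the name suggests, a daemonset is typically used to run daemons.

4) statefulset
Guarantees that the name of each pod replica stays the same for its whole lifecycle, which the other controllers do not. With the other controllers, a pod that fails and is deleted and restarted gets a new name. A statefulset also guarantees that replicas are started, updated, and deleted in a fixed order.

5) job
Used for applications that are removed once they finish running, whereas pods under the other controllers usually run continuously.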

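service

A deployment can run multiple replicas, and each pod has its own IP address. Clients outside reach these replicas through a service.

A k8s service defines how a particular group of pods is accessed. The service has its own IP address and port and load-balances across the pods.
In k8s, running the container pods and accessing them are two separate tasks, handled by controllers and services respectively.

namespace

A physical cluster can be divided logically into multiple virtual clusters, each of which is a namespace. Resources in different namespaces are logically isolated from each other.

label

Labels distinguish objects (such as pods and services) and are stored as key/value pairs. Each object can carry multiple labels, and objects are associated with each other through their labels.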
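How labels associate objects can be illustrated with a small equality-based selector. This is a conceptual sketch in plain Python, not the Kubernetes API: the `matches` and `select` helpers and the sample pod names and labels are invented for the example.

```python
def matches(labels, selector):
    """True when every key/value pair of the selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(objects, selector):
    """Return the names of the objects whose labels satisfy the selector."""
    return [name for name, labels in objects.items() if matches(labels, selector)]


# Hypothetical pods and their labels.
pods = {
    "web-1": {"app": "web", "env": "prod"},
    "web-2": {"app": "web", "env": "test"},
    "db-1": {"app": "db", "env": "prod"},
}

# A service with this selector would send traffic to both web pods.
print(select(pods, {"app": "web"}))
print(select(pods, {"app": "web", "env": "prod"}))
```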

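K8s Master Components

kube-apiserver

The Kubernetes API server is the single entry point of the cluster and the coordinator of all components. It exposes a RESTful API. All create, read, update, delete, and watch operations on resource objects are handled by the API server and then persisted to Etcd.

kube-controller-manager

Handles the routine background tasks of the cluster. Each resource has a corresponding controller, and the controller manager is responsible for managing these controllers.

kube-scheduler

Selects a Node for each newly created Pod according to the scheduling algorithm. It can be deployed flexibly, either on the same node as the other Master components or on a different node.

Etcd

A distributed key-value store that holds the cluster state, such as Pod and Service objects.

K8s Node Components

kubelet

The kubelet is the Master's agent on a Node. It manages the lifecycle of the containers running on that machine: creating containers, mounting volumes for Pods, downloading secrets, and reporting container and node status. The kubelet turns each Pod into a set of containers.

kube-proxy

Implements the Pod network proxy on a Node, maintaining network rules and layer 4 load balancing.

docker or rocket (rkt)

The container engine, which runs the containers.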

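4 Installation Steps

4.1 Prepare the Environment

Note: run the following steps on every node.

Disable swap:

[root@localhost ~]# swapoff -a && sysctl -w vm.swappiness=0

Comment out the swap entry in /etc/fstab so that swap stays off after a reboot:

[root@localhost ~]# sed -i 's/.*swap.*/#&/g' /etc/fstab

Stop and disable the firewall:

[root@localhost ~]# systemctl stop firewalld
[root@localhost ~]# systemctl disable firewalld

Disable SELinux (edit /etc/selinux/config directly; /etc/sysconfig/selinux is a symbolic link to it, and sed -i would replace the link with a regular file instead of changing the real configuration):

[root@localhost ~]# setenforce 0
[root@localhost ~]# sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config

Enable IP forwarding and make bridged traffic visible to iptables:

[root@localhost ~]# cat > /etc/sysctl.d/k8s.conf << EOF
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
[root@localhost ~]# modprobe br_netfilter
[root@localhost ~]# sysctl -p /etc/sysctl.d/k8s.conf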
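Because these steps are repeated on every node, it can help to check the resulting settings programmatically. The sketch below parses the k8s.conf content and maps each key to its /proc/sys path, which is where the kernel exposes the live value. The helper names `parse_sysctl` and `sysctl_path` are assumptions for this example, and the sketch only processes text, so it does not touch the running system.

```python
# Content of /etc/sysctl.d/k8s.conf as written by the step above.
K8S_CONF = """\
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
"""


def parse_sysctl(text):
    """Parse 'key = value' lines, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def sysctl_path(key):
    """Map a sysctl key to the /proc/sys file that holds its live value."""
    return "/proc/sys/" + key.replace(".", "/")


for key, value in parse_sysctl(K8S_CONF).items():
    print(f"{sysctl_path(key)} should contain {value}")
```

The two net.bridge entries only exist under /proc/sys once the br_netfilter module is loaded, which is why modprobe runs before sysctl -p.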

Install Docker:

[root@localhost ~]# yum -y install yum-utils
[root@localhost ~]# yum-config-manager --add-repo \
https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
[root@localhost ~]# yum install docker-ce
[root@localhost ~]# curl -sSL https://get.daocloud.io/daotools/set_mirror.sh | sh -s http://bc437cce.m.daocloud.io
[root@localhost ~]# systemctl enable docker
[root@localhost ~]# systemctl start docker

Synchronize the time:

[root@localhost ~]# yum -y install ntp
[root@localhost ~]# vi /etc/ntp.conf
Set the server entry to: server ntp1.aliyun.com iburst
[root@localhost ~]# systemctl restart ntpd
[root@localhost ~]# timedatectl set-timezone Asia/Shanghai
[root@localhost ~]# cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

Raise the maximum number of open files by adding the following two lines:

[root@localhost ~]# vi /etc/security/limits.conf
* soft nofile 65535
* hard nofile 65535

4.2 Deploy the Etcd Cluster

Download the cfssl tools and generate the certificates:

Download the cfssl tools

[root@k8s-master ~]# wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
[root@k8s-master ~]# wget https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
[root@k8s-master ~]# wget https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64
[root@k8s-master ~]# chmod +x cfssl_linux-amd64 cfssljson_linux-amd64 cfssl-certinfo_linux-amd64
[root@k8s-master ~]# mv cfssl_linux-amd64 /usr/local/bin/cfssl
[root@k8s-master ~]# mv cfssljson_linux-amd64 /usr/local/bin/cfssljson
[root@k8s-master ~]# mv cfssl-certinfo_linux-amd64 /usr/bin/cfssl-certinfo

Create the following three files:

[root@k8s-master ~]#vim ca-config.json
{
  "signing": {
    "default": {
      "expiry": "87600h"
    },
    "profiles": {
      "www": {
         "expiry": "87600h",
         "usages": [
            "signing",
            "key encipherment",
            "server auth",
            "client auth"
        ]
      }
    }
  }
}
[root@k8s-master ~]#vim ca-csr.json
{
    "CN": "etcd CA",
    "key": {
        "algo": "rsa",
        "size": 2048
    },
    "names": [
        {
            "C": "CN",
            "L": "Beijing",
            "ST": "Beijing"
        }
    ]
} 
[root@k8s-master ~]#vim server-csr.json
{
    "CN": "etcd",
    "hosts": [
        "192.168.4.154",
        "192.168.4.155",
        "192.168.4.156"
    ],
    "key": {
        "algo": "rsa",
        "size": 2048
    },
    "names": [
        {
            "C": "CN",
            "L": "BeiJing",
            "ST": "BeiJing"
        }
    ]
}

Generate the certificates

[root@k8s-master ~]# cfssl gencert -initca ca-csr.json | cfssljson -bare ca -
[root@k8s-master ~]# cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=www server-csr.json | cfssljson -bare server
[root@k8s-master ~]# ls *pem
ca-key.pem  ca.pem  server-key.pem  server.pem

Install Etcd 3:

Create the directories

[root@k8s-master ~]# mkdir /opt/etcd/{bin,cfg,ssl} -p

Download the etcd package

[root@k8s-master ~]# wget https://github.com/etcd-io/etcd/releases/download/v3.2.12/etcd-v3.2.12-linux-amd64.tar.gz
[root@k8s-master ~]# tar zxvf etcd-v3.2.12-linux-amd64.tar.gz
[root@k8s-master ~]# mv etcd-v3.2.12-linux-amd64/{etcd,etcdctl} /opt/etcd/bin/

Create the etcd configuration file

[root@k8s-master ~]# vim /opt/etcd/cfg/etcd
#[Member]
ETCD_NAME="etcd01"
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS="https://192.168.4.154:2380"
ETCD_LISTEN_CLIENT_URLS="https://192.168.4.154:2379"
#[Clustering]
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.168.4.154:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://192.168.4.154:2379"
ETCD_INITIAL_CLUSTER="etcd01=https://192.168.4.154:2380,etcd02=https://192.168.4.155:2380,etcd03=https://192.168.4.156:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"

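ETCD_NAME: node name
ETCD_DATA_DIR: data directory
ETCD_LISTEN_PEER_URLS: listen address for cluster (peer) communication
ETCD_LISTEN_CLIENT_URLS: listen address for client access
ETCD_INITIAL_ADVERTISE_PEER_URLS: peer address advertised to the cluster
ETCD_ADVERTISE_CLIENT_URLS: client address advertised to the cluster
ETCD_INITIAL_CLUSTER: addresses of all cluster nodes
ETCD_INITIAL_CLUSTER_TOKEN: cluster token
ETCD_INITIAL_CLUSTER_STATE: state when joining the cluster; new means a new cluster, existing means joining an existing cluster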
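The configuration file above is for etcd01. The other two members need the same file with their own name and IP address. The sketch below renders the file for any member; the node names and addresses come from ETCD_INITIAL_CLUSTER above, while the `initial_cluster` and `render_etcd_config` helpers are assumptions made for this example.

```python
# Member name to IP address, taken from ETCD_INITIAL_CLUSTER.
NODES = {
    "etcd01": "192.168.4.154",
    "etcd02": "192.168.4.155",
    "etcd03": "192.168.4.156",
}


def initial_cluster(nodes):
    """Build the ETCD_INITIAL_CLUSTER value shared by every member."""
    return ",".join(f"{name}=https://{ip}:2380" for name, ip in nodes.items())


def render_etcd_config(name, nodes):
    """Render the /opt/etcd/cfg/etcd content for one member."""
    ip = nodes[name]
    lines = [
        "#[Member]",
        f'ETCD_NAME="{name}"',
        'ETCD_DATA_DIR="/var/lib/etcd/default.etcd"',
        f'ETCD_LISTEN_PEER_URLS="https://{ip}:2380"',
        f'ETCD_LISTEN_CLIENT_URLS="https://{ip}:2379"',
        "#[Clustering]",
        f'ETCD_INITIAL_ADVERTISE_PEER_URLS="https://{ip}:2380"',
        f'ETCD_ADVERTISE_CLIENT_URLS="https://{ip}:2379"',
        f'ETCD_INITIAL_CLUSTER="{initial_cluster(nodes)}"',
        'ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"',
        'ETCD_INITIAL_CLUSTER_STATE="new"',
    ]
    return "\n".join(lines) + "\n"


print(render_etcd_config("etcd02", NODES))
```

Only ETCD_NAME and the four URL entries differ between members; the cluster list, token, and state are identical everywhere.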

Create the systemd unit file for etcd

[root@k8s-master ~]# vim  /usr/lib/systemd/system/etcd.service 
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
EnvironmentFile=/opt/etcd/cfg/etcd
ExecStart=/opt/etcd/bin/etcd \
--name=${ETCD_NAME} \
--data-dir=${ETCD_DATA_DIR} \
--listen-peer-urls=${ETCD_LISTEN_PEER_URLS} \
--listen-client-urls=${ETCD_LISTEN_CLIENT_URLS},http://127.0.0.1:2379 \
--advertise-client-urls=${ETCD_ADVERTISE_CLIENT_URLS} \
--initial-advertise-peer-urls=${ETCD_INITIAL_ADVERTISE_PEER_URLS} \
--initial-cluster=${ETCD_INITIAL_CLUSTER} \
--initial-cluster-token=${ETCD_INITIAL_CLUSTER_TOKEN} \
--initial-cluster-state=${ETCD_INITIAL_CLUSTER_STATE} \
--cert-file=/opt/etcd/ssl/server.pem \
--key-file=/opt/etcd/ssl/server-key.pem \
--peer-cert-file=/opt/etcd/ssl/server.pem \
--peer-key-file=/opt/etcd/ssl/server-key.pem \
--trusted-ca-file=/opt/etcd/ssl/ca.pem \
--peer-trusted-ca-file=/opt/etcd/ssl/ca.pem
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

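Copy the certificates generated earlier to the location referenced in the unit file:

[root@k8s-master ~]# cp ca*pem server*pem /opt/etcd/ssl

Start etcd and enable it at boot:

[root@k8s-master ~]# systemctl enable etcd
[root@k8s-master ~]# systemctl start etcd

Note: start etcd on all three machines at about the same time. A member started on its own cannot reach quorum, so the start command on the first machine may block and then fail if the other members are not started.

Check the health of the etcd cluster: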

[root@k8s-master ~]# /opt/etcd/bin/etcdctl \
--ca-file=/opt/etcd/ssl/ca.pem --cert-file=/opt/etcd/ssl/server.pem  \
--key-file=/opt/etcd/ssl/server-key.pem  \
--endpoints="https://192.168.4.154:2379,https://192.168.4.155:2379,https://192.168.4.156:2379"  \
cluster-health

4.3 Install and Run the Master Node Components

Download the components:

[root@k8s-master ~]# wget https://dl.k8s.io/v1.18.8/kubernetes-server-linux-amd64.tar.gz

Unpack the package:

[root@k8s-master ~]# tar zxvf kubernetes-server-linux-amd64.tar.gz
[root@k8s-master ~]# mkdir /opt/kubernetes/{bin,cfg,ssl,logs} -p
[root@k8s-master ~]# cp kubernetes/server/bin/{kube-apiserver,kube-scheduler,kube-controller-manager,kubectl,kubelet} /opt/kubernetes/bin

Generate the certificates

Create the CA certificate:

[root@k8s-master ~]# vim ca-config.json
{
  "signing": {
    "default": {
      "expiry": "87600h"
    },
    "profiles": {
      "kubernetes": {
         "expiry": "87600h",
         "usages": [
            "signing",
            "key encipherment",
            "server auth",
            "client auth"
        ]
      }
    }
  }
}

[root@k8s-master ~]#vim ca-csr.json 
{
    "CN": "kubernetes",
    "key": {
        "algo": "rsa",
        "size": 2048
    },
    "names": [
        {
            "C": "CN",
            "L": "Beijing",
            "ST": "Beijing",
            "O": "k8s",
            "OU": "System"
        }
    ]
}
[root@k8s-master ~]# cfssl gencert -initca ca-csr.json | cfssljson -bare ca -

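Generate the apiserver certificate. In the hosts list, 10.0.0.1 is the first address of the Service virtual IP range (10.0.0.0/24, set later with --service-cluster-ip-range) and is used inside the cluster to reach the API server; keep this entry unchanged:

[root@k8s-master ~]# vim server-csr.json
{
    "CN": "kubernetes",
    "hosts": [
      "10.0.0.1",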
      "127.0.0.1",
      "192.168.4.154",
      "192.168.4.155",
      "192.168.4.156",
      "kubernetes",
      "kubernetes.default",
      "kubernetes.default.svc",
      "kubernetes.default.svc.cluster",
      "kubernetes.default.svc.cluster.local"
    ],
    "key": {
        "algo": "rsa",
        "size": 2048
    },
    "names": [
        {
            "C": "CN",
            "L": "BeiJing",
            "ST": "BeiJing",
            "O": "k8s",
            "OU": "System"
        }
    ]
}
[root@k8s-master ~]#cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes server-csr.json | cfssljson -bare server

Generate the kube-proxy certificate:

[root@k8s-master ~]#vim  kube-proxy-csr.json 
{
  "CN": "system:kube-proxy",
  "hosts": [],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "BeiJing",
      "ST": "BeiJing",
      "O": "k8s",
      "OU": "System"
    }
  ]
}
[root@k8s-master ~]#cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes kube-proxy-csr.json | cfssljson -bare kube-proxy

最终生成以下证书文件：

[root@k8s-master ~]## ls *pem
ca-key.pem  ca.pem  kube-proxy-key.pem  kube-proxy.pem  server-key.pem  server.pem

如果有多个master，需要将证书拷贝到所有的 master节点：

[root@k8s-master ~]#scp server.pem  server-key.pem ca.pem ca-key.pem k8s-master2:/opt/kubernetes/ssl/

Create the token file:

# Generate a random token
[root@k8s-master ~]#head -c 16 /dev/urandom | od -An -t x | tr -d ' '
[root@k8s-master ~]#cat << EOF >/opt/kubernetes/cfg/token.csv 
79d370bf4b3e1bda79087504d34b9e5d,kubelet-bootstrap,10001,"system:kubelet-bootstrap"
EOF
Column 1 is a random string (you can generate your own), column 2 is the user name, column 3 is the UID, and column 4 is the user group.
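
The two steps above (generate a token, then paste it into token.csv) can be combined so that the value is never copied by hand. A minimal sketch, assuming a temporary output directory instead of /opt/kubernetes/cfg; the variable names are illustrative only.

```shell
# Generate a 16-byte random token as 32 hex characters, then write it
# into a token.csv with the four-column layout described above.
# Assumption for this sketch: a temporary directory is used as the target;
# the guide itself writes to /opt/kubernetes/cfg/token.csv.
token=$(head -c 16 /dev/urandom | od -An -t x | tr -d ' \n')
cfg_dir=$(mktemp -d)
printf '%s,kubelet-bootstrap,10001,"system:kubelet-bootstrap"\n' "$token" > "$cfg_dir/token.csv"
cat "$cfg_dir/token.csv"
```

Whatever token ends up in token.csv has to be the same value that is later assigned to BOOTSTRAP_TOKEN when the bootstrap kubeconfig is created.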

Configure the apiserver

Create the configuration file:

[root@k8s-master~]# vim  /opt/kubernetes/cfg/kube-apiserver 
KUBE_APISERVER_OPTS="--logtostderr=true \
--v=4 \
--log-dir=/opt/kubernetes/logs \
--etcd-servers=https://192.168.4.154:2379,https://192.168.4.155:2379,https://192.168.4.156:2379 \
--bind-address=192.168.4.154 \
--secure-port=6443 \
--advertise-address=192.168.4.154 \
--allow-privileged=true \
--service-cluster-ip-range=10.0.0.0/24 \
--enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,ResourceQuota,NodeRestriction \
--authorization-mode=RBAC,Node \
--enable-bootstrap-token-auth \
--token-auth-file=/opt/kubernetes/cfg/token.csv \
--service-node-port-range=30000-50000 \
--tls-cert-file=/opt/kubernetes/ssl/server.pem  \
--tls-private-key-file=/opt/kubernetes/ssl/server-key.pem \
--client-ca-file=/opt/kubernetes/ssl/ca.pem \
--service-account-key-file=/opt/kubernetes/ssl/ca-key.pem \
--etcd-cafile=/opt/etcd/ssl/ca.pem \
--etcd-certfile=/opt/etcd/ssl/server.pem \
--etcd-keyfile=/opt/etcd/ssl/server-key.pem"


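Point the certificate options at the files generated earlier and make sure the apiserver can connect to etcd.
Parameter description:
* --logtostderr write logs to stderr
* --v log verbosity level
* --etcd-servers etcd cluster endpoints
* --bind-address listen address
* --secure-port HTTPS secure port
* --advertise-address address advertised to the rest of the cluster
* --allow-privileged allow privileged containers
* --service-cluster-ip-range virtual IP range for Services
* --enable-admission-plugins admission control plugins
* --authorization-mode authorization modes; enables RBAC and Node authorization
* --enable-bootstrap-token-auth enable TLS bootstrap, covered later
* --token-auth-file token file
* --service-node-port-range port range allocated to NodePort Services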

Create the systemd service file:

[root@k8s-master~]#vim  /usr/lib/systemd/system/kube-apiserver.service 
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/kubernetes/kubernetes

[Service]
EnvironmentFile=-/opt/kubernetes/cfg/kube-apiserver
ExecStart=/opt/kubernetes/bin/kube-apiserver $KUBE_APISERVER_OPTS
Restart=on-failure

[Install]
WantedBy=multi-user.target

Start the service and enable it at boot:

[root@k8s-master~]#systemctl daemon-reload
[root@k8s-master~]#systemctl enable kube-apiserver
[root@k8s-master~]#systemctl restart kube-apiserver

Note: the apiserver uses etcd3 by default. For etcd2, specify the option --storage-backend=etcd2 at startup.

Configure the scheduler

Create the configuration file:

[root@k8s-master~]#cat <<  EOF >/opt/kubernetes/cfg/kube-scheduler 
KUBE_SCHEDULER_OPTS="--logtostderr=true \
--v=4 \
--log-dir=/opt/kubernetes/logs \
--master=127.0.0.1:8080 \
--leader-elect"
EOF

Parameter description:
* --master connect to the local apiserver
* --leader-elect elect a leader automatically when several instances of this component run (HA)

Create the systemd service file:

[root@k8s-master~]#vim /usr/lib/systemd/system/kube-scheduler.service 
[Unit]
Description=Kubernetes Scheduler
Documentation=https://github.com/kubernetes/kubernetes

[Service]
EnvironmentFile=-/opt/kubernetes/cfg/kube-scheduler
ExecStart=/opt/kubernetes/bin/kube-scheduler $KUBE_SCHEDULER_OPTS
Restart=on-failure

[Install]
WantedBy=multi-user.target

Start the service and enable it at boot:

[root@k8s-master~]#systemctl daemon-reload
[root@k8s-master~]#systemctl enable kube-scheduler
[root@k8s-master~]#systemctl restart kube-scheduler

Configure the controller-manager

Create the configuration file:

[root@k8s-master~]# vim  /opt/kubernetes/cfg/kube-controller-manager 
KUBE_CONTROLLER_MANAGER_OPTS="--logtostderr=true \
--v=4 \
--log-dir=/opt/kubernetes/logs \
--master=127.0.0.1:8080 \
--leader-elect=true \
--address=127.0.0.1 \
--service-cluster-ip-range=10.0.0.0/24 \
--cluster-name=kubernetes \
--cluster-signing-cert-file=/opt/kubernetes/ssl/ca.pem \
--cluster-signing-key-file=/opt/kubernetes/ssl/ca-key.pem  \
--root-ca-file=/opt/kubernetes/ssl/ca.pem \
--service-account-private-key-file=/opt/kubernetes/ssl/ca-key.pem"

Create the systemd service file:

[root@k8s-master~]#vim /usr/lib/systemd/system/kube-controller-manager.service 
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/kubernetes/kubernetes

[Service]
EnvironmentFile=-/opt/kubernetes/cfg/kube-controller-manager
ExecStart=/opt/kubernetes/bin/kube-controller-manager $KUBE_CONTROLLER_MANAGER_OPTS
Restart=on-failure

[Install]
WantedBy=multi-user.target

Start the service and enable it at boot:

[root@k8s-master~]#systemctl daemon-reload
[root@k8s-master~]#systemctl enable kube-controller-manager
[root@k8s-master~]#systemctl restart kube-controller-manager

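Note: the components depend on a startup order. Start etcd first, then the apiserver; the other components can start in any order.

Once all components have started, check the cluster component status with kubectl: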

[root@k8s-master~]# ln -s  /opt/kubernetes/bin/kubectl  /usr/bin/
[root@k8s-master~]# kubectl get cs
NAME                 STATUS    MESSAGE             ERROR
scheduler            Healthy   ok                  
etcd-0               Healthy   {"health":"true"}   
etcd-2               Healthy   {"health":"true"}   
etcd-1               Healthy   {"health":"true"}   
controller-manager   Healthy   ok
The output above shows that all components are healthy.
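
Reading the STATUS column by eye works for a handful of components; for scripted checks the same output can be filtered. A minimal sketch: the helper name unhealthy_components is hypothetical, and the sample text is shaped like the output above with one status altered for illustration.

```shell
# Hypothetical helper: read `kubectl get cs` output on stdin and print the
# names of components whose STATUS column is not "Healthy".
unhealthy_components() {
    awk 'NR > 1 && $2 != "Healthy" { print $1 }'
}

# Sample text (not real cluster output): one component is marked Unhealthy.
sample='NAME                 STATUS      MESSAGE             ERROR
scheduler            Healthy     ok
etcd-0               Healthy     {"health":"true"}
controller-manager   Unhealthy   connection refused'

printf '%s\n' "$sample" | unhealthy_components
```

On a live master the filter would be fed from the real command, for example `kubectl get cs | unhealthy_components`; an empty result then means every listed component reported Healthy.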

How to view the startup logs

[root@k8s-master~]# journalctl -u kube-apiserver

Bind the kubelet-bootstrap user to the system cluster role

[root@k8s-master~]#/opt/kubernetes/bin/kubectl create clusterrolebinding kubelet-bootstrap \
  --clusterrole=system:node-bootstrapper \
  --user=kubelet-bootstrap

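Create the kubeconfig files:

Run the following commands in the directory where the Kubernetes certificates were generated:

Set the apiserver address (if the apiserver is behind a load balancer, use the load balancer address)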
KUBE_APISERVER="https://192.168.4.154:6443"
BOOTSTRAP_TOKEN=79d370bf4b3e1bda79087504d34b9e5d

Set the cluster parameters

[root@k8s-master~]#/opt/kubernetes/bin/kubectl config set-cluster kubernetes \
--certificate-authority=./ca.pem \
--embed-certs=true \
--server=${KUBE_APISERVER} \
--kubeconfig=bootstrap.kubeconfig

Set the client authentication parameters

[root@k8s-master~]#/opt/kubernetes/bin/kubectl config set-credentials kubelet-bootstrap \
  --token=${BOOTSTRAP_TOKEN} \
  --kubeconfig=bootstrap.kubeconfig

Set the context parameters

[root@k8s-master~]#/opt/kubernetes/bin/kubectl config set-context default \
  --cluster=kubernetes \
  --user=kubelet-bootstrap \
  --kubeconfig=bootstrap.kubeconfig

Set the default context

[root@k8s-master~]#/opt/kubernetes/bin/kubectl config use-context default --kubeconfig=bootstrap.kubeconfig

Create the kube-proxy kubeconfig file

[root@k8s-master~]#/opt/kubernetes/bin/kubectl config set-cluster kubernetes \
  --certificate-authority=./ca.pem \
  --embed-certs=true \
  --server=${KUBE_APISERVER} \
  --kubeconfig=kube-proxy.kubeconfig

[root@k8s-master~]#/opt/kubernetes/bin/kubectl config set-credentials kube-proxy \
  --client-certificate=./kube-proxy.pem \
  --client-key=./kube-proxy-key.pem \
  --embed-certs=true \
  --kubeconfig=kube-proxy.kubeconfig

[root@k8s-master~]#/opt/kubernetes/bin/kubectl config set-context default \
  --cluster=kubernetes \
  --user=kube-proxy \
  --kubeconfig=kube-proxy.kubeconfig

[root@k8s-master~]#/opt/kubernetes/bin/kubectl config use-context default --kubeconfig=kube-proxy.kubeconfig

[root@k8s-master~]# ls
bootstrap.kubeconfig  kube-proxy.kubeconfig

Note: copy these two files to the /opt/kubernetes/cfg directory on the Node.

4.4 Install and run the Node components

Download the components

[root@k8s-node01 ~]#wget https://dl.k8s.io/v1.18.8/kubernetes-client-linux-amd64.tar.gz

Unpack the package, and create the target directories before copying any binaries:

[root@k8s-node01 ~]#tar zxvf kubernetes-client-linux-amd64.tar.gz
[root@k8s-node01 ~]#mkdir -p /opt/kubernetes/{bin,cfg,ssl}
[root@k8s-node01 ~]#cp  kubernetes/client/bin/kubectl   /opt/kubernetes/bin/
[root@k8s-node01 ~]#scp root@192.168.4.154:/root/kubernetes/server/bin/{kubelet,kube-proxy} /opt/kubernetes/bin/

Configure kubelet

Create the kubelet configuration file:

[root@k8s-node01 ~]# cat << EOF >/opt/kubernetes/cfg/kubelet 
KUBELET_OPTS="--logtostderr=true \
--v=4 \
--log-dir=/opt/kubernetes/logs \
--hostname-override=192.168.4.155 \
--kubeconfig=/opt/kubernetes/cfg/kubelet.kubeconfig \
--bootstrap-kubeconfig=/opt/kubernetes/cfg/bootstrap.kubeconfig \
--config=/opt/kubernetes/cfg/kubelet.config \
--cert-dir=/opt/kubernetes/ssl  \
--network-plugin=cni \
--pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google-containers/pause-amd64:3.0"
EOF

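Parameter description:
* --hostname-override host name shown in the cluster
* --kubeconfig location of the kubeconfig file; it is generated automatically
* --bootstrap-kubeconfig the bootstrap.kubeconfig file generated earlier
* --cert-dir directory where issued certificates are stored
* --pod-infra-container-image image that manages the Pod network

Create the kubelet.config configuration file:

[root@k8s-node01 ~]# cat << EOF > /opt/kubernetes/cfg/kubelet.config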
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
address: 192.168.4.155
port: 10250
readOnlyPort: 10255
cgroupDriver: cgroupfs
clusterDNS: ["10.0.0.2"]
clusterDomain: cluster.local.
failSwapOn: false
authentication:
  anonymous:
    enabled: true 
  webhook:
    enabled: false
EOF

Create the systemd service file:

[root@k8s-node01 ~]#vim /usr/lib/systemd/system/kubelet.service  
[Unit]
Description=Kubernetes Kubelet
After=docker.service
Requires=docker.service

[Service]
EnvironmentFile=/opt/kubernetes/cfg/kubelet
ExecStart=/opt/kubernetes/bin/kubelet $KUBELET_OPTS
Restart=on-failure
KillMode=process

[Install]
WantedBy=multi-user.target

Start the service and enable it at boot:

[root@k8s-node01 ~]#systemctl daemon-reload
[root@k8s-node01 ~]#systemctl enable kubelet 
[root@k8s-node01 ~]#systemctl restart kubelet

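Approve the Node on the Master:

After kubelet starts, the Node has not yet joined the cluster; it has to be approved manually. On the Master, list the Nodes that are requesting a signature: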

[root@k8s-master~]# /opt/kubernetes/bin/kubectl get csr
[root@k8s-master~]#/opt/kubernetes/bin/kubectl certificate approve XXXXID
[root@k8s-master~]#/opt/kubernetes/bin/kubectl get node
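
When several Nodes bootstrap at once, picking each XXXXID out of the list by hand gets tedious. A minimal sketch of extracting the pending request names: the helper name pending_csrs and the request names in the sample are made up for illustration, and the column layout is assumed to end with the CONDITION column.

```shell
# Hypothetical helper: read `kubectl get csr` output on stdin and print the
# names of requests whose last column (CONDITION) is exactly "Pending".
pending_csrs() {
    awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Sample text with made-up request names, shaped like `kubectl get csr` output.
sample='NAME        AGE   REQUESTOR           CONDITION
node-csr-a  2m    kubelet-bootstrap   Pending
node-csr-b  9m    kubelet-bootstrap   Approved,Issued'

printf '%s\n' "$sample" | pending_csrs
```

On the Master, the printed names could then be passed one by one to `kubectl certificate approve`. Reviewing the list before approving is still advisable, because approval grants the node a client certificate.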

Configure kube-proxy

Create the kube-proxy configuration file. Keep --cluster-cidr=10.0.0.0/24 exactly as shown:

[root@k8s-node01 ~]#vim /opt/kubernetes/cfg/kube-proxy 
KUBE_PROXY_OPTS="--logtostderr=true \
--v=4 \
--log-dir=/opt/kubernetes/logs \
--hostname-override=192.168.4.155 \
--cluster-cidr=10.0.0.0/24 \
--kubeconfig=/opt/kubernetes/cfg/kube-proxy.kubeconfig"

Create the systemd service file:

[root@k8s-node01 ~]# vim /usr/lib/systemd/system/kube-proxy.service 
[Unit]
Description=Kubernetes Proxy
After=network.target

[Service]
EnvironmentFile=-/opt/kubernetes/cfg/kube-proxy
ExecStart=/opt/kubernetes/bin/kube-proxy $KUBE_PROXY_OPTS
Restart=on-failure

[Install]
WantedBy=multi-user.target

Start the service and enable it at boot:

[root@k8s-node01 ~]#systemctl daemon-reload
[root@k8s-node01 ~]#systemctl enable kube-proxy
[root@k8s-node01 ~]#systemctl restart kube-proxy

Note: other nodes join the cluster the same way as k8s-node01, but the kubelet address and --hostname-override settings must be changed to the local IP of that node.
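
Since the node IP is the only per-node difference in these files, the copies for additional nodes can be derived from the k8s-node01 versions. A minimal sketch: the helper name retarget_node_ip and the sample file names are hypothetical. It should only be applied to files where every occurrence of the old IP refers to the node itself (kubelet, kubelet.config, kube-proxy), not to files that list the etcd endpoints.

```shell
# Hypothetical helper: copy a node config file, replacing one node IP with another.
# Dots in the old IP are escaped so that sed matches them literally.
retarget_node_ip() {
    old_ip="$1"; new_ip="$2"; src="$3"; dst="$4"
    old_re=$(printf '%s' "$old_ip" | sed 's/\./\\./g')
    sed "s/$old_re/$new_ip/g" "$src" > "$dst"
}

# Demonstration on a throwaway sample file.
work=$(mktemp -d)
printf '%s\n' '--hostname-override=192.168.4.155 \' 'address: 192.168.4.155' > "$work/kubelet.sample"
retarget_node_ip 192.168.4.155 192.168.4.156 "$work/kubelet.sample" "$work/kubelet.node02"
cat "$work/kubelet.node02"
```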

Check the cluster status

[root@k8s-master~]# /opt/kubernetes/bin/kubectl get node
[root@k8s-master~]# /opt/kubernetes/bin/kubectl get cs

How to view the startup logs

[root@k8s-master~]# journalctl -u kubelet

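4.5 Deploy the Flannel network

Download the component and define the subnet

Flannel stores its subnet information in etcd, so make sure it can connect to etcd, then write the predefined subnet: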

[root@k8s-master ~]#wget https://github.com/coreos/flannel/releases/download/v0.12.0/flannel-v0.12.0-linux-amd64.tar.gz
[root@k8s-master ~]#/opt/etcd/bin/etcdctl \
--ca-file=/opt/etcd/ssl/ca.pem --cert-file=/opt/etcd/ssl/server.pem --key-file=/opt/etcd/ssl/server-key.pem \
--endpoints="https://192.168.4.154:2379,https://192.168.4.155:2379,https://192.168.4.156:2379" \
set /coreos.com/network/config  '{ "Network": "172.17.0.0/16", "Backend": {"Type": "vxlan"}}'

Perform the following deployment steps on every node

Download the binary package:

# wget https://github.com/coreos/flannel/releases/download/v0.12.0/flannel-v0.12.0-linux-amd64.tar.gz
mkdir -pv /opt/kubernetes/bin
tar -zxvf flannel-v0.12.0-linux-amd64.tar.gz
mv flanneld mk-docker-opts.sh /opt/kubernetes/bin

Configure Flannel:

# mkdir -p /opt/kubernetes/cfg/
# vim /opt/kubernetes/cfg/flanneld
FLANNEL_OPTIONS="--etcd-endpoints=https://192.168.4.154:2379,https://192.168.4.155:2379,https://192.168.4.156:2379 \
--etcd-cafile=/opt/etcd/ssl/ca.pem \
--etcd-certfile=/opt/etcd/ssl/server.pem \
--etcd-keyfile=/opt/etcd/ssl/server-key.pem  \
--log-dir=/opt/kubernetes/logs "

Manage Flannel with systemd:

# vi /usr/lib/systemd/system/flanneld.service
[Unit]
Description=Flanneld overlay address etcd agent
After=network-online.target network.target
Before=docker.service

[Service]
Type=notify
EnvironmentFile=/opt/kubernetes/cfg/flanneld
ExecStart=/opt/kubernetes/bin/flanneld $FLANNEL_OPTIONS --ip-masq=true --etcd-prefix=/coreos.com/network
ExecStartPost=/opt/kubernetes/bin/mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -d /run/flannel/subnet.env
Restart=on-failure

[Install]
WantedBy=multi-user.target

Configure Docker to start with the assigned subnet:

#vi /usr/lib/systemd/system/docker.service 
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service
Wants=network-online.target

[Service]
Type=notify
EnvironmentFile=/run/flannel/subnet.env
ExecStart=/usr/bin/dockerd $DOCKER_NETWORK_OPTIONS
ExecReload=/bin/kill -s HUP $MAINPID
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
TimeoutStartSec=0
Delegate=yes
KillMode=process
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s

[Install]
WantedBy=multi-user.target

Copy the certificate files from another node to node1 and node2; Flannel needs them:

scp /opt/etcd/ssl/*  k8s-node1:/opt/etcd/ssl/

Restart flanneld and docker:

systemctl daemon-reload
systemctl restart flanneld
systemctl enable flanneld
systemctl restart docker

Check that the settings took effect:

#ps -ef |grep docker
root     20941     1  1 Jun28 ?        09:15:34 /usr/bin/dockerd --bip=172.17.34.1/24 --ip-masq=false --mtu=1450
#ip addr
3607: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN 
    link/ether 8a:2e:3d:09:dd:82 brd ff:ff:ff:ff:ff:ff
    inet 172.17.34.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
3608: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP 
    link/ether 02:42:31:8f:d3:02 brd ff:ff:ff:ff:ff:ff
    inet 172.17.34.1/24 brd 172.17.34.255 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:31ff:fe8f:d302/64 scope link 
       valid_lft forever preferred_lft forever

Make sure docker0 and flannel.1 are in the same subnet.
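
The same-subnet check can be scripted. A minimal sketch, assuming each node receives a /24 from Flannel as in the output above; the helper name same_slash24 is hypothetical.

```shell
# Hypothetical helper: succeed when two IPv4 addresses share the same /24 prefix,
# i.e. their first three octets match. Any "/len" suffix is ignored.
same_slash24() {
    a=${1%/*}; b=${2%/*}
    [ "${a%.*}" = "${b%.*}" ]
}

# The two addresses are taken from the `ip addr` output shown above.
same_slash24 172.17.34.0/32 172.17.34.1/24 && echo "docker0 and flannel.1 match"
```

On a node, the two arguments could be read from `ip -4 -o addr show flannel.1` and `ip -4 -o addr show docker0` (fourth field of each line) instead of being typed in.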

To test connectivity between nodes, ping the docker0 IP of another Node from the current node:

[root@k8s-master ~]# ping 172.17.58.1
PING 172.17.58.1 (172.17.58.1) 56(84) bytes of data.
64 bytes from 172.17.58.1: icmp_seq=1 ttl=64 time=0.263 ms
64 bytes from 172.17.58.1: icmp_seq=2 ttl=64 time=0.204 ms
If the ping succeeds, Flannel is deployed correctly. If it fails, check the log: journalctl -u flanneld

4.6 Deploy the Calico network

Stop the Flannel service (on every node)

[root@k8s-node1 ~]# systemctl stop flanneld
[root@k8s-node1 ~]#systemctl disable flanneld
[root@k8s-node1 ~]# systemctl status flanneld

Reboot all nodes

Download the official YAML file (on the Master node)

# wget https://docs.projectcalico.org/manifests/calico-etcd.yaml
# mv calico-etcd.yaml calico.yaml

Configure Calico

# vim calico.yaml
data:
  # Configure this with the location of your etcd cluster.
  etcd_endpoints: "https://192.168.4.154:2379,https://192.168.4.155:2379,https://192.168.4.156:2379"

  # If you're using TLS enabled etcd uncomment the following.
  # You must also populate the Secret below with these files.
  etcd_ca: "/calico-secrets/etcd-ca"   # just uncomment the original lines
  etcd_cert: "/calico-secrets/etcd-cert"
  etcd_key: "/calico-secrets/etcd-key"

apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: calico-etcd-secrets
  namespace: kube-system
data:
  etcd-key: (cat /etc/kubernetes/ssl/etcd-key.pem | base64 | tr -d '\n') # paste the command output here
  etcd-cert: (cat /etc/kubernetes/ssl/etcd.pem | base64 | tr -d '\n') # paste the command output here
  etcd-ca: (cat /etc/kubernetes/ssl/ca.pem | base64 | tr -d '\n') # paste the command output here
  # use null if etcd does not have TLS enabled

# The parameters above must be changed. The file also contains a parameter that sets the pod network range; adjust it to your environment:
  - name: CALICO_IPV4POOL_CIDR
    value: "10.37.0.0/16"
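
The three Secret values are the base64 form of the PEM files with line breaks removed. A minimal sketch of that encoding step with a round-trip check: the helper name b64_oneline is hypothetical, and the input here is a throwaway file rather than a real key. `base64 -d` as found in GNU coreutils is assumed for decoding.

```shell
# Hypothetical helper: print a file as single-line base64, the form expected
# in the data section of a Kubernetes Secret.
b64_oneline() {
    base64 < "$1" | tr -d '\n'
}

# Round-trip demonstration on a throwaway file (not a real key).
tmp=$(mktemp)
printf 'dummy pem content\n' > "$tmp"
encoded=$(b64_oneline "$tmp")
printf '%s' "$encoded" | base64 -d
```

Stripping the newlines matters because base64 wraps long output by default, and a wrapped value pasted into the YAML would no longer be a single scalar.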

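Modify the kubelet configuration

Set the kubelet startup parameter on every node: --network-plugin=cni

Set the kube-apiserver startup parameter on the master: --allow-privileged=true (calico-node has to run in privileged mode on every node)

After these changes, restart kubelet.

Calico now provides the container network between Nodes. When pods are created later, kubelet calls Calico through the CNI interface to set up the Pod network, including IP addresses, routing rules, and iptables rules.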

Verify network connectivity between the Nodes:

After kubelet starts, a tunl0 interface appears on the host.

# On the first Node:

[root@k8s-node1 ~]#ip route
172.16.169.128/26 via 192.168.4.156 dev tunl0 proto bird onlink

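# On the second Node:

[root@k8s-node2 ~]#ip route
172.16.36.64/26 via 192.168.4.155 dev tunl0 proto bird onlink

# Each node automatically gets routes to the pod networks on the other nodes, and traffic to other nodes goes through the tunl0 interface. This is IPIP mode.

If CALICO_IPV4POOL_IPIP="off" is set, IPIP mode is not used: Calico does not create the tunl0 interface, and the routing rules forward directly through the physical NIC.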

4.7 Deploy the WebUI

Download the official YAML file

[root@k8s-master~]#wget https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.3/aio/deploy/recommended.yaml

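Modify recommended.yaml

Change the dashboard Service to type NodePort so that the dashboard can be reached from a browser outside the cluster.
In the Service section, nodePort: 30001 can be omitted, in which case a random port is assigned.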
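
A sketch of what the modified Service section might look like is written to a scratch file below. The field values are an assumption based on the v2.0.3 recommended.yaml and should be compared against the downloaded file before anything is applied; only the type and nodePort lines are additions.

```shell
# Write a sketch of the modified Service to a scratch directory for review.
# Assumption: names, labels and ports match the downloaded recommended.yaml.
out_dir=$(mktemp -d)
cat > "$out_dir/dashboard-service.yaml" <<'EOF'
kind: Service
apiVersion: v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
spec:
  type: NodePort          # added: expose the Service on every node
  ports:
    - port: 443
      targetPort: 8443
      nodePort: 30001     # optional: omit to get a random port
  selector:
    k8s-app: kubernetes-dashboard
EOF
cat "$out_dir/dashboard-service.yaml"
```

Port 30001 lies inside the 30000-50000 range that was given to the apiserver through --service-node-port-range, which is why it is accepted in this setup.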

Run the installation

[root@k8s-master~]# kubectl create -f recommended.yaml

Create the ServiceAccount

[root@k8s-master~]# vi  dashboard-sa.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dashboard-admin
  namespace: kubernetes-dashboard

Run the installation

[root@k8s-master~]# kubectl create -f dashboard-sa.yaml

Create a ClusterRoleBinding that grants the dashboard ServiceAccount the cluster-admin role

[root@k8s-master~]# vi dashboard-clusterrolebinding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: dashboard-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: dashboard-admin
  namespace: kubernetes-dashboard

Run the installation

[root@k8s-master~]# kubectl create -f dashboard-clusterrolebinding

Get the token

[root@k8s-master~]# kubectl -n kubernetes-dashboard describe secret $(kubectl -n kubernetes-dashboard get secret | grep dashboard-admin | awk '{print $1}')

Check the NodePort

If a fixed NodePort was configured above, access the dashboard directly on that port.

Open the page

5 Verifying the result

Run a test example: create an Nginx web deployment to check that the cluster works:

Create nginx.yaml

[root@k8s-master~]#vi nginx.yaml 
# API version
apiVersion: apps/v1
# Resource type, e.g. Pod/ReplicationController/Deployment/Service/Ingress
kind: Deployment
metadata:
  # Name of this resource
  name: nginx-app
spec:
  selector:
    matchLabels:
      # Pod label; the selector of a Service published later must match it
      app: nginx
  # Number of replicas to deploy
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      # Container list; it is an array, so several containers can be configured
      containers:
      # Container name
      - name: nginx
        # Container image
        image: nginx:1.17
        # Pull the image only when it is not present locally
        imagePullPolicy: IfNotPresent
        ports:
        # Pod port
        - containerPort: 80

Create the pods:

[root@k8s-master~]#kubectl  apply -f nginx.yaml

View the pod details:

[root@k8s-master~]# kubectl get pods
[root@k8s-master~]# kubectl get deployment

Expose the service:

[root@k8s-master~]# kubectl expose deployment nginx-app --port=80 --type=LoadBalancer

Check the service status (including the externally exposed port):

[root@k8s-master~]# kubectl get services

Verify in a browser

Delete the pods

Delete the pod first

Then delete the corresponding deployment


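1 Solution overview

This document describes a solution for CX-N series switches in a Layer 3 network built on MC-LAG, and verifies network connectivity, failover, and recovery. All switch commands in the verification are entered through the KLISH command line.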

2 Physical network topology

The overall physical topology used for this verification is shown in Figure 1:

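3 Hardware and software environment

3.1 Device management ports

The devices, host names, and management port IP addresses used in the verification are listed below:

Device	Host name	Management IP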
CX532-N	Spine1	10.230.1.7
CX532-N	Spine2	10.230.1.8
CX308-N	Leaf1	10.230.1.18
CX308-N	Leaf2	10.230.1.19
CX308-N	Leaf3	10.230.1.20
CX308-N	Leaf4	10.230.1.21
Server	Server1	10.230.1.11
Server	Server2	10.230.1.13

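3.2 Hardware environment

The hardware used in the verification environment is listed below:

Name	Model	Hardware specification	Quantity	Remarks
Spine	CX532P-N	[see product brochure]	2
Leaf	CX308P-48Y-N	[see product brochure]	4
Optical module	10G	SFP+	8	To keep the number of part types small, cable and module speeds are unified: switch interconnects use 100G modules and cables, and servers use 10G modules and cables
Optical module	100G	QSFP28	24
Network cable	/	/	8
Optical fiber	Multimode	For 10G/25G	4
Optical fiber	Multimode	For 100G	12
Server	/	8 GB of memory or more recommended	2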

3.3 Software environment

The software used in the verification environment is listed below:

Name	Version
iperf3	3.1.7
CX532-N	SONiC.201911.R0312P03
CX308-N	SONiC.201911.R0312P03
Server OS	CentOS Linux 7.8.2003
Server kernel	3.10.0-1127.18.2.el7

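4 Base environment deployment

Install the base software required for this verification on both servers.

Note: commands that start with "[root@server ~]#" must be run on both servers.

4.1 LLDP

Install the LLDP service on both servers. With an X710 NIC, the NIC driver version must be later than 2.3.6. Then configure the NICs to enable LLDP.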

[root@server ~]# yum -y install epel-release
[root@server ~]# yum -y install lldpd
[root@server ~]# systemctl start lldpd
[root@server ~]# systemctl enable lldpd
[root@server ~]# lspci |grep -i ether

[root@server ~]# ethtool -i ens1f2
[root@server ~]# ethtool -i ens1f3

[root@server ~]# ethtool --set-priv-flags ens1f2 disable-fw-lldp on
[root@server ~]# ethtool --set-priv-flags ens1f3 disable-fw-lldp on

4.2 Install iPerf3

Install iPerf3 on both servers to generate traffic.

Run on both servers:
[root@server ~]# yum -y install iperf3
[root@server ~]# iperf3 -v
iperf 3.1.7
Linux compute-2 3.10.0-1160.62.1.el7.x86_64 #1 SMP Tue Apr 5 16:57:59 UTC 2022 x86_64
Optional features available: CPU affinity setting, IPv6 flow label, TCP congestion algorithm setting, sendfile / zerocopy, socket pacing

4.3 Check link connectivity

Before continuing, check the links between every switch and the servers and make sure they are working. Run the following commands on all switches.

admin@sonic:~$ sudo config cli-mode cli
admin@sonic:~$ sudo sonic-cli
sonic#
Spine1# show lldp neighbor summary

Spine2# show lldp neighbor summary

Leaf1#show lldp neighbor summary

Leaf2#show lldp neighbor summary

Leaf3#show lldp neighbor summary

Leaf4#show lldp neighbor summary

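5 Network environment configuration

5.1 Logical topology

5.2 Spine1

Restore factory settings

Switch to the Cisco-like command line and restore Spine1 to factory settings.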

Spine1@sonic:~$ sudo config cli-mode cli
Spine1@sonic:~$ sudo sonic-cli
sonic# delete startup-config
sonic# reload

Configure the Spine1 interface IPs

On Spine1, configure the IPs of the interfaces that connect to the four Leaf switches.

Spine1# configure terminal
Spine1(config)# interface ethernet 0/4
Spine1(config-if-0/4)# ip address 10.0.10.2/24
Spine1(config)# interface ethernet 0/8
Spine1(config-if-0/8)# ip address 10.0.11.2/24
Spine1(config)# interface ethernet 0/12
Spine1(config-if-0/12)# ip address 10.0.12.2/24
Spine1(config)# interface ethernet 0/16
Spine1(config-if-0/16)# ip address 10.0.13.2/24
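
The addressing above follows a regular pattern: ports 0/4, 0/8, 0/12 and 0/16 face Leaf1 to Leaf4, and the third octet counts up from 10. A minimal sketch that prints the same command pairs from that pattern; the function name spine_iface_plan is hypothetical, and the output is text for review, not something the switch CLI consumes directly.

```shell
# Hypothetical helper: print the interface commands for one spine.
# $1 is the second octet used by that spine (0 for Spine1, 1 for Spine2).
spine_iface_plan() {
    i=0
    while [ "$i" -lt 4 ]; do
        port=$(( (i + 1) * 4 ))
        echo "interface ethernet 0/$port"
        echo "ip address 10.$1.$((10 + i)).2/24"
        i=$((i + 1))
    done
}

spine_iface_plan 0
```

Calling `spine_iface_plan 1` yields the 10.1.10.2/24 to 10.1.13.2/24 addresses used on Spine2, which makes it easy to cross-check the Leaf uplink addresses (the .1 side of each /24).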

Configure BGP on Spine1

On Spine1, configure the four Leaf switches as BGP neighbors.

Spine1# configure terminal
Spine1(config)# interface ethernet 0/4
Spine1(config-if-0/4)# ip address 10.0.10.2/24
Spine1(config-if-0/4)# interface ethernet 0/8
Spine1(config-if-0/8)# ip address 10.0.11.2/24
Spine1(config-if-0/8)# interface ethernet 0/12
Spine1(config-if-0/12)# ip address 10.0.12.2/24
Spine1(config-if-0/12)# interface ethernet 0/16
Spine1(config-if-0/16)# ip address 10.0.13.2/24
Spine1(config)# router bgp 65003     
Spine1(config-router)# bgp router-id 10.10.0.3
Spine1(config)# interface loopback 0
Spine1(config-loif-0)# ip address 10.10.0.3/32
Change Loopback0 ip from 10.1.0.1/32 to 10.10.0.3/32
Loopback ip will be used as bgp router-id in frr
Spine1(config)# router bgp 65003
Spine1(config-router)# no bgp ebgp-requires-policy 
Spine1(config-router)# neighbor 10.0.10.1 remote-as 65007
Spine1(config-router)# neighbor 10.0.11.1 remote-as 65007
Spine1(config-router)# neighbor 10.0.12.1 remote-as 65008
Spine1(config-router)# neighbor 10.0.13.1 remote-as 65008
Spine1(config-router)# address-family ipv4 unicast
Spine1(config-router)# address-family l2vpn evpn
Spine1(config-router-af)# neighbor 10.0.10.1 activate
Spine1(config-router-af)# neighbor 10.0.11.1 activate
Spine1(config-router-af)# neighbor 10.0.12.1 activate
Spine1(config-router-af)# neighbor 10.0.13.1 activate
Spine1(config-router-af)# advertise-all-vni

5.3 Spine2

Restore factory settings

Switch to the Cisco-like command line and restore Spine2 to factory settings.

Spine2@sonic:~$ sudo config cli-mode cli
Spine2@sonic:~$ sudo sonic-cli
sonic# delete startup-config
sonic# reload

Configure the Spine2 interface IPs

On Spine2, configure the IPs of the interfaces that connect to the four Leaf switches.

Spine2# configure terminal
Spine2(config)# interface ethernet 0/4
Spine2(config-if-0/4)# ip address 10.1.10.2/24
Spine2(config)# interface ethernet 0/8
Spine2(config-if-0/8)# ip address 10.1.11.2/24
Spine2(config)# interface ethernet 0/12
Spine2(config-if-0/12)# ip address 10.1.12.2/24
Spine2(config)# interface ethernet 0/16
Spine2(config-if-0/16)# ip address 10.1.13.2/24

Configure BGP on Spine2

On Spine2, configure the four Leaf switches as BGP neighbors.

Spine2# configure terminal
Spine2(config)# interface ethernet 0/4
Spine2(config-if-0/4)# ip address 10.1.10.2/24
Spine2(config-if-0/4)# interface ethernet 0/8
Spine2(config-if-0/8)# ip address 10.1.11.2/24
Spine2(config-if-0/8)# interface ethernet 0/12
Spine2(config-if-0/12)# ip address 10.1.12.2/24
Spine2(config-if-0/12)# interface ethernet 0/16
Spine2(config-if-0/16)# ip address 10.1.13.2/24
Spine2(config)# router bgp 65004
Spine2(config-router)# bgp router-id 10.10.0.4
Spine2(config)# interface loopback 0
Spine2(config-loif-0)# ip address 10.10.0.4/32
Change Loopback0 ip from 10.1.0.1/32 to 10.10.0.4/32
Loopback ip will be used as bgp router-id in frr
Spine2(config)# router bgp 65004
Spine2(config-router)# no bgp ebgp-requires-policy 
Spine2(config-router)# neighbor 10.1.10.1 remote-as 65007
Spine2(config-router)# neighbor 10.1.11.1 remote-as 65007
Spine2(config-router)# neighbor 10.1.12.1 remote-as 65008
Spine2(config-router)# neighbor 10.1.13.1 remote-as 65008
Spine2(config-router)# address-family l2vpn evpn
Spine2(config-router-af)# neighbor 10.1.10.1 activate
Spine2(config-router-af)# neighbor 10.1.11.1 activate
Spine2(config-router-af)# neighbor 10.1.12.1 activate
Spine2(config-router-af)# neighbor 10.1.13.1 activate
Spine2(config-router-af)# advertise-all-vni

5.4 Leaf1

Restore factory settings

Restore Leaf1 to factory settings.

Leaf1# delete startup-config
Leaf1# reload

Configure the Leaf1 port speed

Set the speed of port Ethernet 0/2 on Leaf1 to 10G.

Leaf1# configure terminal 
Leaf1(config)# interface ethernet 0/2  
Leaf1(config-if-0/2)# speed 10000
Leaf1(config-if-0/2)# show this
!
interface ethernet 0/2
speed 10000

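Configure the Leaf1 interface IPs

On Leaf1, configure the IPs of the interfaces that connect to the Leaf and Spine switches, along with the PortChannel and VLAN settings.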

Leaf1# configure terminal 
Leaf1(config)# interface ethernet 0/48
Leaf1(config-if-0/48)# ip address 10.0.10.1/24
Leaf1(config)# interface ethernet 0/52
Leaf1(config-if-0/52)# ip address 10.1.10.1/24
Leaf1(config)# interface link-aggregation 1
Leaf1(config)# interface link-aggregation 3
Leaf1(config)# interface ethernet 0/2       
Leaf1(config-if-0/2)# link-aggregation-group 1
Leaf1(config-if-0/2)# interface ethernet 0/56
Leaf1(config-if-0/56)# link-aggregation-group 3
Leaf1(config-if-0/56)# interface ethernet 0/60
Leaf1(config-if-0/60)# link-aggregation-group 3
Leaf1(config)# vlan 10
Leaf1(config)# interface vlan 10
Leaf1(config-vlanif-10)# ip address 100.0.10.1/24
Leaf1(config)# interface link-aggregation 1
Leaf1(config-lagif-1)# switchport access vlan 10
Leaf1(config)# interface link-aggregation 3
Leaf1(config-lagif-3)# switchport trunk vlan 10

Configure MC-LAG on Leaf1

On Leaf1, configure MC-LAG on the interfaces that connect to Leaf2.

Leaf1# configure terminal 
Leaf1(config)# vlan 30
Leaf1(config)# interface link-aggregation 3
Leaf1(config-lagif-3)# switchport trunk vlan 30
Leaf1(config)# interface vlan 30
Leaf1(config-vlanif-30)# ip address 11.0.0.6/24
Leaf1(config)# mclag domain 1
Leaf1(mclag-domain)# peer-link link-aggregation 3 
Leaf1(mclag-domain)# local-address 11.0.0.6   
Leaf1(mclag-domain)# peer-address 11.0.0.7
Leaf1(mclag-domain)# member lag 1
Leaf1(mclag-domain)# commit
Leaf1(config)# interface vlan 10
Leaf1(config-vlanif-10)# mac-address 18:17:25:37:64:40

Configure BGP on Leaf1

On Leaf1, configure the two Spine switches as BGP neighbors.

Leaf1# configure terminal 
Leaf1(config)# router bgp 65007
Leaf1(config-router)# bgp router-id 10.10.0.7
Leaf1(config)# interface loopback 0
Leaf1(config-loif-0)# ip address 10.10.0.7/32
Change Loopback0 ip from 10.1.0.1/32 to 10.10.0.7/32
Loopback ip will be used as bgp router-id in frr
Leaf1(config)# router bgp 65007
Leaf1(config-router)# no bgp ebgp-requires-policy
Leaf1(config-router)# neighbor 10.0.10.2 remote-as 65003
Leaf1(config-router)# neighbor 10.1.10.2 remote-as 65004
Leaf1(config-router)# address-family ipv4 unicast
Leaf1(config-router)# network 10.10.0.7/32
Leaf1(config-router)# address-family l2vpn evpn
Leaf1(config-router-af)# neighbor 10.0.10.2 activate
Leaf1(config-router-af)# neighbor 10.1.10.2 activate
Leaf1(config-router-af)# advertise-all-vni

Configure EVPN on Leaf1

On Leaf1, configure EVPN, create the VNET, and set up the Layer 2 and Layer 3 VXLAN mappings.

Leaf1# configure terminal
Leaf1(config)# interface vxlan 0
Leaf1(config-vxlanif-0)# source 10.10.0.7
Leaf1(config)# evpn-overlay enable
Leaf1(config)# vrf 123
Leaf1(config-vrf)# mac 60:eb:5a:00:86:20
Leaf1(config-vrf)# interface vlan 10
Leaf1(config-vlanif-10)# vrf 123
Leaf1(config)# vlan 10
Leaf1(config-vlan-10)# vni 10
Leaf1(config)# vrf 123
Leaf1(config-vrf)# vni 1000

5.5 Leaf2

Restore factory settings

Restore Leaf2 to factory settings.

sonic# delete startup-config
sonic# reload

Configure the Leaf2 port speed

Set the speed of port Ethernet 0/2 on Leaf2 to 10G.

sonic# configure terminal 
sonic(config)# interface ethernet 0/2  
sonic(config-if-0/2)# speed 10000
sonic(config-if-0/2)# show this
!
interface ethernet 0/2
speed 10000

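Configure the Leaf2 interface IPs

On Leaf2, configure the IPs of the interfaces that connect to the Leaf and Spine switches, along with the PortChannel and VLAN settings.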

Leaf2# configure terminal 
Leaf2(config)# interface ethernet 0/48
Leaf2(config-if-0/48)# ip address 10.0.11.1/24
Leaf2(config)# interface ethernet 0/52
Leaf2(config-if-0/52)# ip address 10.1.11.1/24
Leaf2(config)# interface link-aggregation 1
Leaf2(config)# interface link-aggregation 3
Leaf2(config)# interface ethernet 0/2       
Leaf2(config-if-0/2)# link-aggregation-group 1
Leaf2(config-if-0/2)# interface ethernet 0/56
Leaf2(config-if-0/56)# link-aggregation-group 3
Leaf2(config-if-0/56)# interface ethernet 0/60
Leaf2(config-if-0/60)# link-aggregation-group 3
Leaf2(config)# vlan 10
Leaf2(config)# interface vlan 10
Leaf2(config-vlanif-10)# ip address 100.0.10.1/24
Leaf2(config)# interface link-aggregation 1
Leaf2(config-lagif-1)# switchport access vlan 10
Leaf2(config)# interface link-aggregation 3
Leaf2(config-lagif-3)# switchport trunk vlan 10

Configure MC-LAG on Leaf2

On Leaf2, configure MC-LAG on the interfaces that connect to Leaf1.

Leaf2# configure terminal 
Leaf2(config)# vlan 30
Leaf2(config)# interface link-aggregation 3
Leaf2(config-lagif-3)# switchport trunk vlan 30
Leaf2(config)# interface vlan 30
Leaf2(config-vlanif-30)# ip address 11.0.0.7/24
Leaf2(config)# mclag domain 1
Leaf2(mclag-domain)# peer-link link-aggregation 3 
Leaf2(mclag-domain)# local-address 11.0.0.7   
Leaf2(mclag-domain)# peer-address 11.0.0.6
Leaf2(mclag-domain)# member lag 1
Leaf2(mclag-domain)# commit
Leaf2(config)# interface vlan 10
Leaf2(config-vlanif-10)# mac-address 18:17:25:37:64:40

Configure BGP on Leaf2

On Leaf2, configure the two Spine switches as BGP neighbors.

Leaf2# configure terminal 
Leaf2(config)# router bgp 65007
Leaf2(config-router)# bgp router-id 10.10.0.7
Leaf2(config)# interface loopback 0
Leaf2(config-loif-0)# ip address 10.10.0.7/32
Change Loopback0 ip from 10.1.0.1/32 to 10.10.0.7/32
Loopback ip will be used as bgp router-id in frr
Leaf2(config)# router bgp 65007
Leaf2(config-router)# no bgp ebgp-requires-policy
Leaf2(config-router)# neighbor 10.0.11.2 remote-as 65003
Leaf2(config-router)# neighbor 10.1.11.2 remote-as 65004
Leaf2(config-router)# address-family ipv4 unicast
Leaf2(config-router)# network 10.10.0.7/32
Leaf2(config-router)# address-family l2vpn evpn
Leaf2(config-router-af)# neighbor 10.0.11.2 activate
Leaf2(config-router-af)# neighbor 10.1.11.2 activate
Leaf2(config-router-af)# advertise-all-vni

Configure EVPN on Leaf2

On Leaf2, configure EVPN, create the VNET, and set up the Layer 2 and Layer 3 VXLAN mappings.

Leaf2# configure terminal
Leaf2(config)# interface vxlan 0
Leaf2(config-vxlanif-0)# source 10.10.0.7
Leaf2(config)# evpn-overlay enable
Leaf2(config)# vrf 123
Leaf2(config-vrf)# mac 60:eb:5a:00:86:20
Leaf2(config-vrf)# interface vlan 10
Leaf2(config-vlanif-10)# vrf 123
Leaf2(config)# vlan 10
Leaf2(config-vlan-10)# vni 10
Leaf2(config)# vrf 123
Leaf2(config-vrf)# vni 1000

5.6 Leaf3

Restore factory settings

Restore Leaf3 to factory settings.

sonic# delete startup-config
sonic# reload

Configure the Leaf3 port speed

Set the speed of port Ethernet 0/2 on Leaf3 to 10G.

Leaf3# configure terminal 
Leaf3(config)# interface ethernet 0/2  
Leaf3(config-if-0/2)# speed 10000
Leaf3(config-if-0/2)# show this
!
interface ethernet 0/2
speed 10000

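Configure the Leaf3 interface IPs

On Leaf3, configure the IPs of the interfaces that connect to the Leaf and Spine switches, along with the PortChannel and VLAN settings.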

Leaf3# configure terminal 
Leaf3(config)# interface ethernet 0/48
Leaf3(config-if-0/48)# ip address 10.0.12.1/24
Leaf3(config)# interface ethernet 0/52
Leaf3(config-if-0/52)# ip address 10.1.12.1/24
Leaf3(config)# interface link-aggregation 1
Leaf3(config)# interface link-aggregation 3
Leaf3(config)# interface ethernet 0/2       
Leaf3(config-if-0/2)# link-aggregation-group 1
Leaf3(config-if-0/2)# interface ethernet 0/64
Leaf3(config-if-0/64)# link-aggregation-group 3
Leaf3(config-if-0/64)# interface ethernet 0/68
Leaf3(config-if-0/68)# link-aggregation-group 3
Leaf3(config)# vlan 20
Leaf3(config)# interface vlan 20
Leaf3(config-vlanif-20)# ip address 100.0.20.1/24
Leaf3(config)# interface link-aggregation 1
Leaf3(config-lagif-1)# switchport access vlan 20
Leaf3(config)# interface link-aggregation 3
Leaf3(config-lagif-3)# switchport trunk vlan 20

Configure MC-LAG on Leaf3

On Leaf3, configure MC-LAG on the interfaces that connect to Leaf4.

Leaf3(config)# vlan 30
Leaf3(config)# interface link-aggregation 3
Leaf3(config-lagif-3)# switchport trunk vlan 30
Leaf3(config)# interface vlan 30
Leaf3(config-vlanif-30)# ip address 11.0.0.8/24
Leaf3(config)# mclag domain 1
Leaf3(mclag-domain)# peer-link link-aggregation 3 
Leaf3(mclag-domain)# local-address 11.0.0.8 
Leaf3(mclag-domain)# peer-address 11.0.0.9
Leaf3(mclag-domain)# member lag 1
Leaf3(mclag-domain)# commit
Leaf3(config)# interface vlan 20
Leaf3(config-vlanif-20)# mac-address 18:17:25:37:64:32

Configure BGP on Leaf3

On Leaf3, configure the two Spine switches as BGP neighbors.

Leaf3(config)# router bgp 65008
Leaf3(config-router)# bgp router-id 10.10.0.8
Leaf3(config)# interface loopback 0
Leaf3(config-loif-0)# ip address 10.10.0.8/32
Change Loopback0 ip from 10.1.0.1/32 to 10.10.0.8/32
Loopback ip will be used as bgp router-id in frr
Leaf3(config)# router bgp 65008
Leaf3(config-router)# no bgp ebgp-requires-policy
Leaf3(config-router)# neighbor 10.0.12.2 remote-as 65003
Leaf3(config-router)# neighbor 10.1.12.2 remote-as 65004
Leaf3(config-router)# address-family ipv4 unicast
Leaf3(config-router)# network 10.10.0.8/32
Leaf3(config-router)# address-family l2vpn evpn
Leaf3(config-router-af)# neighbor 10.0.12.2 activate
Leaf3(config-router-af)# neighbor 10.1.12.2 activate
Leaf3(config-router-af)# advertise-all-vni

配置Leaf3的EVPN

在Leaf3交换机上配置EVPN、创建VNET，建立二三层VXLAN映射。

Leaf3# configure terminal
Leaf3(config)# interface vxlan 0
Leaf3(config-vxlanif-0)# source 10.10.0.8
Leaf3(config)# evpn-overlay enable
Leaf3(config)# vrf 456
Leaf3(config-vrf)# mac 60:eb:5a:00:86:22
Leaf3(config-vrf)# interface vlan 20
Leaf3(config-vlanif-20)# vrf 456
Leaf3(config)# vlan 20
Leaf3(config-vlan-20)# vni 20
Leaf3(config)# vrf 456
Leaf3(config-vrf)# vni 1000

5.7 Leaf4

设备恢复出厂设置

恢复Leaf4设备到出厂设置。

sonic# delete startup-config
sonic# reload

配置Leaf4端口速率

配置Leaf4交换机的Ethernet2口速率为10G。

Leaf4# configure terminal 
Leaf4(config)# interface ethernet 0/2  
Leaf4(config-if-0/2)# speed 10000
Leaf4(config-if-0/2)# show this
!
interface ethernet 0/2
speed 10000

配置Leaf4接口IP

在Leaf4交换机上配置与Leaf、Spine交换机的互联接口IP以及PortChannel、VLAN信息。

Leaf4# configure terminal 
Leaf4(config)# interface ethernet 0/48
Leaf4(config-if-0/48)# ip address 10.0.13.1/24
Leaf4(config)# interface ethernet 0/52
Leaf4(config-if-0/52)# ip address 10.1.13.1/24
Leaf4(config)# interface link-aggregation 1
Leaf4(config)# interface link-aggregation 3
Leaf4(config)# interface ethernet 0/2       
Leaf4(config-if-0/2)# link-aggregation-group 1
Leaf4(config-if-0/2)# interface ethernet 0/64
Leaf4(config-if-0/64)# link-aggregation-group 3
Leaf4(config-if-0/64)# interface ethernet 0/68
Leaf4(config-if-0/68)# link-aggregation-group 3
Leaf4(config)# vlan 20
Leaf4(config)# interface vlan 20
Leaf4(config-vlanif-20)# ip address 100.0.20.1/24
Leaf4(config)# interface link-aggregation 1
Leaf4(config-lagif-1)# switchport access vlan 20
Leaf4(config)# interface link-aggregation 3
Leaf4(config-lagif-3)# switchport trunk vlan 20

配置Leaf4的MC-LAG

在Leaf4交换机上配置与Leaf3交换机互联接口的MC-LAG。

Leaf4(config)# vlan 30
Leaf4(config)# interface link-aggregation 3
Leaf4(config-lagif-3)# switchport trunk vlan 30
Leaf4(config)# interface vlan 30
Leaf4(config-vlanif-30)# ip address 11.0.0.9/24
Leaf4(config)# mclag domain 1
Leaf4(mclag-domain)# peer-link link-aggregation 3 
Leaf4(mclag-domain)# local-address 11.0.0.9
Leaf4(mclag-domain)# peer-address 11.0.0.8
Leaf4(mclag-domain)# member lag 1
Leaf4(mclag-domain)# commit
Leaf4(config)# interface vlan 20
Leaf4(config-vlanif-20)# mac-address 18:17:25:37:64:32

配置Leaf4的BGP

在Leaf4交换机上配置2台Spine交换机的BGP邻居。

Leaf4(config)# router bgp 65008
Leaf4(config-router)# bgp router-id 10.10.0.8
Leaf4(config)# interface loopback 0
Leaf4(config-loif-0)# ip address 10.10.0.8/32
Change Loopback0 ip from 10.1.0.1/32 to 10.10.0.8/32
Loopback ip will be used as bgp router-id in frr
Leaf4(config)# router bgp 65008
Leaf4(config-router)# no bgp ebgp-requires-policy
Leaf4(config-router)# neighbor 10.0.13.2 remote-as 65003
Leaf4(config-router)# neighbor 10.1.13.2 remote-as 65004
Leaf4(config-router)# address-family ipv4 unicast
Leaf4(config-router)# network 10.10.0.8/32
Leaf4(config-router)# address-family l2vpn evpn
Leaf4(config-router-af)# neighbor 10.0.13.2 activate
Leaf4(config-router-af)# neighbor 10.1.13.2 activate
Leaf4(config-router-af)# advertise-all-vni

配置Leaf4的EVPN

在Leaf4交换机上配置EVPN、创建VNET，建立二三层VXLAN映射。

Leaf4# configure terminal
Leaf4(config)# interface vxlan 0
Leaf4(config-vxlanif-0)# source 10.10.0.8
Leaf4(config)# evpn-overlay enable
Leaf4(config)# vrf 456
Leaf4(config-vrf)# mac 60:eb:5a:00:86:22
Leaf4(config-vrf)# interface vlan 20
Leaf4(config-vlanif-20)# vrf 456
Leaf4(config)# vlan 20
Leaf4(config-vlan-20)# vni 20
Leaf4(config)# vrf 456
Leaf4(config-vrf)# vni 1000

配置指导：CX102S开放智能网关平台的DPU上安装Linux Debian

1 操作前声明
2 安装流程
3 附录

DPU操作系统安装指导-Debian

1 操作前声明

技术人员在进行后续操作前，建议仔细阅读产品的用户手册，对CX102S-D设备的结构设计充分了解。
本文档将以Debian Linux系统的安装为例，介绍如何安装一个新系统到设备的计算单元（DPU）。

2 安装流程

2.1 准备安装所需文件和物料

系统文件，包括：内核镜像、设备树、文件系统；
U盘，容量不小于4GB。

常见的Linux发行版系统（Debian、OpenWRT等）的内核镜像与设备树文件，请联系星融元的售前/售后获取。用户也可以根据需求自行编译适配内核镜像和设备树，以支持特定的系统和版本。

U盘烧录：

可以使用balenaEtcher烧录工具，或通过Linux的命令行，将准备好的Debian Linux烧录进U盘。

工具参考下载地址：

2.2 从U盘中引导临时系统

把制作好的U盘插入设备USB接口，连接串口到电脑，设备上电启动，根据系统提示按任意键中断autoboot进入uboot界面。在串口连接下输入switchUart0、1或2可分别切换到交换单元、计算单元 1或计算单元 2。交换单元中断autoboot流程后会进入ac5y uboot，计算单元中断autoboot流程后会进入9130 uboot。

默认下会进入ac5y uboot，后续操作需要通过switchUart*命令切换到指定计算单元的9130 uboot界面中进行。

设置环境变量，让计算单元从U盘中引导系统：

Marvell>> setenv bootusb 'usb reset;ext4load usb 0:1 $kernel_addr_r boot/Image;ext4load usb 0:1 $fdt_addr_r boot/cn9130-db-A.dtb;booti $kernel_addr_r - $fdt_addr_r'
Marvell>> setenv bootargs 'console=ttyS0,115200 earlycon=uart8250,mmio32,0xf0512000 root=/dev/sda1 rootwait  rw pci=pcie_bus_safe cpuidle.off=1'
Marvell>> saveenv
Marvell>> run bootusb

2.3 安装系统到DPU硬盘

成功从U盘引导系统后，证明准备的系统能正常适配计算单元芯片，所以在这一步将U盘的系统文件拷贝到计算单元的本地存储MMC。

# 强制格式化MMC为ext4
mkfs.ext4 -F /dev/mmcblk0

# 将MMC分成两个分区,boot和root
fdisk /dev/mmcblk0 <<EOF
n
p
1
2048
+1024M
n
p
2
2099200

w
EOF

# 格式化分区为 ext4 文件系统
echo "格式化分区为 ext4 文件系统..."
mkfs.ext4 /dev/mmcblk0p1
mkfs.ext4 /dev/mmcblk0p2

# 挂载目标分区
mkdir -p /mnt/mmcblk0p1 /mnt/mmcblk0p2
mount /dev/mmcblk0p1 /mnt/mmcblk0p1
mount /dev/mmcblk0p2 /mnt/mmcblk0p2

# 复制系统文件到目标分区
echo "复制系统文件到目标分区..."
cp -r /boot/* /mnt/mmcblk0p1/
cp -r /basefs-ac5x-ac5y.tgz /mnt/mmcblk0p2/
cd /mnt/mmcblk0p2/
tar xvf basefs-ac5x-ac5y.tgz

# 取消挂载目标分区
umount /mnt/mmcblk0p1
umount /mnt/mmcblk0p2
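
The fdisk input above starts the second partition at sector 2099200. A minimal sketch of where that number comes from, assuming 512-byte logical sectors and that fdisk interprets `+1024M` as 1024 MiB (the function name is illustrative only):

```shell
#!/bin/sh
# next_start_sector START SIZE_MIB: print the first sector that follows a
# partition beginning at START and created with fdisk's "+<SIZE_MIB>M" syntax.
# Assumption: 512-byte logical sectors.
next_start_sector() (
    start=$1
    size_mib=$2
    sector_bytes=512
    echo $(( start + size_mib * 1024 * 1024 / sector_bytes ))
)

# Partition 1 starts at sector 2048 and is 1024 MiB long.
next_start_sector 2048 1024   # prints 2099200
```

If the boot partition size is changed, the start sector of the root partition in the fdisk input has to change with it.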

2.4 设置uboot环境变量

重启系统，进入uboot界面，设置环境变量使其从MMC引导系统。

Marvell>> setenv bootmmc 'usb reset;ext4load mmc 0:1 $kernel_addr_r boot/Image;ext4load mmc 0:1 $fdt_addr_r boot/cn9130-db-A.dtb;booti $kernel_addr_r - $fdt_addr_r'
Marvell>> setenv bootargs 'console=ttyS0,115200 earlycon=uart8250,mmio32,0xf0512000 root=/dev/mmcblk0p2 rootwait  rw pci=pcie_bus_safe cpuidle.off=1'
Marvell>> setenv bootcmd 'run bootmmc'
Marvell>> saveenv

2.5 从DPU硬盘引导系统

Marvell>> run bootmmc

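After booting from the DPU's local storage, the operating system should start normally. Log in, verify that the system works as expected, and then remove the USB drive. This completes the installation.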

3 附录

3.1 环境变量解释

3.1.1 setenv bootusb

setenv bootusb：设置bootusb的环境变量；
'usb reset;ext4load usb 0:1 $kernel_addr_r boot/Image;ext4load usb 0:1 $fdt_addr_r boot/cn9130-9130-102.dtb;booti $kernel_addr_r - $fdt_addr_r'

· usb reset：重置USB，确保USB设备可以读取；
· ext4load usb 0:1 $kernel_addr_r boot/Image：从USB设备的第一个分区中加载Linux内核文件Image到内存中的地址$kernel_addr_r；
· ext4load usb 0:1 $fdt_addr_r boot/cn9130-9130-102.dtb：从USB设备的第一个分区中 加载cn9130设备树cn9130-9130-102.dtb到内存中地址$fdt_addr_r；
· booti $kernel_addr_r - $fdt_addr_r：启动Linux内核，$kernel_addr_r是内核文件在  内存中的地址，$fdt_addr_r是设备树文件在内存中的地址。
命令作用：从USB设备加载Linux内核和设备树文件到内存中，然后启动Linux内核。

3.1.2 setenv bootargs

setenv bootargs：设置bootargs的环境变量。
· console=ttyS0,115200：指定了系统的控制台设备为串口0（ttyS0），波特率为115200，表示系统的输出会通过串口0进行，波特率为115200；
· earlycon=uart8250,mmio32,0xf0512000：指定了早期控制台（early console），用于在Linux内核启动早期输出信息到串口。使用了UART8250控制器，并指定MMIO地址为0xf0512000；
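· root=/dev/sda1: location of the root filesystem. Section 2.2 uses /dev/sda1 (the first partition of the USB drive), and section 2.4 changes it to /dev/mmcblk0p2 (the root partition on the MMC);
· phy-mode=sgmii: sets the physical-layer mode to SGMII (Serial Gigabit Media Independent Interface), a Gigabit Ethernet PHY interface mode. This parameter is not part of the bootargs set in sections 2.2 and 2.4;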
· rootwait：在根文件系统挂载之前等待根设备就绪；
· rw：将根文件系统以读写模式挂载；
· pci=pcie_bus_safe：用于PCI子系统的配置，pcie_bus_safe表示在PCI Express总线上运行时采用了安全的探测方式；
· cpuidle.off=1：禁用CPU空闲状态管理（CPU Idle）。CPU空闲状态管理是一种节能机制，通过降低CPU的功耗来减少电能消耗，但在某些情况下可能会引起问题，这个参数用来禁用这个功能。
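
Once the system has booted, the kernel exposes the arguments it actually received in /proc/cmdline. The following sketch checks that the arguments described above are present. The function names are hypothetical helpers, not part of any tool shipped with the device:

```shell
#!/bin/sh
# has_boot_arg CMDLINE ARG: succeed if ARG appears as a whole word in CMDLINE.
has_boot_arg() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# check_boot_args CMDLINE ARG...: print every ARG missing from CMDLINE and
# exit non-zero if at least one is missing.
check_boot_args() (
    cmdline=$1
    shift
    missing=0
    for arg in "$@"; do
        if ! has_boot_arg "$cmdline" "$arg"; then
            echo "missing: $arg"
            missing=1
        fi
    done
    exit "$missing"
)

# Usage on the booted DPU (root device as set in section 2.4):
#   check_boot_args "$(cat /proc/cmdline)" root=/dev/mmcblk0p2 rootwait rw
```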

3.1.3 setenv bootcmd

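setenv bootcmd: sets the bootcmd environment variable;
· 'run bootmmc': the value of bootcmd. The run command executes the commands stored in the bootmmc variable defined in section 2.4.
Effect: U-Boot runs the bootmmc command sequence automatically at startup.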

3.1.4 saveenv

保存环境变量

配置指导：基于Kubeadm安装kubernetes集群

1 Kubernetes简介
2 Kubernetes功能
3 Kubernetes集群角色
4 Kubernetes架构
5 Kubernetes安装环境
6 基础环境部署
7 安装docker（所有节点）
8 升级系统内核（所有节点）
9 Kubernetes组件安装（所有节点）
10 初始化集群（master节点）
11 部署容器网络
12 测试Kubernetes集群
13 部署Dashboard

1 Kubernetes简介

Kubernetes是一个轻便和可扩展的开源云平台，用于管理容器化应用和服务。通过Kubernetes能够进行应用的自动化部署以及扩容和缩容等操作。在Kubernetes中，可以将组成应用的容器结合成一个逻辑单元，更易于管理和发现。

2 Kubernetes功能

自动装箱

基于容器对应用运行环境的资源配置要求自动部署应用容器。

自我修复

当容器失败时，会对容器进行重启；当所部署的Node节点出现问题时，会对容器进行重新部署和重新调度；当容器未通过监控检查时，会关闭此容器，直到容器正常运行时，才会对外提供服务。

水平扩展

通过简单的命令，对应用容器进行规模扩大或剪裁。

服务发现

用户不需要使用额外的服务发现机制就能够基于Kubernetes自身能力实现服务的发现和负载均衡。

滚动更新

可以根据应用的变化，对应用容器的应用进行一次性或批量更新。

版本回退

可以根据应用部署情况，对应用容器运行的应用，进行历史版本即时回退。

密钥和配置管理

在不需要重新构建镜像情况下，可以部署和更新密钥以及应用配置。

存储编排

自动实现存储系统挂载及应用，尤其对有状态应用实现数据持久化十分重要。存储系统可以来自本地目录、网络存储（NFS、Gluster、Ceph、Cinder等）、公共云存储等。

3 Kubernetes集群角色

Master Node

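Cluster control node. It schedules and manages the cluster and accepts requests from users outside the cluster. It consists of the API Server, Scheduler, Cluster State Store (etcd database), and Controller Manager.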

Worker Node

集群工作节点，运行用户业务应用容器，由Kubelet、Kube Proxy和Container Runtime组成。

4 Kubernetes架构

架构说明：

Etcd

保存整个集群的状态。

API Server

提供了资源操作的唯一入口，并提供认证、授权、访问控制、API注册和发现等机制。

Controller Manager

负责维护集群的状态，如故障检测、自动扩展、滚动更新等。

Scheduler

负责资源的调度，按照预定的调度策略将Pod调度到相应的机器上。

Kubelet

Maintains the container lifecycle and manages volumes (CSI) and networking (CNI).

Container Runtime

负责镜像管理以及Pod和容器的真正运行（CRI）。

Kube-proxy

负责为Service提供Cluster内部的服务发现和负载均衡（四层）。

5 Kubernetes安装环境

本次部署三个节点，一个master节点，两个worker节点，如表5-1。

节点	系统	网卡：eth0
master1	Centos7.6	10.0.0.100
worker1	Centos7.6	10.0.0.101
worker2	Centos7.6	10.0.0.102

表5-1：安装环境

配置说明：

master1

内存：16G
CPU：双核双线程，虚拟化开启
硬盘：300G

worker1/2

内存：16G
CPU：双核双线程，虚拟化开启
硬盘：300G

6 基础环境部署

6.1 修改主机名（所有节点）

master节点

hostnamectl set-hostname master1

worker1节点

hostnamectl set-hostname worker1

worker2节点

hostnamectl set-hostname worker2

6.2 配置域名解析（所有节点）

vi /etc/hosts
10.0.0.100 master1
10.0.0.101 worker1
10.0.0.102 worker2

6.3 关闭防火墙与SELINUX（所有节点）

systemctl stop firewalld
systemctl disable firewalld
vi /etc/selinux/config
SELINUX=disabled
reboot

6.4 关闭swap分区（所有节点）

使用Kubeadm部署时必须关闭swap分区，此处采用将swap分区注释掉方式。

vi /etc/fstab
#/dev/mapper/centos-swap swap         swap    defaults         0 0
reboot
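
After the reboot, it is worth confirming that no swap device is active before running kubeadm. A minimal sketch, assuming the standard /proc/swaps layout with one header line (the helper name is illustrative):

```shell
#!/bin/sh
# count_active_swap [FILE]: print the number of active swap devices listed in
# FILE (default: /proc/swaps). The first line of that file is a header.
count_active_swap() {
    awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "${1:-/proc/swaps}"
}

# Usage on a node:
#   [ "$(count_active_swap)" -eq 0 ] && echo "swap is off"
```

To turn swap off immediately without rebooting, `swapoff -a` can be run in addition to editing /etc/fstab.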

6.5 配置时间同步（所有节点）

master节点与worker节点的时间需要同步，否则可能会出现意外问题。

master1节点

yum install -y chrony
vi /etc/chrony.conf 
allow 10.0.0.0/24
systemctl enable chronyd.service
systemctl start chronyd.service

worker1/worker2节点

yum install -y chrony
vi /etc/chrony.conf 
server 10.0.0.100 iburst

设置开机自启并启动

systemctl enable chronyd.service
systemctl start chronyd.service

6.6 配置优化（所有节点）

添加网桥过滤及地址转发，实现内核的过滤

vi /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1

加载br_netfilter模块

modprobe br_netfilter

加载网桥过滤配置文件

sysctl -p /etc/sysctl.d/k8s.conf
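
A typo in k8s.conf is easy to miss because `sysctl -p` only reports the keys it could apply. The sketch below checks that the file sets all three keys from this section to 1. The function name is a hypothetical helper:

```shell
#!/bin/sh
# check_k8s_sysctl FILE: print each required key that FILE does not set to 1
# and exit non-zero if any is missing. Commented-out lines do not count.
check_k8s_sysctl() (
    file=$1
    status=0
    for key in net.bridge.bridge-nf-call-ip6tables \
               net.bridge.bridge-nf-call-iptables \
               net.ipv4.ip_forward; do
        # Allow optional whitespace around "=".
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*1[[:space:]]*$" "$file"; then
            echo "not set to 1: $key"
            status=1
        fi
    done
    exit "$status"
)

# Usage on a node:
#   check_k8s_sysctl /etc/sysctl.d/k8s.conf && echo "k8s.conf looks complete"
```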

Enable IPVS on all nodes.

安装软件ipset和ipvsadm

yum install -y ipset ipvsadm

添加需要加载的模块

cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4 2>/dev/null || modprobe -- nf_conntrack
EOF

添加权限并应用

chmod 755 /etc/sysconfig/modules/ipvs.modules
sh /etc/sysconfig/modules/ipvs.modules

7 安装docker（所有节点）

7.1 安装docker依赖包

yum install -y yum-utils device-mapper-persistent-data lvm2

7.2 设置阿里镜像源

yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

7.3 安装指定版本docker

yum -y install docker-ce-20.10.12-3.el7

7.4 设置开机自启并启动

systemctl enable docker
systemctl start docker

7.5 修改配置文件

vi /etc/docker/daemon.json
{
     "exec-opts": ["native.cgroupdriver=systemd"]
}
systemctl restart docker

8 升级系统内核（所有节点）

由于CentOS 7.x 系统自带的3.10.x内核存在一些Bug，导致运行的Docker和Kubernetes不稳定，因此需要将系统内核升级至最新版本，升级步骤如下（如内核已是新版则跳过此步骤）。

安装工具wget和unzip

yum install -y curl wget unzip

导入ELRepo仓库的公共密钥

rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org

安装ELRepo仓库的yum源

rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm

安装最新版本内核

yum --enablerepo=elrepo-kernel install kernel-ml

设置新的内核为grub2的默认版本

grub2-set-default 0

生成grub配置文件并重启

grub2-mkconfig -o /boot/grub2/grub.cfg
reboot

9 Kubernetes组件安装（所有节点）

Kubernetes组件包含Kubeadm、Kubelet、Kubectl，功能如下。

Kubeadm

初始化集群、管理集群等。

Kubelet

接收api-server指令，对Pod生命周期进行管理。

Kubectl

集群命令行管理工具。

9.1 配置Kubernetes的yum源

cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

9.2 安装组件

安装组件Kubeadm，Kubelet和Kubectl并指定版本。

yum makecache fast
yum install -y kubelet-1.21.3 kubeadm-1.21.3 kubectl-1.21.3
systemctl enable kubelet

修改配置文件。

vi /etc/sysconfig/kubelet
KUBELET_EXTRA_ARGS="--cgroup-driver=systemd"

10 初始化集群（master节点）

10.1 master节点初始化

在master节点上的任意路径下输入执行以下命令。

kubeadm init \
  --apiserver-advertise-address=10.0.0.100 \
  --image-repository registry.aliyuncs.com/google_containers \
  --kubernetes-version v1.21.3 \
  --service-cidr=10.96.0.0/12 \
  --pod-network-cidr=10.244.0.0/16 \
  --ignore-preflight-errors=all

参数说明：

apiserver-advertise-address

集群通告地址。

image-repository

由于默认拉取镜像地址k8s.gcr.io国内无法访问，这里指定阿里云镜像仓库地址。

kubernetes-version

Kubernetes版本，与上面安装的一致。

service-cidr

集群内部虚拟网络，Pod统一访问入口。

pod-network-cidr

Pod网络，与下面部署的CNI网络组件yaml中保持一致。
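
The service CIDR, the Pod CIDR, and the node network (10.0.0.0/24 in this guide) should not overlap, otherwise routing inside the cluster becomes ambiguous. The following sketch checks two IPv4 CIDRs for overlap using plain shell arithmetic. The function names are illustrative and not part of kubeadm:

```shell
#!/bin/sh
# ip_to_int A.B.C.D: print the IPv4 address as an integer.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# cidr_range CIDR: print "first last" (integers) of the address range.
cidr_range() (
    addr=${1%/*}
    prefix=${1#*/}
    base=$(ip_to_int "$addr")
    size=$(( 1 << (32 - prefix) ))
    first=$(( base / size * size ))   # align to the network boundary
    echo "$first $(( first + size - 1 ))"
)

# cidrs_overlap CIDR1 CIDR2: succeed if the two ranges share any address.
cidrs_overlap() (
    set -- $(cidr_range "$1") $(cidr_range "$2")
    # $1/$2 = first/last of CIDR1, $3/$4 = first/last of CIDR2
    [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
)

# Values used by the kubeadm init command in this section.
if cidrs_overlap 10.96.0.0/12 10.244.0.0/16; then
    echo "overlap"
else
    echo "no overlap"
fi
```

With the values from this guide the check reports no overlap. It is most useful when the node network or the CNI manifest uses different ranges.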

正常初始化后，会提示下图10-1中的内容，并将箭头所指处复制到本地。

根据提示配置如下内容。

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

10.2 镜像准备

查看集群使用的容器镜像。

[root@master1 ~]# kubeadm config images list
I0608 09:54:34.987170    2894 version.go:254] remote version is much newer: v1.24.1; falling back to: stable-1.21
registry.aliyuncs.com/google_containers/kube-apiserver:v1.21.13
registry.aliyuncs.com/google_containers/kube-controller-manager:v1.21.13
registry.aliyuncs.com/google_containers/kube-scheduler:v1.21.13
registry.aliyuncs.com/google_containers/kube-proxy:v1.21.13
registry.aliyuncs.com/google_containers/pause:3.4.1
registry.aliyuncs.com/google_containers/etcd:3.4.13-0
registry.aliyuncs.com/google_containers/coredns/coredns:v1.8.0
[root@master1 ~]#

生成脚本。

kubeadm config images list >> image.list

编辑脚本。

vi image.list
#!/bin/bash
img_list='registry.aliyuncs.com/google_containers/kube-apiserver:v1.21.3
registry.aliyuncs.com/google_containers/kube-controller-manager:v1.21.3
registry.aliyuncs.com/google_containers/kube-scheduler:v1.21.3
registry.aliyuncs.com/google_containers/kube-proxy:v1.21.3
registry.aliyuncs.com/google_containers/pause:3.4.1
registry.aliyuncs.com/google_containers/etcd:3.4.13-0
registry.aliyuncs.com/google_containers/coredns/coredns:v1.8.0'
for img in ${img_list}
do
          docker pull $img
done

执行脚本。

sh image.list

10.3 worker节点加入集群

Run the join command highlighted in Figure 10-1 on worker1 and worker2.

kubeadm join 10.0.0.100:6443 --token wq53fj.x28gsb67wd3josc4 \
  --discovery-token-ca-cert-hash sha256:ecabaf79ece2225a8d52b0febe03001ad512ada9dd8b26926161a85a341ac6f9
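
Copying the join command by hand can truncate the token or the hash. The sketch below only checks the format of the two values (a bootstrap token of the form `[a-z0-9]{6}.[a-z0-9]{16}` and a `sha256:` hash with 64 hex digits). It does not verify them against the cluster, and the function name is a hypothetical helper:

```shell
#!/bin/sh
# valid_join_args TOKEN HASH: succeed if both values have the expected format.
valid_join_args() {
    printf '%s\n' "$1" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$' &&
    printf '%s\n' "$2" | grep -Eq '^sha256:[0-9a-f]{64}$'
}

# Usage with the values shown in this section:
#   valid_join_args wq53fj.x28gsb67wd3josc4 sha256:ecab...c6f9 && echo "format ok"
```

Bootstrap tokens expire (24 hours by default). If the token is no longer valid, a new join command can be printed on the master node with `kubeadm token create --print-join-command`.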

master节点查看集群。

kubectl get nodes

11 部署容器网络

本文档使用Calico部署容器网络，Calico是一个纯三层的数据中心网络方案，是目前Kubernetes主流的网络方案，步骤如下。

下载yaml

wget https://docs.projectcalico.org/manifests/calico.yaml

应用calico.yaml

kubectl apply -f calico.yaml

查看部署进度，全部为running后则正常

[root@master1 ~]# kubectl get pods -n kube-system
NAME                                             READY   STATUS    RESTARTS   AGE
calico-kube-controllers-685b65ddf9-hgjx7   1/1     Running   0          148m
calico-node-8ngrz                               1/1     Running   0          148m
calico-node-p2lc9                               1/1     Running   0          148m
calico-node-r2tkg                               1/1     Running   0          148m
coredns-59d64cd4d4-fqfq9                       1/1     Running   0          163m
coredns-59d64cd4d4-zcph8                       1/1     Running   0          163m
etcd-master1                                     1/1     Running   0          163m
kube-apiserver-master1                         1/1     Running   0          163m
kube-controller-manager-master1              1/1     Running   0          163m
kube-proxy-lszzs                                1/1     Running   0          162m
kube-proxy-pbjhs                                1/1     Running   0          163m
kube-proxy-wjl7x                                1/1     Running   0          162m
kube-scheduler-master1                         1/1     Running   0          163m 
[root@master1 ~]#
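
Scanning a long pod list by eye is error-prone. As a sketch, the filter below reads `kubectl get pods` output on stdin and prints only the pods that are not fully ready. It assumes the default column layout shown above (NAME, READY, STATUS), and the function name is illustrative:

```shell
#!/bin/sh
# unready_pods: print the name of every pod whose STATUS is not "Running"
# or whose READY count (for example 0/1) is incomplete.
unready_pods() {
    awk 'NR > 1 && NF >= 3 {
        split($2, r, "/")
        if ($3 != "Running" || r[1] != r[2]) print $1
    }'
}

# Usage on the master node:
#   kubectl get pods -n kube-system | unready_pods
```

An empty result means every listed pod is Running and ready. Pods of finished Jobs (STATUS Completed) would also be listed by this simple filter.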

12 测试Kubernetes集群

在Kubernetes集群中创建一个Pod，验证是否正常运行。

kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --port=80 --type=NodePort
kubectl get pod,svc

结果如下图12-1。

验证Kubernetes可以保持一定个数容器运行的功能，此处把replicas修改为3如下。

kubectl scale --replicas=3 deployment/nginx
kubectl get pod -o wide

尝试删除一个正在运行中的Pod。

kubectl delete pod nginx-6799fc88d8-dcxql

图12-3 删除一个Pod

再次查看Pod数量。

kubectl get pod -o wide

可以看到之前的ip为10.244.235.136的Pod已经被删除，并产生了新的Pod，ip为10.244.235.137，说明Kubernetes功能正常。

检查各ip的连通性。

ping各Pod的ip：10.244.235.141、10.244.235.140、10.244.189.72，如下图12-5、12-6、12-7。

curl service的ip：10.107.23.235，如图12-8。

curl node:ip测试，如图12-9、12-10、12-11。

Pod与Pod连通性测试，如图12-12。

检查DNS解析可用性，如图12-13。

访问地址：http://<任意node的ip>:port，此处访问：10.0.0.101:31339，结果如图12-14。

测试结果：无异常。

13 部署Dashboard

Dashboard是官方提供的一个UI，可用于管理Kubernetes资源。

master节点输入如下命令。

wget https://raw.githubusercontent.com/kubernetes/dashboard/v2.2.0/aio/deploy/recommended.yaml

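By default the Dashboard is reachable only from inside the cluster. To allow access from machines outside the cluster, change the Service type in recommended.yaml to NodePort.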

kind: Service
apiVersion: v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
spec:
  ports:
    - port: 443
      targetPort: 8443
      nodePort: 30443
  selector:
    k8s-app: kubernetes-dashboard
  type: NodePort

再次输入如下命令。

kubectl apply -f recommended.yaml
kubectl get pods -n kubernetes-dashboard

待所有Pod处于running的状态后，创建service account并绑定默认cluster-admin管理员集群角色。

kubectl create serviceaccount dashboard-admin -n kube-system
kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=kube-system:dashboard-admin
kubectl describe secrets -n kube-system $(kubectl -n kube-system get secret | awk '/dashboard-admin/{print $1}')

访问地址：https://<任意node的ip>:30443，将上条命令产生的token复制后填入，进行登录，如图13-1，13-2。

至此一个可用的kubernetes集群安装完毕。


功能验证：CX-N交换机与部分国产RoCE网卡组网的HPC场景测试

1 目标与物理网络拓扑

本文主要描述如何在光润通国产100G RoCEv2网卡（以下简称GRT）和飞迈瑞克国产100G RoCEv2网卡（以下简称Femrice）搭建的网络上针对HPC场景进行性能/时延测试，具体方案如下：

E2E转发测试

测试两款国产网卡在相同拓扑E2E（End to End）的转发时延和带宽，本次方案测试点采用Perftest通信测试工具包进行发包，测试过程遍历2~8388608字节。

HPC应用测试

This test runs HPC applications in the same scenario and compares the run time of the GRT and Femrice NICs (shorter is better).
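
The E2E sweep from 2 to 8388608 bytes corresponds to the `-a` option of the Perftest tools, which doubles the message size at every step. A short sketch that lists the sizes, assuming a pure power-of-two progression (the function name is illustrative):

```shell
#!/bin/sh
# perftest_sizes: print the message sizes of a full sweep, one per line,
# doubling from 2 bytes up to 8388608 bytes (2^23).
perftest_sizes() (
    size=2
    while [ "$size" -le 8388608 ]; do
        echo "$size"
        size=$(( size * 2 ))
    done
)

perftest_sizes | wc -l   # 23 sizes: 2^1 through 2^23
```

Each latency and bandwidth run therefore yields 23 result rows per NIC, which makes the two result sets directly comparable row by row.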

1.1 GRT物理拓扑

The GRT physical topology for the above solution is shown in Figure 1:

图1：GRT网卡物理网络拓扑

1.2 Femrice物理拓扑

如上解决方案的Femrice物理拓扑，如图2所示：

图2：Femrice网卡物理网络拓扑

1.3 管理口IP规划

The IP addresses of the management and service ports used in the test are listed in Table 1:

设备名称	接口	IP地址	备注
Server1	管理口	192.168.4.144	/
	业务口ens1f0	100.0.1.10	GRT网卡RoCEv2模式直连
	业务口ens1f1	100.0.2.10	Femrice网卡RoCEv2模式直连
Server2	管理口	192.168.4.145	/
	业务口ens1f0	100.0.1.11	GRT网卡RoCEv2模式直连
	业务口ens1f1	100.0.2.11	Femrice网卡RoCEv2模式直连

表1：管理口和业务口IP规划

2 硬件与软件环境

部署环境中涉及到的硬件和软件如表2和表3所示：

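Name	Model	Hardware specification	Quantity	Remarks
Server	x86	Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50GHz, 48 cores; memory: 128G	2	100G NIC must be installed
Optical module	100G	QSFP28	4	None
Optical fiber	Multimode	Suitable for 100G	2	None
Femrice NIC	FM-E810CAM2-QF2	Intel E810-C	2	/
GRT NIC	F1102E-v4.0	Intel E810-C	2	/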

表2：硬件环境

名称	版本	备注
操作系统	CentOS Linux release 7.8.2003 (Core)	无
内核	3.10.0-1127.18.2.el7.x86_64	无
Intel网卡驱动	ice-1.9.11	https://www.intel.cn/
RDMA网卡驱动	irdma-1.11.16.6	https://www.intel.cn/
WRF	WRFV4.0	https://www2.mmm.ucar.edu
LAMMPS	LAMMPS（3 Mar 2020）	https://github.com/lammps/lammps/
Perftest	V4.5-0.20	https://github.com/linux-rdma/perftest

表3：软件环境

3 测试环境部署

在两台Server服务器上，安装部署HPC两种测试场景所需的基础环境。

Note: commands that start with "[root@Server ~]#" must be run on both servers.

3.1 网卡驱动部署

在两台Server服务器上安装网卡所需的ice和irdma驱动程序以及Perftest测试工具集，网卡驱动安装完成之后检查网卡及驱动状态，确保网卡可以正常使用。

3.1.1 网卡ice驱动程序安装

[root@Server ~]# wget https://downloadmirror.intel.com/763930/ice-1.9.11.tar.gz
[root@Server ~]# tar zxf  ice-1.9.11.tar.gz
[root@Server ~]# cd ice-1.9.11/src/
[root@Server src]# make install 
[root@Server src]# modinfo ice
[root@Server src]# modprobe ice
[root@Server src]# ethtool -i ens1f0
driver: ice
version: 1.9.11
firmware-version: 3.20 0x8000d84c 1.3146.0
expansion-rom-version: 
bus-info: 0000:02:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

3.1.2 网卡irdma驱动程序安装

[root@Server ~]# wget https://downloadmirror.intel.com/763932/irdma-1.11.16.6.tgz
[root@Server ~]# tar zxf  irdma-1.11.16.6.tgz
[root@Server ~]# cd irdma-1.11.16.6/
[root@Server irdma-1.11.16.6]# ./build
[root@Server irdma-1.11.16.6]# modprobe irdma
[root@Server ~]# wget https://github.com/linux-rdma/rdma-core/releases/download/v42.0/rdma-core-42.0.tar.gz
[root@Server ~]# tar -xzvf rdma-core-42.0.tar.gz
[root@Server ~]# cd rdma-core-42.0/
[root@Server rdma-core-42.0]# patch -p2 < /root/irdma-1.11.16.6/libirdma-42.0.patch
[root@Server rdma-core-42.0]# cd ..
[root@Server ~]# chgrp -R root rdma-core-42.0/redhat
[root@Server ~]# chown -R root rdma-core-42.0/redhat
[root@Server ~]# tar -zcvf rdma-core-42.0.tgz rdma-core-42.0
[root@Server ~]# mkdir -p ~/rpmbuild/SOURCES
[root@Server ~]# mkdir -p ~/rpmbuild/SPECS
[root@Server ~]# cp rdma-core-42.0.tgz ~/rpmbuild/SOURCES/
[root@Server SOURCES]# cd ~/rpmbuild/SOURCES
[root@Server SOURCES]# tar -xzvf rdma-core-42.0.tgz
[root@Server SOURCES]# cp ~/rpmbuild/SOURCES/rdma-core-42.0/redhat/rdma-core.spec ~/rpmbuild/SPECS/
[root@Server SPECS]# cd ~/rpmbuild/SPECS/
[root@Server SPECS]# rpmbuild -ba rdma-core.spec
[root@Server SPECS]# cd ~/rpmbuild/RPMS/x86_64
[root@Server x86_64]# yum install *42.0*.rpm

3.1.3 Perftest performance testing toolkit

[root@Server ~]# git clone https://github.com/linux-rdma/perftest.git
[root@Server ~]# cd perftest
[root@Server perftest]# ./autogen.sh
[root@Server perftest]# ./configure
[root@Server perftest]# make
[root@Server perftest]# make install

4 WRF运行环境部署

4.1 安装环境准备

4.1.1 创建文件目录

[root@Server1 ~]# cd /data/home/wrf01/202302test/
[root@Server1 202302test]# mkdir Build_WRF
[root@Server1 202302test]# mkdir TESTS

4.1.2 安装编译器

[root@Server1 ~]# yum -y install gcc cpp gcc-gfortran gcc-g++ m4 make csh

4.1.3 添加环境变量

[root@Server1 ~]# vi ~/.bashrc
export DIR=/data/home/wrf01/202302test/Build_WRF/LIBRARIES
export CC=gcc
export CXX=g++
export FC=gfortran
export CFLAGS='-m64'
export F77=gfortran
export FFLAGS='-m64'
export PATH=$DIR/mpich/bin:$PATH
export PATH=$DIR/netcdf/bin:$PATH
export NETCDF=$DIR/netcdf
export JASPERLIB=$DIR/grib2/lib
export JASPERINC=$DIR/grib2/include
export LDFLAGS=-L$DIR/grib2/lib
export CPPFLAGS=-I$DIR/grib2/include
export LD_LIBRARY_PATH=$DIR/grib2/lib:$LD_LIBRARY_PATH
[root@Server1 ~]# source ~/.bashrc

4.2 安装三方依赖库

4.2.1 创建文件目录

[root@Server1 ~]# cd /data/home/wrf01/202302test/Build_WRF
[root@Server1 Build_WRF]# mkdir LIBRARIES

4.2.2 下载第三方库

[root@Server1 Build_WRF]# wget https://www2.mmm.ucar.edu/wrf/OnLineTutorial/compile_tutorial/tar_files/zlib-1.2.7.tar.gz
[root@Server1 Build_WRF]# wget https://www2.mmm.ucar.edu/wrf/OnLineTutorial/compile_tutorial/tar_files/mpich-3.0.4.tar.gz
[root@Server1 Build_WRF]# wget https://www2.mmm.ucar.edu/wrf/OnLineTutorial/compile_tutorial/tar_files/netcdf-4.1.3.tar.gz
[root@Server1 Build_WRF]# wget https://www2.mmm.ucar.edu/wrf/OnLineTutorial/compile_tutorial/tar_files/jasper-1.900.1.tar.gz
[root@Server1 Build_WRF]# wget https://www2.mmm.ucar.edu/wrf/OnLineTutorial/compile_tutorial/tar_files/libpng-1.2.50.tar.gz

4.2.3 编译安装zlib

[root@Server1 Build_WRF]# tar xzvf zlib-1.2.7.tar.gz 
[root@Server1 Build_WRF]# cd zlib-1.2.7    
[root@Server1 zlib-1.2.7]# ./configure --prefix=$DIR/grib2
[root@Server1 zlib-1.2.7]# make
[root@Server1 zlib-1.2.7]# make install

4.2.4 编译安装libpng

[root@Server1 Build_WRF]# tar xzvf libpng-1.2.50.tar.gz
[root@Server1 Build_WRF]# cd  libpng-1.2.50
[root@Server1 libpng-1.2.50]# ./configure --prefix=$DIR/grib2
[root@Server1 libpng-1.2.50]# make
[root@Server1 libpng-1.2.50]# make install

4.2.5 编译安装mpich

[root@Server1 Build_WRF]# tar xzvf mpich-3.0.4.tar.gz 
[root@Server1 Build_WRF]# cd  mpich-3.0.4
[root@Server1 mpich-3.0.4]# ./configure --prefix=$DIR/mpich
[root@Server1 mpich-3.0.4]# make
[root@Server1 mpich-3.0.4]# make install

4.2.6 编译安装jasper

[root@Server1 Build_WRF]# tar xzvf jasper-1.900.1.tar.gz 
[root@Server1 Build_WRF]# cd  jasper-1.900.1
[root@Server1 jasper-1.900.1]# ./configure --prefix=$DIR/grib2
[root@Server1 jasper-1.900.1]# make
[root@Server1 jasper-1.900.1]# make install

4.2.7 编译安装netcdf

[root@Server1 Build_WRF]# tar xzvf netcdf-4.1.3.tar.gz
[root@Server1 Build_WRF]# cd  netcdf-4.1.3
[root@Server1 netcdf-4.1.3]# ./configure --prefix=$DIR/netcdf \
--disable-dap --disable-netcdf-4 --disable-shared
[root@Server1 netcdf-4.1.3]# make
[root@Server1 netcdf-4.1.3]# make install

4.2.8 依赖库测试

[root@Server1 Build_WRF]# cd TESTS
[root@Server1 TESTS]# wget https://www2.mmm.ucar.edu/wrf/OnLineTutorial/compile_tutorial/tar_files/Fortran_C_NETCDF_MPI_tests.tar
[root@Server1 TESTS]# tar -xf Fortran_C_NETCDF_MPI_tests.tar

测试Fortran+C+NetCDF:
[root@Server1 TESTS]# cp ${NETCDF}/include/netcdf.inc .
[root@Server1 TESTS]# gfortran -c 01_fortran+c+netcdf_f.f
[root@Server1 TESTS]# gcc -c 01_fortran+c+netcdf_c.c
[root@Server1 TESTS]# gfortran 01_fortran+c+netcdf_f.o 01_fortran+c+netcdf_c.o -L${NETCDF}/lib -lnetcdff -lnetcdf
[root@Server1 TESTS]# ./a.out

测试Fortran+C+NetCDF+MPI:
[root@Server1 TESTS]# cp ${NETCDF}/include/netcdf.inc .
[root@Server1 TESTS]# mpif90 -c 02_fortran+c+netcdf+mpi_f.f
[root@Server1 TESTS]# mpicc -c 02_fortran+c+netcdf+mpi_c.c
[root@Server1 TESTS]# mpif90 02_fortran+c+netcdf+mpi_f.o 02_fortran+c+netcdf+mpi_c.o -L${NETCDF}/lib -lnetcdff -lnetcdf
[root@Server1 TESTS]# mpirun ./a.out

4.3 安装WRF

4.3.1 下载WRFV4.0

[root@Server1 ~]# cd /data/home/wrf01/202302test/Build_WRF
[root@Server1 Build_WRF]# wget https://www2.mmm.ucar.edu/wrf/src/WRFV4.0.TAR.gz
[root@Server1 Build_WRF]# tar xzvf WRFV4.0.TAR.gz
[root@Server1 Build_WRF]# cd WRF

4.3.2 安装WRF

[root@Server1 WRF]# ./configure

[root@Server1 WRF]# ./compile em_real
[root@Server1 WRF]# ls -ls main/*.exe

4.4 安装WPS

4.4.1 下载WPSV4.0

[root@Server1 ~]# cd /data/home/wrf01/202302test/Build_WRF
[root@Server1 Build_WRF]# wget \
https://www2.mmm.ucar.edu/wrf/src/WPSV4.0.TAR.gz
[root@Server1 Build_WRF]# tar xzvf WPSV4.0.TAR.gz
[root@Server1 Build_WRF]# cd WPS
[root@Server1 WPS]# ./clean

4.4.2 修改intmath.f文件

[root@Server1 WPS]# cat ./ungrib/src/ngl/g2/intmath.f

4.4.3 安装WPS

[root@Server1 WPS]# ./configure
Enter selection [1-40] : 1
[root@Server1 WPS]# ./compile
[root@Server1 WPS]# ls -las *.exe

[root@Server1 WPS]# vi namelist.wps
&share
 max_dom = 1,
 start_date = '2000-01-24_12:00:00',
 end_date   = '2000-01-26_00:00:00',
 interval_seconds = 21600
 io_form_geogrid = 2,
/

&geogrid
 parent_id         =   1,   1,
 parent_grid_ratio =   1,   3,
 i_parent_start    =   1,  31,
 j_parent_start    =   1,  17,
 e_we              =  104, 142,
 e_sn              =  61,  97,
 geog_data_res = '10m','2m',
 dx = 30000,
 dy = 30000,
 map_proj = 'lambert',
 ref_lat   =  34.83,
 ref_lon   = -81.03,
 truelat1  =  30.0,
 truelat2  =  60.0,
 stand_lon = -98.0,
 geog_data_path = '/data/home/wrf01/202302test/Build_WRF/WPS_GEOG/WPS_GEOG/'
/

&ungrib
 out_format = 'WPS',
 prefix = 'FILE',
/

&metgrid
 fg_name = 'FILE'
 io_form_metgrid = 2, 
/

4.4.4 下载静态地理数据

[root@Server1 ~]# cd /data/home/wrf01/202302test/Build_WRF
[root@Server1 Build_WRF]# mkdir WPS_GEOG
下载链接：https://www2.mmm.ucar.edu/wrf/users/download/get_sources_wps_geog.html

4.5 WRF可执行文件

4.5.1 Configure namelist.input

[root@Server1 ~]# cd /data/home/wrf01/202302test/Build_WRF
[root@Server1 Build_WRF]# mkdir DATA
[root@Server1 Build_WRF]# vi WRF/test/em_real/namelist.input
&time_control
 run_days                            = 0,
 run_hours                           = 36,
 run_minutes                         = 0,
 run_seconds                         = 0,
 start_year                          = 2000, 2000, 2000,
 start_month                         = 01,   01,   01,
 start_day                           = 24,   24,   24,
 start_hour                          = 12,   12,   12,
 end_year                            = 2000, 2000, 2000,
 end_month                           = 01,   01,   01,
 end_day                             = 26,   25,   25,
 end_hour                            = 00,   12,   12,
 interval_seconds                    = 21600
 input_from_file                     = .true.,.true.,.true.,
 history_interval                    = 180,  60,   60,
 frames_per_outfile                  = 1000, 1000, 1000,
 restart                             = .false.,
 restart_interval                    = 5000,
 io_form_history                     = 2
 io_form_restart                     = 2
 io_form_input                       = 2
 io_form_boundary                    = 2
 /

 &domains
 time_step                           = 180,
 time_step_fract_num                 = 0,
 time_step_fract_den                 = 1,
 max_dom                             = 1,
 e_we                                = 104,    142,   94,
 e_sn                                = 61,    97,    91,
 e_vert                              = 34,    34,    34,
 p_top_requested                     = 4500,
 num_metgrid_levels                  = 27,
 num_metgrid_soil_levels             = 2,
 dx                                  = 30000, 10000,  3333.33,
 dy                                  = 30000, 10000,  3333.33,
 grid_id                             = 1,     2,     3,
 parent_id                           = 0,     1,     2,
 i_parent_start                      = 1,     31,    30,
 j_parent_start                      = 1,     17,    30,
 parent_grid_ratio                   = 1,     3,     3,
 parent_time_step_ratio              = 1,     3,     3,
 feedback                            = 1,
 smooth_option                       = 0
 /

 &physics
 physics_suite                       = 'CONUS'
 mp_physics                          = -1,    -1,    -1,
 cu_physics                          = -1,    -1,     0,
 ra_lw_physics                       = -1,    -1,    -1,
 ra_sw_physics                       = -1,    -1,    -1,
 bl_pbl_physics                      = -1,    -1,    -1,
 sf_sfclay_physics                   = -1,    -1,    -1,
 sf_surface_physics                  = -1,    -1,    -1,
 radt                                = 30,    30,    30,
 bldt                                = 0,     0,     0,
 cudt                                = 5,     5,     5,
 icloud                              = 1,
 num_land_cat                        = 21,
 sf_urban_physics                    = 0,     0,     0,
 /

 &fdda
 /

 &dynamics
 hybrid_opt                          = 2, 
 w_damping                           = 0,
 diff_opt                            = 1,      1,      1,
 km_opt                              = 4,      4,      4,
 diff_6th_opt                        = 0,      0,      0,
 diff_6th_factor                     = 0.12,   0.12,   0.12,
 base_temp                           = 290.
 damp_opt                            = 3,
 zdamp                               = 5000.,  5000.,  5000.,
 dampcoef                            = 0.2,    0.2,    0.2
 khdif                               = 0,      0,      0,
 kvdif                               = 0,      0,      0,
 non_hydrostatic                     = .true., .true., .true.,
 moist_adv_opt                       = 1,      1,      1,     
 scalar_adv_opt                      = 1,      1,      1,     
 gwd_opt                             = 1,
 /

 &bdy_control
 spec_bdy_width                      = 5,
 specified                           = .true.
 /

 &grib2
 /

 &namelist_quilt
 nio_tasks_per_group = 0,
 nio_groups = 1,
 /

4.5.2 生成地理数据

[root@Server1 WPS]# ./geogrid.exe
[root@Server1 WPS]# ls -lah geo_em.d01.nc

4.5.3 下载并链接气象数据

气象数据下载网址：https://rda.ucar.edu/。

[root@Server1 Build_WRF]# mkdir DATA
[root@Server1 Build_WRF]# ls -lah ./DATA/JAN00/fnl*

[root@Server1 Build_WRF]# cd WPS
[root@Server1 WPS]# ./link_grib.csh ../DATA/JAN00/fnl
[root@Server1 WPS]# ln -sf ungrib/Variable_Tables/Vtable.GFS Vtable
[root@Server1 WPS]# ./ungrib.exe
[root@Server1 WPS]# ls -lah FILE*

4.5.4 融合气象和地理数据

[root@Server1 WPS]# ./metgrid.exe

4.5.5 链接WPS到WRF

[root@Server1 WPS]#  cd ../WRF/test/em_real/
[root@Server1 em_real]# ln -sf ../../../WPS/met_em* .
[root@Server1 em_real]# mpirun -np 1 ./real.exe
[root@Server1 em_real]# ls -alh wrfbdy_d01 wrfinput_d01

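5 GRT domestic 100G RoCEv2 NIC

5.1 E2E forwarding test

Set the NIC to RoCEv2 mode. Using the ib_read_lat and ib_read_bw tools, start the server side on Server1 and the client side on Server2, then measure bandwidth and latency with the GRT NICs directly connected.

5.1.1 Basic configuration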

[root@Server ~]# rmmod irdma
[root@Server ~]# modprobe irdma roce_ena=1
[root@Server ~]# ibv_devices
    device                 node GUID
    ------              ----------------
    rdmap2s0f0          5a53c0fffe790004
    irdma1              5a53c0fffe790005
[root@Server ~]# ibv_devinfo rdmap2s0f0
 
[root@Server1 ~]# ifconfig ens1f0 100.0.1.10 up
[root@Server2 ~]# ifconfig ens1f0 100.0.1.11 up

5.1.2 GRT NICs directly connected

[root@Server1 ~]# ib_read_lat -R -d rdmap2s0f0 -F --report_gbits -a
[root@Server2 ~]# ib_read_lat -a -R -x 5 -d rdmap2s0f0 -F -f 2 100.0.1.10
[root@Server1 ~]# ib_read_bw -R -d rdmap2s0f0 -F --report_gbits -a
[root@Server2 ~]# ib_read_bw -a -R -x 5 -d rdmap2s0f0 -F -f 2 100.0.1.10

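5.2 HPC application test

Run the open-source weather simulation software WRF and the molecular dynamics software LAMMPS on the two servers, and measure how long the parallel computation takes to complete with the GRT domestic NICs.

5.2.1 WRF

Run WRF concurrently on two servers with 12 cores each, 24 cores in total. The servers are directly connected through the GRT NICs in RoCEv2 mode.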

[root@Server1 em_real]# time /usr/mpi/gcc/openmpi-4.1.5a1/bin/mpirun -np 24 -oversubscribe --allow-run-as-root \
--host 100.0.1.10,100.0.1.11  ./wrf.exe
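
The command above starts 24 ranks across two hosts and relies on the oversubscribe option because no slot count is given per host. As an illustration only, the sketch below builds an equivalent argument list that states 12 slots per host explicitly. The `host:slots` form of `--host` is used here on the assumption that the installed Open MPI release accepts it, which should be checked against the local `mpirun` man page; the function name is hypothetical.

```python
import shlex


def mpirun_command(hosts, ranks_per_host, program, mpirun="mpirun"):
    """Build an Open MPI launch command with an explicit slot count per host.

    Assumption: the installed Open MPI accepts `--host name:slots`.
    """
    host_arg = ",".join(f"{h}:{ranks_per_host}" for h in hosts)
    total_ranks = len(hosts) * ranks_per_host
    return [mpirun, "--allow-run-as-root", "-np", str(total_ranks),
            "--host", host_arg, program]


if __name__ == "__main__":
    # Addresses and rank counts are the ones used in this section.
    cmd = mpirun_command(["100.0.1.10", "100.0.1.11"], 12, "./wrf.exe")
    print(shlex.join(cmd))
```

Building the list from the host and core counts keeps `-np` consistent with the hardware description (2 servers x 12 cores) when either number changes.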

5.2.2 LAMMPS

Run LAMMPS concurrently on two servers with 12 cores each, 24 cores in total. The servers are directly connected through the GRT NICs in RoCEv2 mode.

[root@Server1 ~]# cd ~/lammps/lammps-stable_3Mar2020/examples/shear
[root@Server1 shear]# time /usr/mpi/gcc/openmpi-4.1.5a1/bin/mpirun --allow-run-as-root -np 24 --oversubscribe \
--host 100.0.1.10,100.0.1.11 lmp_mpi \
< /root/lammps/lammps-3Mar20/examples/shear/in.shear

6 Femrice domestic 100G RoCEv2 NIC

6.1 E2E forwarding test

6.1.1 Basic configuration

[root@Server ~]# rmmod irdma
[root@Server ~]# modprobe irdma roce_ena=1
[root@Server ~]# ibv_devices
    device                 node GUID
    ------              ----------------
    rdmap3s0f0          5a53c0fffe7608ea
    rdmap3s0f1          5a53c0fffe7608eb
[root@Server ~]# ibv_devinfo rdmap3s0f0

[root@Server1 ~]# ifconfig ens1f1 100.0.2.10 up
[root@Server2 ~]# ifconfig ens1f1 100.0.2.11 up

6.1.2 Femrice NICs directly connected

[root@Server1 ~]# ib_read_lat -R -d rdmap3s0f0 -F --report_gbits -a
[root@Server2 ~]# ib_read_lat -a -R -x 5 -d rdmap3s0f0 -F -f 2 100.0.2.10
[root@Server1 ~]# ib_read_bw -R -d rdmap3s0f0 -F --report_gbits -a
[root@Server2 ~]# ib_read_bw -a -R -x 5 -d rdmap3s0f0 -F -f 2 100.0.2.10

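6.2 HPC application test

Run the open-source weather simulation software WRF and the molecular dynamics software LAMMPS on the two servers, and measure how long the parallel computation takes to complete with the Femrice domestic NICs.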

6.2.1 LAMMPS

Run LAMMPS concurrently on two servers with 12 cores each, 24 cores in total. The servers are directly connected through the Femrice NICs in RoCEv2 mode.

[root@Server1 ~]# cd ~/lammps/lammps-stable_3Mar2020/examples/shear
[root@Server1 shear]# mpirun --allow-run-as-root -np 24 --oversubscribe \
--host 100.0.1.10,100.0.1.11 lmp_mpi \
< /root/lammps/lammps-3Mar20/examples/shear/in.shear

6.2.2 WRF

Run WRF concurrently on two servers with 12 cores each, 24 cores in total. The servers are directly connected through the Femrice NICs in RoCEv2 mode.

[root@Server1 em_real]# time /usr/mpi/gcc/openmpi-4.1.5a1/bin/mpirun -np 24 -oversubscribe --allow-run-as-root \
--host 100.0.1.10,100.0.1.11  ./wrf.exe

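7 Test results

7.1 E2E forwarding test

The results of the E2E test scenario are shown in Figures 3 and 4:
Mellanox X-4 100G NIC: latency 1.74us.
Femrice Intel E810-C NIC: bandwidth 4723.19MB/s, latency 8.59us.
GRT Intel E810-C NIC: bandwidth 4794.26MB/s, latency 9.02us.

Figure 3: Latency of the domestic NICs

Figure 4: Bandwidth of the domestic NICs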
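
To compare the figures above against the 100G line rate and against each other, the bandwidth can be converted to Gbit/s and the latency expressed relative to the Mellanox result. The sketch below only does this arithmetic on the numbers reported in this section. Whether the tool's "MB" means 10^6 or 2^20 bytes is not stated here, so the unit size is a parameter and the decimal default is an assumption.

```python
# Values reported in section 7.1: (bandwidth in MB/s, latency in us).
RESULTS = {
    "Femrice Intel E810-C": (4723.19, 8.59),
    "GRT Intel E810-C": (4794.26, 9.02),
}
MELLANOX_LATENCY_US = 1.74


def mb_per_s_to_gbit_per_s(mb_per_s, bytes_per_mb=1_000_000):
    """Convert MB/s to Gbit/s; bytes_per_mb is an assumption (10**6 or 2**20)."""
    return mb_per_s * bytes_per_mb * 8 / 1e9


def latency_ratio(latency_us, baseline_us=MELLANOX_LATENCY_US):
    """Latency expressed as a multiple of the baseline latency."""
    return latency_us / baseline_us


if __name__ == "__main__":
    for name, (bandwidth, latency) in RESULTS.items():
        gbit = mb_per_s_to_gbit_per_s(bandwidth)
        ratio = latency_ratio(latency)
        print(f"{name}: {gbit:.2f} Gbit/s, {ratio:.1f}x the Mellanox latency")
```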

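7.2 HPC application test

The WRF and LAMMPS HPC application tests were each run multiple times. With the three NICs running the same application configuration in parallel, the domestic 100G NICs performed about 10% lower than the Mellanox NIC.

Figure 5: HPC application run time with the CX-N and IB switches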


Configuration guide: CX102S-DPU Open Intelligent Gateway User Guide

1 Introduction
2 First login to the device
3 Configuring DPU1
4 Configuring DPU2

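1 Introduction

The CX102S-DPU open intelligent gateway is built around one switching chip and two DPUs (high-performance computing units). The two DPUs connect to the switching chip over internal 2*10G links, and front-panel ports 1-8 additionally support PoE++ power delivery. The switching chip runs AsterNOS, an open network operating system based on SONiC, which provides a rich set of L2/L3 forwarding features. By default, DPU1 ships with OpenWrt and serves as the gateway at the network egress, while DPU2 ships with Debian so that users can install a variety of software tools as their workloads require.

Users may also install any Linux distribution on either DPU, including Ubuntu, Debian, OpenWrt and CentOS. This opens the device to a broad software ecosystem, such as VPP, UFW, OpenVPN, Snort, HAProxy, Nginx and ntopng, and several of these can run side by side on the same device as needed.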

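2 First login to the device

2.1 Basic device configuration

In its initial state the device has no default login/management IP. After power-on, connect through the serial port and configure an IP address; the device can then be accessed over SSH.

Default login user name: admin; password: asteros.
When connecting through the serial port, set the baud rate to 115200.

The device has two built-in high-performance DPU cards: DPU1 connects to the switching chip through internal interface Ethernet19, and DPU2 through internal interface Ethernet20. By default, all front-panel physical interfaces and the internal interfaces are in access mode and belong to VLAN 1, so after logging in you can assign an IP address to VLAN 1 to serve as the device's management IP.

The configuration commands are as follows: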

sonic# configure terminal 
sonic(config)# interface vlan 1
sonic(config-vlanif-1)# ip address 192.168.1.254/24

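3 Configuring DPU1

DPU1 ships with OpenWrt and serves as the gateway at the network egress; users can also install third-party software on top of it. To install a different base operating system, see the "CX102S-DPU Open Intelligent Gateway DPU Operating System Installation Guide" on the Asterfusion website.

Note: to change the default IP address of DPU1/DPU2, connect to the device through the serial port and enter the following commands to reach the DPU1 and DPU2 consoles respectively. (Each command must be entered in full in one go.)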

sonic# 
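sonic# switchUart1           //enter this command to switch to DPU1
root@OpenWrt:/# 
root@OpenWrt:/# switchUart0      //enter this command to return to the switch
sonic# 
sonic# switchUart2             //enter this command to switch to DPU2
root@OCTEONTX:~/ntopng-6.2# 
root@OCTEONTX:~/ntopng-6.2# switchUart0 //enter this command to return to the switch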

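3.2 OpenWrt configuration

3.2.1 Configuring the WAN interface

By default, the OpenWrt LAN interface is in the state shown in the figure below. A WAN interface must first be created to connect to the carrier network:

Create the interface with the following steps. Note that the switch side must be configured with the same VLANs for traffic to pass (in these steps the LAN port uses VLAN 10 and the WAN port uses VLAN 100; adjust them to your network).
1. Click [Network] - [Interfaces] - [Devices] - [Add device configuration].

2. Set the following parameters and click [Save].

3. Delete the default IPv6 configuration, then click [Save & Apply].

4. Return to [Interfaces] and click [Add new interface].

5. Set the following parameters and click [Create interface].

6. After clicking [Create interface], configure the interface according to the actual IP plan of the network egress and click [Save].

7. Once the WAN interface is configured, click [Save & Apply] to finish the interface configuration.

3.2.2 Configuring the LAN interface

1. Click [Network] - [Interfaces] - [Devices] - [Add device configuration].

2. Set the following parameters and click [Save].

3. Return to [Interfaces] and click [Add new interface].

4. After clicking [Create interface], configure the interface according to the actual IP plan of the LAN and click [Save].

5. Delete the original LAN interface and click [Save & Apply] to complete the OpenWrt configuration.

Note: for other features, refer to the official OpenWrt configuration documentation.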

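3.3 Device interface configuration

Typical scenario: for the PC in the figure to reach the Internet through the gateway, set the eth0.100 interface in OpenWrt as the WAN port and eth0.10 as the LAN port (covered in the previous section). On the device, connect port Eth18 to the carrier line and configure it as the WAN port, and connect ports Eth1-4 to the PCs and configure them as LAN ports. The VLANs are VLAN 100 for WAN and VLAN 10 for LAN. DPU1 exchanges data with the switch through port Ethernet19, so the internal interface Eth19 on the switch must allow these VLANs. The device configuration is as follows: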

sonic# configure terminal
sonic(config)# vlan 100            //create the VLAN for the WAN port
sonic(config-vlan-100)# exit
sonic(config)# interface ethernet 18     //configure the WAN port
sonic(config-if-18)# no switchport access vlan 1
sonic(config-if-18)# switchport access vlan 100
sonic(config-if-18)# exit
sonic(config)# vlan 10                  //create the VLAN for the LAN ports
sonic(config-vlan-10)# exit
sonic(config)# interface ethernet 1      //configure a LAN port (the other LAN ports are configured the same way)
sonic(config-if-1)# no switchport access vlan 1
sonic(config-if-1)# switchport access vlan 10
sonic(config-if-1)# exit
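
Because the remaining LAN ports repeat the same three commands, the command list for ports 1-4 can be generated rather than typed by hand. The sketch below is an illustration only: it reuses exactly the command strings shown above, and the function names are hypothetical. It produces text to paste into the CLI and does not talk to the switch.

```python
def access_port_commands(port, vlan):
    """Commands that move one port from VLAN 1 to `vlan`, as shown above."""
    return [
        f"interface ethernet {port}",
        "no switchport access vlan 1",
        f"switchport access vlan {vlan}",
        "exit",
    ]


def lan_config(ports, vlan):
    """Create `vlan`, then place every port in `ports` into it."""
    lines = [f"vlan {vlan}", "exit"]
    for port in ports:
        lines += access_port_commands(port, vlan)
    return lines


if __name__ == "__main__":
    # LAN ports Eth1-4 in VLAN 10, as in the scenario described above.
    print("\n".join(lan_config(range(1, 5), 10)))
```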

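4 Configuring DPU2

DPU2 ships with Debian, and users can install other third-party software on it. This document uses ntopng (a network traffic monitoring tool) as an example; it provides visual monitoring and analysis of all traffic at the network egress. To install a different base operating system, see the "CX102S-DPU Open Intelligent Gateway - DPU Operating System Installation Guide" on the Asterfusion website. For software installation, see the "CX102S-DPU Open Intelligent Gateway - DPU Software Installation Guide - ntopng" on the same site.

4.1 Installing dependencies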

sonic# switchUart2
admin@OCTEONTX:~$ sudo apt-get install build-essential git bison flex libxml2-dev libpcap-dev libtool libtool-bin rrdtool librrd-dev autoconf pkg-config automake autogen redis-server wget libsqlite3-dev libhiredis-dev libmaxminddb-dev libcurl4-openssl-dev libpango1.0-dev libcairo2-dev libnetfilter-queue-dev zlib1g-dev libssl-dev libcap-dev libnetfilter-conntrack-dev libreadline-dev libjson-c-dev libldap2-dev rename libsnmp-dev libexpat1-dev libmaxminddb-dev libradcli-dev libjson-c-dev libzmq3-dev curl jq libnl-genl-3-dev libgcrypt20-dev
admin@OCTEONTX:~$ sudo apt-get install vim git

4.2 Preparing the source code

root@OCTEONTX:~# git clone https://github.com/ntop/ntopng.git
root@OCTEONTX:~# git clone https://github.com/ntop/ntopng-dist.git /root/ntopng/httpdocs/dist
root@OCTEONTX:~# git clone https://github.com/ntop/nDPI.git /root/ntopng/nDPI
# Network problems may make the clone fail or take a long time; in that case download the archives manually, upload them, and extract them
root@OCTEONTX:~# tar -xzf nDPI-4.10.tar.gz
root@OCTEONTX:~# tar -xzf ntopng-6.2.tar.gz
root@OCTEONTX:~# cp -vrf nDPI-4.10 ntopng-6.2/nDPI
root@OCTEONTX:~# unzip ntopng-dist-6.2-stable.zip
root@OCTEONTX:~# cp -vrf ntopng-dist-6.2-stable/* ntopng-6.2/httpdocs/dist/

4.3 Building and installing

# Enter the build directory
root@OCTEONTX:~# cd ntopng-6.2/
root@OCTEONTX:~/ntopng-6.2# 
# Build nDPI first
root@OCTEONTX:~/ntopng-6.2# cd nDPI
root@OCTEONTX:~/ntopng-6.2/nDPI# ./autogen.sh
root@OCTEONTX:~/ntopng-6.2/nDPI# ./configure
root@OCTEONTX:~/ntopng-6.2/nDPI# make
root@OCTEONTX:~/ntopng-6.2/nDPI# cd ..
# Then build and install ntopng
root@OCTEONTX:~/ntopng-6.2# ./autogen.sh
root@OCTEONTX:~/ntopng-6.2# ./configure
root@OCTEONTX:~/ntopng-6.2# make
root@OCTEONTX:~/ntopng-6.2# make install
root@OCTEONTX:~/ntopng-6.2# which ntopng 
/usr/local/bin/ntopng
root@OCTEONTX:~/ntopng-6.2# ntopng --version
Version:        6.2.240815 [Community build]
GIT rev:        :6.2.240815
root@OCTEONTX:~/ntopng-6.2#
# Configure the interface IP address (any address can be chosen)
root@OCTEONTX:~/ntopng-6.2# ip addr add 192.168.2.1/24 dev eth0
root@OCTEONTX:~/ntopng-6.2# route add default gw 192.168.2.254
# Start the Redis database
root@OCTEONTX:~/ntopng-6.2# systemctl start redis
# Run ntopng from the command line
root@OCTEONTX:~/ntopng-6.2# ntopng /etc/ntopng/ntopng.conf --dont-change-user

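4.4 Switch-side configuration

DPU2 exchanges data with the switch through port Ethernet20. Traffic on the switch's WAN side (Ethernet 16) must therefore be mirrored to Ethernet20 so that ntopng on DPU2 can monitor and analyze it.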

sonic# show running-config 
interface ethernet 1                                  //connects the PC used to log in to ntopng
 switchport access vlan 20
!
interface ethernet 8                                  //LAN port
 switchport access vlan 10
!
interface ethernet 16                                 //WAN port
 switchport access vlan 100
!
interface ethernet 19                                 //link between DPU1 and the switch
 switchport trunk vlan 10
 switchport trunk vlan 100
!
interface ethernet 20                                 //link between DPU2 and the switch
 switchport access vlan 20
!
vlan 1
!
vlan 10
!
vlan 20
!
vlan 100
!
interface vlan 10
 ip address 192.168.1.254/24
!
interface vlan 20
 ip address 192.168.2.254/24
!
interface vlan 100
 ip address 192.168.17.2/24
!
//Port mirroring: mirror traffic on port 16 to port 20 so that ntopng on DPU2 can monitor and analyze it
mirror session 1 span   
!
mirror session 1 span src-ethernet 16 dst-ethernet 20
!
ip route 0.0.0.0/0 192.168.17.254
!
end
sonic#

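4.5 Functional verification

The compute unit running ntopng has the management IP 192.168.2.1/24. Configure the PC with an IP address in the same subnet, connect it to port Ethernet1 of the device, and open 192.168.2.1:3000 to reach the ntopng web interface. The default user name and password are admin/admin; the password must be changed on first login.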
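
Before opening the browser, the two preconditions above (PC address in the same subnet, port 3000 reachable) can be checked from the PC. This is a minimal sketch using only the Python standard library; the PC address 192.168.2.100 is a hypothetical example and the function names are chosen for this illustration.

```python
import ipaddress
import socket


def same_subnet(host_ip, interface_cidr):
    """True if host_ip lies in the network of interface_cidr (e.g. 192.168.2.1/24)."""
    network = ipaddress.ip_interface(interface_cidr).network
    return ipaddress.ip_address(host_ip) in network


def port_open(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    pc_ip = "192.168.2.100"  # hypothetical address assigned to the PC
    print("same subnet:", same_subnet(pc_ip, "192.168.2.1/24"))
    print("ntopng port reachable:", port_open("192.168.2.1", 3000))
```

If the subnet check passes but the port check fails, the likely causes are that ntopng is not running on DPU2 or that Ethernet1 and Ethernet20 are not in the same VLAN on the switch.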

The web page shows all traffic passing through the WAN port:

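Configuration guide: Installing ntopng on Debian Linux

1 Pre-operation statement
2 Tool introduction
3 Build and installation
4 Startup
5 Access verification

DPU Software Installation Guide - ntopng

This document describes how to install ntopng, a network traffic visualization and monitoring tool, on the Debian Linux system of the compute unit (DPU) of an Asterfusion CX102S-DPU device.

1 Introduction to ntopng

ntopng, the next-generation version of ntop, is a web-based network traffic analysis tool. It monitors and analyzes network traffic in real time and presents it through a rich visual interface, helping users understand the state of the network and optimize its performance.

ntopng supports a range of protocols and data sources, including TCP, UDP, HTTP, DNS and NetFlow. It can analyze traffic in depth and provides real-time alerts and logging. It is easy to install and use, and its features and flexible configuration options help administrators identify network problems quickly and take action.

2 Building and installing ntopng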

2.1 Installing dependencies

admin@OCTEONTX:~$ sudo apt-get install build-essential git bison flex libxml2-dev libpcap-dev libtool libtool-bin rrdtool librrd-dev autoconf pkg-config automake autogen redis-server wget libsqlite3-dev libhiredis-dev libmaxminddb-dev libcurl4-openssl-dev libpango1.0-dev libcairo2-dev libnetfilter-queue-dev zlib1g-dev libssl-dev libcap-dev libnetfilter-conntrack-dev libreadline-dev libjson-c-dev libldap2-dev rename libsnmp-dev libexpat1-dev libmaxminddb-dev libradcli-dev libjson-c-dev libzmq3-dev curl jq libnl-genl-3-dev libgcrypt20-dev
admin@OCTEONTX:~$ sudo apt-get install vim git

2.2 Preparing the source code

root@OCTEONTX:~# git clone https://github.com/ntop/ntopng.git
root@OCTEONTX:~# git clone https://github.com/ntop/ntopng-dist.git /root/ntopng/httpdocs/dist
root@OCTEONTX:~# git clone https://github.com/ntop/nDPI.git /root/ntopng/nDPI
# Network problems may make the clone fail or take a long time; in that case download the archives manually, upload them, and extract them
root@OCTEONTX:~# tar -xzf nDPI-4.10.tar.gz
root@OCTEONTX:~# tar -xzf ntopng-6.2.tar.gz
root@OCTEONTX:~# cp -vrf nDPI-4.10 ntopng-6.2/nDPI
root@OCTEONTX:~# unzip ntopng-dist-6.2-stable.zip
root@OCTEONTX:~# cp -vrf ntopng-dist-6.2-stable/* ntopng-6.2/httpdocs/dist/

2.3 Building and installing

# Enter the build directory
root@OCTEONTX:~# cd ntopng-6.2/
root@OCTEONTX:~/ntopng-6.2# 
# Build nDPI first
root@OCTEONTX:~/ntopng-6.2# cd nDPI
root@OCTEONTX:~/ntopng-6.2/nDPI# ./autogen.sh
root@OCTEONTX:~/ntopng-6.2/nDPI# ./configure
root@OCTEONTX:~/ntopng-6.2/nDPI# make
root@OCTEONTX:~/ntopng-6.2/nDPI# cd ..
# Then build and install ntopng
root@OCTEONTX:~/ntopng-6.2# ./autogen.sh
root@OCTEONTX:~/ntopng-6.2# ./configure
root@OCTEONTX:~/ntopng-6.2# make
root@OCTEONTX:~/ntopng-6.2# make install
root@OCTEONTX:~/ntopng-6.2# which ntopng 
/usr/local/bin/ntopng
root@OCTEONTX:~/ntopng-6.2# ntopng --version
Version:        6.2.240815 [Community build]
GIT rev:        :6.2.240815
root@OCTEONTX:~/ntopng-6.2#

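3 Startup

Configuration on compute unit 1:

By default, compute unit 1 comes with OpenWrt preinstalled. Configure networking in the OpenWrt web interface as appropriate for your environment; for the detailed procedure, refer to the OpenWrt documentation.

Configuration on compute unit 2:

# Create a runtime configuration file from the template
root@OCTEONTX:~/ntopng-6.2# mkdir -p /etc/ntopng
root@OCTEONTX:~/ntopng-6.2# cp ./packages/etc/ntopng/ntopng.conf /etc/ntopng/ntopng.conf

# Start the Redis database
root@OCTEONTX:~/ntopng-6.2# systemctl start redis

# Run ntopng from the command line
root@OCTEONTX:~/ntopng-6.2# ntopng /etc/ntopng/ntopng.conf --dont-change-user

Configuration on the switching unit:
The management port eth0 of compute unit 2 corresponds to Ethernet20 on the switching unit.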

sonic# show startup-config 
!
interface ethernet 8
 switchport access vlan 10
!
interface ethernet 16
 switchport access vlan 100
!
interface ethernet 19
 switchport trunk vlan 10
 switchport trunk vlan 100
!
interface ethernet 20
 switchport access vlan 100
!
vlan 1
 broadcast flood
 unknown-uni flood
 unre-multi flood
!
vlan 10
!
vlan 100
!
interface vlan 10
 ip address 192.168.1.2/24
!
interface vlan 100
 ip address 192.168.17.2/24
!
ip route 0.0.0.0/0 192.168.17.254
!
end
sonic#

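4 Verifying access to ntopng

The compute unit running ntopng has the management IP 192.168.17.26/24, so the ntopng web interface is available at http://192.168.17.26:3000. The default user name and password are admin/admin; the password must be changed on first login.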

