
Parallel File System Installation, Deployment, and Performance Testing

1 Objective

This document briefly introduces the basic concepts of parallel file systems and of BeeGFS, an open-source implementation, then builds a network with Asterfusion CX-N series ultra-low-latency cloud switches to deploy, configure, and performance-test a 3-node environment.

2 Overview

2.1 About Parallel File Systems

High-performance computing (HPC) systems solve large computational problems quickly by pooling many computing resources. To make the compute nodes in a cluster cooperate well, an HPC cluster is usually backed by a parallel file system shared across the compute nodes.

A parallel file system is a high-performance file system optimized to deliver millisecond access latency, TB/s-scale bandwidth, and millions of IOPS, allowing it to process HPC workloads quickly.

Parallel file systems suit fields that process large amounts of data with highly parallel computation, for example:

  1. Scientific computing: weather forecasting, climate simulation, earthquake simulation, fluid dynamics, biomedicine, physics, and other fields that process large experimental data sets and run complex computations;
  2. Industrial manufacturing: automotive design, aerospace, ship design, complex machinery, and other fields that require large-scale computation and simulation;
  3. Finance: securities trading, risk management, financial modeling, and other fields that process large volumes of transaction data and run complex computations;
  4. Animation: film, television, games, and other fields that require large-scale rendering and image processing;
  5. Internet applications: large-scale data mining, search engines, social networks, e-commerce, and other fields that process large data volumes and compute in real time.

HPC itself has shifted from traditional compute-intensive work (large-scale simulation and the like) to data-driven, data-centric computing (the production, processing, and analysis of massive data sets), and this shift pushes back-end storage to keep evolving toward high performance and high scalability.
In a parallel file system, files/data are split into chunks placed across multiple storage devices (how the chunks are placed is decided by the file system's distribution algorithm), and all data is accessed through a single global namespace. A parallel file system client can use multiple I/O paths at once to read and write data on multiple storage devices; the sketch below illustrates the placement arithmetic.
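A minimal striping sketch in shell, assuming round-robin placement with a chunk size of 1 MiB over 4 storage targets (both values chosen purely for illustration):

# Map a byte offset to its chunk index and storage target under
# round-robin striping (chunk size and target count are assumptions).
chunksize=$((1024 * 1024))          # 1 MiB per chunk
numtargets=4                        # number of storage targets
offset=$((5 * 1024 * 1024 + 300))   # arbitrary byte offset into a file

chunk_index=$((offset / chunksize))   # which chunk contains the offset
target=$((chunk_index % numtargets))  # which target stores that chunk
echo "offset ${offset} -> chunk ${chunk_index} on target ${target}"

Because consecutive chunks land on different targets, a large sequential read or write fans out across all targets at once, which is where the aggregate bandwidth comes from.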
Commonly used parallel file systems include:

  1. Lustre: an open-source parallel distributed file system originally developed by Sun Microsystems and Cray, now maintained by OpenSFS and the European Open File Systems association (EOFS). It is widely used in HPC and large-scale data storage, and is known for high performance, high reliability, and high scalability;
  2. BeeGFS: a high-performance, scalable parallel file system developed by the Fraunhofer Institute for Industrial Mathematics (ITWM). It supports multiple data access protocols, including POSIX, NFS, and SMB, and is widely used in HPC and large-scale data storage;
  3. IBM Spectrum Scale (formerly GPFS): a high-performance, scalable parallel file system developed by IBM for large-scale data storage and analytics. It supports multiple data access protocols, including POSIX, NFS, SMB, and HDFS;
  4. Ceph: an open-source distributed storage system that offers block, object, and file interfaces, with highly reliable and scalable data storage and access;
  5. PVFS (Parallel Virtual File System): an open-source parallel file system developed jointly by Clemson University and Oak Ridge National Laboratory, widely used in scientific and high-performance computing, with high performance and high scalability.

All of these parallel file systems offer high performance, high reliability, and high scalability, and are widely used for high-performance computing and for large-scale data storage and analytics.

2.2 About BeeGFS

BeeGFS, originally named FhGFS, was designed and developed by the Fraunhofer Institute for industrial mathematics computing. After performing well on small and mid-sized HPC systems in Europe and the US, it was renamed and registered as BeeGFS in 2014 and has since seen wide research and commercial adoption.

BeeGFS is both a network file system and a parallel file system. Clients communicate with storage servers over the network, using TCP/IP or any RDMA-capable interconnect such as InfiniBand, RoCE, or Omni-Path (with native verbs support).

BeeGFS separates file contents from metadata. File contents are the data users want to store, while metadata is "data about data" such as access permissions, file size, and location. The most important piece of metadata records how to locate a file's chunks across the file servers, so that once a client has fetched the metadata of a given file or directory, it can talk directly to the Storage servers that hold the data to retrieve it.
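Once the system is up (see Section 5), this metadata-driven lookup can be observed from the client with beegfs-ctl; a sketch assuming the /mnt/beegfs mount used later in this document:

# Show the owning metadata node and the stripe pattern
# (chunk size, number of storage targets) of an entry
[root@server6 ~]# beegfs-ctl --getentryinfo /mnt/beegfs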

The numbers of BeeGFS Storage Servers and MetaData Servers can be scaled elastically, so different performance requirements can be met by scaling out to an appropriate number of servers. The overall software architecture is shown below:

Figure 1: BeeGFS architecture

Management service (package: beegfs-mgmtd)
    Monitors the status of all registered services and stores none of the user's data.
    Note: the metadata, storage, and client services must all be configured with the management node's IP address. It is generally the first service deployed in a cluster, and there is exactly one.

Metadata service (package: beegfs-meta)
    Stores file metadata such as directory structure, permission information, and the placement of data chunks; each file has a corresponding metadata file. When a client opens a file, the metadata service tells it which nodes hold the data, after which the client reads and writes directly against the storage service. The metadata service scales horizontally: multiple metadata servers can be added to raise file system performance.

Storage service (package: beegfs-storage)
    Stores the striped data chunk files.

Client service (packages: beegfs-client, beegfs-helperd)
    Mounts the cluster's storage space; when the service starts it mounts automatically to a local path, which can then be exported via nfs/samba for Linux/Windows client access.
    Note: the mount path is specified in the /etc/beegfs/beegfs-mounts.conf configuration file; beegfs-helperd is mainly used for log writing and needs no extra configuration.

Command-line utilities (packages: beegfs-utils, beegfs-common)
    Provide command-line tools such as beegfs-ctl and beegfs-df.

Table 1: BeeGFS system components

2.3 About the Asterfusion CX-N Series Ultra-Low-Latency Cloud Switches

The CX-N series ultra-low-latency cloud switches, independently developed by Asterfusion for data center networks, deliver high-performance networking for cloud data center scenarios such as high-performance computing clusters, storage clusters, big data analytics, high-frequency trading, and full Cloud OS convergence.

The test setup uses a single CX532P-N: a 1U switch with 32 x 100GE QSFP28 ports and 2 x 10GE SFP+ ports, and a switching capacity of up to 6.4 Tbps.

3 Test Environment and Network Topology

3.1 Hardware and Materials

Device type      Configuration                                                        Qty
Switch           CX532P-N (1U, 32 x 100GE QSFP28, 2 x 10GE SFP+)                      1
Modules/cables   100G breakout 4x25G[10G] modules and cables                          1
Server           CPU: Intel(R) Core(TM) i7-7700; RAM: 8GB; disks: 1TB HDD + 1TB SSD   1
Server           CPU: Intel(R) Core(TM) i7-8700; RAM: 8GB; disks: 1TB HDD + 1TB SSD   1
Server           CPU: Intel(R) Core(TM) i7-9700; RAM: 8GB; disks: 1TB HDD + 1TB SSD   1

Table 2: Hardware and materials

3.2 System and Software Versions

Switch CX532P-N:
  AsterNOS Software, Version 3.1, R0314P06
Server server4:
  OS: openEuler release 22.03 (LTS-SP1)
  Kernel: 5.10.0-136.35.0.111.oe2203sp1.x86_64
  BeeGFS: 7.3.3
  OFED driver: MLNX_OFED_LINUX-5.4-3.6.8.1-openeuler22.03-x86_64
Server server5:
  OS: openEuler release 22.03 (LTS-SP1)
  Kernel: 5.10.0-136.35.0.111.oe2203sp1.x86_64
  BeeGFS: 7.3.3
  OFED driver: MLNX_OFED_LINUX-5.4-3.6.8.1-openeuler22.03-x86_64
Server server6:
  OS: Rocky Linux release 8.8 (Green Obsidian)
  Kernel: 4.18.0-477.13.1.el8_8.x86_64
  BeeGFS: 7.3.3
  OFED driver: MLNX_OFED_LINUX-5.4-3.7.5.0-rhel8.8-x86_64

Table 3: System and software versions

3.3 Storage Layout

server4:
  Node IPs: management 10.230.1.54; data-1 172.16.8.54/24
  Roles: mgmtd, meta, storage
  Disk layout: mgmtd: 50G NVMe; meta: 50G NVMe; storage: 500G NVMe
server5:
  Node IPs: management 10.230.1.55; data-1 172.16.8.55/24
  Roles: mgmtd, meta, storage
  Disk layout: mgmtd: 50G NVMe; meta: 50G NVMe; storage: 500G NVMe
server6:
  Node IPs: management 10.230.1.56; data-1 172.16.8.56/24
  Roles: client, helperd
  Disk layout: none (client only)

Table 4: Storage layout

3.4 Test Network Topology

Figure 2: Test network topology

4 Test Results

4.1 Run BeeGFS Bench

[Screenshot: command-line output]
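In place of the screenshot, a representative way to drive this benchmark is BeeGFS's built-in StorageBench via beegfs-ctl; the block size, total size, and thread count below are illustrative assumptions, not the tested values:

# Start a write benchmark on all storage targets
[root@server6 ~]# beegfs-ctl --storagebench --alltargets --write --blocksize=512K --size=20G --threads=16
# Query progress and per-target throughput; clean up when finished
[root@server6 ~]# beegfs-ctl --storagebench --alltargets --status
[root@server6 ~]# beegfs-ctl --storagebench --alltargets --cleanup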

4.2 Run IOR and mdtest

[Screenshots: command-line output]
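Representative invocations, assuming the OpenMPI, IOR, and mdtest builds from Section 5.1.5; process counts and sizes are illustrative, not the tested values:

# IOR: 8 processes, 1 MiB transfers, 4 GiB per process, file-per-process
[root@server6 ~]# mpirun -n 8 ior -w -r -t 1m -b 4g -F -o /mnt/beegfs/ior_file
# mdtest: 8 processes, 10000 files/directories per process
[root@server6 ~]# mkdir -p /mnt/beegfs/mdtest
[root@server6 ~]# mpirun -n 8 mdtest -n 10000 -d /mnt/beegfs/mdtest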

4.3 Run dbench

[Screenshot: command-line output]
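A representative run against the BeeGFS mount, with an illustrative client count and runtime:

# 16 simulated clients for 60 seconds on the BeeGFS mount point
[root@server6 ~]# dbench -D /mnt/beegfs -t 60 16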

4.4 Run IO500

[Screenshot: command-line output]
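The suite can be launched with the config-beegfs.ini prepared in Section 5.1.5; the process count below is illustrative:

# Run the full IO500 suite with the custom BeeGFS configuration
[root@server6 io500]# mpirun -np 8 ./io500 config-beegfs.ini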

5 Configuration Reference

5.1 Servers

5.1.1 Install the Mellanox OFED Driver

Server4
# Download the driver package for this OS distribution
[root@server4 ~]# cat /etc/openEuler-release 
openEuler release 22.03 (LTS-SP1)
[root@server4 ~]# wget https://content.mellanox.com/ofed/MLNX_OFED-5.4-3.6.8.1/MLNX_OFED_LINUX-5.4-3.6.8.1-openeuler22.03-x86_64.tgz
# Build a driver package for the running kernel
[root@server4 ~]# tar xvf MLNX_OFED_LINUX-5.4-3.6.8.1-openeuler22.03-x86_64.tgz
[root@server4 ~]# cd MLNX_OFED_LINUX-5.4-3.6.8.1-openeuler22.03-x86_64
[root@server4 MLNX_OFED_LINUX-5.4-3.6.8.1-openeuler22.03-x86_64]# ./mlnx_add_kernel_support.sh -m /root/MLNX_OFED_LINUX-5.4-3.6.8.1-openeuler22.03-x86_64
[root@server4 MLNX_OFED_LINUX-5.4-3.6.8.1-openeuler22.03-x86_64]# cd ..
# Install the generated driver package
[root@server4 ~]# cp /tmp/MLNX_OFED_LINUX-5.4-3.6.8.1-openeuler22.03-x86_64-ext.tgz ./
[root@server4 ~]# tar xvf MLNX_OFED_LINUX-5.4-3.6.8.1-openeuler22.03-x86_64-ext.tgz
[root@server4 ~]# cd MLNX_OFED_LINUX-5.4-3.6.8.1-openeuler22.03-x86_64-ext
[root@server4 MLNX_OFED_LINUX-5.4-3.6.8.1-openeuler22.03-x86_64-ext]# ./mlnxofedinstall
# Regenerate the initramfs; takes effect after reboot
[root@server4 MLNX_OFED_LINUX-5.4-3.6.8.1-openeuler22.03-x86_64-ext]# dracut -f
[root@server4 MLNX_OFED_LINUX-5.4-3.6.8.1-openeuler22.03-x86_64-ext]# reboot
# Start openibd and check the driver status
[root@server4 ~]# /etc/init.d/openibd restart
[root@server4 ~]# /etc/init.d/openibd status

  HCA driver loaded

Configured Mellanox EN devices:
enp1s0f0
enp1s0f1

Currently active Mellanox devices:
enp1s0f0
enp1s0f1

The following OFED modules are loaded:

  rdma_ucm
  rdma_cm
  ib_ipoib
  mlx5_core
  mlx5_ib
  ib_uverbs
  ib_umad
  ib_cm
  ib_core
  mlxfw

[root@server4 ~]# 

Server5
# Download the driver package for this OS distribution
[root@server5 ~]# cat /etc/openEuler-release 
openEuler release 22.03 (LTS-SP1)
[root@server5 ~]# wget https://content.mellanox.com/ofed/MLNX_OFED-5.4-3.6.8.1/MLNX_OFED_LINUX-5.4-3.6.8.1-openeuler22.03-x86_64.tgz
# Build a driver package for the running kernel
[root@server5 ~]# tar xvf MLNX_OFED_LINUX-5.4-3.6.8.1-openeuler22.03-x86_64.tgz
[root@server5 ~]# cd MLNX_OFED_LINUX-5.4-3.6.8.1-openeuler22.03-x86_64
[root@server5 MLNX_OFED_LINUX-5.4-3.6.8.1-openeuler22.03-x86_64]# ./mlnx_add_kernel_support.sh -m /root/MLNX_OFED_LINUX-5.4-3.6.8.1-openeuler22.03-x86_64
[root@server5 MLNX_OFED_LINUX-5.4-3.6.8.1-openeuler22.03-x86_64]# cd ..
# Install the generated driver package
[root@server5 ~]# cp /tmp/MLNX_OFED_LINUX-5.4-3.6.8.1-openeuler22.03-x86_64-ext.tgz ./
[root@server5 ~]# tar xvf MLNX_OFED_LINUX-5.4-3.6.8.1-openeuler22.03-x86_64-ext.tgz
[root@server5 ~]# cd MLNX_OFED_LINUX-5.4-3.6.8.1-openeuler22.03-x86_64-ext
[root@server5 MLNX_OFED_LINUX-5.4-3.6.8.1-openeuler22.03-x86_64-ext]# ./mlnxofedinstall
# Regenerate the initramfs; takes effect after reboot
[root@server5 MLNX_OFED_LINUX-5.4-3.6.8.1-openeuler22.03-x86_64-ext]# dracut -f
[root@server5 MLNX_OFED_LINUX-5.4-3.6.8.1-openeuler22.03-x86_64-ext]# reboot
# Start openibd and check the driver status
[root@server5 ~]# /etc/init.d/openibd restart
[root@server5 ~]# /etc/init.d/openibd status

  HCA driver loaded

Configured Mellanox EN devices:
enp1s0f0
enp1s0f1

Currently active Mellanox devices:
enp1s0f0
enp1s0f1

The following OFED modules are loaded:

  rdma_ucm
  rdma_cm
  ib_ipoib
  mlx5_core
  mlx5_ib
  ib_uverbs
  ib_umad
  ib_cm
  ib_core
  mlxfw

[root@server5 ~]# 

Server6
# Download the driver package for this OS distribution
[root@server6 ~]# cat /etc/rocky-release
Rocky Linux release 8.8 (Green Obsidian)
[root@server6 ~]# wget https://content.mellanox.com/ofed/MLNX_OFED-5.4-3.7.5.0/MLNX_OFED_LINUX-5.4-3.7.5.0-rhel8.8-x86_64.tgz
# Build a driver package for the running kernel
[root@server6 ~]# tar xvf MLNX_OFED_LINUX-5.4-3.7.5.0-rhel8.8-x86_64.tgz
[root@server6 ~]# cd MLNX_OFED_LINUX-5.4-3.7.5.0-rhel8.8-x86_64
[root@server6 MLNX_OFED_LINUX-5.4-3.7.5.0-rhel8.8-x86_64]# ./mlnx_add_kernel_support.sh -m /root/MLNX_OFED_LINUX-5.4-3.7.5.0-rhel8.8-x86_64
[root@server6 MLNX_OFED_LINUX-5.4-3.7.5.0-rhel8.8-x86_64]# cd ..
# Install the generated driver package
[root@server6 ~]# cp /tmp/MLNX_OFED_LINUX-5.4-3.7.5.0-rhel8.8-x86_64-ext.tgz ./
[root@server6 ~]# tar xvf MLNX_OFED_LINUX-5.4-3.7.5.0-rhel8.8-x86_64-ext.tgz
[root@server6 ~]# cd MLNX_OFED_LINUX-5.4-3.7.5.0-rhel8.8-x86_64-ext
[root@server6 MLNX_OFED_LINUX-5.4-3.7.5.0-rhel8.8-x86_64-ext]# ./mlnxofedinstall
# Regenerate the initramfs; takes effect after reboot
[root@server6 MLNX_OFED_LINUX-5.4-3.7.5.0-rhel8.8-x86_64-ext]# dracut -f
[root@server6 MLNX_OFED_LINUX-5.4-3.7.5.0-rhel8.8-x86_64-ext]# reboot
# Start openibd and check the driver status
[root@server6 ~]# /etc/init.d/openibd restart
[root@server6 ~]# /etc/init.d/openibd status

  HCA driver loaded

Configured Mellanox EN devices:
enp7s0
enp8s0

Currently active Mellanox devices:
enp7s0
enp8s0

The following OFED modules are loaded:

  rdma_ucm
  rdma_cm
  ib_ipoib
  mlx5_core
  mlx5_ib
  ib_uverbs
  ib_umad
  ib_cm
  ib_core
  mlxfw

[root@server6 ~]# 

5.1.2 Configure RoCEv2

Server4
[root@server4 ~]# ibdev2netdev 
mlx5_0 port 1 ==> enp1s0f0 (Up)
mlx5_1 port 1 ==> enp1s0f1 (Up)
[root@server4 ~]# cat /etc/sysconfig/network-scripts/config-rocev2.sh
#enp1s0f0
mlnx_qos -i enp1s0f0 --trust dscp
mlnx_qos -i enp1s0f0 --pfc 0,0,0,0,1,0,0,0
cma_roce_mode -d mlx5_0 -p 1 -m 2
echo 128 > /sys/class/infiniband/mlx5_0/tc/1/traffic_class
cma_roce_tos -d mlx5_0 -t 128
echo 1 > /sys/class/net/enp1s0f0/ecn/roce_np/enable/1
echo 1 > /sys/class/net/enp1s0f0/ecn/roce_rp/enable/1
echo 40 > /sys/class/net/enp1s0f0/ecn/roce_np/cnp_dscp
sysctl -w net.ipv4.tcp_ecn=1
# enp1s0f1
mlnx_qos -i enp1s0f1 --trust dscp
mlnx_qos -i enp1s0f1 --pfc 0,0,0,0,1,0,0,0
cma_roce_mode -d mlx5_1 -p 1 -m 2
echo 128 > /sys/class/infiniband/mlx5_1/tc/1/traffic_class
cma_roce_tos -d mlx5_1 -t 128
echo 1 > /sys/class/net/enp1s0f1/ecn/roce_np/enable/1
echo 1 > /sys/class/net/enp1s0f1/ecn/roce_rp/enable/1
echo 40 > /sys/class/net/enp1s0f1/ecn/roce_np/cnp_dscp
[root@server4 ~]# mlnx_qos -i enp1s0f0
DCBX mode: OS controlled
Priority trust state: dscp
dscp2prio mapping:
        prio:0 dscp:07,06,05,04,03,02,01,00,
        prio:1 dscp:15,14,13,12,11,10,09,08,
        prio:2 dscp:23,22,21,20,19,18,17,16,
        prio:3 dscp:31,30,29,28,27,26,25,24,
        prio:4 dscp:39,38,37,36,35,34,33,32,
        prio:5 dscp:47,46,45,44,43,42,41,40,
        prio:6 dscp:55,54,53,52,51,50,49,48,
        prio:7 dscp:63,62,61,60,59,58,57,56,
default priority:
Receive buffer size (bytes): 130944,130944,0,0,0,0,0,0,
Cable len: 7
PFC configuration:
        priority    0   1   2   3   4   5   6   7
        enabled     0   0   0   0   1   0   0   0   
        buffer      0   0   0   0   1   0   0   0   
tc: 0 ratelimit: unlimited, tsa: strict
         priority:  0
         priority:  1
         priority:  2
         priority:  3
         priority:  4
         priority:  5
         priority:  6
         priority:  7
[root@server4 ~]# mlnx_qos -i enp1s0f1
DCBX mode: OS controlled
Priority trust state: dscp
dscp2prio mapping:
        prio:0 dscp:07,06,05,04,03,02,01,00,
        prio:1 dscp:15,14,13,12,11,10,09,08,
        prio:2 dscp:23,22,21,20,19,18,17,16,
        prio:3 dscp:31,30,29,28,27,26,25,24,
        prio:4 dscp:39,38,37,36,35,34,33,32,
        prio:5 dscp:47,46,45,44,43,42,41,40,
        prio:6 dscp:55,54,53,52,51,50,49,48,
        prio:7 dscp:63,62,61,60,59,58,57,56,
default priority:
Receive buffer size (bytes): 130944,130944,0,0,0,0,0,0,
Cable len: 7
PFC configuration:
        priority    0   1   2   3   4   5   6   7
        enabled     0   0   0   0   1   0   0   0   
        buffer      0   0   0   0   1   0   0   0   
tc: 0 ratelimit: unlimited, tsa: strict
         priority:  0
         priority:  1
         priority:  2
         priority:  3
         priority:  4
         priority:  5
         priority:  6
         priority:  7
[root@server4 ~]# cat /sys/class/net/*/ecn/roce_np/cnp_dscp
40
40
[root@server4 ~]# 

Server5
[root@server5 ~]# ibdev2netdev 
mlx5_0 port 1 ==> enp1s0f0 (Up)
mlx5_1 port 1 ==> enp1s0f1 (Up)
[root@server5 ~]# cat /etc/sysconfig/network-scripts/config-rocev2.sh
#enp1s0f0
mlnx_qos -i enp1s0f0 --trust dscp
mlnx_qos -i enp1s0f0 --pfc 0,0,0,0,1,0,0,0
cma_roce_mode -d mlx5_0 -p 1 -m 2
echo 128 > /sys/class/infiniband/mlx5_0/tc/1/traffic_class
cma_roce_tos -d mlx5_0 -t 128
echo 1 > /sys/class/net/enp1s0f0/ecn/roce_np/enable/1
echo 1 > /sys/class/net/enp1s0f0/ecn/roce_rp/enable/1
echo 40 > /sys/class/net/enp1s0f0/ecn/roce_np/cnp_dscp
sysctl -w net.ipv4.tcp_ecn=1
# enp1s0f1
mlnx_qos -i enp1s0f1 --trust dscp
mlnx_qos -i enp1s0f1 --pfc 0,0,0,0,1,0,0,0
cma_roce_mode -d mlx5_1 -p 1 -m 2
echo 128 > /sys/class/infiniband/mlx5_1/tc/1/traffic_class
cma_roce_tos -d mlx5_1 -t 128
echo 1 > /sys/class/net/enp1s0f1/ecn/roce_np/enable/1
echo 1 > /sys/class/net/enp1s0f1/ecn/roce_rp/enable/1
echo 40 > /sys/class/net/enp1s0f1/ecn/roce_np/cnp_dscp
[root@server5 ~]# mlnx_qos -i enp1s0f0
DCBX mode: OS controlled
Priority trust state: dscp
dscp2prio mapping:
        prio:0 dscp:07,06,05,04,03,02,01,00,
        prio:1 dscp:15,14,13,12,11,10,09,08,
        prio:2 dscp:23,22,21,20,19,18,17,16,
        prio:3 dscp:31,30,29,28,27,26,25,24,
        prio:4 dscp:39,38,37,36,35,34,33,32,
        prio:5 dscp:47,46,45,44,43,42,41,40,
        prio:6 dscp:55,54,53,52,51,50,49,48,
        prio:7 dscp:63,62,61,60,59,58,57,56,
default priority:
Receive buffer size (bytes): 130944,130944,0,0,0,0,0,0,
Cable len: 7
PFC configuration:
        priority    0   1   2   3   4   5   6   7
        enabled     0   0   0   0   1   0   0   0   
        buffer      0   0   0   0   1   0   0   0   
tc: 0 ratelimit: unlimited, tsa: strict
         priority:  0
         priority:  1
         priority:  2
         priority:  3
         priority:  4
         priority:  5
         priority:  6
         priority:  7
[root@server5 ~]# mlnx_qos -i enp1s0f1
DCBX mode: OS controlled
Priority trust state: dscp
dscp2prio mapping:
        prio:0 dscp:07,06,05,04,03,02,01,00,
        prio:1 dscp:15,14,13,12,11,10,09,08,
        prio:2 dscp:23,22,21,20,19,18,17,16,
        prio:3 dscp:31,30,29,28,27,26,25,24,
        prio:4 dscp:39,38,37,36,35,34,33,32,
        prio:5 dscp:47,46,45,44,43,42,41,40,
        prio:6 dscp:55,54,53,52,51,50,49,48,
        prio:7 dscp:63,62,61,60,59,58,57,56,
default priority:
Receive buffer size (bytes): 130944,130944,0,0,0,0,0,0,
Cable len: 7
PFC configuration:
        priority    0   1   2   3   4   5   6   7
        enabled     0   0   0   0   1   0   0   0   
        buffer      0   0   0   0   1   0   0   0   
tc: 0 ratelimit: unlimited, tsa: strict
         priority:  0
         priority:  1
         priority:  2
         priority:  3
         priority:  4
         priority:  5
         priority:  6
         priority:  7
[root@server5 ~]# cat /sys/class/net/*/ecn/roce_np/cnp_dscp
40
40
[root@server5 ~]# 

Server6
[root@server6 ~]# ibdev2netdev 
mlx5_0 port 1 ==> enp7s0 (Up)
mlx5_1 port 1 ==> enp8s0 (Up)
[root@server6 ~]# cat /etc/sysconfig/network-scripts/config-rocev2.sh
#enp7s0
mlnx_qos -i enp7s0 --trust dscp
mlnx_qos -i enp7s0 --pfc 0,0,0,0,1,0,0,0
cma_roce_mode -d mlx5_0 -p 1 -m 2
echo 128 > /sys/class/infiniband/mlx5_0/tc/1/traffic_class
cma_roce_tos -d mlx5_0 -t 128
echo 1 > /sys/class/net/enp7s0/ecn/roce_np/enable/1
echo 1 > /sys/class/net/enp7s0/ecn/roce_rp/enable/1
echo 40 > /sys/class/net/enp7s0/ecn/roce_np/cnp_dscp
sysctl -w net.ipv4.tcp_ecn=1
# enp8s0
mlnx_qos -i enp8s0 --trust dscp
mlnx_qos -i enp8s0 --pfc 0,0,0,0,1,0,0,0
cma_roce_mode -d mlx5_1 -p 1 -m 2
echo 128 > /sys/class/infiniband/mlx5_1/tc/1/traffic_class
cma_roce_tos -d mlx5_1 -t 128
echo 1 > /sys/class/net/enp8s0/ecn/roce_np/enable/1
echo 1 > /sys/class/net/enp8s0/ecn/roce_rp/enable/1
echo 40 > /sys/class/net/enp8s0/ecn/roce_np/cnp_dscp
[root@server6 ~]# mlnx_qos -i enp7s0
DCBX mode: OS controlled
Priority trust state: dscp
dscp2prio mapping:
        prio:0 dscp:07,06,05,04,03,02,01,00,
        prio:1 dscp:15,14,13,12,11,10,09,08,
        prio:2 dscp:23,22,21,20,19,18,17,16,
        prio:3 dscp:31,30,29,28,27,26,25,24,
        prio:4 dscp:39,38,37,36,35,34,33,32,
        prio:5 dscp:47,46,45,44,43,42,41,40,
        prio:6 dscp:55,54,53,52,51,50,49,48,
        prio:7 dscp:63,62,61,60,59,58,57,56,
default priority:
Receive buffer size (bytes): 130944,130944,0,0,0,0,0,0,max_buffer_size=262016
Cable len: 7
PFC configuration:
        priority    0   1   2   3   4   5   6   7
        enabled     0   0   0   0   1   0   0   0   
        buffer      0   0   0   0   1   0   0   0   
tc: 0 ratelimit: unlimited, tsa: strict
         priority:  0
         priority:  1
         priority:  2
         priority:  3
         priority:  4
         priority:  5
         priority:  6
         priority:  7
[root@server6 ~]# mlnx_qos -i enp8s0
DCBX mode: OS controlled
Priority trust state: dscp
dscp2prio mapping:
        prio:0 dscp:07,06,05,04,03,02,01,00,
        prio:1 dscp:15,14,13,12,11,10,09,08,
        prio:2 dscp:23,22,21,20,19,18,17,16,
        prio:3 dscp:31,30,29,28,27,26,25,24,
        prio:4 dscp:39,38,37,36,35,34,33,32,
        prio:5 dscp:47,46,45,44,43,42,41,40,
        prio:6 dscp:55,54,53,52,51,50,49,48,
        prio:7 dscp:63,62,61,60,59,58,57,56,
default priority:
Receive buffer size (bytes): 130944,130944,0,0,0,0,0,0,max_buffer_size=262016
Cable len: 7
PFC configuration:
        priority    0   1   2   3   4   5   6   7
        enabled     0   0   0   0   1   0   0   0   
        buffer      0   0   0   0   1   0   0   0   
tc: 0 ratelimit: unlimited, tsa: strict
         priority:  0
         priority:  1
         priority:  2
         priority:  3
         priority:  4
         priority:  5
         priority:  6
         priority:  7
[root@server6 ~]# cat /sys/class/net/*/ecn/roce_np/cnp_dscp
40
40
[root@server6 ~]# 
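Note that these mlnx_qos/sysfs settings do not survive a reboot, so config-rocev2.sh is typically re-run at boot. The active RoCE mode can also be double-checked with the OFED utilities; a quick sketch:

# Verify the CM default RoCE mode (should report RoCE v2) and list GIDs
[root@server4 ~]# cma_roce_mode -d mlx5_0 -p 1
[root@server4 ~]# show_gids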

5.1.3 Deploy BeeGFS

5.1.3.1 Install the packages for each service

Server4: meta, storage, mgmt

[root@server4 ~]# cd /etc/yum.repos.d/
[root@server4 yum.repos.d]# wget https://www.beegfs.io/release/beegfs_7.3.3/dists/beegfs-rhel8.repo
[root@server4 yum.repos.d]# yum makecache
[root@server4 ~]# yum install beegfs-mgmtd
[root@server4 ~]# yum install beegfs-meta libbeegfs-ib
[root@server4 ~]# yum install beegfs-storage libbeegfs-ib

Server5: meta, storage, mgmt

[root@server5 ~]# cd /etc/yum.repos.d/
[root@server5 yum.repos.d]# wget https://www.beegfs.io/release/beegfs_7.3.3/dists/beegfs-rhel8.repo
[root@server5 yum.repos.d]# yum makecache
[root@server5 ~]# yum install beegfs-mgmtd
[root@server5 ~]# yum install beegfs-meta libbeegfs-ib
[root@server5 ~]# yum install beegfs-storage libbeegfs-ib

Server6: client

[root@server6 ~]# cd /etc/yum.repos.d/
[root@server6 yum.repos.d]# wget https://www.beegfs.io/release/beegfs_7.3.3/dists/beegfs-rhel8.repo
[root@server6 yum.repos.d]# yum makecache
[root@server6 ~]# yum install beegfs-client beegfs-helperd beegfs-utils
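A quick check that each node ended up with the intended package set:

# List the installed BeeGFS packages on a node
[root@server6 ~]# rpm -qa | grep beegfs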

5.1.3.2 Build the client kernel module
[root@server6 ~]# cat /etc/beegfs/beegfs-client-autobuild.conf
# This is a config file for the automatic build process of BeeGFS client kernel
# modules.
# http://www.beegfs.com

#
# --- Section: [Notes] ---
#

# General Notes
# =============
# To force a rebuild of the client modules:
#  $ /etc/init.d/beegfs-client rebuild
#
# To see a list of available build arguments:
#  $ make help -C /opt/beegfs/src/client/client_module_${BEEGFS_MAJOR_VERSION}/build
#
#  Help example for BeeGFS 2015.03 release:
#   $ make help -C /opt/beegfs/src/client/client_module_2015.03/build

# RDMA Support Notes
# ==================
# If you installed InfiniBand kernel modules from OpenFabrics OFED, then also
# define the corresponding header include path by adding
# "OFED_INCLUDE_PATH=<path>" to the "buildArgs", where <path> usually is
# "/usr/src/openib/include" or "/usr/src/ofa_kernel/default/include" for
# Mellanox OFED.
#
# OFED headers are automatically detected even if OFED_INCLUDE_PATH is not
# defined. To build the client without RDMA support, define BEEGFS_NO_RDMA=1.
#

# NVIDIA GPUDirect Storage Support Notes
# ==================
# If you want to build BeeGFS with NVIDIA GPUDirect Storage support, add
# "NVFS_INCLUDE_PATH=<path>" to the "buildArgs" below, where path is the directory
# that contains nvfs-dma.h. This is usually the nvidia-fs source directory:
# /usr/src/nvidia-fs-VERSION.
#
# If config-host.h is not present in NVFS_INCLUDE_PATH, execute the configure
# script. Example:
# $ cd /usr/src/nvidia-fs-2.13.5
# $ ./configure
#
# NVIDIA_INCLUDE_PATH must be defined and point to the NVIDIA driver source:
# /usr/src/nvidia-VERSION/nvidia
#
# OFED_INCLUDE_PATH must be defined and point to Mellanox OFED.
#

#
# --- Section: [Build Settings] ---
#

# Build Settings
# ==============
# These are the arguments for the client module "make" command.
#
# Note: Quotation marks and equal signs can be used without escape characters
# here.
#
# Example1:
#  buildArgs=-j8
#
# Example2 (see "RDMA Support Notes" above):
#  buildArgs=-j8 OFED_INCLUDE_PATH=/usr/src/openib/include
#
# Example3 (see "NVIDIA GPUDirect Storage Support Notes" above):
#  buildArgs=-j8 OFED_INCLUDE_PATH=/usr/src/ofa_kernel/default/include \
#    NVFS_INCLUDE_PATH=/usr/src/nvidia-fs-2.13.5 \
#    NVIDIA_INCLUDE_PATH=/usr/src/nvidia-520.61.05/nvidia
#
# Default:
#  buildArgs=-j8

buildArgs=-j8 OFED_INCLUDE_PATH=/usr/src/ofa_kernel/default/include

# Turn Autobuild on/off
# =====================
# Controls whether modules will be built on "/etc/init.d/beegfs-client start".
#
# Note that even if autobuild is enabled here, the modules will only be built
# if no beegfs kernel module for the current kernel version exists in
# "/lib/modules/<kernel_version>/updates/".
#
# Default:
#  buildEnabled=true

buildEnabled=true
[root@server6 ~]# cat /etc/beegfs/beegfs-client.conf 
# This is a config file for BeeGFS clients.
# http://www.beegfs.com

# --- [Table of Contents] ---
#
# 1) Settings
# 2) Mount Options
# 3) Basic Settings Documentation
# 4) Advanced Settings Documentation

#
# --- Section 1.1: [Basic Settings] ---
#

sysMgmtdHost                  = server5

#
# --- Section 1.2: [Advanced Settings] ---
#

connAuthFile                  =
connDisableAuthentication     = true
connClientPortUDP             = 8004
connHelperdPortTCP            = 8006
connMgmtdPortTCP              = 8008
connMgmtdPortUDP              = 8008
connPortShift                 = 0

connCommRetrySecs             = 600
connFallbackExpirationSecs    = 900
connInterfacesFile            = /etc/beegfs/interface.conf
connRDMAInterfacesFile        = /etc/beegfs/interface.conf
connMaxInternodeNum           = 12
connMaxConcurrentAttempts     = 0
connNetFilterFile             = /etc/beegfs/network.conf

connUseRDMA                   = true
connTCPFallbackEnabled        = true
connTCPRcvBufSize             = 0
connUDPRcvBufSize             = 0
connRDMABufNum                = 70
connRDMABufSize               = 8192
connRDMATypeOfService         = 0
connTcpOnlyFilterFile         =

logClientID                   = false
logHelperdIP                  =
logLevel                      = 3
logType                       = helperd

quotaEnabled                  = false

sysCreateHardlinksAsSymlinks  = false
sysMountSanityCheckMS         = 11000
sysSessionCheckOnClose        = false
sysSyncOnClose                = false
sysTargetOfflineTimeoutSecs   = 900
sysUpdateTargetStatesSecs     = 30
sysXAttrsEnabled              = false

tuneFileCacheType             = buffered
tunePreferredMetaFile         =
tunePreferredStorageFile      =
tuneRemoteFSync               = true
tuneUseGlobalAppendLocks      = false
tuneUseGlobalFileLocks        = false

#
# --- Section 1.3: [Enterprise Features] ---
#
# See end-user license agreement for definition and usage limitations of
# enterprise features.
#

sysACLsEnabled                = false
[root@server6 ~]# mkdir /mnt/beegfs
[root@server6 ~]# cat /etc/beegfs/beegfs-mounts.conf 
/mnt/beegfs /etc/beegfs/beegfs-client.conf
[root@server6 ~]# cat /etc/beegfs/interface.conf 
enp7s0
[root@server6 ~]# cat /etc/beegfs/network.conf 
172.16.8.0/24
[root@server6 ~]# /etc/init.d/beegfs-client rebuild
[root@server6 ~]# systemctl restart beegfs-client
[root@server6 ~]# systemctl status beegfs-client
● beegfs-client.service - Start BeeGFS Client
   Loaded: loaded (/usr/lib/systemd/system/beegfs-client.service; enabled; vendor preset: disabled)
   Active: active (exited) since Tue 2023-06-27 19:25:17 CST; 18min ago
  Process: 22301 ExecStop=/etc/init.d/beegfs-client stop (code=exited, status=0/SUCCESS)
  Process: 22323 ExecStart=/etc/init.d/beegfs-client start (code=exited, status=0/SUCCESS)
 Main PID: 22323 (code=exited, status=0/SUCCESS)

Jun 27 19:25:17 server6 systemd[1]: Starting Start BeeGFS Client...
Jun 27 19:25:17 server6 beegfs-client[22323]: Starting BeeGFS Client:
Jun 27 19:25:17 server6 beegfs-client[22323]: - Loading BeeGFS modules
Jun 27 19:25:17 server6 beegfs-client[22323]: - Mounting directories from /etc/beegfs/beegfs-mounts.conf
Jun 27 19:25:17 server6 systemd[1]: Started Start BeeGFS Client.
[root@server6 ~]# lsmod | grep beegfs
beegfs                540672  1
rdma_cm               118784  2 beegfs,rdma_ucm
ib_core               425984  9 beegfs,rdma_cm,ib_ipoib,iw_cm,ib_umad,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm
mlx_compat             16384  12 beegfs,rdma_cm,ib_ipoib,mlxdevm,iw_cm,ib_umad,ib_core,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm,mlx5_core
[root@server6 ~]# 
5.1.3.3 Configure BeeGFS

Storage space allocation on Server4 and Server5.

[root@server4 ~]# mkdir -p /mnt/beegfs/{mgmtd,meta,storage}
[root@server4 ~]# fdisk  -l /dev/nvme0n1
Disk /dev/nvme0n1: 953.87 GiB, 1024209543168 bytes, 2000409264 sectors
Disk model: ZHITAI TiPlus5000 1TB
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 090F6714-0F4E-E543-8293-10A0405490DE

Device              Start        End    Sectors  Size Type
/dev/nvme0n1p1       2048  104859647  104857600   50G Linux filesystem
/dev/nvme0n1p2  104859648  209717247  104857600   50G Linux filesystem
/dev/nvme0n1p3  209717248 1258293247 1048576000  500G Linux filesystem
[root@server4 ~]# mkfs.ext4 /dev/nvme0n1p1
[root@server4 ~]# mkfs.ext4 /dev/nvme0n1p2
[root@server4 ~]# mkfs.xfs /dev/nvme0n1p3
[root@server4 ~]# mount /dev/nvme0n1p1 /mnt/beegfs/mgmtd/
[root@server4 ~]# mount /dev/nvme0n1p2 /mnt/beegfs/meta/
[root@server4 ~]# mount /dev/nvme0n1p3 /mnt/beegfs/storage/

[root@server5 ~]# mkdir -p /mnt/beegfs/{mgmtd,meta,storage}
[root@server5 ~]# fdisk -l /dev/nvme0n1
Disk /dev/nvme0n1: 953.87 GiB, 1024209543168 bytes, 2000409264 sectors
Disk model: ZHITAI TiPlus5000 1TB
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: A64F55F2-0650-8A40-BE56-BC451387B729

Device              Start        End    Sectors  Size Type
/dev/nvme0n1p1       2048  104859647  104857600   50G Linux filesystem
/dev/nvme0n1p2  104859648  209717247  104857600   50G Linux filesystem
/dev/nvme0n1p3  209717248 1258293247 1048576000  500G Linux filesystem
[root@server5 ~]# mkfs.ext4 /dev/nvme0n1p1
[root@server5 ~]# mkfs.ext4 /dev/nvme0n1p2
[root@server5 ~]# mkfs.xfs /dev/nvme0n1p3
[root@server5 ~]# mount /dev/nvme0n1p1 /mnt/beegfs/mgmtd/
[root@server5 ~]# mount /dev/nvme0n1p2 /mnt/beegfs/meta/
[root@server5 ~]# mount /dev/nvme0n1p3 /mnt/beegfs/storage/ 
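The mounts above are not persistent. One way to make them survive reboots is /etc/fstab entries on each storage node, matching the partitions created above:

# /etc/fstab additions on server4/server5 (sketch)
/dev/nvme0n1p1  /mnt/beegfs/mgmtd    ext4  defaults  0 0
/dev/nvme0n1p2  /mnt/beegfs/meta     ext4  defaults  0 0
/dev/nvme0n1p3  /mnt/beegfs/storage  xfs   defaults  0 0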

Management service configuration.

[root@server5 ~]# /opt/beegfs/sbin/beegfs-setup-mgmtd -p /mnt/beegfs/mgmtd
[root@server5 ~]# systemctl restart beegfs-mgmtd
[root@server5 ~]# systemctl status beegfs-mgmtd
● beegfs-mgmtd.service - BeeGFS Management Server
     Loaded: loaded (/usr/lib/systemd/system/beegfs-mgmtd.service; enabled; vendor preset: disabled)
     Active: active (running) since Sun 2023-06-25 11:22:00 CST; 2 days ago
       Docs: http://www.beegfs.com/content/documentation/
   Main PID: 18739 (beegfs-mgmtd/Ma)
      Tasks: 11 (limit: 45464)
     Memory: 13.9M
     CGroup: /system.slice/beegfs-mgmtd.service
             └─ 18739 /opt/beegfs/sbin/beegfs-mgmtd cfgFile=/etc/beegfs/beegfs-mgmtd.conf runDaemonized=false

Jun 25 11:22:00 server5 systemd[1]: Started BeeGFS Management Server.
[root@server5 ~]# 

Metadata service configuration.

Server4
[root@server4 ~]# /opt/beegfs/sbin/beegfs-setup-meta -p /mnt/beegfs/meta -s 54 -m server5
[root@server4 ~]# systemctl restart beegfs-meta
[root@server4 ~]# systemctl status beegfs-meta
● beegfs-meta.service - BeeGFS Metadata Server
     Loaded: loaded (/usr/lib/systemd/system/beegfs-meta.service; enabled; vendor preset: disabled)
     Active: active (running) since Sun 2023-06-25 16:31:57 CST; 2 days ago
       Docs: http://www.beegfs.com/content/documentation/
   Main PID: 4444 (beegfs-meta/Mai)
      Tasks: 63 (limit: 45901)
     Memory: 2.2G
     CGroup: /system.slice/beegfs-meta.service
             └─ 4444 /opt/beegfs/sbin/beegfs-meta cfgFile=/etc/beegfs/beegfs-meta.conf runDaemonized=false

Jun 25 16:31:57 server4 systemd[1]: Started BeeGFS Metadata Server.
[root@server4 ~]# 

Server5
[root@server5 ~]# /opt/beegfs/sbin/beegfs-setup-meta -p /mnt/beegfs/meta -s 55 -m server5
[root@server5 ~]# systemctl restart beegfs-meta
[root@server5 ~]# systemctl status beegfs-meta
● beegfs-meta.service - BeeGFS Metadata Server
     Loaded: loaded (/usr/lib/systemd/system/beegfs-meta.service; enabled; vendor preset: disabled)
     Active: active (running) since Sun 2023-06-25 11:22:16 CST; 2 days ago
       Docs: http://www.beegfs.com/content/documentation/
   Main PID: 18763 (beegfs-meta/Mai)
      Tasks: 87 (limit: 45464)
     Memory: 1.7G
     CGroup: /system.slice/beegfs-meta.service
             └─ 18763 /opt/beegfs/sbin/beegfs-meta cfgFile=/etc/beegfs/beegfs-meta.conf runDaemonized=false

Jun 25 11:22:16 server5 systemd[1]: Started BeeGFS Metadata Server.
[root@server5 ~]# 

Storage service configuration.

Server4
[root@server4 ~]# /opt/beegfs/sbin/beegfs-setup-storage -p /mnt/beegfs/storage -s 540 -i 5401 -m server5 -f
[root@server4 ~]# systemctl restart beegfs-storage
[root@server4 ~]# systemctl status beegfs-storage
● beegfs-storage.service - BeeGFS Storage Server
     Loaded: loaded (/usr/lib/systemd/system/beegfs-storage.service; enabled; vendor preset: disabled)
     Active: active (running) since Sun 2023-06-25 15:46:33 CST; 2 days ago
       Docs: http://www.beegfs.com/content/documentation/
   Main PID: 4197 (beegfs-storage/)
      Tasks: 21 (limit: 45901)
     Memory: 118.4M
     CGroup: /system.slice/beegfs-storage.service
             └─ 4197 /opt/beegfs/sbin/beegfs-storage cfgFile=/etc/beegfs/beegfs-storage.conf runDaemonized=false

Jun 25 15:46:33 server4 systemd[1]: Started BeeGFS Storage Server.
[root@server4 ~]# 

Server5
[root@server5 ~]# /opt/beegfs/sbin/beegfs-setup-storage -p /mnt/beegfs/storage -s 550 -i 5501 -m server5 -f
[root@server5 ~]# systemctl restart beegfs-storage.service 
[root@server5 ~]# systemctl status beegfs-storage.service 
● beegfs-storage.service - BeeGFS Storage Server
     Loaded: loaded (/usr/lib/systemd/system/beegfs-storage.service; enabled; vendor preset: disabled)
     Active: active (running) since Sun 2023-06-25 11:29:58 CST; 2 days ago
       Docs: http://www.beegfs.com/content/documentation/
   Main PID: 18901 (beegfs-storage/)
      Tasks: 21 (limit: 45464)
     Memory: 124.8M
     CGroup: /system.slice/beegfs-storage.service
             └─ 18901 /opt/beegfs/sbin/beegfs-storage cfgFile=/etc/beegfs/beegfs-storage.conf runDaemonized=false

Jun 25 11:29:58 server5 systemd[1]: Started BeeGFS Storage Server.
[root@server5 ~]# 
5.1.3.4 Status check
[root@server6 ~]# beegfs-check-servers 
Management
==========
server5 [ID: 1]: reachable at 172.16.8.55:8008 (protocol: TCP)

Metadata
==========
server4 [ID: 54]: reachable at 172.16.8.54:8005 (protocol: RDMA)
server5 [ID: 55]: reachable at 172.16.8.55:8005 (protocol: RDMA)

Storage
==========
server4 [ID: 540]: reachable at 172.16.8.54:8003 (protocol: RDMA)
server5 [ID: 550]: reachable at 172.16.8.55:8003 (protocol: RDMA)

[root@server6 ~]# beegfs-df
METADATA SERVERS:
TargetID   Cap. Pool        Total         Free    %      ITotal       IFree    %
========   =========        =====         ====    =      ======       =====    =
      54         low      48.9GiB      48.7GiB  99%        3.3M        3.2M  98%
      55         low      48.9GiB      48.7GiB  99%        3.3M        3.2M  98%

STORAGE TARGETS:
TargetID   Cap. Pool        Total         Free    %      ITotal       IFree    %
========   =========        =====         ====    =      ======       =====    =
    5401         low     499.8GiB     464.2GiB  93%      262.1M      262.1M 100%
    5501         low     499.8GiB     464.2GiB  93%      262.1M      262.1M 100%
[root@server6 ~]# beegfs-ctl --listnodes --nodetype=meta --nicdetails
server4 [ID: 54]
   Ports: UDP: 8005; TCP: 8005
   Interfaces: 
   + enp1s0f0[ip addr: 172.16.8.54; type: RDMA]
   + enp1s0f0[ip addr: 172.16.8.54; type: TCP]
server5 [ID: 55]
   Ports: UDP: 8005; TCP: 8005
   Interfaces: 
   + enp1s0f0[ip addr: 172.16.8.55; type: RDMA]
   + enp1s0f0[ip addr: 172.16.8.55; type: TCP]

Number of nodes: 2
Root: 55
[root@server6 ~]# beegfs-ctl --listnodes --nodetype=storage --nicdetails
server4 [ID: 540]
   Ports: UDP: 8003; TCP: 8003
   Interfaces: 
   + enp1s0f0[ip addr: 172.16.8.54; type: RDMA]
   + enp1s0f0[ip addr: 172.16.8.54; type: TCP]
server5 [ID: 550]
   Ports: UDP: 8003; TCP: 8003
   Interfaces: 
   + enp1s0f0[ip addr: 172.16.8.55; type: RDMA]
   + enp1s0f0[ip addr: 172.16.8.55; type: TCP]

Number of nodes: 2
[root@server6 ~]# beegfs-ctl --listnodes --nodetype=client --nicdetails
5751-649AC71D-server6 [ID: 8]
   Ports: UDP: 8004; TCP: 0
   Interfaces: 
   + enp7s0[ip addr: 172.16.8.56; type: TCP]
   + enp7s0[ip addr: 172.16.8.56; type: RDMA]

Number of nodes: 1
[root@server6 ~]# beegfs-net 

mgmt_nodes
=============
server5 [ID: 1]
   Connections: TCP: 1 (172.16.8.55:8008); 

meta_nodes
=============
server4 [ID: 54]
   Connections: RDMA: 4 (172.16.8.54:8005); 
server5 [ID: 55]
   Connections: RDMA: 4 (172.16.8.55:8005); 

storage_nodes
=============
server4 [ID: 540]
   Connections: RDMA: 4 (172.16.8.54:8003); 
server5 [ID: 550]
   Connections: RDMA: 4 (172.16.8.55:8003); 

[root@server6 ~]# 

5.1.4 Mount test

[root@server6 ~]# df -h
Filesystem              Size  Used Avail Use% Mounted on
devtmpfs                1.8G     0  1.8G    0% /dev
tmpfs                   1.8G  4.0K  1.8G    1% /dev/shm
tmpfs                   1.8G  8.7M  1.8G    1% /run
tmpfs                   1.8G     0  1.8G    0% /sys/fs/cgroup
/dev/mapper/rocky-root  9.0G  6.8G  2.2G   76% /
/dev/vda2               994M  431M  564M   44% /boot
/dev/vda1                99M  5.8M   94M    6% /boot/efi
tmpfs                   367M     0  367M    0% /run/user/0
beegfs_nodev           1000G   72G  929G    8% /mnt/beegfs
[root@server6 ~]# 
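As a quick functional check, a file can be written into the mount and its stripe placement inspected; the file size is arbitrary:

# Write a 100 MiB test file, then show its stripe pattern and targets
[root@server6 ~]# dd if=/dev/zero of=/mnt/beegfs/testfile bs=1M count=100
[root@server6 ~]# beegfs-ctl --getentryinfo /mnt/beegfs/testfile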

5.1.5 Install IO500 (IOR & mdtest)

Install OpenMPI.

[root@server6 ~]# mkdir iobench_tools
# Download the source tarball
[root@server6 iobench_tools]# wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.1.tar.gz
[root@server6 iobench_tools]# tar xvf openmpi-4.1.1.tar.gz
[root@server6 iobench_tools]# cd openmpi-4.1.1
# Build and install
[root@server6 openmpi-4.1.1]# yum install automake gcc gcc-c++ gcc-gfortran
[root@server6 openmpi-4.1.1]# mkdir /usr/local/openmpi
[root@server6 openmpi-4.1.1]# ./configure --prefix=/usr/local/openmpi/
[root@server6 openmpi-4.1.1]# make
[root@server6 openmpi-4.1.1]# make install
# Configure environment variables
[root@server6 openmpi-4.1.1]# export PATH=$PATH:/usr/local/openmpi/bin
[root@server6 openmpi-4.1.1]# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/openmpi/lib
# Verify the installation
[root@server6 openmpi-4.1.1]# mpirun --version
mpirun (Open MPI) 4.1.1

Report bugs to http://www.open-mpi.org/community/help/
# Run a test
[root@server6 openmpi-4.1.1]# cd ..
[root@server6 iobench_tools]# echo '#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    printf("Hello from process %d\n", world_rank);
    MPI_Finalize();
    return 0;
}' > mpi_hello.c
[root@server6 iobench_tools]# mpicc mpi_hello.c -o mpi_hello
[root@server6 iobench_tools]# mpirun --allow-run-as-root -mca btl ^openib -n 2 ./mpi_hello
Hello from process 0
Hello from process 1
[root@server6 iobench_tools]# 
# Add the environment variables to the user's bashrc
[root@server6 iobench_tools]# tail ~/.bashrc
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi

export PATH=$PATH:/usr/local/openmpi/bin:/usr/local/ior/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/openmpi/lib:/usr/local/ior/lib
export MPI_CC=mpicc

export OMPI_ALLOW_RUN_AS_ROOT=1
export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1
[root@server6 iobench_tools]# source ~/.bashrc

Install IOR (mdtest).

# Download the source code
[root@server6 iobench_tools]# yum install git
[root@server6 iobench_tools]# git clone https://github.com/hpc/ior.git
[root@server6 iobench_tools]# cd ior/
# Build and install
[root@server6 ior]# ./bootstrap 
[root@server6 ior]# mkdir /usr/local/ior
[root@server6 ior]# ./configure --prefix=/usr/local/ior/
[root@server6 ior]# make
[root@server6 ior]# make install
# Add the environment variables to the user's bashrc
[root@server6 ior]# tail ~/.bashrc
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi

export PATH=$PATH:/usr/local/openmpi/bin:/usr/local/ior/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/openmpi/lib:/usr/local/ior/lib
export MPI_CC=mpicc

export OMPI_ALLOW_RUN_AS_ROOT=1
export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1
[root@server6 ior]# source ~/.bashrc 
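Before the full runs in Section 4, a single-process smoke test confirms the binaries work against the BeeGFS mount; the sizes are illustrative:

# IOR write+read smoke test (one process, small footprint)
[root@server6 ior]# ior -w -r -t 1m -b 16m -o /mnt/beegfs/ior_smoke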

Install IO500.

# Download the source
[root@server6 iobench_tools]# git clone https://github.com/IO500/io500.git
[root@server6 iobench_tools]# cd io500
# Build and install
[root@server6 io500]# ./prepare.sh
# Dump all available configuration options
[root@server6 io500]# ./io500 --list > config-all.ini
# Custom test configuration
[root@server6 io500]# cat config-beegfs.ini 
[global]
datadir = /mnt/beegfs/io500
timestamp-datadir = TRUE
resultdir = ./results
timestamp-resultdir = TRUE
api = POSIX
drop-caches = FALSE
drop-caches-cmd = sudo -n bash -c "echo 3 > /proc/sys/vm/drop_caches"
io-buffers-on-gpu = FALSE
verbosity = 1
scc = TRUE
dataPacketType = timestamp

[debug]
stonewall-time = 30

[ior-easy]
API = 
transferSize = 1m
blockSize = 204800m
filePerProc = TRUE
uniqueDir = FALSE
run = TRUE
verbosity = 

[ior-easy-write]
API = 
run = TRUE

[mdtest-easy]
API = 
n = 500000
run = TRUE

[mdtest-easy-write]
API = 
run = TRUE

[find-easy]
external-script = 
external-mpi-args = 
external-extra-args = 
nproc = 
run = TRUE
pfind-queue-length = 10000
pfind-steal-next = FALSE
pfind-parallelize-single-dir-access-using-hashing = FALSE

[ior-hard]
API = 
segmentCount = 500000
collective = 
run = TRUE
verbosity = 

[ior-hard-write]
API = 
collective = 
run = TRUE

[mdtest-hard]
API = 
n = 500000
files-per-dir = 
run = TRUE

[mdtest-hard-write]
API = 
run = TRUE

[find]
external-script = 
external-mpi-args = 
external-extra-args = 
nproc = 
run = TRUE
pfind-queue-length = 10000
pfind-steal-next = FALSE
pfind-parallelize-single-dir-access-using-hashing = FALSE

[find-hard]
external-script = 
external-mpi-args = 
external-extra-args = 
nproc = 
run = FALSE
pfind-queue-length = 10000
pfind-steal-next = FALSE
pfind-parallelize-single-dir-access-using-hashing = FALSE

[mdworkbench-bench]
run = FALSE

[ior-easy-read]
API = 
run = TRUE

[mdtest-easy-stat]
API = 
run = TRUE

[ior-hard-read]
API = 
collective = 
run = TRUE

[mdtest-hard-stat]
API = 
run = TRUE

[mdtest-easy-delete]
API = 
run = TRUE

[mdtest-hard-read]
API = 
run = TRUE

[mdtest-hard-delete]
API = 
run = TRUE

[root@server6 io500]# 

5.1.6 Install dbench

[root@server6 iobench_tools]# yum install dbench

5.2 Switch

5.2.1 CX532P-N Configuration

532# show running-config 
!
class-map ecn_map
 match cos 3 4
!
vlan 456
!
policy-map ecn
 class ecn_map
  wred default_ecn
!
interface ethernet 0/16
 breakout 4x25G[10G]
 service-policy ecn
 switchport access vlan 456
exit
!
interface ethernet 0/17
 service-policy ecn
 switchport access vlan 456
exit
!
interface ethernet 0/18
 service-policy ecn
 switchport access vlan 456
exit
!
interface ethernet 0/19
 service-policy ecn
 switchport access vlan 456
exit
!
ip route 0.0.0.0/0 10.230.1.1 200
!
end

532# show interface priority-flow-control 
       Port    PFC0    PFC1    PFC2    PFC3    PFC4    PFC5    PFC6    PFC7
-----------  ------  ------  ------  ------  ------  ------  ------  ------
        0/0       -       -       -  enable  enable       -       -       -
        0/4       -       -       -  enable  enable       -       -       -
        0/8       -       -       -  enable  enable       -       -       -
       0/12       -       -       -  enable  enable       -       -       -
       0/16       -       -       -  enable  enable       -       -       -
       0/17       -       -       -  enable  enable       -       -       -
       0/18       -       -       -  enable  enable       -       -       -
       0/19       -       -       -  enable  enable       -       -       -
       0/20       -       -       -  enable  enable       -       -       -
       0/24       -       -       -  enable  enable       -       -       -
       0/28       -       -       -  enable  enable       -       -       -
       0/32       -       -       -  enable  enable       -       -       -
       0/36       -       -       -  enable  enable       -       -       -
       0/40       -       -       -  enable  enable       -       -       -
       0/44       -       -       -  enable  enable       -       -       -
       0/48       -       -       -  enable  enable       -       -       -
       0/52       -       -       -  enable  enable       -       -       -
       0/56       -       -       -  enable  enable       -       -       -
       0/60       -       -       -  enable  enable       -       -       -
       0/64       -       -       -  enable  enable       -       -       -
       0/68       -       -       -  enable  enable       -       -       -
       0/72       -       -       -  enable  enable       -       -       -
       0/76       -       -       -  enable  enable       -       -       -
       0/80       -       -       -  enable  enable       -       -       -
       0/84       -       -       -  enable  enable       -       -       -
       0/88       -       -       -  enable  enable       -       -       -
       0/92       -       -       -  enable  enable       -       -       -
       0/96       -       -       -  enable  enable       -       -       -
      0/100       -       -       -  enable  enable       -       -       -
      0/104       -       -       -  enable  enable       -       -       -
      0/108       -       -       -  enable  enable       -       -       -
      0/112       -       -       -  enable  enable       -       -       -
      0/116       -       -       -  enable  enable       -       -       -
      0/120       -       -       -  enable  enable       -       -       -
      0/124       -       -       -  enable  enable       -       -       -
 
532# show interface ecn
       Port    ECN0    ECN1    ECN2    ECN3    ECN4    ECN5    ECN6    ECN7
-----------  ------  ------  ------  ------  ------  ------  ------  ------
        0/0       -       -       -       -       -       -       -       -
        0/4       -       -       -       -       -       -       -       -
        0/8       -       -       -       -       -       -       -       -
       0/12       -       -       -       -       -       -       -       -
       0/16       -       -       -  enable  enable       -       -       -
       0/17       -       -       -  enable  enable       -       -       -
       0/18       -       -       -  enable  enable       -       -       -
       0/19       -       -       -  enable  enable       -       -       -
       0/20       -       -       -       -       -       -       -       -
       0/24       -       -       -       -       -       -       -       -
       0/28       -       -       -       -       -       -       -       -
       0/32       -       -       -       -       -       -       -       -
       0/36       -       -       -       -       -       -       -       -
       0/40       -       -       -       -       -       -       -       -
       0/44       -       -       -       -       -       -       -       -
       0/48       -       -       -       -       -       -       -       -
       0/52       -       -       -       -       -       -       -       -
       0/56       -       -       -       -       -       -       -       -
       0/60       -       -       -       -       -       -       -       -
       0/64       -       -       -       -       -       -       -       -
       0/68       -       -       -       -       -       -       -       -
       0/72       -       -       -       -       -       -       -       -
       0/76       -       -       -       -       -       -       -       -
       0/80       -       -       -       -       -       -       -       -
       0/84       -       -       -       -       -       -       -       -
       0/88       -       -       -       -       -       -       -       -
       0/92       -       -       -       -       -       -       -       -
       0/96       -       -       -       -       -       -       -       -
      0/100       -       -       -       -       -       -       -       -
      0/104       -       -       -       -       -       -       -       -
      0/108       -       -       -       -       -       -       -       -
      0/112       -       -       -       -       -       -       -       -
      0/116       -       -       -       -       -       -       -       -
      0/120       -       -       -       -       -       -       -       -
      0/124       -       -       -       -       -       -       -       -

532# 

