Anton
November 13, 2024, 8:04am
1
I configured two paths for load balancing and failover, but replication traffic goes through only a single path.
2 nodes, Rocky 9.4 (5.14.0-427.42.1.el9_4.x86_64), drbd-9.28.0-1
[root@memverge ~]# cat /etc/drbd.d/resource0.res
resource resource0 {
    net {
        load-balance-paths yes;
        protocol C;
        # fencing resource-and-stonith;
    }
    handlers {
        fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";
    }
    on "memverge" {
        device /dev/drbd0;
        disk /dev/md0;
        meta-disk internal;
        # address 10.72.14.152:7789;
        node-id 0;
    }
    on "memverge2" {
        device /dev/drbd0;
        disk /dev/md0;
        meta-disk internal;
        # address 10.72.14.154:7789;
        node-id 1;
    }
    connection {
        path {
            host "memverge" address ipv4 192.168.0.6:7900;
            host "memverge2" address ipv4 192.168.0.8:7900;
        }
        path {
            host "memverge" address ipv4 192.168.0.7:7900;
            host "memverge2" address ipv4 192.168.0.9:7900;
        }
    }
}
[root@memverge ~]#
What am I doing wrong?
kermat
November 13, 2024, 5:19pm
2
Is the drbd_transport_lb_tcp kernel module loaded?
# lsmod | grep drbd
drbd_transport_tcp 28672 1
handshake 28672 1 drbd_transport_tcp
drbd 700416 3 drbd_transport_tcp
lru_cache 16384 1 drbd
libcrc32c 16384 3 dm_persistent_data,drbd,sctp
# modprobe drbd_transport_lb-tcp
# lsmod | grep drbd
drbd_transport_lb_tcp 28672 0
drbd_transport_tcp 28672 1
handshake 28672 1 drbd_transport_tcp
drbd 700416 4 drbd_transport_lb_tcp,drbd_transport_tcp
lru_cache 16384 1 drbd
libcrc32c 16384 3 dm_persistent_data,drbd,sctp
I haven’t configured load balanced resources on the host I took those outputs from, just FYI in case there are some differences.
Or maybe you have yet to adjust (drbdadm adjust all) your resources?
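Put together, the check-and-adjust steps above might look roughly like this (resource name resource0 taken from the config earlier in the thread; the modules-load.d filename is my assumption):

```shell
# Load the load-balancing TCP transport (modprobe accepts '-' or '_')
modprobe drbd_transport_lb_tcp

# Load it on every boot (filename is an assumption; any .conf name works)
echo drbd_transport_lb_tcp > /etc/modules-load.d/drbd_lb_tcp.conf

# Re-apply the configuration so the running resource picks up the change
drbdadm adjust resource0

# Check which transport the connection is actually using
drbdsetup show resource0 | grep -i transport
```

These commands need root and a configured DRBD resource, so treat them as a sketch to adapt rather than a copy-paste recipe.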
Anton
November 14, 2024, 8:34am
3
There is no drbd_transport_lb-tcp module available.
[root@memverge ~]# modprobe drbd_transport_lb-tcp
modprobe: FATAL: Module drbd_transport_lb-tcp not found in directory /lib/modules/5.14.0-427.42.1.el9_4.x86_64
[root@memverge ~]#
[root@memverge ~]# find / -name drbd_transport_*
/sys/kernel/debug/printk/index/drbd_transport_tcp
/sys/module/drbd_transport_tcp
/sys/module/drbd/holders/drbd_transport_tcp
/usr/lib/modules/5.14.0-427.13.1.el9_4.x86_64/extra/drbd9x/drbd_transport_tcp.ko
/usr/lib/modules/5.14.0-427.42.1.el9_4.x86_64/weak-updates/drbd9x/drbd_transport_tcp.ko
[root@memverge ~]#
I have the following DRBD packages installed:
[root@memverge ~]# rpm -qa|grep -i drbd
drbd-selinux-9.28.0-1.el9.x86_64
drbd-utils-9.28.0-1.el9.x86_64
drbd-udev-9.28.0-1.el9.x86_64
collectd-drbd-5.12.0-24.el9.x86_64
drbd-9.28.0-1.el9.x86_64
drbd-bash-completion-9.28.0-1.el9.x86_64
drbd-pacemaker-9.28.0-1.el9.x86_64
drbdlinks-1.29-5.el9.noarch
kmod-drbd9x-9.1.22-1.el9_4.elrepo.x86_64
[root@memverge ~]#
kermat
November 14, 2024, 5:02pm
4
Aha, the load-balanced TCP transport module wasn’t added until DRBD 9.2.6.
If ELRepo doesn’t have a newer kernel module package and you really want to try the load-balanced TCP transport, you can download and compile a 9.2.6 (or newer) kmod from LINBIT’s prepared source code tarballs (or reach out to LINBIT for eval access to their customer repos).
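Building a kmod from one of LINBIT’s source tarballs might look roughly like this (the version number, download URL, and package names are assumptions; check LINBIT’s download area for the current tarball, and match the package names to your distribution):

```shell
# Build prerequisites (package names assume RHEL/Rocky 9)
dnf install -y kernel-devel-"$(uname -r)" gcc make elfutils-libelf-devel

# Fetch and unpack a source tarball (version and URL are assumptions)
curl -LO https://pkg.linbit.com/downloads/drbd/9/drbd-9.2.11.tar.gz
tar xf drbd-9.2.11.tar.gz
cd drbd-9.2.11

# Build against the running kernel and install the modules
make -j"$(nproc)"
make install
depmod -a
modprobe drbd_transport_lb_tcp
```

Remember that an out-of-tree module built this way is tied to the kernel it was built against, so it needs rebuilding after kernel updates.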
Anton
November 15, 2024, 6:29pm
5
Ok, thank you Matt.
I’m surprised that ELRepo ships such an old version…
I compiled the latest stable 9.2.11 and added it at boot on both nodes.
After the nodes boot, I see on both nodes:
lsmod | grep drbd
drbd_transport_lb_tcp 40960 1
drbd 1007616 3 drbd_transport_lb_tcp
libcrc32c 16384 2 xfs,drbd
Everything looks good. When I generated replication traffic with the simplest test,
rsync -av --progress /home/anton/testfile /mnt/testfile
the replication traffic spread slightly unevenly across the two paths, but that’s acceptable.
I checked with iptraf-ng, watching the “General interface statistics” view.
P.S. The two nodes are mounted in a single rack and directly connected with NVIDIA ConnectX-5 NICs, without any switches.
kermat
November 15, 2024, 8:51pm
6
Yes. When I’ve looked, it’s been pretty close, but never “perfectly even”. If you’re seeing wildly different numbers, then please do let us know.
P.S. The two nodes are mounted in a single rack and directly connected with NVIDIA ConnectX-5 NICs, without any switches.
Not to send you down a totally different path, but DRBD’s RDMA transport balances traffic over multiple paths.
Anton
November 15, 2024, 11:10pm
7
I like RDMA, but with RoCE there is a small problem: how do you monitor RoCE traffic? Standard tools such as nload don’t work, because they look at the TCP/IP stack, and with RoCE there is no TCP/IP stack.
So how can one monitor RoCE network traffic?
Something like cat /sys/class/infiniband/mlx5_0/ports/1/counters/port_rcv_data is not very informative…
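One way to turn those raw counters into something readable: port_rcv_data and port_xmit_data count 4-byte words, so sampling them twice and scaling the delta gives a throughput figure. A minimal sketch, assuming device mlx5_0 and port 1 (adjust for your hardware):

```shell
#!/bin/sh
# Rough RoCE throughput monitor based on the RDMA port counters.
# Device name mlx5_0 and port number 1 are assumptions.
CTR=/sys/class/infiniband/mlx5_0/ports/1/counters
INTERVAL=1

# Convert two counter samples (in 4-byte words) over $3 seconds to MB/s
words_to_mbs() { echo $(( ($2 - $1) * 4 / $3 / 1048576 )); }

if [ -d "$CTR" ]; then
    rx1=$(cat "$CTR/port_rcv_data"); tx1=$(cat "$CTR/port_xmit_data")
    sleep "$INTERVAL"
    rx2=$(cat "$CTR/port_rcv_data"); tx2=$(cat "$CTR/port_xmit_data")
    echo "RX: $(words_to_mbs "$rx1" "$rx2" "$INTERVAL") MB/s"
    echo "TX: $(words_to_mbs "$tx1" "$tx2" "$INTERVAL") MB/s"
fi
```

Wrapped in a loop, this gives a crude nload-style readout without touching the TCP/IP stack at all.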