Load-balance-paths yes

I configured two paths for load balancing and failover, but replication traffic goes through only a single path.

2 nodes, Rocky 9.4 (5.14.0-427.42.1.el9_4.x86_64), drbd-9.28.0-1

[root@memverge ~]# cat /etc/drbd.d/resource0.res
resource resource0 {
  net {
    load-balance-paths  yes;
    protocol            C;
#   fencing             resource-and-stonith;
  }
  handlers {
    fence-peer          "/usr/lib/drbd/crm-fence-peer.9.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";
  }
  on "memverge" {
    device    /dev/drbd0;
    disk      /dev/md0;
    meta-disk internal;
#   address   10.72.14.152:7789;
    node-id   0;
  }
  on "memverge2" {
    device    /dev/drbd0;
    disk      /dev/md0;
    meta-disk internal;
#   address   10.72.14.154:7789;
    node-id   1;
  }
  connection {
    path {
      host "memverge"  address ipv4 192.168.0.6:7900;
      host "memverge2" address ipv4 192.168.0.8:7900;
    }
    path {
      host "memverge"  address ipv4 192.168.0.7:7900;
      host "memverge2" address ipv4 192.168.0.9:7900;
    }
  }
}
[root@memverge ~]#

What am I doing wrong?

Is the drbd_transport_lb_tcp kernel module loaded?

# lsmod | grep drbd
drbd_transport_tcp     28672  1
handshake              28672  1 drbd_transport_tcp
drbd                  700416  3 drbd_transport_tcp
lru_cache              16384  1 drbd
libcrc32c              16384  3 dm_persistent_data,drbd,sctp

# modprobe drbd_transport_lb-tcp

# lsmod | grep drbd
drbd_transport_lb_tcp    28672  0
drbd_transport_tcp     28672  1
handshake              28672  1 drbd_transport_tcp
drbd                  700416  4 drbd_transport_lb_tcp,drbd_transport_tcp
lru_cache              16384  1 drbd
libcrc32c              16384  3 dm_persistent_data,drbd,sctp

I haven't configured load-balanced resources on the host I took those outputs from; just FYI, in case there are some differences.

Or maybe you have yet to adjust your resources (drbdadm adjust all)?
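For reference, a quick way to re-apply and then inspect things (just a sketch, using the resource name resource0 from your config; drbdsetup show prints what the kernel currently has, so the net options and paths should be visible there):

# modprobe drbd_transport_lb_tcp
# drbdadm adjust resource0
# drbdsetup show resource0
# drbdadm status resource0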

There is no drbd_transport_lb-tcp module available.

[root@memverge ~]# modprobe drbd_transport_lb-tcp
modprobe: FATAL: Module drbd_transport_lb-tcp not found in directory /lib/modules/5.14.0-427.42.1.el9_4.x86_64
[root@memverge ~]#

[root@memverge ~]# find / -name drbd_transport_*
/sys/kernel/debug/printk/index/drbd_transport_tcp
/sys/module/drbd_transport_tcp
/sys/module/drbd/holders/drbd_transport_tcp
/usr/lib/modules/5.14.0-427.13.1.el9_4.x86_64/extra/drbd9x/drbd_transport_tcp.ko
/usr/lib/modules/5.14.0-427.42.1.el9_4.x86_64/weak-updates/drbd9x/drbd_transport_tcp.ko
[root@memverge ~]#

I have the following DRBD packages installed:

[root@memverge ~]# rpm -qa|grep -i drbd
drbd-selinux-9.28.0-1.el9.x86_64
drbd-utils-9.28.0-1.el9.x86_64
drbd-udev-9.28.0-1.el9.x86_64
collectd-drbd-5.12.0-24.el9.x86_64
drbd-9.28.0-1.el9.x86_64
drbd-bash-completion-9.28.0-1.el9.x86_64
drbd-pacemaker-9.28.0-1.el9.x86_64
drbdlinks-1.29-5.el9.noarch
kmod-drbd9x-9.1.22-1.el9_4.elrepo.x86_64
[root@memverge ~]#

Aha, the load-balanced TCP module wasn’t added until DRBD 9.2.6.

If ELRepo doesn’t have a newer kernel module package and you really want to try the load-balanced TCP transport, you can download and compile a 9.2.6 (or newer) kmod from LINBIT’s prepared source code tarballs (or reach out to LINBIT for eval access to their customer repos).
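Roughly, building the out-of-tree kmod looks like this (a sketch, using 9.2.6 as the example version; it assumes the DRBD source tarball from LINBIT, plus kernel-devel, gcc, and make for your running kernel):

# tar xzf drbd-9.2.6.tar.gz
# cd drbd-9.2.6
# make clean all
# make install
# depmod -a
# modprobe drbd_transport_lb_tcp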

Ok, thank you Matt.

I’m surprised that ELRepo carries such an old version…

I compiled the latest stable 9.2.11 and set it to load at boot on both nodes.
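For anyone who finds this later, the load-at-boot part is just a one-liner on each node (assuming systemd's modules-load.d mechanism):

# echo drbd_transport_lb_tcp > /etc/modules-load.d/drbd_transport_lb_tcp.conf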

After the nodes boot, I see this on both nodes:

lsmod | grep drbd
drbd_transport_lb_tcp 40960 1
drbd 1007616 3 drbd_transport_lb_tcp
libcrc32c 16384 2 xfs,drbd

Everything looks good, but when I generated replication traffic with the simplest test

rsync -av --progress /home/anton/testfile /mnt/testfile

the replication traffic was spread slightly unevenly across the two paths, but that's OK.

I checked with iptraf-ng, watching the “General interface statistics” view.
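Another quick way to compare the two paths, in case it's useful to others, is to watch the kernel's per-interface byte counters directly (a sketch; the interface names below are made up, so substitute the two ports carrying the replication traffic, and sar needs the sysstat package):

# sar -n DEV 1
# watch -n1 'cat /sys/class/net/enp1s0f0/statistics/tx_bytes /sys/class/net/enp1s0f1/statistics/tx_bytes'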

P.S. The two nodes are mounted in a single rack and directly connected with NVIDIA ConnectX-5 NICs, without any switches.

Yes. When I’ve looked, it’s been pretty close, but never “perfectly even”. If you’re seeing wildly different numbers, then please do let us know.


Not to send you down a totally different path, but DRBD’s RDMA transport balances traffic over multiple paths.
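For completeness, switching to it is roughly a one-line change in the net section (a sketch; it assumes your drbd-utils accept the transport keyword and that a drbd_transport_rdma module is available for your build, with the existing path sections left as they are):

net {
    transport  rdma;
    protocol   C;
}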

I like RDMA, but with RoCE there is a small problem.

How do you monitor RoCE traffic? Standard tools such as nload don't work, because they look at the TCP/IP stack, and with RoCE there is no TCP/IP stack.

So how can I monitor RoCE network traffic?

Something like cat /sys/class/infiniband/mlx5_0/ports/1/counters/port_rcv_data is not very informative…
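One rough option is to sample those counters periodically and compute the rate yourself (a sketch; it assumes the mlx5_0 / port 1 path from above, and that port_rcv_data / port_xmit_data count in 4-byte units, so the delta is multiplied by 4 to get bytes):

#!/bin/bash
# Sample the RoCE port data counters once per second and print an approximate rate.
# ASSUMPTION: device mlx5_0, port 1; port_rcv_data / port_xmit_data are in 4-byte units.
C=/sys/class/infiniband/mlx5_0/ports/1/counters
prev_rx=$(cat $C/port_rcv_data); prev_tx=$(cat $C/port_xmit_data)
while sleep 1; do
    rx=$(cat $C/port_rcv_data); tx=$(cat $C/port_xmit_data)
    echo "rx: $(( (rx - prev_rx) * 4 / 1048576 )) MiB/s   tx: $(( (tx - prev_tx) * 4 / 1048576 )) MiB/s"
    prev_rx=$rx; prev_tx=$tx
done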