Slowest Sync Ever

Just built a new 2-node cluster and created a 500G resource. The sync speed is extremely slow. After 12+ hours, it is at 65.8% and still running at about 5M/sec.

I have confirmed that the network, hardware, and disks are fast.

[root@ha57a 0]# linstor rg lp rg0
╭────────────────────────────────────────────────╮
┊ Key                                  ┊ Value   ┊
╞════════════════════════════════════════════════╡
┊ DrbdOptions/Net/max-buffers          ┊ 81920   ┊
┊ DrbdOptions/PeerDevice/c-fill-target ┊ 2048    ┊
┊ DrbdOptions/PeerDevice/c-max-rate    ┊ 2048000 ┊
┊ DrbdOptions/Resource/auto-promote    ┊ no      ┊
┊ PeerSlotsNewResource                 ┊ 3       ┊
╰────────────────────────────────────────────────╯

The build is…

Rocky 9.6
DRBD 9.31.0-3.el9
Linstor 1.31.1-1.el9
6 x nvme disks

The resyncs are pretty slow by default, because that is safest. The resync IO is a “background resync”. It’s effectively a “catch-up” after a disconnect, and it runs while new writes and application IO are also occurring. If the resync goes too fast, it starts to slow down application performance, which is usually undesirable. Are you testing and observing sync speeds while the volumes are in use, or while they are idle?

I have a KB article on tuning the resync speeds here: Tuning the DRBD Resync Controller | Knowledge Base

Looking at what you have presently, I would advise you:
Set c-fill-target to 1M. I can’t really explain it, but 1M just seems to always work.
Tune the max-buffers down to 40k.
Verify the c-max-rate. 2048000KiB/s is roughly 17Gbits/s. Can your network support 17 gigabit? Is there any competing traffic on this network? Like, say, the “foreground” application IO replication I mentioned earlier? Setting the c-max-rate too high can result in slower resyncs.
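
If you want to sanity-check that figure yourself, here is a rough shell sketch of the conversion. It assumes the c-max-rate value of 2048000 from your property list is in KiB/s (DRBD's default unit for this option); kib_to_gbit is just a name I made up for the example.

```shell
# Convert a rate in KiB/s to Gbit/s (decimal gigabits).
# kib_to_gbit is a hypothetical helper, not part of DRBD or LINSTOR.
kib_to_gbit() {
    # KiB/s * 1024 bytes/KiB * 8 bits/byte / 1e9 bits/Gbit
    awk -v kib="$1" 'BEGIN { printf "%.2f\n", kib * 1024 * 8 / 1e9 }'
}

kib_to_gbit 2048000   # prints 16.78
```

So the configured ceiling is a bit under 17Gbits/s, which is the number to compare against what the replication link can actually carry.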

Rocky 9.6
DRBD 9.31.0-3.el9
Linstor 1.31.1-1.el9
6 x nvme disks

Please note that DRBD 9.31.0-3.el9 is only the version of the userland utilities you have installed. It is also important to note the kernel module version (the true DRBD software). You can query this via /proc/drbd.
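
If you need to collect that from several nodes, a minimal sketch like the one below pulls just the version number out of /proc/drbd. It assumes the file starts with a "version:" line, as it does on DRBD 9, and drbd_kmod_version is a helper name I made up, not something shipped with drbd-utils.

```shell
# Print the DRBD kernel module version from /proc/drbd,
# or from a file passed as $1 (handy for saved output).
# drbd_kmod_version is a hypothetical helper name.
drbd_kmod_version() {
    awk '/^version:/ { print $2; exit }' "${1:-/proc/drbd}"
}

# Only query the live file when the drbd module is loaded.
if [ -r /proc/drbd ]; then
    drbd_kmod_version
fi
```

Running modinfo -F version drbd should report the same thing for the module that is installed on disk.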

Hi Devin,

This is a new cluster, but we have others, with about 400 DRBD resources between them. We’re accustomed to seeing initial syncs perform at up to 2GB/sec, and rarely lower than about 300MB/sec, so 5MB/sec is painful.

The network is a 100Gb backbone with 25Gb servers, and there’s low utilization and plenty of bandwidth available. iperf3 tests show approx. 24 Gbits. And since this is the first resource on the cluster, nothing is competing for resources.

[root@ha57a ~]# cat /proc/drbd
version: 9.2.13 (api:2/proto:118-122)
GIT-hash: 0457237e0448663529fe161781873b356f17b3c5 build by @buildsystem, 2025-05-13 09:42:39
Transports (api:21): tcp (9.2.13)

I’ll try your recommendations!