Hi, how do I correctly configure synchronization parameters for Proxmox? The thing is that each new virtual disk in Proxmox is a separate DRBD device, and the c-max-rate parameter applies to each resource separately, not in aggregate. So when a large number of resources are synchronizing, even a c-max-rate of a couple of megabytes can create a disk load of about 200Mib/s, but once most of the resources have finished and only the large ones remain, the speed drops sharply.
I have two servers connected by a 10G network dedicated to DRBD. With the most aggressive parameters I managed to occupy about 4 gigabits, but at that point the logical disk presented by the RAID controller could no longer keep up. So I would like to cap the overall throughput so that the disk load is no more than 200Mib/s, while still keeping synchronization as fast as possible, ideally right at 200Mib/s.
Here are my current settings, which sync a single DRBD device quickly, but with a large number of simultaneous syncs the disk bandwidth is no longer enough.
Tell me, is it possible to implement a dynamic sync speed somehow, or is the only option to set less aggressive values for greater stability?
I will clarify that we are talking about storage of several terabytes, which at low values can take about a week to synchronize.
The default unit for the c-* rate settings is KiB/s, so tuning c-max-rate down to 25600 should limit the resync traffic to 200Mib/s (25 MiB/s).
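For example, assuming you apply DRBD options through LINSTOR (a sketch; the option accepts the usual DRBD units and suffixes):

```
# Limit resync to 25600 KiB/s (= 25 MiB/s = 200 Mibit/s) per resource
linstor controller drbd-options --c-max-rate=25600
```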
The default unit for sndbuf-size and rcvbuf-size is bytes, but setting these to 0 lets the Linux kernel size the buffers dynamically. I rarely adjust these settings personally, but it might be worth testing whether setting them to 0 is all you need to make things more stable under heavy I/O load.
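Assuming your LINSTOR version exposes those net options through the same mechanism (worth verifying against linstor controller drbd-options --help), that would look something like:

```
# Let the kernel size the DRBD socket buffers dynamically
linstor controller drbd-options --sndbuf-size=0 --rcvbuf-size=0
```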
AFAIK, there is no "aggregate bandwidth" setting which would let you limit the total bandwidth of all DRBD replication on a given node. It might be possible to do it with tc traffic shaping, I suppose.
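If you want to experiment with that, a token-bucket filter on the dedicated replication interface would be the simplest form. Note that it caps all DRBD traffic on that link, live replication writes included, not just resync. A sketch, where the interface name ens1f0 is a placeholder for your 10G DRBD link:

```
# Cap everything leaving the DRBD replication interface at ~1.6 Gbit/s,
# i.e. roughly 200 MiB/s of aggregate traffic across all resources.
tc qdisc add dev ens1f0 root tbf rate 1600mbit burst 1m latency 50ms

# Remove the cap again:
tc qdisc del dev ens1f0 root
```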
However, your storage shouldn't be falling over if you throw a ton of writes at it: the writes should simply take place at whatever speed your drives can handle. If your storage is falling over, then that's a separate problem that needs explaining and investigating. What exactly do you mean by "the disk bandwidth is no longer enough"? Enough for what?
Apart from that, the only reason you're limiting sync speed in the first place is to leave some spare I/O capacity for user operations, and DRBD already does dynamic adjustment for that: with the dynamic resync controller active, resync is throttled back toward c-min-rate whenever application I/O is detected.
Therefore, as long as you have dynamic resync enabled (which you do, with a non-zero value for c-plan-ahead) you should be fine.
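For reference, the dynamic resync controller is driven by a handful of options in the disk section of a resource; a sketch with purely illustrative numbers, not a recommendation for your hardware:

```
disk {
    c-plan-ahead   20;    # non-zero enables the dynamic resync controller
    c-fill-target  1M;    # how much resync data to keep in flight on the wire
    c-max-rate     100M;  # hard per-resource ceiling on resync traffic
    c-min-rate     250k;  # resync is throttled toward this while application I/O is active
}
```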
Also, note that Mib/s = mebibits per second (which you would typically use when discussing network traffic), whereas MiB/s = mebibytes per second (which you would typically use when discussing disk I/O throughput); 1 MiB/s = 8 Mib/s.
Thank you for your answer. When I say that the disks are overloaded, I mean that the read and write speeds are not enough for the virtual machines to operate normally alongside synchronization, and lower values solve this problem.

Also, when I wrote the question I did not quite understand that re-synchronization and initial synchronization are different things. I have no problems with re-synchronization (I think); the question is specifically about the initial synchronization. Say I currently have a c-max-rate of 204800, which is quite enough for me: everything works well and the synchronization speed is acceptable. But if I create three disks of one terabyte each in Proxmox with these values, their synchronization speeds add up, giving 614400 across the three resources (as far as I understand), which will already affect the speed of the virtual machines. As I see it, there are three options here:

1 - set the value lower to avoid disk overload
2 - manually change the value of this parameter as needed
3 - do not create several virtual disks in Proxmox at the same time

I just hoped that there was some solution with a dynamic option (for the initial synchronization).
Regarding the units of measurement for the disk: I took the number from htop, and for c-max-rate I simply set it via the command linstor controller drbd-options --c-max-rate=200M, and the two values matched during the initial synchronization of a new resource.
The dynamic sync speed is supposed to sort this automatically, using only spare I/O capacity for sync.
But if that's not working well enough for you, then yes, I think you're right: set the max rate to something low enough to cope with all use cases, and just let the initial sync take longer to complete. It's not as if the volume is really "degraded" in that state.
TBH, I would love it if LINSTOR+DRBD would zero out the underlying volume on both sides, by writing zeros and/or by TRIM/discard, instead of doing an initial sync. Or if it understood LVM thin provisioning, it would know that unallocated blocks are already zero.
(Alternatively, it could read from both sides, compare checksums, and only copy data where the checksums differ - but that would still give you a lot of read I/O in place of write I/O.)
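For what it's worth, DRBD does have a documented escape hatch for the case where you know both backing devices already hold identical data (for example, both freshly zeroed): you can skip the initial sync by generating a new current UUID with the bitmap cleared. This is a manual drbdadm step outside LINSTOR's normal workflow, so treat it as a sketch (r0 is a placeholder resource name; check the drbdadm man page for the exact form on your DRBD version):

```
# On ONE node only, with the resource up and connected on both nodes,
# and both backing devices known to be bit-for-bit identical:
drbdadm new-current-uuid --clear-bitmap r0/0
```

After this, both sides are treated as UpToDate immediately; if the backing devices were not actually identical, you get silently inconsistent replicas, so use it with care.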