During heavy IO disk write timeout leads to DRBD going to diskless mode

ramilehti · August 22, 2024, 6:51am

Hi,
During heavy disk IO, one of our RAID sets experiences a write timeout (DID_TIME_OUT as reported by the kernel). This happens about once a week on our secondary node.

The next message in our logs comes from the kernel's blk_update_request, which reports the timeout as an I/O error. DRBD then logs “we had at least one MD IO ERROR during bitmap IO”, fails the disk, and goes into Diskless mode.
If this were a real IO error, that is exactly what we would want to happen. But this is a timeout, which should be handled differently.

Why does DRBD handle this as an IO error and not a disk timeout? How can we change this behaviour?

DRBD version 9.15

To recover from this we simply run: drbdadm attach
DRBD then resyncs for a bit and everything works again for a week or so.

Best regards,
Rami Lehti

BHellman · August 22, 2024, 11:55am

Would you mind sharing your drbd resource config? It could be a matter of configuring the ko-count:

ko-count number

           If a secondary node fails to complete a write request in ko-count times the timeout
           parameter, it is excluded from the cluster. The primary node then sets the connection
           to this secondary node to Standalone. To disable this feature, you should explicitly
           set it to 0; defaults may change between versions.

ramilehti · August 22, 2024, 12:34pm

resource kvmpool2 {
    meta-disk internal;
    device /dev/drbd2;
    syncer {
        verify-alg sha1;
        c-max-rate 1024000;
        c-min-rate 51200;
        c-plan-ahead 10;
    }
    net {
    }
    on host1 {
        disk /dev/disk/by-partuuid/13902e29-7e15-4465-9651-1abbd2ac341a;
        address 192.168.120.2:7790;
    }
    on host2 {
        disk /dev/disk/by-partuuid/7e9ce22d-29de-4fbc-9677-49af3dd18c3e;
        address 192.168.120.3:7790;
    }
}

BHellman · August 22, 2024, 12:46pm

The default for ko-count is 7; you could try increasing that and see if the issue still happens.
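
As a rough sketch of what that change could look like in the resource file posted above: ko-count goes in the net section, which is currently empty. The value 10 is only an illustrative assumption, not a tested recommendation, so pick something that fits your own timeout settings.

```
resource kvmpool2 {
    # ... rest of the resource definition unchanged ...
    net {
        # Assumed example value; the default mentioned above is 7.
        # Setting 0 disables the feature entirely.
        ko-count 10;
    }
}
```

After editing the file on both nodes, running drbdadm adjust kvmpool2 should apply the new setting to the running resource without a restart.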

ramilehti · August 22, 2024, 12:48pm

Thanks, I’ll try that.

BHellman · August 22, 2024, 12:58pm

One more thing: as mentioned in the man page for drbd.conf, the default might change between versions. You can easily check the current value with the command drbdsetup show --show-defaults, which will also show you all kinds of other fun things you can tune.
