When you say “2 x DRBD disks, 1 per physical disk”, presumably this means the physical disks are partitioned in some way. For example:
- /dev/sda1 and /dev/sda2
- /dev/sdb1 and /dev/sdb2
Then /dev/drbd0 runs between /dev/sda1 and /dev/sdb1, and /dev/drbd1 runs between /dev/sda2 and /dev/sdb2.
(Of course, it would be more common to run DRBD between disks in two different servers; otherwise DRBD doesn’t give you any value above mdraid or similar)
On top of that, you have a ZFS mirrored vdev comprising /dev/drbd0 and /dev/drbd1.
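Assuming that layout, building the pool would look something like this sketch (the pool name "tank" is just a placeholder):

```
# The ZFS mirror sits on the DRBD devices, not on the raw partitions
zpool create tank mirror /dev/drbd0 /dev/drbd1
zpool status tank   # shows a mirror-0 vdev with drbd0 and drbd1 as its two members
```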
Does this describe your scenario accurately?
If one of the physical disks fails and must be replaced, what should happen? Does DRBD handle the whole thing with a resync, or does ZFS do a mirror rebuild?
The former.
When the disk fails, DRBD will redirect all the I/O to the other side of the DRBD mirror. ZFS won’t even notice that something has failed. When you replace the disk, DRBD will copy the data back from the other disk (both partitions), and again ZFS won’t notice.
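As an illustration, the replacement procedure would be roughly the following (a sketch only, assuming resource names r0 and r1 as placeholders for whatever your DRBD configuration actually uses):

```
# After the failed drive has been swapped and /dev/sdb1 + /dev/sdb2 recreated:
drbdadm create-md r0     # write fresh DRBD metadata onto the new partitions
drbdadm create-md r1
drbdadm adjust r0        # reattach the backing devices; DRBD starts a resync
drbdadm adjust r1
drbdadm status           # DRBD 9 syntax; on DRBD 8.x watch /proc/drbd instead
```

Throughout the resync, ZFS keeps reading and writing /dev/drbd0 and /dev/drbd1 as if nothing had happened.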
A subsequent ZFS scrub will check the data on both sides of the ZFS mirror - and because of checksums, it can tell if either side has become corrupted somehow, and fix it using the data from the other side (if that other copy is valid). This is something that is normally scheduled periodically, to fix “bit rot”.
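A scrub is just a pool-level command (again, "tank" is a placeholder pool name):

```
zpool scrub tank     # read every block and verify it against its checksum
zpool status tank    # reports scrub progress and any checksum errors repaired
```

Many distributions ship a monthly scrub out of the box as a cron job or systemd timer.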
DRBD can’t do this; it writes the same data to both sides, but if one side were to become corrupted it wouldn’t notice. On readback you would get data from one side or the other - usually the closest DRBD replica if they were on different hosts. However, ZFS would notice if the read has a bad checksum, in which case it would read from the other partition (i.e. the other half of the ZFS mirror). Note that it has no way to access the other DRBD replica of the same partition. As far as ZFS is concerned, there are only two accessible copies of the data, not four.
Now, if DRBD were running with a single replica (e.g. because the second disk has already failed), then ZFS mirroring is the only data recovery mechanism which can come into play. But that would only succeed if the physical drive fails in such a way that one partition becomes inaccessible but the other doesn’t. This might be the case for a spinning drive where a few sectors or tracks have gone bad.
Similarly, ZFS mirroring would be the recovery mechanism if you had DRBD replication but for some reason one partition became inaccessible in both drives simultaneously, but the other partition was accessible in at least one drive. That would be an unusual situation though.