3-node cluster / some vdisks stay in Inconsistent status

Fully patched Proxmox 9.1.6

Latest DRBD patch

Two nodes are fully synced; one node has some inconsistent vdisks.

drbdadm status all

pm-479d5da8 role:Primary
  disk:UpToDate open:yes
  pveAMD-AI role:Secondary
    peer-disk:UpToDate
  pveAMD02 role:Secondary
    replication:SyncSource peer-disk:Inconsistent done:4.00

pm-8b1faeab role:Primary
  disk:UpToDate open:yes
  pveAMD-AI role:Secondary
    peer-disk:UpToDate
  pveAMD02 role:Secondary
    peer-disk:UpToDate

pm-8f4a8d6b role:Primary
  disk:UpToDate open:yes
  pveAMD-AI role:Secondary
    peer-disk:UpToDate
  pveAMD02 role:Secondary
    replication:SyncSource peer-disk:Inconsistent done:1.93


drbdadm adjust all

drbdadm -V
DRBDADM_BUILDTAG=GIT-hash:\ d10b5f53cdf6a445d6fc02cfc2477a129f4e7e83\ build\ by\ @buildsystem\,\ 2026-03-11\ 12:53:05
DRBDADM_API_VERSION=2
DRBD_KERNEL_VERSION_CODE=0x090300
DRBD_KERNEL_VERSION=9.3.0
DRBDADM_VERSION_CODE=0x092101
DRBDADM_VERSION=9.33.1

linstor -v
linstor-client 1.27.1; GIT-hash: 9c57f040eb3834500db508e4f04d361d006cb6b5

The resync of the two vdisks simply makes no progress.

Any suggestions?

Thanks and happy weekend.

I’ve found the hack: you have to manually take the resource down and bring it back up on the corresponding node:

drbdadm down pm-479d5da8

drbdadm up pm-479d5da8

drbdadm status pm-479d5da8

pm-479d5da8 role:Secondary
  disk:UpToDate open:no
  pveAMD01 role:Primary
    peer-disk:UpToDate
  pveAMD02 role:Secondary
    peer-disk:UpToDate
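If you have many resources, something like the following can list the ones that still show an Inconsistent peer disk. It is only a sketch: `stuck_resources` is a name I made up, and it assumes the plain-text `drbdadm status` layout shown in this thread, with a blank line between resources.

```shell
# Hypothetical helper: reads `drbdadm status` text on stdin and prints the
# name of every resource that reports an Inconsistent peer disk.
stuck_resources() {
  # RS="" switches awk to paragraph mode: one record per blank-line-separated
  # resource block, so $1 is the resource name on the first line.
  awk 'BEGIN { RS = "" } /peer-disk:Inconsistent/ { print $1 }'
}

# Example with an excerpt of the status output posted above.
stuck_resources <<'EOF'
pm-479d5da8 role:Primary
  disk:UpToDate open:yes
  pveAMD02 role:Secondary
    replication:SyncSource peer-disk:Inconsistent done:4.00

pm-8b1faeab role:Primary
  disk:UpToDate open:yes
  pveAMD02 role:Secondary
    peer-disk:UpToDate
EOF
# prints: pm-479d5da8
```

On a live node you would pipe the real output in (`drbdadm status | stuck_resources`) and then run the down / up on the node where the resync stalled. As far as I know, `drbdadm down` will refuse while the device is still held open on that node, so this probably only works where the resource is Secondary.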

It is interesting that DRBD’s resync got stuck. I’ve seen this happen when there are mismatched frame sizes between devices on the replication network. Small replicated writes might get through, but when DRBD tries to send larger IOs, as it might during a resync, the mismatched frame sizes cause those larger frames to be dropped and DRBD’s resync will appear stuck.
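A quick first check for mismatched frame sizes is to compare the configured MTU of the replication interface on every node. Here is a rough sketch: `iface_mtus` is a made-up helper name, and the sample lines are invented for illustration, not taken from the poster's cluster.

```shell
# Hypothetical helper: reads `ip -o link show` output on stdin and prints
# "<interface> <mtu>" for each interface.
iface_mtus() {
  awk '{
    name = $2; sub(/:$/, "", name)        # "enp1s0:" -> "enp1s0"
    for (i = 3; i < NF; i++)              # the value follows the "mtu" keyword
      if ($i == "mtu") { print name, $(i + 1); break }
  }'
}

# Invented sample input in the format `ip -o link show` uses.
iface_mtus <<'EOF'
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc fq_codel state UP mode DEFAULT
3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT
EOF
# prints:
# enp1s0 9000
# enp2s0 1500
```

Run `ip -o link show | iface_mtus` on each node and compare the values for the interface that carries DRBD traffic. Matching values on the hosts do not rule out a switch in between with a smaller frame size, which is where a ping test with the don't-fragment bit set helps.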

Glad whatever this was seemed to be a “one off” issue. Let us know if it happens again.

I have seen this often enough on random networks that I keep this “one-line” bash function in my .bashrc. It finds the largest unfragmented ping payload, and thus the maximum MTU, between my workstation and some endpoint:

# MTU checker
mtu_discover_using_ping() ( target=$1; i=0; good=1; bad=${2:-15000}; mtu=${3:-1400}; while (( bad - good > 1 )); do (( i += 1 )); if ping -w3 -i0.5 -c2 -M do -s "$mtu" "$target" &>/dev/null; then good=$mtu; else bad=$mtu; fi; mtu=$(( (good + bad)/2 )); printf "i:%u,\t""mtu:%u,\t""bad:%6u,\t""good:%6u,\t""diff:%6d\n" "$i" "$mtu" "$bad" "$good" "$(( bad-good ))"; done >&2 ; echo >&2 "found in $i iterations using: ping -w3 -i0.5 -c2 -M do -s \$mtu $target" ; echo "MTU=$mtu" )

With that, you can simply use it like a ping command:

$ mtu_discover_using_ping 192.168.111.1
i:1,	mtu:8200,	bad: 15000,	good:  1400,	diff: 13600
i:2,	mtu:4800,	bad:  8200,	good:  1400,	diff:  6800
i:3,	mtu:3100,	bad:  4800,	good:  1400,	diff:  3400
i:4,	mtu:2250,	bad:  3100,	good:  1400,	diff:  1700
i:5,	mtu:1825,	bad:  2250,	good:  1400,	diff:   850
i:6,	mtu:1612,	bad:  1825,	good:  1400,	diff:   425
i:7,	mtu:1506,	bad:  1612,	good:  1400,	diff:   212
i:8,	mtu:1453,	bad:  1506,	good:  1400,	diff:   106
i:9,	mtu:1479,	bad:  1506,	good:  1453,	diff:    53
i:10,	mtu:1466,	bad:  1479,	good:  1453,	diff:    26
i:11,	mtu:1472,	bad:  1479,	good:  1466,	diff:    13
i:12,	mtu:1475,	bad:  1479,	good:  1472,	diff:     7
i:13,	mtu:1473,	bad:  1475,	good:  1472,	diff:     3
i:14,	mtu:1472,	bad:  1473,	good:  1472,	diff:     1
found in 14 iterations using: ping -w3 -i0.5 -c2 -M do -s $mtu 192.168.111.1
MTU=1472
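Note that the number passed to `ping -s` is the ICMP payload size, so the reported 1472 corresponds to a standard 1500-byte MTU. A small sketch of the conversion, assuming IPv4 without IP options (20-byte IP header plus 8-byte ICMP header); the two function names are mine:

```shell
# ping -s sets the ICMP payload size. For IPv4 without options, the packet on
# the wire adds a 20-byte IP header and an 8-byte ICMP header (28 bytes).
payload_to_mtu() { echo $(( $1 + 28 )); }
mtu_to_payload() { echo $(( $1 - 28 )); }

payload_to_mtu 1472   # prints 1500
mtu_to_payload 9000   # prints 8972
```

So on a replication network that is supposed to carry 9000-byte jumbo frames, the function above should end at 8972. A smaller result suggests some device on the path is using a smaller frame size.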