DRBD disk / LVM disk size mismatch after resize

After resizing a volume, I have a size difference between the DRBD device and the underlying LVM volume.

I resized the DRBD volume from 200G to 260G with

linstor vd set-size sorim 0 260G

But the VM (Windows Server on KVM) does not see the new size even after a restart; the hard disk is still 200G.

I didn't get any error messages when executing the command.

On both nodes I get the following output:

# blockdev --getsize64 /dev/drbd/by-res/sorim/0
214748577792
# blockdev --getsize64 /dev/ubuntu-vg/sorim_00000
279235788800
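
Converted to GiB, those byte counts are the old and the new size:

214748577792 / 1024^3 ≈ 200.00 GiB (old size, still reported by the DRBD device)
279235788800 / 1024^3 ≈ 260.06 GiB (new size of the backing LV, matching the lvs output below)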

I ran the same commands with a newly created test-resource/volume and it worked without any problems.

Can I adjust the size of the DRBD device in any other way?

# lvs
sorim_00000  ubuntu-vg -wi-ao---- <260.06g

# dmesg
…
drbd sorim: Preparing cluster-wide state change 3094901933 (1->-1 3/1)
[10226198.593284] drbd sorim: State change 3094901933: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFF9
[10226198.593290] drbd sorim: Committing cluster-wide state change 3094901933 (0ms)
[10226198.593299] drbd sorim: role( Secondary -> Primary ) [qemu-system-x86:594687 auto-promote]
[10226198.593407] drbd sorim/0 drbd1006: ASSERTION drbd_md_ss(device->ldev) == device->ldev->md.md_offset FAILED in drbd_md_write
…

# drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\ fdd9a4d603a9dc99d110d8bd0e288d7c0b6f586e\ build\ by\ buildd@lcy02-amd64-086\,\ 2023-12-22\ 12:25:06
DRBDADM_API_VERSION=2
DRBD_KERNEL_VERSION_CODE=0x090208
DRBD_KERNEL_VERSION=9.2.8
DRBDADM_VERSION_CODE=0x091b00
DRBDADM_VERSION=9.27.0

# linstor --version
linstor-client 1.22.0; GIT-hash: 7becbaf217ab88686b98346ff2c3af71dd49d5e2

# lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.4 LTS
Release:	22.04
Codename:	jammy

I would have expected LINSTOR to have done this automatically with the volume-definition resize. Regardless, you should be able to trigger it manually with drbdadm:

drbdadm resize sorim

If the drbdadm utility is not installed, you can do the same with the lower-level drbdsetup utility, like so:

drbdsetup resize <minor_num>
drbdsetup check-resize <minor_num>
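
The minor number can be read from the dmesg output above (drbd1006), so in your case that would be:

drbdsetup resize 1006
drbdsetup check-resize 1006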

Unfortunately, running the command did not result in any change.

drbdadm resize sorim
# dmesg
...
[10257170.233943] drbd sorim/0 drbd1006: Preparing cluster-wide size change 1170184734 (local_max_size = 272632908 KB, user_cap = 0 KB)
[10257170.234288] drbd sorim/0 drbd1006: Aborting cluster-wide size change 1170184734 (4ms) size unchanged
...
# blockdev --getsize64 /dev/drbd/by-res/sorim/0
214748577792
# blockdev --getsize64 /dev/ubuntu-vg/sorim_00000
279235788800

The machines have gone through a few version jumps. Is there perhaps a DRBD property that changes the expected behavior?

As a test, I have now expanded another, older DRBD volume, which worked without any problems.

I cannot think of any upgrade that would have changed the resize behavior. All the nodes are on the same versions, correct?
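
To rule out a configuration difference between the nodes, you can also compare the configuration DRBD is actually running with, for example:

drbdadm dump sorim
drbdsetup show sorim

That is only a way to double-check; I am not aware of a specific property that would cause this.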

Can you please verify that the Sorim resource is connected and healthy?

The resource is connected, healthy and in use.

# drbdadm status sorim -v
drbdsetup status sorim --verbose
sorim node-id:1 role:Primary suspended:no force-io-failures:no
  volume:0 minor:1006 disk:UpToDate backing_dev:/dev/ubuntu-vg/sorim_00000 quorum:yes blocked:no
  waimu node-id:2 connection:Connected role:Secondary congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
$ linstor vd l
...
┊ sorim        ┊ 0        ┊ 1006        ┊ 260 GiB    ┊       ┊ ok    ┊
...

In the first step, I expanded it to 250G while the VM was running, then shut down the VM and restarted it on the same node. As this was unsuccessful, I increased the volume to 260G in a second step with the VM switched off. Each time only the LVM volume was increased.
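
So effectively only the first half of the usual manual grow sequence took effect (growing the backing LV, which LINSTOR did); the second half, which drbdadm resize should cover, is exactly the part that did nothing:

lvextend -L 260G /dev/ubuntu-vg/sorim_00000
drbdadm resize sorim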

In the meantime, I have restarted the VM on the second node. Both nodes show the discrepancy between the DRBD size and the LVM size.

I have now shut down the VM on the second node in order to read the disk sizes again on the first node. Now the sizes of the DRBD device and the LVM volume almost match on the first node.

# blockdev --getsize64 /dev/drbdpool/sorim_00000
279235788800
# blockdev --getsize64 /dev/drbd/by-res/sorim/0
279176097792
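
The remaining difference is small:

279235788800 - 279176097792 = 59691008 bytes ≈ 57 MiB

which is presumably DRBD's internal metadata (bitmap and activity log) at the end of the backing device.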

I have now restarted the VM on the first node, and the operating system now also shows the expected larger size.

However, the connection to the second node was now disrupted; the state alternated between NetworkError, Outdated, and Reconnect.

I have now deleted the resource from the second node with LINSTOR, recreated it, and let the cluster resynchronize.
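
Roughly like this (node and storage-pool names are placeholders):

linstor resource delete <second-node> sorim
linstor resource create <second-node> sorim --storage-pool <pool>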

sorim role:Primary
  disk:UpToDate
  vario role:Secondary
    replication:SyncSource peer-disk:Inconsistent done:43.75

Maybe there is a hint in the log.

# dmesg - second node
[10304186.361753] drbd sorim waimu: Restarting receiver thread
[10304186.361760] drbd sorim waimu: conn( Unconnected -> Connecting ) [connecting]
[10304186.974960] drbd sorim waimu: Handshake to peer 2 successful: Agreed network protocol version 122
[10304186.974969] drbd sorim waimu: Feature flags enabled on protocol level: 0x7f TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES RESYNC_DAGTAG
[10304186.975124] drbd sorim waimu: Peer authenticated using 20 bytes HMAC
[10304186.977029] drbd sorim: Preparing cluster-wide state change 4098131736 (1->2 499/146)
[10304186.993074] drbd sorim/0 drbd1006 waimu: drbd_sync_handshake:
[10304186.993083] drbd sorim/0 drbd1006 waimu: self 510AFC5D8CBB0AA2:0000000000000000:D522B86F6F15C858:E4220F0C45450014 bits:155050 flags:120
[10304186.993093] drbd sorim/0 drbd1006 waimu: peer 285760DD8507BD53:510AFC5D8CBB0AA2:80746CD0F8B82B90:D522B86F6F15C858 bits:155051 flags:1120
[10304186.993101] drbd sorim/0 drbd1006 waimu: uuid_compare()=target-use-bitmap by rule=bitmap-peer
[10304186.998917] drbd sorim: State change 4098131736: primary_nodes=4, weak_nodes=FFFFFFFFFFFFFFF9
[10304186.998929] drbd sorim: Committing cluster-wide state change 4098131736 (20ms)
[10304186.998973] drbd sorim waimu: conn( Connecting -> Connected ) peer( Unknown -> Primary ) [connected]
[10304186.998980] drbd sorim/0 drbd1006 waimu: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT ) [connected]
[10304187.025113] drbd sorim/0 drbd1006 waimu: bitmap overflow (e:68157695) while decoding bm RLE packet
[10304187.026398] drbd sorim waimu: error receiving P_COMPRESSED_BITMAP, e: -5 l: 3284!
[10304187.027660] drbd sorim waimu: conn( Connected -> ProtocolError ) peer( Primary -> Unknown )
[10304187.027666] drbd sorim/0 drbd1006 waimu: pdsk( UpToDate -> DUnknown ) repl( WFBitMapT -> Off )
[10304187.036366] drbd sorim waimu: Terminating sender thread
[10304187.036398] drbd sorim waimu: Starting sender thread (from drbd_r_sorim [597697])
[10304187.057684] drbd sorim waimu: Connection closed
[10304187.057705] drbd sorim waimu: helper command: /sbin/drbdadm disconnected
[10304187.061836] drbd sorim waimu: helper command: /sbin/drbdadm disconnected exit code 0
[10304187.061867] drbd sorim waimu: conn( ProtocolError -> Unconnected ) [disconnected]