Hi Community
I built a Proxmox cluster with LINSTOR in my homelab and I’m currently in the experimental phase, so it’s not serious if something breaks. It is currently a 2-node cluster with quorum, because one node was DOA.
Installed software:
root@pve01 ~$ dpkg -l | grep linstor
linstor-client 1.24.0-1
linstor-common 1.30.4-1
linstor-controller 1.30.4-1
linstor-proxmox 8.1.0-1
linstor-satellite 1.30.4-1
python-linstor 1.24.0-1
root@pve01 ~$ dpkg -l | grep drbd
drbd-dkms 9.2.12-2
drbd-utils 9.30.0-1
root@pve01 ~$ pveversion -v
proxmox-ve: 8.3.0 (running kernel: 6.8.12-8-pve)
pve-manager: 8.3.5 (running version: 8.3.5/dac3aa88bac3f300)
proxmox-kernel-helper: 8.1.1
proxmox-kernel-6.8: 6.8.12-8
proxmox-kernel-6.8.12-8-pve-signed: 6.8.12-8
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2+deb12u1
frr-pythontools: 8.5.2-1+pve1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
intel-microcode: 3.20250211.1~deb12u1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.2.0
libpve-network-perl: 0.10.1
libpve-rs-perl: 0.9.2
libpve-storage-perl: 8.3.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
openvswitch-switch: 3.1.0-2+deb12u1
proxmox-backup-client: 3.3.3-1
proxmox-backup-file-restore: 3.3.3-1
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.1
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.6
pve-cluster: 8.0.10
pve-container: 5.2.4
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-3
pve-ha-manager: 4.0.6
pve-i18n: 3.4.0
pve-qemu-kvm: 9.2.0-2
pve-xtermjs: 5.3.0-3
qemu-server: 8.3.8
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve1
Today, I received this error:
TASK ERROR: API Return-Code: 500. Message: Could not rollback cluster wide snapshot snap_pm-a203816c_Netzwerk of pm-a203816c, because...
I put the output in collapsible tags and hope that makes it more readable.
In PVE console (shortened):
TASK ERROR: API Return-Code: 500. Message: Could not rollback cluster wide snapshot snap_pm-a203816c_Netzwerk of pm-a203816c, because: [{"ret_code":34340867,"message":"Snapshot 'snap_pm-a203816c_Netzwerk' of resource 'pm-a203816c' marked down for rollback.","obj_refs":{"RscDfn":"pm-a203816c","Snapshot":"snap_pm-a203816c_Netzwerk"},"created_at":"2025-03-24T08:01:41.881991595+01:00"},{"ret_code":36962307,"message":"(pve02) Resource 'pm-a203816c' [DRBD] adjusted.","obj_refs":{"RscDfn":"pm-a203816c","Snapshot":"snap_pm-a203816c_Netzwerk"},"created_at":"2025-03-24T08:01:41.986547094+01:00"},{"ret_code":34340867,"message":"Deactivated resource 'pm-a203816c' on 'pve02' for rollback","obj_refs":{"RscDfn":"pm-a203816c","Snapshot":"snap_pm-a203816c_Netzwerk"},"created_at":"2025-03-24T08:01:41.98665986+01:00"},{"ret_code":36962307,"message":"(pve01) Resource 'pm-a203816c' [DRBD] adjusted.","obj_refs":{"RscDfn":"pm-a203816c","Snapshot":"snap_pm-a203816c_Netzwerk"},"created_at":"2025-03-24T08:01:42.006239917+01:00"},{"ret_code":34340867,"message":"Deactivated resource 'pm-a203816c' on 'pve01' for rollback","obj_refs":{"RscDfn":"pm-a203816c","Snapshot":"snap_pm-a203816c_Netzwerk"},"created_at":"2025-03-24T08:01:42.006305288+01:00"},{"ret_code":4611686018461738777,"message":"All satellites failed the snapshot rollback. Aborting. Data remains unchanged.","obj_refs":{"RscDfn":"pm-a203816c","Snapshot":"snap_pm-a203816c_Netzwerk"},"created_at":"2025-03-24T08:01:42.240003653+01:00"},{"ret_code":-4611686018393046042,"message":"(Node: 'pve02') Failed to rollback to snapshot linstor_LinstorStorage/pm-a203816c_00000_snap_pm-a203816c_Netzwerk","details":"Command 'lvconvert --config 'devices { filter=['\"'\"'a|/dev/nvme0n1|'\"'\"','\"'\"'r|.*|'\"'\"'] }' --merge linstor_LinstorStorage/pm-a203816c_00000_snap_pm-a203816c_Netzwerk' returned with exitcode 5. 
\n\nStandard out: \n\n\nError message: \n linstor_LinstorStorage/pm-a203816c_00000_snap_pm-a203816c_Netzwerk is not a mergeable logical volume.\n\n","error_report_ids":["67E1012A-CF113-000000"],"obj_refs":{"RscDfn":"pm-a203816c","Snapshot":"snap_pm-a203816c_Netzwerk"},"created_at":"2025-03-24T08:01:42.298084003+01:00"},{"ret_code":-4611686018393046042,"message":"(Node: 'pve01') Failed to rollback to snapshot linstor_LinstorStorage/pm-a203816c_00000_snap_pm-a203816c_Netzwerk","details":"Command 'lvconvert --config 'devices { filter=['\"'\"'a|/dev/nvme0n1|'\"'\"','\"'\"'r|.*|'\"'\"'] }' --merge linstor_LinstorStorage/pm-a203816c_00000_snap_pm-a203816c_Netzwerk' returned with exitcode 5. \n\nStandard out: \n\n\nError message: \n linstor_LinstorStorage/pm-a203816c_00000_snap_pm-a203816c_Netzwerk is not a mergeable logical volume.\n\n","error_report_ids":["67E1013B-6F0E1-000000"]
...
And the error report from the LINSTOR GUI:
ERROR REPORT 67E1012A-CF113-000000
============================================================
Application: LINBIT® LINSTOR
Module: Satellite
Version: 1.30.4
Build ID: bef74a44609cb592c5efad2e707b50e696623c61
Build time: 2025-02-03T15:48:28+00:00
Error time: 2025-03-24 08:01:46
Node: pve02
Thread: DeviceManager
============================================================
Reported error:
===============
Category: LinStorException
Class name: StorageException
Class canonical name: com.linbit.linstor.storage.StorageException
Generated at: Method 'checkExitCode', Source file 'ExtCmdUtils.java', Line #69
Error message: Failed to rollback to snapshot linstor_LinstorStorage/pm-a203816c_00000_snap_pm-a203816c_Netzwerk
Error context:
An error occurred while processing resource 'Node: 'pve02', Rsc: 'pm-a203816c''
ErrorContext:
Details: Command 'lvconvert --config 'devices { filter=['"'"'a|/dev/nvme0n1|'"'"','"'"'r|.*|'"'"'] }' --merge linstor_LinstorStorage/pm-a203816c_00000_snap_pm-a203816c_Netzwerk' returned with exitcode 5.
Standard out:
Error message:
linstor_LinstorStorage/pm-a203816c_00000_snap_pm-a203816c_Netzwerk is not a mergeable logical volume.
Call backtrace:
Method Native Class:Line number
checkExitCode N com.linbit.extproc.ExtCmdUtils:69
genericExecutor N com.linbit.linstor.storage.utils.Commands:103
genericExecutor N com.linbit.linstor.storage.utils.Commands:63
genericExecutor N com.linbit.linstor.storage.utils.Commands:51
rollbackToSnapshot N com.linbit.linstor.layer.storage.lvm.utils.LvmCommands:433
lambda$rollbackImpl$10 N com.linbit.linstor.layer.storage.lvm.LvmThinProvider:371
execWithRetry N com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:728
rollbackImpl N com.linbit.linstor.layer.storage.lvm.LvmThinProvider:368
rollbackImpl N com.linbit.linstor.layer.storage.lvm.LvmThinProvider:58
handleRollbacks N com.linbit.linstor.layer.storage.AbsStorageProvider:1325
processVolumes N com.linbit.linstor.layer.storage.AbsStorageProvider:390
processResource N com.linbit.linstor.layer.storage.StorageLayer:285
lambda$processResource$4 N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:1368
processGeneric N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:1411
processResource N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:1364
processChild N com.linbit.linstor.layer.drbd.DrbdLayer:353
processResource N com.linbit.linstor.layer.drbd.DrbdLayer:228
lambda$processResource$4 N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:1368
processGeneric N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:1411
processResource N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:1364
processResources N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:386
dispatchResources N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:228
dispatchResources N com.linbit.linstor.core.devmgr.DeviceManagerImpl:333
phaseDispatchDeviceHandlers N com.linbit.linstor.core.devmgr.DeviceManagerImpl:1148
devMgrLoop N com.linbit.linstor.core.devmgr.DeviceManagerImpl:778
run N com.linbit.linstor.core.devmgr.DeviceManagerImpl:674
run N java.lang.Thread:840
END OF ERROR REPORT.
While searching the forum, I came across the post by c.duchenoy from last October and wonder whether it’s the same problem.
Does anyone have any tips for me, or can anyone point me in the right direction? I’m pretty sure that rollbacks have worked in the past. It seems there is a dependency of this VM on node pve02, although the VM itself is running on pve01.
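In case it helps with the diagnosis, here is a rough sketch of how I would inspect the snapshot LV on each node. The helper names `snap_lv_name` and `inspect_snap_lv` are made up for this post, and the name pattern `<resource>_<5-digit volume number>_<snapshot>` is only inferred from the error message above, not taken from the LINSTOR documentation.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of LINSTOR or Proxmox.

# Build the backing LV name as it appears in the error message:
# <resource>_<5-digit volume number>_<snapshot name>.
# Assumption: this pattern is inferred from the log, not from documentation.
snap_lv_name() {
    printf '%s_%05d_%s\n' "$1" "$2" "$3"
}

# Show name, origin, attributes and thin pool of the snapshot LV on this node.
# Needs root and the lvm2 tools; prints a hint and returns 1 if lvs is missing.
inspect_snap_lv() {
    vg=$1
    lv=$(snap_lv_name "$2" "$3" "$4")
    if ! command -v lvs >/dev/null 2>&1; then
        echo "lvs not found; run this on a PVE node" >&2
        return 1
    fi
    lvs -a -o lv_name,origin,lv_attr,pool_lv "$vg/$lv"
}
```

I would run `inspect_snap_lv linstor_LinstorStorage pm-a203816c 0 snap_pm-a203816c_Netzwerk` on both pve01 and pve02. As far as I understand, `lvconvert --merge` needs the snapshot LV to still reference its origin, so an empty origin column might be consistent with the "not a mergeable logical volume" message, but that is a guess on my part.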
Thank you in advance.
Best,
Hiu