We have several Proxmox installs with LINSTOR as SDS. We found out that if you take a snapshot and roll back with ZFS as the storage pool, everything works flawlessly.
However, when you try the same thing with an lvm-thin storage pool, it fails almost every time. Sometimes it works, but when it does not you are left with stuck resources.
We tried:
powered on or powered off VM
the oldest snap
the most recent one
snap in the middle of the “tree”
DrbdOptions/Net/allow-two-primaries set to yes or no: same problem
We did not find a stable working method of rolling back snapshots via the Proxmox web UI.
I will try the same with the linstor CLI and report back.
EDIT: Yes, same problem with the linstor snapshot rollback command.
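For reference, the invocation looks roughly like this (resource and snapshot names below are illustrative, not copied from our setup):

linstor snapshot rollback vm-700-disk-1 snap_vm-700-disk-1_snap1

which fails with: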
Description:
(Node: 'vc-swarm3') Failed to rollback to snapshot pve/vm-700-disk-1_00000_snap_vm-700-disk-1_snap1
Details:
Command 'lvconvert --config devices { filter=['a|/dev/sdg3|','r|.*|'] } --merge pve/vm-700-disk-1_00000_snap_vm-700-disk-1_snap1' returned with exitcode 5.
Standard out:
Error message:
pve/vm-700-disk-1_00000_snap_vm-700-disk-1_snap1 is not a mergeable logical volume.
And we can see that we end up with a snapshot that is no longer linked to its parent LV.
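A quick way to see this on the affected node (VG/field selection here is just an example):

lvs -o lv_name,origin,pool_lv pve
# the leftover snapshot volume shows an empty Origin column,
# which is why lvconvert --merge rejects it as "not a mergeable logical volume"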
We have three clusters in that situation, one with these versions:
root@vc-swarm1:~# dpkg -l | grep linstor
ii linstor-client 1.18.0-1 all Linstor client command line tool
ii linstor-common 1.22.0-1 all DRBD distributed resource management utility
ii linstor-controller 1.22.0-1 all DRBD distributed resource management utility
ii linstor-proxmox 7.0.0-1 all DRBD distributed resource management utility
hi linstor-satellite 1.22.0-1 all DRBD distributed resource management utility
ii python-linstor 1.18.0-1 all Linstor python api library
And the two others:
└─$ dpkg -l | grep linstor
ii linstor-client 1.23.0-1 all Linstor client command line tool
hi linstor-common 1.29.0-1 all DRBD distributed resource management utility
ii linstor-controller 1.29.0-1 all DRBD distributed resource management utility
hi linstor-proxmox 8.0.4-1 all DRBD distributed resource management utility
ii linstor-satellite 1.29.0-1 all DRBD distributed resource management utility
ii python-linstor 1.23.0-1 all Linstor python api library
Thank you for the report. We are aware of this issue and have a few ideas we are currently testing that could address it.
Some technical background: if you have an LVM volume and create, let’s say, 2 snapshots of it, both snapshots will have the original volume as their “origin”. If you now run a linstor snapshot rollback (which internally runs lvconvert --merge $vg/$snapshot, as the error message states), it merges the snapshot into its origin. So far everything is as expected.

After this command two things have changed. First, the LVM snapshot is now gone (since it got merged), but LINSTOR simply creates a new snapshot to “fix” that. The second point is more problematic: the second snapshot we created in the beginning, which was completely untouched by our linstor snapshot rollback and lvconvert --merge commands, also “lost” its origin. The data is still there and fine, but this second snapshot can no longer be merged into the already rolled-back volume.
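A rough illustration of that sequence with plain LVM commands (pool and LV names are made up; this only mirrors the behaviour described above):

lvcreate -V 10G -T pve/thinpool -n vol    # thin volume
lvcreate -s -n vol_snap1 pve/vol          # snapshot 1, origin = vol
lvcreate -s -n vol_snap2 pve/vol          # snapshot 2, origin = vol

lvconvert --merge pve/vol_snap1           # roll back to snap1; snap1 is consumed by the merge
lvs -o lv_name,origin pve                 # vol_snap2 no longer lists an origin,
                                          # so a later lvconvert --merge pve/vol_snap2 fails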
As a workaround, instead of using linstor snapshot rollback on LVM-THIN setups, you can manually delete the resources (not the -definition) and then use linstor snapshot resource restore --fr $rsc --tr $rsc --fs $snapshot to restore the snapshot into the same resource-definition.
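A sketch of that workaround; node, resource and snapshot names are made up, so adjust them to your cluster:

# delete only the resources, NOT the resource-definition
linstor resource delete vc-swarm1 vm-700-disk-1
linstor resource delete vc-swarm2 vm-700-disk-1
linstor resource delete vc-swarm3 vm-700-disk-1
# restore the snapshot back into the same resource-definition
linstor snapshot resource restore --fr vm-700-disk-1 --tr vm-700-disk-1 --fs snap_vm-700-disk-1_snap1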
This is actually one of the approaches we are testing right now. The idea sounds fine for LVM_THIN, but unfortunately it does not work that easily on ZFS setups, since there you cannot just delete the volume while it has snapshots (there is a very strict parent-child dependency between ZVOLs and their snapshots).
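To illustrate the ZFS constraint (dataset names are hypothetical):

zfs list -t snapshot -r rpool/data/vm-700-disk-1
zfs destroy rpool/data/vm-700-disk-1
# refused as long as snapshots of the zvol exist; only a recursive (-r) destroy
# would succeed, and that would delete the snapshots as well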
I also noticed that on an LVM-thin storage pool a snapshot rollback doesn’t work. It appears to do the disk rollback correctly, but it won’t roll back memory and just stops the VM.
Error message is: “Error: start failed: QEMU exited with code 1 … blockdev: cannot open /dev/drbd/by-res/pm-e08830a6/0: Keine Daten verfügbar kvm: -drive file=/dev/drbd/by-res/pm-e08830a6/0,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring,detect-zeroes=on: Could not open ‘/dev/drbd/by-res/pm-e08830a6/0’: No data available”
Is this the same issue as above and can we expect this to be fixed?
I was very enthusiastic about the Linstor/Proxmox integration when starting the evaluation, but the nasty lowercase snapshot name issue, the inability to move storage (without temporarily tampering with storage.cfg), and the inability to roll back snapshots are disappointing.