Lost Linstor configuration, PVE runs on native DRBD devices - how can I make Linstor authoritative again?

Hello everybody,

on Tuesday evening I applied some updates to a three-node Proxmox/Linstor cluster, which ended in a total catastrophe.

History: The cluster was not fully functional in terms of how Linstor controlled the DRBD devices, and it had been running like that for a while.

Some time ago the naming scheme for the VM disks changed (from vm-xxx-disk-1 to pm-xxxxx), which somehow broke something in my cluster. From then on the VMs could no longer simply be migrated, because Proxmox wanted to put them on ‘local’ storage on the other node, and I’m not sure whether migration from local to local storage would have worked (I can’t remember exactly; I think it didn’t).

Anyway …

Now on Tuesday - after the updates - the Linstor configuration (Linstor HA with DRBD-reactor) was completely gone … and I was not able to reactivate it.

In the middle of the night I finally managed (somehow!) to reactivate the DRBD devices ‘natively’: with Linstor down, I took the .res files from a /var/lib/linstor.d/.backup directory, edited them a bit by hand, and started DRBD directly.
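
For reference, this is roughly what I did that night, written down from memory, so paths and resource names may well be slightly off:

cp /var/lib/linstor.d/.backup/*.res /var/lib/linstor.d/
# (or to /etc/drbd.d/, depending on which directory drbdadm includes on the setup)
# then edited hostnames/addresses in the copied .res files where they no longer matched
drbdadm up all
drbdadm status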

Linstor satellites (only on two nodes!) and controller (on one node) are running with ‘keep-res’ so that the DRBD resources don’t get deleted.
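
For completeness, the satellites are started with a systemd drop-in along these lines; the stock ExecStart line should be copied from ‘systemctl cat linstor-satellite’ on the respective node (the path below is just the one from my install), and ‘.*’ simply keeps all resource files:

# /etc/systemd/system/linstor-satellite.service.d/keep-res.conf
[Service]
ExecStart=
ExecStart=/usr/share/linstor-server/bin/Satellite --logs=/var/log/linstor-satellite --config-directory=/etc/linstor --keep-res=.*

Then ‘systemctl daemon-reload’ and restart the satellite.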

Now I have the VMs up and running on the first node and the DRBD devices in sync across the three nodes - phew!

But still - like before the crash - the VMs in PVE don’t use the DRBD storage disks but disks on local storage (with the same content).

One example is a VM with a single disk that shows up as two:

drbdstorage:vm-107-disk-1

and

datapool:vm-107-disk-1_00000

The latter is usable, but when I try to reconfigure the VM’s hardware so that the first one gets used, I get an error saying that there is no resource definition for vm-107-disk-1 and that I should create it.

So PVE uses those disk images on local storage, with _00000 appended to the names, and does not accept the disk resources on DRBD (which have the same content).
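
To illustrate the mismatch, this is roughly how I have been comparing the two views (the resource name is the one from the example above):

# what Linstor knows - currently nothing for these resources
linstor resource-definition list
linstor resource list
# what DRBD itself reports - the resources are up and in sync
drbdadm status
# what PVE sees on the two storages
pvesm list drbdstorage
pvesm list datapool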

Does anybody have an idea on how to proceed?

(How) can I import or re-import the existing DRBD resources into the Linstor controller’s database?

I’m even willing to set up the two nodes that don’t carry the VMs from scratch (with PVE 9?) and migrate the existing VMs over from the active node somehow. Would that be feasible?
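
If rebuilding those two nodes is the way to go, I imagine the Linstor side would start roughly like this; only ttivs3 and its address are real (see the migration log below), the second node name, its IP and the pool/volume-group names are placeholders:

linstor node create ttivs2 10.10.10.2
linstor node create ttivs3 10.10.10.3
linstor storage-pool create lvmthin ttivs2 drbdpool vg0/thinpool
linstor storage-pool create lvmthin ttivs3 drbdpool vg0/thinpool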

Addition: When I try to migrate a VM to another node PVE tells me that ‘migration of a local disk might take long’ (obviously, because it’s not on DRBD) but then the migration fails with:

2025-08-08 11:40:18 starting migration of VM 107 to node 'ttivs3' (10.10.10.3)
2025-08-08 11:40:18 found local disk 'datapool:vm-107-disk-1_00000' (attached)
2025-08-08 11:40:18 starting VM 107 on remote node 'ttivs3'
2025-08-08 11:40:23 volume 'datapool:vm-107-disk-1_00000' is 'datapool:vm-107-disk-0' on the target
2025-08-08 11:40:23 start remote tunnel
2025-08-08 11:40:24 ssh tunnel ver 1
2025-08-08 11:40:24 starting storage migration
2025-08-08 11:40:24 scsi0: start migration to nbd:unix:/run/qemu-server/107_nbd.migrate:exportname=drive-scsi0
drive mirror is starting for drive-scsi0
drive-scsi0: Cancelling block job
drive-scsi0: Done.
2025-08-08 11:40:24 ERROR: online migrate failure - block job (mirror) error: drive-scsi0: Source and target image have different sizes (io-status: ok)
2025-08-08 11:40:24 aborting phase 2 - cleanup resources
2025-08-08 11:40:24 migrate_cancel
2025-08-08 11:40:29 ERROR: migration finished with problems (duration 00:00:11)
TASK ERROR: migration problems
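
In case it helps with the size mismatch, I would compare the exact image sizes on source and target like this; the volume group name is just a placeholder, I only know that the storage is called ‘datapool’ in PVE:

# on the source node
lvs --units b | grep vm-107
qemu-img info /dev/<vg>/vm-107-disk-1_00000
# on the target node (the volume that the failed migration created)
lvs --units b | grep vm-107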

It seems I’m stuck with a ‘one-node cluster’, and any help would be highly appreciated!

Best regards

Matthias

PS: What kind of command or log output do you need from me?
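
I can post, for example, the output of the following, guessing at what is most useful:

pveversion -v
cat /etc/pve/storage.cfg
linstor node list
linstor storage-pool list
drbdadm status
journalctl -u linstor-controller -u linstor-satellite --since "2025-08-05"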

To answer this question directly, out of context from the rest of your situation, this article on the LINBIT blog describes the procedure: Migrating Manually Created DRBD Resources to LINSTOR Software-Defined Storage.
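
In very rough terms, and as far as I understand it, the procedure there amounts to recreating the LINSTOR objects so that they line up with the DRBD resources and backing volumes that already exist. The commands below are only a sketch with placeholder node, pool, and size values; the article explains the exact steps, in particular how to make LINSTOR pick up the existing backing devices instead of creating new ones:

linstor resource-definition create vm-107-disk-1
linstor volume-definition create vm-107-disk-1 32G   # size must match the existing volume exactly
linstor resource create ttivs1 vm-107-disk-1 --storage-pool drbdpool
linstor resource create ttivs2 vm-107-disk-1 --storage-pool drbdpool
linstor resource create ttivs3 vm-107-disk-1 --storage-pool drbdpool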