Despite the open thread "Differences in DRBD resources", I followed "Creating a Highly Available LINSTOR Cluster" and was quite happy that everything went smoothly and that the status checks in between and at the end looked right. All satellite nodes were online, both controllers were listed comma-separated in linstor-client.conf, and linstor controller which showed the currently active machine, matching the state of drbd-reactor. The controller and drbd-reactor services are now installed and running on my PVE nodes, PVE-1 and PVE-2, with PVE-1 being active at the moment.
As there were Proxmox updates, I installed them on one PVE node (PVE-2, IP: 192.168.113.22) and rebooted the node afterwards.
The DRBD resources did not come online after the reboot. None of the VMs and LXC containers on that node can be started or migrated.
Proxmox shows this error:
TASK ERROR: could not connect to any LINSTOR controller at /usr/share/perl5/PVE/Storage/Custom/LINSTORPlugin.pm line 241
This is really weird, because the linstor commands on the failing node show:
root@pve-2:~# linstor controller which
linstor://192.168.113.21 (which is the other node, PVE-1 -> IMHO correct)
root@pve-2:~# linstor node list -p
+------------------------------------------------------------+
| Node    | NodeType  | Addresses                   | State  |
|============================================================|
| pve-1   | SATELLITE | 192.168.113.21:3366 (PLAIN) | Online |
| pve-2   | SATELLITE | 192.168.113.22:3366 (PLAIN) | Online |
| raspi-1 | SATELLITE | 192.168.111.20:3366 (PLAIN) | Online |
+------------------------------------------------------------+
Other linstor commands such as storage-pool, resource-group or resource also show no failure, IMHO. The only thing I notice is that all DRBD resources (disks) that were formerly on the rebooted node show “Usage: Unused” and “State: UpToDate”.
root@pve-2:~# drbdsetup status pm-575f24e5
pm-575f24e5 role:Secondary
  disk:UpToDate open:no
  pve-1 role:Secondary
    peer-disk:UpToDate
  raspi-1 role:Secondary
    peer-disk:Diskless
# I can even set this to primary on the rebooted node (PVE-2)
root@pve-2:~# drbdadm primary pm-575f24e5
root@pve-2:~# drbdsetup status pm-575f24e5
pm-575f24e5 role:Primary
  disk:UpToDate open:no
  pve-1 role:Secondary
    peer-disk:UpToDate
  raspi-1 role:Secondary
    peer-disk:Diskless
But the error remains. Half of my systems are down now and can’t be started.
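Since linstor controller which only tells me what the client resolves, one more check that might narrow this down is whether every controller address from the client configuration actually accepts connections from the rebooted node. Below is a minimal sketch of such a check, not something taken from the guide. It assumes the client config is at /etc/linstor/linstor-client.conf, that the controllers= line contains plain host names or IPs (no linstor:// prefixes), and that the controller REST API listens on the default port 3370. The function names list_controllers and probe are my own.

```shell
#!/usr/bin/env bash
# Sketch: read the controller list from the LINSTOR client config and test
# whether each controller accepts a TCP connection on the REST API port.

CONF="${LINSTOR_CLIENT_CONF:-/etc/linstor/linstor-client.conf}"  # assumed default path
PORT="${LINSTOR_PORT:-3370}"                                     # assumed default REST port

# Read config text on stdin and print one controller host per line,
# taken from a line of the form "controllers=host1,host2".
list_controllers() {
    sed -n 's/^[[:space:]]*controllers[[:space:]]*=[[:space:]]*//p' \
        | tr ',' '\n' \
        | sed 's/[[:space:]]//g' \
        | sed '/^$/d'
}

# Return 0 if a TCP connection to host $1 on port $2 succeeds within 2 seconds.
probe() {
    timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Only probe when the config file is readable on this machine.
if [ -r "$CONF" ]; then
    list_controllers < "$CONF" | while read -r host; do
        if probe "$host" "$PORT"; then
            echo "$host:$PORT reachable"
        else
            echo "$host:$PORT NOT reachable"
        fi
    done
fi
```

With only PVE-1 active, I would expect 192.168.113.21 to be reachable and 192.168.113.22 not to be. If the Proxmox storage definition points only at an address that is not reachable, that could explain the plugin error even though the linstor client itself works, but that is a guess on my side.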