LINSTOR failure on a specific node during switchover

Hi guys

I'm seeing some very strange behaviour and I can't find the problem. Maybe someone has a hint for me about what the problem is and how I can solve it.

First off: I'm on the newest version from the repo and my distro is Debian 12 (I had some issues in the past because of the iptables firewall implementation itself searching for the protocol ID instead of the short abbreviation).

I have 3 nodes; the DB is HA, and I also have LINSTOR Gateway (NFS) and so on.

When I have to reboot one of my nodes (which is the controller at the moment), another node should take over. This works as long as the third node is not the one stepping in.

I can see in journalctl that something is broken on the third node: one node (node 2) wants to take over the controller, and the NFS IP address is supposed to be bound on the third node. In the logs I saw the following entries.

On Node 1: rebooting; the controller and NFS gateway should be transferred

On Node 2:
drbd proxmox: Declined by peer linstor-node3 (id: 2) see the kernel log there

On Node 3:
sm-notify my_name ‘10.89.0.21’ is unusable: Name or service not known.

And when this happens, none of the nodes are listed anymore in linstor n l or linstor r l, my NFS export is gone, and therefore all machines are tainted (which makes sense: the storage is not mounted or accessible, so there is no access to the VM files).
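
Since the message on node 2 says "see the kernel log there", I assume the kernel log on node 3 is the first place to look. Roughly what I have in mind (just a sketch, not actual output from my system):

    # on node 3: look for DRBD kernel messages around the failed takeover
    journalctl -k -b | grep -i drbd
    # or, equivalently:
    dmesg -T | grep -i drbd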

So I guess there is something missing on the third node, but when I compared /etc/linstor and /var/lib/drbd* on each node, there was no difference (or maybe I was blind).

So what is the way to find out why my_name 10.89.0.21 is not known on the third node?
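
My naive assumption is that it is a name resolution problem for that address on node 3, so I would start with something like this (only a sketch of what I'd check):

    # on node 3: can the address sm-notify complains about be resolved at all?
    getent hosts 10.89.0.21
    grep 10.89.0.21 /etc/hosts
    # compare hostname and addresses with the nodes where failover works
    hostname
    hostname --all-ip-addresses

Is that the right direction, or is there a better way to debug this?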

You might be hitting issues due to a conflict between the resource-agents and iptables versions shipped in Debian 12.

Take a look at this blog post for deploying LINSTOR Gateway on Proxmox (Debian 12). Pay attention to the resource agents override section (I can’t link to sections of our blog, otherwise I would). It also walks through the necessary configuration for DRBD Reactor. While this blog does not touch on making the LINSTOR controller highly available, it lays the groundwork necessary for doing so.
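
As a quick sanity check before working through the blog post, you can verify which resource-agents and iptables packages each node is actually running, and what DRBD Reactor currently reports for its promoter targets. Roughly something like this (adjust as needed for your setup):

    # on each node: package versions (the Debian 12 conflict shows up here)
    apt policy resource-agents iptables
    # which iptables backend is in use (legacy vs. nf_tables)
    iptables -V
    # current state of the DRBD Reactor promoter plugins
    drbd-reactorctl status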

Hope that helps, let us know if you’re still running into issues after implementing some steps outlined in that blog post.

Thanks for your reply

No, that is not the case, as I was the customer who led LINBIT to that problem last year with my installation.

There was a very good session with someone from LINBIT, and he found the problem and the solution for it on my system.

By the way, it was not directly on Proxmox, as I have 3 VMs which are used for SDS, because I don't want too many things running on my Proxmox hosts.

But I did face some serious performance problems, which then led to those crashes. If I move one machine into the SDS (let's assume node 3 is the NFS exporter and I start the migration on node 2), the moving node reads and pushes to node 3, but node 3 also syncs DRBD to nodes 1 and 2, and so the whole host becomes very slow and unresponsive.

So that is the reason why I decided to move away from SDS completely. It's not LINBIT's fault, it's my own, because of the wrong disk design (RAID 10, a partition for Proxmox, the rest for VMs, and so on; if the RAID gets that overloaded, it also affects Proxmox itself).

I have now bought a ME5024 from Dell.

When I compare LINBIT support access for my 3 nodes (which costs half the price of the storage per year, and without any hardware), the cost of the storage is not that high, and I get the advantage of the external storage too.