I am working on a reference architecture for Proxmox+Linstor deployments and have been beating up my test cluster pretty hard for the last few months with all kinds of failure scenarios.
I’ve been having an issue where the linstor plugin for Proxmox sometimes deletes diskless resources when a VM is migrated. I looked in linstor/Linstor.pm, and the comments in sub deactivate_resource seem to imply that the controller will know if the DELETE action is targeting a diskless resource that is needed as a TieBreaker, and won’t delete it. But I have observed different behavior.
The result is that after a node failure, recovery, and migrations to restore placement, linstor advise resource flags numerous resources that no longer have a TieBreaker.
My first thought is that the diskless resources are missing some meta information on the controller that tells it these are TieBreakers, but I haven’t figured out how to find or set that data. My other thought is that I would really prefer to never delete a diskless resource on volume deactivation, regardless of why it exists. It would be nice if this were configurable; for now I will probably maintain an internal fork with that config option added.
I suspect that some TieBreakers may have lost whatever information was marking them as a TieBreaker during my other destructive testing.
Can anyone advise on how a device is marked as a TieBreaker and how that state is set? Then I can work on a way to detect and repair unmarked TieBreakers. Thanks!
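For context, a minimal way to see the current state from the CLI should be something along these lines (the resource definition name below is a placeholder):

    # tiebreakers show up in the State column as "TieBreaker" rather than "Diskless"
    linstor resource list
    # ask the controller which resource definitions are missing a tiebreaker
    linstor advise resource
    # check whether the quorum-tiebreaker property is still set on a given resource definition
    linstor resource-definition list-properties vm-100-disk-1 | grep -i quorum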
Upon further digging, this seems to come from a smattering of resource definitions missing the property DrbdOptions/auto-add-quorum-tiebreaker=True. I fixed all the resource definitions, and the plugin / controller now behaves as documented, converting “Diskless” to “TieBreaker” when deleting that resource.
Is there any way to sync up the config of resource definitions to the resource group they were spawned from? I modified the resource group with this option but had to also go address each resource-definition individually.
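Absent a built-in sync, a brute-force sketch is to loop over the existing resource definitions and set the property on each one (untested; the jq expression assumes the client’s machine-readable JSON layout and may need adjusting for your version):

    # set the tiebreaker property explicitly on every existing resource definition
    for rd in $(linstor -m resource-definition list | jq -r '.. | .name? // empty' | sort -u); do
        linstor resource-definition set-property "$rd" DrbdOptions/auto-add-quorum-tiebreaker True
    done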
This property should be =True by default, unless an object above it (like the resource-group or even the controller) states otherwise. So can you please verify whether this property is set to =False on those levels? It should be as simple as linstor c lp | grep quorum and linstor rg lp <name_of_your_resource_group> | grep quorum.
If those two objects also do not have / never had this property set, we might need to investigate further.
Brief explanation about properties in LINSTOR:
Properties are inherited in LINSTOR using the rule of thumb “the closer a property is to a volume, the higher its priority”. That means that if a property is set on the controller level and the same property is also set on, for example, the resource-definition level, the property will have the value from the resource-definition instead of the value from the controller - only for the resources of said resource-definition, of course.
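To make that concrete, something like the following sets a cluster-wide default and then overrides it for a single resource definition (the definition name is a placeholder), and the override wins for that definition’s resources:

    # default for the whole cluster
    linstor controller set-property DrbdOptions/auto-add-quorum-tiebreaker True
    # override for one resource definition - takes priority over the controller value,
    # but only for resources belonging to this definition
    linstor resource-definition set-property vm-100-disk-1 DrbdOptions/auto-add-quorum-tiebreaker False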
Thanks for the clarifications! I definitely had auto-add-quorum-tiebreaker set to False on some resource definitions, though it was True on the resource group.
It looks like if a tiebreaker resource is deleted with the CLI, that property then gets set to False automatically on the resource def. I had some HA failures earlier in testing where Proxmox HA would try to start the VM on a node, fail, then try on another node, and I believe tiebreaker resources were deleted in that event (not manually from the CLI). Those resource definitions had auto-add-quorum-tiebreaker set to False.
Fortunately / unfortunately I cannot reproduce that failure scenario now. The cluster is now handling all the torture tests without issue. There have been a lot of changes and tweaks so I can’t say exactly why HA failovers sometimes failed earlier, but they don’t now. I will post a thread if I discover any Linstor related reason for it.
My only theory is that the failover failures were caused by slow responses from the linstor client when the Proxmox plugin was configured with a list of controllers and the active one was toward the end of the list. This definitely caused very long-running “HA start” tasks even when it worked. I added an IP address resource to the drbd-reactor service for the controller and configured the clients to connect only to that IP. This eliminated the odd lag I saw in the client when a controller host early in the controller list was down.
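For reference, that amounts to pointing the plugin’s controller option at the single reactor-managed IP instead of a list of hosts; in /etc/pve/storage.cfg that would look roughly like this (storage name, IP, and resource group are placeholders):

    drbd: linstor_storage
            content images, rootdir
            controller 192.0.2.10
            resourcegroup pve_rg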
After tweaking, my resource group properties are mostly stock, plus the suspend-io tip from a LINBIT KB article about VM environments.
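For anyone looking for that tip, the suspend-io part should boil down to a DRBD resource option on the resource group, roughly like this (group name is a placeholder; take the exact keys and values from the KB article):

    # suspend I/O instead of returning I/O errors when a resource loses quorum
    linstor resource-group set-property pve_rg DrbdOptions/Resource/on-no-quorum suspend-io
    # review the resulting properties on the group
    linstor resource-group list-properties pve_rg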
@ghernadi - I have now observed resource definitions missing TieBreakers and having DrbdOptions/auto-add-quorum-tiebreaker set to False after no direct action by me to change that value. I can reproduce it and have narrowed down a sequence of events that will result in the issue on a three-node cluster with an auto-place=2 resource group:
A VM is running on a host where at least one of that VM’s disks was the TieBreaker; that resource became Diskless to allow access for the running VM.
That host fails and HA restarts the VM on another host.
The failed host is restored, and the VM is migrated back to the original host.
The VM is migrated off the original host - at this time the Diskless resource that should become a TieBreaker just gets deleted.
This doesn’t happen every time; it takes several tries to provoke the behavior. I don’t actually know whether the HA failover has anything to do with this. It feels like a race condition in the handler of the DELETE REST API call.
This commit by rck seems to imply that a cause for this issue was found and fixed in linstor-proxmox 8.0.4?
Can anyone confirm from the Linbit side? I have just upgraded my reference cluster and replaced my forked plugin that never deletes resources with upstream 8.0.4. I will re-run my torture test suite and post the results here.