How to add a TieBreaker node

I want to create a two-node cluster and set up HA for the linstor controller (following the official instructions). A third node will be used purely to form a quorum, but it will sit in a different network than the one used for drbd, and these networks are reachable through different interfaces on the nodes. For example, the drbd replication network will be 10.10.10.0/24, and the quorum node for the linstor-controller resource will be in the 11.11.11.0/24 network.
So how can I make everything work and exchange data over the right physical interfaces? Let's say I can create node interfaces for drbd replication:

linstor node interface create node1 data 10.10.10.1
linstor node interface create node2 data 10.10.10.2

And separately for the drbd quorum:

linstor node interface create node1 quorum 11.11.11.1
linstor node interface create node2 quorum 11.11.11.2
linstor node interface create qdevice quorum 11.11.11.3

Then I can specify PrefNic:

linstor node set-property node1 PrefNic data
linstor node set-property node2 PrefNic data
linstor node set-property qdevice PrefNic quorum

Is that enough? As I understand it, the PrefNic parameter only indicates a preference, which means there is a chance that traffic will go through the wrong interfaces. Is there any way to implement my idea?

I also saw a method in the documentation using Path, but I will be using linstor with proxmox, so I cannot specify the path for each new resource by hand.

I'm going to assume you mean the instructions here which store the database on a drbd resource, and use drbd-reactor - not the older instructions which used Pacemaker.

If so, there's not really any distinction between "data" and "quorum". Both are just drbd resources. One happens to hold your user data, and one happens to hold the Linstor controller database. They will all need three participating nodes to avoid split-brain situations.

I'm only aware of PrefNic as a node-level setting, not a resource-level setting.

Also, I'm not aware of any way you can get node1↔node2 drbd traffic to bind to one interface address, but node1↔node3 and node2↔node3 drbd traffic to bind to a different address. All three nodes will need to talk to each other, even if node3 is only a "tiebreaker".

However: as far as I know, there's no reason why all three interfaces have to be on the same subnet: it's just TCP after all. That is, I don't see why node1 and node2 need 11.11.11 addresses. As long as you have some routability between them (e.g. a VPN tunnel) I think it would Just Work™ with node1=10.10.10.1, node2=10.10.10.2, node3=11.11.11.3
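In LINSTOR terms that would just mean registering each node with the one address it should use, something like this (a sketch only; it assumes the subnets can actually reach each other and reuses the node names from this thread):

# register each node with the single address DRBD/LINSTOR should use,
# even though the tiebreaker sits in a different subnet
linstor node create node1 10.10.10.1
linstor node create node2 10.10.10.2
linstor node create node3 11.11.11.3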

I have not tested it, but if you use PrefNic, I think it's likely that packets from node1 to node3 will be sent with the chosen PrefNic source IP address (10.10.10.1) even if they egress through a different interface. It'd be up to you to make sure your tunnel accepts that, and routes the return packets correctly. But I am speculating, you'll need to test it.

You cannot separate "DRBD quorum" and "DRBD replication" traffic, they're the same thing.

PrefNic isn't really a preference as much as it is the default NIC that resources will use. I can understand how that could be a confusing name.
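One way to see which address a resource actually ended up with is to look at the resource file LINSTOR generates on the satellite (default path shown; replace <resource> with your resource name):

# the host lines show the address DRBD binds/connects to for each node
grep address /var/lib/linstor.d/<resource>.res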

If you're unable to configure LINSTOR HA as it is written in the LINSTOR User Guide, it might be better/simpler to periodically stop the LINSTOR Controller and take a backup of the LINSTOR database. Something like this:

# stop the controller to make sure the db is consistent
systemctl stop linstor-controller

# export the database to a file named ~/db_export
/usr/share/linstor-server/bin/linstor-database export-db ~/db_export

# start the controller to resume services
systemctl start linstor-controller

# copy the backup somewhere safe
cp ~/db_export <somewhere safe>

Then, to restore on a different node should you need to:

# copy the backup from where it is stored
cp <somewhere safe> ~/db_export 

# stop the controller 
systemctl stop linstor-controller

# import the backed up database file
/usr/share/linstor-server/bin/linstor-database import-db ~/db_export

# start the controller
systemctl start linstor-controller
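If you go the periodic-backup route, the export steps are easy to wrap into a small script you can run from cron. A rough sketch only; the script name, backup host, and destination path below are placeholders, not from the user guide:

#!/bin/sh
# backup-linstor-db.sh - briefly stop the controller, export the DB, copy it off-host
set -e
systemctl stop linstor-controller
/usr/share/linstor-server/bin/linstor-database export-db /root/db_export
systemctl start linstor-controller
# "backup-host" and the destination path are placeholders
scp /root/db_export backup-host:/backups/linstor/db_export.$(date +%F)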

Thanks for the answer. Yes, I mean storing the controller database on drbd and using drbd-reactor. My main concern is that traffic might go through the wrong interfaces. Yes, it seems to work, but I would like to specify this explicitly. The 11.11.11.0/24 addresses are a service network that will be configured on the nodes anyway, so I would like to use it for the controller database quorum: it will be easy for me to configure a drbd quorum participant on other devices, because they are also on this network.

Thanks for the answer. But HA for the controller is not just about redundancy, it is also about availability. I would like to set up HA for proxmox with a qdevice; it works quite well, but without the linstor-controller it will not be able to start virtual machines, so I need HA for the controller. One of the linbit blogs mentioned that you can use a Raspberry Pi to form a quorum, but even then I will have to put it in a separate network, because I do not want to connect all three nodes directly to each other, only the two main ones. I have not found instructions anywhere on how to do this, i.e. how to pin the interfaces used for communication between specific nodes.

In the end, it turned out that the only thing that works is specifying the path manually:

linstor resource-connection path create node1 qdevice linstor_db path1 quorum quorum
linstor resource-connection path create node2 qdevice linstor_db path2 quorum quorum

If you do not do this, the drbd resource stays in the "Connecting" state and the quorum is not formed. Tell me, is this configuration option correct? That is, will it work correctly?

That's not exactly true. If the controller goes down, DRBD will continue to function. Indeed, the same is true if the satellites go down. VMs continue to run, and DRBD traffic continues to flow; there is no outage while you fix the controller.

Not having a controller means you can't change the configuration of Linstor. It means the Proxmox plugin can't talk to the Linstor API, which might mean (for example) you can't start a VM on a node which doesn't already have the resource enabled.

But you should beware that doing "HA" for the Linstor controller has downsides too. In particular, if the controller moves to a different IP address, you'll need to update the API client (i.e. Proxmox storage.cfg) to point to the new IP address. Or you can have a DNS name which you update. Or you can create a floating virtual IP with keepalived/vrrp which you link to the controller running status (which adds complexity). In most cases, it's simpler just to fix the original controller.

Yes, I know about that. But if the controller changes its address, the official instructions say you can specify several addresses separated by commas in storage.cfg, and it will work correctly. Although virtual machines will keep running without the linstor controller, proxmox HA will not work: when one of the nodes goes down, proxmox moves the virtual machine and tries to start it, but without the controller it cannot start, because it cannot connect to the controller. That is really the only function I need for proxmox HA to work.


I still have some questions.
If I have specified PrefNic for a node, does this mean that DRBD replication will only go through that interface? My resource files look like this:

connection
{
    net
    {
        allow-two-primaries yes;
    }
    host "node2" address ipv4 10.10.10.2:7002;
    host "node" address ipv4 10.10.10.1:7002;
}

That is, no matter how many interfaces I have on the node, if I don't specify "Path", there will be no backup connection through the other interfaces?

And the linstor database config looks like this

connection
{
    path
    {
        host "qdevice" address ipv4 11.11.11.3:7001;
        host "node2" address ipv4 11.11.11.2:7001;
    }
}

connection
{
    host "node2" address ipv4 10.10.10.2:7001;
    host "node" address ipv4 10.10.10.1:7001;
}

So will everything work correctly? Synchronization between the qdevice and the nodes will go over the separate network, and between the two nodes over the shared one?

I have quorum enabled only for linstordb and nothing else.

Here's what the implementation looks like via linstor:


╭────────────────────────────────────────────────────────────────╮
┊ node      ┊ NetInterface ┊ IP          ┊ Port ┊ EncryptionType ┊
╞════════════════════════════════════════════════════════════════╡
┊ +         ┊ data         ┊ 10.10.10.1  ┊      ┊                ┊
┊ + StltCon ┊ default      ┊ 10.10.10.1  ┊ 3366 ┊ PLAIN          ┊
┊ +         ┊ quorum       ┊ 11.11.11.1  ┊      ┊                ┊
╰────────────────────────────────────────────────────────────────╯

╭────────────────────────────────────────────────────────────────╮
┊ node2     ┊ NetInterface ┊ IP          ┊ Port ┊ EncryptionType ┊
╞════════════════════════════════════════════════════════════════╡
┊ +         ┊ data         ┊ 10.10.10.2  ┊      ┊                ┊
┊ + StltCon ┊ default      ┊ 10.10.10.2  ┊ 3366 ┊ PLAIN          ┊
┊ +         ┊ quorum       ┊ 11.11.11.2  ┊      ┊                ┊
╰────────────────────────────────────────────────────────────────╯

╭────────────────────────────────────────────────────────────────╮
┊ qdevice   ┊ NetInterface ┊ IP          ┊ Port ┊ EncryptionType ┊
╞════════════════════════════════════════════════════════════════╡
┊ + StltCon ┊ default      ┊ 11.11.11.3  ┊ 3366 ┊ PLAIN          ┊
┊ +         ┊ quorum       ┊ 11.11.11.3  ┊      ┊                ┊
╰────────────────────────────────────────────────────────────────╯

╭───────────────────────────────╮
┊ Key                  ┊ Value  ┊
╞═══════════════════════════════╡
┊ Paths/path1/qdevice  ┊ quorum ┊
┊ Paths/path1/node     ┊ quorum ┊
╰───────────────────────────────╯
╭────────────────────────────────╮
┊ Key                   ┊ Value  ┊
╞════════════════════════════════╡
┊ Paths/path2/qdevice   ┊ quorum ┊
┊ Paths/path2/node      ┊ quorum ┊
╰────────────────────────────────╯

Thank you - I had looked for this before and you prompted me to take a more careful look.

Initially in section 8.5 it says:

The controller parameter must be set to the IP of the node that runs the LINSTOR controller service. Only one node in a cluster can run the LINSTOR controller service at any given time. If that node fails, you will need to start the LINSTOR controller service on another node and change the controller parameter value to the new node's IP address.

But later in that paragraph, and again in section 8.6, it makes it clear that a comma-separated list is permitted.
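For example, a storage.cfg entry along these lines should do it (the storage ID, resource group, and IP addresses here are placeholders; double-check the keys against the plugin documentation for your version):

drbd: linstor_storage
        content images,rootdir
        controller 10.10.10.1,10.10.10.2
        resourcegroup pve_rg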

With an HA LINSTOR controller, the LINSTOR controller will fail over long before Proxmox determines a node is offline and attempts to restart virtual machines. It's been a while since I've tested Proxmox HA, but I believe it's somewhere around 3 minutes before Proxmox rules a host as unavailable?

Also, just sharing a quick tip for HA LINSTOR controller configurations. You can simply add a virtual IP address to DRBD Reactor's promoter plugin (just like we normally do when making any other service highly available with DRBD Reactor and Pacemaker).

Simply replace the following promoter configuration (from the LINSTOR UG):

[[promoter]]
[promoter.resources.linstor_db]
start = ["var-lib-linstor.mount", "linstor-controller.service"]

With this:

[[promoter]]
[promoter.resources.linstor_db]
start = [
"""
ocf:heartbeat:IPaddr2 controller-vip1 ip=11.11.11.100 \
cidr_netmask=24 iflabel=vip1""",
"var-lib-linstor.mount", "linstor-controller.service"
]

Installing the resource-agents package is required for the IPaddr2 resource agent configuration above.
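On Proxmox (Debian-based) that should just be something like this (package name as found in Debian; verify for your release):

apt install resource-agents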

Now you only have to be concerned with using a single IP address for accessing the LINSTOR controller (and the LINSTOR GUI); you can point a DNS record to the virtual IP address if needed, etc. No more need to guess "which IP address is my LINSTOR controller running on?".

Not that you have to, it's just what I would do if I were managing Proxmox and LINSTOR in a production environment.

Thank you very much for the answer. I do think a VIP is a bit more convenient, although I have no problem using multiple IP addresses for the linstor controller; specifying several addresses in storage.cfg also works quite well. My main problem is organizing the quorum: drbd-reactor relies on the drbd quorum, and I have only two nodes plus one TieBreaker, with the latter in another network. I need the drbd resource holding the controller database to keep quorum so that it can switch and drbd-reactor can start the controller. I managed to configure this only by using "Path" as I wrote above, but no one has told me whether it will work correctly, because I do not fully understand the drbd architecture. Now I am starting to consider pacemaker, but I still do not know which option is better; besides, I am not sure that pacemaker will not conflict with proxmox HA.

After looking at your configuration, I think you're likely over-complicating the networking for a small 3-node cluster.

@kermat makes a great point. To elaborate on your configuration, as long as hosts in both of the 10.10.10.0/24 and 11.11.11.0/24 subnets can reach each other (there is a valid route), you do not need to define extra node interfaces in LINSTOR. When defining nodes for your LINSTOR cluster, simply use the "replication IPs" for node1 and node2, and the "quorum IP" for the qdevice node.

Assuming you never intend to promote your qdevice node to primary/InUse, DRBD will never replicate data back to its peers or perform disk I/O over the network, but it will use the replication network for monitoring quorum.
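Once everything is connected, a quick check on the qdevice is this (resource name assumed to be linstor_db, as elsewhere in this thread):

# the tiebreaker should show disk:Diskless and both peers as
# Connected/UpToDate rather than sitting in Connecting
drbdadm status linstor_db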

Summing up some advice:

  • Do not use Pacemaker for LINSTOR HA. There is no need, especially if you already have a working DRBD Reactor deployment. And you're correct, Proxmox uses Corosync, so I would just let Proxmox use Corosync exclusively.
  • Simplify your LINSTOR networking. Either redeploy your LINSTOR cluster or delete the extra interfaces from the configuration (see the sketch after this list).
  • Do not install or configure DRBD Reactor on your qdevice node. This will restrict the LINSTOR controller availability to node1 or node2 only, as intended.
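For the second point, deleting the extra interfaces would look roughly like this, based on the interface names shown earlier in the thread (double-check nothing still references them first). The resource-connection paths you created earlier would need to be removed as well.

linstor node interface delete node1 data
linstor node interface delete node1 quorum
linstor node interface delete node2 data
linstor node interface delete node2 quorum
linstor node interface delete qdevice quorum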

With this configuration in a 3 node cluster (sometimes I call this a "2.5 node" cluster), your qdevice node will always be the TieBreaker, both for the HA LINSTOR database and for any Proxmox VM disks replicated by DRBD/LINSTOR.

If you ever expand your Proxmox cluster, larger clusters will likely need a custom diskless storage pool defined to make sure the TieBreaker is confined to a single node, or perhaps just certain nodes. However, for 3 node clusters, simply not defining a LINSTOR storage pool on the 3rd node is enough to ensure that node will always be the TieBreaker for every LINSTOR resource.

Thank you very much for your detailed answer. But the main problem is that without the extra interfaces and, most importantly, "Path", I cannot form a quorum: drbdadm status shows the connection to the TieBreaker node stuck in Connecting forever. That is because the TieBreaker node only has the 11.11.11.0/24 network, while the main nodes have both 11.11.11.0/24 and 10.10.10.0/24. If the nodes are added with linstor node create node1 10.10.10.1, this network becomes their primary one (since it is allocated for replication), and if you look at the traffic with tcpdump, the TieBreaker node tries to reach the address 10.10.10.1, but since it has no interface in that network, it cannot. But since you say they can communicate with each other, is my configuration incorrect?

I also want to ask about the routes. Do you mean something like this on the qdevice?

ip ro add 10.10.10.1 via 11.11.11.1
ip ro add 10.10.10.2 via 11.11.11.2

But if I understand correctly, traffic sent from the qdevice to 10.10.10.1 will carry a source address from the 11.11.11.0/24 network, and it will still work?

Anytime. Usually there would be a router on your network that has access to both subnets; that is where you should configure routes between the different subnets. It is not much different than defining a route for WAN ↔ LAN, you just need another route for LAN1 ↔ LAN2.
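If that router happens to be a Linux box, a minimal sketch would be something like this (the interface names and the .254 addresses are made up):

# on the router: one leg in each subnet, forwarding enabled there (and only there)
sysctl -w net.ipv4.ip_forward=1
ip addr add 10.10.10.254/24 dev eth0
ip addr add 11.11.11.254/24 dev eth1

# on the qdevice: reach the replication subnet via the router
ip route add 10.10.10.0/24 via 11.11.11.254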

This way your qdevice node can actually reach the other subnet through its 11.11.11.0/24 interface. No need to get fancy with trying to route traffic on the Proxmox nodes themselves.

I guess what I was getting at in my earlier post is that you do not need to use two separate NICs for separating quorum/replication, or even LINSTOR management traffic between the client and controller. Just stick to one physical interface (for LINSTOR and DRBD) and let your networking infrastructure handle the routing like it normally would.

I understand what you are saying, but I don't see how to implement it, because as I understand it this can only be done in two ways:

  1. Create VLANs on two physical interfaces and add them to the bridge, and then connect one interface to the router and leave the other for replication
  2. Use a switch or virtual bridge on the router for replication

I don't like the first option because of the bridges, which can add latency, and the second is too expensive for me. So I would like a separate connection between the nodes that does not go beyond the nodes themselves.

But I am really surprised I did not think of simply creating routes. I just checked: I deleted all the extra interfaces and simply created routes on the TieBreaker node:

ip ro add 10.10.10.1 via 11.11.11.1
ip ro add 10.10.10.2 via 11.11.11.2

And it worked. Now I have quorum without an overly complicated configuration. You helped me solve a problem I had been thinking about for a week. Thank you!


But now I'm wondering whether simple routes are really better than the Path used in drbd in my configuration above. In theory it's the same thing, except that with Path you don't have to use forwarding, which I usually turn off on Linux systems for security.

No, you don't need to turn forwarding on to use static routes - only if the machine is receiving IP datagrams on one interface and forwarding them out of another. After all, your default route is just a static route.
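A quick way to convince yourself of that on the qdevice (the route output wording in the comment is approximate):

sysctl net.ipv4.ip_forward      # can stay at 0, the routes still work
ip route get 10.10.10.1         # should report "via 11.11.11.1" and the qdevice's own source address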

You also don't have multiple paths AFAICS, but in any case there are some issues with path selection at the moment: Multiple DRBD Paths Behavior (which means it's probably best to steer clear for now).


Ah, yeah, if you have a crossover network for node1 and node2, then you won't be able to route "through" an external router.

But yeah, same end result (a working cluster!) by defining a couple of static routes on the qdevice node to reach the replication network through node1 and node2 in this case :+1:
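If you want those routes to survive a reboot, something like this in /etc/network/interfaces on a Debian-family qdevice would do it (the interface name ens18 is an assumption, adjust to your system):

auto ens18
iface ens18 inet static
    address 11.11.11.3/24
    up ip route add 10.10.10.1 via 11.11.11.1
    up ip route add 10.10.10.2 via 11.11.11.2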
