DRBD + KeepaliveD notify_* scripts

cwik · April 15, 2024, 2:08pm

We love to keep things simple, and for us KeepaliveD offers a simple way to create a high-availability floating IP address. We use it a lot for things like load balancers and SMTP servers.

Now we would like to add a DRBD resource that should be mounted on the active node. We got it to work in a failover situation with a simple:

drbdadm primary xvdb
mount /dev/drbd1 /drbd

in the notify_master script on the secondary node.
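For anyone following along, those two commands could be wrapped into a notify script roughly like this. It is only a sketch: the script path, the `run` helper, the `DRY_RUN` variable and the `--apply` argument are assumptions made for illustration, while the resource name, device and mount point are the ones quoted above.

```shell
#!/bin/sh
# Hypothetical /etc/keepalived/notify_master.sh (path and layout are assumptions).
# Resource name, device and mount point are the ones from the post above.
RESOURCE="xvdb"
DEVICE="/dev/drbd1"
MOUNTPOINT="/drbd"

# Run a command, or only print it when DRY_RUN is set (handy for testing).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Promote the DRBD resource, then mount it. Stop at the first failure so
# we never try to mount a device that is still Secondary.
promote_and_mount() {
    run drbdadm primary "$RESOURCE" || return 1
    run mount "$DEVICE" "$MOUNTPOINT" || return 1
}

# Only act when explicitly asked to, so the file can be sourced safely.
if [ "${1:-}" = "--apply" ]; then
    promote_and_mount
fi
```

With this layout, keepalived would be pointed at the script with the `--apply` argument, and `DRY_RUN=1` prints the commands instead of running them.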

What we haven’t yet figured out is how to fail back gracefully when the primary comes back online. It seems like we’ll need a bit of co-ordination between the nodes to get the sequence right:

1. Wait for DRBD to resync
2. Unmount DRBD on backup node
3. Fail over IP and mount DRBD on primary node

It seems to me like if we could somehow configure a dependency in the keepalived systemd unit so it doesn’t start until the resync is complete, that should do the trick.
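The "wait for resync" step could be sketched as a small polling helper. This is a rough illustration under stated assumptions, not a tested recipe: it assumes a two-node setup where `drbdadm dstate` prints the local and peer disk states as `UpToDate/UpToDate` once the resync has finished, and the function names, the 5-second poll interval and the default timeout are all made up for the example.

```shell
#!/bin/sh
# Return 0 only when both the local and the peer disk report UpToDate.
# Assumes the "local/peer" output format of `drbdadm dstate` on two nodes.
is_in_sync() {
    case "$1" in
        UpToDate/UpToDate) return 0 ;;
        *) return 1 ;;
    esac
}

# Poll the disk state of a resource until it is in sync or the timeout
# (in seconds, default 600) expires. Returns 1 on timeout.
wait_for_sync() {
    resource=$1
    timeout=${2:-600}
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if is_in_sync "$(drbdadm dstate "$resource" 2>/dev/null)"; then
            return 0
        fi
        sleep 5
        waited=$((waited + 5))
    done
    return 1
}
```

A script calling `wait_for_sync xvdb` could in principle be hooked in through an `ExecStartPre=` line in a systemd drop-in for the keepalived unit, though systemd's start timeout would probably need raising for a long resync.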

Has anyone tackled this already, so we don’t have to re-invent the wheel? Or is there really a strong case for using DRBD Reactor and/or LINSTOR instead of or in addition to KeepaliveD?

BHellman · April 15, 2024, 2:39pm

I don’t think automatic failback is wise. If, for example, a memory fault or some other hardware failure causes the node to reboot at random, failing back automatically will cause more problems and downtime than staying on the surviving node.

I’d recommend giving DRBD Reactor a try. We (LINBIT) designed it specifically to reduce complexity in 3-node cluster setups. There is a tech guide here.

Is there a reason you’re interested in failing back automatically?

cwik · April 15, 2024, 4:05pm

Thanks for your reply!

It seems we were trying to solve a problem that didn’t need to be solved. Your suggestion is a good one: when failover happens, don’t fail back. Once the failed server is back online and has resynced, it becomes the new backup.

I downloaded the guide you referenced to see how you recommend setting it up. It wasn’t entirely obvious at first read that this is how it works with drbd-reactor and an OCF resource agent, but I assume so.

I think we can achieve the same result as what you describe by using nopreempt and setting state to BACKUP on both nodes of our KeepaliveD cluster.
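As a rough sketch of what that might look like, the helper below prints a `vrrp_instance` block that could be used on either node. The instance name, interface, router ID, virtual IP and script paths are placeholders invented for the example, and `print_vrrp_instance` is a hypothetical helper, not part of keepalived. Only `state BACKUP`, `nopreempt` and the notify hooks come from the discussion above.

```shell
#!/bin/sh
# Print a keepalived vrrp_instance block for one node.
# $1 is the node's priority; everything else is a placeholder value.
print_vrrp_instance() {
    priority=$1
    cat <<EOF
vrrp_instance VI_1 {
    # Both nodes start as BACKUP; keepalived requires this for nopreempt.
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority ${priority}
    # A returning node does not take the IP back from the current master.
    nopreempt
    advert_int 1
    virtual_ipaddress {
        192.0.2.10/24
    }
    # Hypothetical script paths: promote and mount on MASTER,
    # unmount and demote on BACKUP.
    notify_master "/etc/keepalived/notify_master.sh"
    notify_backup "/etc/keepalived/notify_backup.sh"
}
EOF
}
```

For example, `print_vrrp_instance 100` on one node and `print_vrrp_instance 90` on the other would give each node the same block with a different priority. With `nopreempt` set, the higher-priority node stays backup after it returns.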
