For synchronous DRBD replication over TCP there are two dedicated links.
If I unplug the first link, failover to the second link works fine. However, if I also unplug the second link, both cluster nodes are immediately powered off.
Is this behaviour configurable? Are there any other options when all DRBD links have failed?
Well, if both links are offline and the nodes no longer see each other, what would you expect them to do? I think you can configure them to stay online, but then you are in for a split-brain sooner rather than later.
"Well, if both links are offline and the nodes no longer see each other, what would you expect them to do?"
The cluster nodes still see each other; only the dedicated DRBD replication links are offline. For instance, DRBD/Pacemaker could allow reads and block all client writes to the DRBD Primary volumes.
Or it could even keep accepting client writes on the DRBD Primary and wait for the replication links to come back online.
I fully agree that this may cause a split-brain later, but I would like to clarify: is shutting down the cluster nodes the only option in this case, or are there other possibilities?
This is most likely controlled by the fencing policy and the fence-peer handler in your DRBD configuration. If the nodes are powering off, then you likely have fencing resource-and-stonith; configured. The default here is dont-care.
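For reference, a DRBD 8.4-style sketch of those settings (the resource name r0 and the handler script paths are just typical examples shipped with drbd-utils, not taken from your setup; in DRBD 9 the fencing option lives in the net section instead):

    resource r0 {
      disk {
        # fencing policy: dont-care (default) | resource-only | resource-and-stonith
        # resource-and-stonith freezes I/O and calls the fence-peer handler,
        # which can end up power-fencing the peer through Pacemaker
        fencing resource-and-stonith;
      }
      handlers {
        # crm-fence-peer.sh places a Pacemaker constraint against the peer
        fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
        # remove that constraint again once the peer has resynced
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
      }
    }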
It might also be that Pacemaker is fencing, and powering down, the nodes. Perhaps the Corosync communication is configured to use the same links?
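You can check which addresses the rings are bound to with corosync-cfgtool -s. A rough corosync 2.x-style sketch of keeping the cluster rings on their own networks (the addresses and the udpu transport are placeholders, not taken from your config):

    totem {
        version: 2
        transport: udpu
        # run a second, independent ring so cluster membership does not
        # depend on the DRBD replication links
        rrp_mode: passive
    }
    nodelist {
        node {
            ring0_addr: 10.0.0.1    # management network (placeholder)
            ring1_addr: 10.0.1.1    # separate backup network (placeholder)
        }
        node {
            ring0_addr: 10.0.0.2
            ring1_addr: 10.0.1.2
        }
    }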
You may also want to consider changing the fencing action from off to reboot, and configuring a pcmk_delay_base on the fence device of one of the nodes. By delaying the fencing action of one node you should always be left with one survivor rather than having both nodes power down or reboot at the same time.
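A rough pcs sketch of both changes (the stonith device names node1-ipmi and node2-ipmi are placeholders for whatever fence devices you actually use):

    # reboot instead of powering nodes off when they are fenced
    pcs property set stonith-action=reboot

    # give one fence device a head start so the nodes cannot
    # fence each other at exactly the same time
    pcs stonith update node1-ipmi pcmk_delay_base=10s
    pcs stonith update node2-ipmi pcmk_delay_base=0s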