I am currently running a three-node DRBD setup, version 9.2.13. All three replicas are diskful, and the auto-promote
parameter is enabled. The following sequence of operations triggers a split-brain condition between two nodes:
1. Mount the storage volume on Node C to a directory, causing C to automatically become the primary node.
2. Disconnect the network between Nodes B and C, then write data to C.
3. Restore the B-C network connection. At this point, B synchronizes data from C, and its state becomes `Inconsistent`.
4. During synchronization, unmount C and mount B, causing B to automatically become the primary node.
5. Disconnect B and C again. Now, C is `Outdated`, and B remains `Inconsistent`.
6. Restore the B-C connection and wait for synchronization to complete. Now, both B and C become `Outdated`, but B remains the primary node. At this stage, all three nodes (A, B, C) share the same `current UUID`.
7. Unmount B. During this step, there is a certain probability that the `current UUID` of A and C changes, while B's `current UUID` remains unchanged, resulting in a split-brain condition, as shown in the figure below.
Step 7 triggers the problem only some of the time, so reproducing it requires repeated testing.
I suspect the issue occurs because the unmount in Step 7 might modify data. If A and C register that modification while B does not, this could lead to a mismatch in `current UUID`.
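Because the failure is intermittent, it may help to record each node's `current UUID` after every run of Step 7 and compare the values mechanically. The following is only a minimal sketch of that comparison: the function names and the sample values are hypothetical, and it assumes the UUIDs have already been collected from each node as hexadecimal strings. It does not query DRBD itself.

```python
from collections import defaultdict


def group_by_current_uuid(current_uuids):
    """Group node names by the current UUID each node reports.

    `current_uuids` maps a node name to its current UUID as a string.
    This is a hypothetical helper, not part of any DRBD tooling.
    """
    groups = defaultdict(list)
    for node, uuid in sorted(current_uuids.items()):
        # Normalize case and whitespace so that formatting differences
        # are not mistaken for a real divergence.
        groups[uuid.strip().upper()].append(node)
    return dict(groups)


def has_diverged(current_uuids):
    """Return True when the nodes do not all share one current UUID."""
    return len(group_by_current_uuid(current_uuids)) > 1


# Placeholder values for illustration, not taken from a real cluster.
sample = {
    "A": "1111111111111111",
    "B": "2222222222222222",
    "C": "1111111111111111",
}
print(group_by_current_uuid(sample))
print(has_diverged(sample))
```

Running this after each test iteration would show at a glance whether A and C have moved to a new `current UUID` while B kept the old one, which is the pattern described in Step 7.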
I would greatly appreciate your help in resolving this issue. Thank you very much!