DRBD 9.3.0: resync may stall in a 3-node cluster, leading to blocked I/O when promoting an Inconsistent node

buaafanrui · March 8, 2026, 3:59am

Environment

DRBD version: 9.3.0
Cluster size: 3 diskful nodes
Resource: r1
Nodes: drbd1, drbd2, drbd3
Protocol: C
Single primary (no dual-primary)

Problem Description

In a 3-node DRBD cluster we occasionally encounter a situation where resynchronization stalls and all write I/O to the mounted filesystem blocks indefinitely.

When this happens:

resync progress stops
writes to the mounted filesystem hang
the filesystem cannot be unmounted (umount blocks)
DRBD status shows "resync-suspended: peer" and "resync-suspended: dependency" states between nodes

Restarting DRBD on one node (drbdadm down followed by drbdadm up; drbd3 in our case) changes the resync topology, and the system recovers.

Reproduction Scenario

The issue can be reproduced with the following sequence.

Nodes involved:

drbd1
drbd2
drbd3

Initial state: all nodes are UpToDate.

Step 1

Promote drbd3 to Primary and mount the filesystem.

Step 2

Disconnect the network between drbd3 and drbd2.

Step 3

Write data on drbd3.

Result:

drbd2 becomes Outdated

Step 4

Restore the network between drbd3 and drbd2.

Result:

drbd3 → drbd2 resync starts

Step 5

Promote drbd2 to Primary and mount the filesystem.

Then disconnect the network between drbd2 and drbd3.

Because drbd2 is now Primary and receives writes:

drbd3 becomes Outdated

Step 6

Restore the network between drbd2 and drbd3.

The cluster state is now:

drbd1 : UpToDate
drbd2 : Inconsistent (Primary)
drbd3 : Outdated

DRBD chooses the following resync direction:

drbd3 → drbd2

Step 7

Write data on drbd2.

At this point the problem occurs.

Observed Behavior

The resync from drbd3 → drbd2 stalls (no further progress).
The filesystem cannot be unmounted.

DRBD status shows the following relationship:

drbd1 → drbd2
    drbd1: resync-suspended: peer
    drbd2: resync-suspended: dependency

This suggests that:

the drbd1 → drbd2 resync is waiting for another resync to complete
the drbd3 → drbd2 resync is expected to complete first

However, the drbd3 → drbd2 resync shows "suspended: no" but makes no progress.

As a result, the whole system appears to be stuck.

Recovery

We restart DRBD on drbd3:

drbdadm down r1
drbdadm up r1

The resync topology changes to:

drbd1 → drbd2
drbd1 → drbd3

Both nodes resync from drbd1, and the cluster returns to normal operation.
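
The manual workaround can be wrapped in a small watchdog that restarts the resource only when the resync has clearly stopped moving. This is a sketch under assumptions, not a recommended fix: it sums the `out-of-sync:` counters from `drbdsetup status <res> --verbose --statistics` as a crude progress measure, and the helper names, sample count, and interval are made up for illustration. It is meant to run on the node being restarted (drbd3 in our case).

```python
import re
import subprocess
import time


def read_out_of_sync(resource):
    """Sum the out-of-sync counters that drbdsetup reports for all peers."""
    text = subprocess.run(
        ["drbdsetup", "status", resource, "--verbose", "--statistics"],
        check=True, capture_output=True, text=True,
    ).stdout
    return sum(int(n) for n in re.findall(r"\bout-of-sync:(\d+)", text))


def restart_resource(resource):
    """Equivalent of `drbdadm down <res>` followed by `drbdadm up <res>`."""
    for action in ("down", "up"):
        subprocess.run(["drbdadm", action, resource], check=True)


def restart_if_stalled(resource, samples=3, interval=10.0,
                       read=read_out_of_sync, restart=restart_resource,
                       sleep=time.sleep):
    """Restart the resource only if out-of-sync never shrinks across samples.

    Returns True if a restart was issued, False otherwise.
    """
    previous = read(resource)
    for _ in range(samples):
        sleep(interval)
        current = read(resource)
        if current == 0 or current < previous:
            return False  # resync finished or is still making progress
        previous = current
    restart(resource)
    return True
```

For example, `restart_if_stalled("r1")` would take four readings ten seconds apart and issue the down/up only if none of them shows progress. The `read`, `restart`, and `sleep` parameters exist so the logic can be exercised without a live cluster.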

Question

Is this behavior expected in DRBD 9.3.0 when a node that is still Inconsistent is promoted to Primary and receives writes?

Or could this indicate a resync scheduling issue, where DRBD chooses a suboptimal sync source (an Outdated node) and the resync pipeline stalls?

Any guidance on how to avoid or diagnose this situation would be appreciated.

buaafanrui · March 8, 2026, 4:04am

These are the drbdadm status results.