DRBD 9.2.13 Testing on 3-Replica Volumes – All Replicas Became Outdated

Hello!

We recently conducted tests with DRBD version 9.2.13 in a Kubernetes environment, during which we observed that all replicas became outdated. For your reference, the kernel logs and the resource dump are attached.

Test Overview

  1. Pod and PVC Creation:
  • A large number of pods are created in a loop by deploying the corresponding PersistentVolumeClaims (PVCs) and Deployments.
  • Initially, the pods are scheduled on nodes that host the DRBD data replicas.
  2. Pod Relocation:
  • The script then moves the pods to nodes that do not host local DRBD replicas.
  • This relocation is performed by checking the DRBD status and patching the Deployments to force the pods onto nodes without local replicas.
  3. Chaos Monkey – DRBD Connection Disruption:
    The script repeatedly executes a chaos monkey routine that simulates network disruptions affecting DRBD connections (a simplified sketch of one iteration follows this list). For each iteration:
  • Node Iteration: The script iterates over the nodes in a random order.
  • Traffic Drop: On each node, it identifies the DRBD resource ports used by the test-created PVCs and applies iptables rules to drop traffic on both the INPUT and OUTPUT chains.
  • Random Break Duration: The connection disruption lasts for a random duration between 15 and 45 seconds.
  • Traffic Restoration: After the break, the iptables rules are removed to restore traffic.
  • Random Pause: The script then pauses for a random duration between 5 and 15 seconds before processing the next node.
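
For reference, one iteration of the per-node disruption step looks roughly like the sketch below (the port list is a placeholder; the real script derives the ports from the DRBD resources backing the test-created PVCs):

# simplified sketch of a single chaos-monkey iteration on one node
# PORTS is a placeholder; the real script discovers the DRBD ports of the test PVCs
PORTS="7002 7100"

for p in $PORTS; do
    iptables -A INPUT  -p tcp --dport "$p" -j DROP
    iptables -A OUTPUT -p tcp --dport "$p" -j DROP
done

sleep $((RANDOM % 31 + 15))   # random break between 15 and 45 seconds

for p in $PORTS; do
    iptables -D INPUT  -p tcp --dport "$p" -j DROP
    iptables -D OUTPUT -p tcp --dport "$p" -j DROP
done

sleep $((RANDOM % 11 + 5))    # random pause between 5 and 15 seconds before the next node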

During the tests, all replicas of one of the resources switched to an outdated state.

It is unclear how to recover these resources. I attempted to recover them by running drbdadm disconnect followed by drbdadm connect, as well as by disabling quorum with the command:
drbdsetup resource-options $resource_name --quorum off

However, this approach did not help: on the node with the diskless replica, a process hangs in the D state, making it impossible to recover the resource without rebooting the node.

I would appreciate any insights or suggestions on how to effectively recover these resources. Specifically: has anyone encountered a similar issue with a diskless replica hanging in the D state and found a workaround that avoids rebooting the node? And is it even possible to prevent outdated replicas from occurring in the first place?

Kernel logs:

Resource dump:

drbdadm dump pvc-742a2fd2-a6de-4450-a600-dac7693c758d
# resource pvc-742a2fd2-a6de-4450-a600-dac7693c758d on storage-load-test-0: not ignored, not stacked
# defined at /var/lib/linstor.d/pvc-742a2fd2-a6de-4450-a600-dac7693c758d.res:6
resource pvc-742a2fd2-a6de-4450-a600-dac7693c758d {
    on storage-load-test-0 {
        node-id 3;
        volume 0 {
            disk {
                discard-zeroes-if-aligned  no;
                rs-discard-granularity 4096;
            }
            device       minor 1002;
            disk         none;
            meta-disk    internal;
        }
    }
    on storage-load-test-1 {
        node-id 1;
        volume 0 {
            disk {
                discard-zeroes-if-aligned  no;
                rs-discard-granularity 4096;
            }
            device       minor 1002;
            disk         /dev/drbd/this/is/not/used;
            meta-disk    internal;
        }
    }
    on storage-load-test-2 {
        node-id 0;
        volume 0 {
            disk {
                discard-zeroes-if-aligned  no;
                rs-discard-granularity 4096;
            }
            device       minor 1002;
            disk         /dev/drbd/this/is/not/used;
            meta-disk    internal;
        }
    }
    on storage-load-test-3 {
        node-id 2;
        volume 0 {
            disk {
                discard-zeroes-if-aligned  no;
                rs-discard-granularity 4096;
            }
            device       minor 1002;
            disk         /dev/drbd/this/is/not/used;
            meta-disk    internal;
        }
    }
    connection {
        host storage-load-test-0         address         ipv4 172.17.1.2:7002;
        host storage-load-test-1         address         ipv4 172.17.1.3:7002;
        net {
            _name        storage-load-test-1;
        }
    }
    connection {
        host storage-load-test-0         address         ipv4 172.17.1.2:7002;
        host storage-load-test-2         address         ipv4 172.17.1.4:7002;
        net {
            _name        storage-load-test-2;
        }
    }
    connection {
        host storage-load-test-0         address         ipv4 172.17.1.2:7002;
        host storage-load-test-3         address         ipv4 172.17.1.5:7002;
        net {
            _name        storage-load-test-3;
        }
    }
    options {
        on-no-data-accessible suspend-io;
        on-no-quorum     suspend-io;
        on-suspended-primary-outdated force-secondary;
        quorum           majority;
        quorum-minimum-redundancy   2;
    }
    net {
        cram-hmac-alg    sha1;
        shared-secret    yDRvjadO/pjhwazpAHUF;
        protocol           C;
        rr-conflict      retry-connect;
        verify-alg       crct10dif-pclmul;
    }
}

Resource status:

linstor r l -r pvc-742a2fd2-a6de-4450-a600-dac7693c758d
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName                             ┊ Node                ┊ Port ┊ Usage  ┊ Conns ┊    State ┊ CreatedOn           ┊
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-742a2fd2-a6de-4450-a600-dac7693c758d ┊ storage-load-test-0 ┊ 7002 ┊ InUse  ┊ Ok    ┊ Diskless ┊ 2025-03-30 22:07:58 ┊
┊ pvc-742a2fd2-a6de-4450-a600-dac7693c758d ┊ storage-load-test-1 ┊ 7002 ┊ Unused ┊ Ok    ┊ Outdated ┊ 2025-03-30 22:05:02 ┊
┊ pvc-742a2fd2-a6de-4450-a600-dac7693c758d ┊ storage-load-test-2 ┊ 7002 ┊ Unused ┊ Ok    ┊ Outdated ┊ 2025-03-30 22:04:59 ┊
┊ pvc-742a2fd2-a6de-4450-a600-dac7693c758d ┊ storage-load-test-3 ┊ 7002 ┊ Unused ┊ Ok    ┊ Outdated ┊ 2025-03-30 22:05:02 ┊
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯


drbdadm status pvc-742a2fd2-a6de-4450-a600-dac7693c758d
pvc-742a2fd2-a6de-4450-a600-dac7693c758d role:Primary suspended:no-data,quorum
  disk:Diskless quorum:no open:yes blocked:upper
  storage-load-test-1 role:Secondary
    peer-disk:Outdated
  storage-load-test-2 connection:StandAlone
  storage-load-test-3 role:Secondary
    peer-disk:Outdated


drbdadm disconnect pvc-742a2fd2-a6de-4450-a600-dac7693c758d

drbdadm status pvc-742a2fd2-a6de-4450-a600-dac7693c758d
pvc-742a2fd2-a6de-4450-a600-dac7693c758d role:Primary suspended:no-data,quorum
  disk:Diskless quorum:no open:yes blocked:upper
  storage-load-test-1 connection:StandAlone
  storage-load-test-2 connection:StandAlone
  storage-load-test-3 connection:StandAlone

drbdadm connect pvc-742a2fd2-a6de-4450-a600-dac7693c758d
[root@storage-load-test-0 /]# drbdadm status pvc-742a2fd2-a6de-4450-a600-dac7693c758d
pvc-742a2fd2-a6de-4450-a600-dac7693c758d role:Primary suspended:no-data,quorum
  disk:Diskless quorum:no open:yes blocked:upper
  storage-load-test-1 role:Secondary
    peer-disk:Outdated
  storage-load-test-2 role:Secondary
    peer-disk:Outdated
  storage-load-test-3 role:Secondary
    peer-disk:Outdated


@phil_reisner Hello, maybe you can help us with this issue?

It appears that pvc-742a2fd2-a6de-4450-a600-dac7693c758d is InUse on storage-load-test-0. What service is using it? Since the device appears to be blocked, those processes must be killed first. Once that is done, you may run drbdadm secondary --force for the resource on that node.
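
For example, a minimal sketch (assuming the device node for minor 1002 is /dev/drbd1002; fuser/lsof are just one way to find the holders):

# identify what is holding the DRBD device open on storage-load-test-0
fuser -vm /dev/drbd1002          # or: lsof /dev/drbd1002
# once those processes are gone, demote the resource
drbdadm secondary --force pvc-742a2fd2-a6de-4450-a600-dac7693c758d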

If you can share /proc/<pid>/stack for the ‘D’ state processes here, that could provide some more insight into what is happening in these tests.

Hi!

We reproduced the situation.

$ linstor r l -r pvc-eea2b998-7321-46cc-9a6a-59cc576a7bd5
Defaulted container "linstor-controller" out of: linstor-controller, kube-rbac-proxy
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName                             ┊ Node                ┊ Port ┊ Usage  ┊ Conns                           ┊    State ┊ CreatedOn           ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-eea2b998-7321-46cc-9a6a-59cc576a7bd5 ┊ storage-load-test-0 ┊ 7100 ┊ InUse  ┊ StandAlone(storage-load-test-2) ┊ Diskless ┊ 2025-04-16 15:50:05 ┊
┊ pvc-eea2b998-7321-46cc-9a6a-59cc576a7bd5 ┊ storage-load-test-1 ┊ 7100 ┊ Unused ┊ Ok                              ┊ Outdated ┊ 2025-04-16 15:48:55 ┊
┊ pvc-eea2b998-7321-46cc-9a6a-59cc576a7bd5 ┊ storage-load-test-2 ┊ 7100 ┊ Unused ┊ Connecting(storage-load-test-0) ┊ Outdated ┊ 2025-04-16 15:48:59 ┊
┊ pvc-eea2b998-7321-46cc-9a6a-59cc576a7bd5 ┊ storage-load-test-3 ┊ 7100 ┊ Unused ┊ Ok                              ┊ Outdated ┊ 2025-04-16 15:48:59 ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

[root@storage-load-test-0 /]# drbdadm status pvc-eea2b998-7321-46cc-9a6a-59cc576a7bd5
pvc-eea2b998-7321-46cc-9a6a-59cc576a7bd5 role:Primary suspended:no-data,quorum
  disk:Diskless quorum:no open:yes blocked:upper
  storage-load-test-1 role:Secondary
    peer-disk:Outdated
  storage-load-test-2 connection:StandAlone
  storage-load-test-3 role:Secondary
    peer-disk:Outdated


root@storage-load-test-0:~# ps uax | grep -P '\s+D.*'
root     1735529  0.0  0.0      0     0 ?        D    18:50   0:00 [jbd2/drbd1100-8]
root     1850867  0.0  0.0   1600     4 ?        D    19:04   0:00 sync /var/log/flog/fake_wauSjuhzkGeY.log

root@storage-load-test-0:~# cat /proc/1735529/stack 
[<0>] __wait_on_buffer+0x34/0x40
[<0>] jbd2_journal_commit_transaction+0x12a8/0x1670
[<0>] kjournald2+0xa9/0x280
[<0>] kthread+0x127/0x150
[<0>] ret_from_fork+0x1f/0x30

root@storage-load-test-0:~# cat /proc/1850867/stack 
[<0>] jbd2_log_wait_commit+0xaf/0x120
[<0>] jbd2_complete_transaction+0x64/0xb0
[<0>] ext4_fc_commit+0x19a/0x1d0
[<0>] ext4_sync_file+0x304/0x330
[<0>] vfs_fsync_range+0x46/0x90
[<0>] __x64_sys_fsync+0x38/0x70
[<0>] do_syscall_64+0x59/0xc0
[<0>] entry_SYSCALL_64_after_hwframe+0x62/0xcc

Thanks for that output. It seems as though you have filesystem processes in the D state here that appear to be using the DRBD device. The InUse state in the resource listing, and the open:yes blocked:upper in the drbdadm status for the PVC on that node, support this.

These filesystem processes are using this DRBD device, so in order to resolve the cluster state in terms of DRBD, you must first unmount the filesystem on that cluster node, or otherwise resolve whatever is happening at the upper (filesystem) level that is causing them to use the DRBD device.
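
A rough sketch of that step, assuming the ext4 filesystem is mounted directly from the DRBD device (minor 1100, i.e. /dev/drbd1100; the mountpoint below is a placeholder):

# locate the mountpoint backed by the DRBD device
findmnt /dev/drbd1100
# unmount it; this can only complete once the D-state writers are gone
umount <mountpoint-from-findmnt>
# afterwards the resource can be demoted on this node
drbdadm secondary --force pvc-eea2b998-7321-46cc-9a6a-59cc576a7bd5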

Unfortunately, the process (sync /var/log/flog/fake_wauSjuhzkGeY.log) is in the D state and cannot easily be removed.
As a result, the situation is a deadlock: the process can’t be killed, and DRBD can’t be recovered :frowning:

Below is the full output of the resource status on all cluster nodes:

--storage-load-test-0
drbdsetup status pvc-eea2b998-7321-46cc-9a6a-59cc576a7bd5 --verbose
pvc-eea2b998-7321-46cc-9a6a-59cc576a7bd5 node-id:3 role:Primary suspended:no-data force-io-failures:no
  volume:0 minor:1100 disk:Diskless client:yes backing_dev:none quorum:yes open:yes blocked:upper
  storage-load-test-1 node-id:0 connection:Connected role:Secondary tls:no congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Established peer-disk:Outdated resync-suspended:no
  storage-load-test-2 node-id:1 connection:Connected role:Secondary tls:no congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Established peer-disk:Outdated resync-suspended:no
  storage-load-test-3 node-id:2 connection:Connected role:Secondary tls:no congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Established peer-disk:Outdated resync-suspended:no

--storage-load-test-1
drbdsetup status pvc-eea2b998-7321-46cc-9a6a-59cc576a7bd5 --verbose
pvc-eea2b998-7321-46cc-9a6a-59cc576a7bd5 node-id:0 role:Secondary suspended:quorum force-io-failures:no
  volume:0 minor:1100 disk:Outdated backing_dev:/dev/ssd-nvme/pvc-eea2b998-7321-46cc-9a6a-59cc576a7bd5_00000 quorum:no open:no blocked:upper
  storage-load-test-0 node-id:3 connection:Connected role:Primary tls:no congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Established peer-disk:Diskless peer-client:yes resync-suspended:no
  storage-load-test-2 node-id:1 connection:Connected role:Secondary tls:no congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:WFBitMapT peer-disk:Outdated resync-suspended:no
  storage-load-test-3 node-id:2 connection:Connected role:Secondary tls:no congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Established peer-disk:Outdated resync-suspended:no

--storage-load-test-2
drbdsetup status pvc-eea2b998-7321-46cc-9a6a-59cc576a7bd5 --verbose
pvc-eea2b998-7321-46cc-9a6a-59cc576a7bd5 node-id:1 role:Secondary suspended:quorum force-io-failures:no
  volume:0 minor:1100 disk:Outdated backing_dev:/dev/ssd-nvme/pvc-eea2b998-7321-46cc-9a6a-59cc576a7bd5_00000 quorum:no open:no blocked:upper
  storage-load-test-0 node-id:3 connection:Connected role:Primary tls:no congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Established peer-disk:Diskless peer-client:yes resync-suspended:no
  storage-load-test-1 node-id:0 connection:Connected role:Secondary tls:no congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Established peer-disk:Outdated resync-suspended:no
  storage-load-test-3 node-id:2 connection:Connected role:Secondary tls:no congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Established peer-disk:Outdated resync-suspended:no

--storage-load-test-3
drbdsetup status pvc-eea2b998-7321-46cc-9a6a-59cc576a7bd5 --verbose
pvc-eea2b998-7321-46cc-9a6a-59cc576a7bd5 node-id:2 role:Secondary suspended:quorum force-io-failures:no
  volume:0 minor:1100 disk:Outdated backing_dev:/dev/ssd-nvme/pvc-eea2b998-7321-46cc-9a6a-59cc576a7bd5_00000 quorum:no open:no blocked:upper
  storage-load-test-0 node-id:3 connection:Connected role:Primary tls:no congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Established peer-disk:Diskless peer-client:yes resync-suspended:no
  storage-load-test-1 node-id:0 connection:Connected role:Secondary tls:no congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Established peer-disk:Outdated resync-suspended:no
  storage-load-test-2 node-id:1 connection:Connected role:Secondary tls:no congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:WFBitMapT peer-disk:Outdated resync-suspended:no

Am I correct in understanding that the most up-to-date data is on the storage-load-test-2 node, since the replication state of its volumes is Established?

No, a replication status of Established would not necessarily indicate that; in this case it only indicates the status of the TCP connection.

The disk field is where you would find an UpToDate status; in this case, all of your diskful replicas are showing as Outdated. This can happen when DRBD cannot be sure which copy has the most current data (due to issues such as a lack of proper fencing or quorum during certain sequences of events), so you as the administrator must choose. You might do so by disconnecting the replicas and mounting them separately to examine the data, and then, once you have decided, performing drbdadm primary <res> with the --force flag on the node that you wish to become the SyncSource.
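
A minimal sketch of that last step (the node chosen here, storage-load-test-2, is purely hypothetical; run this on whichever node you decide holds the data to keep):

# on the chosen diskful node, while it is disconnected from its peers
drbdadm disconnect pvc-eea2b998-7321-46cc-9a6a-59cc576a7bd5
drbdadm primary --force pvc-eea2b998-7321-46cc-9a6a-59cc576a7bd5   # marks this copy UpToDate
drbdadm secondary pvc-eea2b998-7321-46cc-9a6a-59cc576a7bd5         # demote again once you are done
drbdadm connect pvc-eea2b998-7321-46cc-9a6a-59cc576a7bd5           # reconnect; this node becomes the SyncSource for the peers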