Newbie here on DRBD and drbd-reactor. My current setup uses both (from the PPA repo) to deploy a BeeGFS management HA cluster (2 nodes + 1 tiebreaker). The basic setup with the DRBD resource went well, and I can fail over the DRBD resource with the ‘drbdadm primary’ command, but once drbd-reactor is set up I always observe an error.
Below is my promoter config. It basically starts two OCF agents to configure the VIPs, mounts the file system backed by the DRBD volume, and finally starts beegfs-mgmtd.service, which accesses the mounted file system.
[[promoter]]
id = "r0"
[promoter.resources.r0]
dependencies-as = "Requires"
target-as = "Requires"
start = [
"""ocf:heartbeat:IPaddr2 vip_host_mgmt \
ip=10.14.74.200 \
cidr_netmask=21 \
nic=bond0 \
flush_routes=1 \
op monitor interval=30s""",
"""ocf:heartbeat:IPaddr2 vip_host_beegfs \
ip=172.100.1.2 \
cidr_netmask=24 \
nic=bond1 \
flush_routes=1 \
op monitor interval=30s""",
"""ocf:heartbeat:Filesystem fs_mgmt_tgt \
device=/dev/drbd/by-res/r0/0 \
directory=/mnt/mgmt_tgt \
fstype=ext4 \
options=sync,noatime,nodiratime \
run_fsck=no""",
"beegfs-mgmtd.service",
]
on-drbd-demote-failure = "reboot"
on-quorum-loss = "freeze"
On the primary node, after running the ‘disable --now’ command, the failover did not finish in the expected time frame (a few seconds). The services above only came up on the secondary node after a long time, apparently once some timeouts had expired. Below is the verbose status log from the primary node. It seems the services were not stopped in the reverse order of the start array. Instead, both VIPs were deactivated first, then the unmount of the volume was attempted while beegfs-mgmtd.service was still running, so the unmount failed. The mgmtd service was stopped last.
My understanding is that with systemd as the default runner there is no need to define the stop order explicitly, and reactor should automatically stop the services in the reverse order of the start list. Is there anything I missed?
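To make that expectation concrete, here is a minimal Python sketch of the stop order I would expect. It assumes the stop order is simply the reverse of the start order, with the promotion unit started first and stopped last. The unit names are taken from the drbd-reactorctl status output in this post, and `expected_stop_order` is a hypothetical helper name, not part of drbd-reactor.

```python
# Start order as assumed from the promoter config: promotion first, then the
# entries of the start list, written with the systemd unit names that
# drbd-reactorctl status reports for them.
START_ORDER = [
    "drbd-promote@r0.service",
    "ocf.rs@vip_host_mgmt_r0.service",
    "ocf.rs@vip_host_beegfs_r0.service",
    "ocf.rs@fs_mgmt_tgt_r0.service",
    "beegfs-mgmtd.service",
]


def expected_stop_order(start_order):
    """Return the units in reverse start order (hypothetical helper)."""
    return list(reversed(start_order))


if __name__ == "__main__":
    for position, unit in enumerate(expected_stop_order(START_ORDER), start=1):
        print(position, unit)
```

Under this assumption beegfs-mgmtd.service would have to be fully stopped before the Filesystem agent tries to unmount /mnt/mgmt_tgt.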
beegfs@beegfs-mgmt02:/etc/drbd-reactor.d$ sudo drbd-reactorctl status --verbose r0
/etc/drbd-reactor.d/r0.toml:
Promoter: Currently active on node 'beegfs-mgmt01'
○ drbd-services@r0.target - Services for DRBD resource r0
Loaded: loaded (/usr/lib/systemd/system/drbd-services@.target; static)
Drop-In: /run/systemd/system/drbd-services@r0.target.d
└─reactor-50-before.conf, reactor.conf
Active: inactive (dead) since Mon 2025-12-08 13:04:32 CST; 2h 47min ago
Docs: man:drbd-services@.target(7)
Dec 08 11:58:11 beegfs-mgmt02 systemd[1]: Reached target drbd-services@r0.target - Services for DRBD resource r0.
Dec 08 13:04:32 beegfs-mgmt02 systemd[1]: Stopped target drbd-services@r0.target - Services for DRBD resource r0.
○ drbd-promote@r0.service - Promotion of DRBD resource r0
Loaded: loaded (/usr/lib/systemd/system/drbd-promote@.service; static)
Drop-In: /run/systemd/system/drbd-promote@r0.service.d
└─reactor.conf
Active: inactive (dead) since Mon 2025-12-08 13:08:15 CST; 2h 43min ago
Duration: 1h 10min 4.126s
Docs: man:drbd-promote@.service
CPU: 7ms
Dec 08 11:58:11 beegfs-mgmt02 systemd[1]: Starting drbd-promote@r0.service - Promotion of DRBD resource r0...
Dec 08 11:58:11 beegfs-mgmt02 systemd[1]: Finished drbd-promote@r0.service - Promotion of DRBD resource r0.
Dec 08 13:08:15 beegfs-mgmt02 systemd[1]: Stopping drbd-promote@r0.service - Promotion of DRBD resource r0...
Dec 08 13:08:15 beegfs-mgmt02 drbd-r0[37081]: r0: State change failed: (-12) Device is held open by someone
Dec 08 13:08:15 beegfs-mgmt02 drbd-r0[37081]: additional info from kernel:
Dec 08 13:08:15 beegfs-mgmt02 drbd-r0[37081]: /dev/drbd0 open_cnt:1, writable:1; list of openers follows
Dec 08 13:08:15 beegfs-mgmt02 drbd-r0[37081]: drbd0 opened by mount (pid 17834) at 2025-12-08 17:58:11.178
Dec 08 13:08:15 beegfs-mgmt02 systemd[1]: drbd-promote@r0.service: Deactivated successfully.
Dec 08 13:08:15 beegfs-mgmt02 systemd[1]: Stopped drbd-promote@r0.service - Promotion of DRBD resource r0.
○ ocf.rs@vip_host_mgmt_r0.service - drbd-reactor controlled ocf.rs@vip_host_mgmt_r0
Loaded: loaded (/usr/lib/systemd/system/ocf.rs@.service; static)
Drop-In: /run/systemd/system/ocf.rs@vip_host_mgmt_r0.service.d
└─reactor.conf
Active: inactive (dead) since Mon 2025-12-08 13:08:15 CST; 2h 43min ago
Duration: 1h 10min 3.090s
Main PID: 17612 (code=exited, status=0/SUCCESS)
Status: "IPaddr2:vip_host_mgmt_r0: about to exec stop"
CPU: 4.975s
Dec 08 11:58:11 beegfs-mgmt02 ocf-rs-wrapper[17616]: Dec 08 11:58:11 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /run/resource-agents/send_arp-10.14.74.200 bond0 10.14.74.200 auto not_used not_used
Dec 08 11:58:11 beegfs-mgmt02 ocf-rs-wrapper[17612]: INFO [ocf_rs_wrapper] IPaddr2:vip_host_mgmt_r0: monitoring every 30 seconds
Dec 08 11:58:11 beegfs-mgmt02 systemd[1]: Started ocf.rs@vip_host_mgmt_r0.service - drbd-reactor controlled ocf.rs@vip_host_mgmt_r0.
Dec 08 11:58:12 beegfs-mgmt02 ocf-rs-wrapper[17686]: Dec 08 11:58:12 INFO:
Dec 08 13:08:14 beegfs-mgmt02 systemd[1]: Stopping ocf.rs@vip_host_mgmt_r0.service - drbd-reactor controlled ocf.rs@vip_host_mgmt_r0...
Dec 08 13:08:15 beegfs-mgmt02 ocf-rs-wrapper[17612]: INFO [ocf_rs_wrapper] IPaddr2:vip_host_mgmt_r0: about to exec stop
Dec 08 13:08:15 beegfs-mgmt02 ocf-rs-wrapper[37027]: Dec 08 13:08:15 INFO: IP status = ok, IP_CIP=
Dec 08 13:08:15 beegfs-mgmt02 systemd[1]: ocf.rs@vip_host_mgmt_r0.service: Deactivated successfully.
Dec 08 13:08:15 beegfs-mgmt02 systemd[1]: Stopped ocf.rs@vip_host_mgmt_r0.service - drbd-reactor controlled ocf.rs@vip_host_mgmt_r0.
Dec 08 13:08:15 beegfs-mgmt02 systemd[1]: ocf.rs@vip_host_mgmt_r0.service: Consumed 4.975s CPU time, 4.5M memory peak, 0B memory swap peak.
○ ocf.rs@vip_host_beegfs_r0.service - drbd-reactor controlled ocf.rs@vip_host_beegfs_r0
Loaded: loaded (/usr/lib/systemd/system/ocf.rs@.service; static)
Drop-In: /run/systemd/system/ocf.rs@vip_host_beegfs_r0.service.d
└─reactor.conf
Active: inactive (dead) since Mon 2025-12-08 13:08:14 CST; 2h 43min ago
Duration: 1h 10min 2.149s
Main PID: 17688 (code=exited, status=0/SUCCESS)
Status: "IPaddr2:vip_host_beegfs_r0: about to exec stop"
CPU: 4.900s
Dec 08 11:58:11 beegfs-mgmt02 ocf-rs-wrapper[17690]: Dec 08 11:58:11 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /run/resource-agents/send_arp-172.100.1.2 bond1 172.100.1.2 auto not_used not_used
Dec 08 11:58:11 beegfs-mgmt02 ocf-rs-wrapper[17688]: INFO [ocf_rs_wrapper] IPaddr2:vip_host_beegfs_r0: monitoring every 30 seconds
Dec 08 11:58:11 beegfs-mgmt02 systemd[1]: Started ocf.rs@vip_host_beegfs_r0.service - drbd-reactor controlled ocf.rs@vip_host_beegfs_r0.
Dec 08 11:58:12 beegfs-mgmt02 ocf-rs-wrapper[17755]: Dec 08 11:58:12 INFO:
Dec 08 13:08:13 beegfs-mgmt02 systemd[1]: Stopping ocf.rs@vip_host_beegfs_r0.service - drbd-reactor controlled ocf.rs@vip_host_beegfs_r0...
Dec 08 13:08:14 beegfs-mgmt02 ocf-rs-wrapper[17688]: INFO [ocf_rs_wrapper] IPaddr2:vip_host_beegfs_r0: about to exec stop
Dec 08 13:08:14 beegfs-mgmt02 ocf-rs-wrapper[36973]: Dec 08 13:08:14 INFO: IP status = ok, IP_CIP=
Dec 08 13:08:14 beegfs-mgmt02 systemd[1]: ocf.rs@vip_host_beegfs_r0.service: Deactivated successfully.
Dec 08 13:08:14 beegfs-mgmt02 systemd[1]: Stopped ocf.rs@vip_host_beegfs_r0.service - drbd-reactor controlled ocf.rs@vip_host_beegfs_r0.
Dec 08 13:08:14 beegfs-mgmt02 systemd[1]: ocf.rs@vip_host_beegfs_r0.service: Consumed 4.900s CPU time, 4.5M memory peak, 0B memory swap peak.
× ocf.rs@fs_mgmt_tgt_r0.service - drbd-reactor controlled ocf.rs@fs_mgmt_tgt_r0
Loaded: loaded (/usr/lib/systemd/system/ocf.rs@.service; static)
Drop-In: /run/systemd/system/ocf.rs@fs_mgmt_tgt_r0.service.d
└─reactor.conf
Active: failed (Result: exit-code) since Mon 2025-12-08 13:08:13 CST; 2h 43min ago
Duration: 1h 9min 21.398s
Main PID: 17757 (code=exited, status=1/FAILURE)
Status: "Filesystem:fs_mgmt_tgt_r0: about to exec stop"
CPU: 6.121s
Dec 08 13:07:53 beegfs-mgmt02 ocf-rs-wrapper[36831]: Dec 08 13:07:53 INFO: Running stop for /dev/drbd/by-res/r0/0 on /mnt/mgmt_tgt
Dec 08 13:07:53 beegfs-mgmt02 ocf-rs-wrapper[36831]: Dec 08 13:07:53 INFO: Trying to unmount /mnt/mgmt_tgt
Dec 08 13:08:03 beegfs-mgmt02 ocf-rs-wrapper[36831]: Killed
Dec 08 13:08:13 beegfs-mgmt02 ocf-rs-wrapper[36831]: Killed
Dec 08 13:08:13 beegfs-mgmt02 ocf-rs-wrapper[36831]: ocf-exit-reason:Couldn't unmount /mnt/mgmt_tgt within given timeout
Dec 08 13:08:13 beegfs-mgmt02 ocf-rs-wrapper[36831]: ocf-exit-reason:Couldn't unmount /mnt/mgmt_tgt, giving up!
Dec 08 13:08:13 beegfs-mgmt02 systemd[1]: ocf.rs@fs_mgmt_tgt_r0.service: Control process exited, code=exited, status=1/FAILURE
Dec 08 13:08:13 beegfs-mgmt02 systemd[1]: ocf.rs@fs_mgmt_tgt_r0.service: Failed with result 'exit-code'.
Dec 08 13:08:13 beegfs-mgmt02 systemd[1]: Stopped ocf.rs@fs_mgmt_tgt_r0.service - drbd-reactor controlled ocf.rs@fs_mgmt_tgt_r0.
Dec 08 13:08:13 beegfs-mgmt02 systemd[1]: ocf.rs@fs_mgmt_tgt_r0.service: Consumed 6.121s CPU time, 6.2M memory peak, 0B memory swap peak.
○ beegfs-mgmtd.service - drbd-reactor controlled beegfs-mgmtd
Loaded: loaded (/usr/lib/systemd/system/beegfs-mgmtd.service; disabled; preset: enabled)
Drop-In: /etc/systemd/system/beegfs-mgmtd.service.d
└─override.conf
/run/systemd/system/beegfs-mgmtd.service.d
└─reactor.conf
Active: inactive (dead) since Mon 2025-12-08 13:07:32 CST; 2h 44min ago
Duration: 1h 6min 21.147s
Docs: http://www.beegfs.com/content/documentation/
Main PID: 17838 (code=exited, status=0/SUCCESS)
CPU: 7.492s
Dec 08 11:58:11 beegfs-mgmt02 systemd[1]: Started beegfs-mgmtd.service - drbd-reactor controlled beegfs-mgmtd.
Dec 08 13:04:32 beegfs-mgmt02 systemd[1]: Stopping beegfs-mgmtd.service - drbd-reactor controlled beegfs-mgmtd...
Dec 08 13:07:32 beegfs-mgmt02 systemd[1]: beegfs-mgmtd.service: Deactivated successfully.
Dec 08 13:07:32 beegfs-mgmt02 systemd[1]: Stopped beegfs-mgmtd.service - drbd-reactor controlled beegfs-mgmtd.
Dec 08 13:07:32 beegfs-mgmt02 systemd[1]: beegfs-mgmtd.service: Consumed 7.492s CPU time, 6.3M memory peak, 0B memory swap peak.
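One way to cross-check the observed stop order is to sort the systemd "Stopped" lines from the output above by timestamp, since the status view groups the journal per unit rather than chronologically. The following is only a sketch: `stop_timeline` is a hypothetical helper name, and the year 2025 is an assumption taken from the "Active:" lines because the journal lines themselves carry no year.

```python
import re
from datetime import datetime

# "Stopped ..." lines copied verbatim from the verbose status output above.
JOURNAL = """\
Dec 08 13:08:15 beegfs-mgmt02 systemd[1]: Stopped drbd-promote@r0.service - Promotion of DRBD resource r0.
Dec 08 13:08:15 beegfs-mgmt02 systemd[1]: Stopped ocf.rs@vip_host_mgmt_r0.service - drbd-reactor controlled ocf.rs@vip_host_mgmt_r0.
Dec 08 13:08:14 beegfs-mgmt02 systemd[1]: Stopped ocf.rs@vip_host_beegfs_r0.service - drbd-reactor controlled ocf.rs@vip_host_beegfs_r0.
Dec 08 13:08:13 beegfs-mgmt02 systemd[1]: Stopped ocf.rs@fs_mgmt_tgt_r0.service - drbd-reactor controlled ocf.rs@fs_mgmt_tgt_r0.
Dec 08 13:07:32 beegfs-mgmt02 systemd[1]: Stopped beegfs-mgmtd.service - drbd-reactor controlled beegfs-mgmtd.
"""

# Matches "<Mon DD HH:MM:SS> <host> systemd[<pid>]: Stopped <unit>.service - ..."
STOPPED_RE = re.compile(
    r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}) \S+ systemd\[\d+\]: "
    r"Stopped (\S+\.service) - ",
    re.MULTILINE,
)


def stop_timeline(journal_text, year=2025):
    """Return (timestamp, unit) pairs sorted by the time each unit stopped."""
    events = []
    for stamp, unit in STOPPED_RE.findall(journal_text):
        # Assumed year is prepended so the timestamp parses unambiguously.
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        events.append((when, unit))
    # sorted() is stable, so units that stopped in the same second keep
    # the order in which they appear in the input.
    return sorted(events, key=lambda event: event[0])


if __name__ == "__main__":
    for when, unit in stop_timeline(JOURNAL):
        print(when.strftime("%H:%M:%S"), unit)
```

The journal timestamps only have one-second resolution, so units that stopped within the same second cannot be ordered any further from this output alone.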
Eventually the previous secondary node became primary and the services finished coming up. After re-enabling the promoter on the primary node and checking its status, I can see the failed Filesystem OCF agent service below.
beegfs@beegfs-mgmt02:/etc/drbd-reactor.d$ sudo drbd-reactorctl status r0
/etc/drbd-reactor.d/r0.toml:
Promoter: Currently active on node 'beegfs-mgmt01'
○ drbd-services@r0.target
○ ├─ drbd-promote@r0.service
○ ├─ ocf.rs@vip_host_mgmt_r0.service
○ ├─ ocf.rs@vip_host_beegfs_r0.service
× ├─ ocf.rs@fs_mgmt_tgt_r0.service
○ └─ beegfs-mgmtd.service