Failover of cluster resource groups hangs DRBD

Hello

There is a two-node active/standby Pacemaker cluster, configured with a qdevice and wait_for_all: 1.

The cluster is based on Rocky Linux 10 (6.12.0-55.25.1.el10_0.x86_64) and kmod-drbd9x-9.2.14-1.el10_0.
There are two DRBD-based resource groups (ha_nfs and ha_iscsi), configured to always run on the same cluster node.

Colocation Constraints:
  Started resource 'g_nfs' with Promoted resource 'ha-nfs-clone' (id: colocation-g_nfs-ha-nfs-clone-INFINITY)
    score=INFINITY
  Started resource 'g_iscsi' with Promoted resource 'ha-iscsi-clone' (id: colocation-g_iscsi-ha-iscsi-clone-INFINITY)
    score=INFINITY
  resource 'g_iscsi' with resource 'g_nfs' (id: colocation-g_iscsi-g_nfs-INFINITY)
    score=INFINITY
  resource 'ha-nfs-clone' with resource 'ha-iscsi-clone' (id: colocation-ha-nfs-clone-ha-iscsi-clone-INFINITY)
    score=INFINITY
Order Constraints:
  promote resource 'ha-nfs-clone' then start resource 'g_nfs' (id: order-ha-nfs-clone-g_nfs-mandatory)
  stop resource 'g_nfs' then demote resource 'ha-nfs-clone' (id: order-g_nfs-ha-nfs-clone-mandatory)
  promote resource 'ha-iscsi-clone' then start resource 'g_iscsi' (id: order-ha-iscsi-clone-g_iscsi-mandatory)
  stop resource 'g_iscsi' then demote resource 'ha-iscsi-clone' (id: order-g_iscsi-ha-iscsi-clone-mandatory)

There is no I/O-intensive activity; the cluster is still idle.
After “pcs cluster stop” (or even “pcs node standby”) on the active node, the standby node can’t start p_iscsi_lun_drbd3 because the start operation times out, and as a result the failover of both resource groups fails.
The logs are below:

[root@memverge2 ~]# cat /var/log/messages|grep p_iscsi_lun_drbd3
Aug 18 07:52:00 memverge2 pacemaker-controld[2525]: notice: Requesting local execution of start operation for p_iscsi_lun_drbd3 on memverge2
Aug 18 07:52:21 memverge2 pacemaker-controld[2525]: error: Result of start operation for p_iscsi_lun_drbd3 on memverge2: Timed out after 20s (Resource agent did not complete within 20s)
Aug 18 07:52:21 memverge2 pacemaker-attrd[2523]: notice: Setting last-failure-p_iscsi_lun_drbd3#start_0[memverge2] in instance_attributes: (unset) -> 1755492741
Aug 18 07:52:21 memverge2 pacemaker-attrd[2523]: notice: Setting fail-count-p_iscsi_lun_drbd3#start_0[memverge2] in instance_attributes: (unset) -> INFINITY
Aug 18 07:55:02 memverge2 pacemaker-schedulerd[2524]: warning: Unexpected result (Error occurred: Resource agent did not complete within 20s) was recorded for start of p_iscsi_lun_drbd3 on memverge2 at Aug 18 07:52:00 2025
Aug 18 07:55:02 memverge2 pacemaker-schedulerd[2524]: warning: p_iscsi_lun_drbd3 cannot run on memverge2 due to reaching migration threshold (clean up resource to allow again)
Aug 18 07:55:02 memverge2 pacemaker-schedulerd[2524]: notice: Actions: Stop       p_iscsi_lun_drbd3              (                        memverge2 )  due to node availability
Aug 18 07:55:02 memverge2 pacemaker-controld[2525]: notice: Initiating stop operation p_iscsi_lun_drbd3_stop_0 locally on memverge2
Aug 18 07:55:02 memverge2 pacemaker-controld[2525]: notice: Requesting local execution of stop operation for p_iscsi_lun_drbd3 on memverge2
Aug 18 07:55:02 memverge2 pacemaker-controld[2525]: notice: Result of stop operation for p_iscsi_lun_drbd3 on memverge2: OK
Aug 18 07:58:24 memverge2 pacemaker-schedulerd[2524]: warning: Unexpected result (Error occurred: Resource agent did not complete within 20s) was recorded for start of p_iscsi_lun_drbd3 on memverge2 at Aug 18 07:52:00 2025
Aug 18 07:58:24 memverge2 pacemaker-schedulerd[2524]: warning: p_iscsi_lun_drbd3 cannot run on memverge2 due to reaching migration threshold (clean up resource to allow again)
Aug 18 08:00:04 memverge2 pacemaker-schedulerd[2524]: warning: Unexpected result (Error occurred: Resource agent did not complete within 20s) was recorded for start of p_iscsi_lun_drbd3 on memverge2 at Aug 18 07:52:00 2025
Aug 18 08:00:04 memverge2 pacemaker-schedulerd[2524]: warning: p_iscsi_lun_drbd3 cannot run on memverge2 due to reaching migration threshold (clean up resource to allow again)
Aug 18 08:00:04 memverge2 pacemaker-schedulerd[2524]: warning: Unexpected result (Error occurred: Resource agent did not complete within 20s) was recorded for start of p_iscsi_lun_drbd3 on memverge2 at Aug 18 07:52:00 2025
Aug 18 08:00:04 memverge2 pacemaker-schedulerd[2524]: warning: p_iscsi_lun_drbd3 cannot run on memverge2 due to reaching migration threshold (clean up resource to allow again)
Aug 18 08:01:06 memverge2 pacemaker-attrd[2523]: notice: Setting last-failure-p_iscsi_lun_drbd3#start_0[memverge2] in instance_attributes: 1755492741 -> (unset)
Aug 18 08:01:06 memverge2 pacemaker-attrd[2523]: notice: Setting fail-count-p_iscsi_lun_drbd3#start_0[memverge2] in instance_attributes: INFINITY -> (unset)
Aug 18 08:01:06 memverge2 pacemaker-schedulerd[2524]: notice: Actions: Start      p_iscsi_lun_drbd3              (                        memverge2 )
Aug 18 08:01:06 memverge2 pacemaker-controld[2525]: notice: Initiating monitor operation p_iscsi_lun_drbd3_monitor_0 locally on memverge2
Aug 18 08:01:06 memverge2 pacemaker-controld[2525]: notice: Requesting local execution of probe operation for p_iscsi_lun_drbd3 on memverge2
Aug 18 08:01:06 memverge2 pacemaker-controld[2525]: notice: Result of probe operation for p_iscsi_lun_drbd3 on memverge2: Not running
Aug 18 08:01:36 memverge2 pacemaker-schedulerd[2524]: notice: Actions: Start      p_iscsi_lun_drbd3              (                        memverge2 )
Aug 18 08:01:36 memverge2 pacemaker-controld[2525]: notice: Initiating monitor operation p_iscsi_lun_drbd3_monitor_0 locally on memverge2
Aug 18 08:01:36 memverge2 pacemaker-controld[2525]: notice: Requesting local execution of probe operation for p_iscsi_lun_drbd3 on memverge2
Aug 18 08:01:36 memverge2 pacemaker-controld[2525]: notice: Result of probe operation for p_iscsi_lun_drbd3 on memverge2: Not running

[root@memverge2 ~]# cat /var/log/messages|grep -i drbd
Aug 18 07:51:58 memverge2 kernel: drbd ha-iscsi: Preparing remote state change 2376668298: 27->all role( Secondary )
Aug 18 07:51:58 memverge2 kernel: drbd ha-iscsi memverge: Committing remote state change 2376668298 (primary_nodes=0)
Aug 18 07:51:58 memverge2 kernel: drbd ha-iscsi memverge: peer( Primary -> Secondary ) [remote]
Aug 18 07:51:58 memverge2 kernel: drbd ha-iscsi: Preparing remote state change 2754260736: 27->28 conn( Disconnecting )
Aug 18 07:51:58 memverge2 kernel: drbd ha-iscsi memverge: Committing remote state change 2754260736 (primary_nodes=0)
Aug 18 07:51:58 memverge2 kernel: drbd ha-iscsi memverge: conn( Connected -> TearDown ) peer( Secondary -> Unknown ) [remote]
Aug 18 07:51:58 memverge2 kernel: drbd ha-iscsi/31 drbd3 memverge: pdsk( UpToDate -> DUnknown ) repl( Established -> Off ) [remote]
Aug 18 07:51:58 memverge2 kernel: drbd ha-iscsi memverge: meta connection shut down by peer.
Aug 18 07:51:58 memverge2 kernel: drbd ha-iscsi memverge: Terminating sender thread
Aug 18 07:51:58 memverge2 kernel: drbd ha-iscsi memverge: Starting sender thread (peer-node-id 27)
Aug 18 07:51:58 memverge2 kernel: drbd ha-iscsi memverge: Connection closed
Aug 18 07:51:58 memverge2 kernel: drbd ha-iscsi memverge: helper command: /sbin/drbdadm disconnected
Aug 18 07:51:58 memverge2 kernel: drbd ha-iscsi memverge: helper command: /sbin/drbdadm disconnected exit code 0
Aug 18 07:51:58 memverge2 kernel: drbd ha-iscsi memverge: conn( TearDown -> Unconnected ) [disconnected]
Aug 18 07:51:58 memverge2 kernel: drbd ha-iscsi memverge: Restarting receiver thread
Aug 18 07:51:58 memverge2 kernel: drbd ha-iscsi memverge: conn( Unconnected -> Connecting ) [connecting]
Aug 18 07:51:59 memverge2 kernel: drbd ha-iscsi memverge: helper command: /sbin/drbdadm fence-peer
Aug 18 07:51:59 memverge2 crm-fence-peer.9.sh[14663]: DRBD_BACKING_DEV_31=/dev/block_nfs_vg/ha_block_exports_lv DRBD_CONF=/etc/drbd.conf DRBD_CSTATE=Connecting DRBD_LL_DISK=/dev/block_nfs_vg/ha_block_exports_lv DRBD_MINOR=3 DRBD_MINOR_31=3 DRBD_MY_ADDRESS=192.168.0.8 DRBD_MY_AF=ipv4 DRBD_MY_NODE_ID=28 DRBD_NODE_ID_27=memverge DRBD_NODE_ID_28=memverge2 DRBD_PEER_ADDRESS=192.168.0.6 DRBD_PEER_AF=ipv4 DRBD_PEER_NODE_ID=27 DRBD_RESOURCE=ha-iscsi DRBD_VOLUME=31 UP_TO_DATE_NODES=0x10000000 /usr/lib/drbd/crm-fence-peer.9.sh
Aug 18 07:51:59 memverge2 crm-fence-peer.9.sh[14663]: INFO peers are reachable, my disk is UpToDate: placed constraint 'drbd-fence-by-handler-ha-iscsi-ha-iscsi-clone'
Aug 18 07:51:59 memverge2 kernel: drbd ha-iscsi memverge: helper command: /sbin/drbdadm fence-peer exit code 4 (0x400)
Aug 18 07:51:59 memverge2 kernel: drbd ha-iscsi memverge: fence-peer helper returned 4 (peer was fenced)
Aug 18 07:51:59 memverge2 kernel: drbd ha-iscsi/31 drbd3 memverge: pdsk( DUnknown -> Outdated ) [primary]
Aug 18 07:51:59 memverge2 kernel: drbd ha-iscsi: Preparing cluster-wide state change 637846353: 28->all role( Primary )
Aug 18 07:51:59 memverge2 kernel: drbd ha-iscsi: Committing cluster-wide state change 637846353 (0ms)
Aug 18 07:51:59 memverge2 kernel: drbd ha-iscsi: role( Secondary -> Primary ) [primary]
Aug 18 07:51:59 memverge2 pacemaker-controld[2525]: notice: Requesting local execution of start operation for p_iscsi_portblock_on_drbd3 on memverge2
Aug 18 07:51:59 memverge2 pacemaker-controld[2525]: notice: Result of start operation for p_iscsi_portblock_on_drbd3 on memverge2: OK
Aug 18 07:52:00 memverge2 pacemaker-controld[2525]: notice: Requesting local execution of monitor operation for p_iscsi_portblock_on_drbd3 on memverge2
Aug 18 07:52:00 memverge2 pacemaker-controld[2525]: notice: Result of monitor operation for p_iscsi_portblock_on_drbd3 on memverge2: OK
Aug 18 07:52:00 memverge2 kernel: drbd ha-nfs: Preparing remote state change 299836223: 27->all role( Secondary )
Aug 18 07:52:00 memverge2 kernel: drbd ha-nfs memverge: Committing remote state change 299836223 (primary_nodes=0)
Aug 18 07:52:00 memverge2 kernel: drbd ha-nfs memverge: peer( Primary -> Secondary ) [remote]
Aug 18 07:52:00 memverge2 pacemaker-controld[2525]: notice: Requesting local execution of start operation for p_iscsi_target_drbd3 on memverge2
Aug 18 07:52:00 memverge2 kernel: drbd ha-nfs: Preparing remote state change 230622501: 27->28 conn( Disconnecting )
Aug 18 07:52:00 memverge2 kernel: drbd ha-nfs memverge: Committing remote state change 230622501 (primary_nodes=0)
Aug 18 07:52:00 memverge2 kernel: drbd ha-nfs memverge: conn( Connected -> TearDown ) peer( Secondary -> Unknown ) [remote]
Aug 18 07:52:00 memverge2 kernel: drbd ha-nfs/29 drbd1 memverge: pdsk( UpToDate -> DUnknown ) repl( Established -> Off ) [remote]
Aug 18 07:52:00 memverge2 kernel: drbd ha-nfs/30 drbd2 memverge: pdsk( UpToDate -> DUnknown ) repl( Established -> Off ) [remote]
Aug 18 07:52:00 memverge2 kernel: drbd ha-nfs memverge: Terminating sender thread
Aug 18 07:52:00 memverge2 kernel: drbd ha-nfs memverge: Starting sender thread (peer-node-id 27)
Aug 18 07:52:00 memverge2 kernel: drbd ha-nfs memverge: Connection closed
Aug 18 07:52:00 memverge2 kernel: drbd ha-nfs memverge: helper command: /sbin/drbdadm disconnected
Aug 18 07:52:00 memverge2 kernel: drbd ha-nfs memverge: helper command: /sbin/drbdadm disconnected exit code 0
Aug 18 07:52:00 memverge2 kernel: drbd ha-nfs memverge: conn( TearDown -> Unconnected ) [disconnected]
Aug 18 07:52:00 memverge2 kernel: drbd ha-nfs memverge: Restarting receiver thread
Aug 18 07:52:00 memverge2 kernel: drbd ha-nfs memverge: conn( Unconnected -> Connecting ) [connecting]
Aug 18 07:52:00 memverge2 iSCSITarget(p_iscsi_target_drbd3)[15235]: INFO: Parameter auto_add_default_portal is now 'false'.
Aug 18 07:52:00 memverge2 iSCSITarget(p_iscsi_target_drbd3)[15276]: INFO: Created target iqn.2025-03.com.example:drbd3. Created TPG 1.
Aug 18 07:52:00 memverge2 iSCSITarget(p_iscsi_target_drbd3)[15290]: INFO: Using default IP port 3260 Created network portal 192.168.21.20:3260.
Aug 18 07:52:00 memverge2 iSCSITarget(p_iscsi_target_drbd3)[15304]: INFO: Using default IP port 3260 Created network portal 192.168.22.20:3260.
Aug 18 07:52:00 memverge2 iSCSITarget(p_iscsi_target_drbd3)[15317]: INFO: Parameter authentication is now '0'. Parameter demo_mode_write_protect is now '0'. Parameter generate_node_acls is now '1'. Parameter cache_dynamic_acls is now '1'.
Aug 18 07:52:00 memverge2 pacemaker-controld[2525]: notice: Result of start operation for p_iscsi_target_drbd3 on memverge2: OK
Aug 18 07:52:00 memverge2 pacemaker-controld[2525]: notice: Requesting local execution of monitor operation for p_iscsi_target_drbd3 on memverge2
Aug 18 07:52:00 memverge2 pacemaker-controld[2525]: notice: Requesting local execution of start operation for p_iscsi_lun_drbd3 on memverge2
Aug 18 07:52:00 memverge2 pacemaker-controld[2525]: notice: Result of monitor operation for p_iscsi_target_drbd3 on memverge2: OK
Aug 18 07:52:00 memverge2 kernel: drbd ha-nfs memverge: helper command: /sbin/drbdadm fence-peer
Aug 18 07:52:00 memverge2 crm-fence-peer.9.sh[15467]: DRBD_BACKING_DEV_29=/dev/block_nfs_vg/ha_nfs_internal_lv DRBD_BACKING_DEV_30=/dev/block_nfs_vg/ha_nfs_exports_lv DRBD_CONF=/etc/drbd.conf DRBD_CSTATE=Connecting DRBD_LL_DISK=/dev/block_nfs_vg/ha_nfs_internal_lv\ /dev/block_nfs_vg/ha_nfs_exports_lv DRBD_MINOR=1\ 2 DRBD_MINOR_29=1 DRBD_MINOR_30=2 DRBD_MY_ADDRESS=192.168.0.8 DRBD_MY_AF=ipv4 DRBD_MY_NODE_ID=28 DRBD_NODE_ID_27=memverge DRBD_NODE_ID_28=memverge2 DRBD_PEER_ADDRESS=192.168.0.6 DRBD_PEER_AF=ipv4 DRBD_PEER_NODE_ID=27 DRBD_RESOURCE=ha-nfs DRBD_VOLUME=29\ 30 UP_TO_DATE_NODES=0x10000000 /usr/lib/drbd/crm-fence-peer.9.sh
Aug 18 07:52:01 memverge2 crm-fence-peer.9.sh[15467]: INFO peers are reachable, my disk is UpToDate UpToDate: placed constraint 'drbd-fence-by-handler-ha-nfs-ha-nfs-clone'
Aug 18 07:52:21 memverge2 pacemaker-controld[2525]: error: Result of start operation for p_iscsi_lun_drbd3 on memverge2: Timed out after 20s (Resource agent did not complete within 20s)
Aug 18 07:52:21 memverge2 pacemaker-attrd[2523]: notice: Setting last-failure-p_iscsi_lun_drbd3#start_0[memverge2] in instance_attributes: (unset) -> 1755492741
Aug 18 07:52:21 memverge2 pacemaker-attrd[2523]: notice: Setting fail-count-p_iscsi_lun_drbd3#start_0[memverge2] in instance_attributes: (unset) -> INFINITY
Aug 18 07:54:50 memverge2 kernel: INFO: task drbdsetup:15464 blocked for more than 122 seconds.
Aug 18 07:54:50 memverge2 kernel: task:drbdsetup       state:D stack:0     pid:15464 tgid:15464 ppid:1      flags:0x00000006
Aug 18 07:54:50 memverge2 kernel: drbd_khelper.cold+0x18e/0x4c8 [drbd]
Aug 18 07:54:50 memverge2 kernel: conn_try_outdate_peer+0x11e/0x260 [drbd]
Aug 18 07:54:50 memverge2 kernel: ? change_role+0x92/0x100 [drbd]
Aug 18 07:54:50 memverge2 kernel: drbd_set_role+0x349/0x8e0 [drbd]
Aug 18 07:54:50 memverge2 kernel: ? drbd_find_resource+0x7a/0xb0 [drbd]
Aug 18 07:54:50 memverge2 kernel: drbd_adm_set_role+0x124/0x260 [drbd]
Aug 18 07:54:50 memverge2 kernel: ? __pfx_drbd_adm_set_role+0x10/0x10 [drbd]
Aug 18 07:54:50 memverge2 kernel: INFO: task drbdsetup:15533 blocked for more than 122 seconds.
Aug 18 07:54:50 memverge2 kernel: task:drbdsetup       state:D stack:0     pid:15533 tgid:15533 ppid:15466  flags:0x00000002
Aug 18 07:54:50 memverge2 kernel: INFO: task drbdsetup:15562 blocked for more than 122 seconds.
Aug 18 07:54:50 memverge2 kernel: task:drbdsetup       state:D stack:0     pid:15562 tgid:15562 ppid:1      flags:0x00000006
Aug 18 07:54:50 memverge2 kernel: INFO: task drbdsetup:16645 blocked for more than 122 seconds.
Aug 18 07:54:50 memverge2 kernel: task:drbdsetup       state:D stack:0     pid:16645 tgid:16645 ppid:1      flags:0x00000006
Aug 18 07:55:02 memverge2 pacemaker-schedulerd[2524]: warning: Unexpected result (Error occurred: Resource agent did not complete within 20s) was recorded for start of p_iscsi_lun_drbd3 on memverge2 at Aug 18 07:52:00 2025
Aug 18 07:55:02 memverge2 pacemaker-schedulerd[2524]: warning: p_iscsi_lun_drbd3 cannot run on memverge2 due to reaching migration threshold (clean up resource to allow again)
Aug 18 07:55:02 memverge2 pacemaker-schedulerd[2524]: notice: Actions: Stop       p_iscsi_portblock_on_drbd3     (                        memverge2 )  due to node availability
Aug 18 07:55:02 memverge2 pacemaker-schedulerd[2524]: notice: Actions: Stop       p_iscsi_target_drbd3           (                        memverge2 )  due to node availability
Aug 18 07:55:02 memverge2 pacemaker-schedulerd[2524]: notice: Actions: Stop       p_iscsi_lun_drbd3              (                        memverge2 )  due to node availability
Aug 18 07:55:02 memverge2 pacemaker-controld[2525]: notice: Initiating stop operation p_iscsi_lun_drbd3_stop_0 locally on memverge2
Aug 18 07:55:02 memverge2 pacemaker-controld[2525]: notice: Requesting local execution of stop operation for p_iscsi_lun_drbd3 on memverge2
Aug 18 07:55:02 memverge2 pacemaker-controld[2525]: notice: Result of stop operation for p_iscsi_lun_drbd3 on memverge2: OK
Aug 18 07:55:02 memverge2 pacemaker-controld[2525]: notice: Initiating stop operation p_iscsi_target_drbd3_stop_0 locally on memverge2
Aug 18 07:55:02 memverge2 pacemaker-controld[2525]: notice: Requesting local execution of stop operation for p_iscsi_target_drbd3 on memverge2
Aug 18 07:55:03 memverge2 iSCSITarget(p_iscsi_target_drbd3)[21867]: INFO: Deleted Target iqn.2025-03.com.example:drbd3.
Aug 18 07:55:03 memverge2 pacemaker-controld[2525]: notice: Result of stop operation for p_iscsi_target_drbd3 on memverge2: OK
Aug 18 07:55:03 memverge2 pacemaker-controld[2525]: notice: Initiating stop operation p_iscsi_portblock_on_drbd3_stop_0 locally on memverge2
Aug 18 07:55:03 memverge2 pacemaker-controld[2525]: notice: Requesting local execution of stop operation for p_iscsi_portblock_on_drbd3 on memverge2
Aug 18 07:55:03 memverge2 pacemaker-controld[2525]: notice: Result of stop operation for p_iscsi_portblock_on_drbd3 on memverge2: OK
Aug 18 07:56:53 memverge2 kernel: INFO: task drbdsetup:15464 blocked for more than 245 seconds.
Aug 18 07:56:53 memverge2 kernel: task:drbdsetup       state:D stack:0     pid:15464 tgid:15464 ppid:1      flags:0x00000006
Aug 18 07:56:53 memverge2 kernel: drbd_khelper.cold+0x18e/0x4c8 [drbd]
Aug 18 07:56:53 memverge2 kernel: conn_try_outdate_peer+0x11e/0x260 [drbd]
Aug 18 07:56:53 memverge2 kernel: ? change_role+0x92/0x100 [drbd]
Aug 18 07:56:53 memverge2 kernel: drbd_set_role+0x349/0x8e0 [drbd]
Aug 18 07:56:53 memverge2 kernel: ? drbd_find_resource+0x7a/0xb0 [drbd]
Aug 18 07:56:53 memverge2 kernel: drbd_adm_set_role+0x124/0x260 [drbd]
Aug 18 07:56:53 memverge2 kernel: ? __pfx_drbd_adm_set_role+0x10/0x10 [drbd]
Aug 18 07:56:53 memverge2 kernel: INFO: task drbdsetup:15533 blocked for more than 245 seconds.
Aug 18 07:56:53 memverge2 kernel: task:drbdsetup       state:D stack:0     pid:15533 tgid:15533 ppid:15466  flags:0x00000002
Aug 18 07:56:53 memverge2 kernel: INFO: task drbdsetup:15562 blocked for more than 245 seconds.
Aug 18 07:56:53 memverge2 kernel: task:drbdsetup       state:D stack:0     pid:15562 tgid:15562 ppid:1      flags:0x00000006
Aug 18 07:56:53 memverge2 kernel: INFO: task drbdsetup:16645 blocked for more than 245 seconds.
Aug 18 07:56:53 memverge2 kernel: task:drbdsetup       state:D stack:0     pid:16645 tgid:16645 ppid:1      flags:0x00000006
Aug 18 07:58:24 memverge2 pacemaker-schedulerd[2524]: warning: Unexpected result (Error occurred: Resource agent did not complete within 20s) was recorded for start of p_iscsi_lun_drbd3 on memverge2 at Aug 18 07:52:00 2025
Aug 18 07:58:24 memverge2 pacemaker-schedulerd[2524]: warning: p_iscsi_lun_drbd3 cannot run on memverge2 due to reaching migration threshold (clean up resource to allow again)
Aug 18 07:58:24 memverge2 pacemaker-schedulerd[2524]: notice: Actions: Start      p_iscsi_portblock_on_drbd3     (                        memverge2 )
Aug 18 07:58:24 memverge2 pacemaker-schedulerd[2524]: notice: Actions: Start      p_iscsi_target_drbd3           (                        memverge2 )
Aug 18 08:00:04 memverge2 pacemaker-schedulerd[2524]: warning: Unexpected result (Error occurred: Resource agent did not complete within 20s) was recorded for start of p_iscsi_lun_drbd3 on memverge2 at Aug 18 07:52:00 2025
Aug 18 08:00:04 memverge2 pacemaker-schedulerd[2524]: warning: p_iscsi_lun_drbd3 cannot run on memverge2 due to reaching migration threshold (clean up resource to allow again)
Aug 18 08:00:04 memverge2 pacemaker-schedulerd[2524]: warning: Unexpected result (Error occurred: Resource agent did not complete within 20s) was recorded for start of p_iscsi_lun_drbd3 on memverge2 at Aug 18 07:52:00 2025
Aug 18 08:00:04 memverge2 pacemaker-schedulerd[2524]: warning: p_iscsi_lun_drbd3 cannot run on memverge2 due to reaching migration threshold (clean up resource to allow again)
Aug 18 08:01:06 memverge2 pacemaker-attrd[2523]: notice: Setting last-failure-p_iscsi_lun_drbd3#start_0[memverge2] in instance_attributes: 1755492741 -> (unset)
Aug 18 08:01:06 memverge2 pacemaker-attrd[2523]: notice: Setting fail-count-p_iscsi_lun_drbd3#start_0[memverge2] in instance_attributes: INFINITY -> (unset)
Aug 18 08:01:06 memverge2 pacemaker-schedulerd[2524]: notice: Actions: Start      p_iscsi_portblock_on_drbd3     (                        memverge2 )
Aug 18 08:01:06 memverge2 pacemaker-schedulerd[2524]: notice: Actions: Start      p_iscsi_target_drbd3           (                        memverge2 )
Aug 18 08:01:06 memverge2 pacemaker-schedulerd[2524]: notice: Actions: Start      p_iscsi_lun_drbd3              (                        memverge2 )
Aug 18 08:01:06 memverge2 pacemaker-schedulerd[2524]: notice: Actions: Start      p_iscsi_portblock_off_drbd3    (                        memverge2 )
Aug 18 08:01:06 memverge2 pacemaker-controld[2525]: notice: Initiating monitor operation p_iscsi_lun_drbd3_monitor_0 locally on memverge2
Aug 18 08:01:06 memverge2 pacemaker-controld[2525]: notice: Requesting local execution of probe operation for p_iscsi_lun_drbd3 on memverge2
Aug 18 08:01:06 memverge2 pacemaker-controld[2525]: notice: Result of probe operation for p_iscsi_lun_drbd3 on memverge2: Not running
Aug 18 08:01:36 memverge2 pacemaker-schedulerd[2524]: notice: Actions: Start      p_iscsi_portblock_on_drbd3     (                        memverge2 )
Aug 18 08:01:36 memverge2 pacemaker-schedulerd[2524]: notice: Actions: Start      p_iscsi_target_drbd3           (                        memverge2 )
Aug 18 08:01:36 memverge2 pacemaker-schedulerd[2524]: notice: Actions: Start      p_iscsi_lun_drbd3              (                        memverge2 )
Aug 18 08:01:36 memverge2 pacemaker-schedulerd[2524]: notice: Actions: Start      p_iscsi_portblock_off_drbd3    (                        memverge2 )
Aug 18 08:01:36 memverge2 pacemaker-controld[2525]: notice: Initiating monitor operation p_iscsi_portblock_on_drbd3_monitor_0 locally on memverge2
Aug 18 08:01:36 memverge2 pacemaker-controld[2525]: notice: Requesting local execution of probe operation for p_iscsi_portblock_on_drbd3 on memverge2
Aug 18 08:01:36 memverge2 pacemaker-controld[2525]: notice: Initiating monitor operation p_iscsi_target_drbd3_monitor_0 locally on memverge2
Aug 18 08:01:36 memverge2 pacemaker-controld[2525]: notice: Requesting local execution of probe operation for p_iscsi_target_drbd3 on memverge2
Aug 18 08:01:36 memverge2 pacemaker-controld[2525]: notice: Initiating monitor operation p_iscsi_lun_drbd3_monitor_0 locally on memverge2
Aug 18 08:01:36 memverge2 pacemaker-controld[2525]: notice: Requesting local execution of probe operation for p_iscsi_lun_drbd3 on memverge2
Aug 18 08:01:36 memverge2 pacemaker-controld[2525]: notice: Initiating monitor operation p_iscsi_portblock_off_drbd3_monitor_0 locally on memverge2
Aug 18 08:01:36 memverge2 pacemaker-controld[2525]: notice: Requesting local execution of probe operation for p_iscsi_portblock_off_drbd3 on memverge2
Aug 18 08:01:36 memverge2 pacemaker-controld[2525]: notice: Result of probe operation for p_iscsi_portblock_on_drbd3 on memverge2: Not running
Aug 18 08:01:36 memverge2 pacemaker-controld[2525]: notice: Result of probe operation for p_iscsi_target_drbd3 on memverge2: Not running
Aug 18 08:01:36 memverge2 pacemaker-controld[2525]: notice: Result of probe operation for p_iscsi_portblock_off_drbd3 on memverge2: Not running
Aug 18 08:01:36 memverge2 pacemaker-controld[2525]: notice: Result of probe operation for p_iscsi_lun_drbd3 on memverge2: Not running

[root@memverge2 ~]# dmesg|grep -i drbd
[  812.017923] drbd ha-iscsi: Preparing remote state change 2376668298: 27->all role( Secondary )
[  812.043208] drbd ha-iscsi memverge: Committing remote state change 2376668298 (primary_nodes=0)
[  812.043557] drbd ha-iscsi memverge: peer( Primary -> Secondary ) [remote]
[  812.164480] drbd ha-iscsi: Preparing remote state change 2754260736: 27->28 conn( Disconnecting )
[  812.190378] drbd ha-iscsi memverge: Committing remote state change 2754260736 (primary_nodes=0)
[  812.191243] drbd ha-iscsi memverge: conn( Connected -> TearDown ) peer( Secondary -> Unknown ) [remote]
[  812.191527] drbd ha-iscsi/31 drbd3 memverge: pdsk( UpToDate -> DUnknown ) repl( Established -> Off ) [remote]
[  812.209053] drbd ha-iscsi memverge: meta connection shut down by peer.
[  812.217596] drbd ha-iscsi memverge: Terminating sender thread
[  812.217936] drbd ha-iscsi memverge: Starting sender thread (peer-node-id 27)
[  812.252361] drbd ha-iscsi memverge: Connection closed
[  812.252609] drbd ha-iscsi memverge: helper command: /sbin/drbdadm disconnected
[  812.277456] drbd ha-iscsi memverge: helper command: /sbin/drbdadm disconnected exit code 0
[  812.277682] drbd ha-iscsi memverge: conn( TearDown -> Unconnected ) [disconnected]
[  812.277904] drbd ha-iscsi memverge: Restarting receiver thread
[  812.278122] drbd ha-iscsi memverge: conn( Unconnected -> Connecting ) [connecting]
[  813.196096] drbd ha-iscsi memverge: helper command: /sbin/drbdadm fence-peer
[  813.340847] drbd ha-iscsi memverge: helper command: /sbin/drbdadm fence-peer exit code 4 (0x400)
[  813.341207] drbd ha-iscsi memverge: fence-peer helper returned 4 (peer was fenced)
[  813.341503] drbd ha-iscsi/31 drbd3 memverge: pdsk( DUnknown -> Outdated ) [primary]
[  813.341757] drbd ha-iscsi: Preparing cluster-wide state change 637846353: 28->all role( Primary )
[  813.341974] drbd ha-iscsi: Committing cluster-wide state change 637846353 (0ms)
[  813.342194] drbd ha-iscsi: role( Secondary -> Primary ) [primary]
[  813.549048] drbd ha-nfs: Preparing remote state change 299836223: 27->all role( Secondary )
[  813.572725] drbd ha-nfs memverge: Committing remote state change 299836223 (primary_nodes=0)
[  813.573290] drbd ha-nfs memverge: peer( Primary -> Secondary ) [remote]
[  813.694972] drbd ha-nfs: Preparing remote state change 230622501: 27->28 conn( Disconnecting )
[  813.719162] drbd ha-nfs memverge: Committing remote state change 230622501 (primary_nodes=0)
[  813.719383] drbd ha-nfs memverge: conn( Connected -> TearDown ) peer( Secondary -> Unknown ) [remote]
[  813.719581] drbd ha-nfs/29 drbd1 memverge: pdsk( UpToDate -> DUnknown ) repl( Established -> Off ) [remote]
[  813.719775] drbd ha-nfs/30 drbd2 memverge: pdsk( UpToDate -> DUnknown ) repl( Established -> Off ) [remote]
[  813.720026] drbd ha-nfs memverge: Terminating sender thread
[  813.720241] drbd ha-nfs memverge: Starting sender thread (peer-node-id 27)
[  813.759177] drbd ha-nfs memverge: Connection closed
[  813.759366] drbd ha-nfs memverge: helper command: /sbin/drbdadm disconnected
[  813.783239] drbd ha-nfs memverge: helper command: /sbin/drbdadm disconnected exit code 0
[  813.783422] drbd ha-nfs memverge: conn( TearDown -> Unconnected ) [disconnected]
[  813.783600] drbd ha-nfs memverge: Restarting receiver thread
[  813.783774] drbd ha-nfs memverge: conn( Unconnected -> Connecting ) [connecting]
[  814.383498] drbd ha-nfs memverge: helper command: /sbin/drbdadm fence-peer
[  983.916096] INFO: task drbdsetup:15464 blocked for more than 122 seconds.
[  983.917301] task:drbdsetup       state:D stack:0     pid:15464 tgid:15464 ppid:1      flags:0x00000006
[  983.918715]  drbd_khelper.cold+0x18e/0x4c8 [drbd]
[  983.918923]  conn_try_outdate_peer+0x11e/0x260 [drbd]
[  983.919120]  ? change_role+0x92/0x100 [drbd]
[  983.919307]  drbd_set_role+0x349/0x8e0 [drbd]
[  983.919493]  ? drbd_find_resource+0x7a/0xb0 [drbd]
[  983.919679]  drbd_adm_set_role+0x124/0x260 [drbd]
[  983.920183]  ? __pfx_drbd_adm_set_role+0x10/0x10 [drbd]
[  983.929967] INFO: task drbdsetup:15533 blocked for more than 122 seconds.
[  983.930357] task:drbdsetup       state:D stack:0     pid:15533 tgid:15533 ppid:15466  flags:0x00000002
[  983.934819] INFO: task drbdsetup:15562 blocked for more than 122 seconds.
[  983.935202] task:drbdsetup       state:D stack:0     pid:15562 tgid:15562 ppid:1      flags:0x00000006
[  983.939462] INFO: task drbdsetup:16645 blocked for more than 122 seconds.
[  983.939843] task:drbdsetup       state:D stack:0     pid:16645 tgid:16645 ppid:1      flags:0x00000006
[ 1106.794515] INFO: task drbdsetup:15464 blocked for more than 245 seconds.
[ 1106.795030] task:drbdsetup       state:D stack:0     pid:15464 tgid:15464 ppid:1      flags:0x00000006
[ 1106.796032]  drbd_khelper.cold+0x18e/0x4c8 [drbd]
[ 1106.796191]  conn_try_outdate_peer+0x11e/0x260 [drbd]
[ 1106.796352]  ? change_role+0x92/0x100 [drbd]
[ 1106.796500]  drbd_set_role+0x349/0x8e0 [drbd]
[ 1106.796648]  ? drbd_find_resource+0x7a/0xb0 [drbd]
[ 1106.796797]  drbd_adm_set_role+0x124/0x260 [drbd]
[ 1106.797195]  ? __pfx_drbd_adm_set_role+0x10/0x10 [drbd]
[ 1106.806718] INFO: task drbdsetup:15533 blocked for more than 245 seconds.
[ 1106.807107] task:drbdsetup       state:D stack:0     pid:15533 tgid:15533 ppid:15466  flags:0x00000002
[ 1106.811587] INFO: task drbdsetup:15562 blocked for more than 245 seconds.
[ 1106.811968] task:drbdsetup       state:D stack:0     pid:15562 tgid:15562 ppid:1      flags:0x00000006
[ 1106.816245] INFO: task drbdsetup:16645 blocked for more than 245 seconds.
[ 1106.816630] task:drbdsetup       state:D stack:0     pid:16645 tgid:16645 ppid:1      flags:0x00000006

Anton

Hello Anton,

It looks like p_iscsi_lun_drbd3 failed to start within the 20-second timeout:

Aug 18 07:52:00 memverge2 pacemaker-controld[2525]: notice: Requesting local execution of start operation for p_iscsi_lun_drbd3 on memverge2
Aug 18 07:52:21 memverge2 pacemaker-controld[2525]: error: Result of start operation for p_iscsi_lun_drbd3 on memverge2: Timed out after 20s (Resource agent did not complete within 20s)
Aug 18 07:52:21 memverge2 pacemaker-attrd[2523]: notice: Setting last-failure-p_iscsi_lun_drbd3#start_0[memverge2] in instance_attributes: (unset) -> 1755492741
Aug 18 07:52:21 memverge2 pacemaker-attrd[2523]: notice: Setting fail-count-p_iscsi_lun_drbd3#start_0[memverge2] in instance_attributes: (unset) -> INFINITY
Aug 18 07:55:02 memverge2 pacemaker-schedulerd[2524]: warning: Unexpected result (Error occurred: Resource agent did not complete within 20s) was recorded for start of p_iscsi_lun_drbd3 on memverge2 at Aug 18 07:52:00 2025
Aug 18 07:55:02 memverge2 pacemaker-schedulerd[2524]: warning: p_iscsi_lun_drbd3 cannot run on memverge2 due to reaching migration threshold (clean up resource to allow again)
Aug 18 07:55:02 memverge2 pacemaker-schedulerd[2524]: notice: Actions: Stop       p_iscsi_lun_drbd3              (                        memverge2 )  due to node availability
Aug 18 07:55:02 memverge2 pacemaker-controld[2525]: notice: Initiating stop operation p_iscsi_lun_drbd3_stop_0 locally on memverge2
Aug 18 07:55:02 memverge2 pacemaker-controld[2525]: notice: Requesting local execution of stop operation for p_iscsi_lun_drbd3 on memverge2
Aug 18 07:55:02 memverge2 pacemaker-controld[2525]: notice: Result of stop operation for p_iscsi_lun_drbd3 on memverge2: OK
Aug 18 07:58:24 memverge2 pacemaker-schedulerd[2524]: warning: Unexpected result (Error occurred: Resource agent did not complete within 20s) was recorded for start of p_iscsi_lun_drbd3 on memverge2 at Aug 18 07:52:00 2025
Aug 18 07:58:24 memverge2 pacemaker-schedulerd[2524]: warning: p_iscsi_lun_drbd3 cannot run on memverge2 due to reaching migration threshold (clean up resource to allow again)
Aug 18 08:00:04 memverge2 pacemaker-schedulerd[2524]: warning: Unexpected result (Error occurred: Resource agent did not complete within 20s) was recorded for start of p_iscsi_lun_drbd3 on memverge2 at Aug 18 07:52:00 2025
Aug 18 08:00:04 memverge2 pacemaker-schedulerd[2524]: warning: p_iscsi_lun_drbd3 cannot run on memverge2 due to reaching migration threshold (clean up resource to allow again)
Aug 18 08:00:04 memverge2 pacemaker-schedulerd[2524]: warning: Unexpected result (Error occurred: Resource agent did not complete within 20s) was recorded for start of p_iscsi_lun_drbd3 on memverge2 at Aug 18 07:52:00 2025
Aug 18 08:00:04 memverge2 pacemaker-schedulerd[2524]: warning: p_iscsi_lun_drbd3 cannot run on memverge2 due to reaching migration threshold (clean up resource to allow again)

Once you check the dmesg logs it becomes pretty apparent why.

[  983.929967] INFO: task drbdsetup:15533 blocked for more than 122 seconds.
[  983.930357] task:drbdsetup       state:D stack:0     pid:15533 tgid:15533 ppid:15466  flags:0x00000002
[  983.934819] INFO: task drbdsetup:15562 blocked for more than 122 seconds.
[  983.935202] task:drbdsetup       state:D stack:0     pid:15562 tgid:15562 ppid:1      flags:0x00000006
[  983.939462] INFO: task drbdsetup:16645 blocked for more than 122 seconds.
[  983.939843] task:drbdsetup       state:D stack:0     pid:16645 tgid:16645 ppid:1      flags:0x00000006
[ 1106.794515] INFO: task drbdsetup:15464 blocked for more than 245 seconds.
[ 1106.795030] task:drbdsetup       state:D stack:0     pid:15464 tgid:15464 ppid:1      flags:0x00000006

Looks like you have multiple drbdsetup processes in an uninterruptible sleep state (D state). These are likely holding everything up. I can’t say why that happened with the info provided, but at this point the only likely way forward is to reboot that node to clear those stuck processes.
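
To see which processes are currently stuck in D state, something like the following can help. It is only a minimal sketch: `filter_d_state` is a hypothetical helper name, and it assumes `ps` is asked for the columns shown in the comment, with the state in the second column.

```shell
# Keep the header line plus every process whose state (2nd column)
# starts with "D", i.e. uninterruptible sleep.
# Reads `ps` output on stdin, so it also works on saved output.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Live usage (wchan is the kernel function the task sleeps in):
#   ps -eo pid,stat,ppid,wchan:32,comm | filter_d_state
```

Processes listed this way cannot be killed, which is why a reboot is usually the only way to get rid of them.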

The trouble is that it still happens even if I do a clean start of the entire cluster:

  1. stop the running cluster (pcs cluster stop --all)
  2. reboot both cluster nodes simultaneously
  3. the first node to boot reboots the second node using fence_ipmilan
  4. once the second node has booted, the cluster starts and runs the resources
[root@memverge anton]# pcs status
Cluster name: cluster_anton
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: memverge2 (version 3.0.0-5.1.el10_0-48413c8) - partition with quorum
  * Last updated: Wed Aug 20 10:06:29 2025 on memverge
  * Last change:  Wed Aug 20 09:51:46 2025 by root via root on memverge
  * 2 nodes configured
  * 19 resource instances configured

Node List:
  * Online: [ memverge memverge2 ]

Full List of Resources:
  * ipmi-fence-memverge2        (stonith:fence_ipmilan):         Started memverge
  * ipmi-fence-memverge (stonith:fence_ipmilan):         Started memverge2
  * Clone Set: ha-nfs-clone [ha-nfs] (promotable):
    * Promoted: [ memverge ]
    * Unpromoted: [ memverge2 ]
  * Resource Group: g_nfs:
    * p_pb_block        (ocf:heartbeat:portblock):       Started memverge
    * p_virtip  (ocf:heartbeat:IPaddr2):         Started memverge
    * p_fs_nfs_internal_info_HA (ocf:heartbeat:Filesystem):      Started memverge
    * p_fs_nfsshare_exports_HA  (ocf:heartbeat:Filesystem):      Started memverge
    * p_nfsserver       (ocf:heartbeat:nfsserver):       Started memverge
    * p_expfs_nfsshare_exports_HA       (ocf:heartbeat:exportfs):        Started memverge
    * p_pb_unblock      (ocf:heartbeat:portblock):       Started memverge
  * Clone Set: ha-iscsi-clone [ha-iscsi] (promotable):
    * Promoted: [ memverge ]
    * Unpromoted: [ memverge2 ]
  * Resource Group: g_iscsi:
    * p_iscsi_portblock_on_drbd3        (ocf:heartbeat:portblock):       Started memverge
    * p_iscsi_ip0       (ocf:heartbeat:IPaddr2):         Started memverge
    * p_iscsi_ip1       (ocf:heartbeat:IPaddr2):         Started memverge
    * p_iscsi_target_drbd3      (ocf:heartbeat:iSCSITarget):     Started memverge
    * p_iscsi_lun_drbd3 (ocf:heartbeat:iSCSILogicalUnit):        Started memverge
    * p_iscsi_portblock_off_drbd3       (ocf:heartbeat:portblock):       Started memverge

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

There are no drbdsetup processes on either node:

[root@memverge anton]# ps -ef|grep -i drbds
root       45561   25998  0 10:07 pts/0    00:00:00 grep --color=auto -i drbds

[root@memverge2 ~]# ps -ef|grep -i drbds
root       12069    5498  0 10:07 pts/0    00:00:00 grep --color=auto -i drbds

So when I stop the active cluster node

[root@memverge anton]# pcs cluster stop
Stopping Cluster (pacemaker)...
Stopping Cluster (corosync)...
[root@memverge anton]#

I still see the same hang:

[root@memverge2 ~]# cat /var/log/messages|grep -i drbd
Aug 20 10:09:12 memverge2 pacemaker-schedulerd[2374]: notice: Actions: Move       p_iscsi_portblock_on_drbd3    (            memverge -> memverge2 )
Aug 20 10:09:12 memverge2 pacemaker-schedulerd[2374]: notice: Actions: Move       p_iscsi_target_drbd3          (            memverge -> memverge2 )
Aug 20 10:09:12 memverge2 pacemaker-schedulerd[2374]: notice: Actions: Move       p_iscsi_lun_drbd3             (            memverge -> memverge2 )
Aug 20 10:09:12 memverge2 pacemaker-schedulerd[2374]: notice: Actions: Move       p_iscsi_portblock_off_drbd3   (            memverge -> memverge2 )
Aug 20 10:09:12 memverge2 pacemaker-controld[2376]: notice: Initiating stop operation p_iscsi_portblock_off_drbd3_stop_0 on memverge
Aug 20 10:09:12 memverge2 pacemaker-controld[2376]: notice: Initiating stop operation p_iscsi_lun_drbd3_stop_0 on memverge
Aug 20 10:09:13 memverge2 pacemaker-controld[2376]: notice: Initiating stop operation p_iscsi_target_drbd3_stop_0 on memverge
Aug 20 10:09:13 memverge2 pacemaker-controld[2376]: notice: Initiating stop operation p_iscsi_portblock_on_drbd3_stop_0 on memverge
Aug 20 10:09:13 memverge2 kernel: drbd ha-iscsi: Preparing remote state change 870094945: 27->all role( Secondary )
Aug 20 10:09:13 memverge2 kernel: drbd ha-iscsi memverge: Committing remote state change 870094945 (primary_nodes=0)
Aug 20 10:09:13 memverge2 kernel: drbd ha-iscsi memverge: peer( Primary -> Secondary ) [remote]
Aug 20 10:09:13 memverge2 kernel: drbd ha-iscsi: Preparing remote state change 322480972: 27->28 conn( Disconnecting )
Aug 20 10:09:13 memverge2 kernel: drbd ha-iscsi memverge: Committing remote state change 322480972 (primary_nodes=0)
Aug 20 10:09:13 memverge2 kernel: drbd ha-iscsi memverge: conn( Connected -> TearDown ) peer( Secondary -> Unknown ) [remote]
Aug 20 10:09:13 memverge2 kernel: drbd ha-iscsi/31 drbd3 memverge: pdsk( UpToDate -> DUnknown ) repl( Established -> Off ) [remote]
Aug 20 10:09:13 memverge2 kernel: drbd ha-iscsi memverge: Terminating sender thread
Aug 20 10:09:13 memverge2 kernel: drbd ha-iscsi memverge: Starting sender thread (peer-node-id 27)
Aug 20 10:09:13 memverge2 kernel: drbd ha-iscsi memverge: Connection closed
Aug 20 10:09:13 memverge2 kernel: drbd ha-iscsi memverge: helper command: /sbin/drbdadm disconnected
Aug 20 10:09:13 memverge2 kernel: drbd ha-iscsi memverge: helper command: /sbin/drbdadm disconnected exit code 0
Aug 20 10:09:13 memverge2 kernel: drbd ha-iscsi memverge: conn( TearDown -> Unconnected ) [disconnected]
Aug 20 10:09:13 memverge2 kernel: drbd ha-iscsi memverge: Restarting receiver thread
Aug 20 10:09:13 memverge2 kernel: drbd ha-iscsi memverge: conn( Unconnected -> Connecting ) [connecting]
Aug 20 10:09:14 memverge2 pacemaker-schedulerd[2374]: notice: Actions: Start      p_iscsi_portblock_on_drbd3    (                        memverge2 )
Aug 20 10:09:14 memverge2 pacemaker-schedulerd[2374]: notice: Actions: Start      p_iscsi_target_drbd3          (                        memverge2 )
Aug 20 10:09:14 memverge2 pacemaker-schedulerd[2374]: notice: Actions: Start      p_iscsi_lun_drbd3             (                        memverge2 )
Aug 20 10:09:14 memverge2 pacemaker-schedulerd[2374]: notice: Actions: Start      p_iscsi_portblock_off_drbd3   (                        memverge2 )
Aug 20 10:09:14 memverge2 kernel: drbd ha-iscsi memverge: helper command: /sbin/drbdadm fence-peer
Aug 20 10:09:14 memverge2 crm-fence-peer.9.sh[13359]: DRBD_BACKING_DEV_31=/dev/block_nfs_vg/ha_block_exports_lv DRBD_CONF=/etc/drbd.conf DRBD_CSTATE=Connecting DRBD_LL_DISK=/dev/block_nfs_vg/ha_block_exports_lv DRBD_MINOR=3 DRBD_MINOR_31=3 DRBD_MY_ADDRESS=192.168.0.8 DRBD_MY_AF=ipv4 DRBD_MY_NODE_ID=28 DRBD_NODE_ID_27=memverge DRBD_NODE_ID_28=memverge2 DRBD_PEER_ADDRESS=192.168.0.6 DRBD_PEER_AF=ipv4 DRBD_PEER_NODE_ID=27 DRBD_RESOURCE=ha-iscsi DRBD_VOLUME=31 UP_TO_DATE_NODES=0x10000000 /usr/lib/drbd/crm-fence-peer.9.sh
Aug 20 10:09:14 memverge2 kernel: drbd ha-nfs: Preparing remote state change 137192293: 27->all role( Secondary )
Aug 20 10:09:14 memverge2 kernel: drbd ha-nfs memverge: Committing remote state change 137192293 (primary_nodes=0)
Aug 20 10:09:14 memverge2 kernel: drbd ha-nfs memverge: peer( Primary -> Secondary ) [remote]
Aug 20 10:09:15 memverge2 kernel: drbd ha-nfs: Preparing remote state change 1303054671: 27->28 conn( Disconnecting )
Aug 20 10:09:15 memverge2 kernel: drbd ha-nfs memverge: Committing remote state change 1303054671 (primary_nodes=0)
Aug 20 10:09:15 memverge2 kernel: drbd ha-nfs memverge: conn( Connected -> TearDown ) peer( Secondary -> Unknown ) [remote]
Aug 20 10:09:15 memverge2 kernel: drbd ha-nfs/29 drbd1 memverge: pdsk( UpToDate -> DUnknown ) repl( Established -> Off ) [remote]
Aug 20 10:09:15 memverge2 kernel: drbd ha-nfs/30 drbd2 memverge: pdsk( UpToDate -> DUnknown ) repl( Established -> Off ) [remote]
Aug 20 10:09:15 memverge2 kernel: drbd ha-nfs memverge: meta connection shut down by peer.
Aug 20 10:09:15 memverge2 kernel: drbd ha-nfs memverge: Terminating sender thread
Aug 20 10:09:15 memverge2 kernel: drbd ha-nfs memverge: Starting sender thread (peer-node-id 27)
Aug 20 10:09:15 memverge2 kernel: drbd ha-nfs memverge: Connection closed
Aug 20 10:09:15 memverge2 kernel: drbd ha-nfs memverge: helper command: /sbin/drbdadm disconnected
Aug 20 10:09:15 memverge2 kernel: drbd ha-nfs memverge: helper command: /sbin/drbdadm disconnected exit code 0
Aug 20 10:09:15 memverge2 kernel: drbd ha-nfs memverge: conn( TearDown -> Unconnected ) [disconnected]
Aug 20 10:09:15 memverge2 kernel: drbd ha-nfs memverge: Restarting receiver thread
Aug 20 10:09:15 memverge2 kernel: drbd ha-nfs memverge: conn( Unconnected -> Connecting ) [connecting]
Aug 20 10:09:15 memverge2 crm-fence-peer.9.sh[13359]: INFO peers are reachable, my disk is UpToDate: placed constraint 'drbd-fence-by-handler-ha-iscsi-ha-iscsi-clone'
Aug 20 10:09:15 memverge2 kernel: drbd ha-iscsi memverge: helper command: /sbin/drbdadm fence-peer exit code 4 (0x400)
Aug 20 10:09:15 memverge2 kernel: drbd ha-iscsi memverge: fence-peer helper returned 4 (peer was fenced)
Aug 20 10:09:15 memverge2 kernel: drbd ha-iscsi/31 drbd3 memverge: pdsk( DUnknown -> Outdated ) [primary]
Aug 20 10:09:15 memverge2 kernel: drbd ha-iscsi: Preparing cluster-wide state change 1232119433: 28->all role( Primary )
Aug 20 10:09:15 memverge2 kernel: drbd ha-iscsi: Committing cluster-wide state change 1232119433 (0ms)
Aug 20 10:09:15 memverge2 kernel: drbd ha-iscsi: role( Secondary -> Primary ) [primary]
Aug 20 10:09:15 memverge2 pacemaker-schedulerd[2374]: notice: Actions: Start      p_iscsi_portblock_on_drbd3    (                        memverge2 )
Aug 20 10:09:15 memverge2 pacemaker-schedulerd[2374]: notice: Actions: Start      p_iscsi_target_drbd3          (                        memverge2 )
Aug 20 10:09:15 memverge2 pacemaker-schedulerd[2374]: notice: Actions: Start      p_iscsi_lun_drbd3             (                        memverge2 )
Aug 20 10:09:15 memverge2 pacemaker-schedulerd[2374]: notice: Actions: Start      p_iscsi_portblock_off_drbd3   (                        memverge2 )
Aug 20 10:09:15 memverge2 pacemaker-controld[2376]: notice: Initiating start operation p_iscsi_portblock_on_drbd3_start_0 locally on memverge2
Aug 20 10:09:15 memverge2 pacemaker-controld[2376]: notice: Requesting local execution of start operation for p_iscsi_portblock_on_drbd3 on memverge2
Aug 20 10:09:15 memverge2 pacemaker-controld[2376]: notice: Result of start operation for p_iscsi_portblock_on_drbd3 on memverge2: OK
Aug 20 10:09:15 memverge2 pacemaker-controld[2376]: notice: Initiating monitor operation p_iscsi_portblock_on_drbd3_monitor_5000 locally on memverge2
Aug 20 10:09:15 memverge2 pacemaker-controld[2376]: notice: Requesting local execution of monitor operation for p_iscsi_portblock_on_drbd3 on memverge2
Aug 20 10:09:15 memverge2 kernel: drbd ha-nfs memverge: helper command: /sbin/drbdadm fence-peer
Aug 20 10:09:15 memverge2 crm-fence-peer.9.sh[13654]: DRBD_BACKING_DEV_29=/dev/block_nfs_vg/ha_nfs_internal_lv DRBD_BACKING_DEV_30=/dev/block_nfs_vg/ha_nfs_exports_lv DRBD_CONF=/etc/drbd.conf DRBD_CSTATE=Connecting DRBD_LL_DISK=/dev/block_nfs_vg/ha_nfs_internal_lv\ /dev/block_nfs_vg/ha_nfs_exports_lv DRBD_MINOR=1\ 2 DRBD_MINOR_29=1 DRBD_MINOR_30=2 DRBD_MY_ADDRESS=192.168.0.8 DRBD_MY_AF=ipv4 DRBD_MY_NODE_ID=28 DRBD_NODE_ID_27=memverge DRBD_NODE_ID_28=memverge2 DRBD_PEER_ADDRESS=192.168.0.6 DRBD_PEER_AF=ipv4 DRBD_PEER_NODE_ID=27 DRBD_RESOURCE=ha-nfs DRBD_VOLUME=29\ 30 UP_TO_DATE_NODES=0x10000000 /usr/lib/drbd/crm-fence-peer.9.sh
Aug 20 10:09:15 memverge2 pacemaker-controld[2376]: notice: Result of monitor operation for p_iscsi_portblock_on_drbd3 on memverge2: OK
Aug 20 10:09:16 memverge2 pacemaker-controld[2376]: notice: Initiating start operation p_iscsi_target_drbd3_start_0 locally on memverge2
Aug 20 10:09:16 memverge2 pacemaker-controld[2376]: notice: Requesting local execution of start operation for p_iscsi_target_drbd3 on memverge2
Aug 20 10:09:16 memverge2 iSCSITarget(p_iscsi_target_drbd3)[14039]: INFO: Parameter auto_add_default_portal is now 'false'.
Aug 20 10:09:16 memverge2 iSCSITarget(p_iscsi_target_drbd3)[14052]: INFO: Created target iqn.2025-03.com.example:drbd3. Created TPG 1.
Aug 20 10:09:16 memverge2 iSCSITarget(p_iscsi_target_drbd3)[14066]: INFO: Using default IP port 3260 Created network portal 192.168.21.20:3260.
Aug 20 10:09:16 memverge2 iSCSITarget(p_iscsi_target_drbd3)[14080]: INFO: Using default IP port 3260 Created network portal 192.168.22.20:3260.
Aug 20 10:09:16 memverge2 iSCSITarget(p_iscsi_target_drbd3)[14093]: INFO: Parameter authentication is now '0'. Parameter demo_mode_write_protect is now '0'. Parameter generate_node_acls is now '1'. Parameter cache_dynamic_acls is now '1'.
Aug 20 10:09:16 memverge2 pacemaker-controld[2376]: notice: Result of start operation for p_iscsi_target_drbd3 on memverge2: OK
Aug 20 10:09:16 memverge2 pacemaker-controld[2376]: notice: Initiating monitor operation p_iscsi_target_drbd3_monitor_5000 locally on memverge2
Aug 20 10:09:16 memverge2 pacemaker-controld[2376]: notice: Requesting local execution of monitor operation for p_iscsi_target_drbd3 on memverge2
Aug 20 10:09:16 memverge2 pacemaker-controld[2376]: notice: Initiating start operation p_iscsi_lun_drbd3_start_0 locally on memverge2
Aug 20 10:09:16 memverge2 pacemaker-controld[2376]: notice: Requesting local execution of start operation for p_iscsi_lun_drbd3 on memverge2
Aug 20 10:09:16 memverge2 pacemaker-controld[2376]: notice: Result of monitor operation for p_iscsi_target_drbd3 on memverge2: OK
Aug 20 10:09:37 memverge2 pacemaker-controld[2376]: error: Result of start operation for p_iscsi_lun_drbd3 on memverge2: Timed out after 20s (Resource agent did not complete within 20s)
Aug 20 10:09:37 memverge2 pacemaker-controld[2376]: notice: Transition 9 aborted by operation p_iscsi_lun_drbd3_start_0 'modify' on memverge2: Event failed
Aug 20 10:09:37 memverge2 pacemaker-controld[2376]: notice: Transition 9 action 87 (p_iscsi_lun_drbd3_start_0 on memverge2): expected 'OK' but got 'Error occurred'
Aug 20 10:09:37 memverge2 pacemaker-attrd[2373]: notice: Setting last-failure-p_iscsi_lun_drbd3#start_0[memverge2] in instance_attributes: (unset) -> 1755673777
Aug 20 10:09:37 memverge2 pacemaker-attrd[2373]: notice: Setting fail-count-p_iscsi_lun_drbd3#start_0[memverge2] in instance_attributes: (unset) -> INFINITY
Aug 20 10:09:37 memverge2 pacemaker-controld[2376]: notice: Transition 9 aborted by status-28-last-failure-p_iscsi_lun_drbd3.start_0 doing create last-failure-p_iscsi_lun_drbd3#start_0=1755673777: Transient attribute change
Aug 20 10:11:47 memverge2 kernel: INFO: task drbdsetup:13640 blocked for more than 122 seconds.
Aug 20 10:11:47 memverge2 kernel: task:drbdsetup       state:D stack:0     pid:13640 tgid:13640 ppid:1      flags:0x00000006
Aug 20 10:11:47 memverge2 kernel: drbd_khelper.cold+0x18e/0x4c8 [drbd]
Aug 20 10:11:47 memverge2 kernel: conn_try_outdate_peer+0x11e/0x260 [drbd]
Aug 20 10:11:47 memverge2 kernel: ? change_role+0x92/0x100 [drbd]
Aug 20 10:11:47 memverge2 kernel: drbd_set_role+0x349/0x8e0 [drbd]
Aug 20 10:11:47 memverge2 kernel: ? drbd_find_resource+0x7a/0xb0 [drbd]
Aug 20 10:11:47 memverge2 kernel: drbd_adm_set_role+0x124/0x260 [drbd]
Aug 20 10:11:47 memverge2 kernel: ? __pfx_drbd_adm_set_role+0x10/0x10 [drbd]
Aug 20 10:11:47 memverge2 kernel: INFO: task drbdsetup:14215 blocked for more than 122 seconds.
Aug 20 10:11:47 memverge2 kernel: task:drbdsetup       state:D stack:0     pid:14215 tgid:14215 ppid:13646  flags:0x00000002
Aug 20 10:11:47 memverge2 kernel: INFO: task drbdsetup:14243 blocked for more than 122 seconds.
Aug 20 10:11:47 memverge2 kernel: task:drbdsetup       state:D stack:0     pid:14243 tgid:14243 ppid:1      flags:0x00000006
Aug 20 10:12:16 memverge2 pacemaker-schedulerd[2374]: warning: Unexpected result (Error occurred: Resource agent did not complete within 20s) was recorded for start of p_iscsi_lun_drbd3 on memverge2 at Aug 20 10:09:16 2025
Aug 20 10:12:16 memverge2 pacemaker-schedulerd[2374]: warning: Unexpected result (Error occurred: Resource agent did not complete within 20s) was recorded for start of p_iscsi_lun_drbd3 on memverge2 at Aug 20 10:09:16 2025
Aug 20 10:12:16 memverge2 pacemaker-schedulerd[2374]: warning: p_iscsi_lun_drbd3 cannot run on memverge2 due to reaching migration threshold (clean up resource to allow again)
Aug 20 10:12:16 memverge2 pacemaker-schedulerd[2374]: notice: Actions: Stop       p_iscsi_portblock_on_drbd3    (                        memverge2 )  due to node availability
Aug 20 10:12:16 memverge2 pacemaker-schedulerd[2374]: notice: Actions: Stop       p_iscsi_target_drbd3          (                        memverge2 )  due to node availability
Aug 20 10:12:16 memverge2 pacemaker-schedulerd[2374]: notice: Actions: Stop       p_iscsi_lun_drbd3             (                        memverge2 )  due to node availability
Aug 20 10:12:16 memverge2 pacemaker-controld[2376]: notice: Initiating stop operation p_iscsi_lun_drbd3_stop_0 locally on memverge2
Aug 20 10:12:16 memverge2 pacemaker-controld[2376]: notice: Requesting local execution of stop operation for p_iscsi_lun_drbd3 on memverge2
Aug 20 10:12:16 memverge2 pacemaker-controld[2376]: notice: Result of stop operation for p_iscsi_lun_drbd3 on memverge2: OK
Aug 20 10:12:16 memverge2 pacemaker-controld[2376]: notice: Initiating stop operation p_iscsi_target_drbd3_stop_0 locally on memverge2
Aug 20 10:12:16 memverge2 pacemaker-controld[2376]: notice: Requesting local execution of stop operation for p_iscsi_target_drbd3 on memverge2
Aug 20 10:12:17 memverge2 iSCSITarget(p_iscsi_target_drbd3)[19935]: INFO: Deleted Target iqn.2025-03.com.example:drbd3.
Aug 20 10:12:17 memverge2 pacemaker-controld[2376]: notice: Result of stop operation for p_iscsi_target_drbd3 on memverge2: OK
Aug 20 10:12:17 memverge2 pacemaker-controld[2376]: notice: Initiating stop operation p_iscsi_portblock_on_drbd3_stop_0 locally on memverge2
Aug 20 10:12:17 memverge2 pacemaker-controld[2376]: notice: Requesting local execution of stop operation for p_iscsi_portblock_on_drbd3 on memverge2
Aug 20 10:12:17 memverge2 pacemaker-controld[2376]: notice: Result of stop operation for p_iscsi_portblock_on_drbd3 on memverge2: OK
Aug 20 10:13:50 memverge2 kernel: INFO: task drbdsetup:13640 blocked for more than 245 seconds.
Aug 20 10:13:50 memverge2 kernel: task:drbdsetup       state:D stack:0     pid:13640 tgid:13640 ppid:1      flags:0x00000006
Aug 20 10:13:50 memverge2 kernel: drbd_khelper.cold+0x18e/0x4c8 [drbd]
Aug 20 10:13:50 memverge2 kernel: conn_try_outdate_peer+0x11e/0x260 [drbd]
Aug 20 10:13:50 memverge2 kernel: ? change_role+0x92/0x100 [drbd]
Aug 20 10:13:50 memverge2 kernel: drbd_set_role+0x349/0x8e0 [drbd]
Aug 20 10:13:50 memverge2 kernel: ? drbd_find_resource+0x7a/0xb0 [drbd]
Aug 20 10:13:50 memverge2 kernel: drbd_adm_set_role+0x124/0x260 [drbd]
Aug 20 10:13:50 memverge2 kernel: ? __pfx_drbd_adm_set_role+0x10/0x10 [drbd]
Aug 20 10:13:50 memverge2 kernel: INFO: task drbdsetup:14215 blocked for more than 245 seconds.
Aug 20 10:13:50 memverge2 kernel: task:drbdsetup       state:D stack:0     pid:14215 tgid:14215 ppid:13646  flags:0x00000002
Aug 20 10:13:50 memverge2 kernel: INFO: task drbdsetup:14243 blocked for more than 245 seconds.
Aug 20 10:13:50 memverge2 kernel: task:drbdsetup       state:D stack:0     pid:14243 tgid:14243 ppid:1      flags:0x00000006
Aug 20 10:13:50 memverge2 kernel: INFO: task drbdsetup:15349 blocked for more than 122 seconds.
Aug 20 10:13:50 memverge2 kernel: task:drbdsetup       state:D stack:0     pid:15349 tgid:15349 ppid:1      flags:0x00000006
Aug 20 10:13:50 memverge2 kernel: INFO: task drbdsetup:16423 blocked for more than 122 seconds.
Aug 20 10:13:50 memverge2 kernel: task:drbdsetup       state:D stack:0     pid:16423 tgid:16423 ppid:1      flags:0x00000006
Aug 20 10:15:40 memverge2 pacemaker-schedulerd[2374]: warning: Unexpected result (Error occurred: Resource agent did not complete within 20s) was recorded for start of p_iscsi_lun_drbd3 on memverge2 at Aug 20 10:09:16 2025
Aug 20 10:15:40 memverge2 pacemaker-schedulerd[2374]: warning: p_iscsi_lun_drbd3 cannot run on memverge2 due to reaching migration threshold (clean up resource to allow again)
Aug 20 10:15:40 memverge2 pacemaker-schedulerd[2374]: notice: Actions: Start      p_iscsi_portblock_on_drbd3    (                        memverge2 )
Aug 20 10:15:40 memverge2 pacemaker-schedulerd[2374]: notice: Actions: Start      p_iscsi_target_drbd3          (                        memverge2 )
Aug 20 10:17:20 memverge2 pacemaker-schedulerd[2374]: warning: Unexpected result (Error occurred: Resource agent did not complete within 20s) was recorded for start of p_iscsi_lun_drbd3 on memverge2 at Aug 20 10:09:16 2025
Aug 20 10:17:20 memverge2 pacemaker-schedulerd[2374]: warning: p_iscsi_lun_drbd3 cannot run on memverge2 due to reaching migration threshold (clean up resource to allow again)
Aug 20 10:17:20 memverge2 pacemaker-schedulerd[2374]: warning: Unexpected result (Error occurred: Resource agent did not complete within 20s) was recorded for start of p_iscsi_lun_drbd3 on memverge2 at Aug 20 10:09:16 2025
Aug 20 10:17:20 memverge2 pacemaker-schedulerd[2374]: warning: p_iscsi_lun_drbd3 cannot run on memverge2 due to reaching migration threshold (clean up resource to allow again)
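
A small helper can condense hung-task reports like the ones in this log into one line per stuck task. This is a minimal sketch, not something from the original setup: `hung_tasks` is a hypothetical name, and it assumes the kernel's `INFO: task NAME:PID blocked for more than N seconds` wording seen above.

```shell
# Print "PID SECONDS NAME" for every task the kernel reported as hung,
# keeping only the longest blocked time seen for each PID.
# Reads the files given as arguments, or stdin if there are none.
hung_tasks() {
    grep -E 'INFO: task .+:[0-9]+ blocked for more than [0-9]+ seconds' "$@" |
        sed -E 's/.*INFO: task (.+):([0-9]+) blocked for more than ([0-9]+) seconds.*/\2 \3 \1/' |
        sort -k1,1n -k2,2n |
        awk '{ last[$1] = $0 } END { for (pid in last) print last[pid] }' |
        sort -n
}

# Usage:
#   hung_tasks /var/log/messages
#   dmesg | hung_tasks
```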

Update.

In my initial description, there was a colocation constraint to always run the g_nfs and g_iscsi resource groups on the same node.

Now I have added an order constraint that determines which of the two resource groups starts first and which starts second.

Colocation Constraints:
  Started resource 'g_nfs' with Started resource 'g_iscsi' (id: colocation-g_nfs-g_iscsi-INFINITY)
    score=INFINITY
Order Constraints:
  promote resource 'ha-iscsi-clone' then start resource 'g_iscsi' (id: order-ha-iscsi-clone-g_iscsi-mandatory)
  stop resource 'g_iscsi' then demote resource 'ha-iscsi-clone' (id: order-g_iscsi-ha-iscsi-clone-mandatory)
  promote resource 'ha-nfs-clone' then start resource 'g_nfs' (id: order-ha-nfs-clone-g_nfs-mandatory)
  stop resource 'g_nfs' then demote resource 'ha-nfs-clone' (id: order-g_nfs-ha-nfs-clone-mandatory)
  start resource 'g_nfs' then start resource 'g_iscsi' (id: order-g_nfs-g_iscsi-mandatory)

I am not sure why it helped, but I have already tested it ten times.
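
For reference, the extra order constraint in the listing above (start g_nfs, then start g_iscsi) can be created with `pcs constraint order`. The wrapper below is only a sketch: `order_groups` and the `PCS` dry-run variable are hypothetical names used for illustration, not part of pcs.

```shell
# Hypothetical dry-run switch: set PCS="echo pcs" to print the command
# instead of changing the cluster.
PCS="${PCS:-pcs}"

# Start group $1 before group $2 (pcs creates a mandatory ordering by default).
order_groups() {
    $PCS constraint order start "$1" then start "$2"
}

# Usage on a cluster node, as root:
#   order_groups g_nfs g_iscsi
```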

Anton
