Hello
We have a two-node active/standby cluster based on Rocky Linux 9.6 and kmod-drbd9x-9.2.14-1.el9_6.elrepo.
There are two services (ha-nfs, ha-iscsi) that always run together on the same cluster node.
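For context, the two stacks are kept together with constraints of roughly this shape (a sketch in pcs syntax; the clone names and scores here are assumptions, not a paste of my actual config):

```shell
# Sketch only: colocation + ordering that keep both promotable DRBD
# clones (and the services on top of them) on the same node.
pcs constraint colocation add Promoted ha-iscsi-clone with Promoted ha-nfs-clone INFINITY
pcs constraint order promote ha-nfs-clone then promote ha-iscsi-clone
```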
If I reboot the active cluster node memverge2, the ha-nfs resource successfully fails over to the standby node memverge, but the ha-iscsi resource fails.
Looking for the difference in the logs:
Resource ha-nfs:
Jun 7 09:50:13 memverge pacemaker-controld[2780]: notice: Requesting local execution of notify operation for ha-nfs on memverge
Jun 7 09:50:13 memverge pacemaker-controld[2780]: notice: Result of notify operation for ha-nfs on memverge: ok
Jun 7 09:50:13 memverge kernel: drbd ha-nfs: Preparing remote state change 1671163414: 28->all role( Secondary )
Jun 7 09:50:13 memverge kernel: drbd ha-nfs memverge2: Committing remote state change 1671163414 (primary_nodes=0)
Jun 7 09:50:13 memverge kernel: drbd ha-nfs memverge2: peer( Primary → Secondary ) [remote]
Jun 7 09:50:13 memverge kernel: drbd ha-nfs/29 drbd1: Enabling local AL-updates
Jun 7 09:50:13 memverge kernel: drbd ha-nfs/30 drbd2: Enabling local AL-updates
Jun 7 09:50:13 memverge pacemaker-controld[2780]: notice: Requesting local execution of notify operation for ha-nfs on memverge
Jun 7 09:50:13 memverge pacemaker-controld[2780]: notice: Result of notify operation for ha-nfs on memverge: ok
Jun 7 09:50:13 memverge pacemaker-controld[2780]: notice: Requesting local execution of notify operation for ha-nfs on memverge
Jun 7 09:50:13 memverge pacemaker-controld[2780]: notice: Result of notify operation for ha-nfs on memverge: ok
Jun 7 09:50:13 memverge kernel: drbd ha-nfs: Preparing remote state change 374911319: 28->27 conn( Disconnecting )
Jun 7 09:50:13 memverge kernel: drbd ha-nfs memverge2: Committing remote state change 374911319 (primary_nodes=0)
Jun 7 09:50:13 memverge kernel: drbd ha-nfs memverge2: conn( Connected → TearDown ) peer( Secondary → Unknown ) [remote]
Jun 7 09:50:13 memverge kernel: drbd ha-nfs/29 drbd1 memverge2: pdsk( UpToDate → DUnknown ) repl( Established → Off ) [remote]
Jun 7 09:50:13 memverge kernel: drbd ha-nfs/30 drbd2 memverge2: pdsk( UpToDate → DUnknown ) repl( Established → Off ) [remote]
Jun 7 09:50:13 memverge kernel: drbd ha-nfs memverge2: Terminating sender thread
Jun 7 09:50:13 memverge kernel: drbd ha-nfs memverge2: Starting sender thread (peer-node-id 28)
Jun 7 09:50:13 memverge kernel: drbd ha-nfs memverge2: Connection closed
Jun 7 09:50:13 memverge kernel: drbd ha-nfs memverge2: helper command: /sbin/drbdadm disconnected
Jun 7 09:50:13 memverge kernel: drbd ha-nfs memverge2: helper command: /sbin/drbdadm disconnected exit code 0
Jun 7 09:50:13 memverge kernel: drbd ha-nfs memverge2: conn( TearDown → Unconnected ) [disconnected]
Jun 7 09:50:13 memverge kernel: drbd ha-nfs memverge2: Restarting receiver thread
Jun 7 09:50:13 memverge kernel: drbd ha-nfs memverge2: conn( Unconnected → Connecting ) [connecting]
Jun 7 09:50:13 memverge pacemaker-attrd[2777]: notice: Setting master-ha-nfs[memverge2] in instance_attributes: 10000 → (unset)
Jun 7 09:50:13 memverge pacemaker-controld[2780]: notice: Requesting local execution of notify operation for ha-nfs on memverge
Jun 7 09:50:13 memverge pacemaker-attrd[2777]: notice: Setting master-ha-nfs[memverge] in instance_attributes: 10000 → 1000
Jun 7 09:50:13 memverge pacemaker-controld[2780]: notice: Result of notify operation for ha-nfs on memverge: ok
Jun 7 09:50:13 memverge pacemaker-controld[2780]: notice: Requesting local execution of notify operation for ha-nfs on memverge
Jun 7 09:50:14 memverge pacemaker-controld[2780]: notice: Result of notify operation for ha-nfs on memverge: ok
Jun 7 09:50:14 memverge pacemaker-controld[2780]: notice: Requesting local execution of promote operation for ha-nfs on memverge
Jun 7 09:50:14 memverge kernel: drbd ha-nfs memverge2: helper command: /sbin/drbdadm fence-peer
Jun 7 09:50:14 memverge crm-fence-peer.9.sh[4985]: DRBD_BACKING_DEV_29=/dev/block_nfs_vg/ha_nfs_internal_lv DRBD_BACKING_DEV_30=/dev/block_nfs_vg/ha_nfs_exports_lv DRBD_CONF=/etc/drbd.conf DRBD_CSTATE=Connecting DRBD_LL_DISK=/dev/block_nfs_vg/ha_nfs_internal_lv\ /dev/block_nfs_vg/ha_nfs_exports_lv DRBD_MINOR=1\ 2 DRBD_MINOR_29=1 DRBD_MINOR_30=2 DRBD_MY_ADDRESS=192.168.0.6 DRBD_MY_AF=ipv4 DRBD_MY_NODE_ID=27 DRBD_NODE_ID_27=memverge DRBD_NODE_ID_28=memverge2 DRBD_PEER_ADDRESS=192.168.0.8 DRBD_PEER_AF=ipv4 DRBD_PEER_NODE_ID=28 DRBD_RESOURCE=ha-nfs DRBD_VOLUME=29\ 30 UP_TO_DATE_NODES=0x08000000 /usr/lib/drbd/crm-fence-peer.9.sh
Jun 7 09:50:14 memverge crm-fence-peer.9.sh[4985]: INFO peers are reachable, my disk is UpToDate UpToDate: placed constraint ‘drbd-fence-by-handler-ha-nfs-ha-nfs-clone’
Jun 7 09:50:14 memverge kernel: drbd ha-nfs memverge2: helper command: /sbin/drbdadm fence-peer exit code 4 (0x400)
Jun 7 09:50:14 memverge kernel: drbd ha-nfs memverge2: fence-peer helper returned 4 (peer was fenced)
Jun 7 09:50:14 memverge kernel: drbd ha-nfs/29 drbd1 memverge2: pdsk( DUnknown → Outdated ) [primary]
Jun 7 09:50:14 memverge kernel: drbd ha-nfs/30 drbd2 memverge2: pdsk( DUnknown → Outdated ) [primary]
Jun 7 09:50:14 memverge kernel: drbd ha-nfs: Preparing cluster-wide state change 700209656: 27->all role( Primary )
Jun 7 09:50:14 memverge kernel: drbd ha-nfs: Committing cluster-wide state change 700209656 (0ms)
Jun 7 09:50:14 memverge kernel: drbd ha-nfs: role( Secondary → Primary ) [primary]
Resource ha-iscsi: it looks like Pacemaker waits until node memverge2 has booted again, and then promotes the resource there:
Jun 7 09:50:13 memverge pacemaker-controld[2780]: notice: Requesting local execution of notify operation for ha-iscsi on memverge
Jun 7 09:50:13 memverge pacemaker-controld[2780]: notice: Result of notify operation for ha-iscsi on memverge: ok
Jun 7 09:50:13 memverge kernel: drbd ha-iscsi: Preparing remote state change 155406647: 28->all role( Secondary )
Jun 7 09:50:13 memverge kernel: drbd ha-iscsi memverge2: Committing remote state change 155406647 (primary_nodes=0)
Jun 7 09:50:13 memverge kernel: drbd ha-iscsi memverge2: peer( Primary → Secondary ) [remote]
Jun 7 09:50:13 memverge kernel: drbd ha-iscsi/31 drbd3: Enabling local AL-updates
Jun 7 09:50:13 memverge pacemaker-controld[2780]: notice: Requesting local execution of notify operation for ha-iscsi on memverge
Jun 7 09:50:13 memverge pacemaker-controld[2780]: notice: Result of notify operation for ha-iscsi on memverge: ok
Jun 7 09:50:13 memverge pacemaker-controld[2780]: notice: Requesting local execution of notify operation for ha-iscsi on memverge
Jun 7 09:50:13 memverge pacemaker-controld[2780]: notice: Result of notify operation for ha-iscsi on memverge: ok
Jun 7 09:50:13 memverge kernel: drbd ha-iscsi: Preparing remote state change 2424298786: 28->27 conn( Disconnecting )
Jun 7 09:50:13 memverge kernel: drbd ha-iscsi memverge2: Committing remote state change 2424298786 (primary_nodes=0)
Jun 7 09:50:13 memverge kernel: drbd ha-iscsi memverge2: conn( Connected → TearDown ) peer( Secondary → Unknown ) [remote]
Jun 7 09:50:13 memverge kernel: drbd ha-iscsi/31 drbd3 memverge2: pdsk( UpToDate → DUnknown ) repl( Established → Off ) [remote]
Jun 7 09:50:13 memverge kernel: drbd ha-iscsi memverge2: Terminating sender thread
Jun 7 09:50:13 memverge kernel: drbd ha-iscsi memverge2: Starting sender thread (peer-node-id 28)
Jun 7 09:50:13 memverge kernel: drbd ha-iscsi memverge2: Connection closed
Jun 7 09:50:13 memverge kernel: drbd ha-iscsi memverge2: helper command: /sbin/drbdadm disconnected
Jun 7 09:50:13 memverge kernel: drbd ha-iscsi memverge2: helper command: /sbin/drbdadm disconnected exit code 0
Jun 7 09:50:13 memverge kernel: drbd ha-iscsi memverge2: conn( TearDown → Unconnected ) [disconnected]
Jun 7 09:50:13 memverge kernel: drbd ha-iscsi memverge2: Restarting receiver thread
Jun 7 09:50:13 memverge kernel: drbd ha-iscsi memverge2: conn( Unconnected → Connecting ) [connecting]
Jun 7 09:50:13 memverge pacemaker-attrd[2777]: notice: Setting master-ha-iscsi[memverge2] in instance_attributes: 10000 → (unset)
Jun 7 09:50:13 memverge pacemaker-controld[2780]: notice: Requesting local execution of notify operation for ha-iscsi on memverge
Jun 7 09:50:13 memverge pacemaker-attrd[2777]: notice: Setting master-ha-iscsi[memverge] in instance_attributes: 10000 → 1000
Jun 7 09:50:13 memverge pacemaker-controld[2780]: notice: Result of notify operation for ha-iscsi on memverge: ok
Jun 7 09:53:25 memverge pacemaker-schedulerd[2779]: notice: Actions: Start ha-iscsi:1 ( memverge2 )
Jun 7 09:53:25 memverge pacemaker-controld[2780]: notice: Initiating monitor operation ha-iscsi:1_monitor_0 on memverge2
Jun 7 09:53:25 memverge pacemaker-controld[2780]: notice: Initiating notify operation ha-iscsi_pre_notify_start_0 locally on memverge
Jun 7 09:53:25 memverge pacemaker-controld[2780]: notice: Requesting local execution of notify operation for ha-iscsi on memverge
Jun 7 09:53:25 memverge pacemaker-controld[2780]: notice: Result of notify operation for ha-iscsi on memverge: ok
Jun 7 09:53:25 memverge pacemaker-controld[2780]: notice: Initiating start operation ha-iscsi:1_start_0 on memverge2
Jun 7 09:53:27 memverge kernel: drbd ha-iscsi memverge2: Handshake to peer 28 successful: Agreed network protocol version 122
Jun 7 09:53:27 memverge kernel: drbd ha-iscsi memverge2: Feature flags enabled on protocol level: 0x7f TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES RESYNC_DAGTAG
Jun 7 09:53:27 memverge kernel: drbd ha-iscsi: Preparing cluster-wide state change 1709502368: 27->28 role( Secondary ) conn( Connected )
Jun 7 09:53:27 memverge kernel: drbd ha-iscsi/31 drbd3 memverge2: drbd_sync_handshake:
Jun 7 09:53:27 memverge kernel: drbd ha-iscsi/31 drbd3 memverge2: self A0B1026CDF591CD6:0000000000000000:29A475E429B9542C:5C9280E42D30ABB6 bits:0 flags:120
Jun 7 09:53:27 memverge kernel: drbd ha-iscsi/31 drbd3 memverge2: peer A0B1026CDF591CD6:0000000000000000:66B01940CA59D348:CB1BE80494B0304E bits:0 flags:1020
Jun 7 09:53:27 memverge kernel: drbd ha-iscsi/31 drbd3 memverge2: uuid_compare()=no-sync by rule=lost-quorum
Jun 7 09:53:27 memverge kernel: drbd ha-iscsi: State change 1709502368: primary_nodes=0, weak_nodes=0
Jun 7 09:53:27 memverge kernel: drbd ha-iscsi: Committing cluster-wide state change 1709502368 (14ms)
Jun 7 09:53:27 memverge kernel: drbd ha-iscsi memverge2: conn( Connecting → Connected ) peer( Unknown → Secondary ) [connected]
Jun 7 09:53:27 memverge kernel: drbd ha-iscsi/31 drbd3 memverge2: pdsk( DUnknown → Consistent ) repl( Off → Established ) [connected]
Jun 7 09:53:27 memverge kernel: drbd ha-iscsi/31 drbd3 memverge2: cleared bm UUID and bitmap A0B1026CDF591CD6:0000000000000000:29A475E429B9542C:5C9280E42D30ABB6
Jun 7 09:53:27 memverge kernel: drbd ha-iscsi/31 drbd3 memverge2: pdsk( Consistent → UpToDate ) [peer-state]
Jun 7 09:53:27 memverge kernel: drbd ha-iscsi memverge2: helper command: /sbin/drbdadm unfence-peer
Jun 7 09:53:27 memverge kernel: drbd ha-iscsi memverge2: helper command: /sbin/drbdadm unfence-peer exit code 0
Jun 7 09:53:27 memverge pacemaker-attrd[2777]: notice: Setting master-ha-iscsi[memverge2] in instance_attributes: (unset) → 10000
Jun 7 09:53:27 memverge pacemaker-controld[2780]: notice: Transition 2 aborted by status-28-master-ha-iscsi doing create master-ha-iscsi=10000: Transient attribute change
Jun 7 09:53:27 memverge pacemaker-controld[2780]: notice: Initiating notify operation ha-iscsi_post_notify_start_0 locally on memverge
Jun 7 09:53:27 memverge pacemaker-controld[2780]: notice: Requesting local execution of notify operation for ha-iscsi on memverge
Jun 7 09:53:27 memverge pacemaker-controld[2780]: notice: Initiating notify operation ha-iscsi:1_post_notify_start_0 on memverge2
Jun 7 09:53:27 memverge pacemaker-controld[2780]: notice: Result of notify operation for ha-iscsi on memverge: ok
Jun 7 09:53:49 memverge pacemaker-schedulerd[2779]: notice: Actions: Promote ha-iscsi:0 ( Unpromoted → Promoted memverge2 )
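The master-ha-nfs / master-ha-iscsi promotion scores visible in the attrd messages above can also be inspected directly with the standard Pacemaker tools (nothing here is specific to my setup):

```shell
# Show transient node attributes, including the master-ha-nfs and
# master-ha-iscsi promotion scores set by the DRBD resource agent
crm_mon -1 -A

# Or query one score directly; transient attributes live in the
# "reboot" lifetime
crm_attribute --node memverge --name master-ha-iscsi --query --lifetime reboot
```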
Finally, drbdadm status shows:
[root@memverge anton]# drbdadm status
ha-iscsi role:Secondary
volume:31 disk:UpToDate
memverge2 role:Primary
volume:31 peer-disk:UpToDate
ha-nfs role:Primary
volume:29 disk:UpToDate
volume:30 disk:UpToDate
memverge2 role:Secondary
volume:29 peer-disk:UpToDate
volume:30 peer-disk:UpToDate
As a result, resource ha-iscsi fails, because it cannot be started on the same cluster node as resource ha-nfs.
Any ideas why the ha-nfs and ha-iscsi resources behave differently?
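The ha-nfs log above shows crm-fence-peer.9.sh placing a drbd-fence-by-handler constraint; one thing I can check is whether a matching constraint was ever placed (or left behind) for ha-iscsi. A diagnostic sketch, assuming the constraint IDs follow the same pattern as in the log:

```shell
# List all constraints with their IDs and look for DRBD fencing
# constraints placed by crm-fence-peer.9.sh
pcs constraint --full | grep drbd-fence-by-handler

# Or query the raw CIB for them
cibadmin --query --xpath "//rsc_location[contains(@id,'drbd-fence-by-handler')]"
```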
Anton