There is a two-node, DRBD-based, synchronous-replication active/standby Pacemaker cluster.
Two-level STONITH/fencing: fence_ipmilan plus diskless SBD (hpwdt, /dev/watchdog); a rough sketch of this setup follows the version list below.
DRBD replication links are directly connected.
The two cluster nodes and the qdevice host run Rocky Linux 10.1.
Pacemaker version 3.0.1
Corosync version 3.1.9
DRBD version 9.3.1
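The fencing layer is set up roughly like this (a sketch from memory; BMC addresses and credentials are placeholders, and option names may differ slightly between fence_ipmilan versions):
# IPMI fencing, one device per node (placeholder BMC IPs/credentials):
pcs stonith create ipmi-fence-memverge fence_ipmilan \
    ip=<memverge-bmc-ip> username=<user> password=<pass> lanplus=1 \
    pcmk_host_list=memverge pcmk_delay_base=3s op monitor interval=60s
pcs stonith create ipmi-fence-memverge2 fence_ipmilan \
    ip=<memverge2-bmc-ip> username=<user> password=<pass> lanplus=1 \
    pcmk_host_list=memverge2 op monitor interval=60s
# Diskless SBD as the second level: SBD_WATCHDOG_DEV=/dev/watchdog (hpwdt)
# in /etc/sysconfig/sbd, plus a watchdog timeout (10s here is just an
# example value) so Pacemaker can assume self-fencing:
pcs property set stonith-watchdog-timeout=10s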
There are two stress tests:
Test 1: Corosync quorum via the qdevice host, with the DRBD handlers configured and "fencing resource-and-stonith" set.
With memverge as the active node, running "reboot -f -n" on memverge makes the resources switch to memverge2.
Test 2: DRBD quorum (diskless tiebreaker) and Corosync quorum running together on the qdevice host.
With memverge as the active node, running "reboot -f -n" on memverge does not make the resources switch to memverge2.
The real problem with Test 2 is that DRBD marks the volumes as "Outdated" on the remaining healthy node memverge2 before STONITH/fencing succeeds a few seconds later. Is there a way to recover from "Outdated" on the remaining healthy node memverge2 after memverge has been successfully fenced?
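So far the only manual recovery I have found is to force matters by hand on memverge2 once memverge is confirmed down, along these lines (my own assumption, not a documented procedure; with Pacemaker managing DRBD the resource would likely have to be unmanaged first):
drbdadm status ha-nfs            # all volumes show disk:Outdated
drbdadm primary --force ha-nfs   # force promotion despite Outdated;
                                 # DRBD then treats the local data as UpToDate
I would prefer an automatic/supported way instead of forcing the promotion.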
Even worse: with the DRBD handlers enabled and "fencing resource-only" set, there is still no switch to memverge2. And according to the logs below, it looks like the handlers were never even called…
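To double-check, DRBD logs every handler it runs as a "helper command" line, so I grepped for the fence handlers; only the "disconnected" helper ever appears (see the log further below):
grep -E 'helper command: /sbin/drbdadm (fence-peer|unfence-peer)' /var/log/messages
# no matches on memverge2, while "helper command: /sbin/drbdadm disconnected" is logged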
Below is one of the two DRBD resource configs, from the latest Test 2 run (the "even worse" case above: handlers enabled, "fencing resource-only" set, still no switch to memverge2); the second resource config is identical in all critical sections.
[root@memverge ~]# cat /etc/drbd.d/ha-nfs.res
resource ha-nfs {
    options {
        auto-promote yes;
        quorum majority;
        on-no-quorum suspend-io;
        on-no-data-accessible suspend-io;
        on-suspended-primary-outdated force-secondary;
    }
    handlers {
        fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
        unfence-peer "/usr/lib/drbd/crm-unfence-peer.9.sh";
    }
    disk {
        c-plan-ahead 0;
        resync-rate 3G;
        c-max-rate 4G;
        c-min-rate 2G;
        # max-buffers 8000;
        al-extents 65536;
        c-fill-target 16M;
        # no-disk-flushes;
        # no-md-flushes;
        # no-disk-barrier;
        # no-disk-drain;
    }
    volume 1 {
        device /dev/drbd1;
        disk /dev/mapper/object_block_nfs_vg-ha_nfs_exports_lv_with_vdo_1x8;
        meta-disk internal;
    }
    volume 2 {
        device /dev/drbd2;
        disk /dev/mapper/object_block_nfs_vg-ha_nfs_internal_lv_without_vdo;
        meta-disk internal;
    }
    volume 5 {
        device /dev/drbd5;
        disk /dev/mapper/object_block_nfs_vg-ha_samba_exports_lv_with_vdo_1x8;
        meta-disk internal;
    }
    on memverge {
        address 10.72.14.152:7900;
        node-id 27;
    }
    on memverge2 {
        address 10.72.14.154:7900;
        node-id 28;
    }
    on qdevice {
        address 10.72.14.186:7900;
        node-id 29;
        volume 1 {
            disk none;
        }
        volume 2 {
            disk none;
        }
        volume 5 {
            disk none;
        }
    }
    # connection-mesh {
    #     hosts memverge memverge2 qdevice;
    # }
    net {
        transport tcp;
        protocol C;
        sndbuf-size 64M;
        rcvbuf-size 64M;
        max-buffers 128K;
        max-epoch-size 16K;
        timeout 15;     # 1.5 seconds (must be < Corosync token), active replication (non-idle): time to wait for an expected response packet from the partner
        ping-timeout 5; # 0.5 seconds, no active replication (idle): time to wait for the peer to answer a keep-alive packet
        ping-int 3;     # 3 seconds, no active replication (idle): interval between two keep-alive packets sent to check whether the peer is still alive
        connect-int 3;  # 3 seconds, link failed: retry interval while DRBD keeps trying to reconnect
        fencing resource-only;
    }
    connection {
        host memverge address 192.168.0.6:7900;
        host memverge2 address 192.168.0.8:7900;
        path {
            host memverge address 192.168.0.6:7900;
            host memverge2 address 192.168.0.8:7900;
        }
        path {
            host memverge address 1.1.1.6:7900;
            host memverge2 address 1.1.1.8:7900;
        }
    }
    connection {
        host memverge address 10.72.14.152:7900;
        host qdevice address 10.72.14.186:7900;
        path {
            host memverge address 10.72.14.152:7900;
            host qdevice address 10.72.14.186:7900;
        }
    }
    connection {
        host memverge2 address 10.72.14.154:7900;
        host qdevice address 10.72.14.186:7900;
        path {
            host memverge2 address 10.72.14.154:7900;
            host qdevice address 10.72.14.186:7900;
        }
    }
}
[root@memverge ~]#
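One more check I used: as far as I understand crm-fence-peer.9.sh, when it fires it should leave a location constraint in the CIB, so its absence can be tested like this (the drbd-fence-by-handler- id prefix is my assumption from reading the script):
cibadmin --query --scope constraints | grep -i drbd-fence-by-handler
# assumption: the script tags its constraints with the drbd-fence-by-handler- id prefix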
And here is the log from the remaining healthy node memverge2:
[root@memverge2 ~]# cat /var/log/messages
Mar 15 10:22:01 memverge2 corosync[760961]: [KNET ] link: host: 27 link: 1 is down
Mar 15 10:22:01 memverge2 corosync[760961]: [KNET ] link: host: 27 link: 2 is down
Mar 15 10:22:01 memverge2 corosync[760961]: [KNET ] host: host: 27 (passive) best link: 0 (pri: 2)
Mar 15 10:22:01 memverge2 corosync[760961]: [KNET ] host: host: 27 (passive) best link: 0 (pri: 2)
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs memverge: PingAck did not arrive in time.
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs/1 drbd1: Would lose quorum, but using tiebreaker logic to keep
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs/2 drbd2: Would lose quorum, but using tiebreaker logic to keep
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs/5 drbd5: Would lose quorum, but using tiebreaker logic to keep
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs memverge: conn( Connected -> NetworkFailure ) peer( Primary -> Unknown )
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs/1 drbd1: disk( UpToDate -> Consistent )
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs/1 drbd1 memverge: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs/2 drbd2: disk( UpToDate -> Consistent )
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs/2 drbd2 memverge: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs/5 drbd5: disk( UpToDate -> Consistent )
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs/5 drbd5 memverge: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs/1 drbd1: Enabling local AL-updates
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs/2 drbd2: Enabling local AL-updates
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs/5 drbd5: Enabling local AL-updates
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs memverge: Terminating sender thread
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs memverge: Starting sender thread (peer-node-id 27)
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs: Preparing remote state change 3384123242: 27->all empty
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs memverge: Connection closed
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs memverge: helper command: /sbin/drbdadm disconnected
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs memverge: helper command: /sbin/drbdadm disconnected exit code 0
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs memverge: conn( NetworkFailure -> Unconnected ) [disconnected]
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs memverge: Restarting receiver thread
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs memverge: conn( Unconnected -> Connecting ) [connecting]
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs qdevice: Committing remote state change 3384123242 (primary_nodes=8000000)
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs/1 drbd1: disk( Consistent -> Outdated ) [far-away]
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs/2 drbd2: disk( Consistent -> Outdated ) [far-away]
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs/5 drbd5: disk( Consistent -> Outdated ) [far-away]
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs: Preparing cluster-wide state change 1847087918: 28->all empty
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs: State change 1847087918: primary_nodes=8000000, weak_nodes=FFFFFFFFD7FFFFFF
Mar 15 10:22:01 memverge2 kernel: drbd ha-nfs: Committing cluster-wide state change 1847087918 (26ms)
Mar 15 10:22:02 memverge2 kernel: drbd ha-iscsi memverge: PingAck did not arrive in time.
Mar 15 10:22:02 memverge2 kernel: drbd ha-iscsi/3 drbd3: Would lose quorum, but using tiebreaker logic to keep
Mar 15 10:22:02 memverge2 kernel: drbd ha-iscsi/4 drbd4: Would lose quorum, but using tiebreaker logic to keep
Mar 15 10:22:02 memverge2 kernel: drbd ha-iscsi memverge: conn( Connected -> NetworkFailure ) peer( Primary -> Unknown )
Mar 15 10:22:02 memverge2 kernel: drbd ha-iscsi/3 drbd3: disk( UpToDate -> Consistent )
Mar 15 10:22:02 memverge2 kernel: drbd ha-iscsi/3 drbd3 memverge: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Mar 15 10:22:02 memverge2 kernel: drbd ha-iscsi/4 drbd4: disk( UpToDate -> Consistent )
Mar 15 10:22:02 memverge2 kernel: drbd ha-iscsi/4 drbd4 memverge: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Mar 15 10:22:02 memverge2 kernel: drbd ha-iscsi/3 drbd3: Enabling local AL-updates
Mar 15 10:22:02 memverge2 kernel: drbd ha-iscsi/4 drbd4: Enabling local AL-updates
Mar 15 10:22:02 memverge2 kernel: drbd ha-iscsi memverge: Terminating sender thread
Mar 15 10:22:02 memverge2 kernel: drbd ha-iscsi memverge: Starting sender thread (peer-node-id 27)
Mar 15 10:22:02 memverge2 kernel: drbd ha-iscsi memverge: Connection closed
Mar 15 10:22:02 memverge2 kernel: drbd ha-iscsi memverge: helper command: /sbin/drbdadm disconnected
Mar 15 10:22:02 memverge2 kernel: drbd ha-iscsi memverge: helper command: /sbin/drbdadm disconnected exit code 0
Mar 15 10:22:02 memverge2 kernel: drbd ha-iscsi memverge: conn( NetworkFailure -> Unconnected ) [disconnected]
Mar 15 10:22:02 memverge2 kernel: drbd ha-iscsi memverge: Restarting receiver thread
Mar 15 10:22:02 memverge2 kernel: drbd ha-iscsi memverge: conn( Unconnected -> Connecting ) [connecting]
Mar 15 10:22:02 memverge2 kernel: drbd ha-iscsi: Preparing remote state change 1224727545: 27->all empty
Mar 15 10:22:02 memverge2 kernel: drbd ha-iscsi qdevice: Committing remote state change 1224727545 (primary_nodes=8000000)
Mar 15 10:22:02 memverge2 kernel: drbd ha-iscsi/3 drbd3: disk( Consistent -> Outdated ) [far-away]
Mar 15 10:22:02 memverge2 kernel: drbd ha-iscsi/4 drbd4: disk( Consistent -> Outdated ) [far-away]
Mar 15 10:22:02 memverge2 kernel: drbd ha-iscsi: Preparing cluster-wide state change 4009752090: 28->all empty
Mar 15 10:22:02 memverge2 kernel: drbd ha-iscsi: State change 4009752090: primary_nodes=8000000, weak_nodes=FFFFFFFFD7FFFFFF
Mar 15 10:22:02 memverge2 kernel: drbd ha-iscsi: Committing cluster-wide state change 4009752090 (35ms)
Mar 15 10:22:04 memverge2 pacemaker-attrd[760992]: notice: Setting master-ha-iscsi[memverge2] in instance_attributes: 10000 -> (unset)
Mar 15 10:22:04 memverge2 pacemaker-attrd[760992]: notice: Setting master-ha-nfs[memverge2] in instance_attributes: 10000 -> (unset)
Mar 15 10:22:18 memverge2 kernel: mlx5_core 0000:d8:00.0 ens7f0np0: Link down
Mar 15 10:22:18 memverge2 kernel: mlx5_core 0000:d8:00.0 mlx5_0: Port: 1 Link DOWN
Mar 15 10:22:18 memverge2 kernel: mlx5_core 0000:d8:00.1 ens7f1np1: Link down
Mar 15 10:22:19 memverge2 kernel: mlx5_core 0000:d8:00.1 mlx5_1: Port: 1 Link DOWN
Mar 15 10:22:20 memverge2 kernel: drbd ha-nfs: Preparing remote state change 1819376490: 29->all empty
Mar 15 10:22:20 memverge2 kernel: drbd ha-nfs qdevice: Committing remote state change 1819376490 (primary_nodes=0)
Mar 15 10:22:20 memverge2 kernel: drbd ha-iscsi: Preparing remote state change 1736200617: 29->all empty
Mar 15 10:22:20 memverge2 kernel: drbd ha-iscsi qdevice: Aborting remote state change 1736200617
Mar 15 10:22:20 memverge2 kernel: drbd ha-iscsi: Preparing remote state change 2304927656: 29->all empty
Mar 15 10:22:20 memverge2 kernel: drbd ha-iscsi qdevice: Committing remote state change 2304927656 (primary_nodes=0)
Mar 15 10:22:21 memverge2 corosync[760961]: [KNET ] link: host: 27 link: 0 is down
Mar 15 10:22:21 memverge2 corosync[760961]: [KNET ] host: host: 27 (passive) best link: 0 (pri: 2)
Mar 15 10:22:21 memverge2 corosync[760961]: [KNET ] host: host: 27 has no active links
Mar 15 10:22:22 memverge2 corosync[760961]: [TOTEM ] Token has not been received in 5250 ms
Mar 15 10:22:24 memverge2 corosync[760961]: [TOTEM ] A processor failed, forming new configuration: token timed out (7000ms), waiting 8400ms for consensus.
Mar 15 10:22:26 memverge2 kernel: mlx5_core 0000:d8:00.0 ens7f0np0: Link up
Mar 15 10:22:26 memverge2 NetworkManager[1828]: <info> [1773562946.3024] device (ens7f0np0): carrier: link connected
Mar 15 10:22:26 memverge2 kernel: mlx5_core 0000:d8:00.0 mlx5_0: Port: 1 Link ACTIVE
Mar 15 10:22:26 memverge2 kernel: mlx5_core 0000:d8:00.1 ens7f1np1: Link up
Mar 15 10:22:26 memverge2 NetworkManager[1828]: <info> [1773562946.5014] device (ens7f1np1): carrier: link connected
Mar 15 10:22:26 memverge2 kernel: mlx5_core 0000:d8:00.1 mlx5_1: Port: 1 Link ACTIVE
Mar 15 10:22:33 memverge2 corosync[760961]: [QUORUM] Sync members[1]: 28
Mar 15 10:22:33 memverge2 corosync[760961]: [QUORUM] Sync left[1]: 27
Mar 15 10:22:33 memverge2 corosync[760961]: [VOTEQ ] waiting for quorum device Qdevice poll (but maximum for 10000 ms)
Mar 15 10:22:33 memverge2 corosync[760961]: [TOTEM ] A new membership (1c.4f93) was formed. Members left: 27
Mar 15 10:22:33 memverge2 corosync[760961]: [TOTEM ] Failed to receive the leave message. failed: 27
Mar 15 10:22:33 memverge2 pacemaker-controld[760994]: notice: Our peer on the DC (memverge) is dead
Mar 15 10:22:33 memverge2 pacemaker-attrd[760992]: notice: Lost attribute writer memverge
Mar 15 10:22:33 memverge2 pacemaker-controld[760994]: notice: State transition S_NOT_DC -> S_ELECTION
Mar 15 10:22:33 memverge2 pacemaker-based[760989]: notice: Node memverge state is now lost
Mar 15 10:22:33 memverge2 pacemaker-based[760989]: notice: Removed 1 inactive node with cluster layer ID 27 from the membership cache
Mar 15 10:22:33 memverge2 pacemaker-attrd[760992]: notice: Node memverge state is now lost
Mar 15 10:22:33 memverge2 pacemaker-attrd[760992]: notice: Removing all memverge attributes for node loss
Mar 15 10:22:33 memverge2 pacemaker-attrd[760992]: notice: Removed 1 inactive node with cluster layer ID 27 from the membership cache
Mar 15 10:22:33 memverge2 pacemaker-fenced[760990]: notice: Node memverge state is now lost
Mar 15 10:22:33 memverge2 pacemaker-fenced[760990]: notice: Removed 1 inactive node with cluster layer ID 27 from the membership cache
Mar 15 10:22:34 memverge2 corosync[760961]: [QUORUM] Members[1]: 28
Mar 15 10:22:34 memverge2 corosync[760961]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 15 10:22:34 memverge2 pacemaker-controld[760994]: notice: Node memverge state is now lost
Mar 15 10:22:34 memverge2 pacemaker-controld[760994]: notice: State transition S_ELECTION -> S_INTEGRATION
Mar 15 10:22:34 memverge2 pacemaker-controld[760994]: notice: Finalizing join-1 for 1 node (sync'ing CIB 0.6307.198 with schema pacemaker-4.0 and feature set 3.20.1 from memverge2)
Mar 15 10:22:34 memverge2 pacemaker-attrd[760992]: notice: Recorded local node as attribute writer (was unset)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: Cluster node memverge will be fenced: peer is no longer part of the cluster
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: memverge is unclean
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: pb_nfs_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: pb_nfs_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: ip0_nfs_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: ip0_nfs_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: fs_nfs_internal_info_HA_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: fs_nfs_internal_info_HA_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: fs_nfsshare_exports_HA_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: fs_nfsshare_exports_HA_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: nfsserver_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: nfsserver_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: expfs_nfsshare_exports_HA_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: expfs_nfsshare_exports_HA_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: samba_service_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: samba_service_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: fs_sambashare_exports_HA_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: fs_sambashare_exports_HA_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: punb_nfs_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: punb_nfs_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: pb_iscsi_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: pb_iscsi_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: ip0_iscsi_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: ip0_iscsi_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: ip1_iscsi_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: ip1_iscsi_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: iscsi_target_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: iscsi_target_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: iscsi_lun_drbd3_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: iscsi_lun_drbd3_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: iscsi_lun_drbd4_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: iscsi_lun_drbd4_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: punb_iscsi_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: punb_iscsi_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: ha-nfs:0_demote_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: ha-nfs:0_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: ha-nfs:0_demote_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: ha-nfs:0_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: ha-nfs:0_demote_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: ha-nfs:0_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: ha-nfs:0_demote_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: ha-nfs:0_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: ha-iscsi:0_demote_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: ha-iscsi:0_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: ha-iscsi:0_demote_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: ha-iscsi:0_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: ha-iscsi:0_demote_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: ha-iscsi:0_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: ha-iscsi:0_demote_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: ha-iscsi:0_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: ipmi-fence-memverge2_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: ipmi-fence-memverge2_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: watchdog_stop_0 on memverge is unrunnable (node is offline)
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: Scheduling node memverge for fencing
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: notice: Actions: Fence (reboot) memverge 'peer is no longer part of the cluster'
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: notice: Actions: Stop pb_nfs ( memverge ) due to node availability
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: notice: Actions: Stop ip0_nfs ( memverge ) due to node availability
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: notice: Actions: Stop fs_nfs_internal_info_HA ( memverge ) due to node availability
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: notice: Actions: Stop fs_nfsshare_exports_HA ( memverge ) due to node availability
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: notice: Actions: Stop nfsserver ( memverge ) due to node availability
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: notice: Actions: Stop expfs_nfsshare_exports_HA ( memverge ) due to node availability
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: notice: Actions: Stop samba_service ( memverge ) due to node availability
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: notice: Actions: Stop fs_sambashare_exports_HA ( memverge ) due to node availability
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: notice: Actions: Stop punb_nfs ( memverge ) due to node availability
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: notice: Actions: Stop pb_iscsi ( memverge ) due to node availability
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: notice: Actions: Stop ip0_iscsi ( memverge ) due to node availability
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: notice: Actions: Stop ip1_iscsi ( memverge ) due to node availability
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: notice: Actions: Stop iscsi_target ( memverge ) due to node availability
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: notice: Actions: Stop iscsi_lun_drbd3 ( memverge ) due to node availability
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: notice: Actions: Stop iscsi_lun_drbd4 ( memverge ) due to node availability
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: notice: Actions: Stop punb_iscsi ( memverge ) due to node availability
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: notice: Actions: Stop ha-nfs:0 ( Promoted memverge ) due to node availability
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: notice: Actions: Stop ha-iscsi:0 ( Promoted memverge ) due to node availability
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: notice: Actions: Stop ipmi-fence-memverge2 ( memverge ) due to node availability
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: notice: Actions: Move watchdog ( memverge -> memverge2 )
Mar 15 10:22:34 memverge2 pacemaker-schedulerd[760993]: warning: Calculated transition 1 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-3838.bz2
Mar 15 10:22:34 memverge2 pacemaker-controld[760994]: notice: Requesting fencing (reboot) targeting node memverge
Mar 15 10:22:34 memverge2 pacemaker-controld[760994]: notice: Requesting local execution of notify operation for ha-nfs on memverge2
Mar 15 10:22:34 memverge2 pacemaker-fenced[760990]: notice: Client pacemaker-controld.760994 wants to fence (reboot) memverge using any device
Mar 15 10:22:34 memverge2 pacemaker-fenced[760990]: notice: Requesting peer fencing (reboot) targeting memverge
Mar 15 10:22:34 memverge2 pacemaker-controld[760994]: notice: Requesting local execution of notify operation for ha-iscsi on memverge2
Mar 15 10:22:34 memverge2 pacemaker-fenced[760990]: notice: Requesting that memverge2 perform 'reboot' action targeting memverge using ipmi-fence-memverge
Mar 15 10:22:34 memverge2 pacemaker-fenced[760990]: notice: Delaying 'reboot' action targeting memverge using ipmi-fence-memverge for 3s
Mar 15 10:22:34 memverge2 pacemaker-controld[760994]: notice: Requesting local execution of start operation for watchdog on memverge2
Mar 15 10:22:34 memverge2 pacemaker-controld[760994]: notice: Result of notify operation for ha-nfs on memverge2: OK
Mar 15 10:22:34 memverge2 pacemaker-controld[760994]: notice: Result of notify operation for ha-iscsi on memverge2: OK
Mar 15 10:22:34 memverge2 pacemaker-controld[760994]: notice: Result of start operation for watchdog on memverge2: OK
Mar 15 10:22:34 memverge2 pacemaker-controld[760994]: notice: Requesting local execution of monitor operation for watchdog on memverge2
Mar 15 10:22:34 memverge2 pacemaker-controld[760994]: notice: Result of monitor operation for watchdog on memverge2: OK
Mar 15 10:22:43 memverge2 kernel: mlx5_core 0000:d8:00.0 ens7f0np0: Link down
Mar 15 10:22:43 memverge2 kernel: mlx5_core 0000:d8:00.0 mlx5_0: Port: 1 Link DOWN
Mar 15 10:22:43 memverge2 kernel: mlx5_core 0000:d8:00.1 ens7f1np1: Link down
Mar 15 10:22:44 memverge2 kernel: mlx5_core 0000:d8:00.1 mlx5_1: Port: 1 Link DOWN
Mar 15 10:22:55 memverge2 pacemaker-fenced[760990]: notice: Operation 'reboot' [763039] targeting memverge using ipmi-fence-memverge returned 0
Mar 15 10:22:55 memverge2 pacemaker-fenced[760990]: notice: Action 'reboot' targeting memverge using ipmi-fence-memverge on behalf of pacemaker-controld.760994@memverge2: Done
Mar 15 10:22:55 memverge2 pacemaker-fenced[760990]: notice: Operation 'reboot' targeting memverge by memverge2 for pacemaker-controld.760994@memverge2: OK (Done)
Mar 15 10:22:55 memverge2 pacemaker-controld[760994]: notice: Peer memverge was terminated (reboot) by memverge2 on behalf of pacemaker-controld.760994@memverge2: OK
Mar 15 10:22:55 memverge2 pacemaker-attrd[760992]: notice: Removing all memverge attributes for node memverge2
Mar 15 10:22:55 memverge2 pacemaker-controld[760994]: notice: Requesting local execution of notify operation for ha-nfs on memverge2
Mar 15 10:22:55 memverge2 pacemaker-attrd[760992]: notice: Removing all memverge attributes for node memverge2
Mar 15 10:22:55 memverge2 pacemaker-controld[760994]: notice: Requesting local execution of notify operation for ha-iscsi on memverge2
Mar 15 10:22:55 memverge2 pacemaker-controld[760994]: notice: Result of notify operation for ha-nfs on memverge2: OK
Mar 15 10:22:55 memverge2 pacemaker-controld[760994]: notice: Requesting local execution of notify operation for ha-nfs on memverge2
Mar 15 10:22:55 memverge2 pacemaker-controld[760994]: notice: Result of notify operation for ha-iscsi on memverge2: OK
Mar 15 10:22:55 memverge2 pacemaker-controld[760994]: notice: Requesting local execution of notify operation for ha-iscsi on memverge2
Mar 15 10:22:55 memverge2 pacemaker-controld[760994]: notice: Result of notify operation for ha-nfs on memverge2: OK
Mar 15 10:22:55 memverge2 pacemaker-controld[760994]: notice: Requesting local execution of notify operation for ha-nfs on memverge2
Mar 15 10:22:55 memverge2 pacemaker-controld[760994]: notice: Result of notify operation for ha-iscsi on memverge2: OK
Mar 15 10:22:55 memverge2 pacemaker-controld[760994]: notice: Requesting local execution of notify operation for ha-iscsi on memverge2
Mar 15 10:22:55 memverge2 pacemaker-controld[760994]: notice: Result of notify operation for ha-nfs on memverge2: OK
Mar 15 10:22:55 memverge2 pacemaker-controld[760994]: notice: Result of notify operation for ha-iscsi on memverge2: OK
Mar 15 10:22:55 memverge2 pacemaker-controld[760994]: notice: Transition 1 (Complete=65, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-3838.bz2): Complete
Mar 15 10:22:55 memverge2 pacemaker-controld[760994]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
Mar 15 10:23:01 memverge2 kernel: mlx5_core 0000:d8:00.1 ens7f1np1: Link up
Mar 15 10:23:01 memverge2 NetworkManager[1828]: <info> [1773562981.0393] device (ens7f1np1): carrier: link connected
Mar 15 10:23:01 memverge2 kernel: mlx5_core 0000:d8:00.1 mlx5_1: Port: 1 Link ACTIVE
Mar 15 10:23:01 memverge2 NetworkManager[1828]: <info> [1773562981.2379] device (ens7f0np0): carrier: link connected
Mar 15 10:23:01 memverge2 kernel: mlx5_core 0000:d8:00.0 ens7f0np0: Link up
Mar 15 10:23:01 memverge2 kernel: mlx5_core 0000:d8:00.0 mlx5_0: Port: 1 Link ACTIVE
[root@memverge2 ~]#
[root@memverge2 ~]# drbdadm status
ha-iscsi role:Secondary
  volume:3 disk:Outdated open:no
  volume:4 disk:Outdated open:no
  memverge connection:Connecting
  qdevice role:Secondary
    volume:3 peer-disk:Diskless
    volume:4 peer-disk:Diskless

ha-nfs role:Secondary
  volume:1 disk:Outdated open:no
  volume:2 disk:Outdated open:no
  volume:5 disk:Outdated open:no
  memverge connection:Connecting
  qdevice role:Secondary
    volume:1 peer-disk:Diskless
    volume:2 peer-disk:Diskless
    volume:5 peer-disk:Diskless
[root@memverge2 ~]#