Testing Pacemaker/DRBD failover on a two-node active-standby cluster (Rocky 9.5 / Pacemaker 2.1.8 / DRBD 9.2.12)

Hi

I’m not sure whether this is really a DRBD question or more of a Pacemaker one, but maybe someone has already seen this behaviour…

There are two services (ha-nfs and ha-iscsi) colocated so that they always run on the same node.
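
(For context, the colocation is done with pcs along these lines; the resource names below are just placeholders, not the exact ones from my cluster:)

    # keep the two service groups on the same node (placeholder names)
    pcs constraint colocation add nfs-group with iscsi-group INFINITY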

If I put the active node into standby:

"pcs node standby memverge"

two additional records appear in the CIB file:

<expression attribute="#uname" operation="ne" value="memverge2" id="drbd-fence-by-handler-ha-iscsi-expr-28-ha-iscsi-clone"/>

<expression attribute="#uname" operation="ne" value="memverge2" id="drbd-fence-by-handler-ha-nfs-expr-28-ha-nfs-clone"/>

and all services successfully started on node memverge2.

But if I bring node memverge back online:

"pcs node unstandby memverge"

there is still one record left in the CIB file:

<expression attribute="#uname" operation="ne" value="memverge2" id="drbd-fence-by-handler-ha-iscsi-expr-28-ha-iscsi-clone"/>

and this record does not allow the ha-iscsi service to fail over to node memverge after the command "pcs node standby memverge2".

Anton

I suspect what is happening here is that the ha-iscsi-clone might simply be taking longer to resync. Can you check the drbdadm status during this test and see if the resync has completed?

With resource-level fencing like below:

        net {
                fencing resource-only;
        }
        handlers {
                fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
                after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";
        }

The fence-peer handler fences the peer upon a disconnect and places the Pacemaker constraint you’ve observed. It is then the job of the after-resync-target handler to clear that location constraint; this handler is invoked once a resync completes. If one constraint is being cleared but not the other, then I suspect that one resource hasn’t completed its sync.
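
If you want to double-check which handler actually ran and whether the constraint is still in place, something along these lines should do (rough sketch, adjust paths to your logging setup):

    # the kernel logs every helper invocation, so the handlers show up here
    grep -E 'helper command: .*(fence-peer|after-resync-target)' /var/log/messages

    # list all constraints with their ids and look for the fencing one
    pcs constraint --full | grep drbd-fence-by-handler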

Hello

ha-iscsi has the same size as ha-nfs:

ha-iscsi - 1 TByte size
ha-nfs (LVM compressed and deduplicated) - 1 TByte physical size, 4 TByte logical size

There is no heavy I/O activity on either ha-nfs or ha-iscsi; both resources are mostly idle at the moment.

Both resources have

handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";
}

net {
    load-balance-paths yes;
    transport tcp;
    protocol  C;
    sndbuf-size 10M;
    rcvbuf-size 10M;
    max-buffers 80K;
    max-epoch-size 20000;
    timeout 90;
    ping-timeout 10;
    ping-int 15;
    connect-int 15;
    fencing resource-and-stonith;
}
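
(If needed, the full effective configuration for both resources can also be double-checked with drbdadm dump:)

    drbdadm dump ha-nfs
    drbdadm dump ha-iscsi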

Can you check the drbdadm status during this test and see if the resync has completed?

Yes, it looks like your suspicion is correct.

Here is the "drbdadm status" output after bringing the previously standby node back online:

ha-iscsi role:Primary
  volume:31 disk:UpToDate
  memverge connection:Connecting

ha-nfs role:Primary
  volume:29 disk:UpToDate
  volume:30 disk:UpToDate
  memverge role:Secondary
    volume:29 peer-disk:UpToDate
    volume:30 peer-disk:UpToDate

and only 5-6 seconds later ha-iscsi is connected and fully UpToDate as well.

So now the question is: what should be done about this?

Anton

Assuming that after those 5-6 seconds, when the ha-iscsi resource does finally connect, the location constraint is then cleared in Pacemaker via crm-unfence-peer.9.sh, everything would be working as expected.

What I would advise is to simply wait those 5-6 seconds. :smile:
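
If you would rather script that wait than eyeball it, a small loop along these lines should work (just a sketch, assuming your single-peer setup and the resource name ha-iscsi):

    # block until the ha-iscsi peer disk reports UpToDate again
    until drbdadm status ha-iscsi | grep -q 'peer-disk:UpToDate'; do
        sleep 1
    done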

No.

crm-unfence-peer.9.sh runs while drbdadm is still in this status:

ha-iscsi role:Primary
  volume:31 disk:UpToDate
  memverge connection:Connecting

ha-nfs role:Primary
  volume:29 disk:UpToDate
  volume:30 disk:UpToDate
  memverge role:Secondary
    volume:29 peer-disk:UpToDate
    volume:30 peer-disk:UpToDate

And that is the reason why we see the following record in the CIB file:

<expression attribute="#uname" operation="ne" value="memverge2" id="drbd-fence-by-handler-ha-iscsi-expr-28-ha-iscsi-clone"/>

After that, no matter how long I wait, this record will stay in the CIB file:

<expression attribute="#uname" operation="ne" value="memverge2" id="drbd-fence-by-handler-ha-iscsi-expr-28-ha-iscsi-clone"/>

until I manually remove the record at some point.
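
(For the record, the manual cleanup I do is roughly this; the exact constraint id has to be taken from the pcs output, the one below is only an example:)

    # find the leftover fencing constraint and its id
    pcs constraint --full | grep drbd-fence-by-handler-ha-iscsi

    # remove it by id (example id; use the one reported above)
    pcs constraint remove drbd-fence-by-handler-ha-iscsi-ha-iscsi-clone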

Anton

Does the resource ever leave the Connecting state? Until the ha-iscsi resource connects to the peer the constraint should remain. That is expected.

Does the resource ever leave the Connecting state?

Yes. But the problem is that the crm-unfence-peer.9.sh script has already run by that time. That is why we no longer see a constraint for the ha-nfs resource but still see the constraint for ha-iscsi in the CIB.

This makes it impossible to keep moving the ha-iscsi resource between nodes without manually editing the CIB file.

It has never run for the ha-iscsi resource, only for the ha-nfs resource. The ha-iscsi resource has never connected to the peer, thus it has never completed a resync, and thus it has never triggered/executed the after-resync-target handler.

You must determine why the ha-iscsi resource is left in the Connecting state trying to reach the memverge peer.
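
A few things worth checking while it sits in the Connecting state (rough sketch):

    # recent DRBD kernel messages for the ha-iscsi resource
    journalctl -k | grep 'drbd ha-iscsi' | tail -n 50

    # detailed per-connection state and statistics
    drbdsetup status ha-iscsi --verbose --statistics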

I checked /var/log/messages.

First, the ha-nfs resource (where everything is OK):

Mar 28 15:48:02 memverge kernel: drbd ha-nfs/30 drbd2 memverge2: uuid_compare()=target-use-bitmap by rule=bitmap-peer
Mar 28 15:48:02 memverge kernel: drbd ha-nfs/30 drbd2: disk( Consistent → Outdated ) [connected]
Mar 28 15:48:02 memverge kernel: drbd ha-nfs/30 drbd2 memverge2: pdsk( DUnknown → UpToDate ) repl( Off → WFBitMapT ) [connected]
Mar 28 15:48:02 memverge kernel: drbd ha-nfs/30 drbd2: Disabling local AL-updates (optimization)
Mar 28 15:48:02 memverge kernel: drbd ha-nfs/30 drbd2: Setting exposed data uuid: 5F2B9C748BC8CB4A
Mar 28 15:48:02 memverge kernel: drbd ha-nfs/30 drbd2 memverge2: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 31(1), total 31; compression: 100.0%
Mar 28 15:48:02 memverge kernel: drbd ha-nfs/30 drbd2 memverge2: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 31(1), total 31; compression: 100.0%
Mar 28 15:48:02 memverge kernel: drbd ha-nfs/30 drbd2 memverge2: helper command: /sbin/drbdadm before-resync-target
Mar 28 15:48:02 memverge kernel: drbd ha-nfs/30 drbd2 memverge2: helper command: /sbin/drbdadm before-resync-target exit code 0

and now the ha-iscsi resource:

Mar 28 15:47:50 memverge kernel: drbd ha-iscsi/31 drbd3 memverge2: uuid_compare()=no-sync by rule=lost-quorum
Mar 28 15:47:50 memverge kernel: drbd ha-iscsi/31 drbd3: disk( Consistent → UpToDate ) [connected]
Mar 28 15:47:50 memverge kernel: drbd ha-iscsi/31 drbd3 memverge2: pdsk( DUnknown → UpToDate ) repl( Off → Established ) [connected]
Mar 28 15:47:50 memverge kernel: drbd ha-iscsi/31 drbd3: Disabling local AL-updates (optimization)
Mar 28 15:47:50 memverge kernel: drbd ha-iscsi/31 drbd3: Setting exposed data uuid: 207B5A47B5CA6B3E
Mar 28 15:47:50 memverge kernel: drbd ha-iscsi/31 drbd3 memverge2: cleared bm UUID and bitmap 207B5A47B5CA6B3E:0000000000000000:79DBD5BC8D44D194:058C9509E2B518CE

So there is a difference in uuid_compare()=, but I can’t understand where the difference comes from.

Anton

Ah hah!
So uuid_compare()= is where DRBD compares the generation identifier UUIDs, as discussed in the User’s Guide here: https://linbit.com/drbd-user-guide/drbd-guide-9_0-en/#s-gi-tuple

Note the "no-sync" for the ha-iscsi resource. When the ha-iscsi resource connects, the peers compare UUIDs and determine that no data has changed while they were disconnected, so they connect without the need for any resync. If no resync occurs at all, then no resync ever finishes, and we never trigger the after-resync-target handler.
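
If you want to see what DRBD is comparing, the uuid_compare lines in the kernel log show which rule matched, and drbdadm can print the current generation identifiers directly (note that on DRBD 9 show-gi may want a peer/volume qualifier):

    # which rule matched at connect time
    journalctl -k | grep uuid_compare

    # dump the current generation identifier tuple for the resource
    drbdadm show-gi ha-iscsi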

Try your test again, but this time when the one node is in standby, write something to the iSCSI device.

Hello

Try your test again, but this time when the one node is in standby, write something to the iSCSI device.

Yes, that works.

[root@memverge anton]# drbdadm status
ha-iscsi role:Primary
  volume:31 disk:UpToDate
  memverge2 role:Secondary
    volume:31 peer-disk:UpToDate

ha-nfs role:Primary
  volume:29 disk:UpToDate
  volume:30 disk:UpToDate
  memverge2 role:Secondary
    volume:29 peer-disk:UpToDate
    volume:30 peer-disk:UpToDate

[root@memverge anton]#
[root@memverge anton]# pcs node standby memverge
[root@memverge anton]#
[root@memverge anton]# drbdadm status

No currently configured DRBD found.

[root@memverge anton]#
[root@memverge anton]# pcs cluster cib cib.txt
[root@memverge anton]# cat cib.txt |grep uname
<node id="27" uname="memverge">
<node id="28" uname="memverge2">
<expression attribute="#uname" operation="ne" value="memverge2" id="drbd-fence-by-handler-ha-iscsi-expr-28-ha-iscsi-clone"/>
<expression attribute="#uname" operation="ne" value="memverge2" id="drbd-fence-by-handler-ha-nfs-expr-28-ha-nfs-clone"/>
<node_state id="28" uname="memverge2" in_ccm="1743401842" crmd="1743401842" crm-debug-origin="controld_update_resource_history" join="member" expected="member">
<node_state id="27" uname="memverge" in_ccm="1743166191" crmd="1743166191" crm-debug-origin="controld_update_resource_history" join="member" expected="member">
[root@memverge anton]#

[root@memverge2 ~]# drbdadm status
ha-iscsi role:Primary
  volume:31 disk:UpToDate
  memverge connection:Connecting

ha-nfs role:Primary
  volume:29 disk:UpToDate
  volume:30 disk:UpToDate
  memverge connection:Connecting

[root@memverge2 ~]# dd if=/dev/urandom of=/dev/drbd3 bs=4k count=100000
100000+0 records in
100000+0 records out
409600000 bytes (410 MB, 391 MiB) copied, 0.934165 s, 438 MB/s
[root@memverge2 ~]#

[root@memverge anton]# pcs node unstandby memverge
[root@memverge anton]#

Mar 31 09:32:15 memverge kernel: drbd ha-iscsi/31 drbd3 memverge2: uuid_compare()=target-use-bitmap by rule=bitmap-peer

[root@memverge anton]# pcs cluster cib cib.txt
[root@memverge anton]# cat cib.txt |grep uname
<node id="27" uname="memverge">
<node id="28" uname="memverge2">
<node_state id="28" uname="memverge2" in_ccm="1743401842" crmd="1743401842" crm-debug-origin="do_state_transition" join="member" expected="member">
<node_state id="27" uname="memverge" in_ccm="1743166191" crmd="1743166191" crm-debug-origin="controld_update_resource_history" join="member" expected="member">
[root@memverge anton]#
[root@memverge anton]# drbdadm status
ha-iscsi role:Secondary
  volume:31 disk:UpToDate
  memverge2 role:Primary
    volume:31 peer-disk:UpToDate

ha-nfs role:Secondary
  volume:29 disk:UpToDate
  volume:30 disk:UpToDate
  memverge2 role:Primary
    volume:29 peer-disk:UpToDate
    volume:30 peer-disk:UpToDate

So it might not be a bad idea to update the handlers so that the after-resync-target script is also triggered for the uuid_compare()=no-sync case.
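
(Or, if newer DRBD 9 releases already provide an unfence-peer handler that fires on a plain reconnect, as I think the drbd.conf man page mentions, pointing it at the same script might cover the no-sync case, something like:)

    handlers {
        fence-peer          "/usr/lib/drbd/crm-fence-peer.9.sh";
        # assumption: unfence-peer is also invoked when the peers reconnect
        # without needing a resync, which would cover the no-sync case
        unfence-peer        "/usr/lib/drbd/crm-unfence-peer.9.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";
    }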

Anton