One of my servers downgraded from drbd 9.2.x to 8.4.x

I am a newbie with drbd-utils but I have experience with Debian GNU/Linux.

My setup is very simple: Debian 12, LINBIT DRBD 9.2.x, on top of /dev/sda (all flash) on one machine and a bcache device on the secondary node. For now I am not using any HA software, so I can put this third pair into production quickly. It is the third pair of nodes I have set up with drbd 9.2.x, and while I was doing the last checks something broke.
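
A minimal sketch of what such a two-node resource configuration looks like, in case it helps picture the setup (the resource name, hostnames and addresses below are placeholders, not my real values):

  resource r0 {
      device     /dev/drbd0;
      meta-disk  internal;

      on nodeA {                      # all-flash node
          disk     /dev/sda;
          address  192.0.2.1:7789;
          node-id  0;
      }
      on nodeB {                      # bcache node
          disk     /dev/bcache0;
          address  192.0.2.2:7789;
          node-id  1;
      }
  }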

While rebooting to test my manual procedures for promoting one node to primary or demoting it to secondary, DRBD stopped working: synchronization was no longer flowing between the nodes, and there was no error message anywhere explaining the problem, only kernel messages about drbd being stuck for too long.

Some reboots later it started working again, without explanation. What I found is that, without me running any apt or dpkg commands, the faster node is now using the DRBD module from 8.4 while the other is still on 9.x: cat /proc/drbd on the faster node works as it did before, while on the other node only drbdadm status gives any information. Should I extract the little data I have on the pair and reinstall both machines, or can I follow the procedure to upgrade the metadata from 8.x to 9.x?
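
For reference, this is roughly how the loaded module version can be compared on each node (output obviously differs per system):

  cat /proc/drbd                 # 8.4 shows per-resource status here; 9.x only prints a version header
  modinfo drbd | grep ^version   # version of the drbd.ko that modprobe would load
  drbdadm --version              # userspace and kernel versions as seen by drbd-utils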

The data on /dev/drbd0 is not very important, but I want to treat it as if it were real data, to learn how to use drbd.

Some extra information that may be useful.

Faster node:
dpkg --list | grep drbd
ii  drbd-dkms                            9.2.14-1                                all          RAID 1 over TCP/IP for Linux module source
ii  drbd-utils                           9.32.0-1                                amd64        RAID 1 over TCP/IP for Linux (user utilities)
wipefs /dev/sda
DEVICE OFFSET         TYPE UUID                                 LABEL
sda    0x37e2bffff03c drbd 263774bd15de66f5                     
sda    0x0            xfs  cf3596a0-ac15-4821-a591-4ab4e67d8486 

Other node:
ii  drbd-dkms                            9.2.14-1                                all          RAID 1 over TCP/IP for Linux module source
ii  drbd-utils                           9.31.0-1                                amd64        RAID 1 over TCP/IP for Linux (user utilities)
wipefs /dev/bcache0 
DEVICE  OFFSET         TYPE UUID                                 LABEL
bcache0 0x3a36ffffd03c drbd f3957fac668095f2                     
bcache0 0x0            xfs  cf3596a0-ac15-4821-a591-4ab4e67d8486 

What are your recommendations?

Something was broken on my system: the modprobe command was loading the old drbd module shipped with the kernel, not the one from LINBIT. Doing:

  • apt remove drbd-utils drbd-dkms
  • apt install drbd-utils drbd-dkms

solved the problem with the drbd module.
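
To double-check that modprobe now resolves to the DKMS build instead of the in-kernel 8.4 module, something like this should show version 9.2.x and a filename under updates/dkms (exact paths depend on the running kernel):

  dkms status drbd                               # the drbd-dkms module should be installed for the running kernel
  modinfo drbd | grep -E '^(filename|version)'   # filename should point into updates/dkms, version should be 9.2.x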

I had a diskless node, without understanding why.

“drbdadm create-md res” solved the problem. The /dev/sda on the faster node had a downgraded v08 flexible-size internal meta data block. The command upgraded the metadata.

In case anyone else is reading this: I believe this command can be dangerous.
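
A rough sketch of the sequence, with res standing in for the resource name; dump-md lets you inspect the on-disk metadata before deciding anything, and create-md from drbd-utils 9 should offer to convert existing v08 metadata rather than silently overwrite it, but read its prompts carefully before confirming:

  drbdadm down res        # the backing device must not be in use
  drbdadm dump-md res     # inspect the current on-disk metadata (shows v08 vs v09)
  drbdadm create-md res   # prompts before converting or overwriting existing metadata
  drbdadm up res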

If DRBD detects an IO error beneath it, from the backing storage, it will detach and go into a diskless state. In this mode, the diskless node will still be able to access data from a peer in the cluster.

That was probably unnecessary. A drbdadm adjust all should reattach a diskless DRBD resource to its backing storage without having to recreate metadata.
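
For example, assuming the resource is called res, something along these lines is usually enough:

  drbdadm status res      # confirm which node reports Diskless
  drbdadm adjust res      # re-applies the configuration, including attaching the backing disk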

This section of the DRBD user guide talks about IO error handling if you’re curious about alternatives: