Former primary node stuck in emergency mode after reboot

Hey Linstor community,

my first post after setting up my first Linstor cluster - please bear with me… :slight_smile:

Setting
My cluster consists of 3 nodes (each a Debian 12 VM in Incus), and everything works fine after setting the cluster up with an Ansible playbook. I am using a local ZFS pool called linstor-pool as the low-level storage on all 3 nodes, and it all comes up fine initially. I can successfully use the cluster as an external Linstor controller to provide PVC storage to a Kubernetes cluster via Piraeus, and I have set up DRBD Reactor to add HA capabilities to the storage.

Issue
When I ssh into the primary Linstor node VM and shut it down to simulate node failure, Reactor successfully fails over to another node, and after about 30-50 seconds my Kubernetes pod (Nextcloud) automatically reconnects its PVC to the new Linstor primary node, as expected.

However, when I boot up the former primary node VM again, it gets stuck in Debian's emergency mode and I have not found a way to recover it.

Analysis (so far):

The logging output I could find is extremely sparse. It appears that Debian attempts to start the "var-lib-linstor.mount" systemd unit at boot, even though that unit is supposed to be managed by Reactor. After the reboot, /dev/drbd no longer exists (it did exist after setting up Reactor), so the mount unit cannot find /dev/drbd/by-res/linstor_db/0 and eventually times out. That prevents the system from reaching local-fs.target, and it falls back to emergency mode.
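
(For anyone debugging something similar, the usual systemd checks from the emergency shell look roughly like this:)

# Why did the mount unit fail, and what pulled it in?
systemctl status var-lib-linstor.mount
systemctl list-dependencies --reverse var-lib-linstor.mount

# Boot log for the unit
journalctl -b -u var-lib-linstor.mount --no-pager

# Does the DRBD device (or its by-res symlink) even exist?
ls -l /dev/drbd* /dev/drbd/by-res/ 2>/dev/null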

I have tried disabling all DRBD/Reactor-related services on the node, yet it ends up in emergency mode again and again.

I've gone through the official docs repeatedly but couldn't find any hint about what I might be missing - so I'm not sure what information would be helpful to share here to start with…

Can somebody please help and point me in the right direction?

Thanks heaps!

Based on the failed mount attempt, it sounds like DRBD on that node may have been attempting to promote; is auto-promote enabled? What settings do you see on the running controller when you run linstor resource-group list-properties linstor_db?
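
You can also inspect what DRBD is actually running with directly on the surviving node, for example:

# Effective configuration LINSTOR generated for the resource - the options section shows auto-promote
drbdadm dump linstor_db

# Runtime view of the same options
drbdsetup show linstor_db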

Hi @liniac ,

thanks for picking this up - really appreciate it!

The original controller (bootstrap node) was zfs01; after shutting it down, zfs03 took over. On zfs03 I get:

root@zfs03:~# drbdadm status linstor_db
linstor_db role:Primary
  disk:UpToDate open:yes
  zfs01 connection:Connecting
  zfs02 role:Secondary
    peer-disk:Diskless

root@zfs03:~# linstor resource-group list-properties linstor_db
No property map found for this entry.
root@zfs03:~# linstor node list
╭─────────────────────────────────────────────────────────────╮
┊ Node   ┊ NodeType  ┊ Addresses                    ┊ State   ┊
╞═════════════════════════════════════════════════════════════╡
┊ kube01 ┊ SATELLITE ┊ 192.168.111.107:3366 (PLAIN) ┊ Online  ┊
┊ kube02 ┊ SATELLITE ┊ 192.168.111.108:3366 (PLAIN) ┊ Online  ┊
┊ kube03 ┊ SATELLITE ┊ 192.168.111.109:3366 (PLAIN) ┊ Online  ┊
┊ zfs01  ┊ SATELLITE ┊ 192.168.111.104:3366 (PLAIN) ┊ OFFLINE ┊
┊ zfs02  ┊ SATELLITE ┊ 192.168.111.105:3366 (PLAIN) ┊ Online  ┊
┊ zfs03  ┊ SATELLITE ┊ 192.168.111.106:3366 (PLAIN) ┊ Online  ┊
╰─────────────────────────────────────────────────────────────╯
root@zfs03:~# linstor resource list
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName                             ┊ Node   ┊ Layers       ┊ Usage  ┊ Conns             ┊      State ┊ CreatedOn           ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ linstor_db                               ┊ zfs01  ┊ DRBD,STORAGE ┊        ┊                   ┊    Unknown ┊ 2025-12-15 19:09:15 ┊
┊ linstor_db                               ┊ zfs02  ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(zfs01) ┊ TieBreaker ┊ 2025-12-15 19:09:15 ┊
┊ linstor_db                               ┊ zfs03  ┊ DRBD,STORAGE ┊ InUse  ┊ Connecting(zfs01) ┊   UpToDate ┊ 2025-12-15 19:09:15 ┊
┊ pvc-0f392c14-02dd-423f-b52c-acd86583e5cb ┊ kube01 ┊ DRBD,STORAGE ┊ InUse  ┊ Connecting(zfs01) ┊   Diskless ┊ 2025-12-15 19:16:17 ┊
┊ pvc-0f392c14-02dd-423f-b52c-acd86583e5cb ┊ zfs01  ┊ DRBD,STORAGE ┊        ┊                   ┊    Unknown ┊ 2025-12-15 19:16:12 ┊
┊ pvc-0f392c14-02dd-423f-b52c-acd86583e5cb ┊ zfs02  ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(zfs01) ┊   UpToDate ┊ 2025-12-15 19:16:14 ┊
┊ pvc-0f392c14-02dd-423f-b52c-acd86583e5cb ┊ zfs03  ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(zfs01) ┊   UpToDate ┊ 2025-12-15 19:16:14 ┊
┊ pvc-5c6a92ca-3b3b-4f5b-924a-94556a5dad67 ┊ kube02 ┊ DRBD,STORAGE ┊ InUse  ┊ Connecting(zfs01) ┊   Diskless ┊ 2025-12-15 19:15:31 ┊
┊ pvc-5c6a92ca-3b3b-4f5b-924a-94556a5dad67 ┊ zfs01  ┊ DRBD,STORAGE ┊        ┊                   ┊    Unknown ┊ 2025-12-15 19:14:37 ┊
┊ pvc-5c6a92ca-3b3b-4f5b-924a-94556a5dad67 ┊ zfs02  ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(zfs01) ┊   UpToDate ┊ 2025-12-15 19:14:38 ┊
┊ pvc-5c6a92ca-3b3b-4f5b-924a-94556a5dad67 ┊ zfs03  ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(zfs01) ┊   UpToDate ┊ 2025-12-15 19:14:38 ┊
┊ pvc-9aa13002-2530-49ee-8233-11fae59244cb ┊ kube03 ┊ DRBD,STORAGE ┊ InUse  ┊ Connecting(zfs01) ┊   Diskless ┊ 2025-12-15 19:16:22 ┊
┊ pvc-9aa13002-2530-49ee-8233-11fae59244cb ┊ zfs01  ┊ DRBD,STORAGE ┊        ┊                   ┊    Unknown ┊ 2025-12-15 19:16:04 ┊
┊ pvc-9aa13002-2530-49ee-8233-11fae59244cb ┊ zfs02  ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(zfs01) ┊   UpToDate ┊ 2025-12-15 19:16:05 ┊
┊ pvc-9aa13002-2530-49ee-8233-11fae59244cb ┊ zfs03  ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(zfs01) ┊   UpToDate ┊ 2025-12-15 19:16:05 ┊
┊ pvc-a3074e16-4ce4-46d8-a2c8-74366d21aea3 ┊ kube01 ┊ DRBD,STORAGE ┊ InUse  ┊ Connecting(zfs01) ┊   Diskless ┊ 2025-12-15 19:16:17 ┊
┊ pvc-a3074e16-4ce4-46d8-a2c8-74366d21aea3 ┊ zfs01  ┊ DRBD,STORAGE ┊        ┊                   ┊    Unknown ┊ 2025-12-15 19:16:08 ┊
┊ pvc-a3074e16-4ce4-46d8-a2c8-74366d21aea3 ┊ zfs02  ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(zfs01) ┊   UpToDate ┊ 2025-12-15 19:16:10 ┊
┊ pvc-a3074e16-4ce4-46d8-a2c8-74366d21aea3 ┊ zfs03  ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(zfs01) ┊   UpToDate ┊ 2025-12-15 19:16:09 ┊
┊ pvc-ca049b80-15e8-49ba-8e0e-3b56e05b9072 ┊ kube01 ┊ DRBD,STORAGE ┊ InUse  ┊ Connecting(zfs01) ┊   Diskless ┊ 2025-12-15 19:17:09 ┊
┊ pvc-ca049b80-15e8-49ba-8e0e-3b56e05b9072 ┊ zfs01  ┊ DRBD,STORAGE ┊        ┊                   ┊    Unknown ┊ 2025-12-15 19:17:05 ┊
┊ pvc-ca049b80-15e8-49ba-8e0e-3b56e05b9072 ┊ zfs02  ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(zfs01) ┊   UpToDate ┊ 2025-12-15 19:17:07 ┊
┊ pvc-ca049b80-15e8-49ba-8e0e-3b56e05b9072 ┊ zfs03  ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(zfs01) ┊   UpToDate ┊ 2025-12-15 19:17:07 ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

zfs0{1..3} are the Debian 12 VMs (storage cluster); kube0{1..3} are Talos OS VMs that run the K8s cluster and connect to the external Linstor controller via Piraeus.

The cloud-init for zfs0{1..3} installs tooling, wipes the NVMe passthrough disk from the virtualisation host node, forms the ZFS storage pool on it and builds the DRBD kernel module (plus the usual system setup):

#cloud-config

# =========================================================
#           LINSTOR VM cloud-init configuration
# =========================================================

hostname: ${hostname}

locale: en_NZ.UTF-8
timezone: Etc/UTC

debconf_selections:

  # ZFS CDDL license acceptance
  - zfs-dkms zfs-dkms/note-incompatible-licenses note true
  - zfs-dkms zfs-dkms/stop-build-for-unknown-kernel boolean true
  
  # Keyboard configuration
  - keyboard-configuration/layoutcode string de
  - keyboard-configuration/modelcode string pc105
  - keyboard-configuration/variant select German (no dead keys)
  - keyboard-configuration/xkb-keymap select de(nodeadkeys)
  
  # Console setup
  - console-setup/charmap47 select UTF-8
  - console-setup/codeset47 select
  - console-setup/fontface47 select Fixed
  - console-setup/fontsize-fb47 select 16
  - console-setup/fontsize-text47 select 16

# Package updates and installation
package_update: true
package_upgrade: true

packages:
  - build-essential
  - console-setup
  - curl
  - gpg
  - wget
  - tree
  - vim
  - htop
  - net-tools
  - lvm2
  - thin-provisioning-tools
  - openssh-server
  - nvme-cli
  - coccinelle
  - parted
  - prometheus-node-exporter
  - linux-image-amd64
  - linux-headers-amd64
  - zfs-dkms
  - zfs-zed
  - zfsutils-linux
  - cryptsetup
  - rsync
  - parted
  - gdisk
  - less
  - tasksel

# SSH Configuration
ssh_authorized_keys:
  - ${manager_pubkey}
  - ${automation_pubkey}

# User configuration
users:
  # human manager
  - name: manager
    sudo: ALL=(ALL) NOPASSWD:ALL
    groups: sudo
    shell: /bin/bash
    ssh_authorized_keys:
      - ${manager_pubkey}

  # Ansible user
  - name: automation
    sudo: ALL=(ALL) NOPASSWD:ALL
    groups: sudo
    shell: /bin/bash
    ssh_authorized_keys:
      - ${automation_pubkey}
      - ${automationtester_pubkey}


# Files to be created - run BEFORE the 'runcmd' block!
write_files:

  # kernel modules to be loaded upon start
  - path: /etc/modules-load.d/drbd.conf
    content: |
      zfs
      handshake
      drbd
      drbd_transport_tcp
    permissions: '0644'

  # kernel modules to be included into initramfs
  - path: /etc/initramfs-tools/modules
    content: |
      # ZFS-related
      zfs
      zunicode
      zzstd
      zlua
      zavl
      icp
      zcommon
      znvpair
      spl

      # DRBD-related - ORDER MATTERS!!!
      handshake
      drbd
      drbd_transport_tcp

  # status file
  - path: /var/lib/linstor-node-ready
    permissions: '0644'

  # Completion check script
  - path: /usr/local/bin/check-cloud-init-complete
    content: |
      #!/bin/bash
      cloud-init status --wait > /dev/null 2>&1
      echo "Cloud-init for $(hostname) completed at $(date)" | tee -a /var/log/linstor-ready.signal
    permissions: '0755'
    

# Commands to be run
runcmd:

  # hostname config
  - hostnamectl set-hostname ${hostname}
  - echo "127.0.0.1 ${hostname}" >> /etc/hosts
  
  # install a regular system
  - tasksel install standard

  # Build and install DRBD from source
  - |
    echo "Building DRBD kernel module..." | tee -a /var/log/linstor-ready.log
    cd /tmp
    wget -O drbd-sources.tar.gz "${drbd_url}"
    mkdir drbd-sources
    tar xzf drbd-sources.tar.gz --strip-components=1 -C drbd-sources
    cd drbd-sources
    make -j$(nproc)
    make install
    depmod -a $(uname -r)
    cd /tmp
    rm -rf drbd-sources drbd-sources.tar.gz

  # Load DRBD kernel modules in CORRECT order !!!
  - modprobe handshake && modprobe drbd && modprobe drbd_transport_tcp

  # Verify ZFS DKMS build completed
  - modprobe zfs

  # update initramfs
  - update-initramfs -u -k all

  # install Linstor repository 
  - curl -fsSL https://packages.linbit.com/package-signing-pubkey.asc | gpg --dearmor -o /usr/share/keyrings/linbit-keyring.gpg
  - echo "deb [signed-by=/usr/share/keyrings/linbit-keyring.gpg] http://packages.linbit.com/public bookworm misc" > /etc/apt/sources.list.d/linbit.list  
  - apt-get update

  # install Linstor tooling
  - apt-get install -y drbd-utils linstor-satellite linstor-controller linstor-client linstor-gui

  # Reload systemd to recognize new units
  - systemctl daemon-reload

  # Enable satellite, but keep controller disabled (controller invocation is managed by drbd-reactor)
  - systemctl enable linstor-satellite linstor-controller
  - systemctl start linstor-satellite

  # wipe passthrough NVMe
  - |
    wipefs -a /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_incus_storage && \
    blkdiscard -f /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_incus_storage && \
    sgdisk --zap-all /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_incus_storage && \
    partprobe

  # Create ZFS pool on passthrough NVMe
  - |
    echo "Creating ZFS pool on /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_incus_storage" | tee -a /var/log/linstor-ready.log

    if [ -b /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_incus_storage ]; then

      # Wait for device to be ready
      sleep 5
    
      # Create ZFS pool for LINSTOR
      zpool create -f \
        -o ashift=12 \
        -O compression=lz4 \
        -O atime=off \
        -O xattr=sa \
        -O acltype=posixacl \
        -O mountpoint=none \
        linstor-pool /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_incus_storage
      
      # show pool status
      zpool status linstor-pool

      # log success
      echo "ZFS pool linstor-pool created successfully" | tee -a /var/log/linstor-ready.log

    else

      # show error message and exit with non-zero exit code
      echo "ERROR: /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_incus_storage not found, unable to create ZFS pool." | tee -a /var/log/linstor-ready.log
      exit 1

    fi

  # Check kmod's loaded
  - |
    echo 'Loaded ZFS and DRBD kernel modules:' | tee -a /var/log/linstor-ready.log
    lsmod | grep -E "(zfs|drbd)" || echo "WARNING: Not all required kernel modules loaded!" | tee -a /var/log/linstor-ready.log

  # Signal readiness for cluster formation
  - |
    if systemctl is-active --quiet linstor-satellite && zpool status linstor-pool >/dev/null 2>&1; then
      echo "Node $(hostname) ready for HA controller cluster formation at $(date)" | tee -a /var/log/linstor-ready.log
    else
      echo "ERROR: Node $(hostname) not ready - check services" | tee -a /var/log/linstor-ready.log
      exit 1
    fi

  # Log completion
  - /usr/local/bin/check-cloud-init-complete

  # disable cloud-init
  - touch /etc/cloud/cloud-init.disabled


final_message: |
  ${hostname} HA controller provisioning completed.
  Ready for DRBD resource creation and cluster formation (Ansible, manual...)

Once those VMs signal readiness, my Ansible playbook (just sharing the relevant play here) forms the cluster, migrates the controller to Reactor management and verifies that the API is available via the HTTP endpoint provided by HAProxy on my external OPNsense firewall.

Relevant Ansible inventory snippet:

    # Linstor PROD cluster:
    linstor_prod_cluster:
      children:

        controllers:
          hosts:
            zfs01.prod.lab.goettner.nz: {}
            
        satellites:
          hosts:
            zfs02.prod.lab.goettner.nz: {}
            zfs03.prod.lab.goettner.nz: {}

Playbook:

# ============================================
#       PLAY 5: BOOTSTRAP LINSTOR CLUSTER
# ============================================
- name: Bootstrap Linstor Cluster
  hosts: linstor_prod_cluster
  gather_facts: true
  vars:
    ansible_ssh_common_args: '-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null'  # because the host keys will change during a cluster rebuild and this breaks the playbook
  tags:
    - initiate_linstor
    - play_5
  vars_files:
    - vars/family-cloud.yml
    - secrets/linstor_secrets.yaml  

  tasks:

    - name: Create nodes
      ansible.builtin.command: "linstor node create {{ item.split('.')[0] }} {{ lookup('dig', item) }}"
      loop: "{{ groups['linstor_prod_cluster'] | list }}"
      register: linstor_nodes
      failed_when: "'SUCCESS' not in linstor_nodes.stdout"
      when: "'controllers' in group_names"


    - name: Create storage pools
      ansible.builtin.command: "linstor storage-pool create zfs {{ item.split('.')[0] }} {{ linstor_pool_name }} {{ linstor_pool_name }}" # physical ZFS pool and Linstor storage pool have the same name!
      loop: "{{ groups['linstor_prod_cluster'] | list }}"
      register: linstor_storage_pool
      failed_when: "'SUCCESS' not in linstor_storage_pool.stdout"
      when: "'controllers' in group_names"  


    - name: Check for existing shared drbd-reactor resource 'linstor_db'
      ansible.builtin.command: drbdadm status linstor_db
      register: linstor_db_exists
      failed_when: false
      changed_when: false
      when: "'controllers' in group_names"


    - name: Create linstor_db resource definition
      ansible.builtin.command: linstor resource-definition create linstor_db
      register: resource_def_created
      when:
        - "'controllers' in group_names"
        - linstor_db_exists.rc != 0


    - name: Create linstor_db volume definition
      ansible.builtin.command: linstor volume-definition create linstor_db 500M
      when:
        - "'controllers' in group_names"
        - resource_def_created is changed


    - name: Create linstor_db resources with 3-way replication
      ansible.builtin.command: |
        linstor resource create linstor_db
          --storage-pool {{ linstor_pool_name }}
          --auto-place 3
      register: resource_created
      failed_when: "'successfully autoplaced on 3 nodes' not in resource_created.stdout"          
      when:
        - "'controllers' in group_names"
        - resource_def_created is changed


    - name: Wait for DRBD to stabilize
      ansible.builtin.pause:
        seconds: 15
      when: resource_created is changed


    - name: Disable DRBD auto-promote for reactor management
      ansible.builtin.command: linstor resource-definition drbd-options --auto-promote no linstor_db
      when:
        - "'controllers' in group_names"
        - resource_created is changed


    - name: Set DRBD suspended-primary-outdated behavior
      ansible.builtin.command: linstor resource-definition drbd-options --on-suspended-primary-outdated force-secondary linstor_db
      when:
        - "'controllers' in group_names"
        - resource_created is changed


    - name: Get DRBD role
      ansible.builtin.shell: drbdadm role linstor_db 2>/dev/null | grep -q Primary && echo "primary" || echo "secondary"
      register: drbd_role_check
      changed_when: false
      when: "'controllers' in group_names"
      failed_when: drbd_role_check.rc != 0


    - name: Promote bootstrap node to Primary
      ansible.builtin.command: drbdadm primary linstor_db
      when:
        - "'controllers' in group_names"
        - "'secondary' in drbd_role_check.stdout"
      register: promoted_to_primary


    - name: Create filesystem on linstor_db
      ansible.builtin.command: mkfs.ext4 -F /dev/drbd/by-res/linstor_db/0
      when:
        - "'controllers' in group_names"
        - resource_created is changed

    # CRITICAL: Migrate database from local filesystem to DRBD before switching to reactor
    - name: Mount DRBD device temporarily
      ansible.builtin.mount:
        path: /mnt/linstor_migrate
        src: /dev/drbd/by-res/linstor_db/0
        fstype: ext4
        state: mounted
      when:
        - "'controllers' in group_names"
        - resource_created is changed


    - name: Copy database to DRBD device
      ansible.builtin.shell: rsync -av /var/lib/linstor/ /mnt/linstor_migrate/
      when:
        - "'controllers' in group_names"
        - resource_created is changed


    - name: Unmount temporary mount
      ansible.builtin.mount:
        path: /mnt/linstor_migrate
        state: unmounted
      when:
        - "'controllers' in group_names"
        - resource_created is changed


    - name: Remove temporary mount point
      ansible.builtin.file:
        path: /mnt/linstor_migrate
        state: absent
      when:
        - "'controllers' in group_names"
        - resource_created is changed


    - name: Remove old database files from /var/lib/linstor
      ansible.builtin.command: rm -rf /var/lib/linstor/*
      when:
        - "'controllers' in group_names"
        - resource_created is changed


    - name: Set immutable flag on /var/lib/linstor
      ansible.builtin.command: chattr +i /var/lib/linstor        


    - name: Create drbd-reactor configuration directory
      ansible.builtin.file:
        path: /etc/drbd-reactor.d
        state: directory
        mode: '0755'


    - name: Create drbd-reactor linstor_db configuration
      ansible.builtin.copy:
        dest: /etc/drbd-reactor.d/linstor_db.toml
        content: |
          [[promoter]]
          id = "linstor_db"
          
          [promoter.resources.linstor_db]
          start = ["var-lib-linstor.mount", "linstor-controller.service"]
        mode: '0644'


    - name: Create systemd mount unit for linstor_db
      ansible.builtin.copy:
        dest: /etc/systemd/system/var-lib-linstor.mount
        content: |
          [Unit]
          Description=mount LINSTOR database
          After=network.target
          After=drbd@linstor_db.target

          [Mount]
          What=/dev/drbd/by-res/linstor_db/0
          Where=/var/lib/linstor
          Type=ext4
          Options=defaults

          [Install]
          WantedBy=multi-user.target

          # EoF
          
        mode: '0644'


    - name: Create satellite configuration drop-in directory
      ansible.builtin.file:
        path: /etc/systemd/system/linstor-satellite.service.d
        state: directory
        mode: '0755'


    - name: Configure satellite override
      ansible.builtin.copy:
        dest: /etc/systemd/system/linstor-satellite.service.d/override.conf
        content: |
          [Service]
          Environment=LS_KEEP_RES=linstor_db
        mode: '0644'


    - name: Configure linstor-client to use HAProxy endpoint
      ansible.builtin.copy:
        dest: /etc/linstor/linstor-client.conf
        content: |
          [global]
          controllers={{ linstor_api_endpoint }}
        mode: '0644'


    - name: Stop controllers
      ansible.builtin.systemd:
        name: linstor-controller
        state: stopped


    - name: Reload systemd
      ansible.builtin.systemd:
        daemon_reload: yes


    - name: Install drbd-reactor
      ansible.builtin.apt:
        name: drbd-reactor
        state: present
        update_cache: yes


    - name: Enable and start drbd-reactor service
      ansible.builtin.systemd:
        name: drbd-reactor
        enabled: yes
        state: started


    - name: Wait for local Linstor controller
      ansible.builtin.wait_for:
        port: 3370
        timeout: 60
      when: "'controllers' in group_names"


    - name: Test Linstor API endpoint (via HAproxy)
      ansible.builtin.uri:
        url: "http://{{ linstor_api_endpoint }}:3370/v1/nodes"
        method: GET
        status_code: 200
      register: api_test
      retries: 5
      delay: 10
      until: api_test.status == 200

I believe the playbook covers all the basic steps from the official DRBD documentation - but I might be missing something…

In the next play, the playbook deploys ArgoCD to K8s, which ropes in Piraeus, which in turn creates the diskless Linstor nodes on kube0{1..3}; these successfully auto-register with the external Linstor controller. ArgoCD then deploys Nextcloud via Helm, which nicely registers PVCs for Nextcloud storage as well as for a 3-node PostgreSQL cluster deployed with the CNPG Helm chart - those are the PVCs you're seeing in the listing…

As said in the initial post, it's all great until I test the failover or reboot the primary Linstor node for any other reason.

I'm not entirely new to Linux, but this one makes me pull my hair out. I just can't figure out why /dev/drbd doesn't even exist on zfs01 anymore after rebooting… As you can see in the cloud-init script, I've even built the drbd/zfs kernel modules into the initramfs because I thought they might be missing at the early boot stage… but that doesn't seem to be the issue…

In the Ansible playbook you've shared, I am not seeing the quorum options set as DRBD options on the resource definition. You need to explicitly set these and the other quorum-related settings.

Based on the rest of the playbook, this appears to be a setup for a highly available LINSTOR controller, so I will link the options needed for the LINSTOR database here (the name of the resource group is up to you, and it is preferable to define this in a resource group rather than at the resource-definition level):

Misconfigured quorum options will cause bad behavior during failover (when quorum could be lost), so it would make sense if you are only seeing this in a failover test.
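
In rough outline the shape of it is something like this - group and pool names are placeholders, and the exact option values are spelled out in the guide (a sketch, not copy-paste material):

# Resource group for the LINSTOR DB, with the quorum-related DRBD options set once at the group level
linstor resource-group create linstor-db-grp --storage-pool linstor-pool --place-count 3
linstor resource-group drbd-options \
  --auto-promote=no \
  --quorum=majority \
  --on-no-quorum=io-error \
  --on-suspended-primary-outdated=force-secondary \
  linstor-db-grp

# Spawn the DB resource from the group instead of hand-building the resource definition
linstor resource-group spawn-resources linstor-db-grp linstor_db 500M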

If you could link what docs you are referencing I can potentially provide further context for them.

Hey @liniac,

that's really helpful - thanks for pointing this out.

I think I found my high-level mistake:

I actually DO configure quorum settings in the K8s StorageClass that I deploy via ArgoCD (after the playbook "hands off" to ArgoCD):

---
# Linstor 3-way replication StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-zfs-r3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true" # default storage class if no other defined
    argocd.argoproj.io/sync-wave: "1340"

provisioner: linstor.csi.linbit.com

parameters:
  # Replication settings
  linstor.csi.linbit.com/autoPlace: "3"
  linstor.csi.linbit.com/storagePool: linstor-pool

  
  # CRITICAL: Quorum behavior for pod mobility
  # When a node loses quorum, return I/O errors instead of suspending
  # This allows the volume to detach and reattach elsewhere
  property.linstor.csi.linbit.com/DrbdOptions/Resource/on-no-quorum: "io-error"
  
  # Quorum requires majority (2 out of 3 nodes)
  property.linstor.csi.linbit.com/DrbdOptions/Resource/quorum: "majority"
  
  # If primary becomes outdated, force to secondary
  # This prevents split-brain when pod moves
  property.linstor.csi.linbit.com/DrbdOptions/Resource/on-suspended-primary-outdated: "force-secondary"
  
  # Automatically reconnect after network issues
  property.linstor.csi.linbit.com/DrbdOptions/Net/rr-conflict: "retry-connect"

# Allow volume expansion
allowVolumeExpansion: true

# Wait for pod to be scheduled before binding volume
# This helps with cross-node pod movements
volumeBindingMode: WaitForFirstConsumer

# Reclaim policy
reclaimPolicy: Delete   # set to "Retain" in production!!!

# EoF

I assume this works for the PVC volumes because Piraeus takes care of the resource configuration under the hood - but I in fact DO NOT apply the resource configuration you pointed out to the linstor_db resource, which is the only resource I set up "manually" with the playbook and which Piraeus hence knows nothing about… yet it's the crucial resource for failover.
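
(Comparing the stored properties makes that difference easy to see - roughly:)

# DRBD options LINSTOR has stored for the hand-made DB resource...
linstor resource-definition list-properties linstor_db

# ...versus one of the Piraeus-provisioned PVC resources
linstor resource-definition list-properties pvc-0f392c14-02dd-423f-b52c-acd86583e5cb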

Re your question about which documentation I followed when creating the playbook - I went with the DRBD 9 (en) guide (chapter 8.5), which suggests using mkfs.ext4 directly on /dev/drbd1000 (resource ha-mount in the guide example - the equivalent of my linstor_db resource) - but I am in fact not configuring all the settings the docs mention:

resource ha-mount {
   options {
      auto-promote no;
      quorum majority;
      on-no-quorum suspend-io;
      on-no-data-accessible suspend-io;
      [...]
   }
   [...]
}

whereas my playbook only does:

- name: Disable DRBD auto-promote for reactor management
  ansible.builtin.command: linstor resource-definition drbd-options --auto-promote no linstor_db
  when:
    - "'controllers' in group_names"
    - resource_created is changed


- name: Set DRBD suspended-primary-outdated behavior
  ansible.builtin.command: linstor resource-definition drbd-options --on-suspended-primary-outdated force-secondary linstor_db
  when:
    - "'controllers' in group_names"
    - resource_created is changed

I'll update that and see if this is the missing piece.

Yes, this is an HA setup - but honestly, I am very new to Linstor.

Could you maybe help me understand how these missing bits are causing the system to end up in emergency mode?

My gut feeling would have been that shutting down the VM to simulate failure maybe doesn't give Reactor enough time to demote the node and leave everything behind cleanly for a reboot into "secondary" state… yet given that Reactor uses systemd units (services and targets), I would have expected all the clean-up tasks to be part of the system's shutdown routine once Reactor is installed.

That's why I couldn't make sense of the symptoms after the reboot… When you have, say, an sshfs mount active at the time of system shutdown (or even just a pendrive mounted), it usually gets unmounted cleanly during shutdown - as long as you don't literally pull the plug on the machine, which is not what I am doing: I remote into the VM, become root and then fire shutdown -P now. Does it not work like that with Reactor?

Those missing configuration pieces could certainly be causing unexpected behavior. Regarding the system booting into the emergency shell, it could possibly be an issue with device naming: Persistent block device naming - ArchWiki
You may wish to verify that your reference to the backing storage is persistent.
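
For example, checking that the reference resolves to the expected device:

# The udev-provided by-res symlink is the stable name for a DRBD volume
ls -l /dev/drbd/by-res/linstor_db/0
readlink -f /dev/drbd/by-res/linstor_db/0   # should point at the numbered device node, e.g. /dev/drbd1000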

Mystery solved: I shot myself in the foot with my own playbook. :rofl:

This task

    - name: Mount DRBD device temporarily
      ansible.builtin.mount:
        path: /mnt/linstor_migrate
        src: /dev/drbd/by-res/linstor_db/0
        fstype: ext4
        state: mounted
      when:
        - "'controllers' in group_names"
        - resource_created is changed

is actually creating an entry in /etc/fstab rather than just mounting the DRBD resource, as the module name would suggest (it's in the docs - I should have read them properly…), and the corresponding unmount task

    - name: Unmount temporary mount
      ansible.builtin.mount:
        path: /mnt/linstor_migrate
        state: unmounted
      when:
        - "'controllers' in group_names"
        - resource_created is changed

DOES unmount the resource, but DOES NOT remove the fstab line!

What happens subsequently is:

  1. Debian in the zfs01 VM automatically generates a systemd mount unit from the fstab entry (via systemd-fstab-generator - I wasn't aware of that automatism)

  2. After rebooting the VM, that generated mount unit attempts to mount the DRBD resource at the temporary mountpoint - which however no longer exists, because my playbook dutifully removes it after the HA database has been migrated to the distributed resource:

     - name: Remove temporary mount point
       ansible.builtin.file:
         path: /mnt/linstor_migrate
         state: absent
       when:
         - "'controllers' in group_names"
         - resource_created is changed
    
  3. Since the generated mount unit is created with default settings (i.e. the fstab entry has no nofail option), the mount is treated as essential for the system to reach local-fs.target - which, as a consequence of the timed-out mount, is never reached.

  4. DRBD Reactor in turn has local-fs.target as a dependency (via multi-user.target), but the system never reaches that target and Reactor therefore never starts.

  5. The VM is stuck in emergency mode and satellite zfs01 never rejoins the cluster.
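
The whole chain is visible with standard tooling once you know where to look - roughly:

# The leftover line the Ansible mount module wrote
grep linstor_migrate /etc/fstab

# The mount unit that systemd-fstab-generator derives from it at boot
systemctl cat mnt-linstor_migrate.mount

# ...and local-fs.target requiring that unit (the fstab entry has no nofail option)
systemctl list-dependencies --reverse mnt-linstor_migrate.mount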

I fixed the playbook task and everything works fine now. There never was an issue with DRBD or Reactor.
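
For completeness: the fix boils down to making sure the temporary migration mount never touches /etc/fstab - e.g. by running the mount/umount as plain commands (rough sketch, not my literal playbook tasks):

# One-off migration mount that leaves no trace in /etc/fstab
mkdir -p /mnt/linstor_migrate
mount -t ext4 /dev/drbd/by-res/linstor_db/0 /mnt/linstor_migrate
rsync -av /var/lib/linstor/ /mnt/linstor_migrate/
umount /mnt/linstor_migrate
rmdir /mnt/linstor_migrate

Keeping the mount module but ending with state: absent (instead of unmounted) should also remove the fstab entry again.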

That said, @liniac, you were right - my initial resource definition in the playbook was incomplete (though not the root cause of the issue), and the playbook now creates it properly as per the documentation you linked to.

Thanks so much for your help - I really appreciate it!
