Proxmox: no DRBD storage while LINBIT controller is down

Hmm, but DRBD 9 was running before. You can tell from the resource definition (it contains the quorum keyword) and from the fact that it worked with controller version 1.29.2-1. Then I ran apt upgrade to get the latest package versions.

linstor-client/unknown 1.24.0-1 all [upgradable from: 1.23.2-1]
linstor-common/unknown 1.30.2-1 all [upgradable from: 1.29.2-1]
linstor-controller/unknown 1.30.2-1 all [upgradable from: 1.29.2-1]
python-linstor/unknown 1.24.0-1 all [upgradable from: 1.23.1-1]

Does it make sense that there is no DRBD 9 after this apt upgrade (which also pulled in the latest Proxmox / Debian packages)? I went back through my bash_history: there is no sign of me intentionally reverting to DRBD 8. I am totally clueless, but of course you guessed right:

root@pve-1:~# cat /proc/drbd
version: 8.4.11 (api:1/proto:86-101)
srcversion: 211FB288A383ED945B83420

Are you aware of anything in the latest Proxmox / Debian updates that would cause the system to fall back to DRBD 8?
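For anyone hitting the same thing, here is a minimal sketch of how the loaded DRBD version could be compared with what is installed. It assumes DRBD 9 comes from the drbd-dkms package and that the kernel's built-in module is the 8.4 series, so a kernel update without a successful DKMS rebuild (for example because the matching headers are missing) would be one possible cause. The function name drbd_major is made up for illustration.

```shell
#!/bin/sh
# Print the major version from a /proc/drbd style "version:" line on stdin.
# drbd_major is a hypothetical helper, not part of any DRBD tooling.
drbd_major() {
    sed -n 's/^version: \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Only look at the live system if a DRBD module is actually loaded.
if [ -r /proc/drbd ]; then
    major=$(drbd_major < /proc/drbd)
    echo "loaded DRBD major version: ${major}"
    if [ "${major}" != "9" ]; then
        echo "DRBD 9 is not loaded for kernel $(uname -r)" >&2
    fi
fi

# Version of the module that modprobe would pick for the running kernel.
if command -v modinfo >/dev/null 2>&1; then
    modinfo -F version drbd 2>/dev/null || echo "no drbd module found for $(uname -r)"
fi

# DKMS build state, if DRBD 9 was installed through drbd-dkms (assumption).
if command -v dkms >/dev/null 2>&1; then
    dkms status || true
fi
```

If modinfo reports 8.4.x while dkms shows no drbd build for the running kernel, the DKMS rebuild is the first thing I would look at.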

So, a state of “OFFLINE(MISSING EXTERNAL TOOLS)” from “linstor node list” means that the correct DRBD version (tools) is missing?

In the meantime - my wife giving me annoyed looks - I downgraded to the previous versions…

apt install linstor-client=1.23.2-1
apt install linstor-common=1.29.2-1
apt install linstor-controller=1.29.2-1
apt install linstor-python=1.23.1-1
apt install python-linstor=1.23.1-1
apt install linstor-satellite=1.29.2-1
apt install linstor-proxmox
apt autoremove

…only to find that the database appears to have been modified/updated by the latest LINSTOR versions?!

root@linstor-controller:~# cat /var/log/linstor-controller/ErrorReport-67653320-00000-000000.log 
ERROR REPORT 67653320-00000-000000

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Controller
Version:                            1.29.2
Build ID:                           372c916b7d97fa10e8ea480b66ea3da665ab5849
Build time:                         2024-11-05T11:22:22+00:00
Error time:                         2024-12-20 10:04:36
Node:                               linstor-controller
Thread:                             Main

============================================================

Reported error:
===============

Category:                           LinStorException
Class name:                         SystemServiceStartException
Class canonical name:               com.linbit.SystemServiceStartException
Generated at:                       Method 'initialize', Source file 'DbConnectionPoolInitializer.java', Line #71

Error message:                      Database initialization error

ErrorContext:


Call backtrace:

    Method                                   Native Class:Line number
    initialize                               N      com.linbit.linstor.dbcp.DbConnectionPoolInitializer:71
    startSystemServices                      N      com.linbit.linstor.core.ApplicationLifecycleManager:88
    start                                    N      com.linbit.linstor.core.Controller:375
    main                                     N      com.linbit.linstor.core.Controller:627

Caused by:
==========

Category:                           RuntimeException
Class name:                         FlywayValidateException
Class canonical name:               org.flywaydb.core.api.exception.FlywayValidateException
Generated at:                       Method 'execute', Source file 'Flyway.java', Line #177

Error message:                      Validate failed: Migrations have failed validation
Detected applied migration not resolved locally: 2024.10.24.10.00. If you removed this migration intentionally, run repair to mark the migration as deleted.
Detected applied migration not resolved locally: 2024.12.18.10.00. If you removed this migration intentionally, run repair to mark the migration as deleted.
Need more flexibility with validation rules? Learn more: https://rd.gt/3AbJUZE

Call backtrace:

    Method                                   Native Class:Line number
    execute                                  N      org.flywaydb.core.Flyway$1:177
    execute                                  N      org.flywaydb.core.Flyway$1:170
    execute                                  N      org.flywaydb.core.Flyway:586
    migrate                                  N      org.flywaydb.core.Flyway:170
    migrate                                  N      com.linbit.linstor.dbcp.DbConnectionPool:222
    initialize                               N      com.linbit.linstor.dbcp.DbConnectionPoolInitializer:63
    startSystemServices                      N      com.linbit.linstor.core.ApplicationLifecycleManager:88
    start                                    N      com.linbit.linstor.core.Controller:375
    main                                     N      com.linbit.linstor.core.Controller:627


END OF ERROR REPORT.
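My reading of the report: the 1.30.2 controller applied two schema migrations to the database, and the 1.29.2 controller does not ship those migration scripts, so Flyway refuses to start on validation. A small sketch for pulling the offending migration versions out of an error report follows; the function name unresolved_migrations is hypothetical, and the pattern is based only on the message text shown above.

```shell
#!/bin/sh
# List migration versions that are recorded in the database but unknown to
# the installed controller. Reads the given files, or stdin if none given.
# unresolved_migrations is a hypothetical helper name.
unresolved_migrations() {
    sed -n 's/^Detected applied migration not resolved locally: \([0-9.]*[0-9]\)\. .*/\1/p' "$@"
}

# Example (path taken from the report above):
#   unresolved_migrations /var/log/linstor-controller/ErrorReport-67653320-00000-000000.log
```

Any version printed here means the database is newer than the controller binary that is trying to open it.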

Restoring the last database backup taken before the apt upgrade lets the controller service start again.
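Since a downgrade only works with a pre-upgrade copy of the database, a backup step before every upgrade seems worth scripting. The sketch below assumes the default embedded database under /var/lib/linstor and a backup directory of my own choosing; both paths and the function name backup_linstor_db are assumptions, so adjust them if your controller uses another location or an external database.

```shell
#!/bin/sh
# Copy the LINSTOR database directory to a timestamped backup directory
# and print the path of the copy.
# Assumptions: /var/lib/linstor is the default embedded DB location,
# /root/linstor-db-backups is an arbitrary destination.
backup_linstor_db() {
    src=${1:-/var/lib/linstor}
    dest_root=${2:-/root/linstor-db-backups}
    target="${dest_root}/linstor-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "${dest_root}" || return 1
    cp -a "${src}" "${target}" || return 1
    echo "${target}"
}

# Intended use on the controller node, with the service stopped so the
# files are not copied mid-write:
#   systemctl stop linstor-controller
#   backup_linstor_db
#   apt upgrade
#   systemctl start linstor-controller
```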

@kermat is right with “One thing at a time…”, so I will move the next paragraph to its own thread:

One more question regarding the PrefNic property:

  • is there an option to set a preferred and a fallback NIC?
  • will DRBD replication use all interfaces of the physical host?
    (I have a port on the quad NIC reserved for replication, but it is connected by crossover cable to the second PVE node, not to the switch. Hence it cannot communicate with the raspi, which is diskless and doesn’t need the replication traffic for the resource’s content.)
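
For context, this is roughly how a single preferred NIC is configured as far as I understand it: register an extra interface on the node, then point the storage pool's PrefNic property at it. The sketch does not cover the fallback question. The interface name, address, and pool name in the example are hypothetical, and the run wrapper with DRY_RUN is only there so the commands can be printed instead of executed.

```shell
#!/bin/sh
# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Register an additional interface on a node and prefer it for the
# replication traffic of one storage pool.
# Arguments: node name, interface name, IP address, storage pool name.
set_pref_nic() {
    node=$1; nic=$2; addr=$3; pool=$4
    run linstor node interface create "${node}" "${nic}" "${addr}" &&
    run linstor storage-pool set-property "${node}" "${pool}" PrefNic "${nic}"
}

# Example with made-up values (only pve-1 is a real node name here):
#   DRY_RUN=1 set_pref_nic pve-1 repl 10.10.10.1 pool_a
```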

I write about what I stumble across, hoping it will help other users.