Proxmox: no DRBD storage while LINSTOR controller is down

Contrary to what I understood from the LINBIT guides, DRBD storage (already configured disks for a CT or VM) does not allow failover if the LINSTOR controller is down.

What I understood: you can’t reconfigure LINSTOR-provided DRBD storage or provision new storage while the controller is down.

What I experience: an already migrated LXC with a disk resource on DRBD cannot be migrated again while the controller is down.

Error:

2024-12-10 19:52:58 volume 'drbd:pm-5ae81649_101' is on shared storage 'drbd'
could not connect to any LINSTOR controller at /usr/share/perl5/PVE/Storage/Custom/LINSTORPlugin.pm line 241.
2024-12-10 19:53:01 ERROR: volume deactivation failed: drbd:pm-5ae81649_101 at /usr/share/perl5/PVE/Storage.pm line 1280.

Trying to start a stopped container without migration (container ran on the node before being stopped) gives error:

TASK ERROR: could not connect to any LINSTOR controller at /usr/share/perl5/PVE/Storage/Custom/LINSTORPlugin.pm line 241.

Probably I’ve done something wrong, but what?

Nothing wrong, all as expected. You need the controller up and running to “interact” with containers/VMs (start, migrate, …).

What will work: if you have a VM up and running and the controller then stops or dies, the VM will continue to run (as the DRBD storage is configured and up). But everything else, like migration, will fail; it requires the controller to be up and running.
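A minimal sketch of how one might check this before starting or migrating a guest. It assumes the controller's REST API listens on the default plain-HTTP port 3370 and uses bash's /dev/tcp as a probe; the function names are made up for illustration and are not part of LINSTOR or the Proxmox plugin.

```shell
#!/bin/sh
# Probe one host: succeeds if a TCP connection to the controller port opens.
# Assumption: default REST port 3370; pass a second argument to override it.
probe_controller() {
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "${2:-3370}" 2>/dev/null
}

# Print the first reachable controller from a comma-separated list.
# $1 = list of addresses, $2 = probe command (defaults to probe_controller).
first_reachable_controller() {
    probe=${2:-probe_controller}
    old_ifs=$IFS
    IFS=,
    for host in $1; do
        IFS=$old_ifs
        if "$probe" "$host"; then
            echo "$host"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

For example, `first_reachable_controller 192.168.222.130,192.168.222.131 || echo "no controller reachable"` would warn before a start or migration is attempted. Already running DRBD resources can still be inspected locally with `drbdadm status`, which does not need the controller.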

OK,

but how can I achieve LINSTOR controller HA then? If I create a VM and a disk resource through LINSTOR, I can’t bring it online on the other node? The official docs only deal with DRBD Reactor, but I only have a 2-node PVE cluster plus an additional quorum device. I only have DRBD and LINSTOR on the 2 PVE nodes, and I would like to use Proxmox HA, not DRBD Reactor (which IMHO requires a third DRBD node) and not Pacemaker.

I’m referencing a thread:

[DRBD-user] linstor-proxmox-2.9.0

Roland Kammerer

6 years ago


Dear Proxmox VE users,

we released version 2.9.0 of the linstor-proxmox plugin.

Changes in this release:

The last change is probably the most exciting one, as it allows to run
the linstor-controller in a VM that is managed by the linstor plugin,
where the storage for that VM is on DRBD. Yes, a nice chicken and egg
problem. The benefit is that Proxmox’s HA feature can be used to make
the controller highly available without the complexity of pacemaker.

This feature is documented here:

That 6-year-old version is completely outdated, forget it.

If you want HA then you have to provide a setup that allows it… the recommended and supported way is controller HA via drbd-reactor. For quorum you need a 3rd node, correct. That can basically be anything: a raspi should do as a diskless quorum node, for example, or any other node you have in your system. Even providing real storage for the controller DB on the 3rd node should not be a problem; the controller DB is a few MB.

Thanks for the info.

Indeed, using a raspi was my first attempt, but I got stuck. Let’s try it again.

What are the recommended packages to install via apt on a raspi as a diskless node? (The raspi should only serve as a quorum)

Do I need the LINSTOR controller there?

Can I omit DRBD Reactor or Pacemaker with such a setup and stay with Proxmox-HA only? If not, I would like to setup Pacemaker as I can find more info (compared to DRBD Reactor) how to use it for other tasks as well.

What will the config look like in the end:

  • one Controller VM on Proxmox under HA with a controller-db on the VM-disk?
  • two independent Controller VMs, one on each PVE node, with a shared DRBD disk (couldn’t do this with Proxmox HA only, right?)?
  • What will the config look like for the VMs and probably the shared DRBD disk?
  • How to config the raspi?

I really appreciate your feedback! Thanks again.

From my point of view, as a noob, the information about DRBD versions (v8 available in the common repos, v9 available in your LINBIT repo), the LINBIT software, how they relate and interact, and most of all up-to-date configuration is not easy to grasp. It is spread over different guides and a little bit foggy. As said: this is seen through the eyes of someone who has just started with this.

And there is this from @kermat:

I would really favour this 2-node-Pacemaker-Proxmox-LinbitController setup. Maybe you have a how-to at hand which I’ve overlooked.

Don’t install LINSTOR in VMs, install LINSTOR on the Hypervisor nodes. The only reason I would run an HCI storage cluster inside of VMs is if I couldn’t run it directly on the hypervisor, which you can do with Proxmox.

Almost everything you need to do is outlined step-by-step in that blog.

Just treat your raspberry pi as “proxmox-2” in the blog, and skip the step where you create the pve-storage storage pool on the rpi / quorum node. It will use its default “diskless storage pool”.

Then, you can install DRBD Reactor on both of your hypervisor nodes. You don’t need DRBD Reactor on the raspberry pi node, since DRBD Reactor’s job is to promote services (in this case the LINSTOR controller service), which you never want the raspberry pi to do.

And follow the instructions to make the LINSTOR controller HA using DRBD Reactor. You can use the pve-rg resource group created in the PVE blog I linked to spawn the linstor_db resource.

The “HA LINSTOR controller” section of the user’s guide links to the section on specifying multiple controllers for the LINSTOR client, but you should also adjust your /etc/pve/storage.cfg to include both LINSTOR controller nodes (something like this):

drbd: linstor_storage
    content images, rootdir
    controller 192.168.222.130,192.168.222.131
    resourcegroup pve-rg

Understandable. I like to say that LINBIT develops more tools than products. You’ll have to understand how to use drills, hammers, ladders, concrete, etc. (tools, like DRBD, Reactor, and LINSTOR) before you can build a house (or an HA storage cluster for Proxmox in this case). Feedback on how our docs could be improved is absolutely always appreciated and welcome.

If you really must use only two nodes, that’s possible, but not preferred from a technical stand-point and also more complicated as you’ll be substituting DRBD Reactor for Corosync, Pacemaker, and STONITH configurations, which are especially tricky in 2-node clusters where quorum isn’t possible.


Understood and comprehensible. In particular the thing about LINBIT (aka any storage cluster software) on the hypervisor.

I preferred Pacemaker because of some not-yet-existing, only maybe-to-come requirements. As far as I can see now, I will be very happy with Proxmox’s HA, VMs and LXCs, although I would prefer a Docker integration instead of LXCs (without the VM-shielding). But without any reasonable requirement to run any other CRM, I will go the DRBD Reactor route.

I will keep this thread up-to-date and perhaps write down my impression as a feedback for your guides - here or on another address?

Again: thanks a lot for taking part in this thread.

Let’s talk about docs and guides.

My question from above:

What are the recommended packages to install via apt on a raspi as a diskless node? (The raspi should only serve as a quorum)

Hint: I don’t have any other third node, only the raspi.

You convinced me to go with DRBD Reactor and said I should follow the linked blog. This blog leads to https://linbit.com/drbd-user-guide/linstor-guide-1_0-en/#s-proxmox-installing-from-linbit-public-repos, which might not be the correct install guide, because I’m not using a 3rd PVE node; there is no PVE on the raspi at all. Yes, the raspi is running as a Proxmox Corosync qDevice, but here we are focussing on DRBD, and what I have understood is that I just need to set up a 3rd diskless LINSTOR/DRBD node as a DRBD (not PVE) quorum.

OK, let’s go with the above guide for installing anyway. Here is the trail (after getting the keyring):

root@raspi-1:~# PVERS=8 && echo "deb [signed-by=/etc/apt/trusted.gpg.d/linbit-keyring.gpg] \
http://packages.linbit.com/public/ proxmox-$PVERS drbd-9" > /etc/apt/sources.list.d/linbit.list
root@raspi-1:~# apt update
Hit:1 http://archive.raspberrypi.com/debian bookworm InRelease
Hit:2 http://deb.debian.org/debian bookworm InRelease
Hit:4 http://deb.debian.org/debian-security bookworm-security InRelease
Hit:5 http://deb.debian.org/debian bookworm-updates InRelease
Get:3 https://packages.linbit.com/public proxmox-8 InRelease [2,793 B]
Fetched 2,793 B in 1s (2,437 B/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done

Then

root@raspi-1:~# apt -y install drbd-dkms drbd-utils
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package drbd-dkms

Different approach - the overall installation in the LINSTOR guide (not Proxmox-related).

2.2.2. Using a Script to Manage LINBIT Cluster Nodes

If you are a LINBIT® customer, you can download a LINBIT created helper script and run it on your nodes to:...

Alas, I am not a paying customer. This is for home use. At work we are currently and still with VMware and I try possible future routes at home.

A little bit further down on topic 2.2.2:

If you want to be able to use LINSTOR to create DRBD replicated storage, you will need to install the required DRBD packages. Depending on the Linux distribution that you are running on your node, install the DRBD-related packages that the helper script suggested. If you need to review the script’s suggested packages and installation commands, you can enter:

# ./linbit-manage-node.py --hints

Ah, the script may also help free users. Let’s try:

root@raspi-1:~# ./linbit-manage-node.py --hints
Looks like you executed the script on a GENERIC system.
Enter "apt update" to update your LINBIT repositories.
If this is an SDS controller node you might want to install:
  apt install linbit-sds-controller
You can configure a highly available controller later via:
  https://linbit.com/drbd-user-guide/linstor-guide-1_0-en/#s-linstor_ha
If this is an SDS satellite node you might want to install:
  apt install linbit-sds-satellite drbd-module-6.6.62+rpt-rpi-v8 # or drbd-dkms
If you don't intend to run an SDS satellite or controller, a useful set is:
  apt install drbd-utils drbd-module-6.6.62+rpt-rpi-v8 # or drbd-dkms
For documentation see:
  https://linbit.com/drbd-user-guide/linstor-guide-1_0-en/#p-administration

Is the raspi going to be a satellite node? Guess so from the blog I still follow.

So it should be the line (for package drbd-dkms is not there - see above)

root@raspi-1:~# apt install linbit-sds-satellite drbd-module-6.6.62+rpt-rpi-v8
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package linbit-sds-satellite
E: Unable to locate package drbd-module-6.6.62+rpt-rpi-v8
E: Couldn't find any package by glob 'drbd-module-6.6.62+rpt-rpi-v8'

One more thing: I also tried to change to

root@raspi-1:~# echo "deb [signed-by=/etc/apt/trusted.gpg.d/linbit-keyring.gpg] \
http://packages.linbit.com/public/ bookworm misc" > /etc/apt/sources.list.d/linbit.list

Outcome is the same. Pretty stuck here, despite all your help.

Are you following

How to Setup LINSTOR on Proxmox VE - LINBIT

?

It clearly specifies which packages to use. linbit-sds-satellite is not among them.

Thanks for joining this. Please read carefully:

  • the question is WHAT to install on the raspi and HOW
  • the raspi is NOT a PVE node, as in “it’s not a Proxmox server”. It is set up with RasPi OS and only has an additional qDevice/Corosync package installed
  • the raspi has not and never will have a DRBD volume (as in: have local storage); it will be diskless

Maybe I understand things wrong, but

  • why should I follow “How to Setup LINSTOR on Proxmox VE” when the computer (the raspi) on which I install is not Proxmox VE?
  • and even if I follow the guide you mentioned, I get the errors I listed above
  • BTW: in addition you will find, that the LINBIT-Proxmox repos do not contain any ARM packages
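One way to confirm whether a configured repository offers a package for the local architecture at all is to look at what apt considers installable. The helper below is only a sketch (its name is made up); it reads `apt-cache policy` output on stdin and relies on apt printing nothing for an unknown package and `Candidate: (none)` for one that has no installable version.

```shell
#!/bin/sh
# Read `apt-cache policy <pkg>` output on stdin; succeed only if apt reports
# an installable candidate. Empty input (unknown package) counts as failure.
has_candidate() {
    awk '$1 == "Candidate:" && $2 != "(none)" { found = 1 }
         END { exit !found }'
}

# Typical use on the node in question:
#   dpkg --print-architecture
#   apt-cache policy drbd-dkms | has_candidate || echo "no drbd-dkms available here"
```

If the check fails for every DRBD and LINSTOR package right after `apt update`, the repository most likely carries no builds for that architecture or release.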

I assume ARM packages are reserved for subscribers. I could not find much googling “ARM” and “Linstor” but I did find the tweet below from 2020.

LINBIT on X: “RT @philipp_reisner: Supporting our customers on PPC64, ARM64 besides x86_64 with pre-built packages of DRBD, #linstor, and Pacemaker in ou…” / X

Which would mean you’d have to compile Linstor yourself.

Maybe less of a hassle, then, to get a x86 MiniPC / NUC?

Yeah, maybe.

Raspis are just lying around. But a used MiniPC is cheap, and maybe the cost saving is not worth the hassle.

What’s really sad is that the docs/guides/blogs and the communication about it are so poor. It all pushes in one direction, and one gets the impression that details are left in the fog intentionally.
I think I have really tried to stay on point, narrowing it down to the raspi, diskless, no third PVE node, etc. For an “open source” thing that got pushed into the standard Linux kernel, this is, IMHO, a bad thing.

@proxmeup Just about 30min ago I got this same setup working, and came here to post an unrelated question. My setup is not battle tested yet, but I have about a dozen VM drives that so far are behaving as expected during simulated outages. I currently have 2x PVE nodes, and 1x rpi5 I will use as the tie breaker.
I think the issue with what you’re doing is in the package repo you’re setting up. You’re pointing the rpi to linbit’s proxmox repo. Instead do this:
sudo add-apt-repository ppa:linbit/linbit-drbd9-stack && sudo apt update
then
sudo apt install drbd-dkms drbd-utils linstor-satellite

For my case I also made the rpi the controller, and installed linstor-controller, but it’s up to you how to setup the controller (single node, with HA, …).

Edit: I should have mentioned that my rpi is running Ubuntu 24.04 LTS server.
Edit#2: first make sure you have kernel headers installed that match the running kernel. For me that was sudo apt install linux-headers-6.8.0-1016-raspi. If you did an upgrade that changed the kernel, reboot first, then confirm the headers for the new kernel are installed. Installing the headers triggers the automatic rebuild of the drbd-dkms module. Afterwards, lsmod should show drbd_transport_tcp. If it only shows drbd, you likely have the old v8 driver loaded, which comes with the mainline kernel.
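The lsmod check described above can be sketched as a small filter. The function name is hypothetical, and the "drbd-only" verdict is only a hint: it follows the rule of thumb from the post and is not a definitive version test.

```shell
#!/bin/sh
# Read `lsmod` output on stdin and classify the loaded DRBD modules.
# Prints "drbd9" if a drbd_transport_* module is loaded, "drbd-only" if only
# the drbd module is present (possibly the in-tree v8 driver), else "none".
classify_drbd_modules() {
    awk '
        $1 ~ /^drbd_transport_/ { transport = 1 }
        $1 == "drbd"            { core = 1 }
        END {
            if (transport)  print "drbd9"
            else if (core)  print "drbd-only"
            else            print "none"
        }'
}
```

Used as `lsmod | classify_drbd_modules`; a result of "drbd-only" after a kernel upgrade suggests checking the headers and the dkms build before going further.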


Great info! :smiley:

How have you configured everything after installing? If I do

linstor node create raspi-1 <raspi-IP>

I get warnings at the end

WARNING:
    Resource did not become ready on node 'raspi-1' within reasonable time, check Satellite for errors.

and

linstor resource list

shows errors

| ResourceName | Node    | Port | Usage  | Conns                   |      State | CreatedOn           |
|=====================================================================================================|
| linstor_db   | pve-1   | 7000 | Unused | Connecting(raspi-1)     |   UpToDate | 2024-11-26 18:09:01 |
| linstor_db   | pve-2   | 7000 | Unused | Connecting(raspi-1)     |   UpToDate | 2024-11-26 18:09:01 |
| linstor_db   | raspi-1 | 7000 | Unused | Connecting(pve-1,pve-2) | TieBreaker | 2024-12-17 17:25:51 |
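To pick the stuck peers out of a longer listing, a filter along these lines could help. It is a sketch that assumes the ASCII table layout shown above, with the Conns column in the fifth position; if the client prints UTF-8 box characters instead, the field separator would have to be adapted. The function name is made up.

```shell
#!/bin/sh
# Read `linstor resource list` table output on stdin and print
# "<resource> <node> <conns>" for every row whose Conns column
# reports a peer that is still in the Connecting state.
stuck_connections() {
    awk -F'|' '
        NF >= 7 && $6 ~ /Connecting/ {
            # Trim the padding around the columns that get printed.
            for (i = 2; i <= 6; i++) gsub(/^ +| +$/, "", $i)
            print $2, $3, $6
        }'
}
```

Each printed line names a node and the peers it cannot reach, which narrows the search to the network path between exactly those hosts.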

Output of journalctl

Dec 17 17:07:27 raspi-1 kernel: drbd linstor_db: Starting worker thread (node-id 2)
Dec 17 17:07:27 raspi-1 kernel: drbd linstor_db pve-1: Starting sender thread (peer-node-id 0)
Dec 17 17:07:27 raspi-1 kernel: drbd linstor_db pve-2: Starting sender thread (peer-node-id 1)
Dec 17 17:07:27 raspi-1 kernel: drbd linstor_db pve-1: conn( StandAlone -> Unconnected ) [connect]
Dec 17 17:07:27 raspi-1 kernel: drbd linstor_db pve-1: Starting receiver thread (peer-node-id 0)
Dec 17 17:07:27 raspi-1 kernel: drbd linstor_db pve-1: conn( Unconnected -> Connecting ) [connecting]
Dec 17 17:07:27 raspi-1 kernel: drbd linstor_db pve-2: conn( StandAlone -> Unconnected ) [connect]
Dec 17 17:07:27 raspi-1 kernel: drbd linstor_db pve-2: Starting receiver thread (peer-node-id 1)
Dec 17 17:07:27 raspi-1 kernel: drbd linstor_db pve-2: conn( Unconnected -> Connecting ) [connecting]
Dec 17 17:07:27 raspi-1 Satellite[6552]: 2024-12-17 17:07:27.454 [DeviceManager] INFO  LINSTOR/Satellite/65ad93 SYSTEM - Resource 'linstor_db' [DRBD] adjusted.
Dec 17 17:07:27 raspi-1 (udev-worker)[7760]: drbd1000: Process '/usr/bin/unshare -m /usr/bin/snap auto-import --mount=/dev/drbd1000' failed with exit code 1.
Dec 17 17:07:27 raspi-1 Satellite[6552]: 2024-12-17 17:07:27.486 [DeviceManager] INFO  LINSTOR/Satellite/ SYSTEM - End DeviceManager cycle 8
Dec 17 17:07:27 raspi-1 Satellite[6552]: 2024-12-17 17:07:27.488 [DeviceManager] INFO  LINSTOR/Satellite/50e0cb SYSTEM - Begin DeviceManager cycle 9
Dec 17 17:07:27 raspi-1 Satellite[6552]: 2024-12-17 17:07:27.597 [DeviceManager] INFO  LINSTOR/Satellite/50e0cb SYSTEM - DRBD regenerated resource file: /var/l>

The aforementioned problems came from a preferred NIC setting for DRBD replication on the PVE nodes. Maybe “preferred” is the wrong word to use, if it is the only connection that’s being used after configuring this setting?!

I have changed to Ubuntu on the raspi. With Ubuntu and the Information from @mtisza I could successfully follow How to Setup LINSTOR on Proxmox VE.
At the end the raspi is shown as TieBreaker for every resource.

Then I went on to make the controller HA. I took into account what @kermat said: installing the controller on the PVE nodes is preferred.

What I did: created a linstor_db resource, mounted it on PVE-1, disabled the linstor-controller in the VM, moved the database to the mounted resource, installed linstor-controller on PVE-1, and pointed linstor-client.conf to the new controller IP.
Result: PVE-1 online, PVE-2 and raspi “version mismatch”.

Update linstor-satellite on PVE-2: now also online. Great.

Update linstor-satellite on raspi…
I tend to say “of course”: new problems!

apt update got stuck and even dpkg --configure -a couldn’t rescue anything. linstor could not communicate with the raspi anymore. Now raspi is not healthy anymore. Idea: delete raspi as a node from linstor, purge linstor packages and reinstall them on raspi.
But no, I can’t do it, for after purging everything, the LINBIT apt packages install hangs forever:

<snip>
depmod......
Setting up linstor-satellite (1.30.1-1ppa1~noble1) ...
Created symlink /etc/systemd/system/multi-user.target.wants/linstor-satellite.service → /usr/lib/systemd/system/linstor-satellite.service.

Progress: [ 88%] [##########

linstor node delete leaves node raspi in DELETING state. Couldn’t find any resource that shows how to force-delete, just a GitHub issue from 2019 without any practical solution. Also couldn’t remove raspi from configured resources.

In the meantime there were updates available on PVE, as this thread is evolving really slowly. Updating and rebooting the PVE nodes left the DRBD storage resources in an inaccessible state. The linstor-controller service on PVE-1 (not yet installed anywhere else) won’t start anymore, and the linstor_db resource can’t be mounted.

drbdadm status gives:

/var/lib/linstor.d/linstor_db.res:10: Parse error: 'an option keyword' expected,
	but got 'quorum'

All other DRBD resources are offline as well, all depending LXC and VMs are down.

I must ask: am I really too dumb to configure this software, or is the necessary information too scattered? What about resiliency?
Before, I thought that I had complicated this by using a raspi. But raspis are mentioned by LINBIT themselves, and I guess the apt update problem could also have happened on an x86 node.

Where is the information on how to get a node out of the DELETING state?
Where is the information that points me to the reason why the linstor_db won’t mount and the controller service won’t start (code=exited, status=20)?
Why does the DRBD linstor_db resource file (“# This file was generated by linstor(1.30.2), do not edit manually.”) give the error with drbdadm?

I would really, really appreciate, if someone from LINBIT took a look.

@proxmeup You mention a lot of issues, one of which I hit myself too yesterday.

The hung apt and/or hung satellite service start is a known issue discussed here. I used the first option (disable satellite service before/enable after all upgrades).

However, in this RPi case there’s another complication to the same issue. The latest linstor-controller/linstor-satellite package version bump was issued yesterday. At the moment when I happened to run an apt update/upgrade yesterday, the Proxmox repos had been updated, but the Ubuntu PPA (for the RPi) had not. So there was a version mismatch when using the latest from each. I suspect this is what happened to you as well. Check that your versions match across all nodes.

To resolve this, on the PVE nodes I downgraded the packages linstor-common, linstor-controller, linstor-satellite to match the version on the RPi. This got it working again. To prevent this from happening again while doing apt update/upgrade for unrelated reasons, on all three nodes I marked all three of those packages with a “hold” using sudo apt-mark hold <package_name>. When it comes time to upgrade them again, I will need to run sudo apt-mark unhold <package_name>.
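A small sketch for spotting such a mismatch before it bites. It compares only the upstream part of the package version, so that a PPA build like 1.30.1-1ppa1~noble1 and a 1.30.1-1 build from another repo count as equal. The function names are made up, and collecting the "node version" pairs (for example with `dpkg-query -W -f='${Version}\n' linstor-satellite` on each node) is left to the reader.

```shell
#!/bin/sh
# Strip the Debian revision: "1.30.1-1ppa1~noble1" -> "1.30.1".
upstream_version() {
    printf '%s\n' "${1%%-*}"
}

# Read "node version" pairs on stdin. Print every node whose upstream
# version differs from the first line's, and fail if there is any.
report_version_mismatch() {
    awk '
        { v = $2; sub(/-.*/, "", v) }
        NR == 1  { ref = v; next }
        v != ref { print $1 " has " v ", expected " ref; bad = 1 }
        END      { exit (bad ? 1 : 0) }'
}
```

Running this before an upgrade, and again before removing the hold from the packages, shows whether all repos have caught up to the same release.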

I’m not seeing the other issues you mention, but I wonder how well things work with mismatched versions, so perhaps resolve this and report back what issues you still see.


Again, you were a little faster than me. I spent too much time getting angry about it. Just as I realized that there must be version mismatches between the PVE and PPA repos, I read your post. :grinning:

So thank you for your post!

I think for today I am too annoyed to downgrade the packages and spend more time. I feel like I’ve already spent too much time on this single piece of software. Maybe I will fall back to ZFS with replication. It’s not real-time, but it just works.

IMHO the system (LINBIT and DRBD) and its dependencies and interactions lack explanation. The available documentation is not straightforward and is hard to follow. Good man / reference pages for the command-line tools seem to be missing. I’ve not used DRBD before, but from what I have read, it should work well. With LINBIT (the software system as documented, not the company) on top, it feels like there are too many fragile dependencies and the system becomes unstable if something is not 100% correct. Not “problem in → message out”, more “problem in → crash”.

Anyway, the next couple of days won’t give me time to advance, and then comes X-mas. I’ve spent so much time and I am nowhere near where I wanted to be (providing HA services on Proxmox, not still fighting with the basic software). Luckily some services do HA by themselves, without central or shared storage. Running and configuring such services on each Proxmox node has been a breeze.

This tells me that the DRBD 9 kernel module isn’t loaded and you’ve loaded the in-tree DRBD 8 kernel module. LINSTOR cannot manage a DRBD 8 kernel module.

$ cat /proc/drbd
version: 9.2.12 (api:2/proto:118-122)
GIT-hash: 2da6f528dc4ab3fd25c511f7b03531100e54ab08 build by @buildsystem, 2024-11-18 10:29:30
Transports (api:21):

To elaborate, DRBD is a kernel module, so when you install kernel updates, you need to be sure you’re updating the DRBD kernel module as well. DRBD version 8 is included in the Linux kernel, so if you don’t install an appropriate DRBD version 9 kernel module, the kernel will “fall back” to the in-tree DRBD version 8 module instead.
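To make that check scriptable, for instance after every kernel update, a sketch like the following reads the version line in the format shown above. The function name is hypothetical; the default path is /proc/drbd.

```shell
#!/bin/sh
# Print the major version of the loaded DRBD module, taken from the
# "version:" line of a /proc/drbd-style file ($1, default /proc/drbd).
# Fails if the file contains no version line.
drbd_major_version() {
    awk '$1 == "version:" { split($2, v, "."); print v[1]; found = 1; exit }
         END { exit !found }' "${1:-/proc/drbd}"
}
```

A guard such as `[ "$(drbd_major_version)" -ge 9 ] || echo "DRBD 9 module is not loaded"` could run after each reboot, before any guest is started.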

Mmmh, but DRBD 9 was running before. You can get it from the resource definition (containing the quorum keyword) and that it has been working before with controller version 1.29.2-1 . And then I apt upgraded to the latest package versions.

linstor-client/unknown 1.24.0-1 all [upgradable from: 1.23.2-1]
linstor-common/unknown 1.30.2-1 all [upgradable from: 1.29.2-1]
linstor-controller/unknown 1.30.2-1 all [upgradable from: 1.29.2-1]
python-linstor/unknown 1.24.0-1 all [upgradable from: 1.23.1-1]

Does it make sense that there is no DRBD 9 after this apt upgrade (together with the latest Proxmox / Debian updates)? I went back through my bash_history: there is no sign of intentionally reverting back to DRBD 8. I am totally clueless, but of course you’re guessing right:

root@pve-1:~# cat /proc/drbd
version: 8.4.11 (api:1/proto:86-101)
srcversion: 211FB288A383ED945B83420

Are you aware of anything in the latest Proxmox / Debian updates which would cause the system to fall back to DRBD 8 again?

So, a state of “OFFLINE(MISSING EXTERNAL TOOLS)” from “linstor node list” means that the correct DRBD version (tools) is missing?

In the meantime - my wife giving me annoyed looks - I downgraded to the previous versions…

apt install linstor-client=1.23.2-1
apt install linstor-common=1.29.2-1
apt install linstor-controller=1.29.2-1
apt install linstor-python=1.23.1-1
apt install python-linstor=1.23.1-1
apt install linstor-satellite=1.29.2-1
apt install linstor-proxmox
apt autoremove

…only to find that the database may have been modified/updated by the latest LINSTOR versions?!

root@linstor-controller:~# cat /var/log/linstor-controller/ErrorReport-67653320-00000-000000.log 
ERROR REPORT 67653320-00000-000000

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Controller
Version:                            1.29.2
Build ID:                           372c916b7d97fa10e8ea480b66ea3da665ab5849
Build time:                         2024-11-05T11:22:22+00:00
Error time:                         2024-12-20 10:04:36
Node:                               linstor-controller
Thread:                             Main

============================================================

Reported error:
===============

Category:                           LinStorException
Class name:                         SystemServiceStartException
Class canonical name:               com.linbit.SystemServiceStartException
Generated at:                       Method 'initialize', Source file 'DbConnectionPoolInitializer.java', Line #71

Error message:                      Database initialization error

ErrorContext:


Call backtrace:

    Method                                   Native Class:Line number
    initialize                               N      com.linbit.linstor.dbcp.DbConnectionPoolInitializer:71
    startSystemServices                      N      com.linbit.linstor.core.ApplicationLifecycleManager:88
    start                                    N      com.linbit.linstor.core.Controller:375
    main                                     N      com.linbit.linstor.core.Controller:627

Caused by:
==========

Category:                           RuntimeException
Class name:                         FlywayValidateException
Class canonical name:               org.flywaydb.core.api.exception.FlywayValidateException
Generated at:                       Method 'execute', Source file 'Flyway.java', Line #177

Error message:                      Validate failed: Migrations have failed validation
Detected applied migration not resolved locally: 2024.10.24.10.00. If you removed this migration intentionally, run repair to mark the migration as deleted.
Detected applied migration not resolved locally: 2024.12.18.10.00. If you removed this migration intentionally, run repair to mark the migration as deleted.
Need more flexibility with validation rules? Learn more: https://rd.gt/3AbJUZE

Call backtrace:

    Method                                   Native Class:Line number
    execute                                  N      org.flywaydb.core.Flyway$1:177
    execute                                  N      org.flywaydb.core.Flyway$1:170
    execute                                  N      org.flywaydb.core.Flyway:586
    migrate                                  N      org.flywaydb.core.Flyway:170
    migrate                                  N      com.linbit.linstor.dbcp.DbConnectionPool:222
    initialize                               N      com.linbit.linstor.dbcp.DbConnectionPoolInitializer:63
    startSystemServices                      N      com.linbit.linstor.core.ApplicationLifecycleManager:88
    start                                    N      com.linbit.linstor.core.Controller:375
    main                                     N      com.linbit.linstor.core.Controller:627


END OF ERROR REPORT.

Going back to the last db backup before apt upgrade makes the controller service start again.
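Since a newer controller migrates the database and an older one then refuses to start on it, taking a copy right before every upgrade is cheap insurance. The sketch below simply archives a directory; the function name is made up, and it assumes the controller is stopped first and that the database lives in a directory such as /var/lib/linstor (the usual default, which may differ in an HA setup).

```shell
#!/bin/sh
# Archive a LINSTOR controller database directory before an upgrade.
# $1 = database directory, $2 = backup directory; prints the archive path.
# Assumption: the controller is stopped so the copy is consistent.
backup_linstor_db() {
    src=$1
    dest=$2
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    archive="$dest/linstor-db-$(date +%Y%m%d-%H%M%S).tar.gz"
    tar -czf "$archive" -C "$src" . || return 1
    echo "$archive"
}
```

A possible sequence would be `systemctl stop linstor-controller`, then `backup_linstor_db /var/lib/linstor /root/linstor-backups`, then the package upgrade. The backup directory should sit outside the database directory.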

@kermat is right with “One thing at a time…” so I will move the next paragraph to its own thread:

One more question regarding the PrefNic property:

  • is there an option to set a preferred and a fallback NIC?
  • will DRBD replication use all interfaces of the physical host?
    (I have a port on the quad NIC reserved for replication, but it is connected crossover to the second PVE node, not to the switch. Hence it can not communicate with the raspi - which is diskless and doesn’t need the resource’s content replication traffic.)

I write about what I stumble across, hoping it will help other users.

One thing at a time, please.

Those are all LINSTOR packages, and no DRBD packages are shown, but I don’t know what commands you ran to get that list or what you’re trying to show me in that list.

I wasn’t trying to say you uninstalled DRBD 9 or ran some command to revert back to DRBD 8, I was trying to say that the DRBD 9 package you have installed needs to be compatible with the kernel you are running. If it is not compatible, while the in-tree module is compatible, the in-tree module will be loaded instead.

If you installed the drbd-dkms package to get DRBD, you’re essentially rebuilding the DRBD kernel module each time you upgrade your kernel. Your kernel update probably threw some dkms warning or error that wasn’t noticed. Try to “reinstall” (forcing a rebuild for your currently installed kernels) the drbd-dkms to see if that either loads the DRBD 9 module or reveals some error:

sudo apt install drbd-dkms --reinstall
sudo rmmod drbd
sudo modprobe drbd

If it doesn’t work, can you show us the outputs of these commands:

uname -r
dpkg -l | awk '{print $2}' | grep -ie drbd -ie headers -ie proxmox-kernel
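To check whether dkms actually built the module for the running kernel, its status output can be filtered as well. This is a loose sketch: the exact `dkms status` line format differs between dkms versions, so the (hypothetical) helper only looks for a drbd line that mentions the given kernel release and the word "installed".

```shell
#!/bin/sh
# Read `dkms status` output on stdin; succeed if a drbd module is reported
# as installed for the given kernel release ($1, e.g. the output of uname -r).
drbd_dkms_installed_for() {
    awk -v k="$1" '
        /^drbd/ && index($0, k) && /installed/ { ok = 1 }
        END { exit !ok }'
}
```

Used as `dkms status | drbd_dkms_installed_for "$(uname -r)" || echo "no DRBD dkms build for the running kernel"`. A failure here usually points at missing headers or a build error during the kernel update.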

I’m aware there are multiple kernels available in the Proxmox repositories. Not just multiple kernel versions, but multiple kernels with multiple versions. This snippet from the LINSTOR Users Guide section covering the Proxmox integration explains how you need to have the correct headers installed, but maybe there is more to add here:

EDIT: Maybe there aren’t multiple kernels :thinking: maybe there was just a name change recently that I’m remembering. Either way, dealing with kernels and kernel modules can be confusing at first, but once you understand the relationship it isn’t so bad, I promise.
