I’m in the process of building up a production Proxmox+Linstor cluster. I’m up to 2 nodes, which for historical reasons are called virtual1 and virtual5. I’ve added Linstor HA as per section 3.2 of the Linstor Guide.
For testing purposes, I rebooted virtual5 (the backup). When it came back up, systemd says the system is “degraded”:
root@virtual5:~# systemctl status
● virtual5
State: degraded
Units: 569 loaded (incl. loaded aliases)
Jobs: 0 queued
Failed: 1 units
...
root@virtual5:~# systemctl status --failed
× drbd-promote@linstor_db.service - Promotion of DRBD resource linstor_db
Loaded: loaded (/lib/systemd/system/drbd-promote@.service; static)
Drop-In: /run/systemd/system/drbd-promote@linstor_db.service.d
└─reactor.conf
Active: failed (Result: exit-code) since Sat 2025-05-10 12:04:26 PDT; 2min 42s ago
Docs: man:drbd-promote@.service
Process: 2499 ExecStart=/usr/lib/drbd/scripts/drbd-service-shim.sh primary linstor_db (code=exited, status=17)
Main PID: 2499 (code=exited, status=17)
CPU: 2ms
May 10 12:04:26 virtual5 systemd[1]: Starting drbd-promote@linstor_db.service - Promotion of DRBD resource linstor_db...
May 10 12:04:26 virtual5 drbd-linstor_db[2499]: linstor_db: State change failed: (-2) Need access to UpToDate data
May 10 12:04:26 virtual5 systemd[1]: drbd-promote@linstor_db.service: Main process exited, code=exited, status=17/n/a
May 10 12:04:26 virtual5 systemd[1]: drbd-promote@linstor_db.service: Failed with result 'exit-code'.
May 10 12:04:26 virtual5 systemd[1]: Failed to start drbd-promote@linstor_db.service - Promotion of DRBD resource linstor_db.
Is this normal? (In a 2-node setup?)
I’m not sure why it’s trying to promote the linstor_db resource at all - it should remain secondary.
This isn’t stopping the controller from working on the primary node, but I’d like to get to the bottom of what’s going on, and I certainly wouldn’t want the secondary to accidentally promote itself.
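If I understand the promoter plugin correctly (and I may well be wrong about this), it watches the DRBD events stream and only tries to promote when the resource reports may_promote:yes, so the next thing I plan to look at on the secondary is:

drbdsetup events2 --now linstor_db

and check the may_promote / promotion_score fields there - although that still wouldn’t tell me why a promote was attempted during boot.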
I’m pretty sure everything is set up right. The linstor_db resource is replicated:
root@virtual1:~# linstor v l --resource linstor_db -p
+------------------------------------------------------------------------------------------------------------------------+
| Resource | Node | StoragePool | VolNr | MinorNr | DeviceName | Allocated | InUse | State | Repl |
|========================================================================================================================|
| linstor_db | virtual1 | sata_ssd | 0 | 1002 | /dev/drbd1002 | 204 MiB | InUse | UpToDate | Established(1) |
| linstor_db | virtual5 | sata_ssd | 0 | 1002 | /dev/drbd1002 | 204 MiB | Unused | UpToDate | Established(1) |
+------------------------------------------------------------------------------------------------------------------------+
root@virtual1:~# linstor rg lp linstor-db-grp -p
+----------------------------------------------------------------------+
| Key | Value |
|======================================================================|
| DrbdOptions/Resource/auto-promote | no |
| DrbdOptions/Resource/on-no-data-accessible | io-error |
| DrbdOptions/Resource/on-no-quorum | io-error |
| DrbdOptions/Resource/on-suspended-primary-outdated | force-secondary |
| DrbdOptions/Resource/quorum | majority |
| Internal/Drbd/QuorumSetBy | user |
+----------------------------------------------------------------------+
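(In case it’s relevant: those properties were set via resource-group drbd-options as per the guide - from memory it was something along these lines, so the property list above is what actually ended up applied:)

linstor resource-group drbd-options \
    --auto-promote=no \
    --quorum=majority \
    --on-no-quorum=io-error \
    --on-no-data-accessible=io-error \
    --on-suspended-primary-outdated=force-secondary \
    linstor-db-grp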
The DRBD device also exists on the secondary:
root@virtual5:~# ls -l /dev/drbd/by-res/linstor_db/
total 0
lrwxrwxrwx 1 root root 17 May 10 12:04 0 -> ../../../drbd1002
Both nodes agree which is primary (no split brain AFAICS):
root@virtual1:~# drbdadm status linstor_db
linstor_db role:Primary
disk:UpToDate open:yes
virtual5 role:Secondary
peer-disk:UpToDate
root@virtual5:~# drbdadm status linstor_db
linstor_db role:Secondary
disk:UpToDate open:no
virtual1 role:Primary
peer-disk:UpToDate
This is the status from the reactor on the secondary:
root@virtual5:~# drbd-reactorctl status linstor_db
/etc/drbd-reactor.d/linstor_db.toml:
Promoter: Currently active on node 'virtual1'
○ drbd-services@linstor_db.target
× ├─ drbd-promote@linstor_db.service
○ ├─ var-lib-linstor.mount
○ └─ linstor-controller.service
The configs are identical between primary and secondary:
root@virtual5:~# cat /etc/drbd-reactor.d/linstor_db.toml
[[promoter]]
[promoter.resources.linstor_db]
start = ["var-lib-linstor.mount", "linstor-controller.service"]
root@virtual5:~# cat /etc/systemd/system/var-lib-linstor.mount
[Unit]
Description=Filesystem for the LINSTOR controller
[Mount]
What=/dev/drbd/by-res/linstor_db/0
Where=/var/lib/linstor
root@virtual5:~# cat /etc/systemd/system/linstor-satellite.service.d/override.conf
[Service]
Environment=LS_KEEP_RES=linstor_db
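(I compared those files between the two nodes with diff over ssh rather than just eyeballing them, along the lines of:

diff <(ssh virtual1 cat /etc/drbd-reactor.d/linstor_db.toml) \
     <(ssh virtual5 cat /etc/drbd-reactor.d/linstor_db.toml)

and the same for the two systemd units - no differences.)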
Any clues as to what else I can look for? Thanks!
P.S. A restart of drbd-reactor doesn’t change the system status, nor does it change the output from drbd-reactorctl.
root@virtual5:~# systemctl restart drbd-reactor
root@virtual5:~# systemctl status | head -2
● virtual5
State: degraded
root@virtual5:~# drbd-reactorctl status linstor_db
/etc/drbd-reactor.d/linstor_db.toml:
Promoter: Currently active on node 'virtual1'
○ drbd-services@linstor_db.target
× ├─ drbd-promote@linstor_db.service
○ ├─ var-lib-linstor.mount
○ └─ linstor-controller.service
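P.P.S. I know I could clear the failed unit by hand with

systemctl reset-failed drbd-promote@linstor_db.service

which should put systemctl status back to “running”, but that only hides the symptom, so I’ve left it failed for now.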