I rebuilt my 3-node Proxmox cluster with the LINSTOR GUI, following the instructions in the blog. This time I set up the controller in HA mode on all three nodes as described there.
So far this works: if I take a node offline, another controller takes over. However, I cannot get the virtual IP (VIP) to work, and I don't understand where it is failing. The solutions in the forum (here, here or here) do not lead to the desired result.
Are there any other dependencies I have overlooked besides the DRBD promoter? When I try to start drbd-reactor, I see the following errors:
drbd-reactor[3170]: WARN [drbd_reactor::plugin::promoter] Starting 'linstor_db' failed: Return code not status success
drbd-reactor[3170]: INFO [drbd_reactor::plugin::promoter] stop_actions (could trigger failure actions (e.g., reboot)): linstor_db
drbd-reactor[3170]: INFO [drbd_reactor::plugin::promoter] stop_actions: stopping 'drbd-services@linstor_db.target'
drbd-reactor[3170]: INFO [drbd_reactor::plugin::promoter] systemd_stop: systemctl stop drbd-services@linstor_db.target
drbd-reactor[3170]: INFO [drbd_reactor::plugin::promoter] run: resource 'linstor_db' may promote after 0ms
drbd-reactor[3170]: INFO [drbd_reactor::plugin::promoter] systemd_start: systemctl start drbd-services@linstor_db.target
drbd-reactor[3170]: WARN [drbd_reactor::plugin::promoter] Starting 'linstor_db' failed: Return code not status success
drbd-reactor[3170]: INFO [drbd_reactor::plugin::promoter] stop_actions (could trigger failure actions (e.g., reboot)): linstor_db
drbd-reactor[3170]: INFO [drbd_reactor::plugin::promoter] stop_actions: stopping 'drbd-services@linstor_db.target'
drbd-reactor[3170]: INFO [drbd_reactor::plugin::promoter] systemd_stop: systemctl stop drbd-services@linstor_db.target
This does not help me much. Are there useful logs?
For the OCF agents you also need the resource-agents package installed; maybe that is missing?
Other than that I would:
systemctl stop drbd-reactor.service on all nodes
check that the DRBD resource is not Primary (drbdsetup status)
manually start the services one after another and see where things fail (e.g., systemctl start drbd-promote@linstor_db.service, check, systemctl start ocf.rs@service_ip_linstor_db.service, check, …), as sketched below
the first service where things fail is the one you need to take a closer look at
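Roughly, that manual sequence could look like this (a minimal sketch; the unit names are the ones from the linstor_db promoter setup in this thread, adjust if yours differ):

# stop the reactor on all nodes so it does not interfere
systemctl stop drbd-reactor.service

# the resource should be Secondary everywhere before you start
drbdsetup status linstor_db

# bring the pieces up one by one, checking after each step
systemctl start drbd-promote@linstor_db.service
systemctl status drbd-promote@linstor_db.service
systemctl start ocf.rs@service_ip_linstor_db.service
systemctl status ocf.rs@service_ip_linstor_db.service
systemctl start var-lib-linstor.mount
systemctl start linstor-controller.service

The first unit that refuses to start is the one to dig into with journalctl -u <unit>.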
resource-agents was not installed. I don't know exactly how I was supposed to know this; I can't find anything about it in the docs. I'm just starting to get into the topic, so maybe the documentation is simply not aimed at people like me.
Edit: found this:
I was able to start drbd-reactor.service normally. However, systemctl start ocf.rs@service_ip_linstor_db.service fails with the following error:
× ocf.rs@service_ip_linstor_db.service - drbd-reactor controlled ocf.rs@service_ip_linstor_db
Loaded: loaded (/lib/systemd/system/ocf.rs@.service; static)
Drop-In: /run/systemd/system/ocf.rs@service_ip_linstor_db.service.d
└─reactor.conf
Active: failed (Result: exit-code) since Thu 2025-04-24 09:41:06 CEST; 3s ago
Process: 6169 ExecStart=/usr/libexec/drbd-reactor/ocf-rs-wrapper start-and-monitor ocf.rs@service_ip_linstor_db.service (code=exited, status=6)
Process: 6211 ExecStopPost=/usr/libexec/drbd-reactor/ocf-rs-wrapper stop ocf.rs@service_ip_linstor_db.service (code=exited, status=6)
Main PID: 6169 (code=exited, status=6)
Status: "IPaddr2:service_ip_linstor_db,s-a-m,start: FAILED with exit code 6"
CPU: 36ms
systemd[1]: ocf.rs@service_ip_linstor_db.service: Scheduled restart job, restart counter is at 5.
systemd[1]: Stopped ocf.rs@service_ip_linstor_db.service - drbd-reactor controlled ocf.rs@service_ip_linstor_db.
systemd[1]: ocf.rs@service_ip_linstor_db.service: Start request repeated too quickly.
systemd[1]: ocf.rs@service_ip_linstor_db.service: Failed with result 'exit-code'.
systemd[1]: Failed to start ocf.rs@service_ip_linstor_db.service - drbd-reactor controlled ocf.rs@service_ip_linstor_db.
When running drbd-reactorctl status --verbose linstor_db.toml, I get the following errors:
root@pve01:~# drbd-reactorctl status --verbose linstor_db.toml
/etc/drbd-reactor.d/linstor_db.toml:
Promoter: Currently active on this node
○ drbd-services@linstor_db.target - Services for DRBD resource linstor_db
Loaded: loaded (/lib/systemd/system/drbd-services@.target; static)
Drop-In: /run/systemd/system/drbd-services@linstor_db.target.d
└─reactor-50-before.conf, reactor.conf
Active: inactive (dead)
Docs: man:drbd-services@.target(7)
Apr 24 09:37:18 pve01 systemd[1]: Dependency failed for drbd-services@linstor_db.target - Services for DRBD resource linstor_db.
Apr 24 09:37:18 pve01 systemd[1]: drbd-services@linstor_db.target: Job drbd-services@linstor_db.target/start failed with result 'dependency'.
Apr 24 09:37:38 pve01 systemd[1]: Dependency failed for drbd-services@linstor_db.target - Services for DRBD resource linstor_db.
Apr 24 09:37:38 pve01 systemd[1]: drbd-services@linstor_db.target: Job drbd-services@linstor_db.target/start failed with result 'dependency'.
Apr 24 09:37:58 pve01 systemd[1]: Dependency failed for drbd-services@linstor_db.target - Services for DRBD resource linstor_db.
Apr 24 09:37:58 pve01 systemd[1]: drbd-services@linstor_db.target: Job drbd-services@linstor_db.target/start failed with result 'dependency'.
Apr 24 09:38:19 pve01 systemd[1]: Dependency failed for drbd-services@linstor_db.target - Services for DRBD resource linstor_db.
Apr 24 09:38:19 pve01 systemd[1]: drbd-services@linstor_db.target: Job drbd-services@linstor_db.target/start failed with result 'dependency'.
Apr 24 09:39:04 pve01 systemd[1]: Dependency failed for drbd-services@linstor_db.target - Services for DRBD resource linstor_db.
Apr 24 09:39:04 pve01 systemd[1]: drbd-services@linstor_db.target: Job drbd-services@linstor_db.target/start failed with result 'dependency'.
● drbd-promote@linstor_db.service - Promotion of DRBD resource linstor_db
Loaded: loaded (/lib/systemd/system/drbd-promote@.service; static)
Drop-In: /run/systemd/system/drbd-promote@linstor_db.service.d
└─reactor.conf
Active: active (exited) since Thu 2025-04-24 09:39:50 CEST; 15min ago
Docs: man:drbd-promote@.service
Process: 5665 ExecStart=/usr/lib/drbd/scripts/drbd-service-shim.sh primary linstor_db (code=exited, status=0/SUCCESS)
Main PID: 5665 (code=exited, status=0/SUCCESS)
CPU: 1ms
Apr 24 09:39:50 pve01 systemd[1]: Starting drbd-promote@linstor_db.service - Promotion of DRBD resource linstor_db...
Apr 24 09:39:50 pve01 systemd[1]: Finished drbd-promote@linstor_db.service - Promotion of DRBD resource linstor_db.
× ocf.rs@service_ip_linstor_db.service - drbd-reactor controlled ocf.rs@service_ip_linstor_db
Loaded: loaded (/lib/systemd/system/ocf.rs@.service; static)
Drop-In: /run/systemd/system/ocf.rs@service_ip_linstor_db.service.d
└─reactor.conf
Active: failed (Result: exit-code) since Thu 2025-04-24 09:55:43 CEST; 3s ago
Process: 8736 ExecStart=/usr/libexec/drbd-reactor/ocf-rs-wrapper start-and-monitor ocf.rs@service_ip_linstor_db.service (code=exited, status=6)
Process: 8778 ExecStopPost=/usr/libexec/drbd-reactor/ocf-rs-wrapper stop ocf.rs@service_ip_linstor_db.service (code=exited, status=6)
Main PID: 8736 (code=exited, status=6)
Status: "IPaddr2:service_ip_linstor_db,s-a-m,start: FAILED with exit code 6"
CPU: 37ms
Apr 24 09:55:43 pve01 systemd[1]: ocf.rs@service_ip_linstor_db.service: Scheduled restart job, restart counter is at 5.
Apr 24 09:55:43 pve01 systemd[1]: Stopped ocf.rs@service_ip_linstor_db.service - drbd-reactor controlled ocf.rs@service_ip_linstor_db.
Apr 24 09:55:43 pve01 systemd[1]: ocf.rs@service_ip_linstor_db.service: Start request repeated too quickly.
Apr 24 09:55:43 pve01 systemd[1]: ocf.rs@service_ip_linstor_db.service: Failed with result 'exit-code'.
Apr 24 09:55:43 pve01 systemd[1]: Failed to start ocf.rs@service_ip_linstor_db.service - drbd-reactor controlled ocf.rs@service_ip_linstor_db.
○ var-lib-linstor.mount - drbd-reactor controlled var-lib-linstor
Loaded: loaded (/etc/systemd/system/var-lib-linstor.mount; static)
Drop-In: /run/systemd/system/var-lib-linstor.mount.d
└─reactor-50-mount.conf, reactor.conf
Active: inactive (dead)
Where: /var/lib/linstor
What: /dev/drbd/by-res/linstor_db/0
Apr 24 09:37:18 pve01 systemd[1]: Dependency failed for var-lib-linstor.mount - drbd-reactor controlled var-lib-linstor.
Apr 24 09:37:18 pve01 systemd[1]: var-lib-linstor.mount: Job var-lib-linstor.mount/start failed with result 'dependency'.
Apr 24 09:37:38 pve01 systemd[1]: Dependency failed for var-lib-linstor.mount - drbd-reactor controlled var-lib-linstor.
Apr 24 09:37:38 pve01 systemd[1]: var-lib-linstor.mount: Job var-lib-linstor.mount/start failed with result 'dependency'.
Apr 24 09:37:58 pve01 systemd[1]: Dependency failed for var-lib-linstor.mount - drbd-reactor controlled var-lib-linstor.
Apr 24 09:37:58 pve01 systemd[1]: var-lib-linstor.mount: Job var-lib-linstor.mount/start failed with result 'dependency'.
Apr 24 09:38:19 pve01 systemd[1]: Dependency failed for var-lib-linstor.mount - drbd-reactor controlled var-lib-linstor.
Apr 24 09:38:19 pve01 systemd[1]: var-lib-linstor.mount: Job var-lib-linstor.mount/start failed with result 'dependency'.
Apr 24 09:39:04 pve01 systemd[1]: Dependency failed for var-lib-linstor.mount - drbd-reactor controlled var-lib-linstor.
Apr 24 09:39:04 pve01 systemd[1]: var-lib-linstor.mount: Job var-lib-linstor.mount/start failed with result 'dependency'.
○ linstor-controller.service - drbd-reactor controlled linstor-controller
Loaded: loaded (/lib/systemd/system/linstor-controller.service; disabled; preset: enabled)
Drop-In: /run/systemd/system/linstor-controller.service.d
└─reactor.conf
Active: inactive (dead)
Apr 24 09:37:18 pve01 systemd[1]: Dependency failed for linstor-controller.service - drbd-reactor controlled linstor-controller.
Apr 24 09:37:18 pve01 systemd[1]: linstor-controller.service: Job linstor-controller.service/start failed with result 'dependency'.
Apr 24 09:37:38 pve01 systemd[1]: Dependency failed for linstor-controller.service - drbd-reactor controlled linstor-controller.
Apr 24 09:37:38 pve01 systemd[1]: linstor-controller.service: Job linstor-controller.service/start failed with result 'dependency'.
Apr 24 09:37:58 pve01 systemd[1]: Dependency failed for linstor-controller.service - drbd-reactor controlled linstor-controller.
Apr 24 09:37:58 pve01 systemd[1]: linstor-controller.service: Job linstor-controller.service/start failed with result 'dependency'.
Apr 24 09:38:19 pve01 systemd[1]: Dependency failed for linstor-controller.service - drbd-reactor controlled linstor-controller.
Apr 24 09:38:19 pve01 systemd[1]: linstor-controller.service: Job linstor-controller.service/start failed with result 'dependency'.
Apr 24 09:39:04 pve01 systemd[1]: Dependency failed for linstor-controller.service - drbd-reactor controlled linstor-controller.
Apr 24 09:39:04 pve01 systemd[1]: linstor-controller.service: Job linstor-controller.service/start failed with result 'dependency'.
What are these other dependencies? Can I somehow find out about them?
In the meantime, I found your debug info page. Since ocf.rs@service_ip_linstor_db.service is not starting, I tried systemctl list-dependencies ocf.rs@service_ip_linstor_db.service:
OCF agents are expected in /usr/lib/ocf/resource.d/. Please make sure to check for resource-agents packages provided by your distribution or use the packages provided by LINBIT (customers only).
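To double-check that on a node, something like this should show whether the agent is actually in place (a minimal sketch; paths as in the quote above):

# is the Debian package installed?
dpkg -l resource-agents
# does the IPaddr2 agent exist where the OCF wrapper expects it?
ls -l /usr/lib/ocf/resource.d/heartbeat/IPaddr2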
These are not package dependencies, these are systemd dependencies: every service in the start = [] list depends on the one before it. So linstor-controller.service depends on var-lib-linstor.mount, which depends on ocf.rs@, and so on down the chain. IIRC current Debian ships broken resource agents; you can replace the one script with the one from the upstream resource-agents. IIRC we reported that to the Debian people, but it might be that they did not fix it.
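To make that chain visible, and to get a more useful error out of the agent than 'exit code 6', you can ask systemd for the generated drop-ins and dependencies and run the agent by hand; a rough sketch, with placeholder IP parameters you would replace with the values from your linstor_db.toml:

# show the drop-ins drbd-reactor generated and the resulting dependency chain
systemctl cat ocf.rs@service_ip_linstor_db.service
systemctl list-dependencies drbd-services@linstor_db.target

# run the IPaddr2 agent directly; in OCF terms exit code 6 is OCF_ERR_CONFIGURED,
# i.e. the agent rejected its configuration (192.0.2.10/24 is just a placeholder)
OCF_ROOT=/usr/lib/ocf \
OCF_RESKEY_ip=192.0.2.10 \
OCF_RESKEY_cidr_netmask=24 \
/usr/lib/ocf/resource.d/heartbeat/IPaddr2 validate-all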
Also FWIW: I would not use a resource agent in this case at all. There are use cases where one wants it, but in most cases it is better to just have the LINSTOR DB and controller up on whatever node and configure multiple IPs (the ones of the potential controllers) in the linstor-proxmox plugin configuration.
Thank you @rck, I appreciate your support and time.
I probably just expected it to be a little more beginner-friendly.
This was my plan B, but I think I read somewhere that Proxmox might have issues with it (a changing controller IP). I have already reconfigured, and the services are up and running.
Just for people who might stumble upon this thread later: no, that is not true. Configuring multiple IPs in the plugin is how 99% of our users do it, FLOSS users as well as customers. The code in the plugin simply tries to connect to the configured IPs, the first one that answers wins, and the reactor makes sure that there is only one controller at a time. Simple as that, no magic.
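For reference, that multi-IP setup is just a comma-separated controller list in the Proxmox storage definition, roughly like this (a hedged sketch; storage name, resource group and IPs are placeholders, so double-check the option names against the current linstor-proxmox docs):

# /etc/pve/storage.cfg
drbd: linstor-storage
        content images, rootdir
        controller 192.168.10.11,192.168.10.12,192.168.10.13
        resourcegroup pve-rg

The plugin then walks that list and talks to whichever controller answers.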
One thing: the only downside of not using a virtual IP address is that you sometimes have to determine where the LINSTOR controller is currently running. For example, which IP address do you need to use to log in to the LINSTOR GUI?
If you never want to ask yourself that question, the virtual IP address approach is slightly more complex to configure but might be a little more elegant to manage day-to-day in your environment.
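If you skip the VIP, a quick way to answer that question is to ask DRBD or the reactor which node currently holds the linstor_db resource (a small sketch, using commands that already appear in this thread):

# the node where linstor_db is Primary is the one running the controller
drbdadm status linstor_db

# or, on a given node, check whether the reactor has the target active there
drbd-reactorctl status linstor_db.toml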