VIP not working

I would be thankful for your help.

I rebuilt my 3-node Proxmox cluster with Linstor GUI and followed the instructions in the blog. This time I installed the controller as HA on all three nodes according to the instructions.

So far, this works. If I take a node offline, another controller takes over. However, I am currently failing to get the virtual IP (VIP) to run and I don’t understand where it is failing. The solutions in the forum (here, here or here) do not lead to the desired result.

Are there any other dependencies I have overlooked besides DRBD Promoter? When I try to start drbd-reactor, I see the following errors:

drbd-reactor[3170]: WARN [drbd_reactor::plugin::promoter] Starting 'linstor_db' failed: Return code not status success
drbd-reactor[3170]: INFO [drbd_reactor::plugin::promoter] stop_actions (could trigger failure actions (e.g., reboot)): linstor_db
drbd-reactor[3170]: INFO [drbd_reactor::plugin::promoter] stop_actions: stopping 'drbd-services@linstor_db.target'
drbd-reactor[3170]: INFO [drbd_reactor::plugin::promoter] systemd_stop: systemctl stop drbd-services@linstor_db.target
drbd-reactor[3170]: INFO [drbd_reactor::plugin::promoter] run: resource 'linstor_db' may promote after 0ms
drbd-reactor[3170]: INFO [drbd_reactor::plugin::promoter] systemd_start: systemctl start drbd-services@linstor_db.target
drbd-reactor[3170]: WARN [drbd_reactor::plugin::promoter] Starting 'linstor_db' failed: Return code not status success
drbd-reactor[3170]: INFO [drbd_reactor::plugin::promoter] stop_actions (could trigger failure actions (e.g., reboot)): linstor_db
drbd-reactor[3170]: INFO [drbd_reactor::plugin::promoter] stop_actions: stopping 'drbd-services@linstor_db.target'
drbd-reactor[3170]: INFO [drbd_reactor::plugin::promoter] systemd_stop: systemctl stop drbd-services@linstor_db.target

This does not help me much. Are there useful logs?

My linstor_db.toml file looks like this:

[[promoter]]
id = "linstor_db"
[promoter.resources.linstor_db]
start = [
"ocf:heartbeat:IPaddr2 service_ip ip=10.5.80.10 cidr_netmask=24 iflabel=virtualip",
"var-lib-linstor.mount",
"linstor-controller.service"
]

drbd-reactorctl status shows:

Promoter: Currently active on <unknown>
○ drbd-services@linstor_db.target
○ ├─ drbd-promote@linstor_db.service
○ ├─ ocf.rs@service_ip_linstor_db.service
○ ├─ var-lib-linstor.mount
○ └─ linstor-controller.service

Do you have any suggestions for me?

for the ocf agents you also need the resource-agents package installed. maybe that is missing?

other than that I would:

  • systemctl stop drbd-reactor.service on all nodes
  • check that the DRBD resource is not primary (drbdsetup status)
  • manually start the services one after another and see where things fail (i.e., systemctl start drbd-promote@linstor_db.service, check, systemctl start ocf.rs@service_ip_linstor_db.service, check, …)

the first service where things fail is the one you need to take a closer look at
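The manual procedure above could be sketched as a small shell helper. This is only a sketch: `start_chain` and the `STARTER` variable are hypothetical names introduced for illustration, and the unit names in the usage note are the ones shown by drbd-reactorctl status earlier in the thread.

```shell
#!/bin/sh
# Start units one at a time and stop at the first one that fails.
# STARTER is a hypothetical override hook so the loop can be dry-run;
# by default it runs "systemctl start".
STARTER="${STARTER:-systemctl start}"

start_chain() {
    for unit in "$@"; do
        # $STARTER is intentionally unquoted so "systemctl start" splits into words
        if $STARTER "$unit"; then
            echo "ok: $unit"
        else
            echo "FAILED: $unit"
            return 1
        fi
    done
}
```

After stopping drbd-reactor.service on all nodes and confirming with drbdsetup status that the resource is not primary, it could be called as root as `start_chain drbd-promote@linstor_db.service ocf.rs@service_ip_linstor_db.service var-lib-linstor.mount linstor-controller.service`. The unit reported as FAILED is the one to inspect, for example with `journalctl -u` followed by that unit name.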

Thank you @rck.

resource-agents was not installed. I’m not sure where I was supposed to learn this; I can’t find anything about it in the docs. I’m just starting to get into the topic, so maybe the documentation is just not intended for people like me.

Edit: found this:

I was able to start drbd-reactor.service normally. However, systemctl start ocf.rs@service_ip_linstor_db.service fails with the following error:

× ocf.rs@service_ip_linstor_db.service - drbd-reactor controlled ocf.rs@service_ip_linstor_db
     Loaded: loaded (/lib/systemd/system/ocf.rs@.service; static)
    Drop-In: /run/systemd/system/ocf.rs@service_ip_linstor_db.service.d
             └─reactor.conf
     Active: failed (Result: exit-code) since Thu 2025-04-24 09:41:06 CEST; 3s ago
    Process: 6169 ExecStart=/usr/libexec/drbd-reactor/ocf-rs-wrapper start-and-monitor ocf.rs@service_ip_linstor_db.service (code=exited, status=6)
    Process: 6211 ExecStopPost=/usr/libexec/drbd-reactor/ocf-rs-wrapper stop ocf.rs@service_ip_linstor_db.service (code=exited, status=6)
   Main PID: 6169 (code=exited, status=6)
     Status: "IPaddr2:service_ip_linstor_db,s-a-m,start: FAILED with exit code 6"
        CPU: 36ms

systemd[1]: ocf.rs@service_ip_linstor_db.service: Scheduled restart job, restart counter is at 5.
systemd[1]: Stopped ocf.rs@service_ip_linstor_db.service - drbd-reactor controlled ocf.rs@service_ip_linstor_db.
systemd[1]: ocf.rs@service_ip_linstor_db.service: Start request repeated too quickly.
systemd[1]: ocf.rs@service_ip_linstor_db.service: Failed with result 'exit-code'.
systemd[1]: Failed to start ocf.rs@service_ip_linstor_db.service - drbd-reactor controlled ocf.rs@service_ip_linstor_db.
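A note for reading the status line above: the OCF resource agent API defines a fixed set of return codes. Assuming the wrapper reports the agent's own return code, 6 corresponds to OCF_ERR_CONFIGURED (the agent considers its configuration invalid on this node), which is different from 5, OCF_ERR_INSTALLED (a required tool is missing). The lookup below is a minimal sketch; `ocf_exit_name` is a hypothetical helper, not part of drbd-reactor or resource-agents.

```shell
#!/bin/sh
# Map an OCF resource agent return code to its symbolic name.
# Only the basic codes 0-7 are covered in this sketch.
ocf_exit_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        *) echo "unknown ($1)" ;;
    esac
}
```

Assuming the agent lives at /usr/lib/ocf/resource.d/heartbeat/IPaddr2, it can usually also be run by hand, with its parameters passed as OCF_RESKEY_ environment variables (for example OCF_ROOT=/usr/lib/ocf OCF_RESKEY_ip=10.5.80.10 OCF_RESKEY_cidr_netmask=24 OCF_RESKEY_iflabel=virtualip in front of the agent path, followed by the validate-all action). That tends to print the reason for the failure directly.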

When running drbd-reactorctl status --verbose linstor_db.toml, I get the following errors:

root@pve01:~# drbd-reactorctl status --verbose linstor_db.toml
/etc/drbd-reactor.d/linstor_db.toml:
Promoter: Currently active on this node
○ drbd-services@linstor_db.target - Services for DRBD resource linstor_db
     Loaded: loaded (/lib/systemd/system/drbd-services@.target; static)
    Drop-In: /run/systemd/system/drbd-services@linstor_db.target.d
             └─reactor-50-before.conf, reactor.conf
     Active: inactive (dead)
       Docs: man:drbd-services@.target(7)

Apr 24 09:37:18 pve01 systemd[1]: Dependency failed for drbd-services@linstor_db.target - Services for DRBD resource linstor_db.
Apr 24 09:37:18 pve01 systemd[1]: drbd-services@linstor_db.target: Job drbd-services@linstor_db.target/start failed with result 'dependency'.
Apr 24 09:37:38 pve01 systemd[1]: Dependency failed for drbd-services@linstor_db.target - Services for DRBD resource linstor_db.
Apr 24 09:37:38 pve01 systemd[1]: drbd-services@linstor_db.target: Job drbd-services@linstor_db.target/start failed with result 'dependency'.
Apr 24 09:37:58 pve01 systemd[1]: Dependency failed for drbd-services@linstor_db.target - Services for DRBD resource linstor_db.
Apr 24 09:37:58 pve01 systemd[1]: drbd-services@linstor_db.target: Job drbd-services@linstor_db.target/start failed with result 'dependency'.
Apr 24 09:38:19 pve01 systemd[1]: Dependency failed for drbd-services@linstor_db.target - Services for DRBD resource linstor_db.
Apr 24 09:38:19 pve01 systemd[1]: drbd-services@linstor_db.target: Job drbd-services@linstor_db.target/start failed with result 'dependency'.
Apr 24 09:39:04 pve01 systemd[1]: Dependency failed for drbd-services@linstor_db.target - Services for DRBD resource linstor_db.
Apr 24 09:39:04 pve01 systemd[1]: drbd-services@linstor_db.target: Job drbd-services@linstor_db.target/start failed with result 'dependency'.
● drbd-promote@linstor_db.service - Promotion of DRBD resource linstor_db
     Loaded: loaded (/lib/systemd/system/drbd-promote@.service; static)
    Drop-In: /run/systemd/system/drbd-promote@linstor_db.service.d
             └─reactor.conf
     Active: active (exited) since Thu 2025-04-24 09:39:50 CEST; 15min ago
       Docs: man:drbd-promote@.service
    Process: 5665 ExecStart=/usr/lib/drbd/scripts/drbd-service-shim.sh primary linstor_db (code=exited, status=0/SUCCESS)
   Main PID: 5665 (code=exited, status=0/SUCCESS)
        CPU: 1ms

Apr 24 09:39:50 pve01 systemd[1]: Starting drbd-promote@linstor_db.service - Promotion of DRBD resource linstor_db...
Apr 24 09:39:50 pve01 systemd[1]: Finished drbd-promote@linstor_db.service - Promotion of DRBD resource linstor_db.
× ocf.rs@service_ip_linstor_db.service - drbd-reactor controlled ocf.rs@service_ip_linstor_db
     Loaded: loaded (/lib/systemd/system/ocf.rs@.service; static)
    Drop-In: /run/systemd/system/ocf.rs@service_ip_linstor_db.service.d
             └─reactor.conf
     Active: failed (Result: exit-code) since Thu 2025-04-24 09:55:43 CEST; 3s ago
    Process: 8736 ExecStart=/usr/libexec/drbd-reactor/ocf-rs-wrapper start-and-monitor ocf.rs@service_ip_linstor_db.service (code=exited, status=6)
    Process: 8778 ExecStopPost=/usr/libexec/drbd-reactor/ocf-rs-wrapper stop ocf.rs@service_ip_linstor_db.service (code=exited, status=6)
   Main PID: 8736 (code=exited, status=6)
     Status: "IPaddr2:service_ip_linstor_db,s-a-m,start: FAILED with exit code 6"
        CPU: 37ms

Apr 24 09:55:43 pve01 systemd[1]: ocf.rs@service_ip_linstor_db.service: Scheduled restart job, restart counter is at 5.
Apr 24 09:55:43 pve01 systemd[1]: Stopped ocf.rs@service_ip_linstor_db.service - drbd-reactor controlled ocf.rs@service_ip_linstor_db.
Apr 24 09:55:43 pve01 systemd[1]: ocf.rs@service_ip_linstor_db.service: Start request repeated too quickly.
Apr 24 09:55:43 pve01 systemd[1]: ocf.rs@service_ip_linstor_db.service: Failed with result 'exit-code'.
Apr 24 09:55:43 pve01 systemd[1]: Failed to start ocf.rs@service_ip_linstor_db.service - drbd-reactor controlled ocf.rs@service_ip_linstor_db.
○ var-lib-linstor.mount - drbd-reactor controlled var-lib-linstor
     Loaded: loaded (/etc/systemd/system/var-lib-linstor.mount; static)
    Drop-In: /run/systemd/system/var-lib-linstor.mount.d
             └─reactor-50-mount.conf, reactor.conf
     Active: inactive (dead)
      Where: /var/lib/linstor
       What: /dev/drbd/by-res/linstor_db/0

Apr 24 09:37:18 pve01 systemd[1]: Dependency failed for var-lib-linstor.mount - drbd-reactor controlled var-lib-linstor.
Apr 24 09:37:18 pve01 systemd[1]: var-lib-linstor.mount: Job var-lib-linstor.mount/start failed with result 'dependency'.
Apr 24 09:37:38 pve01 systemd[1]: Dependency failed for var-lib-linstor.mount - drbd-reactor controlled var-lib-linstor.
Apr 24 09:37:38 pve01 systemd[1]: var-lib-linstor.mount: Job var-lib-linstor.mount/start failed with result 'dependency'.
Apr 24 09:37:58 pve01 systemd[1]: Dependency failed for var-lib-linstor.mount - drbd-reactor controlled var-lib-linstor.
Apr 24 09:37:58 pve01 systemd[1]: var-lib-linstor.mount: Job var-lib-linstor.mount/start failed with result 'dependency'.
Apr 24 09:38:19 pve01 systemd[1]: Dependency failed for var-lib-linstor.mount - drbd-reactor controlled var-lib-linstor.
Apr 24 09:38:19 pve01 systemd[1]: var-lib-linstor.mount: Job var-lib-linstor.mount/start failed with result 'dependency'.
Apr 24 09:39:04 pve01 systemd[1]: Dependency failed for var-lib-linstor.mount - drbd-reactor controlled var-lib-linstor.
Apr 24 09:39:04 pve01 systemd[1]: var-lib-linstor.mount: Job var-lib-linstor.mount/start failed with result 'dependency'.
○ linstor-controller.service - drbd-reactor controlled linstor-controller
     Loaded: loaded (/lib/systemd/system/linstor-controller.service; disabled; preset: enabled)
    Drop-In: /run/systemd/system/linstor-controller.service.d
             └─reactor.conf
     Active: inactive (dead)

Apr 24 09:37:18 pve01 systemd[1]: Dependency failed for linstor-controller.service - drbd-reactor controlled linstor-controller.
Apr 24 09:37:18 pve01 systemd[1]: linstor-controller.service: Job linstor-controller.service/start failed with result 'dependency'.
Apr 24 09:37:38 pve01 systemd[1]: Dependency failed for linstor-controller.service - drbd-reactor controlled linstor-controller.
Apr 24 09:37:38 pve01 systemd[1]: linstor-controller.service: Job linstor-controller.service/start failed with result 'dependency'.
Apr 24 09:37:58 pve01 systemd[1]: Dependency failed for linstor-controller.service - drbd-reactor controlled linstor-controller.
Apr 24 09:37:58 pve01 systemd[1]: linstor-controller.service: Job linstor-controller.service/start failed with result 'dependency'.
Apr 24 09:38:19 pve01 systemd[1]: Dependency failed for linstor-controller.service - drbd-reactor controlled linstor-controller.
Apr 24 09:38:19 pve01 systemd[1]: linstor-controller.service: Job linstor-controller.service/start failed with result 'dependency'.
Apr 24 09:39:04 pve01 systemd[1]: Dependency failed for linstor-controller.service - drbd-reactor controlled linstor-controller.
Apr 24 09:39:04 pve01 systemd[1]: linstor-controller.service: Job linstor-controller.service/start failed with result 'dependency'.

What are these other dependencies? Can I somehow find out about them?

In the meantime, I found your debug info page. Since ocf.rs@service_ip_linstor_db.service is not starting, I tried systemctl list-dependencies ocf.rs@service_ip_linstor_db.service:

root@pve01:~# systemctl list-dependencies ocf.rs@service_ip_linstor_db.service
ocf.rs@service_ip_linstor_db.service
● ├─drbd-promote@linstor_db.service
● ├─system-ocf.rs.slice
● └─sysinit.target
●   ├─apparmor.service
●   ├─blk-availability.service
●   ├─dev-hugepages.mount
●   ├─dev-mqueue.mount
●   ├─keyboard-setup.service
●   ├─kmod-static-nodes.service
●   ├─lvm2-lvmpolld.socket
●   ├─lvm2-monitor.service
○   ├─open-iscsi.service
●   ├─proc-sys-fs-binfmt_misc.automount
●   ├─pvenetcommit.service
●   ├─sys-fs-fuse-connections.mount
●   ├─sys-kernel-config.mount
●   ├─sys-kernel-debug.mount
●   ├─sys-kernel-tracing.mount
●   ├─systemd-ask-password-console.path
●   ├─systemd-binfmt.service
○   ├─systemd-boot-system-token.service
○   ├─systemd-firstboot.service
●   ├─systemd-journal-flush.service
●   ├─systemd-journald.service
○   ├─systemd-machine-id-commit.service
●   ├─systemd-modules-load.service
○   ├─systemd-pcrphase-sysinit.service
○   ├─systemd-pcrphase.service
○   ├─systemd-pstore.service
●   ├─systemd-random-seed.service
○   ├─systemd-repart.service
●   ├─systemd-sysctl.service
●   ├─systemd-sysusers.service
●   ├─systemd-tmpfiles-setup-dev.service
●   ├─systemd-tmpfiles-setup.service
●   ├─systemd-udev-trigger.service
●   ├─systemd-udevd.service
●   ├─systemd-update-utmp.service
●   ├─cryptsetup.target
●   ├─integritysetup.target
●   ├─local-fs.target
●   │ ├─-.mount
●   │ ├─boot-efi.mount
○   │ ├─systemd-fsck-root.service
●   │ └─systemd-remount-fs.service
●   ├─swap.target
●   │ └─dev-pve-swap.swap
●   └─veritysetup.target

There are obviously some dependencies that are not met, but I’m not sure if I need all of them?

hm, it is right in the documentation drbd-reactor/doc/promoter.md at master · LINBIT/drbd-reactor · GitHub

OCF agents are expected in /usr/lib/ocf/resource.d/. Please make sure to check for resource-agents packages provided by your distribution or use the packages provided by LINBIT (customers only).
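A quick way to check this on a node could look like the sketch below. `have_agent` and the `OCF_RESOURCE_DIR` variable are hypothetical names for illustration; the default directory is the one quoted from the documentation, and the provider/agent split follows the ocf:heartbeat:IPaddr2 entry in the start list.

```shell
#!/bin/sh
# Directory where OCF agents are expected, as quoted from promoter.md.
OCF_RESOURCE_DIR="${OCF_RESOURCE_DIR:-/usr/lib/ocf/resource.d}"

# Succeeds if the agent script exists and is executable.
# $1 = provider (e.g. heartbeat), $2 = agent name (e.g. IPaddr2)
have_agent() {
    [ -x "$OCF_RESOURCE_DIR/$1/$2" ]
}
```

For the configuration in this thread the call would be `have_agent heartbeat IPaddr2 && echo present || echo missing`. On Debian-based systems such as Proxmox, `dpkg -s resource-agents` additionally reports whether the package itself is installed.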

:person_shrugging:

these are not package dependencies, these are systemd dependencies. every service in the start = [] list depends on the one before it: linstor-controller.service depends on var-lib-linstor.mount, which depends on ocf.rs@, and so on down the chain. IIRC current debian ships broken resource agents. You can replace the one script with the one from the upstream resource-agents. IIRC we reported that to the Debian people, might be that they did not fix it.
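To make the ordering concrete, the sketch below prints the chain implied by a start list. `print_chain` is a hypothetical helper for illustration only. It assumes, as the list-dependencies output earlier in the thread suggests, that the first entry hangs off drbd-promote@ for the resource.

```shell
#!/bin/sh
# Print the dependency chain implied by a promoter start list.
# $1 = DRBD resource name, remaining arguments = units in start-list order.
print_chain() {
    prev="drbd-promote@$1.service"
    shift
    for unit in "$@"; do
        echo "$unit depends on $prev"
        prev="$unit"
    done
}
```

Calling `print_chain linstor_db ocf.rs@service_ip_linstor_db.service var-lib-linstor.mount linstor-controller.service` shows why a failing ocf.rs@ unit leaves the mount and the controller with "Dependency failed". On a real node, `systemctl show -p Requires -p After var-lib-linstor.mount` shows what systemd actually recorded for a unit.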

Also FWIW: I would not use a resource agent in this case at all. There are use cases where one wants it, but in most cases it is better to just have the linstor DB and controller up on whatever node and configure multiple IPs (the ones of the potential controllers) in the linstor-proxmox plugin configuration.

Thank you @rck, I appreciate your support and time.

I probably just expected it to be a little more foolproof. :laughing:

This was my plan B, but I think I read somewhere that Proxmox might have issues with it (changing controller IP). I have already reconfigured and services are up and running.

Thank you again, also for your explanations.

just for people that might stumble upon that thread later: no. not true. configuring multiple IPs in the plugin is how 99% of our users do it, FLOSS users as well as customers. The code in the plugin just tries to connect to the configured IPs, the first that answers wins, reactor makes sure that there is only one at a time. simple as that, no magic.
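The "first that answers wins" idea can be sketched in shell, which is also handy for finding out by hand which node currently runs the controller. This is not the plugin's actual code. `first_responding`, `probe_controller`, and the `PROBE` variable are hypothetical names, and the probe assumes the controller's REST API listens on the default HTTP port 3370.

```shell
#!/bin/sh
# Default probe: ask the LINSTOR REST API on the given IP for its version.
# Assumes the default HTTP port 3370; adjust if the controller is configured differently.
probe_controller() {
    curl --silent --fail --max-time 2 --output /dev/null \
        "http://$1:3370/v1/controller/version"
}

# PROBE is a hypothetical override hook, mainly useful for dry runs.
PROBE="${PROBE:-probe_controller}"

# Print the first IP whose probe succeeds; return 1 if none answers.
first_responding() {
    for ip in "$@"; do
        if $PROBE "$ip"; then
            echo "$ip"
            return 0
        fi
    done
    return 1
}
```

With placeholder addresses (not the ones from this cluster), `first_responding 192.0.2.11 192.0.2.12 192.0.2.13` prints the address of the node where the controller answered. Because reactor keeps only one controller active at a time, at most one of the candidates should respond.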

One thing: the only downside of not using a virtual IP address is that you sometimes have to determine where the LINSTOR controller is running. For example, which IP address do you need to use to log in to the LINSTOR GUI?

If you never want to ask yourself that question, the virtual IP address approach is slightly more complex to configure, but might be a little more elegant to manage day-to-day in your environment.

This is true. But for me, it is not worth the complexity, to be honest.
