How exactly are tiebreakers set?

So I have a 2+1 PVE cluster, and I was trying to set up HA using LINSTOR. I plodded through the guide and followed every step I found relevant, from installation to the HA controller. I had some trouble setting up the third node on my orangepi because it’s not a standard device, but I managed. However, there’s no “TieBreaker” showing up in any of the resources.

I tried manually setting “DrbdOptions/auto-add-quorum-tiebreaker true” on both resource groups and on the resources (as I understand it, the Proxmox plugin basically treats resource groups as resource definitions?), with no luck.
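For reference, the property was set roughly like this (from memory, so the exact invocations may have differed slightly):

linstor resource-group set-property linstor-db-grp DrbdOptions/auto-add-quorum-tiebreaker true
linstor resource-group set-property octavius DrbdOptions/auto-add-quorum-tiebreaker true
linstor resource-definition set-property linstor_db DrbdOptions/auto-add-quorum-tiebreaker true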

I tried manually creating a diskless storage pool on orangepi and including it, and got a warning along the lines of “diskless wrongfully configured for diskful”, which made me think that’s not the right way. I also tried not setting storage pools on the resource groups at all, since I only have 2 SSDs with the same pool name and I wasn’t sure how the inclusion works. And I did have “DisklessOnRemaining yes”. Also no luck.
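If it matters, the diskless pool (the orangepi_diskless entry in the listing below) was created with something like:

linstor storage-pool create diskless orangepi orangepi_diskless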

I struggled hard to find detailed documentation on the “tiebreaker”. I found a lot of explanations and examples, which are great, but none explained how to actually set it, as if it just magically works. I believe it’s designed that way, and based on the discussions I’ve seen the magic does work for most users; it’s just that I’m the muggle among wizards.

Because I only have 2+1 nodes, I believe that without a tiebreaker the HA controller and HA VMs won’t work at all, so please help me out here. I’ll share the settings I think are relevant; please advise me or point me to a more detailed doc on how to set this up.

linstor node list
╭─────────────────────────────────────────────────────────╮
┊ Node      ┊ NodeType  ┊ Addresses              ┊ State  ┊
╞═════════════════════════════════════════════════════════╡
┊ orangepi  ┊ SATELLITE ┊ 10.0.0.3:3366 (PLAIN)  ┊ Online ┊
┊ pve       ┊ SATELLITE ┊ 10.0.0.5:3366 (PLAIN)  ┊ Online ┊
┊ serverpve ┊ SATELLITE ┊ 10.0.0.10:3366 (PLAIN) ┊ Online ┊
╰─────────────────────────────────────────────────────────╯
linstor node info
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node      ┊ Diskless ┊ LVM ┊ LVMThin ┊ ZFS/Thin ┊ File/Thin ┊ SPDK ┊ EXOS ┊ Remote SPDK ┊ Storage Spaces ┊ Storage Spaces/Thin ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ orangepi  ┊ +        ┊ -   ┊ +       ┊ -        ┊ +         ┊ -    ┊ -    ┊ +           ┊ -              ┊ -                   ┊
┊ pve       ┊ +        ┊ +   ┊ +       ┊ +        ┊ +         ┊ -    ┊ -    ┊ +           ┊ -              ┊ -                   ┊
┊ serverpve ┊ +        ┊ +   ┊ +       ┊ +        ┊ +         ┊ -    ┊ -    ┊ +           ┊ -              ┊ -                   ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

╭────────────────────────────────────────────────────────────────────────╮
┊ Node      ┊ DRBD ┊ LUKS ┊ NVMe ┊ Cache ┊ BCache ┊ WriteCache ┊ Storage ┊
╞════════════════════════════════════════════════════════════════════════╡
┊ orangepi  ┊ +    ┊ -    ┊ -    ┊ +     ┊ -      ┊ +          ┊ +       ┊
┊ pve       ┊ +    ┊ +    ┊ +    ┊ +     ┊ -      ┊ +          ┊ +       ┊
┊ serverpve ┊ +    ┊ +    ┊ +    ┊ +     ┊ -      ┊ +          ┊ +       ┊
╰────────────────────────────────────────────────────────────────────────╯
linstor storage-pool list
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool          ┊ Node      ┊ Driver   ┊ PoolName                            ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊ SharedName                     ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ orangepi  ┊ DISKLESS ┊                                     ┊              ┊               ┊ False        ┊ Ok    ┊ orangepi;DfltDisklessStorPool  ┊
┊ DfltDisklessStorPool ┊ pve       ┊ DISKLESS ┊                                     ┊              ┊               ┊ False        ┊ Ok    ┊ pve;DfltDisklessStorPool       ┊
┊ DfltDisklessStorPool ┊ serverpve ┊ DISKLESS ┊                                     ┊              ┊               ┊ False        ┊ Ok    ┊ serverpve;DfltDisklessStorPool ┊
┊ orangepi_diskless    ┊ orangepi  ┊ DISKLESS ┊                                     ┊              ┊               ┊ False        ┊ Ok    ┊ orangepi;orangepi_diskless     ┊
┊ ssd_thin_pools       ┊ pve       ┊ LVM_THIN ┊ linstor-pve/linstor-pve             ┊   390.01 GiB ┊    438.07 GiB ┊ True         ┊ Ok    ┊ pve;ssd_thin_pools             ┊
┊ ssd_thin_pools       ┊ serverpve ┊ LVM_THIN ┊ linstor-serverpve/linstor-serverpve ┊   390.01 GiB ┊    438.07 GiB ┊ True         ┊ Ok    ┊ serverpve;ssd_thin_pools       ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
linstor resource-group list
╭───────────────────────────────────────────────────────────────────╮
┊ ResourceGroup  ┊ SelectFilter              ┊ VlmNrs ┊ Description ┊
╞═══════════════════════════════════════════════════════════════════╡
┊ DfltRscGrp     ┊ PlaceCount: 2             ┊        ┊             ┊
╞┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄╡
┊ linstor-db-grp ┊ PlaceCount: 2             ┊ 0      ┊             ┊
┊                ┊ DisklessOnRemaining: True ┊        ┊             ┊
╞┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄╡
┊ octavius       ┊ PlaceCount: 2             ┊ 0      ┊             ┊
┊                ┊ DisklessOnRemaining: True ┊        ┊             ┊
╰───────────────────────────────────────────────────────────────────╯
linstor resource-group list-properties linstor-db-grp
╭──────────────────────────────────────────────────────────────────────╮
┊ Key                                                ┊ Value           ┊
╞══════════════════════════════════════════════════════════════════════╡
┊ Aux/importance                                     ┊ high            ┊
┊ DrbdOptions/Resource/auto-promote                  ┊ no              ┊
┊ DrbdOptions/Resource/on-no-data-accessible         ┊ io-error        ┊
┊ DrbdOptions/Resource/on-no-quorum                  ┊ io-error        ┊
┊ DrbdOptions/Resource/on-suspended-primary-outdated ┊ force-secondary ┊
┊ DrbdOptions/Resource/quorum                        ┊ majority        ┊
┊ DrbdOptions/auto-add-quorum-tiebreaker             ┊ true            ┊
┊ Internal/Drbd/QuorumSetBy                          ┊ user            ┊
╰──────────────────────────────────────────────────────────────────────╯
linstor resource-group list-properties octavius
╭─────────────────────────────────────────────────────────╮
┊ Key                                        ┊ Value      ┊
╞═════════════════════════════════════════════════════════╡
┊ DrbdOptions/Resource/auto-promote          ┊ yes        ┊
┊ DrbdOptions/Resource/on-no-data-accessible ┊ suspend-io ┊
┊ DrbdOptions/Resource/on-no-quorum          ┊ suspend-io ┊
┊ DrbdOptions/Resource/quorum                ┊ majority   ┊
┊ DrbdOptions/auto-add-quorum-tiebreaker     ┊ true       ┊
┊ Internal/Drbd/QuorumSetBy                  ┊ user       ┊
╰─────────────────────────────────────────────────────────╯
linstor resource list
╭───────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node      ┊ Layers       ┊ Usage  ┊ Conns ┊    State ┊ CreatedOn           ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════╡
┊ linstor_db   ┊ pve       ┊ DRBD,STORAGE ┊ InUse  ┊ Ok    ┊ UpToDate ┊ 2025-05-08 15:33:40 ┊
┊ linstor_db   ┊ serverpve ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2025-05-08 15:33:45 ┊
┊ pm-058983c2  ┊ pve       ┊ DRBD,STORAGE ┊ InUse  ┊ Ok    ┊ UpToDate ┊ 2025-05-09 13:20:45 ┊
┊ pm-058983c2  ┊ serverpve ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2025-05-09 13:20:46 ┊
┊ pm-72434027  ┊ pve       ┊ DRBD,STORAGE ┊ InUse  ┊ Ok    ┊ UpToDate ┊ 2025-05-09 13:41:21 ┊
┊ pm-72434027  ┊ serverpve ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2025-05-09 13:41:22 ┊
╰───────────────────────────────────────────────────────────────────────────────────────────╯
linstor c lp | grep quorum
| DrbdOptions/Resource/on-no-quorum         | io-error                                               |
| DrbdOptions/auto-add-quorum-tiebreaker    | True                                                   |

Please let me know if there is anything else I should share.

Also, I’m not sure whether this is relevant or not, but my third device is an “Orange Pi Zero3”, an arm64 board that’s not very common and runs bullseye, so I had to use some packages from the bookworm repo to make the versions match. I got them here and there from under this source, “Index of /public”.

At first it worked, but it showed “-” for DRBD, which I understood to be bad if I want it to be the tiebreaker, so I dug around under “amd64” a bit to find the DKMS package (Architecture: all). The board wasn’t powerful enough to compile it, so I had to learn to cross-compile for the first time and got drbd.ko/handshake.ko/drbd_transport_tcp.ko. It seems to be working, but I’m not so sure.
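For what it’s worth, here is roughly how I sanity-checked the cross-compiled module on the orangepi (I’m not sure this is conclusive):

modinfo drbd | grep -i version   # metadata of the cross-compiled drbd.ko
cat /proc/drbd                   # version line of the currently loaded DRBD module
drbdadm --version                # userspace vs. kernel module versions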

Please kindly help or advise or point me somewhere, thank you!

So sadly, right after posting the question and before I got any replies, I already suffered from the lack of HA (or at least that’s how I understand it). The story goes like this:

Between 21:00 and 21:30 CST I was preparing info for the post, and I’m sure I had all the nodes and resources shown in the OP. About half an hour later I wanted to check the properties again, and found that I could only see the linstor_db resource; the other two (which were QEMU disks) were lost. I checked with all the commands I know and it was as if they had never existed. However, the VMs were still running (and I remember reading in the docs that data won’t be lost even if the controller is down ---- shoutout to the team!). So naturally I panicked and turned to an AI to ask for ways to recover. It diagnosed the situation and decided that somehow the resource definitions were lost, and that I could re-create the definitions and recover the “disks”. I tried that on one VM and made it go down, so even worse.

Luckily I had backups, so there was no real harm, but I had to find out why, so I turned to another AI. In the end it looked like a hardware issue: the controller node went down briefly due to a NIC issue and resumed right after, but in the meantime it hit I/O errors and possibly lost the data for some of the resource definitions.

If I had had a working tiebreaker, would it have gone better? I don’t know; I’d say I would have had better chances. If you have any thoughts on this, or on how to avoid it in the future, please share.

So I extracted some relevant logs, but I don’t know if I can upload the file, so I’ll append them as code at the end for anyone who’s interested. If you see anything that I missed or misunderstood, any advice, or anything on the original tiebreaker question, please share. Thanks.

Here’s a short summary of the incident from the AI:

  1. Trigger: Physical NIC failed on the active LINSTOR controller node.
  2. Impact: Caused PVE cluster/DRBD network disruption. The DRBD resource for LINSTOR’s DB (linstor_db) lost quorum.
  3. Critical Failure: As HA mechanisms shut down the LINSTOR controller and unmounted its DB filesystem (on DRBD), severe filesystem I/O errors occurred due to the unstable underlying DRBD device.
  4. Outcome: LINSTOR controller’s database got corrupted during these write failures, leading to lost resource definitions upon service restart.
  5. Root Cause: Hardware (NIC) failure led to critical I/O errors on the DRBD-backed LINSTOR database during shutdown, corrupting the metadata.

Guess the takeaway today is: always have backups, and the more nodes the better.

Relevant logs:

May 09 21:32:01 pve Controller[420583]: 2025-05-09 21:32:01.546 [grizzly-http-server-5] INFO  LINSTOR/Controller/4c945d SYSTEM - REST/API RestClient(10.0.0.5; 'linstor-proxmox/8.1.1')/QryAllSizeInfo
May 09 21:32:59 pve Controller[420583]: 2025-05-09 21:32:59.422 [grizzly-http-server-4] INFO  LINSTOR/Controller/bf3987 SYSTEM - REST/API RestClient(10.0.0.10; 'linstor-proxmox/8.1.1')/QryAllSizeInfo
May 09 21:33:01 pve Controller[420583]: 2025-05-09 21:33:01.374 [grizzly-http-server-1] INFO  LINSTOR/Controller/4a4a1e SYSTEM - REST/API RestClient(10.0.0.5; 'linstor-proxmox/8.1.1')/QryAllSizeInfo
May 09 21:34:01 pve Controller[420583]: 2025-05-09 21:34:01.180 [grizzly-http-server-0] INFO  LINSTOR/Controller/662aaa SYSTEM - REST/API RestClient(10.0.0.5; 'linstor-proxmox/8.1.1')/QryAllSizeInfo
May 09 21:34:05 pve Controller[420583]: 2025-05-09 21:34:05.977 [grizzly-http-server-5] INFO  LINSTOR/Controller/fd80d0 SYSTEM - REST/API RestClient(10.0.0.10; 'linstor-proxmox/8.1.1')/QryAllSizeInfo
May 09 21:34:12 pve Controller[420583]: 2025-05-09 21:34:12.924 [grizzly-http-server-7] INFO  LINSTOR/Controller/308bd0 SYSTEM - REST/API RestClient(127.0.0.1; 'PythonLinstor/1.25.3 (API1.0.4): Client 1.25.4')/LstVlm
May 09 21:34:22 pve Controller[420583]: 2025-05-09 21:34:22.141 [grizzly-http-server-0] INFO  LINSTOR/Controller/662aaa SYSTEM - REST/API RestClient(127.0.0.1; 'PythonLinstor/1.25.3 (API1.0.4): Client 1.25.4')/LstRscDfn
May 09 21:35:01 pve Controller[420583]: 2025-05-09 21:35:01.930 [grizzly-http-server-5] INFO  LINSTOR/Controller/48d041 SYSTEM - REST/API RestClient(10.0.0.5; 'linstor-proxmox/8.1.1')/QryAllSizeInfo
May 09 21:35:12 pve Controller[420583]: 2025-05-09 21:35:12.849 [grizzly-http-server-4] INFO  LINSTOR/Controller/c03883 SYSTEM - REST/API RestClient(10.0.0.10; 'linstor-proxmox/8.1.1')/QryAllSizeInfo
May 09 21:35:18 pve Controller[420583]: 2025-05-09 21:35:18.288 [Thread-2] INFO  LINSTOR/Controller/fb7eae SYSTEM - Shutdown in progress
May 09 21:35:18 pve Controller[420583]: 2025-05-09 21:35:18.288 [Thread-2] INFO  LINSTOR/Controller/fb7eae SYSTEM - Shutting down service instance 'GrizzlyHttpServer' of type Grizzly-HTTP-Server
May 09 21:35:18 pve systemd[1]: Stopping linstor-controller.service - drbd-reactor controlled linstor-controller...
May 09 21:35:18 pve Controller[420583]: 2025-05-09 21:35:18.305 [Thread-2] INFO  LINSTOR/Controller/fb7eae SYSTEM - Waiting for service instance 'GrizzlyHttpServer' to complete shutdown
May 09 21:35:18 pve Controller[420583]: 2025-05-09 21:35:18.307 [Thread-2] INFO  LINSTOR/Controller/fb7eae SYSTEM - Shutting down service instance 'EbsStatusPoll' of type EbsStatusPoll
May 09 21:35:18 pve Controller[420583]: 2025-05-09 21:35:18.308 [Thread-2] INFO  LINSTOR/Controller/fb7eae SYSTEM - Waiting for service instance 'EbsStatusPoll' to complete shutdown
May 09 21:35:18 pve Controller[420583]: 2025-05-09 21:35:18.308 [Thread-2] INFO  LINSTOR/Controller/fb7eae SYSTEM - Shutting down service instance 'ScheduleBackupService' of type ScheduleBackupService
May 09 21:35:18 pve Controller[420583]: 2025-05-09 21:35:18.308 [Thread-2] INFO  LINSTOR/Controller/fb7eae SYSTEM - Waiting for service instance 'ScheduleBackupService' to complete shutdown
May 09 21:35:18 pve Controller[420583]: 2025-05-09 21:35:18.308 [Thread-2] INFO  LINSTOR/Controller/fb7eae SYSTEM - Shutting down service instance 'SpaceTrackingService' of type SpaceTrackingService
May 09 21:35:18 pve Controller[420583]: 2025-05-09 21:35:18.308 [Thread-2] INFO  LINSTOR/Controller/fb7eae SYSTEM - Waiting for service instance 'SpaceTrackingService' to complete shutdown
May 09 21:35:18 pve Controller[420583]: 2025-05-09 21:35:18.310 [Thread-2] INFO  LINSTOR/Controller/fb7eae SYSTEM - Shutting down service instance 'TaskScheduleService' of type TaskScheduleService
May 09 21:35:18 pve Controller[420583]: 2025-05-09 21:35:18.310 [Thread-2] INFO  LINSTOR/Controller/fb7eae SYSTEM - Waiting for service instance 'TaskScheduleService' to complete shutdown
May 09 21:35:18 pve Controller[420583]: 2025-05-09 21:35:18.310 [Thread-2] INFO  LINSTOR/Controller/fb7eae SYSTEM - Shutting down service instance 'DatabaseService' of type DatabaseService
May 09 21:35:18 pve Controller[420583]: 2025-05-09 21:35:18.311 [Thread-2] INFO  LINSTOR/Controller/fb7eae SYSTEM - Waiting for service instance 'DatabaseService' to complete shutdown
May 09 21:35:18 pve Controller[420583]: 2025-05-09 21:35:18.311 [Thread-2] INFO  LINSTOR/Controller/fb7eae SYSTEM - Shutting down service instance 'TimerEventService' of type TimerEventService
May 09 21:35:18 pve Controller[420583]: 2025-05-09 21:35:18.311 [Thread-2] INFO  LINSTOR/Controller/fb7eae SYSTEM - Waiting for service instance 'TimerEventService' to complete shutdown
May 09 21:35:18 pve Controller[420583]: 2025-05-09 21:35:18.311 [Thread-2] INFO  LINSTOR/Controller/fb7eae SYSTEM - Shutdown complete
May 09 21:35:18 pve systemd[1]: linstor-controller.service: Deactivated successfully.
May 09 21:35:18 pve systemd[1]: Stopped linstor-controller.service - drbd-reactor controlled linstor-controller.
May 09 21:35:18 pve systemd[1]: linstor-controller.service: Consumed 3min 50.918s CPU time.
May 09 21:35:18 pve systemd[1]: Starting linstor-controller.service - drbd-reactor controlled linstor-controller...
May 09 21:35:19 pve Controller[584132]: LINSTOR, Module Controller
May 09 21:35:19 pve Controller[584132]: Version:            1.31.0 (a187af5c85a96bb27df87a5eab0bcf9dd6de6a34)
May 09 21:35:19 pve Controller[584132]: Build time:         2025-04-08T07:51:57+00:00 Log v2
May 09 21:35:19 pve Controller[584132]: Java Version:       17
May 09 21:35:19 pve Controller[584132]: Java VM:            Debian, Version 17.0.15+6-Debian-1deb12u1
May 09 21:35:19 pve Controller[584132]: Operating system:   Linux, Version 6.8.12-9-pve
May 09 21:35:19 pve Controller[584132]: Environment:        amd64, 4 processors, 8192 MiB memory reserved for allocations
May 09 21:35:19 pve Controller[584132]: System components initialization in progress
May 09 21:35:19 pve Controller[584132]: Loading configuration file "/etc/linstor/linstor.toml"
May 09 21:35:20 pve Controller[584132]: 21:35:19,961 |-INFO in ch.qos.logback.classic.LoggerContext[default] - This is logback-classic version 1.3.8
May 09 21:35:20 pve Controller[584132]: 21:35:20,033 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback-test.xml]
May 09 21:35:20 pve Controller[584132]: 21:35:20,034 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback.xml] at [file:/usr/share/linstor-server/lib/conf/logback.xml]
May 09 21:35:20 pve Controller[584132]: 21:35:20,304 |-INFO in ch.qos.logback.core.model.processor.AppenderModelHandler - Processing appender named [STDOUT]
May 09 21:35:20 pve Controller[584132]: 21:35:20,304 |-INFO in ch.qos.logback.core.model.processor.AppenderModelHandler - About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]
May 09 21:35:20 pve Controller[584132]: 21:35:20,321 |-INFO in ch.qos.logback.core.model.processor.ImplicitModelHandler - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property
May 09 21:35:20 pve Controller[584132]: 21:35:20,381 |-WARN in ch.qos.logback.core.model.processor.AppenderModelHandler - Appender named [FILE] not referenced. Skipping further processing.
May 09 21:35:20 pve Controller[584132]: 21:35:20,382 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [okhttp3] to OFF
May 09 21:35:20 pve Controller[584132]: 21:35:20,382 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [org.apache.http] to INFO
May 09 21:35:20 pve Controller[584132]: 21:35:20,382 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [cron] to INFO
May 09 21:35:20 pve Controller[584132]: 21:35:20,382 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [LINSTOR/Controller] to INFO
May 09 21:35:20 pve Controller[584132]: 21:35:20,382 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting additivity of logger [LINSTOR/Controller] to false
May 09 21:35:20 pve Controller[584132]: 21:35:20,382 |-INFO in ch.qos.logback.core.model.processor.AppenderRefModelHandler - Attaching appender named [STDOUT] to Logger[LINSTOR/Controller]
May 09 21:35:20 pve Controller[584132]: 21:35:20,383 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [LINSTOR/Satellite] to INFO
May 09 21:35:20 pve Controller[584132]: 21:35:20,383 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting additivity of logger [LINSTOR/Satellite] to false
May 09 21:35:20 pve Controller[584132]: 21:35:20,383 |-INFO in ch.qos.logback.core.model.processor.AppenderRefModelHandler - Attaching appender named [STDOUT] to Logger[LINSTOR/Satellite]
May 09 21:35:20 pve Controller[584132]: 21:35:20,383 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [LINSTOR/TESTS] to OFF
May 09 21:35:20 pve Controller[584132]: 21:35:20,383 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting additivity of logger [LINSTOR/TESTS] to false
May 09 21:35:20 pve Controller[584132]: 21:35:20,383 |-INFO in ch.qos.logback.core.model.processor.AppenderRefModelHandler - Attaching appender named [STDOUT] to Logger[LINSTOR/TESTS]
May 09 21:35:20 pve Controller[584132]: 21:35:20,384 |-INFO in ch.qos.logback.classic.model.processor.RootLoggerModelHandler - Setting level of ROOT logger to INFO
May 09 21:35:20 pve Controller[584132]: 21:35:20,384 |-INFO in ch.qos.logback.core.model.processor.AppenderRefModelHandler - Attaching appender named [STDOUT] to Logger[ROOT]
May 09 21:35:20 pve Controller[584132]: 21:35:20,384 |-INFO in ch.qos.logback.core.model.processor.DefaultProcessor@10163d6 - End of configuration.
May 09 21:35:20 pve Controller[584132]: 21:35:20,386 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@2dde1bff - Registering current configuration as safe fallback point
May 09 21:35:21 pve Controller[584132]: 2025-05-09 21:35:21.157 [main] INFO  LINSTOR/Controller/ffffff SYSTEM - ErrorReporter DB version 1 found.
May 09 21:35:21 pve Controller[584132]: 2025-05-09 21:35:21.161 [main] INFO  LINSTOR/Controller/ffffff SYSTEM - Log directory set to: '/var/log/linstor-controller'
May 09 21:35:21 pve Controller[584132]: 2025-05-09 21:35:21.254 [main] INFO  LINSTOR/Controller/ffffff SYSTEM - Database type is SQL
May 09 21:35:21 pve Controller[584132]: 2025-05-09 21:35:21.255 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Loading API classes started.
May 09 21:35:22 pve Controller[584132]: 2025-05-09 21:35:22.331 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - API classes loading finished: 1076ms
May 09 21:35:22 pve Controller[584132]: 2025-05-09 21:35:22.331 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Dependency injection started.
May 09 21:35:22 pve Controller[584132]: 2025-05-09 21:35:22.372 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Attempting dynamic load of extension module "com.linbit.linstor.modularcrypto.FipsCryptoModule"
May 09 21:35:22 pve Controller[584132]: 2025-05-09 21:35:22.373 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Extension module "com.linbit.linstor.modularcrypto.FipsCryptoModule" is not installed
May 09 21:35:22 pve Controller[584132]: 2025-05-09 21:35:22.373 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Attempting dynamic load of extension module "com.linbit.linstor.modularcrypto.JclCryptoModule"
May 09 21:35:22 pve Controller[584132]: 2025-05-09 21:35:22.406 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Dynamic load of extension module "com.linbit.linstor.modularcrypto.JclCryptoModule" was successful
May 09 21:35:22 pve Controller[584132]: 2025-05-09 21:35:22.407 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Attempting dynamic load of extension module "com.linbit.linstor.spacetracking.ControllerSpaceTrackingModule"
May 09 21:35:22 pve Controller[584132]: 2025-05-09 21:35:22.408 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Dynamic load of extension module "com.linbit.linstor.spacetracking.ControllerSpaceTrackingModule" was successful
May 09 21:35:25 pve Controller[584132]: 2025-05-09 21:35:25.139 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Dependency injection finished: 2808ms
May 09 21:35:25 pve Controller[584132]: 2025-05-09 21:35:25.140 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Cryptography provider: Using default cryptography module
May 09 21:35:25 pve Controller[584132]: 2025-05-09 21:35:25.851 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Initializing authentication subsystem
May 09 21:35:26 pve Controller[584132]: 2025-05-09 21:35:26.496 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - SpaceTrackingService: Instance added as a system service
May 09 21:35:26 pve Controller[584132]: 2025-05-09 21:35:26.498 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Starting service instance 'TimerEventService' of type TimerEventService
May 09 21:35:26 pve Controller[584132]: 2025-05-09 21:35:26.498 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Initializing the database connection pool
May 09 21:35:26 pve Controller[584132]: 2025-05-09 21:35:26.499 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - SQL database connection URL is "jdbc:h2:/var/lib/linstor/linstordb"
May 09 21:35:26 pve Controller[584132]: 2025-05-09 21:35:26.668 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - SQL database is H2
May 09 21:35:27 pve Controller[584132]: 2025-05-09 21:35:27.059 [Main] INFO  org.flywaydb.core.internal.license.VersionPrinter/ffffff Flyway Community Edition 7.15.0 by Redgate
May 09 21:35:27 pve Controller[584132]: 2025-05-09 21:35:27.061 [Main] INFO  org.flywaydb.core.internal.database.base.BaseDatabaseType/ffffff Database: jdbc:h2:/var/lib/linstor/linstordb (H2 1.4)
May 09 21:35:27 pve Controller[584132]: 2025-05-09 21:35:27.224 [Main] INFO  org.flywaydb.core.internal.command.DbValidate/ffffff Successfully validated 84 migrations (execution time 00:00.123s)
May 09 21:35:27 pve Controller[584132]: 2025-05-09 21:35:27.296 [Main] INFO  org.flywaydb.core.internal.command.DbMigrate/ffffff Current version of schema "LINSTOR": 2025.03.18.08.00
May 09 21:35:27 pve Controller[584132]: 2025-05-09 21:35:27.297 [Main] WARN  org.flywaydb.core.internal.command.DbMigrate/ffffff outOfOrder mode is active. Migration of schema "LINSTOR" may not be reproducible.
May 09 21:35:27 pve Controller[584132]: 2025-05-09 21:35:27.314 [Main] INFO  org.flywaydb.core.internal.command.DbMigrate/ffffff Schema "LINSTOR" is up to date. No migration necessary.
May 09 21:35:27 pve Controller[584132]: 2025-05-09 21:35:27.321 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Starting service instance 'DatabaseService' of type DatabaseService
May 09 21:35:27 pve Controller[584132]: 2025-05-09 21:35:27.327 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Security objects load from database is in progress
May 09 21:35:27 pve Controller[584132]: 2025-05-09 21:35:27.369 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Security objects load from database completed
May 09 21:35:27 pve Controller[584132]: 2025-05-09 21:35:27.369 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Core objects load from database is in progress
May 09 21:35:27 pve Controller[584132]: 2025-05-09 21:35:27.588 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Core objects load from database completed
May 09 21:35:27 pve Controller[584132]: 2025-05-09 21:35:27.590 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Starting service instance 'TaskScheduleService' of type TaskScheduleService
May 09 21:35:27 pve Controller[584132]: 2025-05-09 21:35:27.591 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Initializing network communications services
May 09 21:35:27 pve Controller[584132]: 2025-05-09 21:35:27.592 [Main] WARN  LINSTOR/Controller/ffffff SYSTEM - The SSL network communication service 'DebugSslConnector' could not be started because the keyStore file (/etc/linstor/ssl/keystore.jks) is missing
May 09 21:35:27 pve Controller[584132]: 2025-05-09 21:35:27.608 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Created network communication service 'PlainConnector'
May 09 21:35:27 pve Controller[584132]: 2025-05-09 21:35:27.608 [Main] WARN  LINSTOR/Controller/ffffff SYSTEM - The SSL network communication service 'SslConnector' could not be started because the keyStore file (/etc/linstor/ssl/keystore.jks) is missing
May 09 21:35:27 pve Controller[584132]: 2025-05-09 21:35:27.609 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Created network communication service 'SslConnector'
May 09 21:35:27 pve Controller[584132]: 2025-05-09 21:35:27.610 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Reconnecting to previously known nodes
May 09 21:35:27 pve Controller[584132]: 2025-05-09 21:35:27.613 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Establishing connection to node 'orangepi' via /10.0.0.3:3366 ...
May 09 21:35:27 pve Controller[584132]: 2025-05-09 21:35:27.648 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Establishing connection to node 'pve' via /10.0.0.5:3366 ...
May 09 21:35:27 pve Controller[584132]: 2025-05-09 21:35:27.651 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Establishing connection to node 'serverpve' via /10.0.0.10:3366 ...
May 09 21:35:27 pve Controller[584132]: 2025-05-09 21:35:27.652 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Reconnect requests sent
May 09 21:35:27 pve Controller[584132]: 2025-05-09 21:35:27.654 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Starting service instance 'SpaceTrackingService' of type SpaceTrackingService
May 09 21:35:27 pve Controller[584132]: 2025-05-09 21:35:27.655 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Starting service instance 'ScheduleBackupService' of type ScheduleBackupService
May 09 21:35:27 pve Controller[584132]: 2025-05-09 21:35:27.656 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Starting service instance 'EbsStatusPoll' of type EbsStatusPoll
May 09 21:35:28 pve Controller[584132]: May 09, 2025 9:35:28 PM org.glassfish.jersey.server.wadl.WadlFeature configure
May 09 21:35:28 pve Controller[584132]: WARNING: JAX-B API not found . WADL feature is disabled.
May 09 21:35:29 pve Controller[584132]: May 09, 2025 9:35:29 PM org.glassfish.grizzly.http.server.NetworkListener start
May 09 21:35:29 pve Controller[584132]: INFO: Started listener bound to [[::]:3370]
May 09 21:35:29 pve Controller[584132]: May 09, 2025 9:35:29 PM org.glassfish.grizzly.http.server.HttpServer start
May 09 21:35:29 pve Controller[584132]: INFO: [HttpServer] Started.
May 09 21:35:29 pve Controller[584132]: 2025-05-09 21:35:29.753 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Controller initialized
May 09 21:35:29 pve Controller[584132]: 2025-05-09 21:35:29.755 [SslConnector] INFO  LINSTOR/Controller/d77355 SYSTEM - Sending authentication to satellite 'orangepi'
May 09 21:35:29 pve systemd[1]: Started linstor-controller.service - drbd-reactor controlled linstor-controller.
May 09 21:35:29 pve Controller[584132]: 2025-05-09 21:35:29.926 [TaskScheduleService] INFO  LINSTOR/Controller/05949b SYSTEM - Establishing connection to node 'serverpve' via /10.0.0.10:3366 ...
May 09 21:35:29 pve Controller[584132]: 2025-05-09 21:35:29.930 [TaskScheduleService] INFO  LINSTOR/Controller/05949b SYSTEM - Establishing connection to node 'pve' via /10.0.0.5:3366 ...
May 09 21:35:29 pve Controller[584132]: 2025-05-09 21:35:29.934 [TaskScheduleService] INFO  LINSTOR/Controller/5716e6 SYSTEM - LogArchive: Running log archive on directory: /var/log/linstor-controller
May 09 21:35:29 pve Controller[584132]: 2025-05-09 21:35:29.938 [TaskScheduleService] INFO  LINSTOR/Controller/5716e6 SYSTEM - LogArchive: No logs to archive.
May 09 21:35:29 pve Controller[584132]: 2025-05-09 21:35:29.938 [TaskScheduleService] INFO  LINSTOR/Controller/a232f6 SYSTEM - BalanceResourcesTask/START
May 09 21:35:29 pve Controller[584132]: 2025-05-09 21:35:29.947 [SslConnector] INFO  LINSTOR/Controller/843f55 SYSTEM - Sending authentication to satellite 'serverpve'
May 09 21:35:29 pve Controller[584132]: 2025-05-09 21:35:29.949 [SslConnector] INFO  LINSTOR/Controller/ SYSTEM - Sending authentication to satellite 'pve'
May 09 21:35:29 pve Controller[584132]: 2025-05-09 21:35:29.958 [TaskScheduleService] INFO  LINSTOR/Controller/a232f6 SYSTEM - BalanceResourcesTask/END: Adjusted: 0 - Removed: 0
May 09 21:35:30 pve Controller[584132]: 2025-05-09 21:35:30.214 [MainWorkerPool-4] INFO  LINSTOR/Controller/000001 SYSTEM - Changing connection state of node 'pve' from OFFLINE -> CONNECTED
May 09 21:35:30 pve Controller[584132]: 2025-05-09 21:35:30.216 [MainWorkerPool-4] INFO  LINSTOR/Controller/000001 SYSTEM - Satellite 'pve' authenticated
May 09 21:35:30 pve Controller[584132]: 2025-05-09 21:35:30.221 [MainWorkerPool-1] INFO  LINSTOR/Controller/000001 SYSTEM - Changing connection state of node 'serverpve' from OFFLINE -> CONNECTED
May 09 21:35:30 pve Controller[584132]: 2025-05-09 21:35:30.223 [MainWorkerPool-1] INFO  LINSTOR/Controller/000001 SYSTEM - Satellite 'serverpve' authenticated
May 09 21:35:30 pve Controller[584132]: 2025-05-09 21:35:30.227 [MainWorkerPool-4] INFO  LINSTOR/Controller/000001 SYSTEM - Sending full sync to Node: 'pve'.
May 09 21:35:30 pve Controller[584132]: 2025-05-09 21:35:30.227 [MainWorkerPool-1] INFO  LINSTOR/Controller/000001 SYSTEM - Sending full sync to Node: 'serverpve'.
May 09 21:35:30 pve Controller[584132]: 2025-05-09 21:35:30.546 [MainWorkerPool-3] INFO  LINSTOR/Controller/000001 SYSTEM - Changing connection state of node 'orangepi' from OFFLINE -> CONNECTED
May 09 21:35:30 pve Controller[584132]: 2025-05-09 21:35:30.551 [MainWorkerPool-3] INFO  LINSTOR/Controller/000001 SYSTEM - Satellite 'orangepi' authenticated
May 09 21:35:30 pve Controller[584132]: 2025-05-09 21:35:30.554 [MainWorkerPool-3] INFO  LINSTOR/Controller/000001 SYSTEM - Sending full sync to Node: 'orangepi'.
May 09 21:35:31 pve Controller[584132]: 2025-05-09 21:35:31.071 [MainWorkerPool-1] WARN  LINSTOR/Controller/fb40c7 SYSTEM - Event update for unknown resource pm-058983c2 on node pve
May 09 21:35:31 pve Controller[584132]: 2025-05-09 21:35:31.174 [MainWorkerPool-3] WARN  LINSTOR/Controller/ee0980 SYSTEM - Event update for unknown resource pm-72434027 on node pve
May 09 21:35:31 pve Controller[584132]: 2025-05-09 21:35:31.182 [MainWorkerPool-4] INFO  LINSTOR/Controller/000002 SYSTEM - Changing connection state of node 'orangepi' from CONNECTED -> ONLINE
May 09 21:35:31 pve Controller[584132]: 2025-05-09 21:35:31.205 [MainWorkerPool-4] INFO  LINSTOR/Controller/000004 SYSTEM - Changing connection state of node 'pve' from CONNECTED -> ONLINE
May 09 21:35:31 pve Controller[584132]: 2025-05-09 21:35:31.232 [MainWorkerPool-2] WARN  LINSTOR/Controller/1642b0 SYSTEM - Event update for unknown resource pm-058983c2 on node serverpve
May 09 21:35:31 pve Controller[584132]: 2025-05-09 21:35:31.250 [MainWorkerPool-2] WARN  LINSTOR/Controller/64f744 SYSTEM - Event update for unknown resource pm-72434027 on node serverpve
May 09 21:35:31 pve Controller[584132]: 2025-05-09 21:35:31.337 [MainWorkerPool-3] INFO  LINSTOR/Controller/000004 SYSTEM - Changing connection state of node 'serverpve' from CONNECTED -> ONLINE
May 09 21:36:01 pve Controller[584132]: 2025-05-09 21:36:01.523 [grizzly-http-server-3] INFO  LINSTOR/Controller/28f901 SYSTEM - REST/API RestClient(10.0.0.5; 'linstor-proxmox/8.1.1')/QryAllSizeInfo
May 09 21:36:05 pve Controller[584132]: 2025-05-09 21:36:05.199 [grizzly-http-server-6] INFO  LINSTOR/Controller/5c1465 SYSTEM - REST/API RestClient(127.0.0.1; 'PythonLinstor/1.25.3 (API1.0.4): Client 1.25.4')/LstVlm
May 09 21:36:15 pve Controller[584132]: 2025-05-09 21:36:15.698 [grizzly-http-server-0] INFO  LINSTOR/Controller/ee088f SYSTEM - REST/API RestClient(10.0.0.10; 'linstor-proxmox/8.1.1')/QryAllSizeInfo
May 09 21:37:11 pve Controller[584132]: 2025-05-09 21:37:11.869 [grizzly-http-server-2] INFO  LINSTOR/Controller/e5b5bb SYSTEM - REST/API RestClient(10.0.0.5; 'linstor-proxmox/8.1.1')/QryAllSizeInfo
May 09 21:37:23 pve Controller[584132]: 2025-05-09 21:37:23.084 [grizzly-http-server-4] INFO  LINSTOR/Controller/5f774b SYSTEM - REST/API RestClient(10.0.0.10; 'linstor-proxmox/8.1.1')/QryAllSizeInfo
May 09 21:37:27 pve Controller[584132]: 2025-05-09 21:37:27.702 [MainWorkerPool-2] INFO  LINSTOR/Controller/000003 SYSTEM - Satellite orangepi reports a capacity of 0 kiB, allocated space 0 kiB, no errors
May 09 21:37:27 pve Controller[584132]: 2025-05-09 21:37:27.865 [MainWorkerPool-2] INFO  LINSTOR/Controller/000008 SYSTEM - Satellite pve reports a capacity of 459345920 kiB, allocated space 50390249 kiB, no errors
May 09 21:37:28 pve Controller[584132]: 2025-05-09 21:37:28.046 [MainWorkerPool-2] INFO  LINSTOR/Controller/000007 SYSTEM - Satellite serverpve reports a capacity of 459345920 kiB, allocated space 50390249 kiB, no errors
May 09 21:37:28 pve Controller[584132]: 2025-05-09 21:37:28.048 [SpaceTrackingService] INFO  LINSTOR/Controller/ SYSTEM - SpaceTracking: Aggregate capacity is 918691840 kiB, allocated space is 100780498 kiB, usable space is 204800 kiB
May 09 21:38:21 pve Controller[584132]: 2025-05-09 21:38:21.953 [grizzly-http-server-6] INFO  LINSTOR/Controller/b130db SYSTEM - REST/API RestClient(10.0.0.5; 'linstor-proxmox/8.1.1')/QryAllSizeInfo
May 09 21:38:36 pve Controller[584132]: 2025-05-09 21:38:36.453 [grizzly-http-server-0] INFO  LINSTOR/Controller/4601d5 SYSTEM - REST/API RestClient(10.0.0.10; 'linstor-proxmox/8.1.1')/QryAllSizeInfo
May 09 21:39:32 pve Controller[584132]: 2025-05-09 21:39:32.155 [grizzly-http-server-2] INFO  LINSTOR/Controller/57449a SYSTEM - REST/API RestClient(10.0.0.5; 'linstor-proxmox/8.1.1')/QryAllSizeInfo

Ooof, that’s quite the adventure. :grin: Everything looks healthy in the output you provided, though; I would expect this to be working correctly at this point.

I believe you have everything configured correctly; auto-add-quorum-tiebreaker should be all that is needed. One thing to realize, though, is that not all options and changes are applied retroactively. If you changed these options after creating your three resources, they will only apply to newly created resources. Have you tried creating a new resource since setting this option to true?

You only seem to have these options applied to the octavius and linstor-db-grp resource groups. I would verify that Proxmox is configured to use these in the /etc/pve/storage.cfg file.

As for your second post with the outage: I can only speculate, and without knowing exactly what happened I really cannot say. The logs you provided are only from the LINSTOR controller, but quorum is actually handled by DRBD. Also, it’s a cluster, so you’ll likely have to dig through the logs of the peer nodes as well to get the full story. Regardless, I would get the tiebreakers and the cluster working as you want first, then worry about further unexpected behaviors.
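If you do want to dig in, the DRBD-level quorum state can be checked directly on each node, for example (assuming DRBD 9; the quorum field only appears when quorum is enabled for the resource):

drbdsetup status linstor_db --verbose   # should include a quorum:yes/no field
journalctl -k | grep -i drbd            # kernel-level DRBD messages on each peer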

Thanks so much for replying! I believe I did try a few things suspecting the property doesn’t apply retroactively, but that was among many failures, so I had just forgotten about them.

So the drbd entry in my “/etc/pve/storage.cfg” looks like this:

drbd: octavius
	resourcegroup octavius
	content images,rootdir
	controller 10.0.0.5,10.0.0.10
	# exactsize yes

I’m not 100% sure whether the resources in the OP were created before or after I set auto-tiebreaker on the resource group, but I did just try creating another disk on “octavius” via Proxmox; to be precise, I created a new VM and put its disk on “octavius”, and no tiebreaker showed up:

linstor resource list
╭───────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node      ┊ Layers       ┊ Usage  ┊ Conns ┊    State ┊ CreatedOn           ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════╡
┊ linstor_db   ┊ pve       ┊ DRBD,STORAGE ┊ InUse  ┊ Ok    ┊ UpToDate ┊ 2025-05-08 15:33:40 ┊
┊ linstor_db   ┊ serverpve ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2025-05-08 15:33:45 ┊
┊ pm-ca77353a  ┊ pve       ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2025-05-10 07:18:27 ┊
┊ pm-ca77353a  ┊ serverpve ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2025-05-10 07:18:30 ┊
╰───────────────────────────────────────────────────────────────────────────────────────────╯
linstor resource-group list-properties octavius
╭─────────────────────────────────────────────────────────╮
┊ Key                                        ┊ Value      ┊
╞═════════════════════════════════════════════════════════╡
┊ DrbdOptions/Resource/auto-promote          ┊ no         ┊
┊ DrbdOptions/Resource/on-no-data-accessible ┊ suspend-io ┊
┊ DrbdOptions/Resource/on-no-quorum          ┊ suspend-io ┊
┊ DrbdOptions/Resource/quorum                ┊ majority   ┊
┊ DrbdOptions/auto-add-quorum-tiebreaker     ┊ true       ┊
┊ Internal/Drbd/QuorumSetBy                  ┊ user       ┊
╰─────────────────────────────────────────────────────────╯
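To double-check, I assume the owning group of the new definition can be confirmed like this (the ResourceGroup column should show “octavius” for pm-ca77353a):

linstor resource-definition list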

And somehow the VM failed to start:

blockdev: cannot open /dev/drbd/by-res/pm-ca77353a/0: Wrong medium type
kvm: -drive file=/dev/drbd/by-res/pm-ca77353a/0,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring,detect-zeroes=on: Could not open '/dev/drbd/by-res/pm-ca77353a/0': Wrong medium type
TASK ERROR: start failed: QEMU exited with code 1

I got “wrong medium type” yesterday when trying to move QEMU disks to LINSTOR, and solved it by setting auto-promote to yes, but I guess that property was also lost in the incident somehow. I went and set it back to yes, and the VM started without me having to re-create anything (so is this one applied retroactively? what’s the mechanism?).
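For reference, setting it back looked roughly like this:

linstor resource-group set-property octavius DrbdOptions/Resource/auto-promote yes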

Anyway, if I just spawn a resource I get a “Diskless”, but not a “TieBreaker” (I’m aware they’re functionally about the same, but “TieBreaker” looks prettier):

linstor resource-group spawn octavius testres 1GiB

SUCCESS:
    Volume definition with number '0' successfully  created in resource definition 'testres'.
SUCCESS:
Description:
    New resource definition 'testres' created.
Details:
    Resource definition 'testres' UUID is: 24c31208-3a09-4041-a78f-6da99bf07fa4
SUCCESS:
    Successfully set property key(s): StorPoolName
SUCCESS:
    Successfully set property key(s): StorPoolName
SUCCESS:
    Successfully set property key(s): StorPoolName
SUCCESS:
Description:
    Resource 'testres' successfully autoplaced on 2 nodes
Details:
    Used nodes (storage pool name): 'pve (ssd_thin_pools)', 'serverpve (ssd_thin_pools)'
INFO:
    Updated testres DRBD auto verify algorithm to 'crct10dif'
INFO:
    Resource-definition property 'DrbdOptions/Resource/quorum' updated from undefined to 'majority' by User
SUCCESS:
    (orangepi) Resource 'testres' [DRBD] adjusted.
SUCCESS:
    Created resource 'testres' on 'orangepi'
SUCCESS:
    (pve) Volume number 0 of resource 'testres' [LVM-Thin] created
SUCCESS:
    (pve) Resource 'testres' [DRBD] adjusted.
SUCCESS:
    Created resource 'testres' on 'pve'
SUCCESS:
    (serverpve) Volume number 0 of resource 'testres' [LVM-Thin] created
SUCCESS:
    (serverpve) Resource 'testres' [DRBD] adjusted.
SUCCESS:
    Created resource 'testres' on 'serverpve'
SUCCESS:
Description:
    Resource 'testres' on 'pve' ready
Details:
    Resource group: octavius
SUCCESS:
Description:
    Resource 'testres' on 'serverpve' ready
Details:
    Resource group: octavius
SUCCESS:
Description:
    Resource 'testres' on 'orangepi' ready
Details:
    Resource group: octavius
SUCCESS:
    (serverpve) Resource 'testres' [DRBD] adjusted.
SUCCESS:
    (pve) Resource 'testres' [DRBD] adjusted.
SUCCESS:
    (orangepi) Resource 'testres' [DRBD] adjusted.
linstor resource list
╭───────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node      ┊ Layers       ┊ Usage  ┊ Conns ┊    State ┊ CreatedOn           ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════╡
┊ linstor_db   ┊ pve       ┊ DRBD,STORAGE ┊ InUse  ┊ Ok    ┊ UpToDate ┊ 2025-05-08 15:33:40 ┊
┊ linstor_db   ┊ serverpve ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2025-05-08 15:33:45 ┊
┊ pm-ca77353a  ┊ pve       ┊ DRBD,STORAGE ┊ InUse  ┊ Ok    ┊ UpToDate ┊ 2025-05-10 07:18:27 ┊
┊ pm-ca77353a  ┊ serverpve ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2025-05-10 07:18:30 ┊
┊ testres      ┊ orangepi  ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ Diskless ┊ 2025-05-10 07:28:41 ┊
┊ testres      ┊ pve       ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2025-05-10 07:28:41 ┊
┊ testres      ┊ serverpve ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2025-05-10 07:28:44 ┊
╰───────────────────────────────────────────────────────────────────────────────────────────╯

So I guess my question is: why is the linstor-proxmox plugin not using the auto-tiebreaker property of “octavius”, and how do I get an auto-tiebreaker on resources that are already created, namely linstor_db here, for controller high availability? I might add more nodes in the future and I don’t want to have to manually add/delete a diskless resource every time.

Ah, I think the issue is right here :backhand_index_pointing_up: Specifically the DisklessOnRemaining: True bit. If it’s configured to put a diskless resource on every node that didn’t get a diskful replica, then you have no need for a tiebreaker, as the diskless nodes already fill that role.

To add a tiebreaker to the already created resources, you can create a diskless resource on the remaining node, then delete it while instructing LINSTOR to keep a tiebreaker around. It’s a bit of a hack, as there is no way to manually create a resource labeled as a tiebreaker. For example:

linstor resource create orangepi linstor_db --drbd-diskless
linstor resource delete orangepi linstor_db --keep-tiebreaker
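Afterwards, the tiebreaker should show up when listing the resources again (I believe it appears as “TieBreaker” in the resource list):

linstor resource list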

There should be no need to do this every time you add or remove a node.

Concerning the incident, I got some more logs. I’m including the commands I used to get them; for the satellite logs I had to extract the relevant time period, otherwise they would be too long. The other logs are not pruned; some show only a single second of activity in a 1.5-hour window, and that’s all I got.

On “serverpve”, which was acting as a satellite at the time:

sudo journalctl -k --since "2025-05-09 21:00:00" --until "2025-05-09 22:00:00" --no-pager | grep -i drbd
May 09 21:35:18 serverpve kernel: drbd linstor_db pve: PingAck did not arrive in time.
May 09 21:35:18 serverpve kernel: drbd linstor_db pve: conn( Connected -> NetworkFailure ) peer( Primary -> Unknown )
May 09 21:35:18 serverpve kernel: drbd linstor_db/0 drbd1000: disk( UpToDate -> Consistent ) quorum( yes -> no )
May 09 21:35:18 serverpve kernel: drbd linstor_db/0 drbd1000 pve: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
May 09 21:35:18 serverpve kernel: drbd linstor_db/0 drbd1000: Enabling local AL-updates
May 09 21:35:18 serverpve kernel: drbd linstor_db: Preparing cluster-wide state change 99509206: 1->all empty
May 09 21:35:18 serverpve kernel: drbd linstor_db: Committing cluster-wide state change 99509206 (0ms)
May 09 21:35:18 serverpve kernel: drbd linstor_db/0 drbd1000: disk( Consistent -> UpToDate ) [lost-peer]
May 09 21:35:18 serverpve kernel: drbd linstor_db pve: Terminating sender thread
May 09 21:35:18 serverpve kernel: drbd linstor_db pve: Starting sender thread (peer-node-id 0)
May 09 21:35:18 serverpve kernel: drbd linstor_db pve: Connection closed
May 09 21:35:18 serverpve kernel: drbd linstor_db pve: helper command: /sbin/drbdadm disconnected
May 09 21:35:18 serverpve kernel: drbd linstor_db pve: helper command: /sbin/drbdadm disconnected exit code 0
May 09 21:35:18 serverpve kernel: drbd linstor_db pve: conn( NetworkFailure -> Unconnected ) [disconnected]
May 09 21:35:18 serverpve kernel: drbd linstor_db pve: Restarting receiver thread
May 09 21:35:18 serverpve kernel: drbd linstor_db pve: conn( Unconnected -> Connecting ) [connecting]
May 09 21:35:18 serverpve kernel: drbd linstor_db pve: Handshake to peer 0 successful: Agreed network protocol version 122
May 09 21:35:18 serverpve kernel: drbd linstor_db pve: Feature flags enabled on protocol level: 0x7f TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES RESYNC_DAGTAG
May 09 21:35:18 serverpve kernel: drbd linstor_db pve: Peer authenticated using 20 bytes HMAC
May 09 21:35:18 serverpve kernel: drbd linstor_db: Preparing remote state change 1999569790: 0->1 role( Secondary ) conn( Connected )
May 09 21:35:18 serverpve kernel: drbd linstor_db/0 drbd1000 pve: drbd_sync_handshake:
May 09 21:35:18 serverpve kernel: drbd linstor_db/0 drbd1000 pve: self D3CCD55D4FDEC14A:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:120
May 09 21:35:18 serverpve kernel: drbd linstor_db/0 drbd1000 pve: peer D3CCD55D4FDEC14A:0000000000000000:E85792A90B23D21E:0000000000000000 bits:0 flags:1520
May 09 21:35:18 serverpve kernel: drbd linstor_db/0 drbd1000 pve: uuid_compare()=no-sync by rule=reconnected
May 09 21:35:18 serverpve kernel: drbd linstor_db pve: Committing remote state change 1999569790 (primary_nodes=0)
May 09 21:35:18 serverpve kernel: drbd linstor_db pve: conn( Connecting -> Connected ) peer( Unknown -> Secondary ) [remote]
May 09 21:35:18 serverpve kernel: drbd linstor_db/0 drbd1000: quorum( no -> yes ) [remote]
May 09 21:35:18 serverpve kernel: drbd linstor_db/0 drbd1000 pve: pdsk( DUnknown -> UpToDate ) repl( Off -> Established ) [remote]
May 09 21:35:18 serverpve kernel: drbd linstor_db/0 drbd1000 pve: cleared bm UUID and bitmap D3CCD55D4FDEC14A:0000000000000000:0000000000000000:0000000000000000
May 09 21:35:18 serverpve kernel: drbd linstor_db: Preparing remote state change 2496847919: 0->all role( Primary )
May 09 21:35:18 serverpve kernel: drbd linstor_db pve: Committing remote state change 2496847919 (primary_nodes=1)
May 09 21:35:18 serverpve kernel: drbd linstor_db pve: peer( Secondary -> Primary ) [remote]
May 09 21:35:18 serverpve kernel: drbd linstor_db/0 drbd1000: Disabling local AL-updates (optimization)

sudo journalctl -u drbd-reactor.service --since "2025-05-09 21:00:00" --until "2025-05-09 22:30:00" --no-pager
May 09 21:35:18 serverpve drbd-reactor[3183]: INFO [drbd_reactor::plugin::promoter] run: resource 'linstor_db' lost quorum
May 09 21:35:18 serverpve drbd-reactor[3183]: INFO [drbd_reactor::plugin::promoter] stop_actions (could trigger failure actions (e.g., reboot)): linstor_db
May 09 21:35:18 serverpve drbd-reactor[3183]: INFO [drbd_reactor::plugin::promoter] stop_actions: stopping 'drbd-services@linstor_db.target'
May 09 21:35:18 serverpve drbd-reactor[3183]: INFO [drbd_reactor::plugin::promoter] systemd_stop: systemctl stop drbd-services@linstor_db.target
May 09 21:35:18 serverpve drbd-reactor[3183]: INFO [drbd_reactor::plugin::promoter] run: resource 'linstor_db' may promote after 0ms
May 09 21:35:18 serverpve drbd-reactor[3183]: INFO [drbd_reactor::plugin::promoter] systemd_start: systemctl start drbd-services@linstor_db.target
May 09 21:35:19 serverpve systemctl[293043]: A dependency job for drbd-services@linstor_db.target failed. See 'journalctl -xe' for details.
May 09 21:35:19 serverpve drbd-reactor[3183]: WARN [drbd_reactor::plugin::promoter] Starting 'linstor_db' failed: Return code not status success
May 09 21:35:19 serverpve drbd-reactor[3183]: INFO [drbd_reactor::plugin::promoter] stop_actions (could trigger failure actions (e.g., reboot)): linstor_db
May 09 21:35:19 serverpve drbd-reactor[3183]: INFO [drbd_reactor::plugin::promoter] stop_actions: stopping 'drbd-services@linstor_db.target'
May 09 21:35:19 serverpve drbd-reactor[3183]: INFO [drbd_reactor::plugin::promoter] systemd_stop: systemctl stop drbd-services@linstor_db.target

On “pve”, the controller logs were posted before so I won’t post them again; here are the satellite and DRBD logs:

sudo journalctl -k --since "2025-05-09 21:00:00" --until "2025-05-09 22:00:00" --no-pager | grep -i drbd
May 09 21:35:18 pve kernel: drbd linstor_db serverpve: PingAck did not arrive in time.
May 09 21:35:18 pve kernel: drbd linstor_db serverpve: conn( Connected -> NetworkFailure ) peer( Secondary -> Unknown )
May 09 21:35:18 pve kernel: drbd linstor_db/0 drbd1000: quorum( yes -> no )
May 09 21:35:18 pve kernel: drbd linstor_db/0 drbd1000 serverpve: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
May 09 21:35:18 pve kernel: drbd linstor_db: Preparing cluster-wide state change 2595681150: 0->all empty
May 09 21:35:18 pve kernel: drbd linstor_db: Committing cluster-wide state change 2595681150 (0ms)
May 09 21:35:18 pve kernel: drbd linstor_db serverpve: Terminating sender thread
May 09 21:35:18 pve kernel: drbd linstor_db serverpve: Starting sender thread (peer-node-id 1)
May 09 21:35:18 pve kernel: drbd linstor_db/0 drbd1000: helper command: /sbin/drbdadm quorum-lost
May 09 21:35:18 pve kernel: drbd linstor_db/0 drbd1000: helper command: /sbin/drbdadm quorum-lost exit code 0
May 09 21:35:18 pve kernel: drbd linstor_db serverpve: Connection closed
May 09 21:35:18 pve kernel: drbd linstor_db serverpve: helper command: /sbin/drbdadm disconnected
May 09 21:35:18 pve kernel: drbd linstor_db serverpve: helper command: /sbin/drbdadm disconnected exit code 0
May 09 21:35:18 pve kernel: drbd linstor_db serverpve: conn( NetworkFailure -> Unconnected ) [disconnected]
May 09 21:35:18 pve kernel: drbd linstor_db serverpve: Restarting receiver thread
May 09 21:35:18 pve kernel: drbd linstor_db serverpve: conn( Unconnected -> Connecting ) [connecting]
May 09 21:35:18 pve kernel: EXT4-fs warning (device drbd1000): ext4_end_bio:342: I/O error 10 writing to inode 12 starting block 9681)
May 09 21:35:18 pve kernel: Buffer I/O error on device drbd1000, logical block 9681
May 09 21:35:18 pve kernel: Buffer I/O error on device drbd1000, logical block 9682
May 09 21:35:18 pve kernel: Buffer I/O error on device drbd1000, logical block 9683
May 09 21:35:18 pve kernel: Buffer I/O error on device drbd1000, logical block 9684
May 09 21:35:18 pve kernel: Buffer I/O error on device drbd1000, logical block 9685
May 09 21:35:18 pve kernel: Buffer I/O error on device drbd1000, logical block 9686
May 09 21:35:18 pve kernel: Buffer I/O error on device drbd1000, logical block 9687
May 09 21:35:18 pve kernel: Buffer I/O error on device drbd1000, logical block 9688
May 09 21:35:18 pve kernel: Buffer I/O error on device drbd1000, logical block 9689
May 09 21:35:18 pve kernel: Buffer I/O error on device drbd1000, logical block 9690
May 09 21:35:18 pve kernel: Aborting journal on device drbd1000-8.
May 09 21:35:18 pve kernel: Buffer I/O error on dev drbd1000, logical block 90113, lost sync page write
May 09 21:35:18 pve kernel: JBD2: I/O error when updating journal superblock for drbd1000-8.
May 09 21:35:18 pve kernel: EXT4-fs (drbd1000): unmounting filesystem a8f8598a-f399-45b2-9fea-7460bca617f3.
May 09 21:35:18 pve kernel: Buffer I/O error on dev drbd1000, logical block 1, lost sync page write
May 09 21:35:18 pve kernel: EXT4-fs (drbd1000): I/O error while writing superblock
May 09 21:35:18 pve kernel: drbd linstor_db: Preparing cluster-wide state change 1090717473: 0->all role( Secondary )
May 09 21:35:18 pve kernel: drbd linstor_db: Committing cluster-wide state change 1090717473 (0ms)
May 09 21:35:18 pve kernel: drbd linstor_db: role( Primary -> Secondary ) [secondary]
May 09 21:35:18 pve kernel: drbd linstor_db serverpve: Handshake to peer 1 successful: Agreed network protocol version 122
May 09 21:35:18 pve kernel: drbd linstor_db serverpve: Feature flags enabled on protocol level: 0x7f TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES RESYNC_DAGTAG
May 09 21:35:18 pve kernel: drbd linstor_db serverpve: Peer authenticated using 20 bytes HMAC
May 09 21:35:18 pve kernel: drbd linstor_db: Preparing cluster-wide state change 1999569790: 0->1 role( Secondary ) conn( Connected )
May 09 21:35:18 pve kernel: drbd linstor_db/0 drbd1000 serverpve: drbd_sync_handshake:
May 09 21:35:18 pve kernel: drbd linstor_db/0 drbd1000 serverpve: self D3CCD55D4FDEC14A:0000000000000000:E85792A90B23D21E:0000000000000000 bits:0 flags:520
May 09 21:35:18 pve kernel: drbd linstor_db/0 drbd1000 serverpve: peer D3CCD55D4FDEC14A:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:1120
May 09 21:35:18 pve kernel: drbd linstor_db/0 drbd1000 serverpve: uuid_compare()=no-sync by rule=reconnected
May 09 21:35:18 pve kernel: drbd linstor_db: State change 1999569790: primary_nodes=0, weak_nodes=0
May 09 21:35:18 pve kernel: drbd linstor_db: Committing cluster-wide state change 1999569790 (12ms)
May 09 21:35:18 pve kernel: drbd linstor_db serverpve: conn( Connecting -> Connected ) peer( Unknown -> Secondary ) [connected]
May 09 21:35:18 pve kernel: drbd linstor_db/0 drbd1000: quorum( no -> yes ) [connected]
May 09 21:35:18 pve kernel: drbd linstor_db/0 drbd1000 serverpve: pdsk( DUnknown -> UpToDate ) repl( Off -> Established ) [connected]
May 09 21:35:18 pve kernel: drbd linstor_db/0 drbd1000 serverpve: cleared bm UUID and bitmap D3CCD55D4FDEC14A:0000000000000000:E85792A90B23D21E:0000000000000000
May 09 21:35:18 pve kernel: drbd linstor_db: Preparing cluster-wide state change 2496847919: 0->all role( Primary )
May 09 21:35:18 pve kernel: drbd linstor_db: State change 2496847919: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFC
May 09 21:35:18 pve kernel: drbd linstor_db: Committing cluster-wide state change 2496847919 (1ms)
May 09 21:35:18 pve kernel: drbd linstor_db: role( Secondary -> Primary ) [primary]
May 09 21:35:18 pve kernel: EXT4-fs (drbd1000): recovery complete
May 09 21:35:18 pve kernel: EXT4-fs (drbd1000): mounted filesystem a8f8598a-f399-45b2-9fea-7460bca617f3 r/w with ordered data mode. Quota mode: none.

sudo journalctl -u drbd-reactor.service --since "2025-05-09 21:00:00" --until "2025-05-09 22:30:00" --no-pager
May 09 21:35:18 pve drbd-reactor[419872]: INFO [drbd_reactor::plugin::promoter] run: resource 'linstor_db' lost quorum
May 09 21:35:18 pve drbd-reactor[419872]: INFO [drbd_reactor::plugin::promoter] stop_actions (could trigger failure actions (e.g., reboot)): linstor_db
May 09 21:35:18 pve drbd-reactor[419872]: INFO [drbd_reactor::plugin::promoter] stop_actions: stopping 'drbd-services@linstor_db.target'
May 09 21:35:18 pve drbd-reactor[419872]: INFO [drbd_reactor::plugin::promoter] systemd_stop: systemctl stop drbd-services@linstor_db.target
May 09 21:35:18 pve drbd-reactor[419872]: INFO [drbd_reactor::plugin::promoter] run: resource 'linstor_db' may promote after 0ms
May 09 21:35:18 pve drbd-reactor[419872]: INFO [drbd_reactor::plugin::promoter] systemd_start: systemctl start drbd-services@linstor_db.target

On orangepi, nothing happened whatsoever; it sat in blissful ignorance:

sudo journalctl -u linstor-satellite.service --since "2025-05-09 21:00:00" --until "2025-05-09 22:30:00" --no-pager
-- Journal begins at Thu 2025-05-08 12:17:22 UTC, ends at Fri 2025-05-09 23:17:22 UTC. --
-- No entries --
sudo journalctl -u linstor-controller.service --since "2025-05-09 21:00:00" --until "2025-05-09 22:30:00" --no-pager
-- Journal begins at Thu 2025-05-08 12:17:22 UTC, ends at Fri 2025-05-09 23:22:15 UTC. --
-- No entries --
sudo journalctl -k --since "2025-05-09 21:00:00" --until "2025-05-09 22:00:00" --no-pager | grep -i drbd

The last command produced no output, and this node doesn’t have drbd-reactor installed.

Something I noticed here is that my “serverpve” node failed to take over due to a dependency issue, which I’ll look into, and that there were a lot of I/O errors during the outage, which I think is normal behavior for linstor_db, right? So if the I/O errors are just normal DRBD behavior, where are the actual errors? I’ll try to look into it myself, but please comment or advise if you have any thoughts. Thank you!
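
For reference, a rough way to chase the “A dependency job for drbd-services@linstor_db.target failed” message (the target name is taken from the log above; any other unit names depend on your drbd-reactor promoter configuration, so treat these purely as examples):

# list the units pulled in by the target and spot the one that failed
systemctl list-dependencies drbd-services@linstor_db.target
systemctl list-units --failed
# then inspect that unit in detail, as the log message suggests
journalctl -xe --no-pager | tail -n 200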

I am not going to review the logs in full, as the cluster wasn’t yet configured properly at the time. I will say that you do have the linstor_db resource configured to return an io-error on loss of quorum or when no data is accessible, so the I/O errors might be the expected behavior here.
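
If you want to double-check what the resource will actually do, you can compare the LINSTOR properties with the effective DRBD options on one of the diskful nodes, for example (resource and group names as used earlier in this thread):

linstor resource-group list-properties linstor-db-grp
linstor resource-definition list-properties linstor_db
drbdsetup show linstor_db    # the options section should show quorum and on-no-quorum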

Thanks for the timely reply! However, it didn’t seem to keep a tiebreaker:

linstor resource create orangepi linstor_db --drbd-diskless
SUCCESS:
Description:
    New resource 'linstor_db' on node 'orangepi' registered.
Details:
    Resource 'linstor_db' on node 'orangepi' UUID is: e6a47cc7-60e6-4821-ae8e-0cc228d39e19
SUCCESS:
Description:
    Volume with number '0' on resource 'linstor_db' on node 'orangepi' successfully registered
Details:
    Volume UUID is: e96380f4-7f0a-475d-adf1-ae7a6187335b
SUCCESS:
    (orangepi) Resource 'linstor_db' [DRBD] adjusted.
SUCCESS:
    Created resource 'linstor_db' on 'orangepi'
SUCCESS:
    (pve) Resource 'linstor_db' [DRBD] adjusted.
SUCCESS:
    Added peer(s) 'orangepi' to resource 'linstor_db' on 'pve'
SUCCESS:
    (serverpve) Resource 'linstor_db' [DRBD] adjusted.
SUCCESS:
    Added peer(s) 'orangepi' to resource 'linstor_db' on 'serverpve'
SUCCESS:
Description:
    Resource 'linstor_db' on 'orangepi' ready
Details:
    Node(s): 'orangepi', Resource: 'linstor_db'
linstor resource list
╭───────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node      ┊ Layers       ┊ Usage  ┊ Conns ┊    State ┊ CreatedOn           ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════╡
┊ linstor_db   ┊ orangepi  ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ Diskless ┊ 2025-05-10 08:28:54 ┊
┊ linstor_db   ┊ pve       ┊ DRBD,STORAGE ┊ InUse  ┊ Ok    ┊ UpToDate ┊ 2025-05-08 15:33:40 ┊
┊ linstor_db   ┊ serverpve ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2025-05-08 15:33:45 ┊
┊ pm-ca77353a  ┊ pve       ┊ DRBD,STORAGE ┊ InUse  ┊ Ok    ┊ UpToDate ┊ 2025-05-10 07:18:27 ┊
┊ pm-ca77353a  ┊ serverpve ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2025-05-10 07:18:30 ┊
┊ testres      ┊ orangepi  ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ Diskless ┊ 2025-05-10 07:28:41 ┊
┊ testres      ┊ pve       ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2025-05-10 07:28:41 ┊
┊ testres      ┊ serverpve ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2025-05-10 07:28:44 ┊
╰───────────────────────────────────────────────────────────────────────────────────────────╯
linstor resource delete orangepi linstor_db --keep-tiebreaker
SUCCESS:
Description:
    Node: orangepi, Resource: linstor_db preparing for deletion.
Details:
    Node: orangepi, Resource: linstor_db UUID is: e6a47cc7-60e6-4821-ae8e-0cc228d39e19
SUCCESS:
    (pve) Resource 'linstor_db' [DRBD] adjusted.
SUCCESS:
    Preparing deletion of resource on 'pve'
SUCCESS:
    (serverpve) Resource 'linstor_db' [DRBD] adjusted.
SUCCESS:
    Preparing deletion of resource on 'serverpve'
SUCCESS:
    (orangepi) Resource 'linstor_db' [DRBD] deleted.
SUCCESS:
    Preparing deletion of resource on 'orangepi'
SUCCESS:
Description:
    Node: orangepi, Resource: linstor_db marked for deletion.
Details:
    Node: orangepi, Resource: linstor_db UUID is: e6a47cc7-60e6-4821-ae8e-0cc228d39e19
SUCCESS:
    Cleaning up 'linstor_db' on 'orangepi'
SUCCESS:
    (pve) Resource 'linstor_db' [DRBD] adjusted.
SUCCESS:
    Notified 'pve' that 'linstor_db' is being cleaned up on Node(s): [orangepi]
SUCCESS:
    (serverpve) Resource 'linstor_db' [DRBD] adjusted.
SUCCESS:
    Notified 'serverpve' that 'linstor_db' is being cleaned up on Node(s): [orangepi]
SUCCESS:
Description:
    Node: orangepi, Resource: linstor_db deletion complete.
Details:
    Node: orangepi, Resource: linstor_db UUID was: e6a47cc7-60e6-4821-ae8e-0cc228d39e19
linstor resource list
╭───────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node      ┊ Layers       ┊ Usage  ┊ Conns ┊    State ┊ CreatedOn           ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════╡
┊ linstor_db   ┊ pve       ┊ DRBD,STORAGE ┊ InUse  ┊ Ok    ┊ UpToDate ┊ 2025-05-08 15:33:40 ┊
┊ linstor_db   ┊ serverpve ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2025-05-08 15:33:45 ┊
┊ pm-ca77353a  ┊ pve       ┊ DRBD,STORAGE ┊ InUse  ┊ Ok    ┊ UpToDate ┊ 2025-05-10 07:18:27 ┊
┊ pm-ca77353a  ┊ serverpve ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2025-05-10 07:18:30 ┊
┊ testres      ┊ orangepi  ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ Diskless ┊ 2025-05-10 07:28:41 ┊
┊ testres      ┊ pve       ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2025-05-10 07:28:41 ┊
┊ testres      ┊ serverpve ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2025-05-10 07:28:44 ┊
╰───────────────────────────────────────────────────────────────────────────────────────────╯
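
For anyone debugging something similar: the controller-level properties are also worth a look, since the tiebreaker and quorum settings can be set there as well (as far as I understand, the auto-add-quorum-tiebreaker property may simply be unset there, in which case the built-in default applies). A quick check:

linstor controller list-properties | grep -i -e tiebreaker -e quorum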

I have set DisklessOnRemaining to false on both of the resource groups concerned, since it didn’t seem to be doing anything anyway:

linstor rg list
╭────────────────────────────────────────────────────────────────────╮
┊ ResourceGroup  ┊ SelectFilter               ┊ VlmNrs ┊ Description ┊
╞════════════════════════════════════════════════════════════════════╡
┊ DfltRscGrp     ┊ PlaceCount: 2              ┊        ┊             ┊
╞┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄╡
┊ linstor-db-grp ┊ PlaceCount: 2              ┊ 0      ┊             ┊
┊                ┊ DisklessOnRemaining: False ┊        ┊             ┊
╞┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄╡
┊ octavius       ┊ PlaceCount: 2              ┊ 0      ┊             ┊
┊                ┊ DisklessOnRemaining: False ┊        ┊             ┊
╰────────────────────────────────────────────────────────────────────╯

Thanks for the confirmation; I’ll focus on the tiebreaker issue now.

So I deleted and re-spawned testres, and it turns out DisklessOnRemaining was doing something after all, while the auto-tiebreaker still isn’t:

linstor resource-group spawn octavius testres 1GiB
SUCCESS:
    Volume definition with number '0' successfully  created in resource definition 'testres'.
SUCCESS:
Description:
    New resource definition 'testres' created.
Details:
    Resource definition 'testres' UUID is: 5b849b43-3fda-4d3f-a4de-e2793ab1d918
SUCCESS:
    Successfully set property key(s): StorPoolName
SUCCESS:
    Successfully set property key(s): StorPoolName
SUCCESS:
Description:
    Resource 'testres' successfully autoplaced on 2 nodes
Details:
    Used nodes (storage pool name): 'pve (ssd_thin_pools)', 'serverpve (ssd_thin_pools)'
INFO:
    Updated testres DRBD auto verify algorithm to 'crct10dif'
INFO:
    Resource-definition property 'DrbdOptions/Resource/quorum' updated from undefined to 'majority' by User
SUCCESS:
    (pve) Volume number 0 of resource 'testres' [LVM-Thin] created
SUCCESS:
    (pve) Resource 'testres' [DRBD] adjusted.
SUCCESS:
    Created resource 'testres' on 'pve'
SUCCESS:
    (serverpve) Volume number 0 of resource 'testres' [LVM-Thin] created
SUCCESS:
    (serverpve) Resource 'testres' [DRBD] adjusted.
SUCCESS:
    Created resource 'testres' on 'serverpve'
SUCCESS:
Description:
    Resource 'testres' on 'pve' ready
Details:
    Resource group: octavius
SUCCESS:
Description:
    Resource 'testres' on 'serverpve' ready
Details:
    Resource group: octavius
SUCCESS:
    (pve) Resource 'testres' [DRBD] adjusted.
SUCCESS:
    (serverpve) Resource 'testres' [DRBD] adjusted.
linstor r l
╭───────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node      ┊ Layers       ┊ Usage  ┊ Conns ┊    State ┊ CreatedOn           ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════╡
┊ linstor_db   ┊ pve       ┊ DRBD,STORAGE ┊ InUse  ┊ Ok    ┊ UpToDate ┊ 2025-05-08 15:33:40 ┊
┊ linstor_db   ┊ serverpve ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2025-05-08 15:33:45 ┊
┊ pm-95528daa  ┊ pve       ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2025-05-10 08:32:36 ┊
┊ pm-95528daa  ┊ serverpve ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2025-05-10 08:32:37 ┊
┊ pm-ca77353a  ┊ pve       ┊ DRBD,STORAGE ┊ InUse  ┊ Ok    ┊ UpToDate ┊ 2025-05-10 07:18:27 ┊
┊ pm-ca77353a  ┊ serverpve ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2025-05-10 07:18:30 ┊
┊ testres      ┊ pve       ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2025-05-10 09:10:37 ┊
┊ testres      ┊ serverpve ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2025-05-10 09:10:39 ┊
╰───────────────────────────────────────────────────────────────────────────────────────────╯
linstor rg lp octavius
╭─────────────────────────────────────────────────────────╮
┊ Key                                        ┊ Value      ┊
╞═════════════════════════════════════════════════════════╡
┊ DrbdOptions/Resource/auto-promote          ┊ yes        ┊
┊ DrbdOptions/Resource/on-no-data-accessible ┊ suspend-io ┊
┊ DrbdOptions/Resource/on-no-quorum          ┊ suspend-io ┊
┊ DrbdOptions/Resource/quorum                ┊ majority   ┊
┊ DrbdOptions/auto-add-quorum-tiebreaker     ┊ true       ┊
┊ Internal/Drbd/QuorumSetBy                  ┊ user       ┊
╰─────────────────────────────────────────────────────────╯
linstor rg l
╭────────────────────────────────────────────────────────────────────╮
┊ ResourceGroup  ┊ SelectFilter               ┊ VlmNrs ┊ Description ┊
╞════════════════════════════════════════════════════════════════════╡
┊ DfltRscGrp     ┊ PlaceCount: 2              ┊        ┊             ┊
╞┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄╡
┊ linstor-db-grp ┊ PlaceCount: 2              ┊ 0      ┊             ┊
┊                ┊ DisklessOnRemaining: False ┊        ┊             ┊
╞┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄╡
┊ octavius       ┊ PlaceCount: 2              ┊ 0      ┊             ┊
┊                ┊ DisklessOnRemaining: False ┊        ┊             ┊
╰────────────────────────────────────────────────────────────────────╯

I managed to solve this problem. For anyone having a similar issue, the conclusion is that, for auto-add-quorum-tiebreaker to work, auto-quorum must be the one in control (which I believe is the default), and quorum and on-no-quorum must not be set by the user.

So I had to go through the code on GitHub, because I couldn’t find any thorough documentation on this. It turns out that both isAutoTieBreakerEnabled and shouldTieBreakerExist have to be true for LINSTOR to create a tiebreaker; as part of that check, CtrlRscAutoQuorumHelper.isAutoQuorumEnabled is consulted, and QuorumSetBy has to be linstor (rather than user) for it to be true.

So I went on to check whether I had DrbdOptions/auto-quorum enabled; its description says that this property controls the automatic setting of quorum and on-no-quorum, i.e. the very properties I had set manually.

Also, among the outputs I posted earlier in this thread, there was:

linstor rg lp octavius
╭─────────────────────────────────────────────────────────╮
┊ Key                                        ┊ Value      ┊
╞═════════════════════════════════════════════════════════╡
┊ DrbdOptions/Resource/auto-promote          ┊ yes        ┊
┊ DrbdOptions/Resource/on-no-data-accessible ┊ suspend-io ┊
┊ DrbdOptions/Resource/on-no-quorum          ┊ suspend-io ┊
┊ DrbdOptions/Resource/quorum                ┊ majority   ┊
┊ DrbdOptions/auto-add-quorum-tiebreaker     ┊ true       ┊
┊ Internal/Drbd/QuorumSetBy                  ┊ user       ┊
╰─────────────────────────────────────────────────────────╯

This showed that my QuorumSetBy value was user. The property seems to show up only in the CLI, not in the GUI, and it is not whitelisted, so users cannot set it manually.

So I went on to remove the quorum and on-no-quorum properties and added auto-quorum, and voilà, the tiebreakers popped up automatically. I didn’t even have to change anything on the resource definitions, or add a diskless resource and then delete it with --keep-tiebreaker (a path I also saw in the code; that part was really thoughtful).
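
For anyone who wants the concrete steps, this is roughly the shape of the change, sketched from memory and hedged accordingly: on my client version, setting an empty value removes a property (check yours), the DrbdOptions/auto-quorum value is from memory as well (historically it accepts a policy such as io-error or suspend-io; check what your version allows), and you need to repeat this wherever you originally set the properties (resource group and/or resource definition):

# drop the manually set quorum properties so LINSTOR manages them again
linstor resource-group set-property octavius DrbdOptions/Resource/quorum ""
linstor resource-group set-property octavius DrbdOptions/Resource/on-no-quorum ""
# hand control back to auto-quorum (suspend-io here is just the policy I wanted)
linstor resource-group set-property octavius DrbdOptions/auto-quorum suspend-io
# repeat for linstor-db-grp, then verify: the tiebreaker should appear as a
# diskless resource in state "TieBreaker"
linstor resource list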

So, to restate the conclusion: for auto-add-quorum-tiebreaker to work, quorum handling must be left to auto-quorum (which I believe is the default), and quorum and on-no-quorum must not be set manually by the user.

P.S. The reason I set them in the first place is that the commands in the “3.1.1. Configuring Highly Available LINSTOR Database Storage” section of the LINSTOR user guide include setting those properties manually. Also, I don’t remember where, but I read that for VMs it’s better to use suspend-io than io-error, since the latter might unmount the disk and be nasty to recover from. A small suggestion to the group: it would be helpful for the docs to mention that setting these properties manually, instead of using auto-quorum, will break auto-add-quorum-tiebreaker. The docs still have room for improvement, but Devin has been very supportive and the code is quite clear. Overall it’s been a fun experience.

Hello, thanks for sharing this journey here.
We recently changed the behavior of the DrbdOptions/Resource/quorum property and also introduced the Internal/Drbd/QuorumSetBy = user|linstor internal property, so I’d say that we unfortunately also introduced a bug with 1.31.0.

For now, your workaround sounds reasonable, i.e. not setting DrbdOptions/Resource/quorum manually but leaving it to be managed by LINSTOR. If the property is not set anywhere, it defaults to majority anyway.

We will certainly fix this bug. The desired behavior is imho still isAutoTieBreakerEnabled && shouldTieBreakerExist, but shouldTieBreakerExist should not check whether Internal/Drbd/QuorumSetBy == "linstor"; rather, it should check DrbdOptions/Resource/quorum == "majority", regardless of whether the property was set by the user or by LINSTOR. I believe this behavior should also work just fine for your setup.

Thanks for replying! It’s nice to know I’ve been somewhat helpful.

Short update: the just-released LINSTOR 1.31.1 should fix this issue.
