Resources stuck in "Connecting"

I have a three-node Proxmox cluster and am using LINSTOR for storage across all three. Each node has a direct connection to the other two (for replication), as well as a main connection to my network; I’ll show this later on. Our power went out for a while, and I only noticed a couple of days later that I couldn’t snapshot VMs, etc. I wasn’t sure what was going on, so I upgraded to the latest Proxmox and LINSTOR:

# pveversion 
pve-manager/9.1.2/9d436f37a0ac4172 (running kernel: 6.17.2-2-pve)

# linstor --version
linstor-client 1.27.1; GIT-hash: 9c57f040eb3834500db508e4f04d361d006cb6b5

# drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\ 71c8bcff6ea77a022b272a7eba649a774251bac4\ build\ by\ @buildsystem\,\ 2025-11-03\ 10:34:37
DRBDADM_API_VERSION=2
DRBD_KERNEL_VERSION_CODE=0x090300
DRBD_KERNEL_VERSION=9.3.0
DRBDADM_VERSION_CODE=0x092100
DRBDADM_VERSION=9.33.0

Still no luck, so I dug in and saw that, even though all my nodes are online and can ping each other, the resources are stuck in the Connecting state:

# linstor resource list
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName      ┊ Node            ┊ Layers       ┊ Usage  ┊ Conns                       ┊        State ┊ CreatedOn           ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ linstor_db        ┊ home-candc-srv7 ┊ DRBD,STORAGE ┊ InUse  ┊ Ok                          ┊     UpToDate ┊ 2024-07-22 23:03:43 ┊
┊ linstor_db        ┊ home-candc-srv8 ┊ DRBD,STORAGE ┊ Unused ┊ Ok                          ┊     UpToDate ┊ 2024-07-22 23:03:43 ┊
┊ linstor_db        ┊ home-candc-srv9 ┊ DRBD,STORAGE ┊ Unused ┊ Ok                          ┊     UpToDate ┊ 2024-07-22 23:03:43 ┊
┊ pm-1e56468b       ┊ home-candc-srv7 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv8) ┊     UpToDate ┊ 2024-07-25 17:16:49 ┊
┊ pm-1e56468b       ┊ home-candc-srv8 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv7) ┊     UpToDate ┊ 2024-07-25 17:16:47 ┊
┊ pm-1e56468b       ┊ home-candc-srv9 ┊ DRBD,STORAGE ┊ Unused ┊ Ok                          ┊     UpToDate ┊ 2024-07-25 17:16:49 ┊
┊ pm-1ecca352       ┊ home-candc-srv7 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv8) ┊     UpToDate ┊ 2025-11-30 12:22:20 ┊
┊ pm-1ecca352       ┊ home-candc-srv8 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv7) ┊     UpToDate ┊ 2025-11-30 12:22:20 ┊
┊ pm-1ecca352       ┊ home-candc-srv9 ┊ DRBD,STORAGE ┊ Unused ┊ Ok                          ┊     UpToDate ┊ 2025-11-30 12:22:19 ┊
┊ pm-4ac0d6c3       ┊ home-candc-srv7 ┊ DRBD,STORAGE ┊ InUse  ┊ Connecting(home-candc-srv8) ┊     UpToDate ┊ 2025-11-17 21:39:12 ┊
┊ pm-4ac0d6c3       ┊ home-candc-srv8 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv7) ┊     Outdated ┊ 2025-11-17 21:39:13 ┊
┊ pm-4ac0d6c3       ┊ home-candc-srv9 ┊ DRBD,STORAGE ┊ Unused ┊ Ok                          ┊     UpToDate ┊ 2025-11-17 21:39:13 ┊
┊ pm-25c0b03c       ┊ home-candc-srv7 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv8) ┊     UpToDate ┊ 2024-08-11 02:00:23 ┊
┊ pm-25c0b03c       ┊ home-candc-srv8 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv7) ┊     UpToDate ┊ 2024-08-11 02:00:28 ┊
┊ pm-25c0b03c       ┊ home-candc-srv9 ┊ DRBD,STORAGE ┊ Unused ┊ Ok                          ┊     UpToDate ┊ 2024-08-11 02:00:28 ┊
┊ pm-502566f5       ┊ home-candc-srv7 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv8) ┊     UpToDate ┊ 2024-07-25 14:47:19 ┊
┊ pm-502566f5       ┊ home-candc-srv8 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv7) ┊     UpToDate ┊ 2024-07-25 14:47:20 ┊
┊ pm-502566f5       ┊ home-candc-srv9 ┊ DRBD,STORAGE ┊ Unused ┊ Ok                          ┊     UpToDate ┊ 2024-07-25 14:47:20 ┊
┊ pm-c448ddb7       ┊ home-candc-srv7 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv8) ┊     UpToDate ┊ 2024-08-10 17:42:31 ┊
┊ pm-c448ddb7       ┊ home-candc-srv8 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv7) ┊     UpToDate ┊ 2024-08-10 17:42:33 ┊
┊ pm-c448ddb7       ┊ home-candc-srv9 ┊ DRBD,STORAGE ┊ Unused ┊ Ok                          ┊     UpToDate ┊ 2024-08-10 17:42:33 ┊
┊ pm-d03ba4dc       ┊ home-candc-srv7 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv8) ┊     UpToDate ┊ 2025-11-27 22:21:30 ┊
┊ pm-d03ba4dc       ┊ home-candc-srv8 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv7) ┊     UpToDate ┊ 2025-11-27 22:21:31 ┊
┊ pm-d03ba4dc       ┊ home-candc-srv9 ┊ DRBD,STORAGE ┊ Unused ┊ Ok                          ┊     UpToDate ┊ 2025-11-27 22:21:31 ┊
┊ pm-f3c7f14d       ┊ home-candc-srv7 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv8) ┊     UpToDate ┊ 2025-11-17 21:20:24 ┊
┊ pm-f3c7f14d       ┊ home-candc-srv8 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv7) ┊     UpToDate ┊ 2025-11-17 21:20:25 ┊
┊ pm-f3c7f14d       ┊ home-candc-srv9 ┊ DRBD,STORAGE ┊ Unused ┊ Ok                          ┊     UpToDate ┊ 2025-11-17 21:20:25 ┊
┊ vm-9001-cloudinit ┊ home-candc-srv7 ┊ DRBD,STORAGE ┊ InUse  ┊ Connecting(home-candc-srv8) ┊     UpToDate ┊ 2024-08-10 17:47:55 ┊
┊ vm-9001-cloudinit ┊ home-candc-srv8 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv7) ┊ Inconsistent ┊ 2024-08-10 17:47:57 ┊
┊ vm-9001-cloudinit ┊ home-candc-srv9 ┊ DRBD,STORAGE ┊ Unused ┊ Ok                          ┊     UpToDate ┊ 2024-08-10 17:47:57 ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Sorry if the formatting above is hard to read; let me know if I should post the info some other way.

Here is my node info:

# linstor node list
╭───────────────────────────────────────────────────────────────╮
┊ Node            ┊ NodeType  ┊ Addresses              ┊ State  ┊
╞═══════════════════════════════════════════════════════════════╡
┊ home-candc-srv7 ┊ SATELLITE ┊ 10.4.16.2:3366 (PLAIN) ┊ Online ┊
┊ home-candc-srv8 ┊ SATELLITE ┊ 10.4.16.3:3366 (PLAIN) ┊ Online ┊
┊ home-candc-srv9 ┊ SATELLITE ┊ 10.4.16.4:3366 (PLAIN) ┊ Online ┊
╰───────────────────────────────────────────────────────────────╯

And details on their connectivity. Note that the interfaces starting with “nic_enp” are the direct connections from that node to the other two:

# linstor node interface list home-candc-srv7
╭─────────────────────────────────────────────────────────────────────╮
┊ home-candc-srv7 ┊ NetInterface ┊ IP         ┊ Port ┊ EncryptionType ┊
╞═════════════════════════════════════════════════════════════════════╡
┊ +               ┊ nic_enp7s0f0 ┊ 10.4.17.33 ┊      ┊                ┊
┊ +               ┊ nic_enp7s0f1 ┊ 10.4.17.25 ┊      ┊                ┊
┊ + StltCon       ┊ vlan21       ┊ 10.4.16.2  ┊ 3366 ┊ PLAIN          ┊
╰─────────────────────────────────────────────────────────────────────╯
# linstor node interface list home-candc-srv8
╭─────────────────────────────────────────────────────────────────────╮
┊ home-candc-srv8 ┊ NetInterface ┊ IP         ┊ Port ┊ EncryptionType ┊
╞═════════════════════════════════════════════════════════════════════╡
┊ +               ┊ nic_enp7s0f0 ┊ 10.4.17.41 ┊      ┊                ┊
┊ +               ┊ nic_enp7s0f1 ┊ 10.4.17.26 ┊      ┊                ┊
┊ + StltCon       ┊ vlan21       ┊ 10.4.16.3  ┊ 3366 ┊ PLAIN          ┊
╰─────────────────────────────────────────────────────────────────────╯
# linstor node interface list home-candc-srv9
╭─────────────────────────────────────────────────────────────────────╮
┊ home-candc-srv9 ┊ NetInterface ┊ IP         ┊ Port ┊ EncryptionType ┊
╞═════════════════════════════════════════════════════════════════════╡
┊ +               ┊ nic_enp7s0f0 ┊ 10.4.17.34 ┊      ┊                ┊
┊ +               ┊ nic_enp7s0f1 ┊ 10.4.17.42 ┊      ┊                ┊
┊ + StltCon       ┊ vlan21       ┊ 10.4.16.4  ┊ 3366 ┊ PLAIN          ┊
╰─────────────────────────────────────────────────────────────────────╯

Anyway, even the linstor_db resource was in bad shape to start with, and possibly split-brained judging by its status on srv7. After a lot of googling, though, I was able to get the linstor_db resource back in shape with some combination of drbdadm invalidate, disconnect, and connect with --discard-my-data (I can’t remember exactly what; it was late last night).
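For reference, the standard manual split-brain recovery sequence from the DRBD user’s guide is roughly the following (this is what I was working from, though I can’t swear this is exactly what I ran):

```shell
# On the node whose copy of the data should be THROWN AWAY (the split-brain "victim"):
drbdadm disconnect <resource>
drbdadm connect --discard-my-data <resource>

# On the node whose data should survive, if it reports StandAlone:
drbdadm disconnect <resource>
drbdadm connect <resource>
```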

However, no matter what I try, I can’t get the other resources to reconnect between srv7 and srv8. I know this will look messy, but based on my shell history, here is what I tried for resource pm-1e56468b on srv7:

  434  drbdadm --discard-my-data connect pm-1e56468b:home-candc-srv9
  436  drbdadm status pm-1e56468b
  437  drbdadm --discard-my-data connect pm-1e56468b:home-candc-srv8
  438  drbdadm status pm-1e56468b
  440  drbdadm disconnect pm-1e56468b
  441  drbdadm status pm-1e56468b
  442  drbdadm --discard-my-data connect pm-1e56468b:home-candc-srv9
  443  drbdadm --discard-my-data connect pm-1e56468b:home-candc-srv8
  444  drbdadm status pm-1e56468b
  445  drbdadm status pm-1e56468b
  446  drbdadm --discard-my-data connect pm-1e56468b:home-candc-srv8
  447  drbdadm --discard-my-data connect pm-1e56468b:home-candc-srv8
  448  drbdadm status pm-1e56468b
  450  drbdadm disconnect pm-1e56468b
  451  drbdadm --discard-my-data connect pm-1e56468b:home-candc-srv9
  452  drbdadm status pm-1e56468b
  453  drbdadm --discard-my-data connect pm-1e56468b:home-candc-srv8
  454  drbdadm status pm-1e56468b
  455  drbdadm disconnect pm-1e56468b
  456  drbdadm --discard-my-data connect pm-1e56468b:home-candc-srv9
  457  drbdadm status pm-1e56468b
  458  drbdadm connect pm-1e56468b:home-candc-srv8
  459  drbdadm status pm-1e56468b
  548  drbdadm status pm-1e56468b
  549  drbdadm verify pm-1e56468b
  554  drbdadm connect pm-1e56468b
  556  drbdadm cstate pm-1e56468b
  557  drbdadm status pm-1e56468b
  559  drbdadm up pm-1e56468b
  583  history | grep pm-1e56468b

And here is what I did on srv8 on the same resource:

481  drbdadm status pm-1e56468b
482  drbdadm connect pm-1e56468b:home-candc-srv7

Here is what drbdadm status says:

root@home-candc-srv7:~# drbdadm status
linstor_db role:Primary
  disk:UpToDate open:yes
  home-candc-srv8 role:Secondary
    peer-disk:UpToDate
  home-candc-srv9 role:Secondary
    peer-disk:UpToDate

pm-1e56468b role:Secondary
  disk:UpToDate open:no
  home-candc-srv8 connection:Connecting
  home-candc-srv9 role:Secondary
    peer-disk:UpToDate

pm-1ecca352 role:Secondary
  disk:UpToDate open:no
  home-candc-srv8 connection:Connecting
  home-candc-srv9 role:Secondary
    peer-disk:UpToDate

pm-25c0b03c role:Secondary
  disk:UpToDate open:no
  home-candc-srv8 connection:Connecting
  home-candc-srv9 role:Secondary
    peer-disk:UpToDate

pm-4ac0d6c3 role:Primary
  disk:UpToDate open:no
  home-candc-srv8 connection:Connecting
  home-candc-srv9 role:Secondary
    peer-disk:UpToDate

pm-502566f5 role:Secondary
  disk:UpToDate open:no
  home-candc-srv8 connection:Connecting
  home-candc-srv9 role:Secondary
    peer-disk:UpToDate

pm-c448ddb7 role:Secondary
  disk:UpToDate open:no
  home-candc-srv8 connection:Connecting
  home-candc-srv9 role:Secondary
    peer-disk:UpToDate

pm-d03ba4dc role:Secondary
  disk:UpToDate open:no
  home-candc-srv8 connection:Connecting
  home-candc-srv9 role:Secondary
    peer-disk:UpToDate

pm-f3c7f14d role:Secondary
  disk:UpToDate open:no
  home-candc-srv8 connection:Connecting
  home-candc-srv9 role:Secondary
    peer-disk:UpToDate

vm-9001-cloudinit role:Primary
  disk:UpToDate open:no
  home-candc-srv8 connection:Connecting
  home-candc-srv9 role:Secondary
    peer-disk:UpToDate

Does anyone have any ideas about how to get these connected again? Am I totally screwed? I’m very disheartened right now because I’ve been trying to build something rock solid so my wife doesn’t get mad when our email is down again, but somehow things have ended up in this state.

Any help would be GREATLY appreciated! Hopefully it is just something stupid I missed.

Have a good one.

Can you share the recent kernel logs from srv7 and srv8, during the time you’ve been performing these connects/disconnects/invalidates?

It’s not entirely clear from the outputs you’ve shared what the blocker is, but the kernel logs should provide more detail about what is and isn’t happening in the connection process between the two peers, and from there I can hopefully give you more insight into how to solve it.

Yes, I’ll dig them out and trim them down to a reasonable time frame. Thanks! I’m at my wit’s end.

I pulled the logs for those two days and tar-gz’ed them, but apparently I can’t upload to this forum. I guess I’ll try to filter out some examples based on the resource name and paste them here, unless there is a better way…

I’d suggest repeating your attempts to get one of the resources reconnected, and then sharing a snippet of the kernel logs from both peers here in a code block.

You could start by performing a drbdadm down <resource name> on home-candc-srv7 and home-candc-srv8 which should give a good indicator in the kernel logs of where to start looking, and then drbdadm up <resource name> which should re-initiate a connection attempt.

If you want to be a bit more precise, you can add a logger function to your ~/.bashrc, which makes it very easy to find your most recent connection attempt (perform these steps on both nodes):

log_and_run() {
    logger -t "$(basename $0)" "Running command: $*"
    "$@"
}

Then load it:

source ~/.bashrc

Then call it like so, using the example of resource pm-25c0b03c, on home-candc-srv7 and home-candc-srv8:

log_and_run drbdadm down pm-25c0b03c
log_and_run drbdadm up pm-25c0b03c

But unless your kernel logs are particularly noisy, you should be able to run the commands and then immediately tail the last few hundred lines or so to find the connection attempt for a resource, and that will tell us more about what is going on.
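For example, assuming journald is capturing kernel messages (the default on Proxmox), filtering by the resource name should be enough to isolate the attempt:

```shell
# Kernel messages from the last 10 minutes, filtered to one resource
journalctl -k --since "10 minutes ago" | grep pm-25c0b03c
```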


Thanks. Below is the output from running that against one of the resources. I ran drbdadm down pm-f3c7f14d on both srv7 and srv8, waited about half a minute, and then ran drbdadm up pm-f3c7f14d on both nodes. Here are the logs:

home-candc-srv7:

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv8: conn( Connecting -> Disconnecting ) [down]

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv8: Terminating sender thread

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv8: Starting sender thread (peer-node-id 1)

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv8: Connection closed

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv8: helper command: /sbin/drbdadm disconnected

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv8: helper command: /sbin/drbdadm disconnected exit code 0

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv8: conn( Disconnecting -> StandAlone ) [disconnected]

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv8: Terminating receiver thread

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv8: Terminating sender thread

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d: Preparing cluster-wide state change 1367122509: 0->2 conn( Disconnecting )

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d: State change 1367122509: primary_nodes=0, weak_nodes=0

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv9: Cluster is now split

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d: Committing cluster-wide state change 1367122509 (0ms)

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv9: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown ) [down]

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d/0 drbd1007: quorum( yes -> no ) [down]

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d/0 drbd1007 home-candc-srv9: pdsk( UpToDate -> DUnknown ) repl( Established -> Off ) [down]

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv9: Terminating sender thread

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv9: Starting sender thread (peer-node-id 2)

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv9: Connection closed

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv9: helper command: /sbin/drbdadm disconnected

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv9: helper command: /sbin/drbdadm disconnected exit code 0

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv9: conn( Disconnecting -> StandAlone ) [disconnected]

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv9: Terminating receiver thread

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv9: Terminating sender thread

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d/0 drbd1007: disk( UpToDate -> Detaching ) [down]

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d/0 drbd1007: drbd_bm_resize called with capacity == 0

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d/0 drbd1007: disk( Detaching -> Diskless ) [go-diskless]

Dec 22 22:48:06 home-candc-srv7 kernel: drbd pm-f3c7f14d/0 drbd1007: setting new queue limits failed

Dec 22 22:48:06 home-candc-srv7 kernel: drbd /unregistered/pm-f3c7f14d: Terminating worker thread

Dec 22 22:48:24 home-candc-srv7 Controller[8111]: 2025-12-22 22:48:24.912 [grizzly-http-server-10] INFO  LINSTOR/Controller/48f080 SYSTEM - REST/API RestClient(10.4.16.2; 'linstor-proxmox/8.1.4')/QryAllSizeInfo

Dec 22 22:48:30 home-candc-srv7 kernel: drbd pm-f3c7f14d: Starting worker thread (node-id 0)

Dec 22 22:48:30 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv8: Starting sender thread (peer-node-id 1)

Dec 22 22:48:30 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv9: Starting sender thread (peer-node-id 2)

Dec 22 22:48:30 home-candc-srv7 kernel: drbd pm-f3c7f14d/0 drbd1007: meta-data IO uses: blk-bio

Dec 22 22:48:30 home-candc-srv7 kernel: drbd pm-f3c7f14d/0 drbd1007: disk( Diskless -> Attaching ) [attach]

Dec 22 22:48:30 home-candc-srv7 kernel: drbd pm-f3c7f14d/0 drbd1007: Maximum number of peer devices = 7

Dec 22 22:48:30 home-candc-srv7 kernel: drbd pm-f3c7f14d: Method to ensure write ordering: flush

Dec 22 22:48:30 home-candc-srv7 kernel: drbd pm-f3c7f14d/0 drbd1007: drbd_bm_resize called with capacity == 82844304

Dec 22 22:48:30 home-candc-srv7 kernel: drbd pm-f3c7f14d/0 drbd1007: resync bitmap: bits=10355538 bits_4k=10355538 words=1132642 pages=2213

Dec 22 22:48:30 home-candc-srv7 kernel: drbd1007: detected capacity change from 0 to 82844304

Dec 22 22:48:30 home-candc-srv7 kernel: drbd pm-f3c7f14d/0 drbd1007: size = 40 GB (41422152 KB)

Dec 22 22:48:30 home-candc-srv7 kernel: drbd pm-f3c7f14d/0 drbd1007: bitmap READ of 2213 pages took 18 ms

Dec 22 22:48:30 home-candc-srv7 kernel: drbd pm-f3c7f14d/0 drbd1007: recounting of set bits took additional 5ms

Dec 22 22:48:30 home-candc-srv7 kernel: drbd pm-f3c7f14d/0 drbd1007: disk( Attaching -> UpToDate ) [attach]

Dec 22 22:48:30 home-candc-srv7 kernel: drbd pm-f3c7f14d/0 drbd1007: attached to current UUID: 5F46D23CB6F30044

Dec 22 22:48:30 home-candc-srv7 kernel: drbd pm-f3c7f14d/0 drbd1007: Setting exposed data uuid: 5F46D23CB6F30044

Dec 22 22:48:30 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv8: conn( StandAlone -> Unconnected ) [connect]

Dec 22 22:48:30 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv9: conn( StandAlone -> Unconnected ) [connect]

Dec 22 22:48:30 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv8: Starting receiver thread (peer-node-id 1)

Dec 22 22:48:30 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv9: Starting receiver thread (peer-node-id 2)

Dec 22 22:48:30 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv8: conn( Unconnected -> Connecting ) [connecting]

Dec 22 22:48:30 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv9: conn( Unconnected -> Connecting ) [connecting]

Dec 22 22:48:31 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv9: Handshake to peer 2 successful: Agreed network protocol version 123

Dec 22 22:48:31 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv9: Feature flags enabled on protocol level: 0x1ff TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES RESYNC_DAGTAG

Dec 22 22:48:31 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv9: Peer authenticated using 20 bytes HMAC

Dec 22 22:48:31 home-candc-srv7 kernel: drbd pm-f3c7f14d: Preparing cluster-wide state change 1949118582: 0->2 role( Secondary ) conn( Connected )

Dec 22 22:48:31 home-candc-srv7 kernel: drbd pm-f3c7f14d/0 drbd1007 home-candc-srv9: drbd_sync_handshake:

Dec 22 22:48:31 home-candc-srv7 kernel: drbd pm-f3c7f14d/0 drbd1007 home-candc-srv9: self 5F46D23CB6F30044:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:20

Dec 22 22:48:31 home-candc-srv7 kernel: drbd pm-f3c7f14d/0 drbd1007 home-candc-srv9: peer 5F46D23CB6F30044:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:1120

Dec 22 22:48:31 home-candc-srv7 kernel: drbd pm-f3c7f14d/0 drbd1007 home-candc-srv9: uuid_compare()=no-sync by rule=lost-quorum

Dec 22 22:48:31 home-candc-srv7 kernel: drbd pm-f3c7f14d: State change 1949118582: primary_nodes=0, weak_nodes=0

Dec 22 22:48:31 home-candc-srv7 kernel: drbd pm-f3c7f14d: Committing cluster-wide state change 1949118582 (8ms)

Dec 22 22:48:31 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv9: conn( Connecting -> Connected ) peer( Unknown -> Secondary ) [connected]

Dec 22 22:48:31 home-candc-srv7 kernel: drbd pm-f3c7f14d/0 drbd1007: quorum( no -> yes ) [connected]

Dec 22 22:48:31 home-candc-srv7 kernel: drbd pm-f3c7f14d/0 drbd1007 home-candc-srv9: pdsk( DUnknown -> UpToDate ) repl( Off -> Established ) [connected]

Dec 22 22:48:31 home-candc-srv7 kernel: drbd pm-f3c7f14d/0 drbd1007 home-candc-srv9: cleared bm UUID and bitmap 5F46D23CB6F30044:0000000000000000:0000000000000000:0000000000000000

Dec 22 22:48:32 home-candc-srv7 kernel: drbd pm-f3c7f14d: Preparing remote state change 2490023500: 1->2 role( Secondary ) conn( Connected )

Dec 22 22:48:32 home-candc-srv7 kernel: drbd pm-f3c7f14d home-candc-srv9: Committing remote state change 2490023500 (primary_nodes=0)

Dec 22 22:48:32 home-candc-srv7 Controller[8111]: 2025-12-22 22:48:32.921 [grizzly-http-server-14] INFO  LINSTOR/Controller/b70253 SYSTEM - REST/API RestClient(10.4.16.3; 'linstor-proxmox/8.1.4')/QryAllSizeInfo

Dec 22 22:48:32 home-candc-srv7 Satellite[1267]: 2025-12-22 22:48:32.921 [MainWorkerPool-5] INFO  LINSTOR/Satellite/004e0f SYSTEM - SpaceInfo: DfltDisklessStorPool -> 9223372036854775807/9223372036854775807

Dec 22 22:48:33 home-candc-srv7 Satellite[1267]: 2025-12-22 22:48:33.089 [MainWorkerPool-5] INFO  LINSTOR/Satellite/004e0f SYSTEM - SpaceInfo: lsstorpool_ssd2_1 -> 712236855/828375040

Dec 22 22:48:42 home-candc-srv7 Controller[8111]: 2025-12-22 22:48:42.173 [grizzly-http-server-9] INFO  LINSTOR/Controller/d7fd5e SYSTEM - REST/API RestClient(10.4.16.4; 'linstor-proxmox/8.1.4')/QryAllSizeInfo

Dec 22 22:48:50 home-candc-srv7 Satellite[1267]: 2025-12-22 22:48:50.314 [MainWorkerPool-7] INFO  LINSTOR/Satellite/004e11 SYSTEM - SpaceInfo: DfltDisklessStorPool -> 9223372036854775807/9223372036854775807

Dec 22 22:48:50 home-candc-srv7 Satellite[1267]: 2025-12-22 22:48:50.433 [MainWorkerPool-7] INFO  LINSTOR/Satellite/004e11 SYSTEM - SpaceInfo: lsstorpool_ssd2_1 -> 712236855/828375040

Dec 22 22:49:13 home-candc-srv7 postfix/qmgr[1567]: CB507440511: from=<root@home-prd-srv7.mccurleyweb.net>, size=16597, nrcpt=1 (queue active)

Dec 22 22:49:13 home-candc-srv7 postfix/qmgr[1567]: 0F40C4403C0: from=<root@home-prd-srv7.mccurleyweb.net>, size=22714, nrcpt=1 (queue active)

Dec 22 22:49:14 home-candc-srv7 postfix/smtp[3151790]: CB507440511: to=<davidmac@mccurleyweb.net>, relay=mail.mccurleyweb.net[10.4.32.6]:25, delay=420508, delays=420508/0.01/0.56/0.09, dsn=4.1.8, status=deferred (host mail.mccurleyweb.net[10.4.32.6] said: 450 4.1.8 <root@home-prd-srv7.mccurleyweb.net>: Sender address rejected: Domain not found (in reply to RCPT TO command))

Dec 22 22:49:14 home-candc-srv7 postfix/smtp[3151791]: 0F40C4403C0: to=<davidmac@mccurleyweb.net>, relay=mail.mccurleyweb.net[10.4.32.6]:25, delay=160207, delays=160206/0.01/0.54/0.14, dsn=4.1.8, status=deferred (host mail.mccurleyweb.net[10.4.32.6] said: 450 4.1.8 <root@home-prd-srv7.mccurleyweb.net>: Sender address rejected: Domain not found (in reply to RCPT TO command))

Dec 22 22:49:24 home-candc-srv7 Controller[8111]: 2025-12-22 22:49:24.719 [grizzly-http-server-1] INFO  LINSTOR/Controller/258c22 SYSTEM - REST/API RestClient(10.4.16.2; 'linstor-proxmox/8.1.4')/QryAllSizeInfo

Dec 22 22:49:43 home-candc-srv7 Controller[8111]: 2025-12-22 22:49:43.112 [grizzly-http-server-2] INFO  LINSTOR/Controller/5fded8 SYSTEM - REST/API RestClient(10.4.16.3; 'linstor-proxmox/8.1.4')/QryAllSizeInfo

Dec 22 22:49:43 home-candc-srv7 Satellite[1267]: 2025-12-22 22:49:43.112 [MainWorkerPool-8] INFO  LINSTOR/Satellite/004e12 SYSTEM - SpaceInfo: DfltDisklessStorPool -> 9223372036854775807/9223372036854775807

Dec 22 22:49:43 home-candc-srv7 Satellite[1267]: 2025-12-22 22:49:43.275 [MainWorkerPool-8] INFO  LINSTOR/Satellite/004e12 SYSTEM - SpaceInfo: lsstorpool_ssd2_1 -> 712236855/828375040

Dec 22 22:49:52 home-candc-srv7 Controller[8111]: 2025-12-22 22:49:52.070 [grizzly-http-server-5] INFO  LINSTOR/Controller/c297ef SYSTEM - REST/API RestClient(10.4.16.4; 'linstor-proxmox/8.1.4')/QryAllSizeInfo

Dec 22 22:50:13 home-candc-srv7 pveproxy[3127064]: worker exit

Dec 22 22:50:13 home-candc-srv7 pveproxy[2424]: worker 3127064 finished

Dec 22 22:50:13 home-candc-srv7 pveproxy[2424]: starting 1 worker(s)

Dec 22 22:50:13 home-candc-srv7 pveproxy[2424]: worker 3152041 started

Dec 22 22:50:24 home-candc-srv7 Controller[8111]: 2025-12-22 22:50:24.229 [grizzly-http-server-7] INFO  LINSTOR/Controller/2ca01f SYSTEM - REST/API RestClient(10.4.16.2; 'linstor-proxmox/8.1.4')/QryAllSizeInfo


And here is home-candc-srv8:

Dec 22 22:48:06 home-candc-srv8 kernel: drbd pm-f3c7f14d: Preparing remote state change 1367122509: 0->2 conn( Disconnecting )

Dec 22 22:48:06 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv9: Committing remote state change 1367122509 (primary_nodes=0)

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv7: conn( Connecting -> Disconnecting ) [down]

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv7: Terminating sender thread

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv7: Starting sender thread (peer-node-id 0)

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv7: Connection closed

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv7: helper command: /sbin/drbdadm disconnected

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv7: helper command: /sbin/drbdadm disconnected exit code 0

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv7: conn( Disconnecting -> StandAlone ) [disconnected]

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv7: Terminating receiver thread

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv7: Terminating sender thread

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d: Preparing cluster-wide state change 717105052: 1->2 conn( Disconnecting )

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d: State change 717105052: primary_nodes=0, weak_nodes=0

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv9: Cluster is now split

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d: Committing cluster-wide state change 717105052 (0ms)

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv9: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown ) [down]

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d/0 drbd1007: quorum( yes -> no ) [down]

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d/0 drbd1007 home-candc-srv9: pdsk( UpToDate -> DUnknown ) repl( Established -> Off ) [down]

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv9: Terminating sender thread

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv9: Starting sender thread (peer-node-id 2)

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv9: Connection closed

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv9: helper command: /sbin/drbdadm disconnected

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv9: helper command: /sbin/drbdadm disconnected exit code 0

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv9: conn( Disconnecting -> StandAlone ) [disconnected]

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv9: Terminating receiver thread

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv9: Terminating sender thread

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d/0 drbd1007: disk( UpToDate -> Detaching ) [down]

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d/0 drbd1007: drbd_bm_resize called with capacity == 0

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d/0 drbd1007: disk( Detaching -> Diskless ) [go-diskless]

Dec 22 22:48:07 home-candc-srv8 kernel: drbd pm-f3c7f14d/0 drbd1007: setting new queue limits failed

Dec 22 22:48:07 home-candc-srv8 kernel: drbd /unregistered/pm-f3c7f14d: Terminating worker thread

Dec 22 22:48:27 home-candc-srv8 pveproxy[3062782]: worker exit

Dec 22 22:48:27 home-candc-srv8 pveproxy[2608]: worker 3062782 finished

Dec 22 22:48:27 home-candc-srv8 pveproxy[2608]: starting 1 worker(s)

Dec 22 22:48:27 home-candc-srv8 pveproxy[2608]: worker 3073244 started

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d: Starting worker thread (node-id 1)

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv7: Starting sender thread (peer-node-id 0)

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv9: Starting sender thread (peer-node-id 2)

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d/0 drbd1007: meta-data IO uses: blk-bio

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d/0 drbd1007: disk( Diskless -> Attaching ) [attach]

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d/0 drbd1007: Maximum number of peer devices = 7

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d: Method to ensure write ordering: flush

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d/0 drbd1007: drbd_bm_resize called with capacity == 82844304

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d/0 drbd1007: resync bitmap: bits=10355538 bits_4k=10355538 words=1132642 pages=2213

Dec 22 22:48:32 home-candc-srv8 kernel: drbd1007: detected capacity change from 0 to 82844304

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d/0 drbd1007: size = 40 GB (41422152 KB)

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d/0 drbd1007: bitmap READ of 2213 pages took 20 ms

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d/0 drbd1007: recounting of set bits took additional 4ms

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d/0 drbd1007: disk( Attaching -> UpToDate ) [attach]

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d/0 drbd1007: attached to current UUID: 5F46D23CB6F30044

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d/0 drbd1007: Setting exposed data uuid: 5F46D23CB6F30044

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv7: conn( StandAlone -> Unconnected ) [connect]

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv9: conn( StandAlone -> Unconnected ) [connect]

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv7: Starting receiver thread (peer-node-id 0)

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv9: Starting receiver thread (peer-node-id 2)

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv7: conn( Unconnected -> Connecting ) [connecting]

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv9: conn( Unconnected -> Connecting ) [connecting]

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv9: Handshake to peer 2 successful: Agreed network protocol version 123

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv9: Feature flags enabled on protocol level: 0x1ff TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES RESYNC_DAGTAG

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv9: Peer authenticated using 20 bytes HMAC

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d: Preparing cluster-wide state change 2490023500: 1->2 role( Secondary ) conn( Connected )

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d/0 drbd1007 home-candc-srv9: drbd_sync_handshake:

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d/0 drbd1007 home-candc-srv9: self 5F46D23CB6F30044:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:20

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d/0 drbd1007 home-candc-srv9: peer 5F46D23CB6F30044:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:1120

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d/0 drbd1007 home-candc-srv9: uuid_compare()=no-sync by rule=lost-quorum

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d: State change 2490023500: primary_nodes=0, weak_nodes=0

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d: Committing cluster-wide state change 2490023500 (9ms)

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d home-candc-srv9: conn( Connecting -> Connected ) peer( Unknown -> Secondary ) [connected]

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d/0 drbd1007: quorum( no -> yes ) [connected]

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d/0 drbd1007 home-candc-srv9: pdsk( DUnknown -> UpToDate ) repl( Off -> Established ) [connected]

Dec 22 22:48:32 home-candc-srv8 kernel: drbd pm-f3c7f14d/0 drbd1007 home-candc-srv9: cleared bm UUID and bitmap 5F46D23CB6F30044:0000000000000000:0000000000000000:0000000000000000

Dec 22 22:48:32 home-candc-srv8 Satellite[1269]: 2025-12-22 22:48:32.921 [MainWorkerPool-2] INFO  LINSTOR/Satellite/004e09 SYSTEM - SpaceInfo: DfltDisklessStorPool -> 9223372036854775807/9223372036854775807

Dec 22 22:48:33 home-candc-srv8 Satellite[1269]: 2025-12-22 22:48:33.086 [MainWorkerPool-2] INFO  LINSTOR/Satellite/004e09 SYSTEM - SpaceInfo: lsstorpool_ssd2_1 -> 702793385/828375040

Dec 22 22:48:43 home-candc-srv8 pveproxy[3073244]: Clearing outdated entries from certificate cache

Dec 22 22:48:50 home-candc-srv8 Satellite[1269]: 2025-12-22 22:48:50.322 [MainWorkerPool-3] INFO  LINSTOR/Satellite/004e0b SYSTEM - SpaceInfo: DfltDisklessStorPool -> 9223372036854775807/9223372036854775807

Dec 22 22:48:50 home-candc-srv8 Satellite[1269]: 2025-12-22 22:48:50.454 [MainWorkerPool-3] INFO  LINSTOR/Satellite/004e0b SYSTEM - SpaceInfo: lsstorpool_ssd2_1 -> 702793385/828375040

Dec 22 22:49:43 home-candc-srv8 Satellite[1269]: 2025-12-22 22:49:43.112 [MainWorkerPool-5] INFO  LINSTOR/Satellite/004e0c SYSTEM - SpaceInfo: DfltDisklessStorPool -> 9223372036854775807/9223372036854775807

Dec 22 22:49:43 home-candc-srv8 Satellite[1269]: 2025-12-22 22:49:43.270 [MainWorkerPool-5] INFO  LINSTOR/Satellite/004e0c SYSTEM - SpaceInfo: lsstorpool_ssd2_1 -> 702793385/828375040

Dec 22 22:50:43 home-candc-srv8 postfix/qmgr[1569]: 04488140637: from=<root@home-prd-srv8.mccurleyweb.net>, size=19503, nrcpt=1 (queue active)

Dec 22 22:50:43 home-candc-srv8 postfix/qmgr[1569]: 22071140438: from=<root@home-prd-srv8.mccurleyweb.net>, size=21841, nrcpt=1 (queue active)

Dec 22 22:50:44 home-candc-srv8 postfix/smtp[3073819]: 04488140637: to=<davidmac@mccurleyweb.net>, relay=mail.mccurleyweb.net[10.4.32.6]:25, delay=420544, delays=420543/0.01/0.77/0.04, dsn=4.1.8, status=deferred (host mail.mccurleyweb.net[10.4.32.6] said: 450 4.1.8 <root@home-prd-srv8.mccurleyweb.net>: Sender address rejected: Domain not found (in reply to RCPT TO command))

Dec 22 22:50:44 home-candc-srv8 postfix/smtp[3073820]: 22071140438: to=<davidmac@mccurleyweb.net>, relay=mail.mccurleyweb.net[10.4.32.6]:25, delay=160210, delays=160209/0.01/0.76/0.15, dsn=4.1.8, status=deferred (host mail.mccurleyweb.net[10.4.32.6] said: 450 4.1.8 <root@home-prd-srv8.mccurleyweb.net>: Sender address rejected: Domain not found (in reply to RCPT TO command))

Dec 22 22:50:53 home-candc-srv8 Satellite[1269]: 2025-12-22 22:50:53.088 [MainWorkerPool-6] INFO  LINSTOR/Satellite/004e0d SYSTEM - SpaceInfo: DfltDisklessStorPool -> 9223372036854775807/9223372036854775807

Dec 22 22:50:53 home-candc-srv8 Satellite[1269]: 2025-12-22 22:50:53.289 [MainWorkerPool-6] INFO  LINSTOR/Satellite/004e0d SYSTEM - SpaceInfo: lsstorpool_ssd2_1 -> 702793385/828375040

I appreciate you taking a look at this.

Did my post above give anyone any ideas? I’ve been away from the problem due to a family illness but I’m still stuck with nowhere to go and can’t figure out what is wrong.

Sorry to hear about the illness in the fam, hope everyone gets well soon.

As for your logs, I'm not seeing anything particularly exotic regarding the connection hang between srv7 and srv8, which suggests to me that this may be a network issue, or at least that the route configured between the two peers should be tested.

Installing telnet and telnetd temporarily on both srv7 and srv8 will let you check that route, based on the result of a telnet [ip address] [port] to the peer from both servers.

But before that, you may wish to check the actual routing configuration in the resource files that LINSTOR created here. What does a 'drbdsetup show' output for one of your resources when run on srv7, and when run on srv8? The IP addresses and ports you see in that resource configuration for each peer are the ones actually used, and I'm wondering whether they are being properly expressed or utilized in LINSTOR right now.

So either the routing itself has an issue, which you can test with telnet, or the resource configs assign a different IP address and port to the route between srv7 and srv8 than what you expect. Why that would be the case given the LINSTOR output can be addressed later, but I think confirming the actual effective configuration will be a good first step.
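If you'd rather not install telnet, here is a minimal Python sketch of the same route check. To be clear about assumptions: the peer address and port must come from your own 'drbdsetup show' output; the 7000 in the comment below is just a placeholder, not a value from this thread.

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Telnet-style check: can we complete a TCP handshake to host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unroutable, etc. -- the route is not usable
        return False

# Run this from both peers, in both directions, using the address/port
# pairs reported by 'drbdsetup show' for the connection, e.g.:
#   tcp_reachable("10.4.17.41", 7000)   # 7000 is a placeholder port
```

Run it both ways (srv7 to srv8 and srv8 to srv7), since firewalls and routing can be asymmetric.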

I had similar behavior. In my case I used a bond of two interfaces for LINSTOR replication, and one of them was configured with the wrong VLAN, or maybe just connected elsewhere. I could ping everything I wanted, but the "connecting" errors persisted. So I did a manual check of all network connections (including the cluster links): I disconnected the first interface of the bond and checked connectivity, then reconnected the first, disconnected the second interface of the bond, and ran the checks again.

Hi,

Did you resolve the problem?
I noticed similar behaviour today. Are you using load balancing over TCP, by any chance?

I haven’t resolved the issue yet. Thanks, everyone, for your concern and advice so far. I’m only getting short windows to check things, but I’m hoping to try some of the additional suggestions above tonight or tomorrow night. I will definitely post back here if I figure something out. I do run OSPF routing on these nodes, so they will fall back to the non-direct link if the direct links go down, but I haven’t changed that config since these were set up and working fine. Maybe some of the non-Proxmox updates have jinked something up.

Thanks for everyone’s help with the commands to use for investigation; the drbdsetup show command helped me find why they aren’t connecting, although I haven’t figured out how to correct it yet.

Reminder of my topology (not all NICs shown):

According to the drbdsetup show output, for some reason srv7 is trying to talk to srv8 over the wrong interface, i.e. it is trying to connect to 10.4.17.41 from 10.4.17.33. I remember setting PrefNic and some related settings in the past, and everything was working. Now, for some reason, this connection-affinity setup isn’t working. Also, my intent was that if a direct connection was down, the connection would route over the admin interface (10.4.0.*), and at some point testing confirmed this worked too.
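To confirm which local address can actually reach a peer, one quick test is a TCP connect with the local end bound explicitly to the source address in question. A minimal sketch in Python, with assumptions called out: the addresses in the comment are the ones from this thread, but the port 7000 is a placeholder you'd replace with the one from 'drbdsetup show'.

```python
import socket

def reachable_from(src_ip: str, host: str, port: int,
                   timeout: float = 3.0) -> bool:
    """Attempt a TCP connect to host:port with the local end bound to
    src_ip, so one specific interface/route can be tested at a time."""
    try:
        with socket.create_connection((host, port), timeout=timeout,
                                      source_address=(src_ip, 0)):
            return True
    except OSError:
        # Bind failure, refusal, or timeout all mean this route is unusable
        return False

# e.g. can the peer be reached when forced out of 10.4.17.33?
#   reachable_from("10.4.17.33", "10.4.17.41", 7000)  # 7000 is a placeholder
```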

Some settings seem to have gotten lost. Running linstor node-connection list and linstor node-connection path list for any of the three nodes doesn’t show any properties at all. It looks like there is still a PrefNic property set, but that isn’t enough. Here is a thread I created when I was first setting up, and I followed that advice:

I think there are some new capabilities now regarding multiple paths and direct connects, so I’m going to re-read those sections of the manual and see if I can get this working again. I still don’t have the root cause of how some of the settings went missing, but at least there is some light at the end of the tunnel.

Thanks for the help!

Hoorah!!! Things seem to be working again. Here is what I did:

Note: I didn’t call it out but I also have a LAN connection interface called vlan21 that is for normal connections.

First, I set the PrefNic property on all the nodes to vlan21 and they started syncing again. This was just an interim step, but it gave me a sigh of relief :slight_smile:

Next, I set the connections paths again for the primary/direct cross connections between servers (like I had many moons ago):

linstor node-connection path create home-candc-srv7 home-candc-srv8 path7to8 nic_enp7s0f1 nic_enp7s0f1
linstor node-connection path create home-candc-srv7 home-candc-srv9 path7to9 nic_enp7s0f0 nic_enp7s0f0
linstor node-connection path create home-candc-srv8 home-candc-srv9 path8to9 nic_enp7s0f0 nic_enp7s0f1

Then, I set “backup” paths to go over the vlan in case any of the direct links failed:

linstor node-connection path create home-candc-srv7 home-candc-srv8 vlan-path7to8 vlan21 vlan21
linstor node-connection path create home-candc-srv7 home-candc-srv9 vlan-path7to9 vlan21 vlan21
linstor node-connection path create home-candc-srv8 home-candc-srv9 vlan-path8to9 vlan21 vlan21

Now, at this point I still have the PrefNic property on the nodes set to vlan21. I’m not sure how this interacts with the above settings (I’m thinking it has no effect when paths are in place?) but I want to remove that property so there is no confusion.

  1. Should I remove the PrefNic property now?
  2. How do I remove the property? I see linstor node set-property, but I don’t see a linstor node delete-property command and I can’t find one in the online manual.

Again, thanks everyone for your help!

Glad things are back in operation for you again!

Properties set with linstor node set-property can be unset by leaving the value blank. So to unset it for home-candc-srv7, you’d do this:

linstor node set-property home-candc-srv7 PrefNic

Thanks! I’ve got it removed now.