Resources stuck in "Connecting"

I have a three-node Proxmox cluster and am using LINSTOR for storage across all three. Each node has a direct connection to the other two (for replication), as well as a main connection to my network; I'll show this later on. Our power went out for a while, and I only noticed a couple of days later that I couldn't snapshot VMs, etc. I wasn't sure what was going on, so I upgraded to the latest Proxmox and LINSTOR:

# pveversion 
pve-manager/9.1.2/9d436f37a0ac4172 (running kernel: 6.17.2-2-pve)

# linstor --version
linstor-client 1.27.1; GIT-hash: 9c57f040eb3834500db508e4f04d361d006cb6b5

# drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\ 71c8bcff6ea77a022b272a7eba649a774251bac4\ build\ by\ @buildsystem\,\ 2025-11-03\ 10:34:37
DRBDADM_API_VERSION=2
DRBD_KERNEL_VERSION_CODE=0x090300
DRBD_KERNEL_VERSION=9.3.0
DRBDADM_VERSION_CODE=0x092100
DRBDADM_VERSION=9.33.0

Still no luck, so I dug in and saw that, even though all my nodes are online and can ping each other, the resources are stuck in the Connecting state:

# linstor resource list
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName      ┊ Node            ┊ Layers       ┊ Usage  ┊ Conns                       ┊        State ┊ CreatedOn           ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ linstor_db        ┊ home-candc-srv7 ┊ DRBD,STORAGE ┊ InUse  ┊ Ok                          ┊     UpToDate ┊ 2024-07-22 23:03:43 ┊
┊ linstor_db        ┊ home-candc-srv8 ┊ DRBD,STORAGE ┊ Unused ┊ Ok                          ┊     UpToDate ┊ 2024-07-22 23:03:43 ┊
┊ linstor_db        ┊ home-candc-srv9 ┊ DRBD,STORAGE ┊ Unused ┊ Ok                          ┊     UpToDate ┊ 2024-07-22 23:03:43 ┊
┊ pm-1e56468b       ┊ home-candc-srv7 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv8) ┊     UpToDate ┊ 2024-07-25 17:16:49 ┊
┊ pm-1e56468b       ┊ home-candc-srv8 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv7) ┊     UpToDate ┊ 2024-07-25 17:16:47 ┊
┊ pm-1e56468b       ┊ home-candc-srv9 ┊ DRBD,STORAGE ┊ Unused ┊ Ok                          ┊     UpToDate ┊ 2024-07-25 17:16:49 ┊
┊ pm-1ecca352       ┊ home-candc-srv7 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv8) ┊     UpToDate ┊ 2025-11-30 12:22:20 ┊
┊ pm-1ecca352       ┊ home-candc-srv8 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv7) ┊     UpToDate ┊ 2025-11-30 12:22:20 ┊
┊ pm-1ecca352       ┊ home-candc-srv9 ┊ DRBD,STORAGE ┊ Unused ┊ Ok                          ┊     UpToDate ┊ 2025-11-30 12:22:19 ┊
┊ pm-4ac0d6c3       ┊ home-candc-srv7 ┊ DRBD,STORAGE ┊ InUse  ┊ Connecting(home-candc-srv8) ┊     UpToDate ┊ 2025-11-17 21:39:12 ┊
┊ pm-4ac0d6c3       ┊ home-candc-srv8 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv7) ┊     Outdated ┊ 2025-11-17 21:39:13 ┊
┊ pm-4ac0d6c3       ┊ home-candc-srv9 ┊ DRBD,STORAGE ┊ Unused ┊ Ok                          ┊     UpToDate ┊ 2025-11-17 21:39:13 ┊
┊ pm-25c0b03c       ┊ home-candc-srv7 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv8) ┊     UpToDate ┊ 2024-08-11 02:00:23 ┊
┊ pm-25c0b03c       ┊ home-candc-srv8 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv7) ┊     UpToDate ┊ 2024-08-11 02:00:28 ┊
┊ pm-25c0b03c       ┊ home-candc-srv9 ┊ DRBD,STORAGE ┊ Unused ┊ Ok                          ┊     UpToDate ┊ 2024-08-11 02:00:28 ┊
┊ pm-502566f5       ┊ home-candc-srv7 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv8) ┊     UpToDate ┊ 2024-07-25 14:47:19 ┊
┊ pm-502566f5       ┊ home-candc-srv8 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv7) ┊     UpToDate ┊ 2024-07-25 14:47:20 ┊
┊ pm-502566f5       ┊ home-candc-srv9 ┊ DRBD,STORAGE ┊ Unused ┊ Ok                          ┊     UpToDate ┊ 2024-07-25 14:47:20 ┊
┊ pm-c448ddb7       ┊ home-candc-srv7 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv8) ┊     UpToDate ┊ 2024-08-10 17:42:31 ┊
┊ pm-c448ddb7       ┊ home-candc-srv8 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv7) ┊     UpToDate ┊ 2024-08-10 17:42:33 ┊
┊ pm-c448ddb7       ┊ home-candc-srv9 ┊ DRBD,STORAGE ┊ Unused ┊ Ok                          ┊     UpToDate ┊ 2024-08-10 17:42:33 ┊
┊ pm-d03ba4dc       ┊ home-candc-srv7 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv8) ┊     UpToDate ┊ 2025-11-27 22:21:30 ┊
┊ pm-d03ba4dc       ┊ home-candc-srv8 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv7) ┊     UpToDate ┊ 2025-11-27 22:21:31 ┊
┊ pm-d03ba4dc       ┊ home-candc-srv9 ┊ DRBD,STORAGE ┊ Unused ┊ Ok                          ┊     UpToDate ┊ 2025-11-27 22:21:31 ┊
┊ pm-f3c7f14d       ┊ home-candc-srv7 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv8) ┊     UpToDate ┊ 2025-11-17 21:20:24 ┊
┊ pm-f3c7f14d       ┊ home-candc-srv8 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv7) ┊     UpToDate ┊ 2025-11-17 21:20:25 ┊
┊ pm-f3c7f14d       ┊ home-candc-srv9 ┊ DRBD,STORAGE ┊ Unused ┊ Ok                          ┊     UpToDate ┊ 2025-11-17 21:20:25 ┊
┊ vm-9001-cloudinit ┊ home-candc-srv7 ┊ DRBD,STORAGE ┊ InUse  ┊ Connecting(home-candc-srv8) ┊     UpToDate ┊ 2024-08-10 17:47:55 ┊
┊ vm-9001-cloudinit ┊ home-candc-srv8 ┊ DRBD,STORAGE ┊ Unused ┊ Connecting(home-candc-srv7) ┊ Inconsistent ┊ 2024-08-10 17:47:57 ┊
┊ vm-9001-cloudinit ┊ home-candc-srv9 ┊ DRBD,STORAGE ┊ Unused ┊ Ok                          ┊     UpToDate ┊ 2024-08-10 17:47:57 ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

I'm sorry the above formatting sucks; let me know if I should post the info some other way.

Here is my node info:

# linstor node list
╭───────────────────────────────────────────────────────────────╮
┊ Node            ┊ NodeType  ┊ Addresses              ┊ State  ┊
╞═══════════════════════════════════════════════════════════════╡
┊ home-candc-srv7 ┊ SATELLITE ┊ 10.4.16.2:3366 (PLAIN) ┊ Online ┊
┊ home-candc-srv8 ┊ SATELLITE ┊ 10.4.16.3:3366 (PLAIN) ┊ Online ┊
┊ home-candc-srv9 ┊ SATELLITE ┊ 10.4.16.4:3366 (PLAIN) ┊ Online ┊
╰───────────────────────────────────────────────────────────────╯

And details on their connectivity. Note that the interfaces starting with "nic_enp" are the direct connections from that node to the other two:

# linstor node interface list home-candc-srv7
╭─────────────────────────────────────────────────────────────────────╮
┊ home-candc-srv7 ┊ NetInterface ┊ IP         ┊ Port ┊ EncryptionType ┊
╞═════════════════════════════════════════════════════════════════════╡
┊ +               ┊ nic_enp7s0f0 ┊ 10.4.17.33 ┊      ┊                ┊
┊ +               ┊ nic_enp7s0f1 ┊ 10.4.17.25 ┊      ┊                ┊
┊ + StltCon       ┊ vlan21       ┊ 10.4.16.2  ┊ 3366 ┊ PLAIN          ┊
╰─────────────────────────────────────────────────────────────────────╯
# linstor node interface list home-candc-srv8
╭─────────────────────────────────────────────────────────────────────╮
┊ home-candc-srv8 ┊ NetInterface ┊ IP         ┊ Port ┊ EncryptionType ┊
╞═════════════════════════════════════════════════════════════════════╡
┊ +               ┊ nic_enp7s0f0 ┊ 10.4.17.41 ┊      ┊                ┊
┊ +               ┊ nic_enp7s0f1 ┊ 10.4.17.26 ┊      ┊                ┊
┊ + StltCon       ┊ vlan21       ┊ 10.4.16.3  ┊ 3366 ┊ PLAIN          ┊
╰─────────────────────────────────────────────────────────────────────╯
# linstor node interface list home-candc-srv9
╭─────────────────────────────────────────────────────────────────────╮
┊ home-candc-srv9 ┊ NetInterface ┊ IP         ┊ Port ┊ EncryptionType ┊
╞═════════════════════════════════════════════════════════════════════╡
┊ +               ┊ nic_enp7s0f0 ┊ 10.4.17.34 ┊      ┊                ┊
┊ +               ┊ nic_enp7s0f1 ┊ 10.4.17.42 ┊      ┊                ┊
┊ + StltCon       ┊ vlan21       ┊ 10.4.16.4  ┊ 3366 ┊ PLAIN          ┊
╰─────────────────────────────────────────────────────────────────────╯

Anyway, the linstor_db resource was in bad shape to start with, and possibly split-brained based on its status on srv7. After a lot of googling, and some combination (I can't remember exactly; it was late last night) of drbdadm invalidate, disconnect, and connect with --discard-my-data, I was somehow able to get the linstor_db resource back in shape.
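From what I can reconstruct, it was some variation of the standard split-brain recovery: disconnect, then reconnect with --discard-my-data on the node whose changes get thrown away, and a plain connect on the survivor. Roughly like this (a dry-run sketch only; the resource/peer names below are illustrative, not exactly what I ran, and DRY_RUN=1 just prints the commands instead of executing drbdadm):

```shell
#!/bin/sh
# Dry-run sketch of a manual DRBD split-brain recovery.
# DRY_RUN=1 (the default here) only echoes the drbdadm commands.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# On the node whose changes you are willing to discard (the "victim"):
recover_victim() {
    res=$1; peer=$2
    run drbdadm disconnect "$res:$peer"
    run drbdadm connect --discard-my-data "$res:$peer"
}

# On the surviving node, re-initiate the connection if it went StandAlone:
recover_survivor() {
    res=$1; peer=$2
    run drbdadm connect "$res:$peer"
}

# Example invocation (placeholder names):
recover_victim pm-1e56468b home-candc-srv8
```

Again, that's the general shape from the DRBD docs, not a transcript of what I actually typed that night.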

However, no matter what I try, I can't get the other resources to reconnect between srv7 and srv8. I know this will look stupid, but here is what I tried for resource pm-1e56468b on srv7, based on my shell history:

  434  drbdadm --discard-my-data connect pm-1e56468b:home-candc-srv9
  436  drbdadm status pm-1e56468b
  437  drbdadm --discard-my-data connect pm-1e56468b:home-candc-srv8
  438  drbdadm status pm-1e56468b
  440  drbdadm disconnect pm-1e56468b
  441  drbdadm status pm-1e56468b
  442  drbdadm --discard-my-data connect pm-1e56468b:home-candc-srv9
  443  drbdadm --discard-my-data connect pm-1e56468b:home-candc-srv8
  444  drbdadm status pm-1e56468b
  445  drbdadm status pm-1e56468b
  446  drbdadm --discard-my-data connect pm-1e56468b:home-candc-srv8
  447  drbdadm --discard-my-data connect pm-1e56468b:home-candc-srv8
  448  drbdadm status pm-1e56468b
  450  drbdadm disconnect pm-1e56468b
  451  drbdadm --discard-my-data connect pm-1e56468b:home-candc-srv9
  452  drbdadm status pm-1e56468b
  453  drbdadm --discard-my-data connect pm-1e56468b:home-candc-srv8
  454  drbdadm status pm-1e56468b
  455  drbdadm disconnect pm-1e56468b
  456  drbdadm --discard-my-data connect pm-1e56468b:home-candc-srv9
  457  drbdadm status pm-1e56468b
  458  drbdadm connect pm-1e56468b:home-candc-srv8
  459  drbdadm status pm-1e56468b
  548  drbdadm status pm-1e56468b
  549  drbdadm verify pm-1e56468b
  554  drbdadm connect pm-1e56468b
  556  drbdadm cstate pm-1e56468b
  557  drbdadm status pm-1e56468b
  559  drbdadm up pm-1e56468b
  583  history | grep pm-1e56468b

And here is what I did on srv8 on the same resource:

  481  drbdadm status pm-1e56468b
  482  drbdadm connect pm-1e56468b:home-candc-srv7

Here is what drbdadm status says:

root@home-candc-srv7:~# drbdadm status
linstor_db role:Primary
  disk:UpToDate open:yes
  home-candc-srv8 role:Secondary
    peer-disk:UpToDate
  home-candc-srv9 role:Secondary
    peer-disk:UpToDate

pm-1e56468b role:Secondary
  disk:UpToDate open:no
  home-candc-srv8 connection:Connecting
  home-candc-srv9 role:Secondary
    peer-disk:UpToDate

pm-1ecca352 role:Secondary
  disk:UpToDate open:no
  home-candc-srv8 connection:Connecting
  home-candc-srv9 role:Secondary
    peer-disk:UpToDate

pm-25c0b03c role:Secondary
  disk:UpToDate open:no
  home-candc-srv8 connection:Connecting
  home-candc-srv9 role:Secondary
    peer-disk:UpToDate

pm-4ac0d6c3 role:Primary
  disk:UpToDate open:no
  home-candc-srv8 connection:Connecting
  home-candc-srv9 role:Secondary
    peer-disk:UpToDate

pm-502566f5 role:Secondary
  disk:UpToDate open:no
  home-candc-srv8 connection:Connecting
  home-candc-srv9 role:Secondary
    peer-disk:UpToDate

pm-c448ddb7 role:Secondary
  disk:UpToDate open:no
  home-candc-srv8 connection:Connecting
  home-candc-srv9 role:Secondary
    peer-disk:UpToDate

pm-d03ba4dc role:Secondary
  disk:UpToDate open:no
  home-candc-srv8 connection:Connecting
  home-candc-srv9 role:Secondary
    peer-disk:UpToDate

pm-f3c7f14d role:Secondary
  disk:UpToDate open:no
  home-candc-srv8 connection:Connecting
  home-candc-srv9 role:Secondary
    peer-disk:UpToDate

vm-9001-cloudinit role:Primary
  disk:UpToDate open:no
  home-candc-srv8 connection:Connecting
  home-candc-srv9 role:Secondary
    peer-disk:UpToDate

Does anyone have any ideas about how to get these connected again? Am I totally screwed? I'm very disheartened right now because I've been trying to build something rock solid so my wife doesn't get mad when our email is down again, and somehow things have still ended up in this state.

Any help would be GREATLY appreciated! Hopefully it is just something stupid I missed.

Have a good one.

Can you share the recent kernel logs from srv7 and srv8 covering the time when you were performing these connects/disconnects/invalidates?

It's not entirely clear from the outputs you've shared what the blocker is, but the kernel logs should provide more detail about what is and is not happening in the connection process between the two peers, and from there I can hopefully give you more insight into how to solve it.

Yes, I’ll dig them out and trim them down to a reasonable time frame. Thanks! I’m at my wit’s end.

I pulled the logs for those two days and tar-gz'ed them, but apparently I can't upload to this forum. I guess I'll try to filter out some examples based on the resource name and paste them in here, unless there is a better way…

I’d suggest repeating your attempts to get one of the resources reconnected, and then sharing a snippet of the kernel logs from both peers here in a code block.

You could start by performing a drbdadm down <resource name> on home-candc-srv7 and home-candc-srv8, which should give a good indicator in the kernel logs of where to start looking, and then a drbdadm up <resource name>, which should re-initiate a connection attempt.

If you want to get a bit more precise, you can try adding a logger function to your ~/.bashrc, which should make it very easy to find your most recent connection attempt (perform these steps on both nodes):

log_and_run() {
    logger -t "$(basename $0)" "Running command: $*"
    "$@"
}

Then load it:

source ~/.bashrc

Then call it like so, using the example of resource pm-25c0b03c, on home-candc-srv7 and home-candc-srv8:

log_and_run drbdadm down pm-25c0b03c
log_and_run drbdadm up pm-25c0b03c

But unless your kernel logs are particularly noisy, you should be able to run the commands and then immediately tail the last few hundred lines or so to find the connection attempt for a resource, and that will tell us more about what is going on.
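If the logs do turn out to be noisy, a simple grep keyed on the resource name should isolate the relevant lines. A minimal sketch of the idea (the sample log lines below are invented purely to demonstrate the filter; on a real node you would pipe from `journalctl -k` or `dmesg` instead):

```shell
#!/bin/sh
# Filter kernel-log lines down to a single DRBD resource.
# On a live node: journalctl -k --since "-10 min" | filter_resource <resource>
filter_resource() {
    grep -F "drbd $1"
}

# Invented stand-in for kernel-log output, used only to demo the filter:
sample='kernel: drbd pm-25c0b03c home-candc-srv8: conn( Unconnected -> Connecting )
kernel: drbd pm-1e56468b home-candc-srv9: conn( Connecting -> Connected )'

# Keeps only the pm-25c0b03c line:
printf '%s\n' "$sample" | filter_resource pm-25c0b03c
```

Run on both peers right after the down/up cycle, that should give us just the handshake messages for the one resource you're testing.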