Errors running linstor cli after split-brain issue

Hello all,

Writing here in case anyone has stumbled across this issue; any feedback or ideas are welcome:
We have a Proxmox cluster with two DRBD nodes plus a third node for quorum only (no DRBD/Linstor-related packages on it at all). The Linstor controller was set up on the two DRBD nodes with a shared DRBD resource and started on only one of them.
After a hardware failure of one of the main nodes and a simultaneous failure of the third, quorum-only node, followed by a reboot of the first node as well, we ended up in a DRBD split-brain, and it turned out Proxmox was unable to properly move/migrate the VMs that were last running on the failed node.
While investigating, we found that the linstor CLI could not be run at all and failed with this error:

# linstor
Traceback (most recent call last):
  File "/usr/bin/linstor", line 24, in <module>
    linstor_client_main.main()
  File "/usr/lib/python3/dist-packages/linstor_client_main.py", line 682, in main
    LinStorCLI().run()
  File "/usr/lib/python3/dist-packages/linstor_client_main.py", line 156, in __init__
    self._parser = self.setup_parser()
  File "/usr/lib/python3/dist-packages/linstor_client_main.py", line 236, in setup_parser
    sub_cmd.setup_commands(subp)
  File "/usr/lib/python3/dist-packages/linstor_client/commands/storpool_cmds.py", line 205, in setup_commands
    p_new_storage_spaces_pool.set_defaults(func=self.create, driver=linstor.StoragePoolDriver.STORAGE_SPACES)
AttributeError: type object 'StoragePoolDriver' has no attribute 'STORAGE_SPACES'

I thought the linstor-* packages might be out of date, so I tried updating them on one of the machines, which led to a different error:

# linstor
Traceback (most recent call last):
  File "/usr/bin/linstor", line 21, in <module>
    import linstor_client_main
  File "/usr/lib/python3/dist-packages/linstor_client_main.py", line 31, in <module>
    from linstor_client.commands import (
  File "/usr/lib/python3/dist-packages/linstor_client/commands/__init__.py", line 1, in <module>
    from .commands import DefaultState, Commands, MiscCommands, ArgumentError
  File "/usr/lib/python3/dist-packages/linstor_client/commands/commands.py", line 15, in <module>
    from linstor.sharedconsts import KEY_STOR_POOL_MAX_OVERSUBSCRIPTION_RATIO
ImportError: cannot import name 'KEY_STOR_POOL_MAX_OVERSUBSCRIPTION_RATIO' from 'linstor.sharedconsts' (/usr/lib/python3/dist-packages/linstor/sharedconsts.py)

This is running Proxmox VE 7, DRBD9, Linstor Controller & Common 1.30.2-1, Linstor Client 1.24.0-1.

Any ideas highly appreciated!

Solved. It turns out there was a stale /usr/lib/python3/linstor directory that had not been cleaned up after package updates, while the current one was /usr/lib/python3.9/linstor.

Removing the stale one fixed the issues for both versions (before and after the upgrade).
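
For anyone hitting the same thing, here is a minimal sketch of how one might look for duplicate copies of a Python package like this. Both helper names (`resolved_origin`, `find_package_copies`) are made up for illustration and are not part of linstor-client; the sketch only uses the Python standard library and assumes the stale copy sits directly under one of the directories you pass in.

```python
import importlib.util
import os
import sys


def resolved_origin(name):
    """Return the file Python would load for top-level module `name`, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def find_package_copies(name, search_paths=None):
    """List every copy of package/module `name` directly under the given directories.

    Results keep the order of `search_paths`; on sys.path the first hit is
    the copy that actually gets imported.
    """
    if search_paths is None:
        search_paths = sys.path
    copies = []
    seen = set()
    for entry in search_paths:
        base = entry or os.getcwd()  # '' on sys.path means the current directory
        for candidate in (os.path.join(base, name), os.path.join(base, name + ".py")):
            if not os.path.exists(candidate):
                continue
            real = os.path.realpath(candidate)
            if real not in seen:  # skip symlinked duplicates of the same copy
                seen.add(real)
                copies.append(candidate)
    return copies
```

Calling `find_package_copies("linstor", sys.path + ["/usr/lib/python3", "/usr/lib/python3.9"])` would list every `linstor` directory found under those locations, and `resolved_origin("linstor")` shows which file the interpreter actually picks up. More than one result is a hint that an old copy was left behind.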

I’m curious, how exactly did the split-brain happen?
Once the other two nodes are down, the last node should no longer have quorum, even if you reboot it.
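
To spell out the arithmetic behind that, here is a simplified majority rule as a sketch. `has_quorum` is a hypothetical helper, and real DRBD quorum handling has additional rules that are not modelled here.

```python
def has_quorum(reachable_nodes, total_nodes):
    """Simple majority: strictly more than half of all nodes must be reachable.

    `reachable_nodes` counts the local node itself.
    """
    return 2 * reachable_nodes > total_nodes


# Three-node cluster: a node that only sees itself has no majority,
# while any two nodes that see each other do.
alone = has_quorum(1, 3)
pair = has_quorum(2, 3)
```

Under this rule, a single surviving node out of three stays without quorum no matter how often it is rebooted.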

Did you bring up the other two failed nodes while the primary node was down?

The third (quorum-only) node was brought back up. All DRBD resources were then manually promoted to primary so they could be used.
