Hello everyone. I have noticed the following behavior: in a two-node cluster, when the satellite service is stopped on the second node and I try to create a disk through Proxmox, a resource without a replica is created in LINSTOR. It is not visible in Proxmox, but it exists and takes up space. If I understand correctly, the resource is first created on the node where the virtual machine is being created and then on the second node; apparently the plugin does not check whether the second replica can actually be created and leaves the first one in place. I understand that I can delete such a resource manually, but I would prefer this behavior not to happen at all.
Sounds like a potential bug to me. I'm guessing that LINSTOR wants to create two replicas, but because it cannot (the peer is offline) it returns an error to the Proxmox driver for LINSTOR and fails to clean up the resource on the node where it was already provisioned.
I will try to recreate this in the next day or so, but in the meantime, can you share the versions of LINBIT software you have installed?
I managed to find out the reason for this behavior. Here are the versions of the packages I have installed (the latest available today), on Proxmox 8.4.5:
linstor-client/unknown,now 1.25.4-1 all [installed]
linstor-common/unknown,now 1.31.3-1 all [installed,automatic]
linstor-controller/unknown,now 1.31.3-1 all [installed]
linstor-gui/unknown,now 1.9.7-1 all [installed]
linstor-proxmox/unknown,now 8.1.2-1 all [installed]
linstor-satellite/unknown,now 1.31.3-1 all [installed]
python-linstor/unknown,now 1.25.3-1 all [installed,automatic]
drbd-dkms/unknown,now 9.2.14-1 all [installed]
drbd-reactor/unknown,now 1.9.0-1 amd64 [installed]
drbd-utils/unknown,now 9.31.0-1 amd64 [installed]
When I tried to create a resource manually with autoplacement, LINSTOR did not allow it (as it should) and no orphaned resource was created. However, when I created a resource manually on explicitly named nodes, it was created without problems on the first node and failed on the second (because the second node's satellite was stopped). I then assumed that this is the mechanism the Proxmox plugin uses: the option is called preferlocal and defaults to "yes". With this option the plugin explicitly creates the resource on the local node first and then on the others; when the second creation fails, an error is raised and the single resource remains. With preferlocal set to "no", everything works correctly.

Since my cluster consists of only two nodes, I do not see any advantage in preferlocal yes: the DRBD resource becomes primary when the virtual machine starts anyway, and the only difference is a delay while data synchronizes at creation time (correct me if I am wrong).
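To illustrate what I mean, here is roughly the difference between the two creation paths I tested (a sketch; the node names pve1/pve2 and the resource name vm-100-disk-1 are just placeholders from my setup):

# Autoplacement refuses when it cannot satisfy the replica count (peer satellite down):
linstor resource create --auto-place 2 vm-100-disk-1

# Creating on explicitly named nodes succeeds on the live node and only then
# fails on the offline one, which leaves a single-replica resource behind:
linstor resource create pve1 vm-100-disk-1
linstor resource create pve2 vm-100-disk-1   # fails, satellite on pve2 is stopped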
Nevertheless, setting preferlocal to "no" suits me quite well, although I still think the default behavior is not quite what one would like to see.
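For reference, here is roughly what I set in /etc/pve/storage.cfg (a sketch; the controller address and resource group name are placeholders, only the preferlocal line matters here):

drbd: linstor-storage
    content images,rootdir
    controller 192.168.1.10
    resourcegroup pve-rg
    preferlocal no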
However, during testing I came across another unpleasant issue.
Situation: two nodes are working, then the disk on one node dies, and everything continues to work correctly. But if you try to create a disk in Proxmox at this moment, LINSTOR creates both resources yet returns code 500 to Proxmox, and Proxmox considers the disk not created. So once again there are resources invisible to Proxmox that have to be deleted manually, even though these resources are fully functional: if you restore the disk on the second node, everything synchronizes and works.
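When this happens, the leftover resource can be removed manually with something like the following (vm-100-disk-1 is a placeholder name); deleting the resource definition removes the resource on all nodes:

linstor resource-definition delete vm-100-disk-1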
As a result, I even wrote a bash script that finds resources that exist in LINSTOR but not in Proxmox, which I plan to hook into my monitoring system.
#!/bin/bash
# Report LINSTOR resources that have no matching disk in Proxmox.
# Assumes node "pve" and storage "linstor-storage"; adjust to your setup.

# Volume IDs known to Proxmox: keep only entries of our storage, take the
# volid column, strip the "linstor-storage:" prefix and any "_" suffix so
# both lists compare on the same base resource name.
proxmox_disks=$(pvesh get /nodes/pve/storage/linstor-storage/content 2>/dev/null \
  | grep linstor-storage \
  | awk '{ print $7 }' \
  | sed 's/^[^:]*://' \
  | sed 's/_.*//' \
  | sort -u)

# Resource names known to LINSTOR: take the resource-name column of the
# resource list and strip any "_" suffix the same way.
linstor_disks=$(linstor r l 2>/dev/null \
  | grep "DRBD,STORAGE" \
  | awk '{ print $2 }' \
  | grep -v '^$' \
  | sed 's/_.*//' \
  | sort -u)

# comm -23 prints lines unique to the first (sorted) input,
# i.e. names that exist in LINSTOR but not in Proxmox.
comm -23 <(echo "$linstor_disks") <(echo "$proxmox_disks")
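A minimal usage sketch for the monitoring side (the path /usr/local/bin/linstor-orphans.sh and the logger tag are hypothetical): run the script periodically from /etc/cron.d and raise an alert whenever it prints anything.

*/10 * * * * root /usr/local/bin/linstor-orphans.sh | grep -q . && logger -t linstor-orphans "orphaned LINSTOR resources detected"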
It seems that these two not-very-critical issues with the LINSTOR plugin are the only things I could find during the testing period. If you need any other information, just ask.