This is not very critical, since it only happens when a node is in the state “required ext tools missing”, but I feel this could be handled better.
Three nodes:
node1: OK
node2: OK
node3: required_ext_tools_missing
node3 acts only as a tiebreaker. All of this runs on Proxmox.
In this situation, the *.res files generated on node1 and node2 contain an empty hostname where it should say “node3”. As a result, no disk operations were possible in Proxmox, and any node that got rebooted could no longer parse its configuration files.
This was quickly fixed with:

```
sed -i 's/on ""/on "node3"/g' /var/lib/linstor.d/*.res
sed -i 's/host ""/host "node3"/g' /var/lib/linstor.d/*.res
```
but it would be neater if LINSTOR didn’t do that in the first place.
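For anyone hitting the same thing, here is the same workaround as a small Python sketch. Like the sed commands, it assumes that the empty name always belongs to node3 and that the generated files live in /var/lib/linstor.d/; `fill_empty_names` and `patch_res_files` are made-up helper names, not part of LINSTOR. Keep in mind that LINSTOR regenerates these files, so a manual patch can be overwritten again.

```python
import re
from pathlib import Path

# Matches `on ""` and `host ""`, the two statements where the empty node name showed up.
# \b keeps words that merely end in "on" (e.g. "connection") from matching.
EMPTY_NAME = re.compile(r'\b(on|host) ""')


def fill_empty_names(text: str, node_name: str) -> str:
    """Replace every empty `on`/`host` name with node_name (same effect as the sed commands)."""
    return EMPTY_NAME.sub(lambda m: f'{m.group(1)} "{node_name}"', text)


def patch_res_files(directory: str = "/var/lib/linstor.d", node_name: str = "node3") -> list[str]:
    """Rewrite the *.res files in directory in place and return the paths that changed.

    Assumption: every empty name belongs to node_name, as in the sed workaround.
    """
    changed = []
    for path in sorted(Path(directory).glob("*.res")):
        original = path.read_text()
        patched = fill_empty_names(original, node_name)
        if patched != original:
            path.write_text(patched)
            changed.append(str(path))
    return changed
```

Running `patch_res_files()` as root on node1 and node2 should leave the files in the same state as the two sed commands, and the returned list tells you which resources were affected.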
I then fixed the required_ext_tools_missing condition and rebooted the node. I had to re-run the sed commands, because the configuration files were regenerated without the node3 host name again, but after that everything ran stably once more.
PS: The documentation could also be extended to list, for example, the possible node states in the LINSTOR controller. I don’t remember the exact wording anymore, but it was something like REQ_EXT_TOOL_MISSING; it looked like a constant. Searching with Google, I found no documentation for that status, only the commit that introduced it. I think a list of all possible states and what they mean would help.
I would like to be more specific, but the node status was just that: required external tool missing. It showed that instead of “OK” for that Satellite on the /ui/#/inventory/nodes page of the LINSTOR controller. I can’t find the exact term, but it was something like REQ_EXT_TOOL_MISSING.
In the error logs I do find this error:

```
Received a resource that requires DRBD9_KERNEL but that external tool is not supported on this satellite (MissingRequiredExtToolsStorageException)
```

so that could be it. There is also this one:

```
Cannot run program "cat": error=0, Failed to exec spawn helper: pid: 3506215, exit value: 1 (IOException)
```

The system required a reboot, which fixed the error, and that part is fine.
The problem is that the resource files on the other two nodes had an empty host name. The .res files all looked like this:
```
# This file was generated by LINSTOR (1.31.3), do not edit manually.
# Name
# LINSTOR nodename: node1
# Local hostname : node1
# File generated at:
# Local time : 2025-09-02 21:14:06
# UTC : 2025-09-02 19:14:06
resource "pm-12345678"
{
    [...]
    on ""
    {
        volume 0
        {
            disk none;
            disk
            {
                discard-zeroes-if-aligned yes;
                rs-discard-granularity 1048576;
            }
            meta-disk internal;
            device minor 1051;
        }
        node-id 2;
    }
    [...]
    connection
    {
        net
        {
            allow-two-primaries yes;
            protocol C;
        }
        disk
        {
            c-max-rate 0;
            c-min-rate 0;
        }
        host "node1" address ipv4 10.1.1.1:7051;
        host "" address ipv4 10.1.1.3:7051;
    }
}
```
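To check which generated files are affected before rebooting a node, a minimal sketch like the following could scan them for empty `on`/`host` names. It only pattern-matches those two statements, assuming the layout shown above, and is not a full DRBD configuration parser; `find_empty_names` and `scan_directory` are hypothetical helpers.

```python
import re
from pathlib import Path

# A line whose first token is `on` or `host` followed by an empty quoted name.
EMPTY_NAME_LINE = re.compile(r'^\s*(on|host) ""')


def find_empty_names(text: str) -> list[tuple[int, str]]:
    """Return (1-based line number, stripped line) for each `on ""` / `host ""` statement."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(text.splitlines(), start=1)
        if EMPTY_NAME_LINE.match(line)
    ]


def scan_directory(directory: str = "/var/lib/linstor.d") -> dict[str, list[tuple[int, str]]]:
    """Map each *.res file that contains an empty name to its offending lines."""
    report = {}
    for path in sorted(Path(directory).glob("*.res")):
        hits = find_empty_names(path.read_text())
        if hits:
            report[str(path)] = hits
    return report
```

An empty result from `scan_directory()` on node1 and node2 would suggest the files are safe to use across a reboot, at least as far as this particular problem is concerned.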
Again, this is quick to fix, but it would be better if LINSTOR didn’t do that: if you reboot either of the other nodes, they can’t parse the .res files and therefore can’t bring their resources up. Instead of an empty string, LINSTOR could put any placeholder in the host name, so that at least the configuration files on the two functioning nodes stay valid.
We have seen this error before. It is triggered when JRE packages are upgraded after the Satellite has been started. From then on, spawning any external command fails, which makes the Satellite completely useless, but a Satellite restart should fix it easily.
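Since this failure mode appears when the JRE is upgraded underneath a running Satellite, one rough way to spot affected nodes is to compare the Satellite's start time with the newest modification time under the JRE directory. This is a hedged sketch: the JRE path and how you obtain the process start time are left to the caller, and a newer file only suggests that a restart is due, it does not prove the spawn helper is broken.

```python
import os


def newest_mtime(directory: str) -> float:
    """Latest modification time (epoch seconds) of any file below directory, 0.0 if none."""
    latest = 0.0
    # os.walk silently yields nothing if directory does not exist.
    for root, _dirs, files in os.walk(directory):
        for name in files:
            try:
                latest = max(latest, os.stat(os.path.join(root, name)).st_mtime)
            except OSError:
                continue  # file vanished or is a broken symlink; ignore it
    return latest


def restart_recommended(process_start: float, jre_mtime: float) -> bool:
    """True if the JRE files changed after the Satellite process started."""
    return jre_mtime > process_start
```

For example, `restart_recommended(start_ts, newest_mtime("/usr/lib/jvm"))`, where `start_ts` is the Satellite's start time and `/usr/lib/jvm` is an assumed JRE location (typical on Debian-based systems such as Proxmox, but check your installation).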
My guess is that the Satellite then couldn’t report back its real hostname, and for some reason LINSTOR wrote "" into the .res file.
Absolutely. I’m just saying that LINSTOR shouldn’t write "" into the configuration file, because that makes the file invalid and, unless you edit it manually, useless to the other working nodes.
I’d suggest writing anything other than an empty string so that the file at least parses, for example "invalid-hostname-10.1.1.3". It seems like an easy fix, and it would make the solution more robust.
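The suggestion amounts to a simple fallback wherever the host name is rendered: if the node name is empty, emit a placeholder derived from the peer's address so the file still parses. This is only an illustration in Python; LINSTOR itself is written in Java, and `node_name_or_placeholder` and `render_host_line` are hypothetical names. The `invalid-hostname-` prefix is taken from the suggestion in this thread, not from LINSTOR.

```python
def node_name_or_placeholder(node_name: str, address: str) -> str:
    """Return node_name, or a non-empty placeholder if it is empty or blank."""
    if node_name.strip():
        return node_name
    # Hypothetical placeholder format, as suggested in the thread.
    return f"invalid-hostname-{address}"


def render_host_line(node_name: str, address: str, port: int) -> str:
    """Render a `host` statement in the style of the generated .res files."""
    name = node_name_or_placeholder(node_name, address)
    return f'host "{name}" address ipv4 {address}:{port};'
```

With this fallback, node3's entry would come out as `host "invalid-hostname-10.1.1.3" address ipv4 10.1.1.3:7051;` instead of `host "" ...`. The same placeholder would need to be used for the matching `on` section so both statements refer to the same name.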