I was following the LINBIT blog post "Setting Up Highly Available Storage for Proxmox Using LINSTOR & the LINBIT GUI" to deploy LINSTOR with Proxmox, and I have a small point of feedback.
As one step it suggests you do this:
cat << EOF | sudo tee /etc/pve/storage.cfg
drbd: linstor_storage
content images, rootdir
controller 192.168.222.130
resourcegroup pve-rg
EOF
However, I found out the hard way that this overwrites the full set of Proxmox storage definitions, deleting any existing ones.
I suggest changing this to `tee -a` so that it appends to the file rather than overwriting it.
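A quick sketch of the difference, using a scratch file rather than the real /etc/pve/storage.cfg:

```shell
# Scratch demo of tee vs. tee -a (does not touch the real storage.cfg)
cfg=$(mktemp)
printf 'dir: local\n' > "$cfg"        # pretend this is an existing storage definition

# Plain tee truncates the file first: the existing definition is lost
printf 'drbd: linstor_storage\n' | tee "$cfg" > /dev/null
wc -l < "$cfg"                        # 1 line: only the new definition remains

# tee -a appends: both definitions survive
printf 'dir: local\n' > "$cfg"
printf 'drbd: linstor_storage\n' | tee -a "$cfg" > /dev/null
wc -l < "$cfg"                        # 2 lines
rm -f "$cfg"
```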
Regards,
Brian.
Great suggestion. When writing the blog article, I was focused on a new deployment and overlooked this consideration. I am updating the article and I will revise this command based on your suggestion.
Thanks for taking the time to post here… and apologies for the “hard way” part.
Not at all… your blog post was extremely helpful, thank you, and this is a new system that I have blown away several times before it’s ready for production.
I have another minor blog comment: this time on https://linbit.com/blog/monitoring-clusters-using-prometheus-drbd-reactor/
(forum won’t let me post a link)
The Linstor Grafana dashboard (15917) has some hard-coded assumptions in it, which weren’t clear at the time I deployed it.
1. The expected job name is hard-coded to be "linstor-node". This only affects one panel, the scrape duration, whose query matches on `{job="linstor-node"}`.
2. It expects the label "node" for the node name, rather than the more traditional "instance". This affects almost all of the graph legends, as well as the drop-down node filter.
If the blog post could show the scrape configuration you’re using, it would be clear what the dashboard expects.
Fixing (1) was easy, as it was just one panel query (my Prometheus job was called "linstor" rather than "linstor-node"). But fixing (2) would require a lot of edits; I'm considering whether to change my scrape job to add a "node" label in addition to "instance".
But this is a relatively minor issue. Thanks for providing a ready-made dashboard, even if I might have to tweak things to make it work.
I think I understand what’s going on now with the dashboard, but it’s not trivial.
The following metrics have a “node” label returned natively by the exporter.
# count by (__name__) ({node!=""})
linstor_node_state{} 2
linstor_node_reconnect_attempt_count{} 2
linstor_resource_state{} 16
linstor_volume_state{} 16
linstor_volume_allocated_size_bytes{} 16
linstor_storage_pool_capacity_free_bytes{} 8
linstor_storage_pool_capacity_total_bytes{} 8
linstor_storage_pool_error_count{} 8
It gives the short node name as known to LINSTOR, not the host's FQDN (and is therefore likely different from the "instance" label):
# count by (node) ({node!="", __name__=~"linstor_.*"})
{node="virtual1"} 38
{node="virtual5"} 38
Now, looking further into the dashboard JSON, I find that:

- The query for the `node` variable, which generates the Grafana drop-down menu, is `label_values(drbdreactor_up, node)`. But `drbdreactor_up` doesn't return a `node` label; it just has the `job` and `instance` labels that Prometheus itself adds.

- Some dashboard queries expect a `node` label on these metrics:
  - linstor_scrape_duration_seconds
  - linstor_error_reports_count
  - drbd_device_written_bytes_total
  - drbd_device_read_bytes_total
  - drbd_resource_resources
  - scrape_duration_seconds (which also requires `job="linstor-node"`)
  - drbd_peerdevice_outofsync_bytes
  - drbd_connection_state
  - drbd_device_quorum
  - drbd_device_unintentionaldiskless

  But none of these metrics have a `node` label returned by the exporter.

- Some dashboard queries expect an `exported_node` label, on these metrics:
  - linstor_storage_pool_capacity_total_bytes
  - linstor_storage_pool_capacity_free_bytes
  - linstor_node_state
  - linstor_resource_state
  - linstor_storage_pool_error_count
  - linstor_volume_state

  None of the exporters generate an `exported_node` label. However, Prometheus will by default rename `<original_label>` to `exported_<original_label>` for any label returned in the scrape which clashes with a target label from service discovery, and you haven't set `honor_labels: true`.
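This renaming is controlled by `honor_labels` in the scrape config. A minimal sketch of the default behaviour (the target address and port here are illustrative assumptions, not taken from the blog post):

```yaml
scrape_configs:
  - job_name: linstor
    # honor_labels defaults to false: if a scrape returns a "node" label
    # that clashes with the target's "node" label below, the scraped one
    # is renamed to "exported_node" and the target's label is kept.
    honor_labels: false
    static_configs:
      - targets: ["192.168.222.130:3370"]   # assumed LINSTOR controller metrics endpoint
        labels:
          node: virtual1
```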
Therefore, I think what's happening with the expected scraping is:

- You add a target label `node: XXX` for each host in the targets file
- On the job which scrapes the drbd-reactor instances (which must be called `linstor-node`)
- And also on the job which scrapes the LINSTOR controller API
- For any metric which doesn't return a `node` label, the target `node` label is retained (this is the case for all DRBD metrics)
- For any metric which does return a `node` label, it gets renamed to `exported_node` and the original target `node` label is retained (this is the case for some metrics from the controller)
This is somewhat confusing, but it makes sense, given that LINSTOR's idea of the node name is usually a short name which is not the same as the FQDN.
I'll update my scrape config to add the `node` labels.
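A sketch of what I have in mind (the job name follows the dashboard's expectation; the hostnames and the drbd-reactor exporter port are assumptions):

```yaml
scrape_configs:
  - job_name: linstor-node          # the scrape-duration panel matches job="linstor-node"
    static_configs:
      - targets: ["virtual1:9942"]  # drbd-reactor prometheus plugin (default port assumed)
        labels:
          node: virtual1            # short LINSTOR node name, not the FQDN
      - targets: ["virtual5:9942"]
        labels:
          node: virtual5
```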