Doc note - deploying Linstor with Proxmox

I was following the blog post Setting Up Highly Available Storage for Proxmox Using LINSTOR & the LINBIT GUI to deploy Linstor with Proxmox, and I have a small piece of feedback.

As one step it suggests you do this:

cat << EOF | sudo tee /etc/pve/storage.cfg
drbd: linstor_storage
    content images, rootdir
    controller 192.168.222.130
    resourcegroup pve-rg
EOF

However, I found out the hard way that this overwrites the full set of Proxmox storage definitions, deleting any existing ones.

I suggest changing this to tee -a, so that the new definition is appended to the file rather than replacing it.
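To illustrate the difference, here is a minimal sketch using a scratch file as a stand-in for /etc/pve/storage.cfg (so it is safe to run anywhere); only the -a flag differs from the original command:

```shell
# Scratch file standing in for /etc/pve/storage.cfg:
cfg=$(mktemp)
echo "dir: local" > "$cfg"               # a pre-existing storage definition
# Append the LINSTOR definition instead of overwriting the file:
cat << EOF | tee -a "$cfg" > /dev/null
drbd: linstor_storage
    content images, rootdir
    controller 192.168.222.130
    resourcegroup pve-rg
EOF
grep -q "dir: local" "$cfg" && echo "existing definition preserved"
```

On a real node the same heredoc would be piped to sudo tee -a /etc/pve/storage.cfg.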

Regards,

Brian.

Great suggestion. When writing the blog article, I was focused on a new deployment and overlooked this consideration. I am updating the article and I will revise this command based on your suggestion.

Thanks for taking the time to post here… and apologies for the “hard way” part.

Not at all… your blog post was extremely helpful, thank you, and this is a new system that I have blown away several times before it’s ready for production.

I have another minor blog comment: this time on https://linbit.com/blog/monitoring-clusters-using-prometheus-drbd-reactor/ (forum won’t let me post a link)

The Linstor Grafana dashboard (15917) has some hard-coded assumptions in it, which weren’t clear at the time I deployed it.

  1. The expected job name is hard-coded to be “linstor-node”. This only affects one panel, which is the scrape duration, that matches on {job="linstor-node"}.
  2. It expects the label “node” for the node name, rather than the more traditional “instance”. This affects almost all of the graph legends, as well as the drop-down node filter.

If the blog post could show the scrape configuration you’re using, it would be clear what the dashboard expects.

Fixing (1) was easy, as it was just one panel query (my Prometheus job was called “linstor” rather than “linstor-node”). But fixing (2) would require a lot of edits; I’m considering whether to change my scrape job to add a “node” label in addition to “instance”.

But this is a relatively minor issue. Thanks for providing a ready-made dashboard, even if I might have to tweak things to make it work.

I think I understand what’s going on now with the dashboard, but it’s not trivial.

The following metrics have a “node” label returned natively by the exporter.

# count by (__name__) ({node!=""})
linstor_node_state{} 2
linstor_node_reconnect_attempt_count{} 2
linstor_resource_state{} 16
linstor_volume_state{} 16
linstor_volume_allocated_size_bytes{} 16
linstor_storage_pool_capacity_free_bytes{} 8
linstor_storage_pool_capacity_total_bytes{} 8
linstor_storage_pool_error_count{} 8

It gives the short node name as known to linstor, not the host’s FQDN (and is therefore likely different from the “instance” label).

# count by (node) ({node!="", __name__=~"linstor_.*"})
{node="virtual1"} 38
{node="virtual5"} 38

Now, looking further into the dashboard JSON I find that:

  • The query for the node variable, which generates the Grafana drop-down menu, is label_values(drbdreactor_up, node). But drbdreactor_up doesn’t return a node label. It just has the job and instance labels that Prometheus itself adds.

  • Some dashboard queries expect a node label, on these metrics:

    • linstor_scrape_duration_seconds
    • linstor_error_reports_count
    • drbd_device_written_bytes_total
    • drbd_device_read_bytes_total
    • drbd_resource_resources
    • scrape_duration_seconds (which also requires job="linstor-node")
    • drbd_peerdevice_outofsync_bytes
    • drbd_connection_state
    • drbd_device_quorum
    • drbd_device_unintentionaldiskless

    But none of these metrics have a node label returned by the exporter.

  • Some dashboard queries expect an exported_node label, with these metrics:

    • linstor_storage_pool_capacity_total_bytes
    • linstor_storage_pool_capacity_free_bytes
    • linstor_node_state
    • linstor_resource_state
    • linstor_storage_pool_error_count
    • linstor_volume_state

None of the exporters generate an exported_node label. However, Prometheus will by default rename <original_label> to exported_<original_label> for any label returned in a scrape that clashes with a target label from service discovery, and you haven’t set honor_labels: true (which would suppress that renaming).
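A minimal sketch of that renaming, assuming a job that scrapes the controller (the address and node value here are made up, not from the blog post):

```yaml
# Hypothetical scrape job illustrating the default label-clash behaviour.
scrape_configs:
  - job_name: linstor
    # honor_labels defaults to false: a scraped "node" label (e.g. on
    # linstor_node_state) clashes with the target label below, so the
    # scraped value is renamed to "exported_node" and the target's wins.
    static_configs:
      - targets: ["192.168.222.130:3370"]   # assumed controller address
        labels:
          node: virtual1
```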

Therefore, I think what’s happening with the expected scraping is:

  1. You add a target label node: XXX for each host in the targets file
    • On the job which scrapes drbd-reactor instances (and must be called linstor-node)
    • And also the job which scrapes linstor controller API
  2. For any metric which doesn’t return a node label, the target node label is retained (this is the case for all drbd metrics)
  3. For any metric which does return a node label, it gets renamed to exported_node and the original target node label is retained (this is the case for some metrics from the controller)

This is somewhat confusing, but it makes sense, given that linstor’s idea of the node name is usually a short name which is not the same as the FQDN.

I’ll update my scrape config to add the node labels.
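For what it’s worth, a scrape config along these lines seems to match the dashboard’s expectations. Hostnames and ports are my assumptions (drbd-reactor’s prometheus plugin listens on :9942 by default, and the controller serves metrics on its REST port, 3370):

```yaml
# Sketch only: job names and "node" labels as inferred above; addresses assumed.
scrape_configs:
  - job_name: linstor-node            # the name dashboard 15917 hard-codes
    static_configs:
      - targets: ["virtual1.example.com:9942"]  # drbd-reactor prometheus plugin
        labels:
          node: virtual1              # short LINSTOR node name, not the FQDN
      - targets: ["virtual5.example.com:9942"]
        labels:
          node: virtual5
  - job_name: linstor                 # controller API scrape
    static_configs:
      - targets: ["virtual1.example.com:3370"]
        labels:
          node: virtual1
```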
