LINSTOR issues on a virtualized install of Proxmox

I’m still learning LINBIT/DRBD and this is my first time setting it up, so please bear with me!

So I’m planning to use a virtualized install of Proxmox as essentially just a diskless quorum node. I have two other physical Proxmox nodes that I was able to install LINSTOR/DRBD on just fine. I will be using the physical nodes as a SAN in a mirrored setup for NVMe-oF/RoCE over 40GbE, with the “Proxmox in a VM” node as quorum.

I have 3 other Proxmox compute nodes in my cluster. This is where the quorum node is supposed to live, with independent shared storage so that it’s HA. My current issue is that during the installation of LINSTOR on the virtualized Proxmox, the web GUI seems to stop functioning as soon as I edit linstor-satellite.service, and I’m not able to properly add the node to LINSTOR:


systemctl edit linstor-satellite.service
#########Add the below##################

[Service]
Type=notify
TimeoutStartSec=infinity

Is there something about a virtualized Proxmox install that would be different enough to temporarily break the Proxmox install until I remove that edit to the service file? Ultimately I’m just using Proxmox as my guest VM for ease of installation, but I’m not really understanding why this would break it. Please let me know if anyone has any insight. I’ll probably just look into a simpler quorum VM, which is probably what I should have done in the first place, but my intent was to use something I knew would work with ease.

From LINBIT’s guide on getting started with LINSTOR in Proxmox, this is the flow of commands I’m using to install:

wget -O /tmp/linbit-keyring.deb \
  https://packages.linbit.com/public/linbit-keyring.deb
dpkg -i /tmp/linbit-keyring.deb
PVERS=9 && echo "deb [signed-by=/etc/apt/trusted.gpg.d/linbit-keyring.gpg] \
 http://packages.linbit.com/public/ proxmox-$PVERS drbd-9" > /etc/apt/sources.list.d/linbit.list

apt update
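
(Side note, not from the guide: before installing, I also ran a quick sanity check that the packages would actually come from the LINBIT repo:)

# should show packages.linbit.com as the candidate origin
apt-cache policy drbd-dkms linstor-satellite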
#######################################

apt -y install drbd-dkms
apt -y install drbd-utils 
apt -y install drbd-reactor 
apt -y install proxmox-default-headers 
apt -y install linstor-proxmox 
apt -y install linstor-controller 
apt -y install linstor-satellite 
apt -y install linstor-client

#######################################

systemctl disable --now linstor-controller

cat << EOF > /etc/linstor/linstor-client.conf
[global]
controllers=192.168.20.50
EOF


#######################################


systemctl edit linstor-satellite.service
#########Add the below##################

[Service]
Type=notify
TimeoutStartSec=infinity
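
(For what it’s worth, my understanding is that systemctl edit just creates a systemd drop-in, so the equivalent by hand would be something like the following — the path is the default override location, shown only so it’s easy to inspect or remove the edit later:)

mkdir -p /etc/systemd/system/linstor-satellite.service.d
cat << EOF > /etc/systemd/system/linstor-satellite.service.d/override.conf
[Service]
Type=notify
TimeoutStartSec=infinity
EOF
systemctl daemon-reload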

After you add the linstor-client.conf to the node, are you able to run linstor node list there and get output back?

I recall that with that systemd configuration, the satellite service requires a reachable controller, or else it is marked as failed immediately. Since that may not be the case right away while you are setting things up, I’d suggest adding the systemd service unit configuration after you’ve fully joined the satellite node to the existing cluster, but before you put it into production.

So you’d want to join it to the cluster to the point where linstor node list shows it as Online, perform the systemctl edit, and then restart the linstor-satellite service on that node. Basically, just do that step a little later in the setup process so the service has everything it needs.
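
As a rough sketch of that ordering (node name and IP are placeholders, not from your setup):

# on the controller: register the satellite and wait for it to come Online
linstor node create pve-quorum 192.168.20.52 --node-type satellite
linstor node list

# once it shows Online, apply the systemd override on the satellite and restart
systemctl edit linstor-satellite.service      # add the [Service] snippet from the guide
systemctl restart linstor-satellite.service
linstor node list                             # confirm it comes back Online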

If this box being virtualized has an impact on its network connectivity to the rest of the cluster, it could potentially make a difference; otherwise, it should work like any other Proxmox node as far as LINSTOR is concerned.

I eventually did get it working, and I think something like this was the issue! The 2-node cluster with a witness is now working. I recently discovered LINSTOR Gateway, which seems to be the solution to my problem of using LINSTOR/DRBD with NVMe-oF/RoCE, but I’ve been completely unable to switch the NVMe transport from TCP to RDMA for whatever reason, and I haven’t seen anything in the docs about it either.
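
For anyone following along, creating the gateway target itself was the easy part — roughly this, from memory, with the NQN, IP, and size as example values rather than my exact ones:

# create an NVMe-oF target backed by LINSTOR/DRBD, then list it
linstor-gateway nvme create nqn.2021-09.com.example:nvme:rg-pve0 192.168.20.51/24 100G
linstor-gateway nvme list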

EDIT:

Looks like this may be possible with drbd-reactorctl edit?

drbd-reactorctl restart doesn’t seem to always make the new config take effect for whatever reason, though. I tried rebooting my node, and it looked like it reverted back to tcp.

drbd-reactorctl disable followed by drbd-reactorctl enable might be a more reliable way, because I can see the transport change in dmesg this time:

[ 683.575965] nvmet: adding nsid 1 to subsystem schemesec:nvme:rg-pve0
[ 683.586280] nvmet_rdma: enabling port 0 (192.168.20.51:4420)
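
So the sequence that actually got RDMA enabled looked roughly like this (resource name taken from the dmesg output above; exactly where the tcp/rdma transport setting lives in the reactor config is my assumption from poking at the file):

drbd-reactorctl edit rg-pve0      # change the target's transport from tcp to rdma
drbd-reactorctl disable rg-pve0   # tear down the target with the old config
drbd-reactorctl enable rg-pve0    # bring it up again so the new transport takes
dmesg | tail                      # look for nvmet_rdma enabling the port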

Really unsure how to get this to stick if it’s not surviving a reboot…

Being able to use NVMe/RoCE would be super cool, and I feel like I’m really close.