Migrating Proxmox VMs fails with 'Wrong medium type' on the target node (LINSTOR/DRBD-backed VMs)

Dear community,

running here is a three-node Proxmox VE 8.2 cluster with the LINSTOR DRBD backend enabled (great, I like it!).

Live migration of VMs from one node to another was tested successfully a few weeks ago.

Now, when I want to migrate a VM (any of them) to another node, the task produces an error that reads:

[node3] kvm: -drive file=/dev/drbd/by-res/vm-101-disk-1/0,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring,detect-zeroes=on: Could not open '/dev/drbd/by-res/vm-101-disk-1/0': Wrong medium type

The system log of the source node says:

Oct 14 14:47:11 ttivs1 pvedaemon[276824]: <root@pam> starting task UPID:ttivs1:00048289:00625DEC:670D12CF:qmigrate:101:root@pam:
Oct 14 14:47:13 ttivs1 pmxcfs[1867]: [status] notice: received log
Oct 14 14:47:14 ttivs1 Controller[4651]: 2024-10-14 14:47:14.013 [grizzly-http-server-16] INFO LINSTOR/Controller/58bb4b SYSTEM - REST/API RestClient(10.10.10.3; 'linstor-proxmox/8.0.4')/ModRscDfn
Oct 14 14:47:14 ttivs1 Controller[4651]: 2024-10-14 14:47:14.014 [grizzly-http-server-16] INFO LINSTOR/Controller/58bb4b SYSTEM - Resource definition modified vm-101-disk-1/false
Oct 14 14:47:14 ttivs1 Controller[4651]: 2024-10-14 14:47:14.017 [grizzly-http-server-17] INFO LINSTOR/Controller/08d41b SYSTEM - REST/API RestClient(10.10.10.3; 'linstor-proxmox/8.0.4')/LstVlm
Oct 14 14:47:14 ttivs1 Controller[4651]: 2024-10-14 14:47:14.305 [grizzly-http-server-19] INFO LINSTOR/Controller/988e30 SYSTEM - REST/API RestClient(10.10.10.3; 'linstor-proxmox/8.0.4')/LstVlm
Oct 14 14:47:14 ttivs1 pmxcfs[1867]: [status] notice: received log
Oct 14 14:47:15 ttivs1 pmxcfs[1867]: [status] notice: received log
Oct 14 14:47:15 ttivs1 Controller[4651]: 2024-10-14 14:47:15.743 [grizzly-http-server-21] INFO LINSTOR/Controller/769f8b SYSTEM - REST/API RestClient(10.10.10.3; 'linstor-proxmox/8.0.4')/LstVlm
Oct 14 14:47:15 ttivs1 pmxcfs[1867]: [status] notice: received log
Oct 14 14:47:15 ttivs1 pvedaemon[295561]: migration problems
Oct 14 14:47:15 ttivs1 pvedaemon[276824]: <root@pam> end task UPID:ttivs1:00048289:00625DEC:670D12CF:qmigrate:101:root@pam: migration problems

Updates have been installed since the last successful VM migration, so maybe that is the reason for the problem?

Does or did anyone experience the same issue?

What kind of additional information would you need from me?

Any help is appreciated!

Kind regards
Matthias

What does cat /proc/drbd; drbdadm status show from the target node?
I'm wondering if the update installed a new kernel without a matching DRBD kernel module.
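
A quick way to compare the two, as a rough sketch (assuming the module comes from a drbd-dkms or similar package):

uname -r                                        # running kernel
cat /proc/drbd                                  # version of the currently loaded DRBD module
modinfo drbd | grep -iE '^(version|vermagic)'   # module on disk and the kernel it was built against
drbdadm --version                               # userland vs. kernel module versions

If vermagic doesn't match the running kernel, or /proc/drbd is missing entirely, the module was probably not rebuilt after the update.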

Hi Kermat,

thanks for your answer!

I checked that immediately, and the output is perfectly OK: the kernel modules are loaded, and both target nodes are active Secondaries.

Best regards
Matthias

I think the reason is that DRBD doesn't switch to Primary before the VM is started.

But still it’s not clear why it doesn’t …

DRBD Reactor is configured and running on all nodes; the linstor-controller runs on only one.

It has worked before …

Do the resource files of the VM disks need to be identical on all nodes?
They are …
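
(For the record, they can be compared with something like this on every node - just a sketch, assuming the satellites keep their generated files in the default path /var/lib/linstor.d:

md5sum /var/lib/linstor.d/*.res

In my case the checksums match across all three nodes.)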

Best regards
Matthias

How is your Proxmox storage.cfg configured? Does it use the virtual IP address configured in DRBD Reactor for the HA LINSTOR Controller?

Can you post the output of the following commands:

  • linstor resource list from the LINSTOR controller
  • drbdadm status from the source and destination VM migration nodes
  • drbdsetup show from the source and destination VM migration nodes
  • grep -ie satellite -ie controller -ie drbd -ie pve /var/log/syslog from the source and destination VM migration nodes as well as the LINSTOR Controller after an attempted migration.

Hi Kermat,

thanks for your reply!

The storage.cfg:


dir: local
    path /var/lib/vz
    content backup,iso,vztmpl

zfspool: local-zfs
    pool rpool/data
    content rootdir,images
    sparse 1

zfspool: datapool
    pool datapool
    content rootdir,images
    mountpoint /datapool
    nodes ttivs3,ttivs2,ttivs1

drbd: drbdstorage
    content images, rootdir
    controller 10.10.10.1,10.10.1.2,10.10.10.3
    resourcegroup pve-rg
    preferlocal yes

The file /etc/drbd-reactor.toml is largely unconfigured, while there is /etc/drbd-reactor.d/linstor_db.toml, which reads:


[[promoter]]
id = "linstor_db"
[promoter.resources.linstor_db]
start = ["var-lib-linstor.mount", "linstor-controller.service"]
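
To make sure the promoter is actually picked up, I check it with (assuming the drbd-reactorctl helper that ships with drbd-reactor):

drbd-reactorctl status          # should list the promoter for linstor_db and where it is active
systemctl status drbd-reactor   # the daemon itself, on every node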

On the source node:

linstor resource list:


╭───────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node ┊ Port ┊ Usage ┊ Conns ┊ State ┊ CreatedOn ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ linstor_db ┊ ttivs1 ┊ 7011 ┊ InUse ┊ Ok ┊ UpToDate ┊ 2024-08-05 19:05:41 ┊
┊ linstor_db ┊ ttivs2 ┊ 7011 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-08-05 19:05:48 ┊
┊ linstor_db ┊ ttivs3 ┊ 7011 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-08-05 19:05:46 ┊
┊ vm-100-disk-1 ┊ ttivs1 ┊ 7005 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-03-12 09:10:19 ┊
┊ vm-100-disk-1 ┊ ttivs2 ┊ 7005 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-03-12 09:10:16 ┊
┊ vm-100-disk-1 ┊ ttivs3 ┊ 7005 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-07-25 14:01:39 ┊
.
.
. and so on for more disks.

drbdadm status:


linstor_db role:Primary
  disk:UpToDate
  ttivs2 role:Secondary
    peer-disk:UpToDate
  ttivs3 role:Secondary
    peer-disk:UpToDate

vm-100-disk-1 role:Secondary
  disk:UpToDate
  ttivs2 role:Secondary
    peer-disk:UpToDate
  ttivs3 role:Secondary
    peer-disk:UpToDate
.
.
. and so on for more disks.

drbdsetup show:


resource "linstor_db" {
    options {
        auto-promote no;
        quorum majority;
        on-no-quorum io-error;
        on-suspended-primary-outdated force-secondary;
    }
    _this_host {
        node-id 0;
        volume 0 {
            device minor 1011;
            disk "/dev/zvol/datapool/linstor_db_00000";
            meta-disk internal;
            disk {
                rs-discard-granularity 16384; # bytes
            }
        }
    }
    connection {
        _peer_node_id 1;
        path {
            _this_host ipv4 10.10.10.1:7011;
            _remote_host ipv4 10.10.10.2:7011;
        }
        net {
            cram-hmac-alg "sha1";
            shared-secret "wSnUpqB2wWzQ3FUtycSs";
            rr-conflict retry-connect;
            verify-alg "crct10dif";
            _name "ttivs2";
        }
    }
    connection {
        _peer_node_id 2;
        path {
            _this_host ipv4 10.10.10.1:7011;
            _remote_host ipv4 10.10.10.3:7011;
        }
        net {
            cram-hmac-alg "sha1";
            shared-secret "wSnUpqB2wWzQ3FUtycSs";
            rr-conflict retry-connect;
            verify-alg "crct10dif";
            _name "ttivs3";
        }
    }
}
resource "vm-100-disk-1" {
    options {
        auto-promote no;
        quorum majority;
        on-no-quorum io-error;
        on-suspended-primary-outdated force-secondary;
    }
    _this_host {
        node-id 0;
        volume 0 {
            device minor 1005;
            disk "/dev/zvol/datapool/vm-100-disk-1_00000";
            meta-disk internal;
            disk {
                rs-discard-granularity 16384; # bytes
            }
        }
    }
    connection {
        _peer_node_id 1;
        path {
            _this_host ipv4 10.10.10.1:7005;
            _remote_host ipv4 10.10.10.2:7005;
        }
        net {
            cram-hmac-alg "sha1";
            shared-secret "MljCzHjD1cUBxZvNfm7n";
            rr-conflict retry-connect;
            verify-alg "crct10dif";
            _name "ttivs2";
        }
    }
    connection {
        _peer_node_id 2;
        path {
            _this_host ipv4 10.10.10.1:7005;
            _remote_host ipv4 10.10.10.3:7005;
        }
        net {
            cram-hmac-alg "sha1";
            shared-secret "MljCzHjD1cUBxZvNfm7n";
            rr-conflict retry-connect;
            verify-alg "crct10dif";
            _name "ttivs3";
        }
    }
}
.
.
. and so on for more disks.

On the destination node:

linstor resource list:


╭───────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node ┊ Port ┊ Usage ┊ Conns ┊ State ┊ CreatedOn ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ linstor_db ┊ ttivs1 ┊ 7011 ┊ InUse ┊ Ok ┊ UpToDate ┊ 2024-08-05 19:05:41 ┊
┊ linstor_db ┊ ttivs2 ┊ 7011 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-08-05 19:05:48 ┊
┊ linstor_db ┊ ttivs3 ┊ 7011 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-08-05 19:05:46 ┊
┊ vm-100-disk-1 ┊ ttivs1 ┊ 7005 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-03-12 09:10:19 ┊
┊ vm-100-disk-1 ┊ ttivs2 ┊ 7005 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-03-12 09:10:16 ┊
┊ vm-100-disk-1 ┊ ttivs3 ┊ 7005 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-07-25 14:01:39 ┊
.
.
. and so on for more disks.

drbdadm status:


linstor_db role:Secondary
  disk:UpToDate
  ttivs1 role:Primary
    peer-disk:UpToDate
  ttivs2 role:Secondary
    peer-disk:UpToDate

vm-100-disk-1 role:Secondary
  disk:UpToDate
  ttivs1 role:Secondary
    peer-disk:UpToDate
  ttivs2 role:Secondary
    peer-disk:UpToDate
.
.
. and so on for more disks.

drbdsetup show:


resource "linstor_db" {
    options {
        auto-promote no;
        quorum majority;
        on-no-quorum io-error;
        on-suspended-primary-outdated force-secondary;
    }
    _this_host {
        node-id 2;
        volume 0 {
            device minor 1011;
            disk "/dev/zvol/datapool/linstor_db_00000";
            meta-disk internal;
            disk {
                rs-discard-granularity 16384; # bytes
            }
        }
    }
    connection {
        _peer_node_id 0;
        path {
            _this_host ipv4 10.10.10.3:7011;
            _remote_host ipv4 10.10.10.1:7011;
        }
        net {
            cram-hmac-alg "sha1";
            shared-secret "wSnUpqB2wWzQ3FUtycSs";
            rr-conflict retry-connect;
            verify-alg "crct10dif";
            _name "ttivs1";
        }
    }
    connection {
        _peer_node_id 1;
        path {
            _this_host ipv4 10.10.10.3:7011;
            _remote_host ipv4 10.10.10.2:7011;
        }
        net {
            cram-hmac-alg "sha1";
            shared-secret "wSnUpqB2wWzQ3FUtycSs";
            rr-conflict retry-connect;
            verify-alg "crct10dif";
            _name "ttivs2";
        }
    }
}
resource "vm-100-disk-1" {
    options {
        auto-promote no;
        quorum majority;
        on-no-quorum io-error;
        on-suspended-primary-outdated force-secondary;
    }
    _this_host {
        node-id 2;
        volume 0 {
            device minor 1005;
            disk "/dev/zvol/datapool/vm-100-disk-1_00000";
            meta-disk internal;
            disk {
                rs-discard-granularity 16384; # bytes
            }
        }
    }
    connection {
        _peer_node_id 0;
        path {
            _this_host ipv4 10.10.10.3:7005;
            _remote_host ipv4 10.10.10.1:7005;
        }
        net {
            cram-hmac-alg "sha1";
            shared-secret "MljCzHjD1cUBxZvNfm7n";
            rr-conflict retry-connect;
            verify-alg "crct10dif";
            _name "ttivs1";
        }
    }
    connection {
        _peer_node_id 1;
        path {
            _this_host ipv4 10.10.10.3:7005;
            _remote_host ipv4 10.10.10.2:7005;
        }
        net {
            cram-hmac-alg "sha1";
            shared-secret "MljCzHjD1cUBxZvNfm7n";
            rr-conflict retry-connect;
            verify-alg "crct10dif";
            _name "ttivs2";
        }
    }
}
.
.
. and so on for more disks.

Unfortunately there is no /var/log/syslog!
I've actually wondered before where it went - do you have an idea?

Best regards
Matthias


Hi Kermat,

journalctl is being used instead of syslog. The output of

journalctl | grep -ie satellite -ie controller -ie drbd -ie pve

after an attempted migration on the source node is:

Oct 27 09:27:51 ttivs1 pveproxy[3903032]: Clearing outdated entries from certificate cache
Oct 27 09:27:55 ttivs1 pvedaemon[1100032]: <root@pam> starting task UPID:ttivs1:000FA757:06FC7F4A:671DF98B:qmigrate:104:root@pam:
Oct 27 09:27:57 ttivs1 Controller[3780222]: 2024-10-27 09:27:57.342 [grizzly-http-server-10] INFO LINSTOR/Controller/85c397 SYSTEM - REST/API RestClient(10.10.10.3; 'linstor-proxmox/8.0.4')/ModRscDfn
Oct 27 09:27:57 ttivs1 Controller[3780222]: 2024-10-27 09:27:57.342 [grizzly-http-server-10] INFO LINSTOR/Controller/85c397 SYSTEM - Resource definition modified vm-104-disk-1/false
Oct 27 09:27:57 ttivs1 Controller[3780222]: 2024-10-27 09:27:57.345 [grizzly-http-server-11] INFO LINSTOR/Controller/22dfab SYSTEM - REST/API RestClient(10.10.10.3; 'linstor-proxmox/8.0.4')/LstVlm
Oct 27 09:27:57 ttivs1 Controller[3780222]: 2024-10-27 09:27:57.632 [grizzly-http-server-13] INFO LINSTOR/Controller/2aa837 SYSTEM - REST/API RestClient(10.10.10.3; 'linstor-proxmox/8.0.4')/LstVlm
Oct 27 09:27:58 ttivs1 Controller[3780222]: 2024-10-27 09:27:58.898 [grizzly-http-server-15] INFO LINSTOR/Controller/932f9e SYSTEM - REST/API RestClient(10.10.10.3; 'linstor-proxmox/8.0.4')/LstVlm
Oct 27 09:27:58 ttivs1 pvedaemon[1025879]: migration problems
Oct 27 09:27:58 ttivs1 pvedaemon[1100032]: <root@pam> end task UPID:ttivs1:000FA757:06FC7F4A:671DF98B:qmigrate:104:root@pam: migration problems
Oct 27 09:28:06 ttivs1 Controller[3780222]: 2024-10-27 09:28:06.538 [grizzly-http-server-17] INFO LINSTOR/Controller/ce2f27 SYSTEM - REST/API RestClient(10.10.10.3; 'linstor-proxmox/8.0.4')/QryAllSizeInfo

There’s no output on the destination node.

Best regards
Matthias

PS: The point is, somehow: … it worked before …

Hi there,

maybe(!) I made one mistake in setting up the highly available controller.

The documentation reads:

The last but nevertheless important step is to configure the LINSTOR satellite services to not delete (and then regenerate) the resource file for the LINSTOR controller DB at its startup. Do not edit the service files directly, but use systemctl edit. Edit the service file on all nodes that could become a LINSTOR controller and that are also LINSTOR satellites.

systemctl edit linstor-satellite

[Service]
Environment=LS_KEEP_RES=linstor_db
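
To verify the override really is in place (just a quick sanity check with standard systemd tooling):

systemctl cat linstor-satellite                  # the drop-in with LS_KEEP_RES=linstor_db should appear at the end
systemctl show linstor-satellite -p Environment  # should print Environment=LS_KEEP_RES=linstor_db

Note that the new environment only takes effect the next time the linstor-satellite service is (re)started.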

Now it’s configured that way, but maybe some resource files came out of sync?
How can I check this?

Migration fails with:

2024-11-06 16:05:02 starting migration of VM 104 to node 'ttivs3' (10.10.10.3)
2024-11-06 16:05:02 starting VM 104 on remote node 'ttivs3'
2024-11-06 16:05:03 [ttivs3] kvm: -drive file=/dev/drbd/by-res/vm-104-disk-1/0,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring,detect-zeroes=on: Could not open '/dev/drbd/by-res/vm-104-disk-1/0': Wrong medium type
2024-11-06 16:05:04 [ttivs3] start failed: QEMU exited with code 1
2024-11-06 16:05:04 ERROR: online migrate failure - remote command failed with exit code 255
2024-11-06 16:05:04 aborting phase 2 - cleanup resources
2024-11-06 16:05:04 migrate_cancel
2024-11-06 16:05:05 ERROR: migration finished with problems (duration 00:00:05)
TASK ERROR: migration problems

I think the underlying DRBD device doesn't perform the Secondary-to-Primary switch, because the 'Wrong medium type' error seems to occur when a device that is still Secondary is being opened.

Can I test it manually?
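
I suppose something along these lines would be the manual test on the target node - a sketch, assuming the VM is shut down first so that no node holds the resource Primary:

drbdadm status vm-104-disk-1      # expect Secondary/UpToDate everywhere
drbdadm primary vm-104-disk-1     # roughly what the storage plugin has to achieve before the VM starts
dd if=/dev/drbd/by-res/vm-104-disk-1/0 of=/dev/null bs=1M count=1   # the device should now open and read fine
drbdadm secondary vm-104-disk-1   # demote again afterwards

If the promote (or the subsequent open) fails, the problem sits in DRBD/LINSTOR below Proxmox; if it works, the promotion during the migration itself is the more likely suspect.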

Best regards
Matthias