drbd-utils regression after commit aa6409657553

Our company’s QA team identified a regression in drbd-utils: it fails to create user-defined device names.

The configuration:

resource drbd_zhm {
  protocol  C;
  device    /dev/drbd_passive minor 0;
  meta-disk internal;

  on 15sp7-1 {
    address 192.168.122.153:7788;
    disk    "/dev/disk/by-path/virtio-pci-0000:09:00.0";
    node-id 0;
  }

  on 15sp7-2 {
    address 192.168.122.21:7788;
    disk    "/dev/disk/by-path/virtio-pci-0000:09:00.0";
    node-id 1;
  }
  disk {
    resync-rate 10M;
  }

  connection-mesh {
    hosts 15sp7-1 15sp7-2;
  }
}

Please note the device name: /dev/drbd_passive
After commit aa6409657553, the drbd-utils udev rules no longer create /dev/drbd_passive.

The root cause is that commit aa6409657553 (“drbd.rules: use drbdsetup udev command”) introduced a regression.

In sh_udev() in user/v9/drbdadm_main.c:

941        if (!strncmp(vol->device, "/dev/drbd", 9))
942                printf("DEVICE=%s\n", vol->device + 5);
943        else
944                printf("DEVICE=drbd%u\n", vol->device_minor);

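For this configuration, line 942 emits DEVICE=drbd_passive (vol->device + 5 skips the leading “/dev/”), which is how /dev/drbd_passive used to be created.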

Commit aa6409657553 switches the udev helper from drbdadm to drbdsetup.
In udev_cmd() in user/v9/drbdsetup.c:

printf("DEVICE=drbd%u\n", device->minor);

This line always emits DEVICE=drbd<minor> (here DEVICE=drbd0), regardless of the device name set in the configuration.
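To make the difference concrete, here is a minimal shell sketch that mirrors the two printf branches quoted above. The function names old_device_line and new_device_line are hypothetical and exist only for illustration; they are not part of drbd-utils.

```shell
#!/bin/sh
# Hypothetical helpers that mimic the DEVICE= line printed by each tool.

# Old behaviour (drbdadm sh-udev): $1 = device path from drbd.conf, $2 = minor.
old_device_line() {
    case "$1" in
        # Same test as strncmp(vol->device, "/dev/drbd", 9): keep the
        # configured name and strip only the leading "/dev/".
        /dev/drbd*) printf 'DEVICE=%s\n' "${1#/dev/}" ;;
        *)          printf 'DEVICE=drbd%s\n' "$2" ;;
    esac
}

# New behaviour (drbdsetup udev): only the minor is known, $1 = minor.
new_device_line() {
    printf 'DEVICE=drbd%s\n' "$1"
}

old_device_line /dev/drbd_passive 0   # prints DEVICE=drbd_passive
new_device_line 0                     # prints DEVICE=drbd0
```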

Thank you for the report.

In this case, this is an intentional breaking change for non-standard naming conventions. The switch to drbdsetup for the udev rules was made for good reason, as using the drbdadm command caused several problems:

  • drbdadm needs to parse the config file to generate the udev output. This meant that an error in an unrelated part of the config could prevent the udev links from being generated.
  • drbdadm would also happily report the resource name and links for a new volume even if an old (unmanaged) DRBD volume with the same minor number existed, leading to further confusion.
  • drbdadm also needed an up-to-date hostname to find the right “on” clause. This turned out to be an issue because udevd is often started in a separate namespace, so drbdadm also runs in a namespace with a generic hostname that does not match the one in the drbd.conf files.

All in all, we made the decision to switch to drbdsetup, taking the configuration directly from the in-kernel data. This also means we do not have access to the “device” stanza, as that is never actually persisted in the kernel.

For us, this is a “won’t fix” issue. We recommend you update your system to use the standard symlinks such as /dev/drbd/by-res/drbd_zhm/0.

If you really have to have /dev/drbd_passive, you can still make use of the drbdadm sh-udev based rules. Take the “old” udev rules (still using drbdadm) from /lib/udev/rules.d/65-drbd.rules and move them to /etc/udev/rules.d/65-drbd.rules. This will override the rules installed by the new drbd-utils package.
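A minimal sketch of that override step, assuming you have saved a copy of the older, drbdadm based rules file somewhere. The helper name install_legacy_rules and its arguments are hypothetical; the only check it performs is that the file really mentions “drbdadm sh-udev”.

```shell
#!/bin/sh
# Hypothetical helper: install a saved copy of the old rules as an override.
# $1 = saved legacy rules file, $2 = target directory (default /etc/udev/rules.d)
install_legacy_rules() {
    legacy_src=$1
    rules_dir=${2:-/etc/udev/rules.d}

    # Refuse files that do not use the drbdadm sh-udev helper.
    if ! grep -q 'drbdadm sh-udev' "$legacy_src"; then
        echo "$legacy_src does not look like the drbdadm based rules" >&2
        return 1
    fi

    # A file in /etc/udev/rules.d takes precedence over the file with the
    # same name in /lib/udev/rules.d.
    install -m 0644 "$legacy_src" "$rules_dir/65-drbd.rules"
}
```

On a real system this would be run as root, followed by “udevadm control --reload” so that the running udevd picks up the override.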

Thank you for the quick reply and explanation.

Following your explanation, the resource agent file drbd.ocf should be updated as well.

There are two issues in this file:

  1. ${DRBD_DEVICES} in _drbd_validate_all() is initialized from “drbdadm sh-udev”, whose output no longer matches what “drbdsetup udev” reports. This causes my_udevsettle() to enter an infinite sleep loop, via the following call chain: drbd_start => create_device_udev_settle => my_udevsettle.
  2. If we replace “drbdadm sh-udev” with “drbdsetup udev” in _drbd_validate_all(), the initialization of ${DRBD_DEVICES} must be moved to after the kernel has created the actual DRBD device, because drbdsetup takes its data from the kernel.
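Independent of where ${DRBD_DEVICES} comes from, the endless sleep in item 1 could be avoided by bounding the wait. The following is only a sketch: wait_for_devices is a hypothetical function, not the actual my_udevsettle() from drbd.ocf, and the one-second polling interval is an assumption.

```shell
#!/bin/sh
# Hypothetical bounded replacement for an unbounded "sleep until it exists" loop.
# $1 = total timeout in seconds, remaining arguments = device paths to wait for.
wait_for_devices() {
    wfd_timeout=$1
    shift
    wfd_waited=0

    for wfd_dev in "$@"; do
        # -e only checks that the node exists; a stricter agent could use -b
        # to require a block device.
        while [ ! -e "$wfd_dev" ]; do
            if [ "$wfd_waited" -ge "$wfd_timeout" ]; then
                echo "timed out waiting for $wfd_dev" >&2
                return 1
            fi
            sleep 1
            wfd_waited=$((wfd_waited + 1))
        done
    done
    return 0
}
```

With this shape, a call such as “wait_for_devices 30 /dev/drbd0” returns non-zero after 30 seconds if udev never creates the node, so the start operation can fail with an error instead of hanging.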