High latency on HDD pool, please help!

Dear community, please help me figure out this issue.

Right now I have a task to build cold storage, and for this purpose I decided to use an HDD pool consisting of Toshiba SAS 1.2 TB 2.5" 10K 128 MB drives.

I assembled this pool without any cache/SSD tier disks.

When I started testing performance, my monitoring system showed w_await (write latency) of 100–150 milliseconds; at the very beginning of the load it even spikes up to 1000 ms, then settles down to a stable 100–150 ms.
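(For what it's worth, w_await is the write-latency column from the extended disk statistics that iostat reports, so the same numbers can be watched live on the host with sysstat, optionally naming the DRBD and backing devices:)

iostat -x 1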

I’m testing with the command:
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=fiotest --filename=testfio --bs=32k --iodepth=10 --size=8G --readwrite=randrw --rwmixread=75

Performance reaches around 300 IOPS for reads and 100 IOPS for writes, which was pretty much expected.

But the latency is what bothers me.
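(A side note on methodology: as far as I know, --gtod_reduce=1 turns off fio's own completion-latency accounting, so the latency figures here come from the monitoring system rather than from fio. To get latency percentiles straight from fio, the same test can be run without that flag, e.g.:)

fio --randrepeat=1 --ioengine=libaio --direct=1 --name=fiotest --filename=testfio --bs=32k --iodepth=10 --size=8G --readwrite=randrw --rwmixread=75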

Previously we used the same HDDs in a traditional SAN/storage array with RAID 5 pools. There, even under peak load, latency never exceeded 8–10 milliseconds.

In the case of LINSTOR + DRBD, even if I just start an idle MinIO instance on these disks (practically no load), the latency immediately jumps to around 25 milliseconds.

Could you please tell me:

  • Is this normal behavior for DRBD without cache on an HDD-only pool?

  • Or did I configure something wrong?

It would also be great if someone could suggest how to fix or significantly improve this latency.

Thanks in advance!

Forgot to mention: I’m using LVM_THIN, and the disks are connected to the nodes via HBA controllers.
Also, I’m using KVM virtualization orchestrated by CloudStack. In the same environment, I have another SSD-based pool that is working perfectly without any noticeable performance issues.
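(One LVM_THIN detail that may be worth ruling out: thin pools zero freshly allocated chunks by default, which adds extra writes on first allocation. The chunk size and zeroing flag can be checked with something along these lines; a sketch, with the VG name substituted and assuming the usual lvs report fields:)

lvs -a -o lv_name,attr,chunk_size,zero <vg_name>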

UPD: I ran some tests using the command:

fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=fiotest --filename=testfio --bs=4k --iodepth=64 --size=8G --readwrite=randrw --rwmixread=75

for all cases, and obtained the following results:

  • I simply created an LVM volume on this pool and mounted it directly in the system, getting about 10 ms of latency.

  • I manually created a DRBD resource on this pool (mounted directly on the host) and got 25 ms with two devices; when I left only one device, I got the same.

  • From a virtual machine, using the same test, I got about 150 ms.
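(For completeness, the way the disk ends up attached to the guest can be checked on the KVM host, to see which cache/io mode CloudStack configured for the virtio driver; the instance name below is a placeholder:)

virsh dumpxml <instance-name> | grep "driver name"

For a block-backed disk, cache='none' together with io='native' is what is usually recommended.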

I looked at the output of the drbdsetup show --show-defaults command for both the resource created via CloudStack and the one created manually. The output is absolutely identical except for the following sections:

resource "cs-bb0f243a-e737-49b4-bbef-c30a62dc8d86" {
    options {
        cpu-mask            ""; # default
        on-no-data-accessible   suspend-io;
        auto-promote        yes; # default
        peer-ack-window     4096s; # bytes, default
        peer-ack-delay      100; # milliseconds, default
        twopc-timeout       300; # 1/10 seconds, default
        twopc-retry-timeout 1; # 1/10 seconds, default
        auto-promote-timeout    20; # 1/10 seconds, default
        max-io-depth        8000; # default
        quorum              majority;
        on-no-quorum        suspend-io; # default
        quorum-minimum-redundancy   off; # default
        on-suspended-primary-outdated   disconnect; # default
        _unknown drbd8-api-compatibility;   # not supported by kernel
    }

resource "test" {
    options {
        cpu-mask            ""; # default
        on-no-data-accessible   io-error; # default
        auto-promote        yes; # default
        peer-ack-window     4096s; # bytes, default
        peer-ack-delay      100; # milliseconds, default
        twopc-timeout       300; # 1/10 seconds, default
        twopc-retry-timeout 1; # 1/10 seconds, default
        auto-promote-timeout    20; # 1/10 seconds, default
        max-io-depth        8000; # default
        quorum              majority;
        on-no-quorum        io-error;
        quorum-minimum-redundancy   off; # default
        on-suspended-primary-outdated   disconnect; # default
        _unknown drbd8-api-compatibility;   # not supported by kernel
    }

volume 0 {
    disk {
        resync-rate         272384k; # bytes/second
        c-plan-ahead        20; # 1/10 seconds, default
        c-delay-target      10; # 1/10 seconds, default
        c-fill-target       2048s; # bytes
        c-max-rate          819200k; # bytes/second
        c-min-rate          272384k; # bytes/second
        bitmap              yes; # default
        _unknown resync-without-replication;    # not supported by kernel
    }
}
}

volume 0 {
    disk {
        resync-rate         272384k; # bytes/second
        c-plan-ahead        20; # 1/10 seconds, default
        c-delay-target      10; # 1/10 seconds, default
        c-fill-target       2048s; # bytes
        c-max-rate          819200k; # bytes/second
        c-min-rate          272384k; # bytes/second
        bitmap              no;
        _unknown resync-without-replication;    # not supported by kernel
    }
}
}

So only the quorum, on-no-data-accessible, and bitmap settings are different.
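(For reference, the two outputs were compared roughly like this:)

diff <(drbdsetup show --show-defaults cs-bb0f243a-e737-49b4-bbef-c30a62dc8d86) <(drbdsetup show --show-defaults test)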

Re: the quorum option differences you showed, the knowledge base article “LINSTOR Quorum Policies and Virtualization Environments” might give some background on why suspend-io rather than io-error is recommended for LINSTOR storage backing VMs.

Perhaps it's incidental, but I wanted to put it out there in case it helps with understanding.

Thanks for sharing; I am aware of the difference between these parameters. All my resources inherit suspend-io through the resource group settings. This specific resource with io-error was created by me manually for testing purposes, which is why it has the default value.
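(If someone wants to check the same thing on their side: DRBD options inherited from a resource group should show up as DrbdOptions/... properties on the group, with something like the command below, where the group name is just an example:)

linstor resource-group list-properties hdd-group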


For the tests, I created a bcache pool on SSD disks and configured it as a writeback cache for the HDD pool. I achieved a several-fold increase in IOPS; latency during the same tests I mentioned above was around 70–90 ms. The question of why such a significant performance drop occurs when using DRBD from a virtual machine still remains open.
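In case it is useful to others, the bcache layering was set up roughly as follows (device paths are placeholders; the SSD partition is the cache device, the HDD device backs the pool):

make-bcache -C /dev/<ssd_partition> -B /dev/<hdd_device>
# if the bcache device does not appear automatically:
echo /dev/<hdd_device> > /sys/fs/bcache/register
echo writeback > /sys/block/bcache0/bcache/cache_mode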