Very low performance with LINSTOR on ZFS-backed storage for Proxmox

Hi guys,

I tried using LINSTOR for Proxmox with ZFS as the backend. I am using NVMe disks, but I am still getting very low performance. I have 6 Proxmox nodes: 3 compute nodes (diskless), 2 storage nodes with NVMe disks, and 1 node acting as the controller. All of these nodes are connected via 10G network interfaces.
I am getting around 1200 Mbps of read and 300 Mbps of write.
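For anyone reconstructing this layout, the registered topology can be checked with LINSTOR's standard listing commands (no specific node or pool names are assumed here):

    # Registered nodes and their roles (controller, satellite, combined)
    linstor node list

    # Storage pools backing each node; diskless nodes only carry the
    # built-in DfltDisklessStorPool
    linstor storage-pool list

    # Where each resource has diskful replicas and where it is attached diskless
    linstor resource list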

Can you guys help me with some optimization to get the maximum write performance?

Thanks in advance.

How are you testing and what are you expecting?

Hi Kermat,

I have created a ZFS shared storage using LINSTOR. The data is replicated between the 2 Proxmox storage nodes, each of which has 2 x 3.84 TB NVMe drives. The VMs are created on the other 3 Proxmox nodes, which are diskless. I am trying to achieve maximum NVMe performance using the ZFS shared storage with compression.
Currently, on a single VM created on the shared pool, I am getting around 1200 Mbps of read speed and 300 Mbps of write.

I want to achieve at least 1000 Mbps of write speed.

I don’t know where I am going wrong, but is it possible for you to provide a guide that can help me get maximum performance out of a ZFS shared pool with LINSTOR and Proxmox?

Are you able to get that kind of speed using the same VMs when the VM’s virtual disks are placed directly on the ZFS storage, without DRBD or LINSTOR? I ask because in all the testing I’ve seen, ZFS is not nearly as performant as thin LVM or traditional LVM.

That said, DRBD will always add some amount of overhead to writes, since each write needs to be sent over the network to a peer, which means more latency.
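As a quick illustration (the resource name below is a placeholder), the synchronous nature of that replication can be seen on the DRBD resources LINSTOR generates:

    # Connection and disk state of all DRBD resources on this node
    drbdadm status

    # Full runtime configuration of one resource; the net section should show
    # "protocol C", i.e. a write only completes once the peer has acknowledged it
    drbdsetup show <resource-name>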

Yes, I am getting much more than that.
Around 3000 Mbps of write speed when placed on local ZFS without LINSTOR or DRBD.

Okay, thanks, just wanted to make sure.

How are you running these tests? (fio, dd, other?)

  • block size?
  • IO queue depth?
  • number of threads?
  • random or sequential writes?
  • size of the test writes?

What is the size of the DRBD device you’re testing?
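For comparison, a fio run roughly like the following (device paths and sizes are placeholders, and the writes are destructive, so point it at a scratch resource) would pin down all of those parameters in one report, including per-IO write latency percentiles:

    # Sequential 1M writes, queue depth 8, one job, 10 GiB total,
    # issued directly against the DRBD device (page cache bypassed)
    fio --name=seq-write --filename=/dev/drbd1000 \
        --rw=write --bs=1M --iodepth=8 --numjobs=1 --size=10G \
        --ioengine=libaio --direct=1 --group_reporting

    # The same job against the backing zvol on a storage node separates
    # the ZFS overhead from the DRBD/network overhead
    fio --name=seq-write-zvol --filename=/dev/zvol/<pool>/<volume> \
        --rw=write --bs=1M --iodepth=8 --numjobs=1 --size=10G \
        --ioengine=libaio --direct=1 --group_reporting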

I am using CrystalDiskMark with default settings on a Windows VM created on ZFS with LINSTOR. PFA a screenshot for your reference.

[screenshot: CrystalDiskMark results]

What is the network latency and throughput between the DRBD nodes?

All the servers are placed in the same rack, one above the other.
So the network latency is around 1 ms and the throughput is 10G.
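For what it's worth, both numbers are easy to measure rather than estimate; something along these lines (peer address is a placeholder) reports the round-trip latency and the usable TCP throughput of the replication link:

    # Round-trip latency to the DRBD peer over the replication network
    ping -c 100 <peer-ip>

    # TCP throughput; start "iperf3 -s" on the peer first
    iperf3 -c <peer-ip> -t 30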

It’s hard to say with so little data from CrystalDiskMark. With fio (a Linux-based tool) you would be able to see the write latency statistics of every single IO operation in a report.

You mentioned that the test VMs are accessing their data disklessly when using LINSTOR/DRBD, but you’re comparing this against local ZFS storage. What is the performance like if you run the VM on one of the Proxmox nodes that has local storage, to rule out diskless performance issues?
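A quick way to double-check which nodes actually hold diskful replicas of the VM's resource (resource names depend on your setup) is:

    # Diskless attachments are flagged as such in the listing
    linstor resource list
    # Optionally narrow it down to the resource backing the test VM
    linstor resource list | grep <vm-resource-name>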

I did try the same by running the VM on a Proxmox node with diskful DRBD ZFS storage, but there was no change in performance.

Also, I have tried NVMe over TCP with the same infrastructure and created an LVM shared pool; I am getting 1200 Mbps read and write speeds (tested using CrystalDiskMark).

I am unable to understand where I am going wrong. Is it possible for you to share a step-by-step guide for setting up a ZFS pool with LINSTOR/DRBD on Proxmox?
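Not a substitute for the official documentation, but the usual sequence, sketched here with made-up pool, node, and group names, is roughly:

    # On each storage node: create the ZFS pool that will back LINSTOR
    zpool create -o ashift=12 nvmepool mirror /dev/nvme0n1 /dev/nvme1n1

    # Register it with LINSTOR as a ZFS (or zfsthin) storage pool
    linstor storage-pool create zfs pve-storage1 pool_nvme nvmepool
    linstor storage-pool create zfs pve-storage2 pool_nvme nvmepool

    # Resource group that places two replicas on that pool
    linstor resource-group create --storage-pool pool_nvme --place-count 2 zfs_group
    linstor volume-group create zfs_group

    # Proxmox side (/etc/pve/storage.cfg) for the linstor-proxmox plugin;
    # the controller address and group name are examples
    drbd: linstor_zfs
        content images, rootdir
        controller 192.168.1.10
        resourcegroup zfs_group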

The image you shared above shows 1236.18 MB/s (megabytes, with a capital “B”, i.e. bytes rather than bits).

In your initial post you mentioned:

… which is a 10Gb network, correct? Because that would mean you’re reaching your upper limit for write speed in your NVMe over TCP test case.

1236.18 MB/s × 8 bits per byte = 9889.44 Mb/s ≈ 9.89 Gb/s

Yes, I am getting that speed on the LVM LINSTOR setup but not on the ZFS LINSTOR setup.
That is exactly my concern; I want to achieve the same speed, or only slightly lower, on the ZFS LINSTOR setup.

Can you share more details about the ZFS pool?

  • Are you using RAIDZ?
  • Compression?
  • Deduplication?

Do you see any CPU load on the storage nodes when running your benchmarks?
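For reference, all of that can be collected with a handful of standard commands (pool name is a placeholder):

    # Pool layout (mirror, RAIDZ, single disks) and health
    zpool status nvmepool
    zpool get ashift nvmepool

    # Properties that directly affect write behaviour
    zfs get compression,dedup,recordsize,volblocksize,sync nvmepool

    # CPU load on a storage node while the benchmark is running
    top        # or htop / mpstat 1 if installed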

@akash It would be great if you could reply to the last questions; I’m also quite curious to find out what the bottleneck is.

If it’s not too much hassle, you may want to add a Linux VM so that you could provide the requested fio report data.
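If a Linux VM does get added, even a small queue-depth-1 random-write job like this (test file path is arbitrary) would show where the write latency is going, which CrystalDiskMark cannot:

    # 4k random writes at queue depth 1; the "clat" percentiles in the fio
    # output give the per-IO write latency distribution
    fio --name=lat-test --filename=/root/fio-lat.bin --size=2G \
        --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 \
        --ioengine=libaio --direct=1 --runtime=60 --time_based --group_reporting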