LINSTOR slow writes with ZFS NVMe pool

Hello.

I have a rather simple Kubernetes cluster on bare metal, with 3x control-plane and 3x worker nodes.

The whole cluster has a 2x40G bond, directly connected to an Arista MLAG pair.

Worker nodes are equipped with 2x Xeon Platinum 8173M and 256G of RAM.

For the datastore, I use 2x Samsung 990 Pro in a ZFS mirror pool:

  pool: drbd-volumes
 state: ONLINE
  scan: scrub repaired 0B in 00:00:00 with 0 errors on Sun Dec 14 00:24:01 2025
config:

	NAME          STATE     READ WRITE CKSUM
	drbd-volumes  ONLINE       0     0     0
	  mirror-0    ONLINE       0     0     0
	    nvme0n1   ONLINE       0     0     0
	    nvme1n1   ONLINE       0     0     0

errors: No known data errors

A manually created zvol has really nice write performance.

All tests are with fio.
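For reference, the 4K jobs were along these lines (a reconstructed sketch: block size, sync behavior, and runtime match the results below; ioengine, depth, file name, and size are assumptions):

```ini
; hypothetical fio job reconstructed from the results below
[global]
ioengine=libaio
direct=1
runtime=60
time_based
group_reporting

[4k-sync-write]
rw=randwrite
bs=4k              ; bs=4m for the large-block runs
iodepth=1
fsync=1            ; drop this line (or fsync=0) for the no-sync runs
filename=/mnt/test/fio.dat
size=16G
```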

Raw zvol test (ext4 filesystem, mounted):

4K - fsync=1
  write: IOPS=18.3k, BW=71.6MiB/s (75.1MB/s)(4297MiB/60001msec); 0 zone resets
  write: IOPS=18.4k, BW=71.7MiB/s (75.2MB/s)(4304MiB/60001msec); 0 zone resets
  write: IOPS=18.6k, BW=72.7MiB/s (76.2MB/s)(4363MiB/60001msec); 0 zone resets
  write: IOPS=18.7k, BW=72.9MiB/s (76.4MB/s)(4374MiB/60001msec); 0 zone resets
4K - fsync=0
  write: IOPS=54.8k, BW=214MiB/s (225MB/s)(12.6GiB/60001msec); 0 zone resets
  write: IOPS=54.9k, BW=214MiB/s (225MB/s)(12.6GiB/60001msec); 0 zone resets
  write: IOPS=54.9k, BW=215MiB/s (225MB/s)(12.6GiB/60001msec); 0 zone resets
  write: IOPS=54.7k, BW=214MiB/s (224MB/s)(12.5GiB/60001msec); 0 zone resets
4M - fsync=1
  write: IOPS=815, BW=3261MiB/s (3419MB/s)(191GiB/60001msec); 0 zone resets
  write: IOPS=816, BW=3266MiB/s (3424MB/s)(191GiB/60001msec); 0 zone resets
4M - fsync=0
  write: IOPS=1039, BW=4157MiB/s (4358MB/s)(244GiB/60002msec); 0 zone resets
  write: IOPS=1021, BW=4085MiB/s (4284MB/s)(239GiB/60002msec); 0 zone resets

LINSTOR with no replica is significantly worse (this fio test is from a KubeVirt VM with a PVC on LINSTOR), especially at 4K, which I find crucial for my workload:

4K - fsync=1
write: IOPS=748, BW=2996KiB/s (3068kB/s)(176MiB/60002msec); 0 zone resets
write: IOPS=749, BW=2996KiB/s (3068kB/s)(176MiB/60003msec); 0 zone resets
write: IOPS=741, BW=2965KiB/s (3036kB/s)(174MiB/60001msec); 0 zone resets
write: IOPS=747, BW=2988KiB/s (3060kB/s)(175MiB/60001msec); 0 zone resets
4K - fsync=0
write: IOPS=18.3k, BW=71.6MiB/s (75.1MB/s)(4298MiB/60001msec); 0 zone resets
write: IOPS=20.2k, BW=79.1MiB/s (82.9MB/s)(4746MiB/60001msec); 0 zone resets
write: IOPS=19.4k, BW=75.8MiB/s (79.5MB/s)(4547MiB/60001msec); 0 zone resets
write: IOPS=21.2k, BW=82.7MiB/s (86.7MB/s)(4962MiB/60001msec); 0 zone resets
4M - fsync=1
write: IOPS=552, BW=2209MiB/s (2316MB/s)(129GiB/60002msec); 0 zone resets
write: IOPS=554, BW=2219MiB/s (2327MB/s)(130GiB/60003msec); 0 zone resets
4M - fsync=0
write: IOPS=750, BW=3000MiB/s (3146MB/s)(176GiB/60004msec); 0 zone resets
write: IOPS=755, BW=3022MiB/s (3169MB/s)(177GiB/60004msec); 0 zone resets

With replicas it gets even worse (which makes sense, given the network overhead).

3x replica:

4K - fsync=1
write: IOPS=571, BW=2285KiB/s (2340kB/s)(134MiB/60002msec); 0 zone resets
write: IOPS=572, BW=2288KiB/s (2343kB/s)(134MiB/60002msec); 0 zone resets
write: IOPS=571, BW=2287KiB/s (2342kB/s)(134MiB/60002msec); 0 zone resets
write: IOPS=572, BW=2291KiB/s (2345kB/s)(134MiB/60002msec); 0 zone resets
4K - fsync=0
write: IOPS=12.3k, BW=48.0MiB/s (50.3MB/s)(2879MiB/60001msec); 0 zone resets
write: IOPS=12.2k, BW=47.8MiB/s (50.1MB/s)(2868MiB/60001msec); 0 zone resets
write: IOPS=12.3k, BW=47.9MiB/s (50.3MB/s)(2876MiB/60001msec); 0 zone resets
write: IOPS=12.1k, BW=47.4MiB/s (49.7MB/s)(2843MiB/60001msec); 0 zone resets
4M - fsync=1
write: IOPS=62, BW=249MiB/s (262MB/s)(14.6GiB/60001msec); 0 zone resets
write: IOPS=62, BW=250MiB/s (262MB/s)(14.6GiB/60006msec); 0 zone resets
4M - fsync=0
write: IOPS=108, BW=435MiB/s (456MB/s)(25.5GiB/60056msec); 0 zone resets
write: IOPS=108, BW=435MiB/s (456MB/s)(25.5GiB/60053msec); 0 zone resets

I understand that fsync doesn't mix well with network overhead, so I don't care that much about those numbers, but even with fsync=0 it's still pretty bad compared to the raw ZFS zvol.
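To put the 4K fsync=1 numbers in perspective, a quick back-of-the-envelope conversion of IOPS into per-write latency (assuming the runs were effectively queue depth 1, which is my reading of the job setup):

```python
# Back-of-the-envelope: at queue depth 1, average latency per write ≈ 1 / IOPS.
# IOPS figures taken from the 4K fsync=1 runs above.

def latency_us(iops: float) -> float:
    """Average time per write in microseconds, assuming iodepth=1."""
    return 1_000_000 / iops

raw_zvol = latency_us(18_300)  # raw zvol: ~55 µs per fsynced 4K write
linstor1 = latency_us(748)     # LINSTOR, no replica, inside VM
linstor3 = latency_us(571)     # LINSTOR, 3x replica, inside VM

print(f"raw zvol:     {raw_zvol:7.0f} µs")
print(f"linstor, 1x:  {linstor1:7.0f} µs  (+{linstor1 - raw_zvol:.0f} µs)")
print(f"linstor, 3x:  {linstor3:7.0f} µs  (+{linstor3 - raw_zvol:.0f} µs)")
```

So every fsynced 4K write picks up well over a millisecond of extra latency somewhere on the LINSTOR path, even before replication enters the picture.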

Here's my StorageClass, where some tuning options are already in place:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-nvme-replicated
provisioner: linstor.csi.linbit.com
parameters:
  csi.storage.k8s.io/fstype: "ext4"
  linstor.csi.linbit.com/storagePool: "linstor-nvme"
  linstor.csi.linbit.com/allowRemoteVolumeAccess: "false"
  linstor.csi.linbit.com/autoPlace: "3"
  DrbdOptions/Disk/disk-flushes: "no"
  DrbdOptions/Disk/md-flushes: "no"
  DrbdOptions/Net/max-buffers: "10000"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

I tried all 3 protocols; the tests I've posted are with protocol B.

Protocol A does not make much of a difference.

I also tried enlarging the TCP buffers on the worker nodes; it did not help.
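The buffer tuning was sysctl-based, something along these lines (illustrative values, not the exact ones I used):

```ini
# /etc/sysctl.d/99-net-buffers.conf — illustrative values only
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
```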

Am I missing something, or is it because of the consumer-grade NVMes?

I created a DRBD volume, 3x replicated, and attached it directly to a worker node as /dev/drbd1000, to make sure that QEMU is not the bottleneck:

4K - fsync=1
write: IOPS=8072, BW=31.5MiB/s (33.1MB/s)(1024MiB/32473msec); 0 zone resets
4K - fsync=0
write: IOPS=8651, BW=33.8MiB/s (35.4MB/s)(1024MiB/30300msec); 0 zone resets
4M - fsync=1
write: IOPS=121, BW=485MiB/s (508MB/s)(1024MiB/2113msec); 0 zone resets
4M - fsync=0
write: IOPS=127, BW=508MiB/s (533MB/s)(1024MiB/2014msec); 0 zone resets

looks like it ain’t

Another question: could my old ConnectX-3 NICs be the bottleneck here?

What do your fio test parameters look like?

What do your PVC manifests look like?

Any chance to put DRBD metadata on some other device?