I took the time to run some performance benchmarks and wanted to share the results. Hopefully, this can spark some discussion.
I compared the following storage backends:
- Raw host devices (for baseline performance)
- FileThin
- LVM Thin
- ZFS Thin
Since I wanted to test the storage backends rather than the network, no replication was used during these tests. I ran 3 passes of the following fio command; I was mainly interested in random write performance at a 4K block size.
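The exact command didn't survive into this post; a representative invocation for 4K random writes might look like the following. The device path and all parameters here are illustrative, not the ones actually used:

```shell
# Hypothetical fio run for 4K random writes.
# /dev/vg/testvol is a placeholder for the backend volume under test.
fio --name=randwrite-4k \
    --filename=/dev/vg/testvol \
    --rw=randwrite --bs=4k \
    --ioengine=libaio --iodepth=32 \
    --direct=1 --numjobs=1 \
    --runtime=60 --time_based \
    --group_reporting
```

`--direct=1` bypasses the page cache so the results reflect the backend rather than host RAM, though backend-internal caching (thin-pool metadata, ZFS ARC) can still show up across passes.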
The first pass was always slower than the subsequent runs. FileThin and LVM Thin showed the most improvement across passes, which suggests that caching mechanisms may be at play.
ZFS Thin was more stable, with less variation between runs.
Open Questions
Has anyone else observed caching effects in similar tests?
Are there any recommended tunings for LVM Thin or ZFS Thin to improve performance? I've already reviewed Performance Tuning for LINSTOR Persistent Storage in Kubernetes - LINBIT. It mentions that disk-flushes and md-flushes can possibly be disabled, but can that only be done with hardware RAID? I could use some help figuring out whether I can disable these.
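For reference, these are DRBD disk options, so in a plain (non-LINSTOR) setup they would be set in the `disk` section of the resource configuration. A sketch of what that looks like, assuming a resource named `r0` (the name and the decision to disable flushes are illustrative, not a recommendation):

```
resource r0 {
    disk {
        disk-flushes no;   # only safe with a battery/flash-backed write cache
        md-flushes no;     # same caveat applies to metadata flushes
    }
}
```

With LINSTOR managing DRBD, the equivalent would be set through LINSTOR's DRBD options rather than by editing resource files directly.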
I wasn't sure if you were running the same tests using the same PV, or if you were deleting the PV between "passes". If you were using the same PV, the performance hit could be due to block allocation during the first pass, which doesn't have to happen during subsequent passes.
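One way to test that theory is to force the allocation up front with a full sequential write before the timed random-write passes, so every block is already allocated in the thin pool. A sketch, with a placeholder device path:

```shell
# Hypothetical pre-allocation pass: sequentially write the whole volume once
# so thin-provisioned blocks are allocated before the real benchmark runs.
fio --name=prealloc \
    --filename=/dev/vg/testvol \
    --rw=write --bs=1M \
    --direct=1 --ioengine=libaio --iodepth=8
```

If the first random-write pass then performs like the later ones, allocation (rather than caching) was the likely cause of the slow first pass.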
For hardware installations, you would only want to disable flushes if you have battery-backed write caches on your storage controllers. Otherwise, you could end up with partial writes or corruption during hard crashes. In virtual machines, you could enable "writethrough" on the virtual storage to allow the host machine's caches to handle the flushing, but you'd still leave yourself open to corruption if the host crashes and doesn't have battery-backed write caches.
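For a KVM/libvirt guest, that cache mode is set on the disk's driver element in the domain XML; something along these lines, where the device paths are placeholders:

```
<disk type='block' device='disk'>
  <!-- cache='writethrough' lets the host cache reads but forces writes through -->
  <driver name='qemu' type='raw' cache='writethrough'/>
  <source dev='/dev/vg/guestvol'/>
  <target dev='vdb' bus='virtio'/>
</disk>
```

Other hypervisors expose the same choice under different names; the trade-off described above (host caching vs. crash safety) is the same either way.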