Sorry about the long gap in communication. I really appreciate the input, but I got tied up. The problem I’m encountering with using ZFS below DRBD relates to storage consumption. When I set up the system the canonical way (ZFS below DRBD), disk usage is much higher. For example, in the following output, zpool0/site271_ds is just a dataset (32K recordsize), whereas zpool0/site271_zvol_00000 is a DRBD resource on a zvol (default volblocksize). Using zvols under DRBD consumes far more storage with the exact same data in both.
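For reference, the comparison can be reproduced with something like the following (dataset names as above; `lused` is the logical size before compression and padding, so it is the fairest apples-to-apples column):

```shell
# Compare physical vs. logical consumption of the dataset and the zvol:
zfs list -o name,volsize,used,lused,refer,compressratio \
    zpool0/site271_ds zpool0/site271_zvol_00000
```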
No worries. Have you tried using a zfsthin LINSTOR storage pool?
linstor sp create zfsthin ...
This way you can take advantage of ZFS sparse (thin) volumes.
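For example, with hypothetical node and storage-pool names, and zpool0 from your output as the backing pool (the `zfsthin` provider makes LINSTOR create its zvols sparse):

```shell
# Hypothetical node and storage-pool names; zpool0 is the backing ZFS pool.
linstor storage-pool create zfsthin node-a pool_zfsthin zpool0
linstor storage-pool create zfsthin node-b pool_zfsthin zpool0
```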
I would expect this to be the case when comparing a dataset (in simplified terms, a filesystem) to a zvol (block device). A more appropriate comparison would be something like a ZFS dataset vs EXT4 filesystem used space.
Yes, understood. I realize I wasn’t very clear in my previous message. What I’m trying to highlight is that the zvol is inexplicably large. The DRBD resource was spawned as a 130GB volume, which somehow created a 157GB zvol. If I need to create 400 zvols, and every zvol is going to consume 20% more storage than the requested size of the DRBD disk, we’ll run out of storage quickly. Datasets do not require size specification. They consume the pool storage as needed. Plus, they benefit from compression, which zvols apparently do not. So, not only are they easier to manage, but they are much more efficient in storage utilization. I really want to build the stack the canonical way, with ZFS below DRBD, but these findings are forcing me to try it a different way. If I create one DRBD disk per physical disk, and build my raidz on top of the DRBD disks, then everything gets a lot easier and more efficient.
Any idea why spawning a 130GB DRBD resource creates a 157GB zvol? (Actually, it was 235GB at first, because ZFS gave it a default refreservation of roughly 180% of the requested size. I got it down to 157GB by setting refreservation to the same size as the zvol.)
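For context, this is roughly how I inspected and capped it (sizes are the ones above; note that lowering refreservation gives up the guarantee that writes to the zvol can never fail with ENOSPC):

```shell
# See what the reservation is actually costing:
zfs get volsize,refreservation,usedbyrefreservation zpool0/site271_zvol_00000

# Cap the reservation at the requested volume size:
zfs set refreservation=130G zpool0/site271_zvol_00000
```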
Not yet. That will be my next test. The following article has a nice breakdown of storage consumption based on array width and volblocksize.
That said, it’s only half the battle. Even if I can get the zvol down to somewhere near the size of the DRBD disk, it still does not benefit much from compression.
Just curious, what is the volblocksize currently in use for DRBD volumes?
zfs get volblocksize zpool0/site271_zvol_00000
I do expect that to be the case here, that is, assuming the output of zfs get volsize zpool0/site271_zvol_00000 matches the requested volume size in LINSTOR/DRBD.
Can you post the output of zfs get all zpool0/site271_zvol_00000 just so we have all the information?
Sorry, I destroyed it and started over fresh. With more testing, I confirmed that the 157GB size of the zvol is not Linstor’s fault. I found the command that Linstor was issuing in the ZFS pool history, and it is perfectly normal. I was also able to reproduce the condition outside of DRBD using only zfs commands. I don’t know what’s going on, but I know it’s not a Linstor/DRBD issue.
That said, the zvols seem to be taking up 157GB of storage whether I create them sparse or not.
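That is roughly what I tested, with a throwaway name (my understanding is that `-s` only drops the refreservation; it does not change how written blocks are allocated, so any per-block overhead still applies):

```shell
# Sparse zvol: no refreservation, but identical per-block allocation.
zfs create -s -V 130G zpool0/test_sparse
zfs get refreservation,volblocksize,used zpool0/test_sparse
```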
Whoops, I missed this section of output in my last reply:
Sounds good. Before your last replies arrived, I was just about to ask what happens when you copy the filesystems between zvols with differing volblocksize values.
It appears you’re hitting some underlying issues related to padding with RAID-Z.
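The allocation rule is simple enough to sketch. This is a hedged reimplementation of the logic in OpenZFS’s vdev_raidz_asize() — the disk count, parity level, and ashift below are example assumptions, not your pool’s actual values: each block’s data sectors get p parity sectors per stripe row, and the total is rounded up to a multiple of p + 1 (that round-up is the padding).

```shell
#!/bin/sh
# raidz_asize NDISKS NPARITY ASHIFT PSIZE
# Approximate bytes a raidz vdev allocates for one PSIZE-byte block.
raidz_asize() {
    n=$1; p=$2; ashift=$3; psize=$4
    sector=$((1 << ashift))
    d=$(( (psize + sector - 1) / sector ))           # data sectors
    rows=$(( (d + n - p - 1) / (n - p) ))            # stripe rows
    total=$(( d + p * rows ))                        # plus parity sectors
    total=$(( (total + p) / (p + 1) * (p + 1) ))     # pad to multiple of p+1
    echo $(( total * sector ))
}

# Example: 9-wide raidz1, ashift=12 (4K sectors), 16K volblocksize:
raidz_asize 9 1 12 16384    # prints 24576 -> 50% overhead on that block
```

The exact overhead depends on the pool’s width, parity, ashift, and the volblocksize, which is why the article’s tables vary so much across those parameters.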
After considerable soul-searching, I ended up building the stack using my original plan. There are 9 NVMe drives in each server, and I have a DRBD disk for each one. I then built a raidz on top of the 9 DRBD disks, and I’m creating individual zfs datasets for each customer. I realized the main reason I wanted zfs was for its compression, and that is working great. Also, using datasets makes daily administration much easier and more straightforward. No more worrying about the huge difference in storage requirements from customer to customer, no resizing drbd disks, LVM volumes, xfs filesystems, etc., and overall storage utilization is fantastically better than it would have been using zvols under DRBD disks. Also, the stack is simpler because of fewer layers. I realize I’m giving up some of zfs’ sexier capabilities, but the tradeoffs are worth it to me.
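Roughly, the build looks like this (DRBD minor numbers and the customer dataset name are illustrative; the 32K recordsize matches the dataset from my earlier output):

```shell
# One raidz over nine DRBD devices, with per-customer datasets on top:
zpool create zpool0 raidz \
    /dev/drbd1000 /dev/drbd1001 /dev/drbd1002 /dev/drbd1003 /dev/drbd1004 \
    /dev/drbd1005 /dev/drbd1006 /dev/drbd1007 /dev/drbd1008
zfs set compression=lz4 zpool0
zfs create -o recordsize=32K zpool0/site271_customer
```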
The next question is: pacemaker or drbd-reactor? I have both in my environment, and I much prefer using drbd-reactor on my database servers because customer-facing services are atomic. There’s one drbd-reactor resource with one drbd disk, one filesystem, and one mysql instance per customer. DRBD disks and their dependent resources can be individually moved between cluster nodes with ease, and drbd-reactor is overall much easier to administer than pacemaker. That said, the situation now is different. I have 9 DRBD disks all part of one raidz. Any actions that drbd-reactor triggers would have to be coordinated with all the other DRBD disks below the zpool. If any drbd-reactor resource wants to trigger a failover, it might have to gracefully manage failing over ALL drbd disks, which might involve stopping application services, unmounting all zfs datasets, exporting and importing pools, etc. Any thoughts on whether all this is feasible with drbd-reactor?
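For comparison, each of those per-customer database resources is driven by a promoter section roughly like this (resource and unit names are hypothetical; the shape follows drbd-reactor’s promoter plugin config):

```toml
[[promoter]]
[promoter.resources.customer_a]
# Started in order after promotion, stopped in reverse on demotion:
start = [
    "var-lib-mysql-customer_a.mount",
    "mysql@customer_a.service",
]
```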
DRBD Reactor does not support complex collocation or ordering constraints between services running on different DRBD resources. You cannot tell DRBD Reactor that all ZFS pools must be successfully exported before attempting to demote any of the DRBD devices. For that you will have to use Pacemaker.
That was the conclusion I reached. While it’s probably feasible to use drbd-reactor in conjunction with scripts to accomplish some type of coordinated failover, it’s more work than it’s worth.