Disk setup advice

Hi there!

I’m setting up a new PVE cluster and looking for some best practices advice on the disk setup.

I have 3 identical HPE servers, all fitted with Smart Array P440ar RAID controllers and 8 SSDs each.

As I see it, there are two routes I can take for the disk setup:

  1. Set up a volume using HW RAID on each server and put DRBD on a partition.
  2. Enable HBA mode to expose the disks directly to Proxmox, use one SSD for the OS, and set up the remaining SSDs as a ZFS RAID volume for DRBD.

Are both fine or does one have an advantage over the other?

Any other tips are also welcome of course :pray:

Thanks!

Since you have a full 3-node setup, did you consider using Ceph? If the hardware requirements are satisfied, I prefer a hyperconverged PVE+Ceph setup for a general-purpose virtualization cluster (there is a guide in the Proxmox wiki on how to install it). At least a 10 GbE storage network for Ceph is highly recommended. A small cluster like yours will probably also work very well with LINSTOR/DRBD on a dedicated 1 GbE network.
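If you go the hyperconverged route, the rough per-node sequence looks something like this (the subnet and device names are only placeholders; the wiki guide covers the details):

$ pveceph install                       # install the Ceph packages
$ pveceph init --network 10.10.10.0/24  # run once, pointing at the dedicated storage network
$ pveceph mon create                    # create a monitor (repeat on each node)
$ pveceph osd create /dev/sdb           # one OSD per data SSD (repeat per disk and node)
$ pveceph pool create vm_pool           # RBD pool to use as VM storage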

For a comparison of LINSTOR and Ceph, you may look at the LINBIT blog article: How does LinStor compare to Ceph?

A general recommendation from my side is to use two SSDs for the OS, either as hardware RAID or software RAID, whichever you prefer. (The SSDs for the OS can be cheaper models, not the expensive, high-endurance, high-capacity enterprise SSDs.)


I would start with exploring LINSTOR and using our storage plugin for Proxmox:

Here’s a quick overview of some things you can do with LINSTOR and Proxmox:
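Assuming LINBIT’s package repositories are already configured, getting started with the plugin is roughly this (the storage name, resource group, and controller address below are placeholders):

$ apt install linstor-proxmox    # the LINSTOR storage plugin for Proxmox VE

and then an entry in /etc/pve/storage.cfg along these lines:

drbd: linstor_storage
	resourcegroup my_resource_group
	content images,rootdir
	controller 192.168.1.10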

Either way will work. If you have a hardware RAID card that pairs nicely with the drives in your servers, that is still a valid option. Some prefer ZFS, or spec their systems to build RAID-Z arrays.

A good question to ask yourself - If you built a single Proxmox node without replication or high availability, would you choose ZFS or your hardware RAID controller? DRBD/LINSTOR tends to be fairly agnostic about the storage layer underneath.

Keep in mind that LINSTOR only uses ZVOLs when ZFS is the backing storage, not the ZFS dataset/filesystem functionality. It is merely a way to slice up the storage in the system for replicated virtual machine disk images.
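For example, registering ZFS-backed storage pools with LINSTOR looks roughly like this (the node and zpool names here are just examples):

$ linstor storage-pool create zfs pve-a pool_zfs tank           # thick ZVOLs carved from zpool "tank"
$ linstor storage-pool create zfsthin pve-a pool_zfs_thin tank  # sparse (thin) ZVOLs from the same zpool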


Thanks, I will look into both options!

Hi there, I am sorry to drag up an old thread, but I’m interested to know what the guidance is here. I’ve followed some of the LINBIT docs to create a Proxmox cluster, using 3 servers with a total of 8 drives in each. I have two disks in a RAID 1 mirror for Proxmox, and one SSD was configured while following the doc.

I do, however, have a total of 6 SSDs which I’d like to use in a software RAID 6. Reading the LINBIT ‘Striping LINSTOR Volumes Across Several Physical Devices’ doc, it suggests 4k stripes for the metadata (I’m thinking of allowing 1 GB for that), then 256k or 512k filesystem chunks for streaming / working VM disk images, and finally smaller 32k or 64k filesystem chunks for database or messaging queue applications.

So, I’m thinking maybe I could use a software RAID 6 setup with LVM, or some other mechanism, to create the three different volumes on top of it?
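To make my thinking concrete, something like this is what I’m imagining (device names and sizes are just placeholders, and I may well be off track here):

$ mdadm --create /dev/md0 --level=6 --raid-devices=6 --chunk=256 /dev/sd[b-g]   # software RAID 6 over the six SSDs
$ pvcreate /dev/md0
$ vgcreate vg_raid6 /dev/md0                 # one volume group on top of the array
$ lvcreate -L 1G   -n meta      vg_raid6     # small volume set aside for metadata
$ lvcreate -L 2T   -n vm_images vg_raid6     # volume for streaming / working VM disk images
$ lvcreate -L 500G -n vm_db     vg_raid6     # volume for database / message queue workloads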

Not sure at the moment; it’s more complex than I imagined, and I haven’t found anything that directly answers my questions so far. I wouldn’t mind hearing what others have done or would recommend in a similar situation. Thanks! Sel

Hello and welcome to the forum!

This knowledge base article, LINSTOR and DRBD Hardware Considerations, has some general information about hardware considerations, including RAID, when using LINSTOR and DRBD.

It might be more general advice than you are looking for, but I’m mentioning it in case it could be a jumping-off point, or until someone else can provide ideas specific to your hardware and RAID choice.


To a degree you can mix and match, but it depends on whether you prefer to have LVM or ZFS underneath for the block provisioning.

On the 3-node cluster I recently built, I use LVM. There are four storage disks on each node. I put them all into a single volume group “sata_ssd”. I then created mirrored thin pools - “thin0” on sda+sdb, and “thin1” on sdc+sdd, using a portion of available space.

(The reason for two separate thin pools with mirroring was to reduce the blast radius if a thin pool dies)

I created LINSTOR storage pools “sata_ssd” to use the volume group directly, and “sata_ssd_thin0” and “sata_ssd_thin1” for the thin pools.
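Roughly, the per-node setup was along these lines (reconstructed from memory and simplified; device names are examples, and I’ve left out mirroring the thin pool metadata for brevity):

$ pvcreate /dev/sda /dev/sdb /dev/sdc /dev/sdd
$ vgcreate sata_ssd /dev/sda /dev/sdb /dev/sdc /dev/sdd
$ lvcreate --type raid1 -m1 -L 150G -n thin0 sata_ssd /dev/sda /dev/sdb   # mirrored LV on sda+sdb
$ lvconvert --type thin-pool sata_ssd/thin0                               # convert it into a thin pool
$ lvcreate --type raid1 -m1 -L 150G -n thin1 sata_ssd /dev/sdc /dev/sdd
$ lvconvert --type thin-pool sata_ssd/thin1
$ linstor storage-pool create lvm     virtual1 sata_ssd       sata_ssd        # thick pool backed by the VG
$ linstor storage-pool create lvmthin virtual1 sata_ssd_thin0 sata_ssd/thin0  # thin pools (repeat per node)
$ linstor storage-pool create lvmthin virtual1 sata_ssd_thin1 sata_ssd/thin1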

$ linstor sp list -p
+---------------------------------------------------------------------------------------------------------------------------------------------------+
| StoragePool          | Node     | Driver   | PoolName       | FreeCapacity | TotalCapacity | CanSnapshots | State | SharedName                    |
|===================================================================================================================================================|
| DfltDisklessStorPool | virtual1 | DISKLESS |                |              |               | False        | Ok    | virtual1;DfltDisklessStorPool |
| DfltDisklessStorPool | virtual5 | DISKLESS |                |              |               | False        | Ok    | virtual5;DfltDisklessStorPool |
| DfltDisklessStorPool | virtual6 | DISKLESS |                |              |               | False        | Ok    | virtual6;DfltDisklessStorPool |
| sata_ssd             | virtual1 | LVM      | sata_ssd       |     1.63 TiB |      3.49 TiB | False        | Ok    | virtual1;sata_ssd             |
| sata_ssd             | virtual5 | LVM      | sata_ssd       |     1.43 TiB |      3.30 TiB | False        | Ok    | virtual5;sata_ssd             |
| sata_ssd             | virtual6 | LVM      | sata_ssd       |     1.43 TiB |      3.30 TiB | False        | Ok    | virtual6;sata_ssd             |
| sata_ssd_thin0       | virtual1 | LVM_THIN | sata_ssd/thin0 |   109.57 GiB |       150 GiB | True         | Ok    | virtual1;sata_ssd_thin0       |
| sata_ssd_thin0       | virtual5 | LVM_THIN | sata_ssd/thin0 |    76.90 GiB |       150 GiB | True         | Ok    | virtual5;sata_ssd_thin0       |
| sata_ssd_thin0       | virtual6 | LVM_THIN | sata_ssd/thin0 |    99.41 GiB |       150 GiB | True         | Ok    | virtual6;sata_ssd_thin0       |
| sata_ssd_thin1       | virtual1 | LVM_THIN | sata_ssd/thin1 |    64.00 GiB |       150 GiB | True         | Ok    | virtual1;sata_ssd_thin1       |
| sata_ssd_thin1       | virtual5 | LVM_THIN | sata_ssd/thin1 |   137.74 GiB |       150 GiB | True         | Ok    | virtual5;sata_ssd_thin1       |
| sata_ssd_thin1       | virtual6 | LVM_THIN | sata_ssd/thin1 |   135.03 GiB |       150 GiB | True         | Ok    | virtual6;sata_ssd_thin1       |
+---------------------------------------------------------------------------------------------------------------------------------------------------+

I then created two LINSTOR resource groups. One has a replication factor of 2 and uses both the thin0 and thin1 storage pools. The other has a replication factor of 3 and uses the underlying LVM volume group directly, i.e. no local disk mirroring.

+-----------------------------------------------------------------------------------------------------------------+
| ResourceGroup  | SelectFilter                                   | VlmNrs | Description                          |
|-----------------------------------------------------------------------------------------------------------------|
| sata_ssd_thick | PlaceCount: 3                                  |        | LVM thick provisioning, no snapshots |
|                | StoragePool(s): sata_ssd                       |        |                                      |
|-----------------------------------------------------------------------------------------------------------------|
| sata_ssd_thin  | PlaceCount: 2                                  |        | LVM thin provisioning                |
|                | StoragePool(s): sata_ssd_thin0, sata_ssd_thin1 |        |                                      |
+-----------------------------------------------------------------------------------------------------------------+
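The resource groups themselves were created with something along these lines (again from memory, so treat the exact options as approximate):

$ linstor resource-group create sata_ssd_thick --place-count 3 --storage-pool sata_ssd
$ linstor volume-group create sata_ssd_thick
$ linstor resource-group create sata_ssd_thin --place-count 2 --storage-pool sata_ssd_thin0 sata_ssd_thin1
$ linstor volume-group create sata_ssd_thin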

These two resource groups are exposed as “storages” in Proxmox.

drbd: linstor-thick
	resourcegroup sata_ssd_thick
	content images,rootdir
	controller xxxx

drbd: linstor-thin
	resourcegroup sata_ssd_thin
	content rootdir,images
	controller xxxx

This then gives me a choice when creating a VM. If it’s a “small” VM, and I want snapshot capability, then I create it using “linstor-thin”. I end up with 4 copies: two mirrors on each of two nodes. If it’s a “large” VM then I create it using “linstor-thick”, and I get 3 copies, one on each of the three nodes; these have no risk of running out of space due to over-provisioning, but they have no snapshot capability. (These “large” VMs are typically Incus container hosts with ZFS, so I have internal snapshot capability on those)

This setup may be more complicated than your needs require, but I like the fact that there’s a short disk I/O path.

If you choose to use ZFS/ZVOL as the underlying storage then you don’t have to worry about snapshots, as these are always available (“ZFS_THIN” in this case just makes the ZVOLs sparse, with no up-front space reservation). If you want local mirroring you’ll need to configure this at the zpool (vdev) level. However, some block I/O workloads might not perform well with ZFS underneath; you’ll have to test for your own use case.
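For example, vdev-level mirroring is just something like this (pool and device names made up):

$ zpool create -o ashift=12 tank mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde   # two mirrored vdevs
$ zpool status tank                                                                  # verify the mirror layout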

Note that ZFS mirroring is far better than RAID-controller or mdraid mirroring, because it’s able to detect and correct bitrot errors.

I do, however, have a total of 6 SSDs which I’d like to use in a software RAID 6

RAID6 will perform very poorly for writes, and very poorly for both reads and writes when degraded or when resilvering. I’d strongly recommend against it for VM-type workloads, including Proxmox. However, if it’s just doing long-term archival storage it’ll be OK.

With 6 disks, RAID6 will give you 4 usable. RAID10 or equivalent will give you 3 usable, but far better performance. Disks are cheap, so buy more if you need them. With a RAID10, you are fine if one disk fails, but have a 1-in-5 chance that a second disk failure will lose all data. However, if you have DRBD on top then you’re less worried about that.
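If you do go the mdraid route instead, a 6-disk RAID 10 would be along the lines of (device names assumed):

$ mdadm --create /dev/md0 --level=10 --raid-devices=6 /dev/sd[b-g]
$ cat /proc/mdstat    # watch the initial sync and check the layout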

Using no local RAID, but DRBD with three replicas, gives you good protection against disk failure.
