Quick DR & HA through new x-replicas-on-different feature

Now that the xReplicasOnDifferent feature is out in the latest operator release, I figured it was time to revisit this multi-datacenter DRBD setup and share my findings. I hope others experimenting with this feature can also find these insights useful!

Test Environment

  • Cluster Topology: A “stretched” cluster across two main datacenters (DC A & DC B) plus a third “tiebreaker” datacenter (DC C).

  • Hardware: Five CCX33 Hetzner cloud instances.

    • CPU: 8 Dedicated cores
    • RAM: 32GB
    • Network: ~7.05 Gbit/s between nodes (tested with iPerf3)
  • Storage Backend: LVM Thin

  • Workload:

    • A single MariaDB:11.4 Pod with an attached DRBD-backed PVC.
    • Sysbench (oltp_write_only) running in a separate Job, always scheduled on the same node as the database Pod (see the affinity sketch below).
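
One straightforward way to pin the benchmark Job to the database node is pod affinity. Below is a minimal sketch, assuming the MariaDB Pod carries an app=mariadb label; the label, image, and names are illustrative rather than my exact manifests:

apiVersion: batch/v1
kind: Job
metadata:
  name: sysbench
spec:
  template:
    spec:
      restartPolicy: Never
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: mariadb            # assumes the DB Pod is labeled app=mariadb
              topologyKey: kubernetes.io/hostname
      containers:
        - name: sysbench
          image: severalnines/sysbench    # illustrative image
          command: ["sleep", "infinity"]  # the actual sysbench invocation is shown below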

Benchmark command

# parameters
sysbench \
 --table-size=100000 \
 --tables=20 \
 --threads=64 \
 --events=100000 \
 --time=5000 \
 oltp_write_only run
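
Note that this is only the run phase. For anyone reproducing the setup: the test tables have to be created first with a matching prepare step, and the MySQL connection flags (shown here as placeholders) go on both commands:

# create the test tables before each run (connection values are placeholders)
sysbench \
 --mysql-host=<mariadb-service> \
 --mysql-user=<user> \
 --mysql-password=<password> \
 --mysql-db=sbtest \
 --table-size=100000 \
 --tables=20 \
 oltp_write_only prepare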

Time to start testing! :test_tube: :lab_coat:

First Test: Baseline (2 Replicas, Same DC)

Goal: Get a baseline measurement by having only one extra replica in the same datacenter.

StorageClass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: lvm-thin
provisioner: linstor.csi.linbit.com
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
parameters:
  linstor.csi.linbit.com/storagePool: vg1-thin #LVM Thin
  linstor.csi.linbit.com/placementCount: "2"
  linstor.csi.linbit.com/resourceGroup: "lvm-thin"
  linstor.csi.linbit.com/allowRemoteVolumeAccess: "false"
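
The PVC attached to the MariaDB Pod then simply references this class; a minimal sketch (name and size are illustrative):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mariadb-data        # illustrative name
spec:
  storageClassName: lvm-thin
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi         # illustrative size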

Results:

General statistics:
    total time:                          19.8495s
    total number of events:              100000

Latency (ms):
         min:                                    1.25
         avg:                                   12.69
         max:                                  851.76
         95th percentile:                       33.72
         sum:                              1268680.55

Performance: Around 19.85s total time. This serves as the baseline for comparison. I ran the benchmark multiple times, and the results were consistent, with little variance.

Second Test: 4 Replicas (2 in Each DC) - Protocol C (Sync)

Goal: Measure the impact of fully synchronous replication (Protocol C) across two datacenters. This setup also includes a diskless tiebreaker node in the third DC.

StorageClass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: high-available
provisioner: linstor.csi.linbit.com
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
parameters:
  linstor.csi.linbit.com/storagePool: vg1-thin #LVM Thin
  linstor.csi.linbit.com/placementCount: "4"
  linstor.csi.linbit.com/resourceGroup: "high-available"
  linstor.csi.linbit.com/allowRemoteVolumeAccess: "false"
  linstor.csi.linbit.com/xReplicasOnDifferent: |
    topology.kubernetes.io/zone: 2
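
To confirm that the placement came out as intended (two diskful replicas per zone plus the diskless tiebreaker in DC C), I find it easiest to ask LINSTOR directly. A quick check, assuming the operator's default namespace and controller deployment name:

# namespace and deployment name are the operator defaults; adjust if yours differ
kubectl -n piraeus-datastore exec deploy/linstor-controller -- linstor resource list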

Results:

General statistics:
    total time:                          31.4133s
    total number of events:              100000

Latency (ms):
         min:                                    4.38
         avg:                                   20.08
         max:                                 1227.20
         95th percentile:                       50.11
         sum:                              2008262.40

Performance: About 31.41s total time, roughly 58% slower than the baseline.

Third Test: 4 Replicas (2 in Each DC) - Protocol A (Async)

Goal: Same setup as the previous test, but this time comparing performance with asynchronous replication (Protocol A) between the two primary datacenters. I applied the following LinstorNodeConnection to achieve this:

apiVersion: piraeus.io/v1
kind: LinstorNodeConnection
metadata:
  name: selector
spec:
  selector:
    - matchLabels:
        - key: topology.kubernetes.io/region
          op: NotSame
  properties:
    - name: DrbdOptions/Net/protocol
      value: A
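
To verify that the cross-region connections really switched to Protocol A (while the intra-DC connections stay on Protocol C), the DRBD configuration can be inspected on one of the nodes; a sketch, with the resource name as a placeholder:

# run on a node in DC A; <resource> is the DRBD resource backing the PVC
drbdsetup show <resource>   # look for "protocol A" in the net section of the cross-DC connections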

Results:

General statistics:
    total time:                          27.3795s
    total number of events:              100000

Latency (ms):
         min:                                    1.29
         avg:                                   17.50
         max:                                 1155.95
         95th percentile:                       49.21
         sum:                              1750079.59

Performance: About 27.38s total time, around 38% slower than the baseline.

Tweaking DRBD options like --al-extents=65534 --sndbuf-size=0 didn’t significantly shift the results.
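
For anyone who wants to experiment with the same knobs: one way to apply them cluster-wide is as LINSTOR properties on the resource group, using the same DrbdOptions namespace as the protocol setting above (the exact property keys below are my assumption, so double-check them before relying on this):

# assumes the resource group from the StorageClass above; property keys are my assumption
kubectl -n piraeus-datastore exec deploy/linstor-controller -- \
  linstor resource-group set-property high-available DrbdOptions/Disk/al-extents 65534
kubectl -n piraeus-datastore exec deploy/linstor-controller -- \
  linstor resource-group set-property high-available DrbdOptions/Net/sndbuf-size 0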

Thoughts & Questions

I was hoping to see performance closer to the baseline when using asynchronous replication between the DCs, but there’s still a noticeable slowdown. I guess some overhead is expected when adding more replicas, but I’m curious if there are any tuning parameters or best practices that could help close this gap further.

DRBD Proxy?
Could DRBD Proxy bring performance closer to the baseline, or is some performance hit inevitable as soon as multiple replicas (even asynchronous ones) are involved?

Methodology & Setup
I tried to keep each test consistent: same Sysbench parameters, same node scheduling, and so on. If you see anything that should be adjusted, let me know. If there’s any additional data or context you’d like me to provide, feel free to ask!

Final Thoughts
In the end, I’m feeling a tiny bit disappointed. Though the performance drop isn’t catastrophic, I was hoping to have all my cake and eat it too! :birthday:

@kermat, @ghernadi, I’d love to hear your insights on whether I might be missing any optimizations or if this is just the expected trade-off. Also, any guidance on diagnosing where the real bottleneck lies would be super helpful!

Either way, this was a lot of fun to test, and I’m looking forward to hearing your thoughts!
