Hello dear community,
My goal is to implement geo-replication for GFS2 using a single-node DRBD setup, and I want to explain what I tried in the hope that somebody else has wandered down the same (forbidden) path.
Let me start by saying that I have researched the topic enough to understand that this use case IS NOT SUPPORTED, endorsed, or, as far as I know, even in development.
That being said, since I cannot change the architecture and/or the filesystem at the moment, I started experimenting with things found in various GitHub repos and other pieces of documentation that led me to think something could be done about cluster locking via ugly workarounds.
In my RHEL cluster I configured Pacemaker to handle DLM and lvmlockd in order to keep the cluster locking in sync.
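For reference, the locking part is essentially the standard setup from the RHEL GFS2 documentation; a minimal sketch (the group/clone names are just the ones I use here):

```
# DLM and lvmlockd as a cloned group so cluster locking runs on every node
pcs resource create dlm --group locking ocf:pacemaker:controld \
    op monitor interval=30s on-fail=fence
pcs resource create lvmlockd --group locking ocf:heartbeat:lvmlockd \
    op monitor interval=30s on-fail=fence
pcs resource clone locking interleave=true
```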
The physical storage (a multipath device) is presented to the RHEL VMs via RDM in VMware using iSCSI physical bus sharing, and is seen by Linux as /dev/mapper/mpathx once the multipath module is initialized: every VM in the MAIN site has the same VMDK attached, and the same goes for the GEO site.
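As a sanity check, a quick way to confirm that every node really sees the same LUN is to compare what multipath reports on each of them, something like:

```
# the WWID on the first line of the map output must be identical on every node
multipath -ll mpathx
# and the device should show up with the same size everywhere
lsblk /dev/mapper/mpathx
```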
Then I initialize an async DRBD resource on said mpath device and, after the initial sync, format it with GFS2 (or create a VG with LVs and format those with GFS2; it ends up the same way, but it is slightly longer to test since you need to create more PCS resources, e.g. VG activation).
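For context, the DRBD side is essentially a plain two-node async resource; a stripped-down sketch of what I mean (hostnames, IPs, cluster name and journal count are placeholders, syntax roughly DRBD 8.4-style):

```
# /etc/drbd.d/r0.res -- illustrative only
resource r0 {
    net {
        protocol A;                  # asynchronous replication
    }
    device    /dev/drbd0;
    disk      /dev/mapper/mpathx;    # the shared multipath device
    meta-disk internal;

    on rms1 {                        # MAIN-site DRBD node
        address 10.0.0.1:7789;
    }
    on rms1-geo {                    # GEO-site DRBD node
        address 10.0.1.1:7789;
    }
}
```

followed by the usual initialization and the GFS2 format:

```
# on both DRBD nodes
drbdadm create-md r0
drbdadm up r0

# on the MAIN DRBD node only, to kick off the initial sync
drbdadm primary --force r0

# once synced, format through the DRBD device; cluster name must match the
# Pacemaker cluster name, one journal per node that may mount the filesystem
mkfs.gfs2 -p lock_dlm -t mycluster:gfs2vol -j 8 /dev/drbd0
```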
The next step is to create PCS resources to mount /dev/drbdX on the DRBD node and the original device on the other nodes, which still see the disk as /dev/mapper/mpathx.
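To make the mount layout concrete, the sketch below is roughly what I mean (resource names, node names and mountpoints are placeholders; in my real setup the GEO nodes don't mount the filesystem at all during normal operation, so they would need additional location constraints):

```
# GFS2 mount through the DRBD device, pinned to the MAIN DRBD node (rms1)
pcs resource create fs_drbd ocf:heartbeat:Filesystem \
    device=/dev/drbd0 directory=/mnt/gfs2 fstype=gfs2 options=noatime \
    op monitor interval=10s on-fail=fence
pcs constraint location fs_drbd prefers rms1=INFINITY

# GFS2 mount through the raw multipath device on the other MAIN nodes (cloned),
# kept off the DRBD node so the two mount resources never overlap there
pcs resource create fs_mpath ocf:heartbeat:Filesystem \
    device=/dev/mapper/mpathx directory=/mnt/gfs2 fstype=gfs2 options=noatime \
    op monitor interval=10s on-fail=fence clone interleave=true
pcs constraint location fs_mpath-clone avoids rms1=INFINITY

# locking (the dlm/lvmlockd clone) must be up before any GFS2 mount
pcs constraint order start locking-clone then fs_drbd
pcs constraint order start locking-clone then fs_mpath-clone
```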
This setup can be summarized as one cluster of 8 nodes:
- MAIN SITE: 1 DRBD node + 3 non-DRBD nodes
- GEO SITE: 1 DRBD node + 3 non-DRBD nodes
The DRBD node on the MAIN site is always Primary, and there is no automatic failover to the DRBD node on the GEO site. Since the MAIN node is a DRBD Primary, the disk is open and can be mounted read-write. The same would be true for the GEO node if it were promoted.
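For the record, the manual promotion I have in mind on the GEO side would be along these lines (only after making sure the MAIN Primary is really demoted or gone, otherwise it's a guaranteed split-brain):

```
# on the MAIN DRBD node, if still reachable
# (requires the GFS2 mount on top of /dev/drbd0 to be stopped first)
drbdadm secondary r0

# on the GEO DRBD node
drbdadm primary r0      # add --force only if MAIN is gone and diverging data is acceptable
drbdadm status r0
```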
Well, what about the rest of the nodes?
I discovered that if the MAIN DRBD node is Primary, the other nodes that share the same disk can write to the multipath filesystem with no issue, relying on the PCS-managed cluster locking and the native multi-writer support provided by GFS2. Basically, the nodes not configured for DRBD see the storage from another "point of view": they don't know whether a DRBD replica should be updated or not, so they simply ignore it and treat their mountpoints as regular GFS2 mounts.
What happens to the replica?
Well, as soon as the DRBD node writes anything to the storage, it updates its disk state with the new operations performed by the other nodes, thereby aligning the DRBD device with the state of the backing storage, which is then replicated to the GEO site.
I hope the following diagram helps to understand the infrastructure setup: the yellow boxes are the MAIN active nodes and the orange ones the GEO nodes (named RMSx and RMSx-geo). Green arrows mean R/W and red arrows mean no access.
I tested stability and performance using dd and fio; the setup seems to handle heavy workloads well, although IOPS degrade quickly when all nodes write concurrently.
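For reference, a fio run on the GFS2 mountpoint along these lines (parameters here are just an example) is enough to see the degradation when several nodes hammer the filesystem at the same time:

```
# random-write pressure from each node onto its own GFS2 mountpoint
fio --name=gfs2-randwrite --directory=/mnt/gfs2 \
    --rw=randwrite --bs=4k --size=1G --numjobs=4 \
    --ioengine=libaio --direct=1 --group_reporting \
    --runtime=120 --time_based
```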
I also tested what happens when the DRBD node is down while the other nodes from the same site keep writing to the multipath device: replication stops and the replica stays out of date until DRBD is brought up again, at which point the /dev/drbdX device syncs the changes found on the backing storage and replication resumes. (This is the part I am least sure about, and where I would most appreciate advice on how to test it.)
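What I would like to turn into a repeatable test is roughly this sequence (sketch only, resource and mount names as in the snippets above):

```
# 1. stop the GFS2 mount on the MAIN DRBD node and take DRBD down there
pcs resource disable fs_drbd
drbdadm down r0                    # on the MAIN DRBD node

# 2. write from one of the non-DRBD MAIN nodes
dd if=/dev/urandom of=/mnt/gfs2/testfile bs=1M count=100 oflag=direct conv=fsync

# 3. bring DRBD back and watch whether/how it picks up the out-of-band writes
drbdadm up r0
drbdadm primary r0
drbdadm status r0                  # connection, role and disk states

# 4. online verify (needs verify-alg set in the resource config): compares
#    MAIN and GEO block by block and logs out-of-sync ranges to the kernel log
drbdadm verify r0
```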
I am stuck on validating this setup once and for all: I sincerely hope that someone from DRBD development or an experienced user can guide me towards more consistent testing and/or a better understanding of how to debug DRBD in terms of sync progress and the state of the replica.
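For completeness, these are the only tools I currently know of for inspecting sync progress and replica state; pointers to anything better are very welcome:

```
drbdadm status r0                            # role, disk state, peer disk state, replication state
drbdsetup status r0 --verbose --statistics   # adds counters such as out-of-sync and sent/received
drbdsetup events2 r0                         # continuously streams state changes (handy during tests)
cat /proc/drbd                               # DRBD 8.x only; shows the resync progress bar
```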
Right now the only way I have found to consistently check the other side is to run
df -h
from a GEO node after mounting the storage read-only.
Although this serves my purpose, it is the only operation I found that does not corrupt GFS2: even a simple "ls" causes a disconnection if the node is not on the site that has the DRBD Primary.
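Concretely, the check on the GEO side is nothing more than this (mountpoint is a placeholder):

```
# on a GEO node: read-only mount, one df, unmount again -- anything more breaks it
mount -o ro /dev/mapper/mpathx /mnt/check
df -h /mnt/check
umount /mnt/check
```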
Even if you don't have suggestions, please don't refrain from commenting if you want more details or have ideas about it, because I have literally hit a wall: it seems to work, but I cannot quantify it.
Thanks in advance for reading.
PS: sorry for my flawed English, I'm not a native speaker.