[SOLVED] File corruption with NFS HA Cluster

Hello everyone,

I’ve created a test environment with 3 nodes based on AlmaLinux 9 and followed the how-to guide “NFS High Availability Clustering Using DRBD and Pacemaker on RHEL 9”. I now have an NFS share visible on the network.

The issue is with testing the failover as described in Chapter 5.

I did not create a file using “dd”; instead I copied a video file of ~2 GB. When I did a “hard” reset, as described in the aforementioned chapter, the cluster did its thing and failed over to one of the available nodes, and the file continued copying, but the resulting file was corrupted (I checked the file hash, as seen in the following screenshot).
[Screenshot 2024-07-24 at 14.31.12]

If I do the same test but with a normal “reboot” of the primary node instead of a “hard” reset, the resulting file is good (checked with the file hash).

I do understand why the file gets corrupted: the primary DRBD node was “hard” killed before it was able to send the data to the secondary DRBD node.

So my question is: is there a way to configure DRBD to mitigate this situation?

Best regards

I suspect it might just be write buffers not getting flushed to disk with the hard reboot versus the soft one.

Try mounting the filesystem with the -o sync option. In Pacemaker speak, that would mean adding options=sync as a parameter to the ocf:heartbeat:Filesystem resource.
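Something along these lines; the resource name, device, mount point, and filesystem type below are just placeholders, not the exact values from your guide:

    pcs resource update fs_nfs options=sync

or, if you were creating the Filesystem resource from scratch:

    pcs resource create fs_nfs ocf:heartbeat:Filesystem device=/dev/drbd0 directory=/srv/nfs fstype=xfs options=sync

If the resource already carries other mount options, append sync to them (e.g. options=noatime,sync) rather than replacing them.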

This will likely cause a noticeable hit to write performance, but test with the above and see if you can recreate the corruption.


Thank you @Devin. That was the missing piece of information.

Once the sync option was added to the options in the pcs command from Chapter 4.2 of the how-to guide, the file copied while doing a “hard” reset had the same hash as the original file.
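For anyone following along, this is roughly how I checked that the option actually took effect; the resource name and mount point are placeholders, not the exact ones from the guide:

    pcs resource config <filesystem_resource>
    findmnt -o TARGET,OPTIONS <nfs_mount_point>

On the active node, findmnt should list sync among the mount options.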

The performance hit is low, as the test nodes are on a system with NVMe drives (I will run some tests and get the numbers to see exactly what the real hit on performance is). I will also test it on a SATA/SAS system and get back with real-life stats.

Hi @Argadonis, I’m curious, were you able to test it on SATA/SAS and did you collect any stats?

@Devin Do I understand correctly that on a system with enterprise disks that support PLP (power-loss protection) this would not have happened, and the -o sync option is not necessary?

As I understand it, PLP is just a capacitor-backed cache on the disk itself. While that would certainly help in the event of a power outage, I don’t think it would do anything to protect writes still sitting in system memory that have not yet been flushed down to the disk.

I suspect you would still want the -o sync option set.