I developed a Bash script to automate the installation and configuration of open-source software (i.e., launchpad.net/linuxha). I want to make sure the syntax of this script is perfect so I can use it as a teaching tool to educate people about Linux.
I need to know if there is anything misconfigured with my DRBD syntax.
I get a 503 when trying to access the code via Launchpad (https://bazaar.launchpad.net/~tbean74/linuxha/trunk/files). Regardless, if you're trying to develop this to be as portable as possible and to work well in different environments on different hardware, I would advise using a default DRBD configuration. After all, we chose the default values as defaults for a reason.
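To make that concrete, a resource definition that leaves everything else at the defaults is usually all a portable installer needs to write. This is only a sketch; the resource name, hostnames, devices, and addresses below are placeholders, and the hostnames must match what uname -n reports on each node:

    resource r0 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        meta-disk internal;
        on node-a { address 10.0.0.1:7789; }
        on node-b { address 10.0.0.2:7789; }
    }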
Please try again. The Launchpad server may be overloaded because of how many people are trying to connect, or they may be performing server maintenance.
Bazaar link worked today. Took a very brief look at the script. Only have a few comments:
Dual Primary as you have it configured is no longer supported in DRBD v9. Just a word of warning, as we’re slowly trying to phase out DRBD v8.4.
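To pin down what I mean by the old syntax (I only skimmed the script, so I'm guessing at the exact form it writes), the 8.4-style stanza that promotes both nodes at startup looks roughly like this; the become-primary-on option is gone in v9, where promotion is left to the cluster manager instead:

    resource r0 {
        startup {
            # DRBD 8.4 only; removed in DRBD 9
            become-primary-on both;
        }
    }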
The use of after-sb-1pri discard-secondary; is effectively automating potential data loss. Just because a node is primary after a split doesn’t necessarily mean it has the latest or most valuable data. Granted, with proper STONITH configured, split-brains should really be a non-issue. However, I would never advise this option with production data. The only policy I could really suggest is consensus.
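In the net section of the resource, that would look something like this (the other after-sb-* policies default to disconnect, i.e. manual resolution, which is where I'd leave them):

    net {
        after-sb-1pri consensus;
    }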
While you do have STONITH configured, you’re using the stonith:meatware STONITH agent. I mean, this is perfectly valid for educational or testing purposes, but I couldn’t suggest ever using it otherwise.
My only other thought is that perhaps this would be better suited to an Ansible playbook. Many years ago I helped develop and maintain a monolithic Bash script to automatically deploy an HA cluster similar to what you’ve written here. It was developed for one particular client, with one particular use case, on one particular set of hardware, and it worked fine with only minor issues over the course of some years. But when it came time to overhaul it for DRBD v9 and Pacemaker v2, I pushed hard to move away from a monolithic Bash script and instead use Ansible. We did, and it has proven much easier to maintain. Here is an example NFS cluster a colleague of mine wrote: https://github.com/kermat/linbit-ansible-nfs. It uses the customer-only LINBIT repositories, but it should still give you a general idea.
I updated my LinuxHA script using the Dual Primary configuration found in the latest DRBD v9 user guide: https://linbit.com/drbd-user-guide/drbd-guide-9_0-en/#s-enable-dual-primary. If there is anything I overlooked (e.g., outdated DRBD v8.4 syntax), please let me know with specific references so I can compare against the latest DRBD v9 user guide and update my LinuxHA script accordingly.
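For reference, the net section my script now writes is essentially the guide’s example and looks something like this (the rest of the resource definition is unchanged):

    net {
        protocol C;
        allow-two-primaries yes;
    }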
Thank you. I changed “after-sb-1pri discard-secondary;” to “after-sb-1pri consensus;”.
Thank you for your valuable feedback. I updated my Pacemaker comment to “IMPORTANT: The meatware plugin is only a temporary placeholder. You need to run the ‘stonith -L’ command to list all available plugins and replace meatware with a plugin that matches your hardware.”
Please note the warning in that section of the guide, which reads: “In DRBD 9.0.x Dual-Primary mode is limited to exactly two Primaries for the use in live migration.” We leverage dual-primary in DRBD v9 to allow live migrations of VMs on some platforms. How you’re using it, for a clustered filesystem, is untested and unsupported. It should work, but it may also scramble your data after some time. This is something we don’t test or want to support.
What is the best way to set up Dual Primary in DRBD v9?
You just don’t.
In most clusters dual-primary is completely unneeded and usually slower than a single primary. I can understand how one might think that leveraging two systems at the same time would be faster, but in 90% or more of cases the additional locking overhead a cluster-aware filesystem needs, which must traverse the network, is actually going to make it slower than a single system with a normal, non-cluster-aware filesystem.
Add to that all the additional risk of split-brain and scrambled data, and we decided to drop it in v9 and instead focus on expanding beyond two replicas. In the end it was dangerous, often abused, and the benefits just weren’t there, so we didn’t bother with it.
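If you want a feel for where v9 is actually aimed, a three-way, single-primary resource looks roughly like this; the names, devices, and addresses are placeholders, and once you go past two nodes each host needs a node-id:

    resource r0 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        meta-disk internal;
        on node-a { node-id 0; address 10.0.0.1:7789; }
        on node-b { node-id 1; address 10.0.0.2:7789; }
        on node-c { node-id 2; address 10.0.0.3:7789; }
        connection-mesh { hosts node-a node-b node-c; }
    }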