High-availability cluster
High-availability clusters (also known as HA clusters) are computer clusters implemented primarily to improve the availability of the services the cluster provides. They operate by having redundant computers, or nodes, that take over providing the service when system components fail.
HA clusters are often used for key databases, file sharing on a network, business applications, and customer services such as electronic commerce websites.
HA cluster implementations attempt to build redundancy into a cluster to eliminate single points of failure, including multiple network connections and data storage that is reachable over multiple paths via storage area networks.
HA clusters usually use a private heartbeat network connection to monitor the health and status of each node in the cluster.
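The heartbeat idea can be illustrated with a minimal Python sketch. It is not taken from any particular HA product: the class name `HeartbeatMonitor`, its methods, and the timeout value are all hypothetical, and a real cluster would receive heartbeats over the private network rather than through direct method calls.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each node (hypothetical helper)."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = {}         # node name -> time of last heartbeat

    def record_heartbeat(self, node):
        """Call this whenever a heartbeat message arrives from a node."""
        self.last_seen[node] = self.clock()

    def failed_nodes(self):
        """Return the nodes whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_seconds=3.0)
    monitor.record_heartbeat("node-a")
    monitor.record_heartbeat("node-b")
    print(monitor.failed_nodes())   # both nodes were just seen, so none are reported
```

A node that stops sending heartbeats is only presumed failed. The sketch does not distinguish a crashed node from a broken heartbeat link, which is one reason real clusters use redundant network connections.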
Node configurations
The most common size for an HA cluster is two nodes, since that is the minimum required to provide redundancy, but many clusters consist of many more nodes, sometimes dozens. Such configurations can sometimes be categorized into one of the following models:
- Active/Active: traffic intended for the failed node is either passed on to an existing node or load balanced across the remaining nodes. This is usually only possible when the nodes use a homogeneous software configuration.
- Active/Passive: provides a fully redundant instance of each node, which is only brought online when its associated primary node fails. This configuration typically requires the most extra hardware.
- N+1: provides a single extra node that is brought online to take over the role of the node that has failed. In the case of a heterogeneous software configuration on each primary node, the extra node must be universally capable of assuming any of the roles of the primary nodes it is responsible for. This normally refers to clusters which have multiple services running simultaneously; in the single service case, this degenerates to Active/Passive.
- N+M: in cases where a single cluster is managing many services, having only one dedicated failover node may not offer sufficient redundancy. In such cases, more than one (M) standby server is included and available. The number of standby servers is a tradeoff between cost and reliability requirements.
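The N+1 and N+M models can be sketched in a few lines of Python. This is an illustrative model only: the `Cluster` class, the service and node names, and the assumption that each primary node hosts exactly one service and that any standby can run any service are all hypothetical simplifications.

```python
class Cluster:
    """Toy N+M cluster: N primaries each run one service, M standbys are idle."""

    def __init__(self, primaries, standbys):
        self.placement = dict(primaries)      # service name -> hosting node
        self.idle_standbys = list(standbys)   # standby nodes not yet in use

    def fail_node(self, node):
        """Move services off a failed node; return a list of (service, new_node)."""
        # A failed standby simply leaves the pool of available spares.
        if node in self.idle_standbys:
            self.idle_standbys.remove(node)
        moved = []
        for service, host in sorted(self.placement.items()):
            if host != node:
                continue
            if not self.idle_standbys:
                raise RuntimeError(f"no standby left to take over {service!r}")
            replacement = self.idle_standbys.pop(0)
            self.placement[service] = replacement
            moved.append((service, replacement))
        return moved


if __name__ == "__main__":
    # N = 3 primaries, M = 2 standbys (all names are made up).
    cluster = Cluster({"db": "n1", "web": "n2", "mail": "n3"}, ["s1", "s2"])
    print(cluster.fail_node("n2"))
    print(cluster.fail_node("n1"))
```

With M = 1 this behaves like N+1: the first failure is absorbed and a second one raises an error, which is the redundancy limit that motivates adding more standby nodes.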
Application design requirements
Not every application can run in a high-availability cluster environment, and the necessary design decisions need to be made early in the software design phase. To run in a high-availability cluster environment, an application must satisfy at least the following technical requirements:
- There must be a relatively easy way to start, stop, force-stop, and check the status of the application. In practical terms, this means the application must have a command line interface or scripts to control the application, including support for multiple instances of the application.
- The application must be able to use shared storage (NAS/SAN).
- Most importantly, the application must store as much of its state on non-volatile shared storage as possible. Equally important is the ability to restart on another node at the last state before failure using the saved state from the shared storage.
- The application must not corrupt data if it crashes or restarts from the saved state.
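The last two requirements, saving state to shared storage and not corrupting it on a crash, are commonly approached by writing a checkpoint to a temporary file and then renaming it into place. The Python sketch below is one way to do this under stated assumptions: the function names and the JSON format are hypothetical, and the rename is assumed to be atomic on the shared file system, which should be verified for the NAS or SAN file system actually in use.

```python
import json
import os
import tempfile


def save_state(state, path):
    """Write state so that readers see either the old or the new file, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".state-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())    # push the data to storage before renaming
        os.replace(tmp_path, path)    # swap the new checkpoint into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_state(path, default=None):
    """Return the last saved state, or default if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


if __name__ == "__main__":
    checkpoint = os.path.join(tempfile.mkdtemp(), "state.json")
    save_state({"last_processed_order": 1041}, checkpoint)
    # After a failover, the application on the standby node would resume from here.
    print(load_state(checkpoint))
```

If the process crashes before the rename, the previous checkpoint remains intact, so a restart on another node resumes from the last complete state rather than from a half-written file.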
HA cluster products
There are many commercial and free software implementations of high-availability clusters for many operating systems.
- OpenVMS: the original clustering OS
- Linux-HA: a commonly used free software HA package for the Linux OS
- Novell Cluster Services
- Veritas Cluster Server
- Sun Cluster
- High Availability Cluster Multiprocessing (IBM HACMP) for AIX
- OpenClovis ASP: an open source solution
- Microsoft Cluster Server (MSCS)
- GoAhead SelfReliant for Linux, Windows, VxWorks and Solaris
- SteelEye LifeKeeper for Linux and Windows
- HP ServiceGuard for HP-UX and Linux
References
- Greg Pfister: In Search of Clusters, Prentice Hall, ISBN 0138997098
- Evan Marcus, Hal Stern: Blueprints for High Availability: Designing Resilient Distributed Systems, John Wiley & Sons, ISBN 0471356018
From Wikipedia, the Free Encyclopedia. All text is available under the terms of the GNU Free Documentation License.
