Server and Services Fault Tolerance

In addition to providing fault tolerance for individual hardware components, some organizations go the extra mile to include the entire server in the fault-tolerant design. Such a design keeps servers and the services they provide up and running. When it comes to server fault tolerance, two key strategies are commonly employed: stand-by servers and server clustering.

Stand-by Servers

Stand-by servers are a fault-tolerant measure in which a second server is configured identically to the first one. The second server can be located remotely or locally and set up in a failover configuration. In a failover configuration, the secondary server is connected to the primary and ready to take over the server functions at a heartbeat's notice. If the secondary server detects that the primary has failed, it automatically takes over. Ideally, network users do not notice the transition, because there is little or no disruption in data availability.

The primary server communicates with the secondary server by sending special notification messages referred to as heartbeats. If the secondary server stops receiving the heartbeat messages, it assumes that the primary has failed and takes over the primary server's role.
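The heartbeat logic described above can be sketched in a few lines of Python. This is only an illustration, not any vendor's failover implementation: the class name StandbyMonitor, the timeout_seconds threshold, and the idea of passing the current time in as a number are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class StandbyMonitor:
    """Runs on the secondary server and watches for heartbeats (hypothetical)."""

    timeout_seconds: float       # assumed threshold; the text gives no value
    last_heartbeat: float = 0.0  # time at which the last heartbeat arrived
    active: bool = False         # True once the secondary has taken over

    def record_heartbeat(self, now: float) -> None:
        # Called each time a heartbeat message arrives from the primary.
        self.last_heartbeat = now

    def check(self, now: float) -> bool:
        # If no heartbeat arrived within the timeout window, assume the
        # primary has failed and take over its role.
        if not self.active and now - self.last_heartbeat > self.timeout_seconds:
            self.active = True
        return self.active
```

In use, the secondary would call record_heartbeat whenever a message arrives and call check on a timer. Once check returns True, the secondary stays active even if heartbeats resume, so handing the role back to a repaired primary would be a separate, deliberate step.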

Server Clustering

Companies that want maximum data availability and have the funds to pay for it can choose to use server clustering. As the name suggests, server clustering involves grouping servers together for the purposes of fault tolerance and load balancing. In this configuration, the other servers in the cluster can compensate for the failure of a single server. The failed server should have little or no impact on the network, and end users will typically have no idea that a server has failed.
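The way a cluster combines load balancing with fault tolerance can be illustrated with a small Python sketch. It assumes a simple round-robin policy, which is only one of several ways requests can be distributed. The Cluster class and its method names are hypothetical and are not taken from any clustering product.

```python
class Cluster:
    """Round-robin request routing that skips failed servers (hypothetical)."""

    def __init__(self, names):
        if not names:
            raise ValueError("a cluster needs at least one server")
        self.names = list(names)        # fixed rotation order
        self.healthy = set(self.names)  # servers currently able to serve
        self._next = 0                  # index of the next server to try

    def mark_failed(self, name):
        # Remove a server from rotation, for example after missed heartbeats.
        self.healthy.discard(name)

    def mark_recovered(self, name):
        # Return a repaired server to rotation.
        if name in self.names:
            self.healthy.add(name)

    def route(self):
        # Try each server at most once, starting from the rotation pointer.
        for _ in range(len(self.names)):
            name = self.names[self._next]
            self._next = (self._next + 1) % len(self.names)
            if name in self.healthy:
                return name
        raise RuntimeError("no healthy servers in the cluster")
```

When every server is healthy, route spreads requests evenly across the cluster. When one is marked as failed, its share of requests goes to the remaining servers, which is why end users need not notice the failure as long as the survivors have enough spare capacity.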

The clear advantage of server clusters is that they offer the highest level of fault tolerance and data availability. The disadvantage is equally clear: cost. Buying a single server can be a huge investment for many organizations; having to buy duplicate servers can be prohibitively expensive.