One of the most frequently touted benefits of cloud hosting is its distributed network of servers, which eliminates any single point of failure (SPOF). What are SPOFs, what do they look like in a datacenter, and how can you avoid them on your team and in your technology? Working through these questions shows how to avoid SPOFs among your personnel and why businesses value the anti-SPOF distribution of cloud technology.
- SPOF – what is it?
- SPOFs in the wild
- Check-list to SPOF-proof your team
- Step #1. Figure out who your SPOF people are.
- Step #2. Think about how to rectify your SPOFs.
- Step #3. Create redundancies to mitigate the SPOFs.
- Step #4. Allow your development plan to serve as guidance.
- Cloud hosting and single points of failure
SPOF – what is it?
You might have heard the term single point of failure (SPOF) in passing without knowing exactly what it means. Because it concerns failure, it is a key topic in networking, and avoiding SPOFs is a top operating priority for every datacenter manager. Specifically, a SPOF is a vulnerability, caused by a flaw in the way a system or circuit is designed, set up, or deployed, that allows one fault to bring down the entire system.
SPOFs in the wild
If a SPOF exists in a datacenter, data or services can become unavailable because of a single, seemingly isolated malfunction. In fact, as Stephen J. Bigelow explains in TechTarget, the entire datacenter can go down if the failed component's interdependencies and location are critical enough. “Consider a data center where a single server runs a single application,” he says. “The underlying server hardware would present a single point of failure for the application’s availability.”
Think of a PC that isn’t backed up in any way. If the computer dies or gets hacked, that SPOF means you’ve lost all your files. In the same way, if that lone server goes down, the application either becomes unreliable or goes down with it, and users can no longer reach it. Data could be lost as well, which is both frustrating and expensive. A basic remedy in the datacenter is to cluster servers so that more than one copy of the application is running, which requires at least one additional server.
If the original machine goes down, the additional one takes over so that users can keep using the application. This simple anti-SPOF technique (and far more complex ones exist) lets you hide a failure behind the scenes: users are moved to the other server seamlessly, unaware of any issue. This kind of invisible failover is standard practice in cloud hosting environments.
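The failover idea can be sketched in a few lines of Python. This is a minimal, illustrative model, not how any particular clustering product works: the `Server` and `FailoverCluster` classes are hypothetical, and the `healthy` flag stands in for a real health check. Requests go to the primary first and fall through to a standby only when the primary fails.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical application server; `healthy` stands in for a real health check."""
    name: str
    healthy: bool = True

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class FailoverCluster:
    """Active/passive cluster: try the primary first, then each standby in order."""

    def __init__(self, servers: list[Server]) -> None:
        if not servers:
            raise ValueError("a cluster needs at least one server")
        self.servers = servers

    def handle(self, request: str) -> str:
        errors = []
        for server in self.servers:
            try:
                return server.handle(request)
            except ConnectionError as exc:
                errors.append(str(exc))  # note the failure and move on to the next server
        raise RuntimeError("all servers failed: " + "; ".join(errors))


primary = Server("app-1")
standby = Server("app-2")
cluster = FailoverCluster([primary, standby])
print(cluster.handle("GET /"))  # app-1 served GET /
primary.healthy = False         # simulate the original machine going down
print(cluster.handle("GET /"))  # app-2 served GET /
```

The caller never sees the primary's failure, which is the "hidden failure" the paragraph describes. With only one server in the list, the same code simply raises an error, and that single server is the SPOF.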
Looking at single points of failure from a different angle shows how broad the challenge is. Bigelow gives the example of one network switch that provides networking for an array of servers. That switch is a SPOF. “If the switch failed (or simply disconnected from its power source), all of the servers connected to that switch would become inaccessible from the remainder of the network,” says Bigelow. “For a large switch, this could render dozens of servers and their workloads inaccessible.” By building in redundancy, in the form of additional network connections and switches, you give your machines an alternate path if a malfunction occurs. That, again, is a basic anti-SPOF method.
A datacenter engineer is tasked with finding and fixing SPOFs at every level of the system. Keep in mind, though, that the head of infrastructure cannot build in the necessary flexibility and redundancy without a reasonable budget. In the examples above, you have to pay for the extra servers, switches, cables, and network connections. Anyone who architects a datacenter, or any system, should weigh how mission-critical a workload is against the cost of removing every possible SPOF. Not every system is mission-critical, and in some situations an architect may reasonably decide to accept a SPOF and save the money.
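A rough way to weigh that cost is to estimate how much availability each extra copy buys. The sketch below assumes that every copy fails independently with the same availability `a`, and that failover is instantaneous. Real failures are often correlated (shared power, shared switch), so treat the numbers as an optimistic upper bound. Under these assumptions, n redundant copies are all down at once with probability (1 - a)^n.

```python
HOURS_PER_YEAR = 24 * 365


def parallel_availability(a: float, n: int) -> float:
    """Availability of n redundant copies when any one copy is enough.

    Assumes independent failures and instant failover (both idealizations).
    """
    return 1 - (1 - a) ** n


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours per year during which no copy is available."""
    return (1 - availability) * HOURS_PER_YEAR


for copies in (1, 2, 3):
    avail = parallel_availability(0.99, copies)
    print(f"{copies} server(s): availability {avail:.6f}, "
          f"about {downtime_hours_per_year(avail):.2f} hours down per year")
```

With a = 0.99, a single server is expected to be down about 87.6 hours a year. A second server raises availability to 1 - 0.01² = 0.9999, or under an hour of expected downtime. Whether that gain justifies the extra hardware depends on how mission-critical the workload is.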
The other option is to go with cloud hosting to get rid of single points of failure through a broad distribution of servers. Before we get into cloud, though, let’s look at SPOF-proofing your staff.
Check-list to SPOF-proof your team
You know to remove single points of failure from your systems, but you may not think to do it with your people as well. That’s important too, advises Tomas Kucera of The Geeky Leader. “One of the most often overlooked tasks of any leader is to plan his succession and to ensure he has a plan how to ensure his team works even if he loses a key contributor,” he says. “We are all so submerged in the daily tasks that we often don’t realize that we fail to make the team resilient and disaster-proof.”
Here are a few tactics you can use to remove single points of failure within your staff:
Step #1. Figure out who your SPOF people are.
Which people within your company are mission-critical? It may seem obvious to point to C-level executives or other leaders, but keep in mind that directors are sometimes easier to replace than other staff. Review your people by asking a few tough questions:
- Does the individual “hold a unique knowledge”? That is the first question Kucera suggests asking yourself. This knowledge could be “institutional knowledge, technical or just knowing lots of people that are key to your team survival and no one else knows them or has that knowledge,” he adds.
- Do they have capabilities that are difficult to replace? That could be a top salesperson, someone who’s a big source of mentorship, or a business negotiator who keeps down your costs and gets you what you need to excel.
- Is the individual filling a specialized role that is essential to your team running smoothly? This person might act as a leader in some ways even though their official role is not an executive one. It could also be someone who is pleasant or funny and keeps morale up.
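The first question, unique knowledge, lends itself to a simple inventory. This is a minimal sketch assuming you keep a skills list for each person. The names and skills below are hypothetical, and the other two questions (hard-to-replace capabilities, informal roles such as mentoring or morale) still call for judgment rather than a script. The sketch flags every skill that only one person holds.

```python
from collections import defaultdict


def sole_knowledge_holders(skills_by_person: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each person to the skills nobody else on the team has (their SPOF areas)."""
    holders: dict[str, set[str]] = defaultdict(set)
    for person, skills in skills_by_person.items():
        for skill in skills:
            holders[skill].add(person)

    spofs: dict[str, set[str]] = defaultdict(set)
    for skill, people in holders.items():
        if len(people) == 1:
            (only_person,) = people  # exactly one holder: a single point of failure
            spofs[only_person].add(skill)
    return dict(spofs)


# Hypothetical team inventory.
team = {
    "Ana": {"billing system", "vendor contracts", "Linux"},
    "Ben": {"Linux", "networking"},
    "Chen": {"networking", "Linux"},
}
print(sole_knowledge_holders(team))
```

Here Ana is flagged for the billing system and vendor contracts. Linux and networking each have at least two holders, so they are not flagged.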
Step #2. Think about how to rectify your SPOFs.
Just like with a single point of failure within an IT system, you must have a mitigation process to remove single points of failure from your staff. However, when it comes to people, your solutions won’t be cookie-cutter.
Consider how the flow and/or culture of your workplace might change if each SPOF person were to quit or otherwise stop showing up to work. What types of insights, capabilities, or roles would need to be filled by another party? How would the business be harmed, on any level (internally and externally)? Think about today and about next year.
As you consider these key people, also look at your entire workforce. Do any colleagues share some of the rare qualities of each SPOF person? If not, you need to create redundancy, which may mean hiring.
Step #3. Create redundancies to mitigate the SPOFs.
Did you find a colleague who could reasonably serve as a backup? Make sure that person is trained as a potential replacement. As an anti-SPOF measure, create a development plan for sharing knowledge with them, and monitor it closely.
Step #4. Allow your development plan to serve as guidance.
Make this development plan central to your team’s overall development. Are you about to give someone new responsibilities? Are you rethinking your organizational structure? If so, review the SPOF development plan first, and use every adjustment as a chance to eliminate SPOFs. In general, avoid assigning everything to the same few top performers. When you rely heavily on a few people, they become single points of failure, which makes the organization less flexible and more vulnerable.
Going forward, revisit your list of SPOF people every few months.
Cloud hosting and single points of failure
A strong cloud hosting infrastructure is built from the ground up to avoid single points of failure, so they no longer need to be part of your company’s technology foundation. At Total Server Solutions, our cloud combines the fastest hardware with a far-reaching network, making everything easier and SPOF-free. We do it right.