In any business, time is money. This is especially true on the IT side of a company. Network failures and outages, called network downtime, can cost companies thousands of dollars in lost revenue, lost productivity and recovery costs. On top of these costs, downtime can be frustrating for your business and its employees, particularly for the IT department.
So what exactly is network downtime and how can it be fixed? In this article, we’ll explain what network downtime is, why it happens and how to prevent network downtime in your business.
What Is Network Downtime?
Downtime refers to periods when a system cannot complete its primary function. Depending on the situation, this system may be temporarily unavailable, offline or completely unable to operate. Downtime may apply to a single application, computer, server or entire network. If a critical component of the network goes down, this can result in network downtime.
Depending on the nature of a company, network downtime can look very different. Network downtime within a retail business may result in point-of-sale (POS) terminals not working or phones going down, leaving the business unable to make sales. For a service provider, this may look like an inoperable portal, cutting off service to its customers. Regardless of what it looks like, network downtime is a significant loss of service that disrupts the company's operations.
What Is Unplanned and Planned Downtime?
Not all downtime is the same. Downtime for a network is split into two types — planned and unplanned downtime. So what is planned downtime versus unplanned downtime?
Planned downtime is a period where the IT department intentionally takes down the network to complete scheduled maintenance and upgrades. While the network is not usable at this time, planned downtime is essential to ensure that the network functions optimally in the long term.
Unplanned downtime is another story. This is an unexpected network outage that can occur at any time due to unforeseen system failures. Unplanned downtime can occur as a result of many different failures, including hardware and software malfunctions, operator mistakes or even cyberattacks. It is the more costly of the two types, because it can strike at any time, including during business hours.
Reasons for Planned Downtime
System owners and IT staff set up a planned outage ahead of time. These outages are typically scheduled during off-hours to minimize service interruptions and lost sales. Planned downtime can facilitate many IT maintenance tasks, including the following:
- System diagnostics: IT staff can run diagnostic tests during this time to identify and isolate potential problems.
- Hardware replacements: IT can take down applicable systems during network downtime to replace outdated or malfunctioning hardware.
- Network repairs: Staff may use a planned downtime window to repair hardware, restart certain systems or apply software patches and maintenance.
- Configuration updates: Planned downtime may be used to change the network configuration to make updates or fix errors and omissions.
- Application updates: Especially in the case of essential applications, network downtime can be used to switch out, update or reconfigure network applications.
- Expected natural events: In some cases, a network may be taken down in anticipation of a natural event, such as an oncoming storm or power outage.
Planned downtime can sometimes be avoided or mitigated by implementing a rolling upgrade schedule, where the IT team takes down portions of the system for upgrades and maintenance without shutting down the entire network. When planned downtime is absolutely necessary, however, it is essential to communicate the downtime and schedule it carefully to avoid busy periods.
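A rolling upgrade comes down to a simple batching rule: take down only as many devices at a time as you can afford to lose while the rest keep serving traffic. The sketch below assumes interchangeable, redundant nodes and a minimum number that must stay online. The function name, the device names and the `min_online` parameter are hypothetical and used only for illustration.

```python
from typing import Iterator, List


def rolling_batches(nodes: List[str], min_online: int) -> Iterator[List[str]]:
    """Yield groups of nodes to take offline one group at a time.

    Each group is small enough that at least `min_online` nodes stay in
    service while that group is being upgraded.
    """
    if not 0 < min_online < len(nodes):
        raise ValueError("min_online must be between 1 and len(nodes) - 1")
    batch_size = len(nodes) - min_online
    for start in range(0, len(nodes), batch_size):
        yield nodes[start:start + batch_size]


if __name__ == "__main__":
    # Hypothetical device names for illustration only.
    switches = ["core-1", "core-2", "access-1", "access-2", "access-3"]
    for batch in rolling_batches(switches, min_online=3):
        # In practice: drain, patch and restart each device in the batch,
        # then confirm it is healthy before moving on to the next batch.
        print("upgrading", batch)
```

Before the next batch starts, each batch should be verified as healthy, so a bad update affects only part of the network.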
Reasons for Unplanned Downtime
Of the two types of downtime, unplanned downtime is the more harmful to a business. Essentially, it is any network downtime that is not expected, and there are many reasons why a network may fail unexpectedly. Some of the most common causes of network downtime are explained below:
- Human error: Wherever people configure, operate or maintain a system, errors can happen, and the more people involved, the more likely a mistake becomes. These mistakes can be as simple as accidentally unplugging essential hardware, following outdated procedures or taking an ill-advised technical shortcut. Human error is the most common cause of unplanned network downtime. In one survey, 97% of IT personnel stated that human error is the cause of, or a contributing factor in, at least some network outages.
- Understaffed IT departments: A well-staffed IT department is essential for keeping networks, servers and hardware running smoothly. Unfortunately, not all companies allocate enough budget and personnel to staff their IT departments adequately. In a short-staffed department, employees are spread thin maintaining and supporting daily operations and may not have the time or resources to monitor the network or perform sufficient maintenance. As a result, the network is at an increased risk of unplanned downtime.
- Outdated equipment and software: The older the components of a network are, the more likely they are to fail and trigger a system outage. With continuous updates and technological advancements, hardware and software systems become outdated within a few years, resulting in reduced performance and system crashes. Because of this threat to network functionality, it is essential to take regular inventory of IT components and proactively plan necessary upgrades.
- Hardware failures: Advances in engineering have significantly extended the functional life of hardware, but network devices will eventually break down. Outdated hardware, as noted above, is especially vulnerable to failure, but problems can occur even in newer equipment. Built-in redundancies can help mitigate the effects of hardware failure, but they aren't always affordable for smaller businesses, which can leave a single point of failure that takes down the network.
- Server bugs: Server bugs and vulnerabilities also pose a significant threat to performance. Any IT professional knows that server operating systems must be kept up to date, but updates need to be handled carefully. If a patch isn't applied quickly, the system stays exposed to the bugs and security holes the patch was designed to fix. On the other hand, if a patch is applied without being tested, it can corrupt applications to the point of failure. The best approach is to test patches promptly and thoroughly as soon as they become available, then apply them once testing is complete.
- Incorrect configurations: Incorrect device configurations are another significant cause of network downtime, because a configuration change made incorrectly can trigger an outage. A study conducted at the University of Michigan found that 36% of router problems resulting in downtime were a direct result of configuration errors.
- Incompatible changes: Unlike configuration errors, incompatible changes occur when an intended change does not work with the systems and equipment already in place. One survey found that 44% of IT professionals agreed that incompatible network changes resulted in downtime or performance problems several times a year.
- Power outages: Power failures happen unexpectedly and affect every system within a network. These unexpected outages can be mitigated by uninterruptible power supply (UPS) and generator systems, but it is essential to test these power backup systems regularly and maintain them to ensure functionality.
- Natural disasters: Natural disasters account for a small share of network downtime, but they can be devastating for the business networks they affect. Unexpected natural disasters such as storms, earthquakes and tornadoes can take down power and communications services and even destroy hardware.
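The advice to inventory IT components and plan upgrades ahead of time can be partly automated. The minimal sketch below flags assets that have reached an assumed service life. The `Asset` fields and the lifetimes in `SERVICE_LIFE_YEARS` are illustrative assumptions, not vendor figures. In practice you would substitute each manufacturer's end-of-life or end-of-support dates.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Asset:
    name: str
    kind: str
    installed: date


# ASSUMPTION: illustrative service lives in years; replace with vendor EOL data.
SERVICE_LIFE_YEARS = {"switch": 5, "router": 5, "server": 4, "ups_battery": 3}


def due_for_replacement(assets: List[Asset], today: date) -> List[Asset]:
    """Return assets whose age has reached their assumed service life."""
    due = []
    for asset in assets:
        life = SERVICE_LIFE_YEARS.get(asset.kind)
        if life is None:
            continue  # unknown asset type: leave it for manual review
        age_years = (today - asset.installed).days / 365.25
        if age_years >= life:
            due.append(asset)
    return due
```

Running a check like this on a schedule turns the inventory into a replacement plan, before an aging device becomes an unplanned outage.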
While some causes of network downtime cannot be avoided, many of them can be minimized with a fully staffed IT department, regular maintenance protocols and the use of network monitoring software to catch problems before they take down the network.
The Cost of Network Downtime
When systems go down, the losses can be massive. According to Gartner, companies lose an average of $5,600 per minute of network downtime, or about $336,000 per hour. While companies can schedule planned network downtime to minimize these costs, unplanned downtime can result in significant unexpected costs, which can be especially painful for smaller businesses. But where do these costs come from?
The costs of downtime come from four primary sources, explained in detail below:
- Lost revenue: The primary cost of network downtime is the loss of revenue due to being unable to provide critical services to customers. For example, if your customer service team cannot access an essential system, such as a POS terminal, you may lose current or potential customers and their sales.
- Lost productivity: Outages of essential work systems may leave employees entirely unable to work. As a result, employees are paid for time they cannot spend working, while the IT team may be working overtime to perform maintenance or fix the source of the downtime.
- Recovery costs: There are several IT costs incurred while fixing the source of the downtime. These include the overtime, repair and replacement costs needed to remedy the issue. Also, network failures can result in a breach of a service level agreement (SLA), which may result in the company losing certification or incurring penalty fees. On top of this, data losses and damage to customers can result in legal costs.
- Intangible costs: Finally, several costs are hard to quantify but still contribute to the total losses incurred by network downtime. These include increased inefficiencies, lost customer and employee confidence and even reduced business competitiveness.
Most companies quantify downtime by calculating productivity and revenue losses, but recovery and intangible costs are important to consider as well, as they can result in increased long-term costs following a period of downtime.
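To make the first three cost sources concrete, here is a rough back-of-the-envelope model. It is a sketch under simplifying assumptions: revenue is lost at a constant rate per minute, idle employees share a single average hourly labor cost, and recovery costs are one lump sum. The function and parameter names are hypothetical. Intangible costs are deliberately left out, because they cannot be expressed as a simple formula.

```python
from typing import Dict

# Gartner's reported average, used here only to check the per-hour figure.
GARTNER_AVG_PER_MINUTE = 5_600


def estimate_downtime_cost(
    minutes_down: float,
    revenue_per_minute: float,
    idle_employees: int,
    hourly_labor_cost: float,
    recovery_costs: float = 0.0,
) -> Dict[str, float]:
    """Rough downtime cost split into revenue, productivity and recovery."""
    lost_revenue = minutes_down * revenue_per_minute
    # Employees are still paid while the systems they rely on are down.
    lost_productivity = idle_employees * hourly_labor_cost * (minutes_down / 60)
    total = lost_revenue + lost_productivity + recovery_costs
    return {
        "lost_revenue": lost_revenue,
        "lost_productivity": lost_productivity,
        "recovery_costs": recovery_costs,
        "total": total,
    }


if __name__ == "__main__":
    print(GARTNER_AVG_PER_MINUTE * 60)  # 336000 per hour
    print(estimate_downtime_cost(60, 100, 10, 30, recovery_costs=500))
```

Plug in your own revenue and staffing figures. The result is best treated as a lower bound, because intangible costs come on top of it.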
How to Communicate Network Downtime
Regardless of why downtime occurs, it's essential to communicate with all affected staff when it happens. Joint Commission International, which accredits healthcare organizations, recommends clear, timely and accurate communication about downtime progress in any downtime situation, planned or unplanned. This is good advice for any industry. Quick communication reduces staff stress and cuts down on inquiries about the downtime event, which minimizes distractions for the IT department.
In a planned downtime event, early communication to all affected employees will help them prepare appropriately. In these communications, include the following information:
- All systems and applications expected to be down
- Which departments and service areas will be affected
- The start time and expected duration of the downtime
- The reason for the downtime
- Any changes expected after the downtime is complete, such as system enhancements
In an unplanned downtime event, communicate immediately following the discovery of the event. Using whatever communication channels are available, convey the following information to all affected staff members:
- All systems and applications affected by the downtime
- IT’s awareness of and work toward resolving the downtime
- Any expected effects on external customers
- The reason for the downtime, if known
- The expected duration of the downtime
In addition to the initial communication, it may also be wise to communicate when the downtime event is over. Whether a planned or unplanned downtime event, communicate the resolution immediately to all affected parties and direct them to contact the IT team if they are still encountering issues.
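The checklists above can be enforced with a simple template, so no required item is forgotten in the rush of an outage. The minimal sketch below uses a hypothetical `DowntimeNotice` structure whose fields mirror the planned and unplanned checklists. It only renders text. Sending the notice through email, chat or a status page is left to whatever channels your company uses.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class DowntimeNotice:
    planned: bool
    systems: List[str]
    reason: Optional[str] = None
    start: Optional[datetime] = None
    expected_duration: Optional[timedelta] = None
    departments: List[str] = field(default_factory=list)
    customer_impact: Optional[str] = None  # mainly for unplanned events
    changes_after: Optional[str] = None  # mainly for planned events

    def render(self) -> str:
        kind = "Planned" if self.planned else "Unplanned"
        lines = [f"{kind} network downtime notice"]
        lines.append("Affected systems: " + ", ".join(self.systems))
        if self.departments:
            lines.append("Affected departments: " + ", ".join(self.departments))
        if not self.planned:
            lines.append("IT is aware of the outage and is working to resolve it.")
        if self.customer_impact:
            lines.append("Customer impact: " + self.customer_impact)
        if self.start is not None:
            lines.append("Start: " + self.start.strftime("%Y-%m-%d %H:%M"))
        if self.expected_duration is not None:
            hours = self.expected_duration.total_seconds() / 3600
            lines.append(f"Expected duration: {hours:g} hours")
        else:
            lines.append("Expected duration: not yet known; updates to follow")
        lines.append("Reason: " + (self.reason or "under investigation"))
        if self.changes_after:
            lines.append("After the downtime: " + self.changes_after)
        return "\n".join(lines)
```

For example, `DowntimeNotice(planned=True, systems=["POS", "VoIP"], reason="Firmware upgrade", start=datetime(2024, 5, 4, 22, 0), expected_duration=timedelta(hours=2)).render()` produces a planned notice, and the same structure with `planned=False` covers the unplanned checklist.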
How to Calculate Network Downtime
The ideal situation for any business is that its network would never go down. However, downtime, whether planned or unplanned, is inevitable. Because of this, it's important to know how network downtime is calculated and how to interpret these calculations when provided by your service providers. It's also important to know what uptime and availability mean within the context of network downtime.
First, let’s define uptime versus availability. These terms are often used synonymously, but mean slightly different things and are expressed in differing units:
- Uptime: This term is used to refer to the amount of time that a network or system is working properly. It is expressed in units of time, such as years, months, days, minutes and seconds. In other words, it is the time when you are not experiencing network downtime.
- Availability: Availability is the percentage of time within a time interval in which a network or system is working properly. For example, if the network is down for one full day in a 30-day month, the system was up for 29 out of 30 days, giving an availability of about 96.67% for that month.
Companies often advertise uptime in terms of availability. For example, a cloud service provider can advertise a guaranteed availability of 99% within a calendar year for one of its servers. This means that you could expect up to 3.65 days of downtime within a year or 7.2 hours of downtime within a 30-day month.
When talking about availability, you may hear the term “five nines.” This refers to an availability of 99.999%, which translates to about 5 minutes of downtime a year. Practically speaking, this is as close to 100% availability as a company can expect. Reaching this level requires significant redundancy, so it is usually found only among large service providers, and providers that guarantee five nines tend to charge more because of the cost of maintaining it.
This brings us to how to measure downtime and availability in your own company. The formula is simple: availability = uptime / total time. The steps below show how to calculate it and what each term means.
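The downtime figures quoted for an availability target all come from the same arithmetic: multiply the length of the period by the fraction of time the service is allowed to be down. The short sketch below reproduces the 99% and five-nines numbers. It assumes a 365-day year and a 30-day month, and the helper name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60         # 525,600 (ignores leap years)
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200


def allowed_downtime_minutes(availability_pct: float, period_minutes: float) -> float:
    """Maximum downtime in a period that still meets the availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return period_minutes * (100 - availability_pct) / 100


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        per_year = allowed_downtime_minutes(target, MINUTES_PER_YEAR)
        per_month = allowed_downtime_minutes(target, MINUTES_PER_30_DAY_MONTH)
        print(f"{target}%: {per_year / 60:.2f} h/year, {per_month:.1f} min/month")
```

At 99%, this gives 5,256 minutes a year (3.65 days) and 432 minutes a month (7.2 hours). At 99.999%, it gives about 5.3 minutes a year.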
- Start by calculating how much network downtime your company experienced within a given period. For example, you can look at the last month of network functionality and find that your network was down for a total of 5 hours and 6 minutes, which converts to 306 minutes.
- Next, take the period for which you are calculating downtime and convert that to the same unit of measurement. In our example, we are calculating for a 30-day month, which converts to 43,200 minutes.
- Subtract the downtime from the total time within the period to find the total uptime. In our example, 43,200 minus 306 equals 42,894, so the company experienced 42,894 minutes of uptime within the month.
- Finally, divide the uptime by the total time. In our example, 42,894 divided by 43,200 gives about 0.99292. Multiply this by 100 to get your percentage availability, which in this case is about 99.292%.
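The four steps translate directly into a few lines of code. This minimal sketch reproduces the worked example above. The helper names `to_minutes` and `availability_pct` are hypothetical. In practice, the downtime total would come from your monitoring logs or incident records.

```python
def to_minutes(days: float = 0, hours: float = 0, minutes: float = 0) -> float:
    """Convert a duration to minutes so downtime and period use the same unit."""
    return days * 24 * 60 + hours * 60 + minutes


def availability_pct(downtime_minutes: float, period_minutes: float) -> float:
    """Availability = uptime / total time, expressed as a percentage."""
    if period_minutes <= 0 or not 0 <= downtime_minutes <= period_minutes:
        raise ValueError("downtime must be between 0 and the period length")
    uptime = period_minutes - downtime_minutes  # step 3: total uptime
    return uptime / period_minutes * 100        # step 4: divide, then multiply by 100


downtime = to_minutes(hours=5, minutes=6)  # step 1: 306 minutes
period = to_minutes(days=30)               # step 2: 43,200 minutes
print(f"{availability_pct(downtime, period):.3f}%")  # prints 99.292%
```

The same function confirms the earlier one-day example: `availability_pct(to_minutes(days=1), to_minutes(days=30))` is about 96.67.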
It’s important to know how this calculation works, but also note that network monitoring software will often calculate uptime and availability automatically.
How to Prevent Network Downtime
So how do you fix your network downtime to maximize uptime and availability? The key is to minimize risk, focus on maintenance and implement redundancies. By setting up IT systems to prepare for the worst, your company can minimize downtime, enabling you to focus on day-to-day operations. Below are just a few steps any company can take to avoid network downtime:
- Schedule updates and maintenance regularly: First, it is essential to schedule regular maintenance with your IT team. Plan ahead for periods where the team will come in during off-hours to check the stability and security of hardware, software and general systems. If the maintenance requires planned downtime, be particularly careful to communicate to all affected parties and plan ahead to maximize efficiency and productivity during the downtime period.
- Conduct regular server tests: Schedule server tests alongside general IT maintenance to make sure your servers work properly. These tests should also include checking all backup servers, both physical and virtual, as these backups are your company’s lifeline in the event of a server failure.
- Perform facility tests: On top of testing your hardware and software on a regular basis, be sure to also check your facilities. Human error, animal activity, fire hazards and water damage can all pose a threat to the safety of your network hardware. Be sure to perform regular facility checks in addition to IT maintenance, looking specifically for hazards like faulty wires, airflow blockages, tripping hazards and temperature issues.
- Implement network monitoring: Finally, implement systems that can empower your IT team to get a better view of your network. Network monitoring systems continuously check the health of all components within a network and alert your IT team of any problems so they can act immediately.
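Commercial monitoring platforms do far more than this, but the core idea of a health check is simple: probe each critical device or service on a schedule and raise an alert when it stops responding. The sketch below is a minimal example of that idea. It assumes that a successful TCP connection to a host and port is a good-enough health signal, and it prints alerts instead of paging anyone. The function names are hypothetical, while `socket.create_connection` is part of Python's standard library.

```python
import socket
import time
from typing import Iterable, List, Tuple

Target = Tuple[str, int]  # (host, TCP port)


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, DNS failure, network unreachable
        return False


def unreachable(targets: Iterable[Target], timeout: float = 2.0) -> List[Target]:
    """Check every target once and return the ones that failed."""
    return [(h, p) for h, p in targets if not is_reachable(h, p, timeout)]


def monitor(targets: List[Target], interval_s: float = 60.0, rounds: int = 3) -> None:
    """Poll targets periodically and print an alert line for each failure."""
    for i in range(rounds):
        for host, port in unreachable(targets):
            # A real deployment would page the on-call team or open a ticket here.
            print(f"ALERT {time.strftime('%H:%M:%S')} {host}:{port} unreachable")
        if i < rounds - 1:
            time.sleep(interval_s)
```

For example, `monitor([("10.0.0.1", 22), ("intranet.example.com", 443)], interval_s=30, rounds=10)` would check a hypothetical switch's SSH port and an internal web server every 30 seconds. Logging each result over time also gives you the downtime totals needed for the availability calculation.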
By implementing these steps, your company can effectively reduce its chances of experiencing catastrophic network downtime. This is especially true if you choose to use high-quality network monitoring software backed by a third-party maintenance provider to augment your IT team's effectiveness.
Work With a Network Expert
If your company is looking for a third-party maintenance provider to help you avoid network downtime, Worldwide Services can help. Our around-the-clock network operations center services supplement your existing IT team, managing performance and quickly resolving any infrastructure failures. Our services include high-quality network monitoring and infrastructure management, network security and lifecycle management, asset recovery and maintenance programs topped with 24×7 technical support and field services.
But why choose third-party network monitoring? When you work with Worldwide Services' 24×7 network operations center monitoring and reporting solutions, you can experience the following benefits:
- Increased uptime: Network downtime is costly, but Worldwide Services can help prevent it. With fast response times, our services detect, record and resolve issues before they affect your business. This means your company can enjoy maximum uptime so you can focus on your business and its customers.
- Improved visibility: Worldwide Services allows your company to benefit from third-party maintenance while still maintaining full visibility of your network at all times. Our web-based portal allows you to see what we see, including key metrics, active tickets, alarms and trends, so you can watch your infrastructure performance right alongside us.
- Expert advice: On top of our cutting-edge technology and sophisticated software, our staff consists of experts in the industry with decades of experience under their belts. With our deep knowledge of the industry, we can be your go-to resource for solutions.
- Cost savings: Custom solutions from Worldwide Services enhance your network while helping lower your costs. By maximizing uptime and reducing the workload for your IT department, we can help free your teams to focus on day-to-day operations and business objectives.
Contact Worldwide Services today to learn more about our network monitoring services and how we can help you prevent network downtime.