What Can The Healthcare.Gov Failures Teach Us About Data Center Downtime?

Back when the Affordable Care Act was first introduced, the federal government created a website known as Healthcare.gov, intended to serve as a hub for the new healthcare market. Unfortunately, there was a failure somewhere in the planning phase. Almost immediately after launch, the site collapsed on itself, causing a host of problems with enrollment and access.

Naturally, Obama’s opponents hopped on the outage with gusto.

The thing is, it’s not all that surprising that Healthcare.gov encountered the problems it did. According to technology expert Dr. Richard Cook, the real surprise is that this sort of thing doesn’t happen more often.

“The systems we build are so expensive, and so important that we always seem to run at the edge of failure,” Cook explained in a 2013 keynote. “Every system always operates at its capacity. As soon as there is some improvement or some new technology, we stretch it.” And that, says Cook, is the problem.

“When you have a healthcare.gov experience, everybody says ‘I don’t care what it costs, get it back up!’” he continued. “I don’t care how many people you have to put there, get it up! You don’t care about (cost) anymore, because you’ve got a big problem… yet even accidents and downtime rarely have a permanent effect on the tendency to push systems to the hairy edge of failure.”

To some extent, this is understandable. We want to stretch our infrastructure as much as we possibly can – to get as high a return for as small an expenditure as possible. At the same time, we cannot afford to take things too far; the trick is to find the right balance.

To do that, we need to test our systems for reliability regularly and extensively. In broad strokes, that entails a few things, not all of them directly related to reliability testing:

  • Training and qualification programs for your employees – a high level of competency is essential in order to maximize uptime.
  • Frequent inspections and established standards for infrastructure management, security, upkeep, and equipment lifecycles.
  • Documented operational process controls to ensure a proper approach to operations, maintenance, and service delivery.
  • Fully redundant, fully tested systems to be used in the event of equipment failure.
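
To put a rough number on the redundancy point, here is a minimal Python sketch of the standard availability arithmetic for redundant systems. It rests on an assumption that real facilities only approximate: that the redundant copies fail independently of one another. The function names and the 99% figure are illustrative choices, not measurements from any particular data center.

```python
HOURS_PER_YEAR = 24 * 365  # ignores leap years; good enough for a rough estimate


def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components.

    Assumes failures are independent and that the service stays up
    as long as at least one copy is working.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("component_availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The service is down only when every copy is down at the same time.
    return 1.0 - (1.0 - component_availability) ** copies


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year at a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Illustrative figure only: a single component that is up 99% of the time.
    for copies in (1, 2, 3):
        availability = parallel_availability(0.99, copies)
        downtime = expected_downtime_hours(availability)
        print(f"{copies} copies: {availability:.6f} available, "
              f"about {downtime:.3f} hours down per year")
```

Under these assumptions, a second copy cuts expected downtime by a factor of one hundred. That only holds if the failover path actually works, which is why the redundant systems need to be tested as well as installed.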

If that all sounds a bit too complex for your organization to handle, there’s an alternative: you could host with Liberty Center One and let us take care of things for you. Our highly trained staff know exactly what’s necessary to keep all the equipment in our facility running, and are available 24/7 to assist you with any technical challenges you might encounter. Contact us today to see what we can offer you.

 

