Extreme Availability | Nexbridge Inc.

Category Archives: Extreme Availability

Category for our extreme availability blog topics and posts. Look here for thoughts and advice on the subject.

The Extreme Availability Blog is our attempt at framing what extreme availability is and is not, and establishing a solid dialog on the subject in the hopes of raising the expectations we have for our friend the computer. In our daily experience with computers, we see spam, viruses, slow web pages, crashes, spinning clocks, and other minor annoyances that really don’t help our perception of what computers can do. What we don’t see is the infrastructure that quietly runs in the background. It makes sure that our money moves around correctly without prying eyes, runs our power plants, and gives us the security of knowing that we can pick up the phone and call 9-1-1. It also lets us go to the grocery store and pay for food with confidence that the computers will be up, even if our credit is maxed out.

This is a special news flash blog entry from real life, so do not adjust your set. We’ll return to our regularly scheduled blog shortly.

My company has established some pretty good controls and redundancies for handling a variety of software and hardware failure scenarios. For example, my book is backed up on a RAID drive on a server, with a redundant copy on my laptop in case the ceiling falls in on the server. The server sits under a main support beam in the building and nowhere near a water supply, so it’s even somewhat protected from an earthquake. Flooding is a possibility, but the rest of Toronto would be lost first, so I think it’s an acceptable risk.

Anyway, to the point. Tonight, a Friday, after 14 months of working properly, one of our primary third-party publishing software products suddenly stopped working because it hit a “genuine version violation”. I’ll leave you to guess who made that product. I can’t install the product on another machine either, because it has already hit the violation and could not be activated. I should mention that we have a very strong anti-piracy policy, and the original software media is sitting on my desk beside me. So now we’re down, unable to use a key resource of our company. The vendor’s response is that the situation will be resolved within one business day, which means sometime by the end of Monday. The vendor rated the issue “Minimum business impact”. Ha!

So how does this relate to indestructibility? Well, it shouldn’t, but it does. Because a key service is no longer available, our business is interrupted. It wasn’t because of a process or procedure issue in our company. Nor was it a hardware or software failure. It was caused by something the vendor of our document preparation software did or did not do properly, by their assessment of the severity of the issue, and by their response times, all of which are outside our control.

Vulnerabilities that affect your ability to deal with failures come from all over the place. Sometimes they’re in your control. Sometimes, like tonight, they’re not. It’s extremely frustrating, and in this case embarrassing, but mostly because of an unplanned and unacceptable outage lasting an unreasonable amount of time.

More to come on perceptions soon. Come to think of it, this is partly a perception issue, isn’t it? It comes down to a difference between how important a service is from the client’s point of view and how important the vendor perceives it to be.

The Indestructible Computing Blog is my attempt at framing what Indestructible Computing is and is not, and establishing a solid dialog on the subject in the hopes of raising the expectations we have for our friend the computer. In our daily experience with computers, we see spam, viruses, slow web pages, crashes, spinning clocks, and other minor annoyances that really don’t help our perception of what computers can do. What we don’t see is the infrastructure that quietly runs in the background. It makes sure that our money moves around correctly without prying eyes, runs our power plants, and gives us the security of knowing that we can pick up the phone and call 9-1-1. It also lets us go to the grocery store and pay for food with confidence that the computers will be up, even if our credit is maxed out. But why are the two types of experiences so different?

Much comes down to expectation. We expect our computers at home to misbehave, but we are intolerant of retailers who lose our online shopping carts after an hour of our latest buying spree. After all, the commodity computers we purchase at large electronics stores are throw-away, right? But how are they different from the commodity computers our banks and phone companies use? They’re not, actually. Advances in the quality of hardware have benefited everyone alike. So why do we expect our computers at home to stop every so often, yet become enraged when our banks’ web sites are down for a few scheduled moments?

I remember one incident at a border crossing, where an immigration agent asked me what I did. I replied that I help companies design systems that will run for twenty to thirty years. She was incensed at the idea and told me it was impossible. That was an epiphany for me. In a few words, she explained almost two decades of frustration at trying to convey the concept of indestructibility, and left me feeling like a pile of broken glass. The perception that computers are unreliable has become so ingrained in our culture that people simply don’t believe systems can be built to withstand disasters, but only when those systems are visible. Infrastructure, however, isn’t perceived as a “computer”; it is simply whatever supports our society, and it had better always be there.

Stay tuned for the next entry where I’ll explore this perception further.
