Data Protection Software


Worst Black-Out in 22 Years

By Tony Pearson posted Fri April 27, 2007 09:20 AM

  

Originally posted by: TonyPearson


I'm wrapping up my week in Latin America.

Yesterday morning, the entire country of Colombia suffered its worst black-out (power outage) in 22 years. 98% of the country was out for 4 1/2 hours. This is just 5 months after an outage that hit 25% of the country on December 7, 2006. Ironically, this one happened the week I am here explaining the need for Business Continuity plans to IBM Business Partners from Argentina, Peru, Venezuela, Ecuador and Colombia. As is often the case, people need a real example to recognize that planning is important.

It reminded me of the Northeast Black-out of 2003 that impacted the USA and Canada. I was speaking to a crowd of 800 people at the SHARE conference in Washington D.C. when it happened, and hundreds of pagers and cell-phones went off all at the same time. Although we were outside the affected area and had plenty of lighting, we ended up canceling the rest of my talk, and many people left immediately to help execute their business continuity plans. Of course, terrorism was immediately assumed, but a final report showed that it was initiated in Ohio due to overgrown trees, and then propagated to hundreds of other plants due to a software bug.

According to this morning's Bogota newspaper, "El Tiempo", nobody knows the root cause of yesterday's outage. The country's leftist rebels were blamed immediately, but now the leading theory is that it was initiated by operator error (a technician touching something he shouldn't have), and then propagated by a faulty distribution system.

Another example of the need for a robust and resilient infrastructure, and appropriate business continuity plans.



Comments

Sat April 28, 2007 05:25 PM

Clark, it is so true. A system should be designed with no single points of failure, and this should include single points of "operator" failure. The latest I heard before leaving Bogota was that the electric company was contemplating Lucite plastic covers that snap over the switches involved, so that a technician would have to take an overt act to open the cover before hitting a switch. At least that's an improvement.

Fri April 27, 2007 03:09 PM

Hey Tony, great meeting you at SNW - San Diego last week.
I hate that the public release is that it was "operator error". Doesn't that stop short of the real problem?
How about: the 'system' had inadequate controls, ones that failed to account for the fact that the HUMAN, a fallible entity, would be part of the SYSTEM. Systems should be engineered knowing that parts (humans) will fail (make mistakes).
Reminds me of an explanation of pilot error that I've always liked: "Pilot error is what a committee of experts determines over months of thorough and exhaustive research with seemingly limitless resources, what the pilot should have done, with roughly 20 seconds to make that same analysis."
Now you've got me going - I'll try and blog something more on this, this afternoon.
..clark (www.storageswitch.com/blog)