Cruise Ship Captains and Normal Accidents

With the official death toll at eleven and many more people missing in the Costa Concordia cruise ship disaster, reports have trickled in that the accident was mostly “operator error” on the part of the ship’s captain. However, as we will see, in any system, whether tightly or loosely coupled, there is always some probability of accident. That leaves us to weigh whether we should build more slack or buffers into a given system, or accept the risks of leaving things as they are.

As the Costa Concordia departed from the Italian port of Civitavecchia, no one predicted it would crash and sink. According to news reports, the ship’s captain wanted to “salute” the island residents of Giglio and took the ship near shallow waters. Through miscalculation, however, the cruise ship hit rocks, took on water, and eventually capsized, causing injuries and deaths among those aboard.

The captain of the Costa Concordia has admitted to navigational error, even though on previous journeys he had performed similar maneuvers, taking the cruise ship too close to the rocky island. And while it’s easy to pin the blame solely on operator error, or in this instance bold incompetence, we should not forget that accidents like these are to be expected in moderately coupled systems.

We’re all risk takers to some degree. Implicitly, we understand that whether we fly commercial airlines, drive a car, or take our family vacation on a cruise, there is a risk of injury or worse involved. However, with the probability of an “accident” so small, we weigh risk against reward and usually settle on taking our chances, understanding that accidents cannot be avoided altogether.

Author Charles Perrow reminds us that in systems, whether tightly coupled (little to no slack) or loosely coupled (linear interactions with possible delays), there “will be failures.” In fact, given that Perrow calls the maritime industry “moderately coupled,” it’s surprising there are not more boating accidents.

In his book “Normal Accidents,” Perrow describes how the maritime industry is full of risky behavior: captains make poor judgments by “playing chicken” with other boats, or take sightseeing tours that deviate from planned navigational routes. At sea, moreover, operator error abounds, as captains “zig when they should have zagged” or make mistakes because they often work 14-hour days. Aboard a ship, Perrow says, the captain is supreme, and with so much riding on one person, it’s not surprising that many of the worst maritime disasters are due to a captain’s poor decisions.

So we come to realize that, whatever the system, whether it’s a transformation industry like nuclear power or one that’s less complex like shipping, there will be failures and there will be accidents. There is no such thing as a risk-free system.

In fact, since risk in systems cannot be eliminated, we can either take drastic steps such as shutting systems down entirely (e.g., Germany’s pledge to eliminate nuclear power by 2022), or learn to live with the risks by inserting buffers and slack to stop chain reactions where possible. This last strategy, of course, is more costly to society.

Perrow reminds us that when it comes to systems, “nothing is perfect, neither designs, equipment, procedures, operators, supplies or the environment.” Indeed, we know risk is all around us. What we can do, however, as business managers and members of society, is think more carefully about the risks we’re willing to accept versus the plausible rewards. In addition, where possible, we should strongly consider building in redundancies and planning for disaster, so that when the inconceivable strikes, we’re much better prepared to deal with the outcome.

Questions:

  • It is alleged the Costa Concordia captain took the ship close to shore so that families on the island could see the ship passing. Might system automation such as an “auto-pilot” have prevented the ship’s captain from deviating from the planned course?
  • In the Costa Concordia disaster there is the potential for a fuel leak to pollute one of the “most picturesque stretches of the Italian coast.” Perrow asks: “What risks should we promote to enable the private profit of a few?” Thoughts?

Extreme Redundancy – Don’t Leave Home Without It!

Visa’s command and control center—somewhere on the east coast of the United States—is a case study in redundancy with intense security, backups (for everything), and failover processes for system failures. And while most businesses don’t require these same levels of high availability for their own information networks and data centers, Visa can certainly teach a course on risk management to those who don’t believe such redundancy is necessary.

In the past four years, the unthinkable has happened more often than predicted. From the global stock market meltdown of 2008, to the Japanese tsunami disaster at Fukushima, to flooding in Thailand, and many other prominent examples, there are Black Swans aplenty, with no signs of abatement.

Most of these events were nearly impossible to predict, but it’s entirely within the realm of risk management to prepare for heavy tails. The key to countering the effects of extreme outlier events, says Black Swan author Nassim Taleb, is to build redundancy into business processes. He counsels companies, enterprises, and even countries to “avoid optimization (and) learn to love redundancy.”

And while it may be a progressive example, Visa provides a well-documented case study in managing fourth-quadrant risk. A Fast Company article titled “Visa Is Ready for Anything” provides a rare view inside Visa’s data centers, where 150 million transactions are processed daily. Taking a page from NASA’s Mission Control, the data center monitors Visa network security, availability, and capacity on a 24x7x365 basis. Meeting the definition of a “Tier 4” data center, “every major system—mainframes, air conditioners, batteries, etc.—has a backup.” With backups for backups and disaster planning in place, the data center is designed to withstand all sorts of events: terrorism, hackers, and even natural disasters such as earthquakes and tornadoes.
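The “backups for backups” idea has a simple core: route work to a primary, and if it fails, fall through to the next replica rather than fail the request. As a rough illustration only (the endpoint names and failure modes below are hypothetical, not Visa’s actual architecture), the failover pattern can be sketched as:

```python
# Failover sketch: try the primary, then each backup in order.
# All names here are hypothetical; a real Tier 4 operation layers health
# checks, monitoring, and automatic recovery on top of this basic pattern.

class ServiceUnavailable(Exception):
    """Raised when a single endpoint fails to handle a request."""

def process_with_failover(request, endpoints):
    """Send `request` to the first endpoint that succeeds.

    `endpoints` is an ordered list: primary first, then backups.
    Raises RuntimeError only if every endpoint fails.
    """
    errors = []
    for endpoint in endpoints:
        try:
            return endpoint(request)
        except ServiceUnavailable as exc:
            errors.append(str(exc))  # remember the failure, try the next one
    raise RuntimeError(f"all {len(endpoints)} endpoints failed: {errors}")

# Hypothetical endpoints: the primary is down, the backup works.
def primary(request):
    raise ServiceUnavailable("primary data center offline")

def backup(request):
    return f"processed {request}"

result = process_with_failover("txn-42", [primary, backup])
```

The point of the sketch is that a caller never sees the primary’s outage; the cost is running (and paying for) the idle backup, which is exactly the redundancy-versus-optimization trade-off Taleb describes.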

While it’s true that Black Swans are about “unknown unknowns,” and preparing for every type of risk is not only unrealistic but also too expensive, there is much companies can do to build more redundancy into daily operations. Redundancy is especially relevant as the “interlocking fragility” of global communication, supply chain, and financial networks means correlations move toward one with increasing frequency.

Today’s business culture celebrates “optimization” of every process: in essence, living as near the edge as possible without falling off the cliff. That’s a dangerous strategy in today’s interconnected world, especially because extreme events can take networks offline for hours, days, or even weeks, costing companies tens of millions in lost revenue.

Critics of redundancy argue that robustness is too expensive. But Taleb counters: “Redundancy is like (being) long on an option, you certainly pay for it but it may be necessary for survival.”