Problems with the Language of Probability

By Dave_s. Flickr - Creative Commons.

The language of probability is clear to statisticians and most scientists: they understand terms such as “correlation”, “statistically significant”, “confidence” and more. However, using probabilistic terminology to communicate the “likelihood” of an event to people untrained in such terms can, in some instances, ruin careers and companies and, in the worst cases, cost lives.

“After Shocks” is a disturbing article on what some have called “science on trial”. In 2009, a swarm of small earthquakes hit L’Aquila, a small town in the mountains of Italy. Much like the region along California’s San Andreas fault, this area of central Italy is prone to frequent earthquakes. In fact, over the centuries there have been tens of thousands of earthquakes in the area of L’Aquila, some with little effect and others killing hundreds of people.

Citizens in L’Aquila live with a constant, underlying fear of “the big one”, an earthquake powerful enough to shake buildings to their foundations. So when a series of tremors hit the area in 2009, citizens were keen to get answers to questions such as “Is the big one coming soon?” and “If so, should I leave my home?” With most homes in the L’Aquila area unreinforced, and thus unable to withstand a sizeable earthquake, government authorities came to the rescue by convening a scientific panel to answer citizens’ questions.

What happened next is a travesty. According to the article, one of the scientists, Enzo Boschi, examined the earthquake data and concluded: “a large earthquake along the lines of [the] 1703 event [the last one that killed 10,000] is improbable in the short term, but the possibility cannot definitively be excluded.”

Let’s dissect the word “improbable”. Most statisticians would define “improbable” as a low probability, but definitely not zero. This appears to be what Boschi meant; as further evidence, notice how he qualified his statement: “but the possibility cannot definitively be excluded”. However, the article notes that to the untrained (and, even worse, to the media), “improbable” means “ain’t gonna happen”.

Long story short: six days after the panel met, the small shakes in L’Aquila gave way to the big one. On April 6, 2009, cumulative probabilities caught up with L’Aquila, and a magnitude 6.3 earthquake killed 308 people. After spending weeks digging through the rubble, enraged citizens brought a lawsuit against the scientists, accusing them of negligence in not adequately assessing the risks of a large earthquake. In 2012, the scientists were convicted of manslaughter; however, in November 2014 they won their appeal. They no longer face prison, but they are not free of the costs of their legal defense.
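To make “cumulative probabilities” concrete, here is a small illustrative calculation. The daily probability below is a made-up number chosen for illustration, not a seismic estimate for L’Aquila, and the formula assumes each day is independent, which real earthquake sequences are not. It simply shows how an event that is “improbable” on any given day becomes far less improbable over a longer horizon.

```python
def cumulative_probability(p_per_period: float, periods: int) -> float:
    """Chance that an event with an independent per-period probability
    occurs at least once within the given number of periods."""
    if not 0.0 <= p_per_period <= 1.0:
        raise ValueError("p_per_period must be between 0 and 1")
    # P(at least once) = 1 - P(never) = 1 - (1 - p)^n
    return 1.0 - (1.0 - p_per_period) ** periods


# Hypothetical "improbable" daily chance of 0.1%
daily_p = 0.001
for days in (1, 7, 365, 3650):
    print(f"{days:>5} days: {cumulative_probability(daily_p, days):.1%}")
```

Under these assumptions, a 0.1% daily chance compounds to roughly a 30% chance over a year: low probability is not the same as no probability.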

The language of probability poses real challenges. What do terms such as “unlikely”, “serious possibility” and “likely” actually mean? A trained scientist might know how they are defined, but does your typical business associate, much less your CEO, understand?

Surely, when we have data we can make calculations to estimate the probability of an event. But what happens when we do not? Subjective probability statements—where we’re trying to measure belief—can also get us in trouble if we don’t agree on definitions, especially for events that have never occurred.
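One practical way to agree on definitions is a shared lookup table that ties each verbal term to a numeric range. The sketch below borrows the calibrated likelihood scale the IPCC uses in its assessment reports; an organization could substitute its own agreed ranges. The function names `range_for` and `describe` are hypothetical.

```python
# Calibrated likelihood terms (IPCC assessment-report style).
# Each term maps to an inclusive (low, high) probability range; ranges overlap.
LIKELIHOOD_SCALE: dict[str, tuple[float, float]] = {
    "virtually certain": (0.99, 1.00),
    "extremely likely": (0.95, 1.00),
    "very likely": (0.90, 1.00),
    "likely": (0.66, 1.00),
    "about as likely as not": (0.33, 0.66),
    "unlikely": (0.00, 0.33),
    "very unlikely": (0.00, 0.10),
    "extremely unlikely": (0.00, 0.05),
    "exceptionally unlikely": (0.00, 0.01),
}


def range_for(term: str) -> tuple[float, float]:
    """Look up the agreed numeric range behind a verbal term."""
    return LIKELIHOOD_SCALE[term.lower()]


def describe(p: float) -> str:
    """Return the narrowest agreed term whose range contains p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    candidates = [(hi - lo, term)
                  for term, (lo, hi) in LIKELIHOOD_SCALE.items()
                  if lo <= p <= hi]
    return min(candidates)[1]


print(describe(0.02))  # the term a speaker would use for a 2% chance
```

Note that “improbable” does not appear in the table; under an agreed scale, a speaker would have to choose a term whose numeric range the listener can look up.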

We should not eliminate the language of probability. Even though we don’t know everything that can happen, we still have to run our businesses and predict what’s coming next. However, we must remember that what is “likely” to us may be deemed “unlikely” by another, especially if that person has a preconceived notion in mind. We should also remember that statistics and probability sometimes give us the illusion of control when in fact there is none.

When we communicate with those not trained in the language of probability, it is critical to couch our language in qualifiers such as “estimate”, “educated guess”, “margin of error”, “rare does not mean impossible” and more. We should avoid generalizations and any language that could be misinterpreted as “a sure thing” or “no chance in heck”. Barring that, the best solution is to keep our expert opinions to ourselves.

Text Analytics for Tracking Executive Hubris?

Courtesy of Flickr. By NS Newsflash.

The next audacious “off the cuff” statement your CEO makes could tank your company’s stock price in minutes. That’s because machines are increasingly analyzing press conferences, earnings calls and more for “linguistic biomarkers”, and possibly placing sell orders accordingly. Indeed, with technology’s ability to mine the speech patterns of corporate, political and social leaders, the old proverb “A careless talker destroys himself” rings truer than ever.

Financial Times columnist Gillian Tett writes about how researchers are starting to analyze corporate and political speech for signs of hubris. By comparing historical speeches with those of today’s leaders, researchers are identifying “markers of hubris” that suggest a particular leader may be getting a bit too full of his or her own accomplishments.

According to experts in Tett’s article, such communications increasingly feature words such as “I”, “me” and “sure”, telltale signs of leaders believing their own hype. Consequently, if such “markers of hubris” can be reliably identified, they could indicate to stakeholders that it’s time to take a course of action (e.g., liquidating a given stock position).
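A crude version of this counting fits in a few lines. The marker list simply reuses the three example words above (“I”, “me”, “sure”); it is an illustrative assumption, not a validated hubris lexicon, and the two sample transcripts are invented.

```python
import re
from collections import Counter

# Illustrative marker words taken from the examples above; not a validated lexicon.
HUBRIS_MARKERS = {"i", "me", "sure"}


def marker_rate(text: str, markers: set[str] = HUBRIS_MARKERS) -> float:
    """Share of words in `text` that are marker words (case-insensitive)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    return sum(counts[m] for m in markers) / len(words)


# Hypothetical transcripts from two earnings calls
earlier = "We delivered solid results and our team executed well this quarter."
later = "I am sure of it. I built this, and frankly the market should trust me."
print(f"earlier: {marker_rate(earlier):.1%}, later: {marker_rate(later):.1%}")
```

Tracking such a rate across years of transcripts is what would turn a single count into a trend; deciding whether the trend means anything is the subjective part discussed below.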

Now, as you can imagine, there are challenges with this approach. The first is identifying which linguistic markers equate to hubris, an admittedly subjective process. The second is establishing hubris as a negative trait. In other words, does increasing hubris and/or narcissism mean that an executive has lost touch with reality? Or that he or she is incapable of driving even better results for the company, agency or government? The jury is still out on these questions.

Today’s technology has made text mining of executive, political and other communications far more feasible en masse. Streaming technologies can enable near-real-time analysis, MapReduce-style operators can be used for word counts and text analysis, and off-the-shelf sentiment applications can attempt to discern meaning and intent on a line-by-line basis.
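The MapReduce-style word count mentioned above can be illustrated without a cluster. This single-process sketch mimics the map, shuffle and reduce phases; a real deployment would distribute them across machines (for example with Hadoop), and the function names and sample transcript are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one line of text."""
    pairs = []
    for token in line.split():
        word = token.strip(".,!?;:\"'").lower()
        if word:  # skip tokens that were pure punctuation
            pairs.append((word, 1))
    return pairs


def shuffle_phase(pairs) -> dict[str, list[int]]:
    """Shuffle: group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: sum the partial counts for each word."""
    return {word: sum(values) for word, values in grouped.items()}


# Hypothetical transcript lines
transcript = [
    "I am sure we will win.",
    "I think the team is sure to deliver.",
]
mapped = chain.from_iterable(map_phase(line) for line in transcript)
counts = reduce_phase(shuffle_phase(mapped))
print(counts["i"], counts["sure"])  # prints: 2 2
```

Because each line is mapped independently and each word is reduced independently, both phases can be spread across many machines, which is what makes counting across large speech archives practical.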

When computers are tuned to pore over executive speeches, corporate communications, press interviews and more, such analysis could ultimately indicate whether a company is prone to “excessive optimism” and help investors and other stakeholders “punch through the hype” of corporate speak. To the naked ear, the speech patterns of executives, politicians and other global players probably change little over time. However, if data scientists run current and past communications through text analytics processes, interesting and actionable patterns may emerge.

The largest challenge in analyzing executive hubris doesn’t appear to be standing up a technology infrastructure for analytics, especially when cloud-based solutions are available. Nor does the sentiment analysis itself seem to be the sticking point, because with enough data scientists, such algorithms can be tweaked for accuracy over time.

The ultimate difficulty is deciding what, if anything, to do when analyses of a leader’s speech reveal a pattern of hubris. As an employee, does this mean it’s time to look for another job? As an investor, does this mean it’s time to sell? As a citizen, does this mean it’s time to pressure the government to change its course of action, and if so, how? All good questions for which there are few clear answers.

Regardless, with computers reading the news, it’s more important than ever for leaders of all stripes to be cognizant that stakeholders are watching and acting on their words—often in near real time.

Writer Gillian Tett says that we humans “instinctively know, but often forget that power not only goes to the head, but also to the tongue.” With this in mind, leaders in political, business and social circles need to understand that when it comes to signs of arrogance, we’ll not only be watching and listening, but counting too.

What the Sharing Economy Means for Cloud Computing

Courtesy of Flickr. By laura_m_billings

The sharing movement is in full swing. Innovative “collaborative consumption” companies are helping owners pool under-utilized assets such as homes, boats and cars, and then rent them out as services. With the rise of peer-to-peer sharing, it makes sense that cloud computing, which is essentially the pooling and renting of compute and storage resources, would also gain traction. But just as there are risks in sharing property and other assets, there are also risks in sharing cloud computing infrastructures.

Jessica Scorpio of Fast Company has it right when she says, “A few years ago, no one would have thought peer-to-peer asset sharing would become such a big thing.”

Indeed, since the launch of Airbnb, more than 4 million people have rented rooms in their own houses to complete strangers. And in San Francisco, a new company called FlightCar offers to park and wash your car at the airport, with a catch: while you’re away on a business trip, your car is available as a “rental” to others (at half the cost of other companies).

Intuitively, the rise of the sharing economy makes sense. Why not make underutilized assets temporarily available to others, thus gaining higher utilization and earning extra income?

But to make a sharing economy work, trust is essential. In the case of Airbnb, homeowners must trust that the company has carefully vetted those who would rent their rooms, especially when security and privacy concerns are very real. While there have been a few scary tales involving shared homes, cars and other services, for the most part the marketplace has run smoothly.

In a similar vein, trust is the biggest hurdle for cloud computing. Cloud computing providers are still wrestling with perceptions that they are not as safe and trustworthy as in-house IT in terms of privacy, security and availability. And while cloud providers have greatly improved in these areas, myriad surveys show there’s still significant work to do in overcoming initial perceptions that sensitive corporate data is often “lost, corrupted or accessed by unauthorized individuals”.

For both cloud computing and the sharing economy, overcoming trust issues is job one. That said, the trend toward sharing is unmistakable. Neal Gorenflo, publisher of Shareable Magazine, says, “People don’t want the cognitive load associated with owning.” The same mindset can be attributed to global CIOs and CFOs who want someone else to do the work of capitalizing, maintaining, updating and running their IT systems in the cloud while they focus on driving business value.

Forbes estimates that in 2013, $3.5 billion will change hands in the sharing economy. We also know that cloud revenues are on a torrid trajectory. If peer-to-peer sharing and cloud computing providers can overcome trust issues, there are few constraints on how big these markets can really be.

Societal Remedies for Algorithms Behaving Badly

Courtesy of Flickr. By 710928003

In a world where computer programs are responsible for wild market swings, advertising fraud and more, it is incumbent upon society to develop rules and possibly laws to keep algorithms—and programmers who write them—from behaving badly.

In the news, it’s hard to miss cases of algorithms running amok. Take, for example, the “Keep Calm and Carry On” debacle, in which Solid Gold Bomb Company offered t-shirts with variations on the WWII “Keep Calm” propaganda phrase, such as “Keep Calm and Choke Her” or “Keep Calm and Punch Her.” No person in their right mind would sell, much less buy, such an item. However, the combinations were made possible by an algorithm that generated random phrases and appended them to the “Keep Calm” slogan.

In another instance, advertising agencies buy online ads across hundreds of thousands of web properties every day. But according to a Financial Times article, hackers are deploying “botnet” algorithms to click on advertisements and run up advertiser costs. This click fraud is estimated to cost advertisers more than $6 million a month.

Worse, the “hash crash” on April 23, 2013, trimmed 145 points off the Dow Jones index in a matter of minutes. In this case, the Associated Press Twitter account was hacked by the Syrian Electronic Army, and a post went up mentioning “Two Explosions in the White House…with Barack Obama injured.” With trading computers reading the news, it took just a few seconds for algorithms to shed positions in stock markets, without any real understanding of whether the AP tweet was genuine.

In the “Keep Calm” and “hash crash” fiascos, companies quickly trotted out apologies and excuses for algorithms behaving badly. Yet while admissions of guilt and promises to “do better” are appropriate, society can and should demand better outcomes.

First, it is possible to program algorithms to behave more honorably. For example, while preparing for its televised Jeopardy event, IBM’s Watson team noticed that Watson would sometimes curse. This was simply a programming issue: Watson would scour its data sources for the most likely answer to a question, and sometimes those answers contained profanities. Realizing that a machine cursing on national television wouldn’t go over very well, the programmers gave Watson a “swear filter” to avoid offensive words.
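IBM has not published Watson’s filter in this form; what follows is only a minimal sketch of the general idea, assuming a simple word blocklist. Candidate answers containing a blocked word are skipped in favor of the next-best candidate. The blocklist entries, function names and sample candidates are placeholders. In principle, the same kind of check could also screen randomly generated slogans before they go on sale.

```python
import re
from typing import Optional

# Placeholder blocklist; a real one would be much larger and carefully curated.
BLOCKLIST = {"darn", "heck"}


def is_clean(text: str, blocklist: set[str] = BLOCKLIST) -> bool:
    """Return True if no word in `text` appears on the blocklist."""
    words = re.findall(r"[a-z']+", text.lower())
    return not any(word in blocklist for word in words)


def best_clean_answer(candidates: list[tuple[str, float]]) -> Optional[str]:
    """Return the highest-scoring candidate answer that passes the filter."""
    for answer, _score in sorted(candidates, key=lambda c: c[1], reverse=True):
        if is_clean(answer):
            return answer
    return None  # nothing acceptable: better to stay silent than to offend


print(best_clean_answer([("What the heck is a kilt?", 0.9), ("What is a kilt?", 0.8)]))
```

The design choice is to fall back to a lower-confidence but acceptable answer, or to no answer at all, rather than emit the top-scoring offensive one.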

Second, public opprobrium is a valuable tool. The “Keep Calm” algorithm nightmare was written up in numerous online and mainstream publications such as the New York Times. Companies that don’t program algorithms in an intelligent manner could find their brands highlighted in case studies of “what not to do” for decades to come.

Third, algorithms that behave recklessly could (and, in the instance of advertising fraud, should) get a company into legal hot water. That’s the suggestion of Scott O’Malia, Commissioner of the Commodity Futures Trading Commission. According to a Financial Times article, O’Malia says that in stock trading, “reckless behavior” might be “replacing market manipulation” as the standard for prosecuting misbehavior. What constitutes “reckless” might be up for debate, but it’s clear that more financial companies are trading based on real-time news feeds. Therefore, wherever possible, Wall Street quants should take care that their algorithms do not perform actions that could wipe out the financial holdings of others.
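What “careful” looks like in code is open to debate. One hedged sketch, prompted by the “hash crash”: before acting on a market-moving headline, require corroboration from several independent sources, and cap how much of a position can be shed in a single step. The `Headline` type, source names and thresholds below are hypothetical, not any firm’s actual controls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Headline:
    source: str  # hypothetical feed or account identifier
    topic: str   # normalized event key, e.g. "white-house-explosion"


MIN_INDEPENDENT_SOURCES = 3  # assumed corroboration threshold
MAX_SELL_FRACTION = 0.05     # assumed cap on position reduction per step


def corroborated(headlines: list[Headline], topic: str) -> bool:
    """True once enough distinct sources report the same topic."""
    sources = {h.source for h in headlines if h.topic == topic}
    return len(sources) >= MIN_INDEPENDENT_SOURCES


def shares_to_sell(position: int, headlines: list[Headline], topic: str) -> int:
    """Shares to sell this step: none until corroborated, then a capped slice."""
    if not corroborated(headlines, topic):
        return 0
    return int(position * MAX_SELL_FRACTION)


tweet_only = [Headline("hacked-account", "white-house-explosion")]
print(shares_to_sell(10_000, tweet_only, "white-house-explosion"))  # prints: 0
```

A single (possibly hacked) source, or the same source repeated, never triggers a sale here; the trade-off is that the algorithm reacts more slowly to genuine news.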

Algorithms by themselves don’t actually behave badly; after all, they are simply coded to perform actions when a specific set of conditions occurs.

Programmers must realize that in today’s world, with 24-hour news cycles, variables are increasingly correlated. In other words, when one participant moves, a cascade effect is likely. Brands can also be damaged in the blink of an eye when poorly coded algorithms run wild. With this in mind, programmers, and the companies that employ them, need to be more responsible in their algorithmic development and use scenario thinking to ensure a cautious approach.

Preserving Big Data to Live Forever

Long term horizon by Irargerich. Courtesy of Flickr.

If anyone knows how to preserve data and information for long-term value, it’s the programmers at the Internet Archive, based in San Francisco, CA. In fact, the Internet Archive is attempting to capture every webpage, video, television show, MP3 file or DVD published anywhere in the world. If the Internet Archive is seeking to preserve data for centuries, what can we learn from this non-profit about architecting a solution that keeps our own data safeguarded and accessible long-term?

There’s a fascinating 13-minute documentary on the work of data curators at the Internet Archive. The mission of the Internet Archive is “universal access to all data”. In their efforts to crawl every webpage, scan every book, and make information available to any citizen of the world, the Internet Archive team has designed a system that is resilient, redundant, and highly available.

Preserving knowledge for generations is no easy task. Key components of this massive undertaking include decisions in technology, architecture, data storage, and data accessibility.

First, just about every technology used by the Internet Archive is either open source software or commodity hardware. For web crawling and adding content to its digital archives, the Internet Archive developed Heritrix. To enable full-text search on its website, the Internet Archive uses Nutch running on Hadoop’s file system to “allow Google-style full-text search of web content, including the same content as it changes over time.” Some websites also mention that HBase could be in the mix as a database technology.

Second, redundancy and disaster planning are baked into the overall Internet Archive architecture. The non-profit has servers in San Francisco, but in keeping with its multi-century vision, the Internet Archive mirrors data in Amsterdam and Egypt to weather the volatility of historical events.
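Mirroring only helps if the copies stay identical. A common preservation practice is periodic fixity checking: compute a cryptographic digest of each copy and compare them. The sketch below uses SHA-256 from Python’s standard library; the site names and data are illustrative, and this is not a description of the Internet Archive’s actual tooling. Large files would be hashed in chunks via `update()` rather than loaded whole.

```python
import hashlib
from collections import Counter


def sha256_digest(data: bytes) -> str:
    """Hex SHA-256 digest of one copy of an object."""
    return hashlib.sha256(data).hexdigest()


def find_divergent_replicas(replicas: dict[str, bytes]) -> list[str]:
    """Return the sites whose copy disagrees with the majority digest.

    If no digest is held by a strict majority of sites, every site is
    returned, because voting cannot tell which copy is correct.
    """
    if not replicas:
        return []
    digests = {site: sha256_digest(blob) for site, blob in replicas.items()}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(digests):
        return sorted(digests)
    return sorted(site for site, d in digests.items() if d != majority_digest)


copies = {
    "san-francisco": b"<html>page</html>",
    "amsterdam": b"<html>page</html>",
    "alexandria": b"<html>pagX</html>",
}
print(find_divergent_replicas(copies))  # prints: ['alexandria']
```

Three or more mirrors make this kind of majority vote possible; with only two copies, a mismatch reveals a problem but not which copy is damaged.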

Third, many companies struggle to decide what data they should use, archive or throw away. However, with the plummeting cost of hard disk storage and open source Hadoop, capturing and storing all data in perpetuity is more feasible than ever. At the Internet Archive, all data are captured and nothing is thrown away.

Finally, it’s one thing to capture and store data, and another to make it accessible. The Internet Archive aims to make the world’s knowledge base available to everyone. On the Internet Archive site, users can search and browse ancient documents, view recorded video from years past and listen to music from artists who no longer walk the earth. Brewster Kahle, founder of the Internet Archive, says that with a simple internet connection, “A poor kid in Kenya or Kansas can have access to…great works no matter where they are, or when they were (composed).”

Capturing a mountain of multi-structured data (currently 10 petabytes and growing) is an admirable feat; however, the real magic lies in the Internet Archive’s multi-century vision of making sure the world’s best and most useful knowledge is preserved. Political systems come and go, but with the Internet Archive’s Big Data preservation approach, the treasures of the world’s digital content will hopefully survive for centuries to come.

No Gold Medals for “Black Swan” Criers?

Courtesy of Flickr. By Al S

It’s extremely unfashionable to be the “Black Swan” crier in your organization: the person who warns line-of-business managers about the heavy impact of extreme but unlikely events. In fact, the opposite is the norm: plenty of company executives are rewarded with career growth and compensation for ignoring risks, or for sweeping them under the rug for others to tackle down the road. It’s time to listen, really listen, to what the Black Swan criers in your own company are saying.

In 18th-century England, the town crier would be dressed in fine clothing, given a bell, and told to “cry”, or proclaim, significant news to merchants and citizens alike. Sometimes the town crier brought bad news, such as tax increases. Fortunately, such a person was protected by laws stating that anyone causing harm to the town crier could be convicted of treason. Wikipedia notes that the phrase “don’t shoot the messenger” was a real command!

Fast-forward to our current time, and there are few rewards for those who “cry”, or warn, about the dangers of “Black Swans”: extreme but rare events that carry a high impact.

Case in point: leading up to the September 2008 financial crisis, only a few prognosticators could see that quasi-government agencies such as Fannie Mae and Freddie Mac were buying too many no-income, no-job, no-assets (NINJA) loans that could go bust if the US economy went into recession. Nassim Taleb, author of The Black Swan, was a key figure who needed no more than a glance at these agencies’ financials in 2007 to declare, “(They seem) to be sitting on a barrel of dynamite, vulnerable to the slightest hiccup”.

And that dynamite was indeed lit as the global economy teetered on the edge of a major depression, and the agencies ultimately lost a combined $15B. Of course, Mr. Taleb was ridiculed as a “clown” and “rabble rouser” for many of his prognostications.

Today’s potential corporate whistleblowers don’t fare much better when warning about everyday risks, whether those risks reside in supply chains, nuclear power plants, cloud computing infrastructures or other complex systems prone to fragility. It’s much easier to carry on with business as usual than to plan and prepare for events that, however unlikely, could end up disabling or dismantling your organization in one fell swoop.

Indeed, Taleb argues it’s much easier for managers to tout what they “did” than what they avoided by taking proper risk management precautions. “The corporate manager who avoids a loss will often not be rewarded,” he says.

Business executives should not close their eyes and ears to their own “town criers” warning of Black Swans. While painful to listen to, and sometimes counterintuitive to today’s “business wisdom”, those closest to your business operations often see what can blow up long before mid-level and corporate executives gain visibility.

These “Black Swan” criers may never be personally rewarded with a gold medal for highlighting key risks, but it’s the smart business that ultimately finds a way to seek their opinions and at least scenario plan for their noted “worst case event” outcomes.

Better Capacity Management ABCs: “Always Be (Thinking) Cloud”

Courtesy of Flickr. By M Hooper

Salespeople have a mantra, “ABC” or “Always Be Closing”, as a reminder to continually drive conversations to a sale or move on. In a world where business conditions remain helter-skelter, traditional IT capacity management techniques are proving insufficient. It’s time to think different: “ABC”, or Always Be (Thinking) Cloud.

Getting more for your IT dollar is a smart strategy, but running your IT assets at the upper limits of utilization—without a plan to get extra and immediate capacity at a moment’s notice—isn’t so brainy. Let me explain why.

Author Nassim Taleb writes in his latest tome, “Antifragile”, about how humans are often unprepared for randomness and thus fooled into believing that tomorrow will be much like today. He says we often expect linear outcomes in a complex and chaotic world, where responses and events frequently do not unfold in a straight line.

What exactly does this mean? Dr. Taleb often bemoans our preoccupation with efficiency and optimization at the expense of reserving some “slack” in systems.

For example, he cites London’s Heathrow as one of the world’s most “over-optimized” airports. At Heathrow, when everything runs according to plan, planes depart on time and passengers are satisfied with airline travel. However, Dr. Taleb says that because of over-optimization, “the smallest disruption in Heathrow causes 10-15 hour delays.”

Bringing this back to the topic at hand, when a business runs its IT assets at continually high utilization rates, that is perceived as a beneficial and positive outcome. However, running systems at near 100% utilization leaves little spare capacity, or “slack”, to respond to changing market conditions without affecting the expectations (i.e., service levels) of existing users.
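Basic queueing theory makes the cost of running without slack concrete. In the textbook M/M/1 queue (random arrivals, random service times, one server), the average time a request spends in the system is W = 1/(μ − λ) = (1/μ)/(1 − ρ), where ρ = λ/μ is utilization. Real IT systems are not M/M/1 queues, so treat this as an illustration of the shape of the curve rather than a capacity model.

```python
def mm1_time_in_system(service_time: float, utilization: float) -> float:
    """Mean time in an M/M/1 system: service_time / (1 - utilization)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    return service_time / (1.0 - utilization)


# Response time relative to the bare service time at various utilizations
for rho in (0.50, 0.80, 0.90, 0.95, 0.99):
    print(f"utilization {rho:.0%}: {mm1_time_in_system(1.0, rho):6.1f}x service time")
```

In this model, moving from 90% to 99% utilization multiplies average response time tenfold; the curve is nearly flat at moderate load and explodes near 100%, much like Dr. Taleb’s Heathrow example.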

For example, in the analytics space, running data warehouse and BI servers at high utilization rates makes great business sense, until you realize that business needs constantly change: new users and new applications come online (often as mid-year requests), and data volumes continue to explode at an exponential pace. And we haven’t even mentioned corporate M&A activities, special projects from the C-suite, or unexpected bursts of product and sales activity. In a complex and evolving world, relying solely on statistical forecasts (e.g., linear or multiple linear regression analysis) isn’t going to cut it for capacity planning purposes.

On-premises “capacity on demand” pricing models and/or cloud computing are possible panaceas for reacting better to business needs by bursting into extra compute, storage and analytic processing when needed. Access to cloud computing can definitely help “reduce the need to forecast” traffic.

However, many businesses won’t have a plan in place, much less the capability or designed processes—at the ready—to access extra computing power or storage at a moment’s notice. In other words, many IT shops know “the cloud” is out there, but they have no idea how they’d access what they need without a whole lot of research and planning first. By then, the market opportunity may have passed.

Businesses must be ready to scale (where possible) to more capacity in minutes or hours—not days, weeks or months. This likely means having a cloud strategy in place, completion of vendor negotiation (if necessary), adaptable and agile business processes, identifying and architecting workloads for the cloud, and a tested “battle plan” so that when demands for extra resources filter in, you’re ready to respond to whatever the volatile marketplace requires.
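Part of a tested battle plan can be as simple as an agreed rule for when to burst. The sketch below is a hypothetical threshold policy: if forecast demand would push on-premises utilization above a headroom target, it computes how many extra cloud nodes to request. The 75% target, the node size and the function name are assumptions for illustration; the actual provisioning call depends on the chosen provider and is deliberately left out.

```python
import math


def cloud_nodes_needed(demand_units: float, on_prem_capacity: float,
                       target_utilization: float = 0.75,
                       units_per_cloud_node: float = 10.0) -> int:
    """Extra cloud nodes needed to keep on-prem utilization at or below target."""
    if not 0.0 < target_utilization <= 1.0:
        raise ValueError("target_utilization must be in (0, 1]")
    usable_on_prem = on_prem_capacity * target_utilization
    overflow = max(0.0, demand_units - usable_on_prem)
    # Cloud nodes are also run at the target utilization to preserve slack.
    per_node = units_per_cloud_node * target_utilization
    return math.ceil(overflow / per_node)


# Hypothetical mid-year spike: demand of 900 units against 1,000 units on-prem
print(cloud_nodes_needed(900, 1000))
```

Keeping the target below 100% on both sides reflects the slack argument above: the rule bursts before the on-premises systems are saturated, not after.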