Text Analytics for Tracking Executive Hubris?

Courtesy of Flickr. By NS Newsflash.

The next audacious “off the cuff” statement your CEO makes could tank your company’s stock price in minutes. That’s because machines are increasingly analyzing press conferences, earnings calls and more for “linguistic biomarkers” and possibly placing sell orders accordingly. Indeed, with technology’s ability to mine the speech patterns of corporate, political, and social leaders, the old proverb “A careless talker destroys himself” rings truer than ever.

Financial Times columnist Gillian Tett writes about how researchers are starting to analyze corporate and political speech for signs of hubris. By analyzing historical speeches alongside speeches from today’s leaders, researchers are identifying “markers of hubris” that suggest a particular leader may be getting a bit too full of their own accomplishments.

Such communications, according to experts in Tett’s article, increasingly feature words such as “I”, “me” and “sure”, tell-tale signs that leaders are starting to believe their own hype. Consequently, if such “markers of hubris” can be reliably identified, they could signal to stakeholders that it’s time to take action (e.g. liquidating a given stock position).

Now, as you can imagine, there are challenges with this approach. The first difficulty is in identifying which linguistic markers equate to hubris, an admittedly subjective process. The second challenge is establishing hubris as a negative trait. In other words, should increasing hubris and/or narcissism mean that the executive has lost touch with reality? Or that he or she is incapable of driving even better results for their company, agency or government? The jury is still out on these questions.

Today’s technology has made text mining of executive, political and other communications much more feasible en masse. Streaming technologies can enable near-real-time analysis, map-reduce style operators can be used for word counts and text analysis, and off-the-shelf sentiment applications can discern meaning and intent on a line-by-line basis.
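As a rough illustration of the word-count idea, here is a minimal Python sketch written in a map-reduce style: each line of a transcript is mapped to a partial word count, and the partial counts are reduced into a total. The marker set uses only the three words mentioned above, and the function names and sample text are my own assumptions; a real study would use a validated lexicon and far more data.

```python
from collections import Counter
from functools import reduce
import re

# Assumption: only the three markers named in the article.
HUBRIS_MARKERS = {"i", "me", "sure"}

def map_counts(line: str) -> Counter:
    """Map step: count the words in one line of a transcript."""
    return Counter(re.findall(r"[a-z']+", line.lower()))

def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial word counts."""
    return left + right

def marker_rate(lines: list[str]) -> float:
    """Share of all words that are hubris markers (0.0 for empty input)."""
    totals = reduce(reduce_counts, map(map_counts, lines), Counter())
    words = sum(totals.values())
    hits = sum(totals[marker] for marker in HUBRIS_MARKERS)
    return hits / words if words else 0.0

# Hypothetical transcript lines, invented for illustration.
sample = [
    "I am sure our results speak for themselves.",
    "Trust me, the team delivered.",
]
print(f"{marker_rate(sample):.2f}")  # 3 markers out of 13 words -> 0.23
```

Because the map step works on one line at a time, the same two functions could in principle be handed to a distributed framework; here they simply run in a single process.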

When computers are tuned to pore over executive speeches, corporate communications, press interviews and more, such analysis could ultimately indicate whether a company is prone to “excessive optimism”, and help investors and other stakeholders “punch through the hype” of corporate speak. To the naked ear, speech patterns of executives, politicians and other global players probably change little over time. However, if data scientists are able to run current and past communications through text analytics processes, interesting patterns may emerge that could be actionable.
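One hedged way to picture "current versus past" is to score how far a leader's latest marker rate sits from their own history. The sketch below is an assumption on my part, not a method from Tett's article: it measures markers per 1,000 words and expresses the latest value in standard deviations above the historical mean. The marker set, function names and sample text are all hypothetical.

```python
import re
from statistics import mean, pstdev

MARKERS = {"i", "me", "sure"}  # assumption: the markers quoted earlier

def rate_per_1000(text: str) -> float:
    """Marker words per 1,000 words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return 1000 * sum(word in MARKERS for word in words) / len(words)

def shift_score(past: list[str], current: str) -> float:
    """Standard deviations between the current rate and the past mean.

    Needs at least one past transcript. Returns 0.0 when the history
    shows no variation, because there is no scale to measure against.
    """
    history = [rate_per_1000(text) for text in past]
    spread = pstdev(history)
    if spread == 0:
        return 0.0
    return (rate_per_1000(current) - mean(history)) / spread

# Hypothetical transcripts, invented for illustration.
past = [
    "We grew revenue and the team executed well.",
    "I think we had a solid quarter, and we thank our customers.",
]
current = "I am sure I can fix this. Trust me, I know what I am doing."
print(f"shift: {shift_score(past, current):.1f} standard deviations")
```

A high score only says the language has changed relative to the speaker's own baseline; whether that change means hubris is the subjective question raised earlier.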

The largest challenge in analyzing executive hubris doesn’t appear to be standing up a technology infrastructure for analytics, especially when cloud based solutions are available. Nor does the actual sentiment analysis seem to be the sticking point, because with enough data scientists, such algorithms can be tweaked for accuracy over time.

The ultimate difficulty is deciding what, if anything, to do when analyses of a leader’s speech reveal a pattern of hubris. As an employee, does this mean it’s time to look for another job? As an investor, does this mean it’s time to sell? As a citizen, does this mean it’s time to pressure the government to change its course of action and, if so, how? All good questions for which there are few clear answers.

Regardless, with computers reading the news, it’s more important than ever for leaders of all stripes to be cognizant that stakeholders are watching and acting on their words—often in near real time.

Writer Gillian Tett says that we humans “instinctively know, but often forget that power not only goes to the head, but also to the tongue.” With this in mind, leaders in political, business and social circles need to understand that when it comes to signs of arrogance, we’ll not only be watching and listening, but counting too.

The Dirty (Not so Secret) Secret of IT Budgets

Courtesy of Flickr. By Val.Pearl

Some business users believe that every year IT is handed a budget, which is then fully used to drive new and productive ways to enable more business. This is, however, far from reality. In fact, in most instances the lion’s share of the IT budget is dedicated to supporting legacy code and systems. With so few IT dollars left to support experimentation with new technologies, it’s easy to see why pay-per-use cloud computing options are so alluring.

There’s an interesting story in the Financial Times, where author Gillian Tett discusses how in western economies most of the dollars lent by banks go to supporting existing assets, and not innovation. The problem is highlighted by former UK regulator Adair Turner, who says that out of every dollar in credit “created” by UK banks, only 15% of financial flows go into “productive investment”. The other 85% goes to supporting “existing corporate assets, real estate and unsecured personal finance.”

Essentially, there are fewer dollars lent by banks for innovative projects, startups, and new construction with most of the monies dedicated to maintaining existing assets. Sounds a lot like today’s IT budgets.

As evidence, a Cap Gemini report mentions: “Most organizations do not have a clear strategy for retiring legacy applications and continue to spend up to three quarters of their IT budgets just ‘keeping the lights on’ – supporting outdated, redundant and sometimes entirely obsolete systems.” Now if this “75%” statistic still holds, and there is evidence that it’s accurate, it leaves very little funding for high-potential projects like mobile, analytics, and designing new algorithms that solve business problems.

Here’s where cloud computing can make a difference.  Cloud computing infrastructures, platforms and applications often allow users to try before they buy with very little risk. Users can test applications, explore new application functions and features, experiment with data, and stand up analytic capabilities with much less fuss than traditional IT routes. Best of all, much of this “experimentation” can be funded with operating budgets instead of going through the painful process of asking the CFO for a CAPEX check.

Speaking of innovation, the real value of cloud isn’t just the fact that information infrastructure is at the ready and scalable, but more what you can use it for. Take for example, the use of cloud based analytics to drive business value, such as sniffing out fraud in banking and online payment systems, exploring relationships between customers and products, optimizing advertising spend, analyzing warranty data to produce more quality products and many more types of analyses.

These kinds of analytics stretch far beyond the mundane “keeping the lights on” mindset that IT is sometimes associated with, and instead can really show line-of-business owners how IT can be more than just a “game manager” and instead be a “play-maker”.

Fortunately, the modernization of legacy systems is a top priority for CIOs. But much like turning an aircraft carrier, it’s going to take time to make a complete switch from maintaining legacy systems to supporting innovative use cases.

But in the meantime, with cloud, there’s no need to wait to experiment with the latest technologies and/or try before you buy. Cloud infrastructures, platforms and applications are waiting for you. The better question is, are you ready to take advantage of them?

Moving to the Public Cloud? Do the Math First

A manufacturing executive claims that many companies “didn’t do the math” in terms of rushing to outsource key functions to outside suppliers. Are companies making the same mistake in terms of rushing to public cloud computing infrastructures?

Image courtesy of Flickr. By peachy177

The herd mentality: we know it well. Once a given topic (e.g. agile development, Hadoop “Big Data” implementation, etc.) becomes the darling of business and management publications, a gold rush to implement usually follows. Unfortunately, sometimes there isn’t much, or any, thought put into gauging enterprise fit or building a business case for the latest and most fashionable idea.

Take for example the concept of outsourcing. During the early 2000s, cheap labor rates in China and India caused senior managers to see dollar signs as they could cut labor costs nearly in half, while gaining a specialized workforce dedicated to developing and building products and/or servicing customers.

There was a catch, however. Once factors such as delivery lag times, transportation costs, loss of corporate agility, language and communication barriers and more were considered, the so-called cost savings often failed to materialize.

“About 60% of the companies that offshored manufacturing didn’t really do the math,” says Harry Moser, an MIT-trained engineer who runs the Reshoring Initiative. “They looked only at the labor rate, they didn’t look at the hidden costs.”

The concept of shifting compute needs to public cloud computing infrastructures is an idea gaining traction. As the C-suite contemplates methods to deliver better, respond to market changes faster and reduce costs, cloud is an increasingly tantalizing option. In fact, the market for public cloud computing is said to be $131B in 2013 and growing, according to a tier one analyst firm.

While companies are choosing cloud for myriad reasons, it’s readily apparent that procuring public infrastructure, development platforms or applications from a cloud provider is really just another form of outsourcing.

This then brings some challenges to the forefront, specifically the need to understand the business case and use cases for cloud computing for your own company. And the needs must go beyond simple cost savings analysis.

Don’t make the same mistakes as those executives who rushed to outsourcing in the past decade. Tally up the cost savings, but also spend time diagnosing the “hidden risks” of public cloud, including the well-known issues of downtime/availability costs, data security/privacy in a multi-tenant environment, and data latency.
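To make "do the math" concrete, here is a small Python sketch that adds hidden costs to the headline price of each option. Every figure is a hypothetical placeholder, and the cost categories are my own simplification of the risks listed above; the point is the shape of the comparison, not the numbers.

```python
from dataclasses import dataclass

@dataclass
class Option:
    name: str
    annual_direct_cost: float       # subscription, or hardware plus staff
    expected_downtime_hours: float  # per year
    cost_per_downtime_hour: float   # lost revenue or productivity
    annual_risk_allowance: float    # security, privacy and latency mitigation

    def total_annual_cost(self) -> float:
        """Direct cost plus the hidden costs that a rate card leaves out."""
        downtime = self.expected_downtime_hours * self.cost_per_downtime_hour
        return self.annual_direct_cost + downtime + self.annual_risk_allowance

# All figures below are hypothetical and exist only to show the arithmetic.
on_prem = Option("on-premises", 500_000, 4, 10_000, 20_000)
public_cloud = Option("public cloud", 350_000, 9, 10_000, 60_000)

for option in (on_prem, public_cloud):
    print(f"{option.name}: {option.total_annual_cost():,.0f}")
```

With these made-up inputs the cloud option's 150,000 headline saving shrinks to 60,000 once downtime and risk allowances are included. Different inputs could just as easily widen the gap or reverse it, which is exactly why the inputs deserve scrutiny.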

In addition, think about the level of control you want over your IT infrastructure. Are you comfortable relying on another vendor for critical IT infrastructure needs? In case of the inevitable IT failure or a worst-case cyber-attack, are you one of those who would want to start working a problem right away, rather than opening a trouble ticket and waiting for an answer?

You’ll also need to consider skill sets (tally those you have, and those you’ll need), in addition to architecting your various workloads for cloud infrastructures.

Please don’t get me wrong. For many companies, sourcing computing needs to public infrastructures makes a lot of sense, but only when supported by a thorough business case and detailed risk analysis. You’ll need a thorough understanding of what you’re jumping into before “joining the herd,” especially when an on-premises solution might work better.

In other words, “do the math” (figuratively and literally).

Better Capacity Management ABCs: “Always Be (Thinking) Cloud”

Courtesy of Flickr. By M Hooper

Sales personnel have a mantra, “ABC” or “Always Be Closing,” as a reminder to continually drive conversations to selling conclusions or move on. In a world where business conditions remain helter-skelter, traditional IT capacity management techniques are proving insufficient. It’s time to think different – or “ABC”: Always Be (Thinking) Cloud.

Getting more for your IT dollar is a smart strategy, but running your IT assets at the upper limits of utilization—without a plan to get extra and immediate capacity at a moment’s notice—isn’t so brainy. Let me explain why.

Author Nassim Taleb writes in his latest tome, “Antifragile,” about how humans are often unprepared for randomness and thus fooled into believing that tomorrow will be much like today. He says we often expect linear outcomes in a complex and chaotic world, where responses and events are frequently not dished out in a straight line.

What exactly does this mean? Dr. Taleb often bemoans our pre-occupation with efficiency and optimization at the expense of reserving some “slack” in systems.

For example, he cites London’s Heathrow as one of the world’s most “over-optimized” airports. At Heathrow, when everything runs according to plan, planes depart on time and passengers are satisfied with airline travel. However, Dr. Taleb says that because of over-optimization, “the smallest disruption in Heathrow causes 10-15 hour delays.”

Bringing this back to the topic at hand, when a business runs its IT assets at continually high utilization rates it’s perceived as a beneficial and positive outcome. However, running systems at near 100% utilization offers little spare capacity or “slack” to respond to changing market conditions without affecting expectations (i.e. service levels) of existing users.
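A textbook queueing formula gives a feel for why the last few points of utilization are so costly. This is my own illustration rather than anything from Dr. Taleb: it assumes the simplest single-server model (M/M/1), in which mean response time equals service time divided by (1 - utilization). Real systems are messier, but the nonlinear shape is the point.

```python
def mean_response_time(service_time: float, utilization: float) -> float:
    """Mean response time in an M/M/1 queue: S / (1 - rho).

    Assumption: a single server with random arrivals and service times.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization)

# Response time as a multiple of the bare service time.
for rho in (0.50, 0.80, 0.90, 0.95, 0.99):
    multiple = mean_response_time(1.0, rho)
    print(f"{rho:.0%} busy -> about {multiple:.0f}x the service time")
```

Under this model, moving from 50% to 90% utilization raises response time from about 2x to about 10x the service time, and 99% pushes it to about 100x. A small surge in demand on a nearly full system therefore has an outsized effect on service levels.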

For example, in the analytics space, running data warehouse and BI servers at high utilization rates makes great business sense, until you realize that business needs constantly change: new users and new applications come online (often as mid-year requests), and data volumes continue to explode at an exponential pace. And we haven’t even mentioned corporate M&A activities, special projects from the C-suite, or unexpected bursts of product and sales activity. In a complex and evolving world, relying solely on statistical forecasts (e.g. linear or multiple linear regression analysis) isn’t going to cut it for capacity planning purposes.
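The sketch below shows the limitation in miniature. It fits a straight line to a few months of utilization data using ordinary least squares and extrapolates, then adds one unplanned request on top. The utilization history and the size of the burst are hypothetical numbers chosen only to illustrate the gap between a trend line and reality.

```python
def linear_forecast(history: list[float], periods_ahead: int) -> float:
    """Fit y = a + b*t by ordinary least squares and extrapolate."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    covariance = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    variance = sum((t - t_mean) ** 2 for t in range(n))
    slope = covariance / variance
    intercept = y_mean - slope * t_mean
    return intercept + slope * (n - 1 + periods_ahead)

# Hypothetical monthly peak utilization (%) for a BI server.
utilization = [62.0, 64.0, 66.0, 68.0, 70.0]
forecast = linear_forecast(utilization, periods_ahead=3)

# Hypothetical unplanned demand, such as a mid-year project or an acquisition.
burst = 30.0

print(f"trend forecast in 3 months: {forecast:.1f}%")
print(f"with unplanned burst: {forecast + burst:.1f}%")
print("needs extra capacity:", forecast + burst > 100)
```

The trend line alone suggests comfortable headroom three months out, yet a single unforecast request pushes demand past what the system can serve. The regression is not wrong; it simply cannot see events that are not in its history.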

On premises “capacity on demand” pricing models and/or cloud computing are possible panaceas for better reacting to business needs by bursting into extra compute, storage and analytic processing when needed. Access to cloud computing can definitely help “reduce the need to forecast” for traffic.

However, many businesses won’t have a plan in place, much less the capability or designed processes—at the ready—to access extra computing power or storage at a moment’s notice. In other words, many IT shops know “the cloud” is out there, but they have no idea how they’d access what they need without a whole lot of research and planning first. By then, the market opportunity may have passed.

Businesses must be ready to scale (where possible) to more capacity in minutes or hours—not days, weeks or months. This likely means having a cloud strategy in place, completion of vendor negotiation (if necessary), adaptable and agile business processes, identifying and architecting workloads for the cloud, and a tested “battle plan” so that when demands for extra resources filter in, you’re ready to respond to whatever the volatile marketplace requires.

Technologies and Analyses in CBS’ Person of Interest


Person of Interest is a broadcast television show on CBS where a “machine” predicts a person most likely to die within 24-48 hours. Then, it’s up to a mercenary and a data scientist to find that person and help them escape their fate. A straightforward plot really, but not so simple in terms of the technologies and analyses behind the scenes that could make a modern-day prediction machine a reality. I have taken the liberty of framing some components that could be part of such a project. Can you help discover more?

In Person of Interest, “the machine” delivers either a single name or group of names predicted to meet an untimely death. However, in order to predict such an event, the machine must collect and analyze reams of big data and then produce a result set, which is then delivered to “Harold” (the computer scientist).

In real life, such an effort would be a massive undertaking for a single city or state, much less an entire nation. However, let’s set aside the enormity, or the plausibility, of such a scenario and instead see if we can identify various technologies and analyses that could make a modern-day “Person of Interest” a reality.

It is useful to think of this analytics challenge in terms of a framework: data sources, data acquisition, data repository, data access and analysis and finally, delivery channels.

First, let’s start with data sources. In Person of Interest, the “machine” collects data from sources such as cameras (images, audio and video), call detail records, voice (landline and mobile), GPS for location data, sensor networks, and text sources (social media, web logs, newspapers, the internet, etc.). Data sets stored in relational databases, both publicly available and not, might also be used for predictive purposes.

Next, data must be assimilated or acquired into a data management repository (most likely a multi-petabyte bank of computer servers). If data are acquired in near real time, they may go into a data warehouse and/or Hadoop cluster (maybe cloud based) for analysis and mining purposes. If data are analyzed in real time, it’s possible that complex event processing technologies (i.e. streams in memory) are used to analyze data “on the fly” and make instant decisions.
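To give a flavor of analyzing data “on the fly”, here is a toy Python sketch of an in-memory sliding window, one of the simplest building blocks of event processing. It is only an illustration under my own assumptions: events arrive in timestamp order, and the class name, keys and threshold are invented. Production CEP engines offer far richer pattern languages.

```python
from collections import deque

class SlidingWindowCounter:
    """Counts events per key inside a rolling time window, held in memory."""

    def __init__(self, window_seconds: float, threshold: int):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.events: dict[str, deque] = {}

    def observe(self, key: str, timestamp: float) -> bool:
        """Record one event; return True when the key reaches the threshold.

        Assumption: timestamps for a given key arrive in increasing order.
        """
        window = self.events.setdefault(key, deque())
        window.append(timestamp)
        # Drop events that have aged out of the window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) >= self.threshold

# Hypothetical stream of (person, seconds since start) sightings.
detector = SlidingWindowCounter(window_seconds=60, threshold=3)
stream = [
    ("person_a", 0), ("person_b", 5), ("person_a", 20),
    ("person_a", 45), ("person_a", 200),
]
alerts = [(key, ts) for key, ts in stream if detector.observe(key, ts)]
print(alerts)  # [('person_a', 45)]
```

The decision is made as each event arrives, without first landing the data in a warehouse, which is the essential difference from the batch-style analysis described for Hadoop.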

Analysis can be done at various points—during data streaming (CEP), in the data warehouse after data ingest (which could be in just a few minutes), or in Hadoop (batch processed).  Along the way, various algorithms may be running which perform functions such as:

  • Pattern analysis – recognizing and matching voice, video, graphics, or other multi-structured data types. Could be mining both structured and multi-structured data sets.
  • Social network (graph) analysis – analyzing nodes and links between persons. Possibly using call detail records, web data (Facebook, Twitter, LinkedIn and more).
  • Sentiment analysis – scanning text to reveal meaning, as when someone says “I’d kill for that job”: do they really mean they would murder someone, or is this just a figure of speech?
  • Path analysis – what are the most frequent steps, paths and/or destinations by those predicted to be in danger?
  • Affinity analysis – if person X is in a dangerous situation, how many others just like him/her are also in a similar predicament?
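As one small example from the list above, the social network (graph) analysis can be sketched in a few lines of Python. The call records, names and helper functions are all hypothetical; the sketch simply builds an undirected graph from caller/callee pairs, finds the best-connected person, and lists contacts two people have in common.

```python
from collections import defaultdict

# Hypothetical call detail records: (caller, callee) pairs.
calls = [
    ("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
    ("cat", "dan"), ("ann", "bob"),
]

def build_graph(records):
    """Build an undirected graph: each person maps to the set of their contacts."""
    graph = defaultdict(set)
    for caller, callee in records:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph

def most_connected(graph):
    """Person with the most distinct contacts (ties broken alphabetically)."""
    return max(sorted(graph), key=lambda person: len(graph[person]))

def shared_contacts(graph, a, b):
    """Contacts that two people have in common."""
    return (graph[a] & graph[b]) - {a, b}

graph = build_graph(calls)
print(most_connected(graph))                        # cat
print(sorted(shared_contacts(graph, "ann", "dan")))  # ['cat']
```

Repeated calls between the same pair collapse into a single link here; a fuller analysis would likely weight links by call frequency or duration.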

It’s also possible that an access layer is needed for BI types of reporting, dashboard, or visualization techniques.

Finally, the result set (in this case, the name of the person “the machine” predicts is most likely to be killed in the next twenty-four hours) could be delivered to a device in the field, such as a mobile phone, tablet or computer terminal.

These are just some of the technologies that would be necessary to make a “real life” prediction machine possible, just like in CBS’ Person of Interest. And I haven’t even discussed networking technologies (internet, intranet, compute fabric etc.), or middleware that would also fit in the equation.

What technologies are missing? What types of analysis are also plausible to bring Person of Interest to life? What’s on the list that should not be? Let’s see if we can solve the puzzle together!

Data Warehouse “as a Service” – A Good Pick for Mid-Sized Companies

Courtesy of Flickr. by Henk Achtereekte

Plenty of mid-sized businesses don’t have the time, talent, or investment dollars to manage a data warehouse environment, much less monitor and maintain these services within their own data center. That’s why companies seeking analytic capabilities are increasingly looking at cloud-based options to shift responsibilities for data warehouse ownership, administration and support to a contracted “as a service” provider.

For mid-sized businesses, cloud computing makes a lot of sense. With cloud, no longer do such businesses have to worry about procuring and maintaining and continually investing in IT resources. Now, companies that might not have been able to afford world-class infrastructure and talent can access such capabilities on a monthly or subscription basis.

Previously, a mid-sized business had three selections in terms of acquiring analytics. With the right talent, they could build their own solution from scratch, or utilize open source applications – a very impractical approach for small to medium enterprises (SMEs). Another common alternative was to procure ‘off-the-shelf’ applications and/or consulting resources from mid-tier system integrators to cobble together a working solution to meet business needs. These two choices (build vs. buy) in most cases still required a company to staff and manage the service within their own data center.

A third option was to get out of the service support business altogether by contracting with a hosting provider to provide network connectivity, security, and monitoring of the data warehouse environment.

While hosting is an attractive choice, mid-sized companies must still take responsibility for purchasing technology assets, staffing application DBA support, performing backup/archive/restore activities, tuning applications, and securing their business intelligence assets, among other things. These data warehousing options (build, buy, and/or host) are still available today.

However, some medium sized enterprises are looking to cloud computing models as a fourth option. For companies seeking analytics capabilities to manage and optimize their business with the ultimate goal of delivering value to their business and their customers, another intriguing delivery model is acquiring data warehouse resources “as a service”.

More than hosting, cloud-based Data Warehousing “as a Service” (DWaaS) typically provides an integrated solution of hardware, software and services in a bundled package. These as-a-service offers may include monitoring, securing, maintenance and support for the entire data warehouse environment, including data integration, core data warehouse infrastructure and business intelligence applications.

DWaaS is seen as a good choice for enterprises seeking an alternative to owning, managing and investing upfront for information technology. And CIOs and CFOs for mid-sized businesses are finding “as a service” delivery models especially valuable because many lack capital budgets to acquire technology, or have trouble affording the expertise needed to run a data warehouse environment.

These are all good reasons to think about the “as a service” delivery model for data warehousing. But there are added benefits in terms of a shifting of responsibility to an “as a service” provider.

First, solution ownership in terms of capital expenditures becomes a thing of the past. No longer must CFOs worry about keeping data warehousing assets on the corporate books. With DWaaS, “solution ownership” transfers to the service provider, which frees up capital budgets to acquire other business assets and ultimately reduces the investment risk of buying rapidly depreciating information technology.

In addition, with DWaaS, data warehouse support should be included in the service bundle. This means that DBA activities such as database and system administration, backup/archive/recovery, security and performance and capacity management are all likely included in one monthly price.  ETL and BI support might also extend to monitoring of data integration jobs for completion and ensuring delivery of daily/weekly/monthly reports.

Thus, DWaaS should be a complete, integrated and managed service offer—a very appealing choice for mid-sized companies. These types of cloud based service offers are appropriate for companies that don’t have the time, resources or upfront capital expenditures to acquire advanced capabilities that were once limited to much larger type companies.

In terms of taking advantage of the power of analytics, who says big companies should have all the fun?

Where is the Cloud?

When the term “cloud computing” comes to mind, it’s fair to say that most people think of it as some nebulous group of computers in the sky delivering content to mobile devices and workstations whenever it’s required. How far off is that definition, and where exactly is “the cloud”?

Courtesy of Flickr. By M Hooper

In a dusty corner of San Antonio, Texas, the cloud is about to come to life. As a Microsoft corporate VP takes a shovel and firmly plants it into the soil, she proclaims: “The cloud is not the cloud in the sky, it’s what we are about to break ground on (right here).”*

That’s because San Antonio, Prineville (OR) and Quincy (WA), among many other cities across the globe, are now host to massive data centers filled with tens of thousands of blinking computers owned by Microsoft, Rackspace, Facebook, Amazon and others.

Imagine this: racks upon racks of Intel based servers. Multi-colored wires networked from computer to computer. Huge vaults of pipes for cooling and air-conditioning massive computer farms. A few sleepy network engineers scurrying from machine to machine checking connections. Is this the cloud?

Thomas M. Koulopoulos, author of “Cloud Surfing,” says that’s part of the story. “(The cloud is) a heavily monitored, fortified, and secure array of computers that are built with the objective of securing data with multiple layers of physical and cyber security,” he says. But those asking ‘where’s the cloud’ aren’t asking the right question, Koulopoulos argues. “This is sort of like asking, where does electricity exist?”

That’s because cloud computing is much more than the device in your hand streaming music, the corporate dashboard on your wirelessly connected tablet, or even megawatt powered data centers.

Instead, think of cloud as a service of computational power, storage and more, much like the service you’d get from a utility company. The cloud allows you to plug into a required capability—whether it’s for print servers or analytics.

The cloud is typically available on a metered basis when demanded, and can be accessed via self-service methods—simply plug in via a portal and access what you need. And it’s delivered via a host of technologies, software, processes, devices and physical locations that power this “service”.

Thomas M. Koulopoulos asserts that where the cloud physically exists doesn’t matter. “What counts instead is the question, ‘is it there when I need it?’” he says. For people like me, this is too utilitarian an approach. I want to know “the where” of cloud computing.

Coming back to the original question then, the cloud exists in your connected handheld device, on your laptop, in your data center, in another company’s data center, across millions of miles of fiber optic cables, and wirelessly in the air. The cloud, then, isn’t just a place; it’s more of a system: a massive investment in people, dollars, infrastructure, time and talent.

So where is the cloud? The answer is places seen and unseen. In short – everywhere.

*as told in “The Shadow Factory” by James Bamford.