Beware Big Data Technology Zealotry

Undoubtedly you’ve heard it all before: “Hadoop is the next big thing, why waste your time with a relational database?” or “Hadoop is really only good for the following things” or “Our NoSQL database scales, other solutions don’t.” Invariably, there are hundreds of additional arguments proffered by big data vendors and technology zealots inhabiting organizations just like yours. However, there are few crisp binary choices in technology decision making, especially in today’s heterogeneous big data environments.

Courtesy of Flickr. Creative Commons. By Eden, Janine, and Jim.

Teradata CTO Stephen Brobst has a great story regarding a Stanford technology conference he attended. Apparently in one session there were “shouting matches” between relational database and Hadoop fanatics as to which technology better served customers going forward. Mr. Brobst wasn’t amused, concluding: “As an engineer, my view is that when you see this kind of religious zealotry on either side, both sides are wrong. A good engineer is happy to use good ideas wherever they come from.”

Considering various technology choices for your particular organization is a multi-faceted decision-making process. For example, suppose you are investigating a new application and/or database for a mission-critical job. Let’s also suppose your existing solution is working “well enough.” However, industry pundits, bloggers and analysts are hyping and luring you towards the next big thing in technology. At this point, alarm bells should be ringing. Let’s explore why.

First, for companies that are not start-ups, the idea of ripping and replacing an existing and working solution should give every CIO and CTO pause. The use cases enabled by this new technology must significantly stand out.

Second, unless your existing solution is fully depreciated (for on-premises, hardware-based solutions), you’re going to have a tough time getting past your CFO. Regardless of your situation, you’ll need compelling calculations for total cost of ownership (TCO), internal rate of return (IRR) and return on investment (ROI); a rough sketch of such a calculation appears after this list of considerations.

Third, you will need to investigate whether your company has the skill sets to develop and operate this new environment, or whether they are readily available from outside vendors.

Fourth, consider your risk tolerance or appetite for failure: if this new IT project fails, will it be considered a “drop in the bucket” or could it take down the entire company?

Finally, consider whether you’re succumbing to technology zealotry pitched by your favorite vendor or internal technologist. Oftentimes in technology decision making, the better choice is “and”, not “either”.
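Returning to the second consideration above, here is a minimal sketch of the kind of back-of-the-envelope TCO, ROI and IRR comparison a CFO will expect to see. Every figure below is a hypothetical placeholder rather than a benchmark; substitute your own migration, run-rate and benefit estimates.

```python
# Hypothetical "keep vs. replace" comparison over five years.
# All figures are illustrative placeholders, not benchmarks.

def npv(rate, cash_flows):
    """Net present value of yearly cash flows (year 0 first)."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))

def irr(cash_flows, lo=-0.99, hi=1.0, tol=1e-6):
    """Internal rate of return by bisection (assumes one sign change in [lo, hi])."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if npv(mid, cash_flows) > 0 else (lo, mid)
    return (lo + hi) / 2

# Total cost of ownership: run cost of the incumbent vs. migration plus new run cost.
tco_existing = 200_000 * 5                 # annual run cost of the current solution
tco_new      = 750_000 + 120_000 * 5       # upfront migration plus lower annual run cost

# Incremental cash flows of switching: year-0 outlay, then yearly net benefit.
cash_flows = [-750_000] + [180_000] * 5

roi = sum(cash_flows) / -cash_flows[0]     # simple (undiscounted) five-year ROI
print(f"TCO existing: {tco_existing:,}   TCO new: {tco_new:,}")
print(f"Simple ROI: {roi:.1%}   IRR: {irr(cash_flows):.1%}   NPV @10%: {npv(0.10, cash_flows):,.0f}")
```

In this invented example the project clears a simple ROI hurdle yet shows a single-digit IRR and a negative NPV at a 10% discount rate, which is exactly the kind of nuance a “next big thing” pitch tends to gloss over.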

For example, more companies are adopting a heterogeneous technology environment for unified information where multiple technologies and approaches work together in unison to meet various needs for reporting, dashboards, visualization, ad-hoc queries, operational applications, predictive analytics, and more. In essence, think more about synergies and inter-operability, not isolated technologies and processes.

In counterpoint, some will argue that technology capabilities increasingly overlap, and that with a heterogeneous approach companies might be paying for some features twice. It is true that the lines are blurring: some of today’s relational databases can accept and process JSON (previously the purview of NoSQL databases), queries and BI reports can run on Hadoop, and “discovery work” can be completed on multiple platforms. However, considering the maturity and design of the various competing big data solutions, it does not appear—for the immediate future—that one size will fit all.
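As one small illustration of this blurring, below is a hedged sketch of a relational database handling JSON documents with plain SQL. It uses Python’s standard-library SQLite driver and the json_extract() function from SQLite’s JSON1 extension (included in most modern SQLite builds); the table, payloads and field names are invented for the example.

```python
import sqlite3

# A relational table storing one JSON document per row -- the kind of
# semi-structured data once considered NoSQL-only territory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO events (payload) VALUES (?)",
    [
        ('{"user": "alice", "action": "login",    "ms": 120}',),
        ('{"user": "bob",   "action": "purchase", "ms": 340}',),
        ('{"user": "alice", "action": "purchase", "ms": 290}',),
    ],
)

# Ordinary SQL over JSON fields via json_extract() (SQLite JSON1 extension).
rows = conn.execute(
    """
    SELECT json_extract(payload, '$.user')    AS user,
           COUNT(*)                           AS purchases,
           AVG(json_extract(payload, '$.ms')) AS avg_ms
    FROM events
    WHERE json_extract(payload, '$.action') = 'purchase'
    GROUP BY user
    """
).fetchall()

for user, purchases, avg_ms in rows:
    print(user, purchases, avg_ms)
```

The point is not that an embedded database replaces a document store, only that “relational versus NoSQL” is rarely the clean either/or choice the zealots describe.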

When it comes to selecting big data technologies, objectivity and flexibility are paramount. You’ll have to settle on technologies based on your unique business and use cases, risk tolerance, financial situation, analytic readiness and more.

If your big data vendor or favorite company technologist is missing a toolbox or multi-faceted perspective and instead seems to employ a “to a hammer, everything looks like a nail” approach, you might want to look elsewhere for a competing point of view.

When Ideology Reigns Over Data

Increasingly, the mantra of “let the data speak for themselves” is falling by the wayside while ideology promotion zooms down the fast lane. There are dangers to reputations, companies and global economies when researchers and/or statisticians either see what they want to see despite the data or, worse, gently massage data to get “the right results.”

Courtesy of Flickr. By Windell Oskay

Economist Thomas Piketty is in the news. After publishing his treatise “Capital in the Twenty-First Century,” Mr. Piketty was lauded by world leaders, fellow economists, and political commentators for bringing data and analysis to the perceived problem of growing income inequality.

In his book, Mr. Piketty posits that while wealth and income were grossly unequally distributed through the industrial revolution era, the advent of World Wars I and II changed the wealth dynamic, as tax increases helped pay for war recovery and social safety nets. Then, after the early 1970s, Piketty claims that once again his data show that the top 1-10% of earners take more than their fair share. In Capital, Piketty’s prescriptions to remedy wealth inequality include an annual tax on capital and harsh taxation of up to 80% for the highest earners.

In this age of sharing and transparency, Mr. Piketty received acclaim for publishing his data sets and Excel spreadsheets for the entire world to see. However, this bold move could also prove to be his downfall.

The Financial Times, in a series of recent articles, claims that Piketty’s data and Excel spreadsheets don’t exactly line up with his conclusions. “The FT found mistakes and unexplained entries in his spreadsheet,” the paper reports. The articles also mention that a host of “transcription errors”, “incorrect formulas” and “cherry-picked” data mar an otherwise serious body of work.

Once all the above errors are corrected, the FT concludes: “There is little evidence in Professor Piketty’s original sources to bear out the thesis that an increasing share of total wealth is held by the richest few.” In other words, ouch!

Here’s part of the problem: while income data are somewhat hard to piece together, wealth data for the past 100 years are even harder to find because of data quality and collection issues. As such, the data are bound to be of dubious quality and/or incomplete. In addition, it appears that Piketty could have used some friends to check and double-check his spreadsheet calculations to spare him the Kenneth Rogoff/Carmen Reinhart treatment.

In working with data, errors come with the territory, and hopefully they are minimal. There is a more serious issue for any data worker, however: seeing what you want to see, even if the evidence says otherwise.

For example, Nicolas Baverez, a French economist, raised issues with Piketty’s data collection approach and “biased interpretation” of those data long before the FT report. Furthermore, Baverez thinks that Piketty had a conclusion in mind before he analyzed the data. In the magazine Le Point, Baverez writes: “Thomas Piketty has chosen to place himself under the shadow of (Karl Marx), placing unlimited accumulation of capital in the center of his thinking”.

The point of this particular article is not to knock down Mr. Piketty, nor his lengthy and well-researched tome. Indeed, we should not be so dismissive of Mr. Piketty’s larger message that there appears to be an increasing gap between haves and have-nots, especially in terms of exorbitant CEO pay, stagnant middle-class wages, and a reduced safety net for the poorest Western citizens.

But Piketty appeared to have a solution in mind before he found a problem. He readily admits: “I am in favor of wealth taxation.” When ideology drives any data-driven approach, it becomes just a little easier to discard data, observations and evidence that don’t exactly line up with what you’re trying to prove.

In 1977, statistician John W. Tukey said: “The greatest value of a picture is when it forces us to notice what we never expected to see.” Good science is the search for causes and explanations, sans any dogma, and with a willingness to accept outcomes contrary to our initial hypothesis. If we want true knowledge discovery, there can be no other way.

 

The High Cost of Low Quality IT

In times of tight corporate budgets, everyone wants “a deal.” But there is often a price to be paid for low quality, especially when IT and purchasing managers aren’t comparing apples to apples in terms of technology capability or experienced implementation personnel.  Indeed, focusing on the lowest “negotiated price” is a recipe for vendor and customer re-work, delayed projects, cost overruns and irrecoverable business value.

Courtesy of Flickr. By Rene Schwietzke

Financial Times columnist Michael Skapinker recently lamented the terrible quality of his dress shirts. In years prior, his shirts would last two to three years. As of late, however, his shirts, laundered once a week, now last only three months.

Of course, this equates to a terrible hit to Mr. Skapinker’s clothing budget, not to mention environmental costs in producing, packaging, and discarding sub-standard clothing.  Consumers, Skapinker says, should “start searching out companies that sell more durable clothes. They may cost more, but should prove less expensive in the long run.”

Much as it’s short-sighted to buy low-quality shirts that don’t last very long, it’s also very tempting to select the lowest-cost provider for technology or implementation, especially if that provider meets today’s immediate needs. The mindset is that tomorrow can worry about itself.

This myopic thinking is exacerbated by the rise of the procurement office. Today’s procurement offices are highly motivated by cost control; in fact, some are measured on keeping costs down. This, of course, can be dangerous, because in this model procurement professionals have little to no “skin in the game”: if something goes wrong with the IT implementation, procurement has no exposure to the damage.

Now, to be fair, some procurement offices are more strategic and are involved in IT lifecycle processes. From requirements, to request for proposal, to final sign-off on the deal, procurement works hand-in-hand with IT the entire time. In this model, the procurement department (and IT) wants the best price, of course, but it is also looking for the best long-term value. However, the cost-conscious procurement department seems to be gaining momentum, especially in this era of skimpy corporate budgets.

Ultimately, technology purchases and implementations aren’t like buying widgets. A half-baked solution full of “second choice” technologies may end up being unusable to end users, especially over a prolonged period of time. And cut-rate implementations that are seriously delayed or over budget can translate into lost revenues and/or delayed time to market.

When evaluating information technology (especially for new solutions), make sure to compare specs to specs, technical capabilities to capabilities, and implementation expertise to expertise.

Some questions to consider: Is there a 1:1 match in each vendor’s technologies? Will the technical solution implemented today scale for business user needs next year or in three years? What does the technology support model look like, and what are initial versus long term costs? Is the actual vendor supporting the product or have they outsourced support to a third party?

For the implementation vendor, make sure to evaluate personnel, service experience, customer references, methodologies, and overall capabilities. Also be wary of low service prices, as some vendors arrive at cut rates by dumping a school bus of college graduates on your project (who, of course, then learn on your dime!). The more complex your project, the more you should be concerned with hiring experienced service companies.

A discounted price may initially look like a bargain, but there’s a cost to quality. If you’re sold on a particular (higher-priced) technology or implementation vendor, don’t let procurement talk you out of it. And if you cannot answer the questions listed above with confidence, it’s likely that the bargain price you’re offered by technology or implementation vendor X is really no bargain at all.

 

Be Wary of the Science of Hiring

Like it or not, “people analytics” are here to stay. But that doesn’t mean companies should put all their eggs in one basket and turn hiring and people management over to the algorithms. In fact, while reliance on experience/intuition to hire “the right person” is rife with biases, there’s also danger in over-reliance on HR analytics to find and cultivate the ultimate workforce.

Courtesy of Flickr. By coryccreamer

The human workforce appears to be ripe with promise for analytics. After all, if companies can figure out a better way to measure the potential “fit” of employees to various roles and responsibilities, the subsequent productivity improvements could be worth millions of dollars. In this vein, HR analytics is the latest rage, with algorithms combing through mountains of workforce data to identify the best candidates and predict which ones will have lasting success.

According to an article in Atlantic Magazine, efforts to quantify and measure the right factors in hiring and development have existed since the 1950s. Employers administered tests for IQ, math, vocabulary, vocational interest and personality to find key criteria that would help them acquire and maintain a vibrant workforce. However, with the Civil Rights Act of 1964, some of those practices were pushed aside due to possible bias in test formulation and administration.

Enter “Big Data.” Today, data scarcity is no longer the norm. In actuality, there’s an abundance of data on candidates, who are either eager to supply them or ignorant of the digital footprint they’ve left since leaving elementary school. And while personality tests are no longer in vogue, new types of applicant “tests” have emerged in which applicants are encouraged to play games, set in online dungeons or fictitious dining establishments, that watch and measure how they solve problems and navigate obstacles.

Capturing “Big Data” seems to be the least of the challenges in workforce analytics. The larger issues are identifying key criteria for what makes a successful employee, and discerning how those criteria relate and interplay with each other. For example, let’s say you’ve stumbled upon nirvana and found two key criteria for employee longevity. Hire for those criteria and you may have more loyal employees, but you still need to account and screen for “aptitude, skills, personal history, psychological stability, discretion,” work ethic and more. And how does one weight these criteria in a hiring model?
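One common answer is to let a statistical model learn the weights from historical outcomes. The sketch below uses logistic regression on synthetic data purely to show the mechanics; the features, the “stayed two years” label and every number are invented, and scikit-learn is assumed to be available.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Synthetic candidate data: every column and label is invented for illustration.
n = 500
X = np.column_stack([
    rng.normal(size=n),          # aptitude test score (standardized)
    rng.normal(size=n),          # skills assessment (standardized)
    rng.integers(0, 2, size=n),  # referred by a current employee (0/1)
])
# Fake "stayed at least two years" outcome, loosely tied to the first and third features.
y = (0.8 * X[:, 0] + 0.3 * X[:, 2] + rng.normal(size=n) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# The learned coefficients are the "weights" -- but they only reflect the criteria
# you happened to measure, on the population you happened to sample.
for name, coef in zip(["aptitude", "skills", "referral"], model.coef_[0]):
    print(f"{name:>9}: {coef:+.2f}")
```

Notice that such a model can only weight what is in the data: criteria you never measured get an implicit weight of zero.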

Next, presuming you’ve developed a reliable analytic model, it’s important to determine under which circumstances the model works. In other words, does a model that works for hiring hamburger flippers in New York also work for the same role in Wichita, Kansas? Does seasonality play a role? Does weather? Does the size of the company matter, or the prestige of its brand? Does the model work in economic recessions and expansions? As you can see, discovering all the relevant attributes for “hiring the right person” in a given industry, much less a given role, and then weighting them appropriately is a challenge for the ages.

Worse, once your company has a working analytic model for human resource management, it’s important not to let it completely substitute for subjective judgment. For example, in the Atlantic Magazine article, a high-tech recruiting manager lamented: “Some of our hiring managers don’t even want to interview anymore, they just want to hire the people with the highest scores.” It probably goes without saying, but this is surely a recipe for hiring disaster.

While HR analytics seems to have room to run, there’s still the outstanding question of whether “the numbers” matter at all in hiring the right person. For instance, Philadelphia Eagles coach Chip Kelly was recently asked why he hired his current defensive coordinator, who had less-than-stellar numbers in his last stint with the Arizona Cardinals.

Chip Kelly responded: “I think people get so caught up in statistics that sometimes it’s baffling to me. You may look at a guy and say, ‘Well, they were in the bottom of the league defensively.’ Well, they had 13 starters out. They should be at the bottom of the league defensively.”

He continued: “I hired [former Oregon offensive coordinator and current Oregon head coach] Mark Helfrich as our offensive coordinator when I was at the University of Oregon. Their numbers were not great at Colorado. But you sit down and talk football with Helf for about 10 minutes. He’s a pretty sharp guy and really brought a lot to the table, and he’s done an outstanding job.”

Efficient data capture, data quality, proper algorithmic development and spurious correlations lurking in too much big data are just a few of the problems yet to be solved in HR analytics. However, that won’t stop the data scientists from trying. Ultimately, the best hires won’t come exclusively from HR analytics; the analytics will be paired with executive (subjective) judgment to find the ideal candidate for a given role. In the meantime, buckle your seatbelt for much more use of HR analytics. It’s going to be a bumpy ride.

 

Text Analytics for Tracking Executive Hubris?

The next audacious “off the cuff” statement your CEO makes could tank your company’s stock price in minutes. That’s because machines are increasingly analyzing press conferences, earnings calls and more for “linguistic biomarkers” and possibly placing sell orders accordingly. Indeed, with technology’s ability to mine the speech patterns of corporate, political, and social leaders, the old proverb “A careless talker destroys himself” rings truer than ever.

Courtesy of Flickr. By NS Newsflash.

Financial Times columnist Gillian Tett writes about how researchers are starting to analyze corporate and political speech for signs of hubris. By analyzing historical speeches alongside those of today’s leaders, researchers are identifying “markers of hubris” that suggest a particular leader may be getting a bit too full of his or her own accomplishments.

Such communications, according to experts in Tett’s article, increasingly feature words such as “I”, “me” and “sure”, tell-tale signs of leaders believing their own hype. Consequently, if such “markers of hubris” can be reliably identified, they could indicate to stakeholders that it’s time to take a course of action (e.g., liquidating a given stock position).

Now, as you can imagine, there are challenges with this approach. The first difficulty is identifying which linguistic markers equate to hubris—an admittedly subjective process. The second challenge is establishing hubris as a negative trait. In other words, should increasing hubris and/or narcissism mean that the executive has lost touch with reality? Or that he or she is incapable of driving even better results for their company, agency or government? Surely, the jury is still out on these questions.

Today’s technology has made endeavors such as text mining of executive, political and other communications much more feasible en masse. Streaming technologies can enable near-real-time analysis, map-reduce-style operators can be used for word counts and text analysis, and off-the-shelf sentiment applications can discern meaning and intent on a line-by-line basis.
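At its simplest, the word-count side of this analysis is just measuring how heavily a speaker leans on first-person and certainty language. The sketch below is a minimal, hypothetical version: the marker list is drawn loosely from the examples in Tett’s article, the two sample remarks are invented, and a real study would validate its markers against labeled speeches and layer proper sentiment analysis on top.

```python
import re
from collections import Counter

# Hypothetical "markers of hubris"; a real study would validate this list.
HUBRIS_MARKERS = {"i", "me", "my", "sure", "certainly"}

def hubris_score(text: str) -> float:
    """Share of words that are first-person or certainty markers."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    marker_hits = sum(counts[w] for w in HUBRIS_MARKERS)
    return marker_hits / max(len(words), 1)

# Invented before-and-after remarks from the same (fictional) executive.
early_remark = "We delivered these results together, and the team deserves the credit."
later_remark = "I knew I was right. I am sure my strategy is what saved this company."

print(f"early: {hubris_score(early_remark):.1%}")
print(f"later: {hubris_score(later_remark):.1%}")
```

Run over years of transcripts rather than two sentences, a rising score is the kind of trend the researchers in Tett’s piece are looking for; the hard part, as discussed below, is deciding what such a trend means and what to do about it.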

When computers are tuned to pore through executive speeches, corporate communications, press interviews and more, such analysis could ultimately indicate whether a company is prone to “excessive optimism”, and help investors and other stakeholders “punch through the hype” of corporate speak. To the naked ear, the speech patterns of executives, politicians and other global players probably change little over time. However, if data scientists are able to run current and past communications through text analytics processes, interesting patterns may emerge that could be actionable.

The largest challenge in analyzing executive hubris doesn’t appear to be standing up a technology infrastructure for analytics, especially when cloud based solutions are available. Nor does the actual sentiment analysis seem to be the sticking point, because with enough data scientists, such algorithms can be tweaked for accuracy over time.

The ultimate difficulty is deciding what, if anything, to do when analyses of a leader’s speech patterns reveal a pattern of hubris. As an employee, does this mean it’s time to look for another job? As an investor, does this mean it’s time to sell? As a citizen, does this mean it’s time to pressure the government to change its course of action—and if so, how? All good questions for which there are few clear answers.

Regardless, with computers reading the news, it’s more important than ever for leaders of all stripes to be cognizant that stakeholders are watching and acting on their words—often in near real time.

Writer Gillian Tett says that we humans “instinctively know, but often forget, that power not only goes to the head, but also to the tongue.” With this in mind, leaders in political, business and social circles need to understand that when it comes to signs of arrogance, we’ll not only be watching and listening, but counting too.

Science Needs Less Certainty

A disturbing trend is afoot, where key topics in science are increasingly considered beyond debate—or, in other words, settled. However, good science isn’t without question, discovery and even a bit of “humility”—something that scientists of all stripes (chemists, mathematicians, physicists and, yes, even data scientists) should remember.

Courtesy of Flickr. By epSos.de

Recently, the online site for Popular Science discontinued its online comments for certain topics. The reasoning for such a policy was clear, according to an editor: “A politically motivated, decades-long war on expertise has eroded the popular consensus on a wide variety of scientifically validated topics. Everything, from evolution to the origins of climate change, is mistakenly up for grabs again. Scientific certainty is just another thing for two people to “debate” on television.”

Thus, it was clear that because the science behind a smattering of topics was settled, there was no need for further debate. Instead, the magazine promised to open comments on “select articles that lend themselves to vigorous and intelligent discussion.”

Now one can hardly blame Popular Science. Commenting online has been out of hand for some time, especially when denizens of the internet choose character assassination and cheap shots to prove a point. And to be sure, instead of enlightened discussion, sometimes comment sections devolve to least common denominator thinking.

That said, Popular Science couldn’t be any more wrong. Last I checked, good science was all about hypothesizing, testing, discovery and repeatability. It was about debate on fresh and ancient ideas, with an understanding that there was little certitude and more probabilities in play, especially because the world around us is constantly changing. We’re learning more, discovering more, and changing our theories to reflect the latest evidence. We’re testing ideas, failing fast and moving on to the next experiment. And things we believe to be true today are sometimes proven either less true or completely false tomorrow.

However, it disturbs me to see debate cut off—on any topic—because we know the facts and the numbers prove them true. Facts change—as Christopher Columbus would attest, were he alive today. And worse, we have scientists who disparage others because 97% of “the collective” agree on a given topic. As if consensus determined what is true.

The blogger Epicurean Dealmaker laments on the same topic: “The undeniable strength of science as a domain of human thought is that it embeds skepticism…science is not science if it does not consist of theorems and hypotheses which are only—always and forever more—taken as potentially true until they are proven otherwise. And science itself declares its ambition to constantly test and retest its theories and assumptions for completeness, accuracy, and truth, even if this happens more often in theory than in fact.”

As we travel down the path of the next big thing, the transformation of multiple disciplines (business, medicine, artificial intelligence and more) with “Big Data,” let us not forget that in a complex world, while our analysis and numbers prove one thing today, they may be woefully inadequate for tomorrow’s challenges.

So let’s encourage debate, discussion, testing and re-testing of theories and experimentation using data and analytic platforms to learn more about our customers, our companies and ourselves. And don’t shut off debate because everyone agrees—chances are they do not. The old adage, ‘conflict creates’ holds true, whether in the chemistry or data lab. The future of our companies, economies and societies depends on it.

The Dirty (Not so Secret) Secret of IT Budgets

Some business users believe that every year IT is handed a budget, which is then fully used to drive new and productive ways to enable more business. This is, however, far from reality. In fact, in most instances the lion’s share of the IT budget is dedicated to supporting legacy code and systems. With so few precious IT dollars left to support experimentation with new technologies, it’s easy to see why pay-per-use cloud computing options are so alluring.

Courtesy of Flickr. By Val.Pearl

There’s an interesting story in the Financial Times, where author Gillian Tett discusses how, in Western economies, most of the dollars lent by banks go to supporting existing assets, not innovation. The problem is highlighted by former UK regulator Adair Turner, who says that of every dollar in credit “created” by UK banks, only 15% of financial flows go into “productive investment”. The other 85% goes to supporting “existing corporate assets, real estate and unsecured personal finance.”

Essentially, fewer dollars are lent by banks for innovative projects, startups, and new construction, with most of the money dedicated to maintaining existing assets. Sounds a lot like today’s IT budgets.

As evidence, a Capgemini report notes: “Most organizations do not have a clear strategy for retiring legacy applications and continue to spend up to three quarters of their IT budgets just ‘keeping the lights on’ – supporting outdated, redundant and sometimes entirely obsolete systems.” If this “75%” statistic is still in fashion, and there is evidence that it’s accurate, it leaves very little funding for high-potential projects like mobile, analytics, and designing new algorithms that solve business problems.

Here’s where cloud computing can make a difference.  Cloud computing infrastructures, platforms and applications often allow users to try before they buy with very little risk. Users can test applications, explore new application functions and features, experiment with data, and stand up analytic capabilities with much less fuss than traditional IT routes. Best of all, much of this “experimentation” can be funded with operating budgets instead of going through the painful process of asking the CFO for a CAPEX check.

Speaking of innovation, the real value of cloud isn’t just the fact that information infrastructure is ready and scalable, but what you can use it for. Take, for example, the use of cloud-based analytics to drive business value: sniffing out fraud in banking and online payment systems, exploring relationships between customers and products, optimizing advertising spend, analyzing warranty data to produce higher-quality products, and many more types of analyses.

These kinds of analytics stretch far beyond the mundane “keeping the lights on” mindset that IT is sometimes associated with, and can really show line-of-business owners that IT is not just a “game manager” but a “play-maker”.

Fortunately, the modernization of legacy systems is a top priority for CIOs. But much like turning an aircraft carrier, it’s going to take time to make a complete switch from maintaining legacy systems to supporting innovative use cases.

But in the meantime, with cloud, there’s no need to wait to experiment with the latest technologies and/or try before you buy. Cloud infrastructures, platforms and applications are waiting for you. The better question is, are you ready to take advantage of them?
