In Big Data Endeavors, Don’t Neglect Softer Business Skills

With technical skills such as Java, C++, Python and more in high demand for “Big Data” analytics, it seems like softer business skills such as speaking, writing, planning, leadership and negotiation are falling by the wayside. But the ability to communicate, relate and navigate throughout an organization (the so-called “softer skills”) is especially needed to propagate analysis and communicate the impact of data-driven decision-making.

Courtesy of Flickr. By coryccreamer

In 2012, cloud computing blogger David Linthicum penned a short piece explaining “3 Winners and 3 Losers in the Move to Big Data”.  In the post Linthicum identified one “loser” as data warehouse and BI specialists, presumably because these folks were accustomed to using languages like old-school SQL and supporting “legacy BI” systems.

It’s interesting that as we find ourselves nearing mid-2013, those “legacy” skills of writing for and supporting various BI tools and relational databases are not going away. In fact, the opposite seems true as open source programmers seek more ways to make projects SQL-like to access various distributed file systems, NoSQL and NewSQL data stores. And while the development of SQL-like interfaces helps the business analyst utilize some of these newer platforms, business skills seem to get short shrift in the equation of making an analytics program a success.

It appears the burgeoning role of “data scientist” intends to bridge the gap between technical skills and business acumen.  An IBM blog states that while the formal training of a data scientist should include an understanding of computer science, applications and ability to write in various languages, they also need to have business smarts.  Thus the data scientist role must marry technical skills with “the ability to communicate findings to both business and IT leaders in a way that can influence how an organization approaches a business challenge.”

It would seem that bridging the technical and business acumen gap with the data scientist role is an excellent idea. However, as many articles on this site point out, data scientists are in high demand and can cost an organization a pretty penny. And at this point, there just aren’t that many data scientists available on job boards, or willing to move out of Silicon Valley. So it appears that while there are plenty of employees with technical skills, and line-of-business leaders who understand the inner workings of the enterprise, there’s still a gap that needs bridging. What’s a company to do?

While it’s debatable whether a business analyst can be taught the necessary technical skills to become a data scientist, we can definitely ensure that we don’t neglect softer business skills in the evolution towards a data-driven organization. For example, there are universities that offer classes and executive course work on negotiation, communication and selling skills. In addition, there are programs available such as Toastmasters that can teach leadership and public speaking skills.

Need help writing? Your local university likely has coursework and workshops to improve business writing for proposals, sales briefs, whitepapers and more. Finally, too few employees are practiced in “critical thinking”, the ability to conceptualize, analyze and then evaluate various streams of information. Coursework from universities across the globe can also assist in this area.

What say you? Are better business skills needed for analytics professionals? If so, what are those skills? Finally, how would you recommend developing an action plan to “perform a business skills upgrade”?

Technologies and Analyses in CBS’ Person of Interest

Person of Interest is a broadcast television show on CBS where a “machine” predicts a person most likely to die within 24-48 hours. Then, it’s up to a mercenary and a data scientist to find that person and help them escape their fate. A straightforward plot really, but not so simple in terms of the technologies and analyses behind the scenes that could make a modern-day prediction machine a reality. I have taken the liberty of framing some components that could be part of such a project. Can you help discover more?

In Person of Interest, “the machine” delivers either a single name or group of names predicted to meet an untimely death. However, in order to predict such an event, the machine must collect and analyze reams of big data and then produce a result set, which is then delivered to “Harold” (the computer scientist).

In real life, such an effort would be a massive undertaking for a single state or city, let alone on a national basis. However, let’s set aside the enormity, and the plausibility, of such a scenario and instead see if we can identify various technologies and analyses that could make a modern-day “Person of Interest” a reality.

It is useful to think of this analytics challenge in terms of a framework: data sources, data acquisition, data repository, data access and analysis and finally, delivery channels.

First, let’s start with data sources. In Person of Interest, the “machine” collects data from various sources: cameras (images, audio and video), call detail records, voice (landline and mobile), GPS for location data, sensor networks, and text sources (social media, web logs, newspapers, the internet, etc.). Data sets stored in relational databases, both publicly available and not, might also be used for predictive purposes.

Next, data must be assimilated or acquired into a data management repository (most likely a multi-petabyte bank of computer servers). If data are acquired in near real time, they may go into a data warehouse and/or Hadoop cluster (maybe cloud based) for analysis and mining purposes. If data are analyzed in real time, it’s possible that complex event processing technologies (i.e. streams in memory) are used to analyze data “on the fly” and make instant decisions.
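To make the “on the fly” idea a little more concrete, here is a minimal sketch of in-memory stream analysis: a sliding time window that flags a key once it shows up often enough within a short span. This is only an illustration. The class name, the `detect` helper, the window length and the threshold are all hypothetical choices, events are assumed to arrive in timestamp order, and a real complex event processing engine would do far more.

```python
from collections import defaultdict, deque


class SlidingWindowDetector:
    """Flags a key once it appears `threshold` times within `window_seconds`.

    Hypothetical helper for illustration; assumes events arrive in
    non-decreasing timestamp order.
    """

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        # key -> timestamps that are still inside the current window
        self._events = defaultdict(deque)

    def observe(self, key, timestamp):
        """Record one event; return True if the key has reached the threshold."""
        window = self._events[key]
        window.append(timestamp)
        # Evict timestamps that have fallen out of the window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) >= self.threshold


def detect(events, window_seconds=60, threshold=3):
    """Return the (key, timestamp) events at which an alert would fire."""
    detector = SlidingWindowDetector(window_seconds, threshold)
    return [(key, ts) for key, ts in events if detector.observe(key, ts)]
```

For example, `detect([("a", 0), ("a", 10), ("b", 15), ("a", 20), ("a", 200)])` fires once, on the third “a” event inside the 60-second window. Because only the timestamps still inside the window are kept, memory stays bounded however long the stream runs.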

Analysis can be done at various points—during data streaming (CEP), in the data warehouse after data ingest (which could be in just a few minutes), or in Hadoop (batch processed).  Along the way, various algorithms may be running which perform functions such as:

  • Pattern analysis – recognizing and matching voice, video, graphics, or other multi-structured data types. Could be mining both structured and multi-structured data sets.
  • Social network (graph) analysis – analyzing nodes and links between persons. Possibly using call detail records, web data (Facebook, Twitter, LinkedIn and more).
  • Sentiment analysis – scanning text to reveal meaning, as when someone says, “I’d kill for that job”. Do they really mean they would murder someone, or is this just a figure of speech?
  • Path analysis – what are the most frequent steps, paths and/or destinations by those predicted to be in danger?
  • Affinity analysis – if person X is in a dangerous situation, how many others just like him/her are also in a similar predicament?
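As one illustration of the social network (graph) analysis listed above, the sketch below builds an undirected graph from call detail records and finds everyone within a given number of “hops” of a person. It is a simplified assumption of how such an analysis might start, not a description of any real system. The record format (caller, callee pairs) and the function names are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(call_records):
    """Build an undirected contact graph from (caller, callee) pairs."""
    graph = defaultdict(set)
    for caller, callee in call_records:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph


def contacts_within(graph, start, max_hops):
    """Breadth-first search: map each reachable person to their hop distance."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distances[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(person, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[person] + 1
                queue.append(neighbor)
    del distances[start]  # the starting person is not their own contact
    return distances
```

Given the records `[("ann", "bob"), ("bob", "cy"), ("cy", "dee")]`, calling `contacts_within(build_graph(records), "ann", 2)` reaches “bob” at one hop and “cy” at two, and stops before “dee”. Breadth-first search fits here because it visits people in order of distance, so the hop limit is easy to enforce.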

It’s also possible that an access layer is needed for BI types of reporting, dashboard, or visualization techniques.

Finally, the result set (in this case, the name of the person “the machine” predicts is most likely to be killed in the next twenty-four hours) could be delivered to a device in the field, such as a mobile phone, tablet or computer terminal.

These are just some of the technologies that would be necessary to make a “real life” prediction machine possible, just like in CBS’ Person of Interest. And I haven’t even discussed networking technologies (internet, intranet, compute fabric etc.), or middleware that would also fit in the equation.

What technologies are missing? What types of analysis are also plausible to bring Person of Interest to life? What’s on the list that should not be? Let’s see if we can solve the puzzle together!

Where is the Cloud?

When the term “cloud computing” comes to mind, it’s fair to say that most people think of it as some nebulous group of computers in the sky delivering content to mobile devices and workstations whenever it’s required. How far off is that definition, and where exactly is “the cloud”?

Courtesy of Flickr. By M Hooper

In a dusty corner of San Antonio, Texas, the cloud is about to come to life. As a Microsoft corporate VP takes a shovel and firmly plants it into the soil, she proclaims, “The cloud is not the cloud in the sky, it’s what we are about to break ground on (right here).”*

That’s because San Antonio, Prineville (OR), Quincy (Wash), among many other cities across the globe, are now host to massive data centers filled with tens of thousands of blinking computers owned by Microsoft, Rackspace, Facebook, Amazon and others.

Imagine this: racks upon racks of Intel based servers. Multi-colored wires networked from computer to computer. Huge vaults of pipes for cooling and air-conditioning massive computer farms. A few sleepy network engineers scurrying from machine to machine checking connections. Is this the cloud?

Thomas M. Koulopoulos, author of “Cloud Surfing”, says that’s part of the story. “(The cloud) is a heavily monitored, fortified, and secure array of computers that are built with the objective of securing data with multiple layers of physical and cyber security,” he says. But those asking ‘where’s the cloud’ aren’t asking the right question, Koulopoulos argues. “This is sort of like asking, where does electricity exist?”

That’s because cloud computing is much more than the device in your hand streaming music, the corporate dashboard on your wirelessly connected tablet, or even megawatt powered data centers.

Instead, think of cloud as a service of computational power, storage and more, much like the service you’d get from a utility company. The cloud allows you to plug into a required capability—whether it’s for print servers or analytics.

The cloud is typically available on a metered basis when demanded, and can be accessed via self-service methods—simply plug in via a portal and access what you need. And it’s delivered via a host of technologies, software, processes, devices and physical locations that power this “service”.

Thomas M. Koulopoulos asserts that where the cloud physically exists doesn’t matter; “What counts instead is the question, ‘is it there when I need it?’” he says.  For people like me, this is too much of a utilitarian approach.  I want to know “the where” of cloud computing.

Coming back to the original question then, the cloud exists in your connected handheld device, on your laptop, in your data center, in another company’s data center, across millions of miles of fiber optic cables, and wirelessly in the air. The cloud, then, really isn’t just a place; it’s more of a system, a massive investment in people, dollars, infrastructure, time and talent.

So where is the cloud? The answer is places seen and unseen. In short – everywhere.

*as told in “The Shadow Factory” by James Bamford.

In the Future, Will Software Be More Important than Hardware?

From the talent wars going on in Silicon Valley for software engineers, to the hundreds of thousands of new smartphone applications coming online, it’s not far-fetched to believe that software rules the world today and will continue to rule in the future. However, some hardware makers strongly disagree, arguing that it’s the physical design, construction and production of the device, machine or infrastructure that will take precedence. Who holds the future: hardware makers, software makers, or both?

Flickr for android, courtesy of Flickr.

A Financial Times article by Andrew Keen highlights a brewing battle between hardware and software makers for investor dollars. Both sides believe that they are the smarter investment for the long run. And both have a point.

First, it’s tempting to see hardware manufacturing as nothing more than something that should be outsourced. After all, companies such as Amazon source the production of the Kindle to offshore manufacturers, and it’s commonly understood that most large computer companies leave production of machines to Chinese/Taiwanese contract manufacturers such as Flextronics, FoxConn and others.

However, increasingly companies such as CPU manufacturers and tablet makers are taking some of these manufacturing capabilities in-house, especially as product complexity increases and integration between software and hardware becomes more commonplace.

In addition, taking manufacturing capabilities in-house means less bureaucracy in terms of working with an outsourced vendor, arguably higher accountability (no one else to blame for failures), and more control over manufacturing processes. Net-net, in many cases the higher a product moves up the value chain in terms of complexity and integration, the more it makes sense for companies to assert authority, control and accountability for manufacturing operations, sometimes all the way to the point of assuming full responsibility for hardware production.

The counterargument, however, is that hardware will always be a commodity. Designs and specs can be written so that just about any respectable contract manufacturer can produce a product. The real value, say software makers, lies in everything from the design of user interfaces to the behind-the-scenes algorithms responsible for executing complex processes.

Proof points for the “software will rule” camp include software companies gaining a bigger slice of VC funding, and the number of applications developed for iPhone (650k) and Android (400k). For further reading on this perspective, review VC and market maker Marc Andreessen’s comments.

Ultimately, the most likely answer of who will win the future (hardware vs. software) is that there’s a place for both camps. For example, it’s the integration of commodity hardware with advanced software that seems to be the best fit for many companies looking to acquire analytics capabilities.

This is evidenced by the data warehouse appliance trend of an engineered and integrated solution stack of hardware and software coupled with services for implementation, maintenance and operations. These solution stacks are architected, performance tested, certified and supported. And they usually come from a single vendor responsible for the entire end-to-end package.

In the meantime, we have a strong debate. VCs like Marc Andreessen say software companies are primed to “take over large swathes of the economy”. Hardware makers claim the user experience in terms of design, touch and feel is more relevant than ever. What say you?

Counteracting Our Obsession with Speed

In the quest to get as close as possible to the speed of light for faster decision making, it appears some companies are moving too fast and thus making very costly mistakes. When windows of time are compressed to near zero, there’s no recovery time for critical errors.  In fact, for some decisions (especially those of a strategic nature) it’s much better to take it slow.

Courtesy of Flickr. By Theseanster93

In baseball, scouts love to find pitchers who can throw “the heater”. Prospects who can throw near 100 mph a few times a game are coveted over those who can rarely top 90. The mantra for pitchers now is, “throw it faster and see if hitters can keep up”.

However, there’s a renewed interest in knuckleballers, those pitchers who throw a pitch with little to no spin. For these pitchers, the ball is supposed to “dance” on its way to home plate at speeds of 60-70 miles per hour. For hitters, a dancing knuckleball is extremely tough: it’s hard to track the lively movement of the pitch, much less adjust to the low speed at which it’s thrown.

Why the revived interest in knuckleball pitchers? Sports Illustrated writer Phil Taylor says, “(In baseball) we need the knuckleball to help counteract the obsession with speed, to prove there still is a place for nuance and skill.”

Phil Taylor has it exactly right, in baseball and in the business world.

Our world is obsessed with speed. Faster food, hurry-up offenses in football, faster computers, and even faster war-making. As I have detailed before, it’s everything, only faster.

But sometimes moving too fast is dangerous. There are some decisions that should not be made too quickly, especially those that could benefit from more data collection, or decisions where there is ambiguity and complexity. United States President Barack Obama said in a Vanity Fair article, “Nothing comes to my desk that is perfectly solvable…so you wind up dealing with probabilities, and any given decision you make you’ll wind up with 30-40% chance that it isn’t going to work.”

Even when speed is deemed a competitive advantage, sometimes faster isn’t better. For example, Knight Capital Group lost $440 million when a “technology malfunction” launched erroneous trades on its behalf. Trading at near the speed of light, there simply wasn’t enough time to recover from the initial errors, leaving Knight with a loss of nearly half a billion dollars in the span of just 45 minutes.

The need for speed comes at a price of compressed decision making windows and non-recoverability for critical errors. Worse, when errors from a few players cascade through complex systems, the feedback effects can severely damage all participants in the ecosystem. It’s as if the butterfly flapping its wings really does bring about category four hurricanes.

Not every decision needs to be made faster. There will always be a place for decisions made with “skill and nuance”, where it’s important to slow down, see the bigger picture, and adjust our swing and timing for the occasional erratic knuckleball thrown our way.