NSA and the Future of Big Data


The National Security Agency of the United States (NSA) has seen the future of Big Data and it doesn’t look pretty. Data volumes are growing faster than the NSA can store them, much less analyze them. If the NSA, with hundreds of millions of dollars to spend on analytics, is challenged, it raises the question: “Is there any hope for your particular company?”

Courtesy of Flickr. By One Lost Penguin

By now, most IT industry analysts accept that the term “Big Data” covers much more than data volumes increasing at an exponential clip. There’s also velocity, or the speed at which data are created, ingested and analyzed. And of course, there’s variety in terms of multi-structured data types including web logs, text, social media, machine data and more.

But let’s get back to data volumes. A commonly referenced report conducted by IDC mentions data volumes are more than doubling every two years. Now that’s exponential growth that Professor Albert Bartlett can appreciate!
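To make the doubling claim concrete, here is a minimal sketch of the arithmetic. It assumes a fixed two-year doubling period, which is a simplification: the report says volumes are "more than" doubling, so these figures are a lower bound. The function names are illustrative only.

```python
import math


def growth_factor(years, doubling_period_years=2.0):
    """Multiplier on data volume after `years`, assuming a constant doubling period."""
    return 2 ** (years / doubling_period_years)


def years_to_multiply(multiple, doubling_period_years=2.0):
    """Years until data volume reaches `multiple` times today's volume."""
    return doubling_period_years * math.log2(multiple)


# Ten years of doubling every two years is five doublings.
print(growth_factor(10))        # 32.0
# A store sized at 1,000x today's volume is outgrown in about two decades.
print(round(years_to_multiply(1000), 1))
```

Under these assumptions, storage bought with a comfortable tenfold margin is full in under seven years, which is why buying capacity alone never catches up with the flood.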

What are the consequences of unwieldy data volumes? For starters, it’s nearly impossible to effectively deal with the flood.

In “The Shadow Factory”, James Bamford describes how the NSA is vigorously constructing data centers in remote and not so remote locations to properly store the “flood of data” captured from foreign communications including video, voice, text and spreadsheets. One NSA director is quoted as saying: “Some intelligence data sources grow at a rate of four petabytes per month now…and the rate of growth is increasing!”

Building data centers and storing petabytes of data isn’t the end goal. What the NSA really needs is analysis. And in this area the NSA is falling woefully short, but not for lack of trying.

That’s because in addition to the fastest supercomputers from Cray and Fujitsu, the NSA needs programmers who can modify algorithms on the fly to account for new key words that terrorists or other foreign nationals may be using. The NSA also constantly seeks linguists to help translate, document and analyze various foreign languages (something computers struggle with, especially when it comes to discerning sentiment and context).

According to Bamford, the NSA sifts through petabytes of data on a daily basis and yet the flood of data continues unabated.

In summary, for the NSA it appears there are more data to be stored and analyzed than budget to procure more supercomputers, programmers and analytic talent. There’s just too much data and too little “intelligence” to let directors know what patterns, links and relationships are most important. One NSA director says: “We’ve been into the future and we’ve seen the problems of a ‘tidal wave’ of data.”

So if one of the most powerful government agencies in the world is struggling with an exponential flood of big data, is there hope for your company?  For advice, we turn to Bill Franks, Chief Analytics Officer for Teradata.

In a Smart Data Collective article, Mr. Franks says that even though the challenge of Big Data may be initially overwhelming, it pays to eat an elephant a single bite at a time. “People need [to] step back, push the hype from their minds, and think things through,” he says. In other words, don’t stress about going big from day one.

Instead, Franks counsels companies to “start small with big data.” Capture a bit at a time, gain value from your analysis and then collect more, he says. There’s an overwhelming temptation to splurge on hundreds of machines and lots of software to capture and analyze everything. Avoid this route, and instead take the road less traveled: the incremental approach.

The NSA may be drowning in information, but there’s no need to inflict sleepless nights on your IT staff. Think big data but start small. Goodness knows, in terms of data, there will always be plenty more to capture and analyze. The data flood will continue. And from an IT job security perspective, that’s a comforting thought.

Ring in the New Year with New Data Products

For web-based businesses, and of course those with a web presence (which is just about everyone), there’s a goldmine of behavioral data accessible with the right tools. The trick is getting past static web analytics reporting (bounce rates, page views, session times, etc.) and going further into unlocking the rich treasure trove of machine data, text and weblogs that create “big data” insight.
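As one illustration of going past static reports, the sketch below counts which page visitors move to next, rather than just how many views each page gets. It assumes access-log lines in the Common Log Format, already in chronological order, and treats each client address as one visitor; the sample lines and the `page_transitions` name are hypothetical.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "METHOD path protocol" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def page_transitions(log_lines):
    """Count (from_page, to_page) moves per visitor, skipping failed or malformed hits."""
    last_page = {}
    transitions = Counter()
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if match is None or match.group("status") != "200":
            continue
        host, path = match.group("host"), match.group("path")
        if host in last_page and last_page[host] != path:
            transitions[(last_page[host], path)] += 1
        last_page[host] = path
    return transitions


# Hypothetical sample data, not taken from any real site.
SAMPLE_LOG = [
    '10.0.0.1 - - [01/Jan/2013:10:00:00 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.1 - - [01/Jan/2013:10:00:05 +0000] "GET /pricing HTTP/1.1" 200 734',
    '10.0.0.2 - - [01/Jan/2013:10:00:07 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2013:10:00:09 +0000] "GET /pricing HTTP/1.1" 200 734',
    '10.0.0.2 - - [01/Jan/2013:10:00:15 +0000] "GET /signup HTTP/1.1" 200 120',
    'not a log line',
]

for (source, target), count in page_transitions(SAMPLE_LOG).most_common():
    print(f"{source} -> {target}: {count}")
```

A real deployment would need proper session handling (cookies, timeouts, bot filtering), but even this crude view shows where visitors head after a landing page, which a page-view count cannot.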

In your business, gigabytes if not terabytes of multi-structured data are likely just waiting to be coupled with your imaginative thinking and analysis to create new data products that ultimately help drive customer interactions and revenues.

For example, take a look at what LinkedIn is doing in creating new “products” with data they collect and analyze with a MapReduce approach and other techniques.

In the recent whitepaper “Building Data Science Teams”, LinkedIn’s former Chief Data Scientist shows how smart thinking can be paired with compute power and huge quantities of multi-structured data to create innovative new products such as:

  • Products that provide personalized content (which makes customers feel products/services are handpicked for them based on their wants/needs)
  • Products that drive the company’s value proposition (For LinkedIn, it’s their “People You May Know” or “Jobs You May Be Interested In” algorithms which drive further customer engagement)
  • Products that facilitate an introduction to other products (to funnel customers into other relevant areas of your website and thus lower your bounce rates)
  • Products that prevent dead ends (e.g. smart algorithms that suggest other potential purchases, such as “People like you also bought…”)
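The last item in the list can be sketched with simple co-purchase counting. This is a minimal illustration, not LinkedIn’s or any vendor’s actual algorithm: it assumes you already have a list of order baskets, and the function names and sample baskets are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_purchase_counts(baskets):
    """For each item, count how often every other item appears in the same basket."""
    counts = defaultdict(Counter)
    for basket in baskets:
        # set() ignores duplicate items within one order.
        for item, other in permutations(set(basket), 2):
            counts[item][other] += 1
    return counts


def also_bought(counts, item, top_n=3):
    """Items most often bought alongside `item`; ties are broken alphabetically."""
    neighbours = counts.get(item, Counter())
    ranked = sorted(neighbours.items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:top_n]]


# Hypothetical order history.
BASKETS = [
    ["laptop", "mouse"],
    ["laptop", "mouse", "bag"],
    ["laptop", "bag"],
    ["mouse", "pad"],
    ["laptop", "mouse"],
]

counts = build_co_purchase_counts(BASKETS)
print(also_bought(counts, "laptop", top_n=2))  # ['mouse', 'bag']
```

Raw counts favor popular items, so a production system would likely normalize by item popularity or use a more sophisticated model. As a first "start small" data product, though, counting is often enough to test whether suggestions move revenue.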

And of course, many of the above “products” are focused on more than nebulous metrics such as “customer engagement”; they can tie directly to revenue improvements.

There are even opportunities to drive news cycles with unlocked insights. Companies such as LinkedIn can use information gleaned from their web server farms to build press releases such as:  “Top Ten Phrases Recruiters Want to See” or “Top Ten Job Growth Areas in the United States”.  These kinds of press releases are interesting to local newspapers, bloggers and media outlets, especially if there’s a unique angle relevant to readers/viewers.

There are plenty of companies turning to outside firms, crowds, and even their own customers for innovation. And there’s certainly nothing wrong with any of these approaches. However, those approaches may be trying too hard, especially when there’s a goldmine waiting to be tapped just a few web servers away.