How Much Big Data is Too Much?

With storage costs plummeting and sophisticated software available for mining Big Data, it is increasingly cost-effective for corporations and governments to keep all types of data, even those previously discarded. However, how much “Big Data” should corporations, entities and governments keep online or archived, especially when “Right to Be Forgotten” debates are swirling?


Like it or not, all kinds of data are captured every day. James Gleick sums it up nicely in “The Information”:

“The information produced and consumed by humankind used to vanish—that was the norm, the default. The sights, the sounds, the spoken word just melted away. Now the expectations have inverted. Everything may be recorded and preserved at least potentially; every musical performance, every crime, elevator, city street, every volcano or tsunami on the remotest shore…”

With petabytes of storage and virtual machines available in the cloud on a pay-per-use basis, and on-premises storage costs dropping like a rock, it’s conceivable for companies and governments to keep every image, video, recording, keystroke, and web-generated data type. Of course, all these data are of little use without techniques to mine them and discover information. Fortunately, BI and data warehousing technologies have worked wonders over the past thirty to forty years for data that needs to be organized, and we have MapReduce/Hadoop to assist in assembling and analyzing an organized data garbage dump.

There are two consequences of this data deluge.

For individuals, there is the feeling of drowning in a sea of data that is difficult to manage, much less scrutinize. Novelist David Foster Wallace coined the term “Total Noise” for this feeling of being submerged in too many tweets, posts, phone calls, podcasts and more. And because this total noise causes “information anxiety” for some, plenty of people are deleting their social media accounts.

There is a second consequence of this data deluge. Since everything that can be captured is in the process of being captured, there are certainly privacy and security concerns. Our likes, rants, passions and partialities are recorded online and archived offline in perpetuity. These concerns have prompted proposed privacy legislation such as the EU’s “Right to Be Forgotten,” under which digital providers, upon request, would need to cull digital references belonging to individuals.

These consequences raise the question: how much Big Data is too much? What should be kept for corporate reasons (to serve customers better, sell more products, optimize business processes, etc.)? What should be kept for governmental concerns (tracking bank flows for money laundering, watching for potential terrorist activity, monitoring fringe groups that don’t see eye to eye with government officials)? And with pending legislation such as “Right to Be Forgotten” under consideration in statehouses across the world, is keeping all this Big Data more hassle than it’s worth, especially if there are financial penalties for failing to comply?

 

Data Tracking for Asthma Sufferers?

Despite the recent privacy row over smartphones and other GPS-enabled devices, a Wisconsin doctor is proposing the use of an inhaler with a built-in global positioning system to track where and when asthma sufferers use their medication. By capturing data on inhaler usage, the doctor proposes that asthma sufferers can learn more about what triggers an attack and the medical community can learn more about this chronic condition. However, the use of such a device has privacy implications that need serious consideration.

For millions of people worldwide, asthma is no joke. An April 9, 2011 Economist article mentions that asthma affects more than 300 million people, almost 5% of the world’s population.

Scientists and the medical community have long pondered the question: ‘What triggers an asthma attack?’ Is it pollen, dust in the air, mold spores or other environmental factors? The answer matters not only to asthma sufferers themselves, but also to society (and healthcare costs), as there are more than 500,000 asthma-related hospital admissions every year.

In an effort to better understand the factors behind asthma attacks, Dr. David Van Sickle co-founded a company that makes an inhaler with GPS to track usage. Van Sickle once worked for the Centers for Disease Control and Prevention (CDC), and he believes that with better data society can understand asthma more deeply. By capturing data on asthma inhaler usage and then plotting the results with visualization tools, Van Sickle hopes that this information can be sent back to primary care physicians to help patients understand asthma triggers.

A better understanding of asthma makes sense for patients, health insurers and society at large. The Economist article notes that pilot studies of the device have already called some basic understandings of asthma into question. However, there are surely privacy implications in the capture, management and use of this data, despite reassurances from the medical community that data will be anonymized and secured.

Should societal and patient benefits outweigh privacy concerns when it comes to tracking asthma patients? What do you think? I’d love to hear from you.