Privacy Ramifications of IT Infrastructure Everywhere

Most people don’t notice that information technology pervades our daily lives. Granted, some IT infrastructure is out in the open and easy to spot, such as the computer and router on your desk hooked up via network cables. However, plenty of IT infrastructure is nearly invisible, residing in locked network rooms or heavily guarded data centers. And some of it is buried underneath city streets, arrayed on rooftops, or even camouflaged as trees at the local park. Let’s take a closer look at a few ramifications of IT infrastructure everywhere.

Courtesy of Flickr. By Jonathan McIntosh.

1.  Technology is pervasive and commonplace in our daily lives. Little is seen, much is hidden.

Good news: Companies have spent billions of dollars on wired and wireless connections that span cities, countries, and oceans. This connectivity has enabled companies to ship work to lower-cost providers in developing countries, and has allowed certain IT projects to “follow the sun” and thus finish faster. And because IT infrastructure is everywhere, it is that much easier for police forces and governments to identify and prosecute criminals.

Bad news: This same IT infrastructure can also be used to monitor and analyze where and how people gather, what they say, whom they associate with, how they vote, their religious and political views, and more. Closed-circuit TV cameras on street corners (or concealed as mailboxes), ATMs, POS systems, red-light cameras, and drones make up a pervasive and possibly invasive infrastructure that never sleeps. You may be free to assemble; however, IT infrastructure might be watching.

2.  Some information technology is affordable or even “free,” but the true costs may be hidden.

Good news: Google’s G+ and Gmail, Facebook, and Yahoo’s portal and email services are low or no cost for consumers and businesses. In addition, plenty of cloud providers such as Amazon, Google, and Dropbox offer a base level of storage for documents or photos with no upfront hard-dollar cost. On the surface it appears we are getting something for practically nothing.

Bad news: There’s no such thing as a free lunch, as Janet Vertesi, assistant professor of sociology at Princeton, can attest. For months she tried to hide her pregnancy from Big Data, but she realized that Facebook, Google, and other free “services” were watching her every post, email, and interaction in search of ways to advertise and sell her something. While she was not paying a monthly fee for these online services, there was in fact a “cost”: Vertesi was exchanging her online privacy for the ability of advertisers to better target her and serve appropriate advertising.

3. IT infrastructure is expected to be highly available. Smartphones, internet access, and computers are simply expected to work and be immediately available for use.

Good news: With IT infrastructure, high availability (four to five 9’s) is the name of the game; anything less doesn’t cut it. Cloud services from IaaS to SaaS are expected to stay up and running, and phone networks are expected to have enough bandwidth to support our phone calls and web browsing—even at busy sporting events. And for the most part, IT infrastructure delivers time and again, meeting the expectation of consumers and businesses that technology simply works.

Bad news: Not only is IT infrastructure always on, but because of Moore’s Law and the plummeting cost of disk, it never forgets. For example, when disk and tape space was expensive, closed-circuit TVs would record a day’s worth of coverage and then write over it the next day. Now, multiple cameras can record 30 days of surveillance on an 80 GB hard drive. And we haven’t even mentioned offsite or cloud storage, which makes it possible to store audio, video, documents, photos, call detail records, and more—essentially forever. Youthful transgressions can be published for all time, and mistakes today are recorded for years to come. The internet never forgets, unless you live in the European Union.
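To put the storage arithmetic in rough numbers, here is a minimal sketch. The per-camera bitrate below is an assumption (a heavily compressed, low-resolution CCTV stream), not a figure from the article; the point is only the order of magnitude.

```python
# Back-of-the-envelope storage arithmetic for always-on surveillance.
CCTV_BITRATE_KBPS = 64             # assumed per-camera bitrate (not from the article)
SECONDS_PER_DAY = 24 * 60 * 60

def days_of_footage(disk_gb, cameras, bitrate_kbps=CCTV_BITRATE_KBPS):
    """Days of continuous footage that fit on a disk of disk_gb gigabytes."""
    bytes_per_day = cameras * bitrate_kbps * 1000 / 8 * SECONDS_PER_DAY
    return disk_gb * 1e9 / bytes_per_day

# Four cameras on an 80 GB drive at the assumed bitrate: about 29 days,
# in line with the article's "30 days on an 80 GB hard drive."
print(round(days_of_footage(80, cameras=4)))
```

With terabyte drives now commonplace, the same arithmetic yields years of footage, which is exactly why "it never forgets."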

In their book Sorting Things Out, Geoffrey C. Bowker and Susan Leigh Star use the term “infrastructural inversion” for the process of focusing on invisible systems—how they work—and how “people can change this invisibility when necessary.” IT infrastructure is one such system that permeates our daily lives, often unseen but ever so critical to our societies.

There are undoubtedly other ramifications to this unseen IT infrastructure. Here’s hoping you’ll join the conversation with your thoughts!

Private Clouds Are Here to Stay—Especially for Data Warehousing

Some cloud experts proclaim that private clouds “are false clouds,” or that the term was conveniently conjured to support vendor solutions. Other analysts hedge their bets by proclaiming that private clouds are a good solution for the next three to five years, until public clouds mature. I don’t believe it. Private clouds are here to stay (especially for data warehousing)—let me tell you why.

For starters, let’s define public vs. private cloud computing. NIST and others do a pretty good job of defining public clouds and their attributes: they are remote computing services that are typically elastic, scalable, self-service, metered by use, and built on internet technologies. Private clouds, on the other hand, are proprietary and typically sit behind the corporate firewall, and they frequently share most of the characteristics of public clouds.

However, there is one significant difference between the two delivery models: public clouds are usually multi-tenant (i.e., shared with other entities, corporations, or enterprises), while private clouds are typically dedicated to a single enterprise and not shared with other firms. I realize these definitions are not accepted by all cloud experts, but they’re common enough to set a foundation for the rest of the discussion.

With the definition that private clouds equate to a dedicated environment for a single or common enterprise, it’s easy to see why they’ll stick around—especially for data warehousing workloads.

First, there’s the issue of security. No matter how “locked down” or secure a public cloud environment is said to be, there’s always going to be an issue of trust that will need to be overcome by contracts and/or SLAs (and possibly penalties for breaches).  Enterprises will have to trust that their data is safe and secure—especially if they plan on putting their most sensitive data (e.g. HR, financial, portfolio positions, healthcare and more) in the public cloud.

Second, there’s an issue of performance for analytics.  Data warehousing requirements such as high availability, mixed workload management, near real-time data loads and complex query execution are not easily managed or deployed using public cloud computing models. By contrast, private clouds for data warehousing offer higher performance and predictable service levels expected by today’s business users. There are myriad other reasons why public clouds aren’t ideal for data warehousing workloads and analyst Mark Madsen does a great job of explaining them in this whitepaper.

Third, the multi-tenant environment of public cloud computing brings increasing complexity, which will lead to more cloud breakdowns. In a public cloud there are lots of moving pieces and parts interacting with each other (not necessarily in a linear fashion) at any given time. These environments can be complex and tightly coupled, where failures in one area easily cascade to others. For data warehousing customers with high-availability requirements, public clouds have a long way to go. And the almost monthly “cloud breakdown” stories blasted across the internet aren’t helping their cause.

Finally, there’s the issue of control. Corporate IT shops are mostly accustomed to having control over their own IT environments. In flexibly outsourcing some IT capabilities (which is what public cloud computing really is), IT is effectively giving up some or all control over its hardware and possibly software. When there are issues or failures, IT is relegated to opening a trouble ticket and waiting for a third-party provider to remedy the situation (usually within a predefined SLA). In times of harmony and moderation, this approach is all well and good. But when the inevitable hiccup or breakdown happens, it’s a helpless feeling to be at the mercy of another provider.

When embarking on a public cloud computing endeavor, a company or enterprise is effectively tying its fate to another provider for specific IT functions and processes. Key questions to consider are:

  • How much performance do I need?
  • What data do I trust in the cloud?
  • How much control am I willing to give up?
  • How much risk am I willing to accept?
  • Do I trust this provider?

There are many reasons why moving workloads to the public cloud makes sense, and in fact your end-state will likely be a combination of public and private clouds.  But you’ll only want to consider public cloud after you carefully think about the above questions.

And inevitably, once answers to these questions are known, you’ll also conclude that private clouds are here to stay.

 

What’s Next – Predictive “Scores” for Health?

In the United States, health information privacy is protected by the Health Insurance Portability and Accountability Act (HIPAA). However, new gene sequencing technologies now make it feasible to read an individual’s DNA for as little as $1,000 USD. If there is predictive value in reading a person’s gene sequence, what are the implications of this advancement? And will healthcare data privacy laws be enough to protect employees from discrimination?

The Financial Times reports a breakthrough in technology for gene sequencing, where a person’s chemical building blocks can be catalogued—according to one website—for scientific purposes such as exploration of human biology and other complex phenomena. And whereas DNA sequencing was formerly a costly endeavor, the price has dropped from $100 million to just under $1,000 per genome.

These advances are built on the back of Moore’s Law, with computation power doubling every 12 to 18 months, paired with plummeting data storage costs and very sophisticated software for data analysis. And from a predictive analytics perspective, there is quite a bit of power in discovering which medications might work best for a certain patient’s condition based on their genetic profile.
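The price drop alone is striking: going from $100 million to $1,000 per genome is a roughly 100,000-fold decline, or about 16.6 cost halvings. A quick check of that arithmetic:

```python
import math

# How many cost halvings take sequencing from $100M to $1,000 per genome?
start_cost, end_cost = 100_000_000, 1_000
halvings = math.log2(start_cost / end_cost)
print(round(halvings, 1))  # about 16.6 halvings

# At a Moore's-Law-like pace of one halving every 12-18 months, that
# would take roughly 17-25 years; sequencing costs actually fell much
# faster than that once next-generation sequencing arrived.
years_fast, years_slow = halvings * 1.0, halvings * 1.5
print(round(years_fast), "-", round(years_slow), "years at Moore's-Law pace")
```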

However, as Stan Lee’s Spider-Man reminds us, with great power comes great responsibility.

The Financial Times article notes: “Some fear scientific enthusiasm for mass coding of personal genomes could lead to an ethical minefield, raising problems such as access to DNA data by insurers.” After all, if there is indeed predictive value in analyzing a patient’s genome, it might be possible to offer or deny that patient health insurance—or employment—based on the potential risk of developing a debilitating disease.

In fact, it may become possible in the near future to assign a certain patient or group of patients something akin to a credit score based on their propensity to develop a particular disease.

And a predictive “score” for diseases isn’t too outlandish a thought, especially when futurists such as Aaron Saenz forecast: “One day soon we should have an understanding of our genomes such that getting everyone sequenced will make medical sense.”

Perhaps in the near future, getting everyone sequenced will make medical sense (for both patient and societal benefit), but there will likely need to be newer and more stringent laws (and associated penalties for misuse) to ensure such information is protected and not used for unethical purposes.

Question:

  • With costs for DNA sequencing now around $1,000 per patient, it’s conceivable that universities, research firms, and other companies will pursue genetic information and analysis. Are we opening Pandora’s box in terms of harvesting this data?

Has Personalized Filtering Gone Too Far?

In a world of plenty, algorithms may be our saving grace as they map, sort, reduce, recommend, and decide how airplanes fly, packages ship, and even who shows up first in online dating profiles. But in a world where algorithms increasingly determine what we see and don’t see, there’s danger of filtering gone too far.

The global economy may be a wreck, but data volumes keep advancing. In fact, there is so much information competing for our limited attention, companies are increasingly turning to compute power and algorithms to make sense of the madness.

The human brain has its own methods for dealing with information overload. For example, think about the millions of daily inputs the human eye receives and how it transmits and coordinates that information with the brain. A task as simple as stepping down a shallow flight of stairs takes incredible information processing. Of course, not all received data points are relevant to the task of walking a stairwell, and thus the brain must decide which data to process and which to ignore. And with our visual systems bombarded with sensory input from the time we wake until we sleep, it’s amazing the brain can do it all.

But the brain can’t do it all—especially not with the onslaught of data and information exploding at exponential rates. We need what author Rick Bookstaber calls “artificial filters,” computers and algorithms to help sort through mountains of data and present the best options. These algorithms are programmed with decision logic to find needles in haystacks, ultimately presenting us with more relevant choices in an ocean of data abundance.

Algorithms are at work all around us. Google’s PageRank presents us relevant results—in real time—captured from web server farms across the globe. Match.com sorts through millions of profiles, seeking compatible matches for subscribers. And Facebook shows us friends we should “like.”

But algorithmic programming can go too far. As humans are more and more inundated with information, there’s a danger in turning over too much “pre-cognitive” work to algorithms. When we have computers sort friends we would “like,” pick the most relevant advertisements or best travel deals, and choose ideal dating partners for us, there’s a danger of missing the completely unexpected discovery, or the most unlikely correlation. And even as algorithms “watch” and process our online behavior to learn what makes us tick, there’s still a high possibility that the results presented will be far from what we might consider “the best choice.”
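To make that danger concrete, here is a toy sketch (all items, topics, and click histories are hypothetical) of how ranking purely by past behavior quietly hides whole topics from a user’s feed:

```python
from collections import Counter

# A personalized filter in miniature: rank items by how often their
# topic appears in the user's click history. Topics the user has never
# clicked on sink to the bottom and never surface.
def personalized_feed(items, click_history, top_n=3):
    topic_counts = Counter(topic for _, topic in click_history)
    return sorted(items, key=lambda item: topic_counts[item[1]], reverse=True)[:top_n]

items = [("budget vote", "politics"), ("new phone", "tech"),
         ("cat video", "pets"), ("rate hike", "finance"),
         ("gadget review", "tech")]
history = [("old post", "tech"), ("older post", "tech"), ("one click", "pets")]

for title, topic in personalized_feed(items, history):
    print(title, topic)
# "politics" and "finance" never appear, regardless of how important
# those stories might be to the reader.
```

Real recommendation systems are vastly more sophisticated, but the structural risk is the same: the filter can only surface more of what you already consumed.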

With a data flood approaching, there’s a temptation to let algorithms do more and more of our pre-processing cognitive work. And if we continue to let algorithms “sort and choose” for us, we should be extremely careful to understand who’s designing these algorithms and how they decide. Perhaps it’s cynical to suggest otherwise, but with regard to algorithms we should always ask ourselves: are we really getting the best choice, or the choice that someone or some company has ultimately designed for us?

Question:
  • Rick Bookstaber makes the case that personalized filters may ultimately reduce human freedom. He says, “If filtering is part of thinking, then taking over the filtering also takes over how we think.” Are there dangers in too much personalized filtering?

Data Tracking for Asthma Sufferers?

Despite the recent privacy row over smartphones and other GPS-enabled devices, a Wisconsin doctor is proposing an inhaler with a built-in global positioning system (GPS) to track where and when asthma sufferers use their medication. By capturing data on inhaler usage, the doctor proposes, asthma sufferers can learn more about what triggers an attack, and the medical community can learn more about this chronic condition. However, the use of such a device has privacy implications that need serious consideration.

For millions of people worldwide, asthma is no joke. An April 9, 2011 Economist article mentions that asthma affects more than 300 million people, almost 5% of the world’s population.

Scientists and the medical community have long pondered the question: what triggers an asthma attack? Is it pollen, dust in the air, mold spores, or other environmental factors? The answer is relevant not only to asthma sufferers themselves but also to society (and healthcare costs), as there are more than 500,000 asthma-related hospital admissions every year.

In an effort to better understand the factors behind asthma attacks, Dr. David Van Sickle co-founded a company that makes an inhaler with GPS to track usage. Van Sickle once worked for the Centers for Disease Control and Prevention (CDC), and he believes that with better data, society can understand asthma more deeply. By capturing data on asthma inhaler usage and then plotting the results with visualization tools, Van Sickle hopes this information can be sent back to primary care physicians to help patients understand asthma triggers.
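The article doesn’t describe the company’s software, but the kind of aggregation such a system might perform can be sketched: bucketing inhaler-use events into coarse grid cells to surface geographic hotspots. All coordinates below are hypothetical, and the coarsening itself is one simple way to blunt the privacy risk of storing precise locations.

```python
from collections import Counter

def grid_cell(lat, lon, precision=1):
    """Round coordinates to a coarse grid cell (~11 km across at precision=1),
    so individual addresses are not stored."""
    return (round(lat, precision), round(lon, precision))

# Hypothetical inhaler-use events: (latitude, longitude) at time of use.
events = [(43.07, -89.40), (43.08, -89.41), (43.07, -89.38), (44.51, -88.01)]

hotspots = Counter(grid_cell(lat, lon) for lat, lon in events)
print(hotspots.most_common(1))  # the busiest grid cell and its event count
```

A map of such cells, joined with pollen or air-quality data, is the sort of visualization that could help physicians spot environmental triggers.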

A better understanding of asthma makes sense for patients, health insurers, and society at large. The Economist article notes that pilot studies of device usage have already called basic understandings of asthma into question. However, there are surely privacy implications in the capture, management, and use of this data, despite reassurances from the medical community that the data will be anonymized and secured.

Should societal and patient benefits outweigh privacy concerns when it comes to tracking asthma patients? What do you think?  I’d love to hear from you.

The Next Wave in Recommendation Systems?

While some internet privacy experts fret over use of cookies and web profiles for targeted advertising, the quest for personalization is about to go much deeper as web companies create new profiling techniques based on the science of influence.

Behavioral targeting on the web, using cookies, HTTP referrer data, registered user accounts, and more, is about to be significantly enhanced, says columnist Eli Pariser. In the May 2011 issue of Wired Magazine, in an article titled “Mind Reading,” Pariser discusses how website recommendation and targeting algorithms “analyze our consumption patterns and use that information to figure out (what to pitch us next).” However, Pariser notes that the next chapter for recommendation systems is to discern the best approach for influencing online shoppers to buy.

In the article, Pariser cites an experiment by a doctoral student at Stanford in which online shopping sites attempted not only to track clicks and items of interest, but also to determine the best way to pitch a product. For example, pitches would alternate between “appeal to authority” (someone you respect says you’ll like this product) and “social proof” (everyone’s buying this product, so should you!).

Taking a cue from the work of Dr. Robert Cialdini, it appears that the next wave in recommendation algorithms is to learn our “decision triggers,” or the best ways to persuade us to act. In his book Influence: Science and Practice, Cialdini documented six decision triggers (consistency, reciprocation, social proof, liking, authority, and scarcity) as mental shortcuts that help humans deal with the “richness and intricacy of the outside environment.”

Getting back to the Wired article, Pariser says this means that websites will home in on the best pitch for a particular online consumer and, if effective, continue to use it. To illustrate the concept, Pariser says: “If you respond a few times to a 50% off in the next ten minutes deal, you could find yourself surfing a web filled with blaring red headlines and countdown clocks.”
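The “learn which pitch works and keep using it” behavior Pariser describes resembles a classic explore/exploit strategy. Here is a minimal epsilon-greedy sketch; the trigger names and response counts are hypothetical, and real systems would be far more elaborate:

```python
import random

# Candidate persuasion triggers, loosely after Cialdini's categories.
TRIGGERS = ["authority", "social_proof", "scarcity", "liking"]

def pick_pitch(successes, trials, epsilon=0.1, rng=random):
    """With probability epsilon, try a random trigger (explore);
    otherwise serve the trigger with the best observed response rate."""
    if rng.random() < epsilon:
        return rng.choice(TRIGGERS)
    rates = {t: successes[t] / trials[t] if trials[t] else 0.0 for t in TRIGGERS}
    return max(TRIGGERS, key=rates.get)

# A shopper who has responded to "act in ten minutes!" pitches keeps
# getting countdown clocks:
successes = {"authority": 1, "social_proof": 0, "scarcity": 3, "liking": 0}
trials    = {"authority": 5, "social_proof": 4, "scarcity": 5, "liking": 3}
print(pick_pitch(successes, trials, epsilon=0.0))  # exploit only: "scarcity"
```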

Of course, shoppers buy in various ways and not always in the same manner. However, Cialdini’s work shows that in the messy and complicated lives of most consumers, mental shortcuts help with the daily deluge of information. Therefore, this new approach of recommendation systems using principles of psychology to tailor the right way to “pitch” online shoppers might just work.

There’s no doubt that recommendation systems already take into account principles of social proof and liking, but there’s a lot more room for improvement, especially in the other areas Cialdini has researched. The answer to ‘why we buy’ is about to be taken to a whole new level.

Questions:

  • What’s your take on this next development in recommendation systems? Benefit or too much “Big Brother”?
  • Are you moved by “act now” exhortations? What persuasion technique/s work best on you?

Should Online Companies Be Forced To Forget?

Online companies have raised the eyebrows of privacy advocates who think web-generated data should only be archived for a specified period of time. And while some companies have bowed to public pressure and now keep customer search data for a maximum of three months, others have not acquiesced. When it comes to privacy concerns, should internet-based companies be required “to forget”?

Neuroscientists have long claimed the act of forgetting is important to the processes of the human mind. Humans have a need to forget especially because each day our brains deal with tons of trivial information and clutter, not to mention hundreds if not thousands of marketing messages.

Therefore, our mental processes must prioritize which facts should have more importance than others—such as ‘where are my car keys?’ versus ‘what did I eat for lunch last Thursday?’ We must forget, because according to neuroscientists, our brains would overload if we captured every detail of our lives.

Yet, unlike the human mind, which has a fixed capacity, computer data stores (disk, tape, etc.) keep getting larger and cheaper, allowing companies to keep more transactional detail very inexpensively.

In fact, thanks to accelerating technological change, companies can now take advantage of less expensive data storage to keep transactional data for longer periods of time—with the ultimate goal of mining data for insights to improve the customer experience.

However, lengthy data retention policies run headfirst into concerns from privacy advocates. For example, according to a Washington Post article, online search companies have policies under which they keep query data for 3 to 18 months, and in some instances longer. Their rationale? Online search companies say query data is used to improve their algorithms, optimize search results, and give advertisers better targeting.
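Mechanically, a retention policy like those described is just a cutoff filter over timestamped records. A minimal sketch, with a hypothetical log format and a three-month window:

```python
from datetime import datetime, timedelta

RETENTION_DAYS = 90  # e.g. a three-month retention policy

def apply_retention(log, now, days=RETENTION_DAYS):
    """Keep only query records newer than the retention cutoff."""
    cutoff = now - timedelta(days=days)
    return [entry for entry in log if entry["ts"] >= cutoff]

now = datetime(2011, 6, 1)
log = [
    {"query": "asthma triggers", "ts": datetime(2011, 5, 20)},
    {"query": "cheap flights",   "ts": datetime(2010, 12, 1)},
]
print(len(apply_retention(log, now)))  # only the recent query survives
```

The policy debate, of course, is not about the filter but about the number: whether `days` should be 90, 540, or effectively infinite.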

Privacy advocates, however, argue that search queries often contain personal details and, taken collectively, can reveal a complete picture of the person using the search engine. Ultimately, they say, too much power in the hands of a few key search engines is a privacy nightmare.

To effectively meet customer needs in a very complex and fluid economic environment, companies must be able to collect and analyze data to understand customer behavior, drive better communications and respond to changing customer needs. That said, the benefits of data collection and analysis must coincide with responsible behavior.

Questions:

  • Should online companies be required to “forget” what they know about their customers and transactions? If so, what is the cut-off point?
  • Should corporations advertise that they quickly “forget”—much as Ask.com has?
  • Are consumer privacy concerns regarding data collection policies more bark than bite?