# The Math Says Yes, But Human Behavior Says No

Data scientists are busy writing algorithms to optimize employee productivity, improve trucking routes, and update retail prices on the fly. But those pesky humans and their demands for a reasonable schedule and consistent pricing keep getting in the way. Which proves that when it comes to algorithmic model development, "real world" human behavior is the hard part.

The traveling salesperson problem is still one of the most interesting math problems in optimization. It can be summarized this way: take a salesperson and their accounts in various cities. Now find the shortest possible route that lets the salesperson visit each account once and then come back home, all within a defined time period. What may sound like an easy problem to solve is actually one bedeviling planners to this day—and it ultimately involves a lot more than math.
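To make the problem concrete, here is a minimal brute-force sketch over a handful of hypothetical account locations (the coordinates and city names are invented for illustration; real planners use far more sophisticated heuristics, since enumeration blows up factorially):

```python
from itertools import permutations
from math import dist

# Hypothetical account locations as (x, y) coordinates; "home" is the start/end.
cities = {"home": (0, 0), "A": (3, 4), "B": (6, 1), "C": (2, 7)}

def tour_length(order):
    """Total distance of home -> each stop in order -> back home."""
    stops = ["home", *order, "home"]
    return sum(dist(cities[a], cities[b]) for a, b in zip(stops, stops[1:]))

# Brute force: try every visiting order (only feasible for a handful of stops).
best = min(permutations([c for c in cities if c != "home"]), key=tour_length)
print(best, round(tour_length(best), 2))
```

Even this toy version shows why the problem is hard: with n accounts there are n! orderings to check, which is exactly why decades of work have gone into approximation methods.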

While the math behind the traveling salesperson problem has been painstakingly improved over the years, the human element is still a very large factor in real-world implementations. That's because while the most mathematically optimized route for a salesperson might be visiting three accounts in one day, it doesn't take into account the schedules of the customers the salesperson intends to visit, necessary employee bathroom breaks, hotel availability, and the fact that the salesperson also wants to visit their ailing grandmother after stop two.

The traveling salesperson problem also applies to transportation optimization. And again, sometimes the math doesn't add up for human beings. For example, at a particular shipping company, optimization software showed the best route combination for delivering packages. However, there was one small catch: the most optimized route ignored Teamsters and federal safety rules that require drivers to take pre-defined breaks, and even naps, after a certain number of hours on the road.

Modeling is getting better though. An article in Nautilus shows how transportation models are now incorporating not only the most mathematically optimized route, but also human variables such as the “happiness” of drivers. For instance, did the driver have a life event such as death in the family? Do they prefer a certain route? How reliable are they in terms of delivering the goods on time? And plenty of other softer variables.

Sometimes optimization software just flat out misses the mark. I’m reminded of a big chain retail store that tried to use software to schedule employee shifts. The algorithm looked at variables such as store busyness, employee sales figures, weather conditions, and employee preferences and then mapped out an “ideal” schedule.

Too bad the human element was missing: some employees were scheduled 9 a.m. to 1 p.m. and then 5 p.m. to 9 p.m. the same day, essentially swallowing their mornings and evenings whole. The algorithm ignored the costs of employees having to travel back and forth to work, much less the softer quality-of-life issues for employees struggling to balance their day around two shifts with a four-hour gap in between. Rest assured that while the store employee schedule was "optimized," employee job satisfaction took a tumble.
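The fix is conceptually simple: encode the human constraint alongside the mathematical objective. A sketch of how a scheduler might filter out split shifts, using invented shift data and an assumed maximum-gap rule:

```python
# Hypothetical candidate daily schedules for one employee, as (start, end) hours.
candidate_days = [
    [(9, 13), (17, 21)],   # the "optimized" split shift: a 4-hour gap
    [(9, 13), (13, 17)],   # back-to-back shifts, no gap
    [(12, 16), (17, 21)],  # a 1-hour gap
]

MAX_GAP_HOURS = 1  # assumed human-factors rule: no long unpaid gaps mid-day

def acceptable(day):
    """Reject any schedule with a between-shift gap over the limit."""
    shifts = sorted(day)
    return all(nxt_start - end <= MAX_GAP_HOURS
               for (_, end), (nxt_start, _) in zip(shifts, shifts[1:]))

feasible = [day for day in candidate_days if acceptable(day)]
print(feasible)  # the 4-hour split shift is filtered out
```

The point isn't the code itself but the design choice: quality-of-life rules become hard constraints the optimizer must respect, rather than afterthoughts.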

Lastly, online retailers are experimenting with pricing optimization in near real time. You’ve undoubtedly seen such pricing models in action; you place an item in your shopping cart, but don’t buy it. Then, a couple hours later you come back to your shopping cart and the price has jumped a few dollars. This dynamic pricing has caused some customers to cry foul, especially because to some, it feels a lot like “bait and switch.” And while dynamic online pricing is becoming more commonplace, it doesn’t mean that consumers are going to like it—especially because humans have a preference for consistency.

Thus, from pricing to employee scheduling to trucking route optimization, the computers say one thing, but sometimes humans beg to differ. Indeed, there's a constant push-pull between mathematics and the human element of what's practical and reasonable. As our society becomes more numbers- and computer-driven and thereby "optimized," expect such battles to continue until a comfortable equilibrium can be achieved. That is, until the computers don't need us anymore. Then all bets are off.

# Technologies and Analyses in CBS’ Person of Interest

Person of Interest is a broadcast television show on CBS in which a "machine" predicts the person most likely to die within 24-48 hours. Then it's up to a mercenary and a data scientist to find that person and help them escape their fate. A straightforward plot really, but not so simple in terms of the technologies and analyses behind the scenes that could make a modern-day prediction machine a reality. I have taken the liberty of framing some components that could be part of such a project. Can you help discover more?

In Person of Interest, “the machine” delivers either a single name or group of names predicted to meet an untimely death. However, in order to predict such an event, the machine must collect and analyze reams of big data and then produce a result set, which is then delivered to “Harold” (the computer scientist).

In real life, such an effort would be a massive undertaking on a national basis, much less by state or city. However, let's dispense with the enormity—or plausibility—of such a scenario and instead see if we can identify the various technologies and analyses that could make a modern-day "Person of Interest" a reality.

It is useful to think of this analytics challenge in terms of a framework: data sources, data acquisition, data repository, data access and analysis and finally, delivery channels.

First, let’s start with data sources. In Person of Interest, the “machine” collects data from various sources such as interactions from: cameras (images, audio and video), call detail records, voice (landline and mobile), GPS for location data, sensor networks, and text sources (social media, web logs, newspapers, internet etc.). Data sets stored in relational databases that are publicly and not publicly available might also be used for predictive purposes.

Next, data must be assimilated or acquired into a data management repository (most likely a multi-petabyte bank of computer servers). If data are acquired in near real time, they may go into a data warehouse and/or Hadoop cluster (maybe cloud based) for analysis and mining purposes. If data are analyzed in real time, it’s possible that complex event processing technologies (i.e. streams in memory) are used to analyze data “on the fly” and make instant decisions.

Analysis can be done at various points—during data streaming (CEP), in the data warehouse after data ingest (which could be in just a few minutes), or in Hadoop (batch processed).  Along the way, various algorithms may be running which perform functions such as:

• Pattern analysis – recognizing and matching voice, video, graphics, or other multi-structured data types. Could be mining both structured and multi-structured data sets.
• Social network (graph) analysis – analyzing nodes and links between persons. Possibly using call detail records, web data (Facebook, Twitter, LinkedIn and more).
• Sentiment analysis – scanning text to reveal meaning. When someone says, "I'd kill for that job," do they really mean they would murder someone, or is it just a figure of speech?
• Path analysis – what are the most frequent steps, paths and/or destinations by those predicted to be in danger?
• Affinity analysis – if person X is in a dangerous situation, how many others just like him/her are also in a similar predicament?
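One of these analyses, social network (graph) analysis, can be sketched with a toy call-detail data set. All names and records here are hypothetical, and real systems would use dedicated graph engines rather than plain dictionaries:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical call detail records: (caller, callee) pairs.
cdrs = [("alice", "bob"), ("bob", "carol"), ("alice", "carol"),
        ("carol", "dave"), ("eve", "dave")]

# Build an undirected contact graph from the records.
graph = defaultdict(set)
for a, b in cdrs:
    graph[a].add(b)
    graph[b].add(a)

# Degree centrality: who sits at the hub of the network?
degree = {person: len(links) for person, links in graph.items()}
hub = max(degree, key=degree.get)

# Link prediction via shared contacts: unconnected pairs who likely know each other.
candidates = {(a, b): len(graph[a] & graph[b])
              for a, b in combinations(sorted(graph), 2)
              if b not in graph[a]}
print(hub, candidates)
```

Even this tiny example surfaces the two questions an investigator would ask: who is central to the network, and which hidden links probably exist.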

It’s also possible that an access layer is needed for BI-style reporting, dashboards, or visualization.

Finally, the result set – in this case, the name of the person "the machine" predicts is most likely to be killed in the next twenty-four hours – could be sent to a device in the field: a mobile phone, tablet, computer terminal, etc.

These are just some of the technologies that would be necessary to make a “real life” prediction machine possible, just like in CBS’ Person of Interest. And I haven’t even discussed networking technologies (internet, intranet, compute fabric etc.), or middleware that would also fit in the equation.

What technologies are missing? What types of analysis are also plausible to bring Person of Interest to life? What’s on the list that should not be? Let’s see if we can solve the puzzle together!

# Are Computers the New Boss in HR?

Too many resumes, too few job openings. What’s an employer in today’s job market to do? Turn to computers, of course! Sophisticated algorithms and personality tests are all the rage in HR circles as a method to separate the “wheat from the chaff” in finding the right employee. However, there is danger in relying on machines to pick the best employees for your company, especially because hiring is such a complex process, full of nuance, hundreds of variables, and multiple predictors of success.

The article “The New Boss: Big Data” in Maclean’s, a Canadian publication, discusses the challenges for human capital professionals in using machines for the hiring process, and coincidentally has a quote or two from me.

Net, net: with hundreds if not thousands of resumes to sort through and score for one or two open positions, this does appear to be an ideal task for machines. However, I believe a careful balance is in order between relying on machines to solve the problem and using intuition or “gut” decision making, especially to determine cultural fit. This is a complex problem where the answer isn’t machine or HR professional; in fact, both are necessary.

# How They Fit Together: Bell Curves, Bayes and Black Swans

Probability is defined as the chance, odds, or likelihood that a certain event will take place now or in the future. In a world where business managers like to “know the odds”, how does probabilistic thinking (Frequentism and Bayesianism) mesh with extreme events (i.e., Black Swans) that just cannot be predicted?

Statisticians lament how few business managers think probabilistically. In a world awash with data, statisticians claim there are few reasons to not have a decent amount of objective data for decision making. However, there are some events for which there are no data (they haven’t occurred yet), and there are other events that could happen outside the scope of what we think is possible.

The best quote to sum up this framework for decision making comes from former US Secretary of Defense Donald Rumsfeld in February 2002:

“There are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – there are things we do not know we don’t know.”

Breaking this statement down, it appears Mr. Rumsfeld is speaking about Frequentism, subjective probability (Bayes) and those rare but extreme events coined by Nassim Taleb as “Black Swans”.

Author Sharon Bertsch McGrayne elucidates the first two types of probabilistic reasoning in her book “The Theory That Would Not Die”. Frequentism (conventional statistics), she says, relies on measuring the relative frequency of an event that can be repeated time and again under the same conditions. This is the world of p-values, bell curves, coin flips, casinos, and actuaries, where data-driven decision making is objective, based on sampling or computation over large data sets.

The greater part of McGrayne’s tome concentrates on defining Bayesian inference, or subjective probability, also known as a “measure of belief”. Bayes, she says, allows making predictions with no prior information at all (no frequency of events). With Bayes, one makes an educated guess and then keeps refining that guess based on new information, thus updating and revising the probabilities and getting “closer to certitude.”
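The guess-then-refine cycle can be illustrated with a beta-binomial model, a standard textbook example (the delivery scenario and numbers are invented, not from McGrayne's book):

```python
# Beta-binomial updating: estimate the chance a shipment arrives on time.
# Start with a vague prior (alpha ~ successes, beta ~ failures), then revise
# the belief as each new delivery outcome is observed.
alpha, beta = 1.0, 1.0  # uniform prior: "no idea", mean estimate 0.5

deliveries = [True, True, False, True, True]  # hypothetical observed outcomes

for on_time in deliveries:
    if on_time:
        alpha += 1
    else:
        beta += 1
    estimate = alpha / (alpha + beta)  # posterior mean after each update
    print(f"estimate of on-time probability: {estimate:.2f}")
```

Each new observation nudges the estimate, which is exactly the "getting closer to certitude" McGrayne describes: the educated guess never stops being revised.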

Getting back to the quote, Rumsfeld seems to be saying we can estimate the probability of “known knowns” because they’ve happened before and we have frequency data to support objective reasoning. These “known knowns” are Nassim Taleb’s White Swans. There are also “known unknowns”, or things that have never happened before but have entered our imaginations as possible events (Taleb’s Grey Swans). We still need probability to discern “the odds” of such an event (e.g., a dirty nuclear bomb in Los Angeles), so Bayes is helpful because we can infer subjective probabilities, or “the possible value of unknowns”, from similar situations tangential to our own predicament.

Lastly, there are “unknown unknowns”, or things we haven’t even dreamed about (Taleb’s Black Swans). Taleb labels this “the fourth quadrant”, where probability theory has no answers. What’s an illustration of an “unknown unknown”? Taleb gives the example of the invention of the wheel: no one had even thought or dreamed of a wheel until it was actually invented. The “unknown unknown” is unpredictable because—like the wheel—had it been conceived by someone, it would already have been invented.

Rumsfeld’s quote gives business managers a framework for thinking probabilistically. There are “known knowns” for which Frequentism works best, “known unknowns” for which Bayesian inference is the best fit, and a realm of “unknown unknowns” where statistics falls short and there can be no predictions. This area outside the boundary of statistics is the most dangerous, says Taleb, because extreme events in this sector usually carry large impacts.

This column has been an attempt to provide a decision making framework for how Frequentism, Bayes and Black Swans fit together—by using Donald Rumsfeld’s quote.

What say you, can you improve upon this framework?

# Don’t Follow Rules Based Decision Making Blindly!

A rules based, structured decision making approach works for many occasions, especially when choices and outcomes are relatively well documented and repetitive. But an exclusive focus on following pre-determined business rules (even when business conditions change) is a recipe for financial disaster.

Author Michael Lewis, of Moneyball and The Big Short fame, has long critiqued decisions made by government officials and bankers in just about every country connected to the Great Recession. In a recent Vanity Fair article titled “It’s the Economy, Dummkopf!”, Lewis stays on the attack with his description of how some banks continued to invest in structured products such as collateralized debt obligations (CDOs) long after investors fled the market.

As the US housing market declined in 2006 and CDOs based on souring loans lost significant value, many investment bankers sold their vast CDO portfolios. However, one banker interviewed by Lewis says that even as the market for CDOs took a turn for the worse, his firm loaded up on them. Adding insult to injury, the banker says, “(The bank’s portfolio) would have gotten bigger if they had more time to buy. They were still buying when the market crashed.”

Picture this scenario: every other company is fleeing the market and only a few are buying. Did these banks know something others did not? Was this a calculated “big bet” that the market would turn and they’d make tons of profit? Michael Lewis explains the opposite was true: “This was a mindless, rule-based investment strategy,” he says. “As long as the bonds offered up by Wall Street firms abided by the rules (as designed by the banks), the bonds were purchased.”

In search of yield, all that mattered for some banks was whether investments met a few significant criteria. Check box one, two and three, then buy.  Never mind that the market for such bonds was tumbling. In fact, Michael Lewis asserts the only thing that stopped these banks from losing more money was that the CDO market ceased to exist. “Nothing that happened—no fact, no piece of data—was going to alter their approach to investing money,” Lewis says.

Rules-based decision making codifies decisions into “if-then” trees to arrive at optimum outcomes. And rules are often straightforward and inflexible because in most “normal” environments, when business conditions fluctuate within prescribed volatility, the right decision can be counted on most of the time. However, focusing on the wrong metrics (in this instance, yield) to the detriment of other considerations, such as risk, ended up costing some banks billions of dollars.
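The failure mode Lewis describes can be sketched as a toy if-then buy rule. All thresholds and field names here are invented, and the "fix" shown is just one way to gate a checklist on market conditions:

```python
# A toy rules-based buy decision: check the boxes, then buy -- regardless
# of what the market is doing. All thresholds are hypothetical.
def naive_rule(bond):
    return (bond["rating"] in {"AAA", "AA"}
            and bond["yield"] >= 0.06
            and bond["issuer_approved"])

def safer_rule(bond, market_trend):
    # Same checklist, but gated on an external market-condition signal.
    return naive_rule(bond) and market_trend != "crashing"

cdo = {"rating": "AAA", "yield": 0.08, "issuer_approved": True}
print(naive_rule(cdo))               # buys even in a collapsing market
print(safer_rule(cdo, "crashing"))   # the extra condition says no
```

The naive rule is exactly the "check box one, two and three, then buy" behavior: no fact or piece of data outside the checklist can alter its decision.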

Take a lesson from Michael Lewis. Follow the rules, but be wise enough to regularly examine changing business conditions and adjust rules-based decision making accordingly.

Rules, it appears, are sometimes made to be broken.

# Zero Latency: Faster Isn’t Always Better

Vendors often promise some derivative of the term “faster” in marketing and sales literature (e.g., faster decisions, quicker time to value, rapid implementations). And to be sure, in plenty of cases speed wins, especially in terms of gaining insights into markets and customers before competitors get a clue. However, when it comes to decision making, too much speed without attention to improvements in logic and business processes can be disastrous.

It’s easy to confuse “fastest” with “best”. That’s what Jennifer Hughes writes in a Financial Times article on the arena of high frequency trading (HFT). The term HFT refers to buying and selling financial instruments in microseconds with the help of supercomputers, sophisticated algorithms, and in most instances co-location of equipment near stock exchange servers. In HFT, the goal is to make profitable trades faster than competitors, and this means that massive amounts of data must be examined in real time and buy/sell decisions executed in microseconds.

While an extreme case, high frequency traders are truncating the decision-making window between “event” and “action” to near zero. In the previously mentioned Financial Times article, Kevin Rogers of Deutsche Bank says, “With some parts of the market we are getting to the point where the speed of light (is the only constraint).” And certainly, if one company can spot deals and trade faster than another, microseconds can be a significant advantage in profitability.

However, while in many cases speed wins, there are concerns, especially in terms of cost. After all, throwing millions of dollars in compute power to shave off a couple of microseconds might not be worth the investment. “We’re looking at a tipping point,” says Harpal Sandu, founder of electronic trading network Integral Development. “Trading isn’t going to get much faster than a few dozen microseconds—physical machines don’t run much faster than that.”

In addition, making decisions faster than competitors is useless if careful attention is lacking in data inputs, decision logic (possibly manifested in algorithm development), and continual process improvement. Moreover, the best decision today, or even ten minutes ago, might not be the best decision tomorrow, especially because external conditions make for a moving target with governmental policy changes, mergers and acquisitions, new technology development, and more.

A final consideration is fragility. In high frequency trading for example, as trading decisions move closer to zero latency, there is less opportunity to remedy a potential mistake whether it consists of a “fat finger order”, or simply a poor trading decision that a company would like to correct. Adding insult to injury, in a complex environment such as stock markets, a poor decision made quickly can cause cascading effects to other players creating a massive market disruption.

In the countdown to zero latency, the focus is currently on speed. However, the returns on faster decision making are diminishing and equal opportunity should also be given to risk management considerations, business process improvement, and monitoring of business conditions to continually upgrade and refine decision making logic.

Questions:

• Can speed drastically increase without introducing fragility?
• Does a focus on speed provide an opportunity for companies to “get better” in how they deliver products and services?

# The Next Wave in Recommendation Systems?

While some internet privacy experts fret over use of cookies and web profiles for targeted advertising, the quest for personalization is about to go much deeper as web companies create new profiling techniques based on the science of influence.

Behavioral targeting on the web using cookies, HTTP referrer data, registered user accounts, and more is about to be significantly enhanced, says columnist Eli Pariser. In the May 2011 issue of Wired Magazine, in an article titled “Mind Reading”, Pariser discusses how website recommendation and targeting algorithms “analyze our consumption patterns and use that information to figure out (what to pitch us next).” However, Pariser notes that the next chapter for recommendation systems is to discern the best approach to influencing online shoppers to buy.

In the article, Pariser cites an experiment by a doctoral student at Stanford in which online shopping sites attempted not only to track clicks and items of interest, but also to determine the best way to pitch a product. For example, pitches would alternate between an “appeal to authority” (someone you respect says you’ll like this product) and “social proof” (everyone’s buying this product, so should you!).

Taking a cue from the work of Dr. Robert Cialdini, it appears that the next wave in recommendation algorithms is to learn our “decision triggers”, or the best ways to persuade us to act. In his book “Influence: Science and Practice”, Cialdini documented six decision triggers (consistency, reciprocation, social proof, liking, authority, and scarcity) as mental shortcuts that help humans deal with the “richness and intricacy of the outside environment.”

Getting back to the Wired Magazine article, Eli Pariser says this means that websites will home in on the best pitch for a particular online consumer and, if effective, continue to use it. To illustrate this concept, Pariser says, “If you respond a few times to a 50% off in the next ten minutes deal, you could find yourself surfing a web filled with blaring red headlines and countdown clocks.”

Of course, shoppers buy in various ways and not always in the same manner. However, the work of Robert Cialdini shows that in the messy and complicated lives of most consumers, mental shortcuts help with the daily deluge of information. Therefore, this new approach of recommendation systems using principles of psychology to tailor the right way to “pitch” online shoppers just might work.
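Learning which trigger works on a given shopper can be framed as a multi-armed bandit problem. The epsilon-greedy loop below, with invented response rates, is one standard way to sketch that learning process; it is an illustration of the idea, not a description of how any real site works:

```python
import random

random.seed(42)

# Hypothetical true response rates per persuasion pitch (unknown to the learner).
true_rates = {"scarcity": 0.12, "social_proof": 0.08, "authority": 0.05}

counts = {pitch: 0 for pitch in true_rates}
successes = {pitch: 0 for pitch in true_rates}

def choose(epsilon=0.1):
    """Epsilon-greedy: usually exploit the best pitch so far, sometimes explore."""
    if random.random() < epsilon or not any(counts.values()):
        return random.choice(list(true_rates))
    return max(counts,
               key=lambda p: successes[p] / counts[p] if counts[p] else 0)

for _ in range(5000):
    pitch = choose()
    counts[pitch] += 1
    if random.random() < true_rates[pitch]:  # simulated shopper response
        successes[pitch] += 1

print(max(counts, key=counts.get))  # the pitch the learner settled on
```

This captures Pariser's countdown-clock scenario: once a pitch keeps working on you, the system serves it more and more often.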

There’s no doubt that recommendation systems already take into account principles of social proof and liking, but there’s a lot more room for improvement, especially in the other areas Cialdini has researched. The answer to “why we buy” is about to be taken to a whole new level.

Questions:

• What’s your take on this next development in recommendation systems? Benefit or too much “Big Brother”?
• Are you moved by “act now” exhortations? What persuasion technique/s work best on you?