The Privacy War Has Begun

It started innocently enough….

My iPhone just upgraded itself to iOS 14.6, and the privacy protection purge began.

In late April, Apple added App Tracking Transparency (ATT) to iOS (actually in 14.5 but, for reasons mentioned in this Forbes article, I hadn’t noticed the change until the most recent update). Now, whenever I launch an app that is part of the online ad ecosystem, I’m asked whether I want to share data to enable tracking. I always opt out.

These alerts have been generally benign. They reference benefits like “more relevant ads,” a “customized experience” and “helping to support us.” Some assume you’re opting in, and make opting out a much more circuitous and time-consuming process. Most also avoid the words “tracking” and “privacy.” One referred to it in these terms: “Would you allow us to refer to your activity?”

My answer is always no. Why would I want to customize an annoyance and make it more relevant?

All in all, it’s a deceptively innocent wrapper to put on what will prove to be a cataclysmic event in the world of online advertising. No wonder Facebook is fighting it tooth and nail, as I noted in a recent post.

This shot across the bow of online advertising marks an important turning point for privacy. It’s the first time that someone has put users ahead of advertisers. Everything up to now has been lip service from the likes of Facebook, telling us we have complete control over our privacy while knowing that actually protecting that privacy would be so time-consuming and convoluted that the vast majority of us would do nothing, thus keeping its profitability flowing through the pipeline.

The simple fact of the matter is that without its ability to micro-target, online advertising just isn’t that effective. Take away the personal data, and online ads are pretty non-engaging. Also, given our continually improving ability to filter out anything that’s not directly relevant to whatever we’re doing at the time, these ads are very easy to ignore.

Advertisers need that personal data to stand any chance of piercing our non-attentiveness long enough to get a conversion. It’s always been a crapshoot, but Apple’s ATT just stacked the odds very much against the advertiser.

It’s about time. Facebook and online ad platforms have had little to no real pushback against the creeping invasion of our privacy for years now. We have no idea how extensive and invasive this tracking has been. The only inkling we get is when the targeting nails the ad delivery so well that we swear our phone is listening to our conversations. And, in a way, it is. We are constantly under surveillance.

In addition to Facebook’s histrionic bitching about Apple’s ATT, others have started to find workarounds, as reported on 9to5Mac. ATT specifically targets the IDFA (Identifier for Advertisers), which enables cross-app tracking through a unique identifier. Chinese ad networks backed by the state-endorsed Chinese Advertising Association were encouraging the adoption of CAID identifiers as an alternative to IDFA. Apple has gone on record as saying ATT will be globally implemented and enforced. While CAID can’t be policed at the OS level, Apple has said that apps that track users without their consent by any means, including CAID, could be removed from the App Store.

We’ll see. Apple doesn’t have a very consistent track record when it comes to holding the line against Chinese app providers. WeChat, for one, has been granted exceptions to Apple’s developer restrictions that have not been extended to anyone else.

For its part, Google has taken a tentative step toward following Apple’s lead with its new privacy initiative on Android devices, as reported in SlashGear. Google Play has asked developers to share what data they collect and how they use that data. At this point, it won’t be requiring opt-in prompts as Apple does.

All of this marks a beginning. If it continues, it will throw a Kong-sized monkey wrench into the works of online advertising. The entire ecosystem is built on ad-supported models that depend on collecting and storing user data. Apple has begun nibbling away at that foundation.

The toppling has begun.

Why Free News is (usually) Bad News

Pretty much everything about the next week will be unpredictable. But whatever happens on Nov. 3, I’m sure there will be much teeth-gnashing and navel-gazing about the state of journalism in the election aftermath.

And there should be. I have written much about the deplorable state of that particular industry. Many, many things need to be fixed. 

For example, let’s talk about the extreme polarization of both the U.S. population and their favored news sources. Last year about this time, the Pew Research Center released a study showing that over 30% of Americans distrust their news sources.

But what’s more alarming is, when we break this down by Republicans versus Democrats, only 27% of Democrats didn’t trust the news for information about politics or elections. With Republicans, that climbed to a whopping 67%. 

The one news source Republicans do trust? Fox News. Sixty-five percent of them say Fox is reliable. 

And that’s a problem.

Earlier this year, Ad Fontes Media came out with its Media Bias Chart. It charts major news and media channels on two axes: source reliability and political bias. The correlation between bias and reliability is almost perfect. The further a news source is out to the right or left, the less reliable it is.

How does Fox fare? Not well. Ad Fontes separates Fox TV from Fox Online. Fox Online lies on the border between being “reliable for news, but high in analysis/opinion content” and “some reliability issues and/or extremism.” Fox TV falls squarely in the second category.

I’ve written before that media bias is not just a right-wing problem. Outlets like CNN and MSNBC show a significant left-leaning bias. But CNN Online, despite its bias, still falls within the “Most Reliable for News” category. According to Ad Fontes, MSNBC has the same reliability issues as Fox.

The question that has to be asked is “How did we get here?” And that’s the question tackled head-on in a new book, “Free is Bad,” by John Marshall.

I’ve known Marshall for ages. He has covered a lot of the things I’ve been writing about in this column. 

“It is difficult to get a man to understand something, when his salary depends on his not understanding it.” 

Upton Sinclair

The problem here is one of incentive. Our respective media heads didn’t wake up one morning and say, “You know what we need to be? A lot more biased!” They have walked down that path step by step, driven by the need to find a revenue model that meets their need for profitability. 

When we talk about our news channels, the obvious choice to be profitable is to be supported by ads. And to be supported by ads, you have to be able to target those ads. One of the most effective targeting strategies is to target by political belief, because it comes reliably bundled with a bunch of other beliefs that make it very easy to predict behaviors. And that makes these ads highly effective in converting prospects.

This is how we got to where we are. But there are all types of ways to prop up your profit through selling ads. Some are pretty open and transparent. Some are less so. And that brings us to a particularly interesting section of Marshall’s book. 

John Marshall is a quant geek at heart. He has been a serial tech entrepreneur — and, in one of those ventures, built a very popular web analytics platform. He also has intimate knowledge of how the sausages are made in the ad-tech business. He knows sketchy advertising practices when he sees them. 

Given all of this, Marshall was able to undertake a fascinating analysis of the ads we see on various news platforms that dovetails nicely with the Ad Fontes chart. 

Marshall created the Ad Shenanigans chart. Basically, he did a forensic analysis of the advertising approaches of various online news platforms. He was looking for those that gathered data about their users, sold traffic to multiple networks, featured clickbait chumboxes and other unsavory practices. Then he ranked them accordingly.

Not surprisingly, there’s a pretty strong correlation between reputable reporting and business ethics. Highly biased and less reputable sites on the Ad Fontes Bias Chart (Breitbart, NewsMax, and Fox News) all can also be found near the top of Marshall’s Ad Shenanigans Chart. Those that do seem to have some ethics when it comes to the types of ads they run also seem to take objective journalism seriously. Case in point, The Guardian in the UK and ProPublica in the U.S.

The one anomaly in the group seems to be CNN. While it does fare relatively well on reputable reporting according to Ad Fontes, CNN appears to be willing to do just about anything to turn a buck. It ranks just a few slots below Fox in terms of “ad shenanigans.”

Marshall also breaks out those platforms that have a mix of paywalls and advertising. While there are some culprits in the mix such as the Daily Caller, Slate and the National Review, most sites that have some sort of subscription model seem to be far less likely to fling the gates of their walled gardens open to the ethically challenged advertising hordes.

All of this drives home Marshall’s message: When it comes to the quality of your news sources, free is bad. As soon as something costs you nothing, you are no longer the customer. You’re the product. Invisible hand market forces are no longer working for you. They are working for the advertiser. And that means they’re working against you if you’re looking for an unbiased, quality news source.

Data does NOT Equal People

We marketers love data. We treat it like a holy grail: a thing to be worshipped. But we’re praying at the wrong altar. Or, at the very least, we’re praying at a misleading altar.

Data is the digital residue of behavior. It is the contrails of customer intent — a thin, wispy proxy for the rich bandwidth of the real world. It does have a purpose, but it should be just one tool in a marketer’s toolbox. Unfortunately, we tend to use it as a Swiss army knife, thinking it’s the only tool we need.

The problem is that data is seductive. It’s pliable and reliable, luring us into manipulation because it’s so easy to do. It can be twisted and molded with algorithms and spreadsheets.

But it’s also sterile. There is a reason people don’t fit nicely into spreadsheets. There are simply not enough dimensions and nuances to accommodate real human behavior.

Data is great for answering the questions “what,” “who,” “when” and “where.” But they are all glimpses of what has happened. Stopping here is like navigating through the rear-view mirror.

Data seldom yields the answer to “why.” But it’s why that makes the magic happen, that gives us an empathetic understanding that helps us reliably predict future behaviors.

Uncovering the what, who, when and where makes us good marketers. But it’s “why” that makes us great. It’s knowing why that allows us to connect the distal dots, hacking out the hypotheses that can take us forward in the leaps required by truly great marketing. As Tom Goodwin, the author of “Digital Darwinism,” said in a recent post, “What digital has done well is have enough of a data trail to claim, not create, success.”

We as marketers have to resist stopping at the data. We have to keep pursuing why.

Here’s one example from my own experience. Some years ago, my agency did an eye-tracking study that looked at gender differences in how we navigate websites.

For me, the most interesting finding to fall out of the data was that females spent a lot more time than males looking at a website’s “hero” shot, especially if it was a picture that had faces in it. Males quickly scanned the picture, but then immediately moved their eyes up to the navigation menu and started scanning the options there. Females lingered on the graphic and then moved on to scan text immediately adjacent to it.

Now, I could have stopped at “who” and “what,” which in itself would have been a pretty interesting finding. But I wanted to know “why.” And that’s where things started to get messy.

To start to understand why, you have to rely on feelings and intuition. You also have to accept that you probably won’t arrive at a definitive answer. “Why” lives in the realm of “wicked” problems, which I defined in a previous column as “questions that can’t be answered by yes or no — the answer always seems to be maybe.  There is no linear path to solve them. You just keep going in loops, hopefully getting closer to an answer but never quite arriving at one. Usually, the optimal solution to a wicked problem is ‘good enough – for now.’”

The answer to why males scan a website differently than females is buried in a maze of evolutionary biology, social norms and cognitive heuristics. It probably has something to do with wayfinding strategies and hardwired biases. It won’t just “fall out” of data because it’s not in the data to begin with.

Even half-right “why” answers often take months or even years of diligent pursuit to reveal themselves. Given that, I understand why it’s easier to just focus on the data. It will get you to “good,” and maybe that’s enough.

Unless, of course, you’re aiming to “put a ding in the universe,” as Steve Jobs is often quoted as saying. Then you have to shoot for great.

What the Hell is “Time Spent” with Advertising Anyway?

Over at MediaPost’s Research Intelligencer, Joe Mandese is running a series of columns that are digging into a couple of questions:

  • How much time are consumers spending with advertising; and,
  • How much is that time worth.

The quick answers are 1.84 hours daily and about $3.40 per hour.

Although Joe readily admits that these are “back of the envelope” calculations, regular MediaPost reader and commentator Ed Papazian points out a gaping hole in the logic of these questions: an hour of being exposed to ads does not equal an hour spent with those ads, and it certainly doesn’t mean an hour being aware of the ads.

Ignoring this fundamental glitch is symptomatic of the conceit of the advertising business in general. They believe there is a value exchange possible where paying consumers to watch advertising is related to the effectiveness of that advertising. The oversimplification required to rationalize this exchange is staggering. It essentially ignores the fields of cognitive psychology and neuroscience. It assumes that audience attention is a simple door that can be opened if only the price is right.

It just isn’t that simple.

Let’s go back to the concept of time spent with media. There are many studies done that quantify this. But the simple truth is that media is too big a catchall category to make this quantification meaningful. We’re not even attempting to compare apples and oranges. We’re comparing an apple, a jigsaw and a meteor. The cognitive variations alone in how we consume media are immense.

And while I’m on a rant, let’s nuke the term “consumption” altogether, shall we? It’s probably the most misleading word ever coined to define our relationship with media. We don’t consume media any more than we consume our physical environment. It is an informational context within which we function. We interact with aspects of it with varying degrees of intention. Trying to measure all these interactions with a single yardstick is the same as trying to measure our physical interactions with water, oxygen, gravity and an apple tree by the same criterion.

Even trying to dig into this question has a major methodological flaw – we almost never think about advertising. It is usually forced on our consciousness. So to use a research tool like a survey – requiring respondents to actively consider their response – to explore our subconscious relationship with advertising is like using a banana to drive a nail. It’s the wrong tool for the job. It’s the same as me asking you how much you would pay per hour to have access to gravity.

This current fervor all comes from a prediction from Publicis Groupe Chief Growth Officer Rishad Tobaccowala that the supply of consumer attention would erode by 20% to 30% in the next five years. Tobaccowala – by putting a number to attention – led to the mistaken belief that it’s something that could be managed by the industry. The attention of your audience isn’t slipping away because advertising and media buying were mismanaged. It’s slipping away because your audience now has choices, and some of those choices don’t include advertising. Let’s just admit the obvious. People don’t want advertising. We only put up with advertising when we have no choice.

“But wait,” the ad industry is quick to protest, “In surveys people say they are willing to have ads in return for free access to media. In fact, almost 80% of respondents in a recent survey said that they prefer the ad-supported model!”

Again, we have the methodological fly in the ointment. We’re asking people to stop and think about something they never stop and think about. You’re not going to get the right answer. A better approach would be to think about what happens when you get the pop-up when you go to a news site with your ad-blocker on. “Hey,” it says, “We notice you’re using an ad-blocker.” If you have the option of turning the ad-blocker off to see the article or just clicking a link that lets you see it anyway, which are you going to choose? That’s what I thought. And you’re probably in the ad business. It pays your mortgage.

Look, I get that the ad business is in crisis. And I also understand why the industry is motivated to find an answer. But the complexity of the issue in front of us is staggering and no one is served well by oversimplifying it down to putting a price tag on our attention. We have to understand that we’re in an industry where – given the choice – people would rather not have anything to do with us. Unless we do that, we’ll just be making the same mistakes over and over again.

The Rise of the Audience Marketplace

Far be it from me to let a theme go before it has been thoroughly beaten into the ground. This column has hosted a lot of speculation on the future of advertising and media buying and today, I’ll continue in that theme.

First, let’s return to a column I wrote almost a month ago about the future of advertising. This was a spin-off on a column penned by Gary Milner – The End of Advertising as We Know It. In it, Gary made a prediction: “I see the rise of a global media hub, like a stock exchange, which will become responsible for transacting all digital programmatic buys.”

Gary talked about the possible reversal of fragmentation of markets by channel and geographic area due to the potential centralization of digital media purchasing. But I see it a little differently than Gary. I don’t see the creation of a media hub – or, at least – that wouldn’t be the end goal. Media would simply be the means to the end. I do see the creation of an audience market based on available data. Actually, even an audience would only be the means to an end. Ultimately, we’re buying one thing – attention. Then it’s our job to create engagement.

The Advertising Research Foundation has been struggling with measuring engagement for a long time now. But it’s because they were trying to measure engagement on a channel-by-channel basis and that’s just not how the world works anymore. Take search, for example. Search is highly effective at advertising, but it’s not engaging. It’s a connecting medium. It enables engagement, but it doesn’t deliver it.

We talk multi-channel a lot, but we talk about it like it’s the holy grail. The grail in this case is an audience that is likely to give us their attention and – once they do that – is likely to become engaged with our message. The multi-channel path to this audience is really inconsequential. We only talk about multi-channel now because we’re stopping short of the real goal, connecting with that audience. What advertising needs to do is give us accurate indicators of those two likelihoods: how likely are they to give us their attention and what is their potential proclivity towards our offer. The future of advertising is in assembling audiences – no matter what the channel – that are at a point where they are interested in the message we have to deliver.

This is where the digitization of media becomes interesting. It’s not because it’s aggregating into a single potential buying point – it’s because it’s allowing us to follow a single prospect along a path of persuasion, getting important feedback data along the way. In this definition, audience isn’t a static snapshot in time. It becomes an evolving, iterative entity. We have always looked at advertising on an exposure-by-exposure basis. But if we start thinking about persuading an audience, that paradigm needs to be shifted. We have to think about having the right conversation, regardless of the channel that happens to be in use at the time.

Our concept of media happens to carry a lot of baggage. In our minds, media is inextricably linked to channel. So when we think media, we are really thinking channels. And, if we believe Marshall McLuhan, the medium dictates the message. But while media has undergone intense fragmentation, it has also become much more measurable and – thereby – more accountable. We know more than ever about who lies on the other side of a digital medium thanks to an ever increasing amount of shared data. That data is what will drive the advertising marketplace of the future. It’s not about media – it’s about audience.

In the market I envision, you would specify your audience requirements. The criteria used would not be so much our typical segmentations – demography or geography for example. These have always just been proxies for what we really care about: their beliefs about our product and predicted buying behaviors. I believe that thanks to ever increasing amounts of data we’re going to make great strides in understanding the psychology of consumerism. These will be foundational in the audience marketplace of the future. Predictive marketing will become more and more accurate and allow for increasingly precise targeting on a number of behavioral criteria.

Individual channels will become as irrelevant as the manufacturer that supplies the shock absorbers and tie rods in your new BMW. They will simply be grist for the mill in the audience marketplace. Mar-tech and ever smarter algorithms will do the channel selection and media buying in the background. All you’ll care about is the audience you’re targeting, the recommended creative (again, based on the mar-tech running in the background) and the resulting behaviors. Once your audience has been targeted and engaged, the predicted path of persuasion is continually updated and new channels are engaged as required. You won’t care what channels they are – you’ll simply monitor the progression of persuasion.

The Coming Data Marketplace

The stakes are currently being placed in the ground. The next great commodity will be data and you can already sense the battle beginning to heat up.

Consumer data will be generated by connections. Those connections will fall into two categories: broad and deep. Both will generate data points that will become critical to businesses looking to augment their own internal data.

First, broad data is the domain of Google, Apple, Amazon, eBay and Facebook. Their play is to stretch their online landscape as broadly as possible, generating thousands of new potential connections with the world at large. Google’s new “Buy” button is a perfect example of this. Adding to the reams of conversion data Google already collects, the “Buy” button means that Google will control even more transactional landscape. They’re packaging it with the promise of an improved mobile buying experience, but the truth is that purchases will be consummated on Google-controlled territory, allowing them to harvest the rich data that will be generated from millions of individual transactions across every conceivable industry category. If Google can control a critical mass of connected touch points across the online landscape, they can get an end-to-end view of purchase behavior. The potential of that data is staggering.

In this market, data will be stripped of identity and aggregated to provide a macro but anonymous view of market behaviors. As the market evolves, we’ll be able to subscribe to data services that will provide real time views of emerging trends and broad market intelligence that can be sliced and diced in thousands of ways. Of course, Google (and their competitors) will have a free hand to use all this data to offer advertisers new ways to target ever more precisely.

This particular market is an online territory grab. It relies on a broad set of touch points with as many people across as many devices as possible. The more territory that is covered, the more comprehensive the data set.

The other data market will run deep. Consider the new health tracking devices like Fitbit, Garmin’s VivoActive and Apple’s iWatch. Focused purpose hardware and apps will rely on deep relationships with users. The more reliant you become on these devices, the more valuable the data collected will become. But this data comes with a caveat – unlike the broad data market, this data should not be stripped of its identity. The value of the data comes from its connection with an individual. Therefore, that individual has to be an active participant in any potential data marketplaces. The data collector will act more as a data middleman – brokering matches between potential customers and vendors. If the customer agrees, they can choose to release the data to the vendor (or at least, a relevant subset of the data) in order to individualize the potential transaction.

As the data marketplace evolves, expect an extensive commercial eco-system to emerge. Soon, there will be a host of services that will take raw data and add value through interpretation, aggregation and filtering. Right now, the onus for data refinement falls on the company who is attempting to embrace Big Data marketing. As we move forward, expect an entire Big Data value chain to emerge. But it will all rely on players like Google, Amazon and Apple who have the front line access to the data itself. Just as natural resources provided the grist that drove the last industrial revolution, expect data to be the resource that fuels the next one.

The Persona is Dead, Long Live the Person

First, let me go on record as saying up to this point, I’ve been a fan of personas. In my past marketing and usability work, I used personas extensively as a tool. But I’m definitely aware that not everyone is equally enamored with personas. And I also understand why.

Personas, like any tool, can be used both correctly and incorrectly. When used correctly, they can help bridge the gap between the left brain and the right brain. They live in the middle ground between instinct and intellectualism. They provide a human face to raw data.

But it’s just this bridging quality that tends to lead to abuse. On the instinct side, personas are often used as a short cut to avoid quantitative rigor. Data driven people typically hate personas for this reason. Often, personas end up as fluffy documents and life sized cardboard cutouts with no real purpose. It seems like a sloppy way to run things.

On the intellectual side, because quant people distrust personas, they also leave themselves squarely on the data side of the marketing divide. They can understand numbers – people not so much. This is where personas can shine. At their best, they give you a conceptual container with a human face to put data into. They provide a richer but less precise context that allows you to identify, understand and play out potential behaviors that data alone may not pinpoint.

As I said, because personas are intended as a bridging tool, they often remain stranded in no man’s land. To use them effectively, the practitioner should feel comfortable living in this gap between quant and qual. Too far one way or the other and it’s a pretty safe bet that personas will either be used incorrectly or be discarded entirely.

Because of this potential for abuse, maybe it’s time we threw personas in the trash bin. I suspect they may be doing more harm than good to the practice of marketing. Even at their best, personas were meant as a more empathetic tool to allow you to think through interactions with a real live person in mind. But in order to make personas play nice with real data, you have to be very diligent about continually refining your personas based on that data. Personas were never intended to be placed on a shelf. But all too often, this is exactly what happens. Usually, personas are a poor and artificial proxy for real human behaviors. And this is why they typically do more harm than good.

The holy grail of marketing would be to somehow give real time data a human face. If we could find a way to bridge left brain logic and right brain empathy in real time to discover insights that were grounded in data but centered in the context of a real person’s behaviors, marketing would take a huge leap forward. The technology is getting tantalizingly close to this now. It’s certainly close enough that it’s preferable to the much abused persona. If – and this is a huge if – personas were used absolutely correctly, they could still add value. But I suspect that too much effort is spent on personas that end up as documents on a shelf and pretty graphics. Perhaps that effort would be better spent trying to find the sweet spot between data and human insights.

Why Cognitive Computing is a Big Deal When it comes to Big Data

Watson beating its human opponents at Jeopardy

When IBM’s Watson won against humans playing Jeopardy, most of the world considered it just another man against machine novelty act – going back to Deep Blue’s defeat of chess champion Garry Kasparov in 1997. But it’s much more than that. As Josh Dreller reminded us a few Search Insider Summits ago, when Watson trounced Ken Jennings and Brad Rutter in 2011, it ushered in the era of cognitive computing. Unlike chess, where solutions can be determined solely with massive amounts of number crunching, winning Jeopardy requires a very nuanced understanding of the English language as well as an encyclopedic span of knowledge. Computers are naturally suited to chess. They’re also very good at storing knowledge. In both cases, it’s not surprising that they would eventually best humans. But parsing language is another matter. For a machine to best a man here requires something quite extraordinary. It requires a machine that can learn.

The most remarkable thing about Watson is that no human programmer wrote the program that made it a Jeopardy champion. Watson learned as it went. It evolved the winning strategy. And this marks a watershed development in the history of artificial intelligence. Now, computers have mastered some of the key rudiments of human cognition. Cognition is the ability to gather information, judge it, make decisions and problem solve. These are all things that Watson can do.

Peter Pirolli – PARC

Peter Pirolli, one of the senior researchers at Xerox’s PARC campus in Palo Alto, has been doing a lot of work in this area. One of the things that has been difficult for machines has been to “make sense” of situations and adapt accordingly. Remember a few columns ago, when I talked about narratives and Big Data? This is where Monitor360 uses a combination of humans and computers – computers to do the data crunching and humans to make sense of the results. But as Watson showed us, computers do have the potential to make sense as well. True, computers have not yet matched humans in the ability to make sense of an unlimited variety of environmental contexts. We humans excel at quick and dirty sense making no matter what the situation. We’re not always correct in our conclusions but we’re far more flexible than machines. But computers are constantly narrowing the gap and, as Watson showed, when a computer can grasp a cognitive context, it will usually outperform a human.

Part of the problem machines face when making sense of a new context is that the contextual information needs to be in a format that can be parsed by the computer. Again, this is an area where humans have a natural advantage. We’ve evolved to be very flexible in parsing environmental information to act as inputs for our sense making. But this flexibility has required a trade-off. We humans can go broad with our environmental parsing, but we can’t go very deep. We do a surface scan of our environment to pick up cues and then quickly pattern match against past experiences to make sense of our options. We don’t have the bandwidth to either gather more information or to compute this information. This is Herbert Simon’s Bounded Rationality.

But this is where Big Data comes in. Data is already native to computers, so parsing is not an issue. That handles the breadth issue. But the nature of data is also changing. The Internet of Things will generate a mind-numbing amount of environmental data. This “ambient” data has no schema or context to aid in sense making, especially when several different data sources are combined. It requires an evolutionary cognitive approach to separate potential signal from noise. Given the sheer volume of data involved, humans won’t be a match for this task. We can’t go deep into the data. And traditional computing lacks the flexibility required. But cognitive computing may be able to both handle the volume of environmental Big Data and make sense of it.

If artificial intelligence can crack the code on going both broad and deep into the coming storm of data, amazing things will certainly result from it.

The Human Stories that Lie Within Big Data

If I wanted to impress upon you the fact that texting and driving is dangerous, I could tell you this:

In 2011, at least 23% of auto collisions involved cell phones. That’s 1.3 million crashes, in which 3331 people were killed. Texting while driving makes it 23 times more likely that you’ll be in a car accident.

Or, I could tell you this:

In 2009, Ashley Zumbrunnen wanted to send her husband a message telling him “I love you, have a good day.” She was driving to work and as she was texting the message, she veered across the centerline into oncoming traffic. She overcorrected and lost control of her vehicle. The car flipped and Ashley broke her neck. She is now completely paralyzed.

After the accident, Zumbrunnen couldn’t sit up, dress herself or bathe. She was completely helpless. Now a divorced single mom, she struggles to look after her young daughter, who recently said to her “I like to go play with your friends, because they have legs and can do things.”

The first example gave you a lot more information. But the second example probably had more impact. That’s because it’s a story.

We humans are built to respond to stories. Our brains can better grasp messages that are in a narrative arc. We do much less well with numbers. Numbers are an abstraction and so our brains struggle with numbers, especially big numbers.

One company, Monitor360, is bringing the power of narratives to the world of big data. I chatted with CEO Doug Randall recently about Monitor360’s use of narratives to make sense of Big Data.

“We all have filters through which we see the world. And those filters are formed by our experiences, by our values, by our viewpoints. Those are really narratives. Those are really stories that we tell ourselves.”

For example, I suspect the things that resonated with you in Ashley’s story were the reason for the text (telling her husband she loved him), the irony that the marriage eventually failed after her accident, and the pain she undoubtedly felt when her daughter said she likes playing with other moms who can still walk. All of those things, while they don’t really add anything to our knowledge about the incidence rate of texting and driving accidents, strike us at a deeply emotional level because we can picture ourselves in Ashley’s situation. We empathize with her. And that’s what a story is: a vehicle to help us understand the experiences of another.

Monitor360 uses narratives to tap into these empathetic hooks that lie in the mountain of information being generated by things like social media. It goes beyond abstract data to try to identify our beliefs and values. And then it uses narratives to help us make sense of our market. Monitor360 does this with a unique combination of humans and machines.

“A computer can collect huge amounts of data and the computer can even sort that data. But ‘sense making’ is still very, very difficult for computers to do. So human beings go through that information, synthesize that information and pull out what the underlying narrative is.”

Monitor360 detects common stories in the noisy buzz of Big Data. In the stories we tell, we indicate what we care about.

“This is what’s so wonderful about Big Data. The Data actually tells us, by volume, what’s interesting. We’re taking what are the most often talked about subjects…the data is actually telling us what those subjects are. We then go in and determine what the underlying belief system in that is.”

Monitor360’s realization that it’s the narratives that we care about is an interesting approach to Big Data. It’s also encouraging to know that they’re not trying to eliminate human judgment from the equation. Empathy is still something we can trump computers at.

At least for now.

The Bug in Google’s Flu Trend Data

First published March 20, 2014 in Mediapost’s Search Insider

Last year, Google Flu Trends blew it. Even Google admitted it. It overpredicted the occurrence of flu by a factor of almost 2:1. Which is a good thing for the health care system, because if Google’s predictions had been right, we would have had the worst flu season in 10 years.

Here’s how Google Flu Trends works. It monitors a set of approximately 50 million flu-related terms for query volume. It then compares this against data collected from health care providers where Influenza-like Illnesses (ILI) are mentioned during a doctor’s visit. Since the tracking service was first introduced, there has been a remarkably close correlation between the two, with Google’s predictions typically coming within 1 to 2 percent of the number of doctor visits where the flu bug is actually mentioned. The advantage of Google Flu Trends is that it is available about 2 weeks prior to the ILI data, giving a much-needed head start for responsiveness during the height of flu season.
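To make the idea of "comparing query volume against ILI data" a little more concrete, here is a minimal sketch in Python. It is not Google's actual method. The weekly numbers are invented for illustration, and the names (query_share, ili_rate, estimate_ili) are hypothetical. It simply measures how closely two series move together and fits a straight line that turns a search signal into an estimated ILI rate.

```python
from statistics import mean

# Hypothetical weekly figures, invented purely for illustration.
query_share = [1.0, 1.4, 2.1, 3.0, 2.6, 1.8]  # flu-related queries per 1,000 searches
ili_rate = [1.1, 1.5, 2.0, 3.1, 2.5, 1.9]     # ILI mentions per 100 doctor visits


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


r = pearson(query_share, ili_rate)
slope, intercept = fit_line(query_share, ili_rate)


def estimate_ili(share):
    """Estimate the ILI rate from this week's search share (assumed linear model)."""
    return slope * share + intercept


print(f"correlation: {r:.3f}")
print(f"estimated ILI rate at a search share of 2.0: {estimate_ili(2.0):.2f}")
```

The appeal is obvious: the search series is available right away, so the fitted line can produce an estimate before the doctor-visit numbers arrive. The catch is that the line only knows what searches looked like in the past.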

But last year, Google’s estimates overshot actual ILI data by a substantial margin, effectively doubling the size of the predicted flu season.

Correlation is not Causation

This highlights a typical trap with big data – we tend to start following the numbers without remembering what is generating the numbers. Google measures what’s on people’s minds. ILI data measures what people are actually going to the doctor about. The two are highly correlated, but one doesn’t necessarily cause the other. In 2013, for instance, Google speculated that increased media coverage might be the cause of the overinflated predictions. More news coverage would have spiked interest, but not actual occurrences of the flu.
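A toy calculation shows how a news-driven spike could inflate a search-based estimate. This is only a sketch under invented assumptions: the one-to-one model (SLOPE, INTERCEPT) and the split between "sick" and "curious" searches are hypothetical, since a real model has no way to see that split.

```python
# Hypothetical model, assumed to have been fitted on ordinary flu seasons:
# one unit of flu-related search share maps to one unit of ILI rate.
SLOPE, INTERCEPT = 1.0, 0.0


def predicted_ili(query_share):
    """ILI rate the model expects for a given search share."""
    return SLOPE * query_share + INTERCEPT


def overprediction_ratio(query_share, actual_ili):
    """How many times larger the prediction is than the observed ILI rate."""
    return predicted_ili(query_share) / actual_ili


# Invented week: half the flu searches come from people who are sick,
# half from people who just read an alarming headline.
sick_searches = 3.0
curious_searches = 3.0
actual_ili = 3.0  # only the sick people show up at the doctor

ratio = overprediction_ratio(sick_searches + curious_searches, actual_ili)
print(f"prediction is {ratio:.1f}x the actual ILI rate")
```

The model sees only the total search volume, so every curious searcher gets counted as a probable flu case.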

Allowing for the Human Variable

In the case of Google Flu Trends, because it’s using a human behavior as a signal – in this case online searching for information – it’s particularly susceptible to network effects and information cascades. The problem with this is that these social signals are difficult to rope into an algorithm. Once they reach a tipping point, they can break out on their own with no sign of a rational foundation. Because Google tracks the human-generated network effect data and not the underlying foundational data, it is vulnerable to these weird variables in human behavior.

Predicting the Unexpected

A recent article in Scientific American pointed out another issue with an overreliance on data models – Google Flu Trends completely missed the non-seasonal H1N1 pandemic in 2009. Why? Algorithmically, Google wasn’t expecting it. In trying to eliminate noise from the model, Google actually eliminated signal that arrived at an unexpected time. Models don’t do very well at predicting the unexpected.

Big Data Hubris

The author of the Scientific American piece, associate editor Larry Greenemeier, nailed another common symptom of our emerging crush on data analytics – big data hubris. We somehow think the quantitative black box will eliminate the need for more mundane data collection – say – actually tracking doctor’s visits for the flu. As I mentioned before, the biggest problem with this is that the more we rely on data, which often takes the form of arm’s length correlated data, the further we get from exploring causality. We start focusing on “what” and forget to ask “why.”

We should absolutely use all the data we have available. The fact is, Google Flu Trends is a very valuable tool for health care management. It provides a lot of answers to very pertinent questions. We just have to remember that it’s not the only answer.