Learning about Big Data from Big Brother

icreach-search-illo-feature-hero-bYou may not have heard of ICREACH, but it has probably heard of you. ICREACH is the NSA’s own Google-like search engine.  And if Google’s mission is to organize the world’s information, ICREACH’s mission is to snoop on the world.  After super whistle blower Edward Snowden tipped the press off to the existence of ICREACH, the NSA fessed up last month. The amount of data we’re talking about is massive. According to The Intercept website, the tool can handle two to five billion new records every day, including data on the US’s emails, phone calls, faxes, Internet chats and text messages. It’s Big Brother meets Big Data.

I’ll leave aside for the moment the ethical aspect of this story.  What I’ll focus on is how the NSA deals with this mass of Big Data and what it might mean for companies who are struggling to deal with their own Big Data dilemmas.

Perhaps no one deals with more big data than the Intelligence Community. And Big Data is not new for them. They’ve been digging into data trying to find meaningful signals amongst the noise for decades. Finally, the stakes of successful data analysis are astronomically high here. Not only is it a matter of life and death – a failure to successfully connect the dots can lead to the kinds of nightmares that will haunt us for the rest of our lives. When the pressure is on to this extent, you can be sure that they’ve learned a thing or two. How the Intelligence community handles data is something I’ve been looking at recently. There are a few lessons to be learned here.

Owned Data vs Environmental Data

The first lesson is that you need different approaches for different types of data. The Intelligence Community has their own files, which include analyst’s reports, suspect files and other internally generated documentation. Then you have what I would call “Environmental” data. This includes raw data gathered from emails, phone calls, social media postings and cellphone locations. Raw data needs to be successfully crunched, screened for signals vs. noise and then interpreted in a way that’s relevant to the objectives of the organization. That’s where…

You Need to Make Sense of the Data – at Scale

Probably the biggest change in the Intelligence community has been to adopt an approach called “Sense making.”  Sense making really mimics how we, as humans, make sense of our environment. But while we may crunch a few hundred or thousand sensory inputs at any one time, the NSA needs to crunch several billion signals.

Human intuition expert Gary Klein has done much work in the area of sense making. His view of sense making relies on the existence of a “frame” that represents what we believe to be true about the world around us at any given time.  We constantly update that frame based on new environmental inputs.  Sometimes they confirm the frame. Sometimes they contradict the frame. If the contradiction is big enough, it may cause us to discard the frame and build a new one. But it’s this frame that allows us to not only connect the dots, but also to determine what counts as a dot. And to do this…

You Have to Be Constantly Experimenting

Crunching of the data may give you the dots, but there will be multiple ways to connect them. A number of hypothetical “frames” will emerge from the raw data. You need to test the validity of these hypotheses. In some cases, they can be tested against your own internally controlled data. Sometimes they will lie beyond the limits of that data. This means adopting a rigorous and objective testing methodology.  Objective is the key word here, because…

You Need to Remove Human Limitations from the Equation

When you look at the historic failures of Intelligence gathering, the fault usually doesn’t lie in the “gathering.” The signals are often there. Frequently, they’re even put together into a workable hypothesis by an analyst. The catastrophic failures in intelligence generally arise because some one, somewhere, made an intuitive call to ignore the information because they didn’t agree with the hypothesis. Internal politics in the Intelligence Community has probably been the single biggest point of failure. Finally…

Data Needs to Be Shared

The ICREACH project came about as a way to allow broader access to the information required to identify warning signals and test out hunches. ICREACH opens up this data pool to nearly two-dozen U.S. Government agencies.

Big Data shouldn’t replace intuition. It should embrace it. Humans are incredibly proficient at recognizing patterns. In fact, we’re too good at it. False positives are a common occurrence. But, if we build an objective way to validate our hypotheses and remove our irrational adherence to our own pet theories, more is almost always better when it comes to generating testable scenarios.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.