Big Data and Discovery
In which Jill channels Christopher Columbus.
When people talk about big data they usually talk about the millions or billions of records that can be analyzed via a new crop of specialized technologies. But the focus is quickly shifting from how much data—technology is doing a fine job of keeping pace with growing data volumes—to how that data is used.
The combination of big data, in-memory computing, mobile devices, and the cloud promises the delivery of real-time insights, however you want them, wherever you are. Automobile companies are searching big data to predict hot spots in components before they fail. Government institutions have increasingly been leveraging big data—in both its granular form and in the technologies that enable it—to forecast long-term weather patterns and discover seismic activities that can predict large earthquakes. Economists are mining data to reveal how human sentiments can drive economic downturns and how human behaviors can stimulate economic growth.
The common denominator of these and other big data applications is this: Discovery. Unlike a traditional database inquiry that assumes a certain hypothesis level, mining big data reveals relationships and patterns in the data that we didn’t even know to look for. And the sooner executives support raw data exploration efforts, the sooner they’ll see payback from their big data investments.
I wrote about discovery twelve years ago in my book, e-Data (Addison-Wesley, 2000). The pyramid below represented a taxonomy of analytics. The bottom layer represented the most common type of database inquiry, the standard business intelligence report, progressively evolving toward more advanced types of analytics with successively lower hypotheses:
The pyramid is capped by what I called Knowledge Discovery, the detection of patterns in data. As I wrote back then:
These patterns are too specific and seemingly arbitrary to specify, and the analyst would be playing a perpetual guessing-game trying to figure out all the possible patterns in the database. Instead, special knowledge discovery software tools find the patterns and tell the analyst what—and where—they are.
Hence, you could be mining data on breast cancer cells expecting to see trends in cell proliferation rates. But, to your surprise, you discover that surrounding non-cancerous cells are also contributing to cancer cell growth. The Stanford University researchers who made this discovery didn't know to look at the non-cancerous cells. But through low-hypothesis exploration, they found it.
Most companies have mastered the pyramid’s bottom two layers. Indeed, many senior managers cite the third tier, representing predictive analytics, as the logical next step in their quest to be data-driven. But few companies possess the right combination of skills, technologies, and new delivery models to reach the pinnacle.
Executives assume there’s no time (let alone budget) for knowledge discovery. Indeed, the very term suggests an academic exercise with no tangible business payback. But as the above examples show, big data discovery efforts can result in startling and highly actionable findings. A retailer we work with loaded twelve years’ worth of purchase transactions into a Hadoop cluster to uncover relationships in the data that had gone unnoticed. The company discovered new correlations between products that ended up together in shoppers’ carts. The findings drove innovative product placement and shelf space management decisions. Revenue uplift per shopping cart averaged 16 percent in the first month of the trial. Executives were convinced. It’s the apocryphal “beer and diapers” legend writ large on the retailing bottom line.
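For readers who want to see what "products that ended up together in shoppers' carts" can look like in practice, here is a minimal sketch in Python. It is not the retailer's actual method, and it runs in memory rather than on a Hadoop cluster. The function name `pair_lift` and the five sample baskets are invented for illustration. The sketch counts how often each pair of products shares a cart and computes lift, a common measure of whether a pair appears together more often than chance alone would predict.

```python
from collections import Counter
from itertools import combinations


def pair_lift(baskets):
    """Return {(a, b): lift} for every product pair that shares at least one basket.

    lift = P(a and b) / (P(a) * P(b)). A value above 1 means the two
    products land in the same cart more often than independence predicts.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate and fix the pair ordering
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    lifts = {}
    for (a, b), together in pair_counts.items():
        p_together = together / n
        p_a = item_counts[a] / n
        p_b = item_counts[b] / n
        lifts[(a, b)] = p_together / (p_a * p_b)
    return lifts


# Hypothetical sample data, invented for this example.
baskets = [
    ["beer", "diapers", "chips"],
    ["beer", "diapers"],
    ["milk", "bread"],
    ["milk", "diapers", "beer"],
    ["bread", "chips"],
]

lifts = pair_lift(baskets)

# Print pairs from strongest to weakest association.
for pair, lift in sorted(lifts.items(), key=lambda kv: kv[1], reverse=True):
    print(pair, round(lift, 2))
```

Nothing in the code tells it which pairs to examine. It scores every pair it encounters, which is the low-hypothesis point: the analyst reads the ranking rather than guessing the pattern in advance. At real transaction volumes, rare pairs would need a minimum-support cutoff so that a handful of coincidental carts doesn't masquerade as a finding.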
Armed with a newfound understanding of big data’s potential, business executives need not only to allow knowledge discovery efforts but to promote them. Fostering a culture of discovery means allotting budget money and resources for big data proofs-of-concept and surrendering expectations for their outcomes. It also means training the new batch of aptly named data scientists to leverage the big data technologies that enable such discovery, and then translating the findings into business actions whose outcomes are then measured. In essence, running discovery trials on big data is a continuous loop, where the results may feed more traditional business intelligence, or drive additional discovery tests.
Sometimes this means isolating big data efforts from traditional analytics programs where delivery processes and organizational roles are already entrenched. Recently a commercial lines insurer reassigned senior data analysts from various lines of business to staff a temporary work effort to explore new attributes for fraudsters, mining hundreds of terabytes of social network interactions, customer profiles, and claims history. The team found that “loose affiliations” with low-income friends were an indicator of a higher propensity to file fraudulent claims. The group of analysts evolved into an informal knowledge discovery SWAT team that reconvened whenever new data types or business processes invited fresh discovery efforts.
Knowledge discovery may force business executives to do an about-face, agreeing to revise team configurations or support discovery activities that will yield new business insights. Such activities, typically requiring quick-hit efforts of highly skilled experts, have traditionally been prohibited by managers who considered them “skunkworks” projects. But these concentrated, intensive projects can reveal unknown customer behaviors, product affinities, financial risk patterns, and other findings that end up funding the initial discovery work many times over.
The sooner business executives understand the value of knowledge discovery, the more likely they can mobilize their organizations, introduce or revise analysis processes, and hire skilled resources that can ultimately differentiate them from their competitors. Indeed, it’s through these low-hypothesis, high-reward surprises that companies can innovate and begin to thrive anew.
A shorter version of this article was originally posted on the Harvard Business Review site. Check out the original post, see what others are saying, and add your own comment there. This will honor the editorial work of HBR.