Part 7: Dodging Big Data’s Big Problems
By Matt Polsky and Claire Sommer
This spring the Guardian newspaper released leaked materials from former NSA contractor Edward Snowden. The bombshell revelation is that the U.S. government has been systematically and secretly collecting and storing Americans’ phone calls and emails on a massive scale, and searching them for patterns of possible terrorism planning.
This kind of intrusion is possible because computers can physically collect and analyze what used to be unthinkably large data sets. The field of Big Data, which we’ll discuss below, likely will prove a boon to the world in general and to business in particular. But as with every measurement concept, there are pitfalls to avoid as well.
Pitfall: Forgoing the why for the what
Big Data means collecting huge amounts of information, then using it to learn things about your subject that you couldn’t have detected with smaller amounts of information and less sophisticated analytical methods.
The book “Big Data: A Revolution That Will Transform How We Live, Work, and Think” by Kenneth Cukier and Viktor Mayer-Schoenberger explains that Big Data shifts how we think about information in three ways:
1. Using all the data available, not just sample sets (“all not some”)
2. Accepting data that is typically less regular (“messy not clean”)
3. Moving away from causation to correlation
In a nutshell, the reason Big Data is a big deal is that it makes it easier to make better predictions.
As reported by GreenBiz, the business world has started to use Big Data to capture much larger energy efficiency gains than was possible before. Joel Makower wrote here that these gains are cumulative and extend far beyond the factory wall: “Collecting and analyzing all of that data will enable utilities and grid managers — as well as their customers — to ensure a steady and reliable energy supply, predict rates, and make decisions accordingly. That, in turn, will better manage existing power plants, reducing the need for new ones and reducing emissions overall.”
Cukier and Mayer-Schoenberger offer several intriguing examples where Big Data’s predictive abilities shine, such as UPS keeping its truck fleets running through a counter-intuitive maintenance calendar, and, in the social area, preemptively catching life-threatening diseases in premature infants.
But in seeking the what (what is going to happen?), the authors point out a danger in forgetting to search out the why (why is this outcome happening?).
Sometimes you don’t need the why, only the what. The trick is in knowing when it’s essential to get to a root cause and the risks if you don’t. As an example of why “Why?” is important to ask, we note that climate change conversations across the U.S. today focus (if at all) on the “what” of adaptation planning. Choosing to stay in the “what” of limited outcomes and apparent protective measures means we can keep kicking the far thornier “why” conversation of climate change mitigation down the road.
Pitfall: Hitching correlation to causation inappropriately
The authors also caution that Big Data’s phenomenal potential for predicting outcomes creates ripe conditions for the familiar behavioral pitfall of blurring correlation and causation. This shift in thinking “represents a move away from always trying to understand the deeper reasons behind how the world works to simply learning about an association among phenomena and using that to get things done,” the authors say.
An example is how Google researches flu outbreaks by analyzing keyword searches. While it is incredibly valuable to see where more people are thinking about flu at a particular moment, it is a mistake to think that each person googling “nearest drugstore” is currently ill. Correlation is not, as many professors often have told us, causation.
[Image: Google’s Flu Trends]
Now, correlation is not a bad thing at all. Sustainable business metrics practitioners, perhaps without knowing it, are quite familiar with it, and perhaps comfortable with its inherent limitations. And, quite surprisingly, correlation’s reputation recently has gotten an upgrade. The topline: While correlation may not imply causation, it’s a good place to start to look for it.
That said, it is important to keep in mind, and in your communications, what those profs had to say even as we get much better at finding correlations. And if you really do need to know causal factors, better consult with a social scientist.
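For readers who like to see the numbers, here is a minimal sketch in Python of the flu-search caution above. It is a toy simulation, not Google's method: the population size, the weekly illness rates and the two search probabilities are all invented for illustration, and the function names `pearson` and `simulate` are our own. It shows how weekly search counts can track weekly illness closely across a region while most of the individual searchers are perfectly healthy.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(weeks=52, population=2000, seed=42):
    """Toy model: return (correlation, share of searchers who are ill)."""
    rng = random.Random(seed)
    # Assumed, made-up probabilities that a person searches "nearest drugstore".
    p_search_if_ill = 0.50
    p_search_if_healthy = 0.05

    ill_per_week, searches_per_week = [], []
    ill_searchers_total = 0
    for _ in range(weeks):
        # Assumed: between 1% and 10% of the region is ill in a given week.
        ill = round(population * rng.uniform(0.01, 0.10))
        healthy = population - ill
        ill_searchers = sum(rng.random() < p_search_if_ill for _ in range(ill))
        healthy_searchers = sum(
            rng.random() < p_search_if_healthy for _ in range(healthy)
        )
        ill_per_week.append(ill)
        searches_per_week.append(ill_searchers + healthy_searchers)
        ill_searchers_total += ill_searchers

    corr = pearson(ill_per_week, searches_per_week)
    share_ill = ill_searchers_total / sum(searches_per_week)
    return corr, share_ill


if __name__ == "__main__":
    corr, share_ill = simulate()
    print(f"Correlation between weekly illness and weekly searches: {corr:.2f}")
    print(f"Share of individual searchers who are actually ill: {share_ill:.0%}")
```

Under these assumptions the regional pattern is strong enough to be a useful signal, yet guessing that any one searcher is sick would be wrong more often than right, simply because healthy people far outnumber ill ones. The association is real; the individual-level conclusion is not.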
Pitfall: Using a sledgehammer when a flyswatter suffices
A related issue Big Data faces is applying it to any and all situations. A New York Times essay called “The Limits of Big Data in the Big City” describes instances where simpler, low-tech solutions that engage people — such as email chains — are superior to Big Data. Simply asking a community what it wants may trump a computer’s ability to predict an outcome.
Similarly, there’s a kind of bizarre irony that human resources departments use Big Data for recruiting and hiring decisions. This article describes how Big Data is being used to find specialized high-tech workers, the proverbial needles in the haystack. What gets lost in the hunt are the things that can’t be measured (or at least not yet), such as gut instinct: “When you remove humans from complex decision-making, you can optimize the hell out of the algorithm, but at what cost?”
Sometimes there is no substitute for an old-fashioned, comprehensive plowing through of resumes to find the gold. Some inefficiency may be worth holding onto.
Pitfall: When governments (or anyone else) go too far
Tacking back to Snowden’s disclosures, and even earlier concerns, Cukier and Mayer-Schoenberger say, “Another worry is what could happen when governments put too much trust in the power of data.” The “Big Brother” angle raises very real muddles about the proper balance in democracies between privacy and protection. Is this balance now shifting? Is it acceptable to monitor communications of friendly nations? Many of us are now engaged in a public debate on all this and the very difficult decisions ahead for our society. These questions are fundamental to what democracy will mean in the future. As Cukier and Mayer-Schoenberger write:
In his 1999 book, Seeing Like a State, the anthropologist James Scott documented the ways in which governments, in their zeal for quantification and data collection, sometimes end up making people’s lives miserable. They use maps to determine how to reorganize communities without first learning anything about the people who live there. They take all the imperfect, organic ways in which people have interacted over time and bend them to their needs, sometimes just to satisfy a desire for quantifiable order.
We need to learn how to deal better with the opportunities, tensions and complexities posed by Big Data in our personal, professional and civic lives. And set ground rules, as it were.
There are clear analogues here for the business world in that Big Data inevitably will surface (and to some degree, already has) many of the same privacy and fairness issues playing out in Washington. This is even before the coming era of private drones, which are likely to bring with them new privacy issues.
What customer information is fair game? What level of disclosure of data mining should be required? Are privacy and many social norms now almost totally gone? Is this an acceptable price to pay? And what does sustainability bring to this issue? Is it possible to find creative and strategic solutions that are fair and ethical even if some inefficiency has to be tolerated?
Conclusions: Don’t be blocked by Big Data
Many people are very high on Big Data. Perhaps they are right to be. But as with many of the earlier pitfalls, and even more so here, we should ask: Are there things this super-powered use of numbers might be blocking us from seeing? As Big Data lets the forest become more understandable (both metaphorically and literally), will we miss more lessons from the trees?
The increasing emphasis on data, technology and efficiency will not make it any easier to ignore the still commonly downplayed social and equity side of sustainability. But perhaps, if privacy and the other above concerns with Big Data are faced with foresight, creativity and an enhanced sense of fairness, we might find that they actually help us move towards sustainability, surprising the skeptics among us.
And then we might possibly avoid the common fate of earlier breakthrough technologies: one step forward, followed by a half step back — at the least.