Thursday, May 8, 2014

Big Data is not a noun

The Economist has an article out, "The backlash against big data," pointing out recent reports and comments criticizing the concept.

“BOLLOCKS”, says a Cambridge professor. “Hubris,” write researchers at Harvard. “Big data is bullshit,” proclaims Obama’s reelection chief number-cruncher. A few years ago almost no one had heard of “big data”. Today it’s hard to avoid—and as a result, the digerati love to condemn it. Wired, Time, Harvard Business Review and other publications are falling over themselves to dance on its grave. “Big data: are we making a big mistake?,” asks the Financial Times. “Eight (No, Nine!) Problems with Big Data,” says the New York Times. What explains the big-data backlash?
The article more or less gets it right: it agrees with the specifics of the criticism while maintaining that there is still lots to love about Big Data. I appreciate the shout-out to astronomical surveys as the place where the term originated, though it came from journalists, not us scientists.

One thing that unquestionably does amount to Big Data is all the nonsense written about it by writers and media outlets riding the hype cycle. And yes, hopefully that is starting to crest. If you were planning on writing a book full of vague cheerleading for Big Data revolutionizing the world, you might be out of luck.

I think most of the confusion and misinformation could be avoided if people just stopped using Big Data as a noun. It's not a noun. It's an adjective. There are Big Data technologies and Big Data developers and Big Data architectures. But there is really nothing called Big Data. Big Data isn't going to change the world because an adjective can't do anything. People do things with data. That's called either computation or statistics.

I'm fine with the term Data Science to describe the emerging occupation of applying statistics, data visualization and machine learning to business problems. That's actually a noun and can in principle do great things. Big Data technologies like Hadoop, Spark, the cloud and NoSQL databases have also been very successful, and they help us do data science faster, cheaper and better on data sets large enough to require distributed storage and processing. But these technologies are just about handling data so that people with good ideas about how to use the data can work more productively.


