“Big Data” is one of those phrases (like the Internet of Things or the Cloud) that has become thoroughly mainstream in enterprise IT. Yet, much like those terms, it’s the elephant in the room: no one wants to admit that they can’t quite put their finger on its exact definition.
Sure, there is a lot of data out there, and its quantity and complexity increase every year. But what makes “big data” a thing and what makes it different from just “a whole lot of data”?
The Answer Isn’t That Simple
You can turn to Google, Wikipedia, and the top analysts in the world, all of whom will give you varying answers dancing around the same issue. For the purposes of this article, we’ll stick to the one definition of big data that I found most helpful. It’s from a 2011 McKinsey report and goes like this: “Datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.” If we were going to add an extra word to that definition, I would recommend complexity or variety. The data I’m talking about isn’t difficult to maintain because of size alone; it also comes from many different sources in many different forms.
Sure, the 2011 definition is still a bit ambiguous, but it gets to the root of big data by defining it in terms of the challenges (and headaches) it creates for IT: This is data that simply cannot be processed or analyzed without adding some powerful tools to your toolkit.
The X-Factor: Hadoop to the Rescue for Processing Data
One such tool is Hadoop, an open source project from Apache that allows for distributed processing of large data sets of multiple and disparate varieties. According to Aberdeen research, 75% of respondents are using Hadoop in this capacity for automated data capture, and with good reason: The average organization’s total data volume grows by 30% year over year, with many organizations exceeding 100% growth. If you want to know what big data is (and why you have to deal with it), there’s a stat for you right there.
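Hadoop’s distributed processing rests on the MapReduce model: map records to key-value pairs, shuffle them by key, then reduce each group to a result. The toy sketch below shows that shape in plain, single-process Python; it is an illustration of the model, not Hadoop itself, and the word-count example is my own, not from the source.

```python
from collections import defaultdict

# Toy illustration of the MapReduce model Hadoop implements.
# Hadoop distributes these phases across a cluster of machines;
# here everything runs in one process purely to show the shape.

def map_phase(records):
    """Emit (key, value) pairs -- here, (word, 1) for each word."""
    for record in records:
        for word in record.lower().split():
            yield (word, 1)

def shuffle_phase(pairs):
    """Group values by key, as Hadoop's shuffle/sort step does."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Aggregate each key's values -- here, summing the counts."""
    return {key: sum(values) for key, values in grouped.items()}

records = ["big data is big", "data grows every year"]
counts = reduce_phase(shuffle_phase(map_phase(records)))
```

The appeal of the model is that the map and reduce phases are embarrassingly parallel, which is what lets Hadoop scale the same logic from a laptop to thousands of nodes.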
So, yes, Hadoop has the ability to capture these massive, exponentially expanding data sets. But how are organizations effectively whittling down this “big” data into a “small” picture?
For most organizations with Hadoop, exploiting the full potential of Big Data is dependent on first prepping the data for use (something for which nearly half of our surveyed users actually have no current capabilities), and then combining Hadoop with other analytical technologies to make sense out of a bunch of disparate “nonsense.”
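What “prepping the data” means in practice is normalizing records that arrive from different sources in different shapes into one consistent schema before any analytics run. A minimal sketch, assuming two hypothetical sources (a CRM export and a web log) with made-up field names:

```python
# Hypothetical data-prep step: unify disparate source formats into
# one schema. Source names and fields are illustrative, not from
# any real system.

def normalize(record, source):
    if source == "crm":
        # CRM exports use "customer_name" and an ISO date string.
        return {"name": record["customer_name"].strip().title(),
                "date": record["signup_date"]}
    if source == "weblog":
        # Web logs use "user" and a timestamp; keep only the date.
        return {"name": record["user"].strip().title(),
                "date": record["ts"][:10]}
    raise ValueError(f"unknown source: {source}")

raw = [
    ({"customer_name": "  ada lovelace ", "signup_date": "2015-03-01"}, "crm"),
    ({"user": "alan turing", "ts": "2015-03-02T09:15:00Z"}, "weblog"),
]

clean = [normalize(rec, src) for rec, src in raw]
```

Only once records share a schema like this can downstream tools (search, text analytics, BI dashboards) query them as one data set.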
And this approach is paying off: Among the 175 organizations Aberdeen surveyed, those with data prep tools for Hadoop reported that:
- A whopping 121% more users per capita were satisfied with the sophistication and firepower of their analytical tools
- 53% more users per capita improved the speed of decision-making over the past two years
- Users were 22% more likely to obtain pertinent information within the decision window
Analytical tools that can be layered upon this prepped data in Hadoop include enterprise search, used in concert with text analytics and social media monitoring tools, allowing for a better grasp of the massive amounts of tweets, blogs, and customer reviews across every inch of the Internet.
Don’t Worry Too Much About What Big Data ‘Means’
Leading organizations shouldn’t get too hung up on the definition of big data — it’s likely not going to be a fruitful discussion (or produce an agreed-upon definition). The bigger fish for organizations to fry is figuring out how they’ll capture, store, manage, and analyze the otherwise unmanageable. Beyond this challenge, it is even more critical for organizations to figure out how to transform something massive, complex, and never-ending into something that’s searchable and digestible. In other words, how to change “big” data into “useful” data.
For more information on how Best-in-Class organizations are using Hadoop and managing the challenges of big data across their enterprise, check out all of Aberdeen’s latest research on Analytics & Business Intelligence.