In the coming year, I expect — or hope — that enterprises will realize that Hadoop is not simply a new database, or simply the latest advance in data processing and everyday analytics. Instead, it is a paradigm-shifting development with broad implications for the entire business model.
Hadoop is an open source framework for distributed processing and storage of big data. According to reports by IDC and Gartner, however, Hadoop has so far generated more buzz and chatter than actual field experience.
This is not necessarily due to flaws in Hadoop itself but rather to the difficulty of deriving true value from such a complex and constantly changing framework.
Will 2015 be the year Hadoop delivers more broadly on its original promise of rich analytics on big data, or is this the beginning of another Trough of Disillusionment?
The disconnect that many enterprises experience with Hadoop stems from the tendency to view new technologies as a means of solving old problems rather than as the basis for a new way of doing things.
This has resulted in the early focus on re-deploying old-school applications — like business intelligence, data visualization, or even simple ETL processing — on the Hadoop cluster. In many cases, today’s relational database technology is more than capable of performing these functions. Even though migrating them over to Hadoop can yield significant cost savings and greater flexibility, these applications in no way reflect the true value of the platform.
Hadoop makes advanced analytics possible: the use of statistical methods, mathematical modeling, and machine learning to mine data for patterns and insights. It is the difference between merely summarizing data and using inferential techniques to find patterns and relationships that are not already explicit in the data.
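To make that distinction concrete, here is a minimal sketch in plain Python. It is only an illustration: the data is made up, the function names `summarize` and `fit_line` are hypothetical, and it runs on a single machine rather than on a Hadoop cluster. The first function merely summarizes the data; the second fits a simple least-squares line, a basic inferential technique, and uses it to estimate a value that appears nowhere in the data.

```python
from statistics import mean

# Hypothetical toy data (not from any real system):
# daily ad impressions and the purchases observed on the same day.
impressions = [100, 200, 300, 400, 500]
purchases = [12, 19, 33, 41, 50]


def summarize(values):
    """Descriptive analytics: report what is already explicit in the data."""
    return {"total": sum(values), "average": mean(values)}


def fit_line(xs, ys):
    """Inferential step: ordinary least squares fit of ys ~ slope * xs + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Summary: totals and averages of what was observed.
summary = summarize(purchases)

# Inference: estimate purchases at an impression level never observed.
slope, intercept = fit_line(impressions, purchases)
predicted = slope * 600 + intercept

print(summary)
print(f"estimated purchases at 600 impressions: {predicted:.1f}")
```

The summary can only restate the past, while the fitted line captures a relationship between two variables and extrapolates from it. Real advanced analytics involves far richer models and far more data, but the step from the first function to the second is the shift in question.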
(I recently discussed what Alpine is helping enterprises accomplish with Hadoop in this podcast interview with VentureBeat’s Dylan Tweney and Jordan Novet.)
Already, leading data-centric companies like Facebook, Amazon, and LinkedIn are showing how the combination of advanced analytics and big data can be used to gain a competitive edge by providing a higher level of service and driving new revenue streams. Whether it is serving up the right banner ad, recommending the most relevant products, or fostering the best social media connections, advanced analytics on Hadoop offer greater insight than traditional analytics tools, allowing the enterprise to do truly amazing things with readily available data.
Hadoop doesn’t necessarily make advanced analytics easy, however. In many cases, it’s not even the most natural framework for machine learning (although systems like Spark and Giraph will help solve that problem).
Hadoop does provide a flexible framework for deploying parallel workflows to enable broad scalability and complex processing across a distributed infrastructure. And it’s a natural sandbox for data scientists, letting them combine data of many formats from many different sources to their heart’s content. It also provides a number of extensions and integrated tools to simplify data processing and a host of open-source satellite projects. But the end result of all that flexibility and all those technologies is that a lot of people are left wondering where to begin.
Significant challenges remain when it comes to fully leveraging Hadoop clusters and using more advanced methods to yield deeper insights and predictive models. This is why the software community needs to make significant strides in the coming year to avoid Hadoop turning into yet another boring component of a boring data warehouse.
The Holy Grail will be to prove Hadoop as the foundation of a new analytics that directly and immediately affects the way businesses run. Not just historical reports, not even just predictive models, but a platform for exploring data, producing insights, and embedding analytics into the engine of business. The challenge going forward, then, is for Hadoop app developers to step up to the plate: not only to deliver a broad array of solutions designed to make Hadoop easier to use, but also to foster a cohesive environment that enables a high level of integration and orchestration.
The coming year is likely to be a pivotal one for Hadoop. Not because it will finally emerge on the enterprise radar — that has happened. But because the industry as a whole will come to define it either as a new way of doing old things or as a significant break from the past.
Steven Hillion is cofounder and Chief Product Officer of Alpine Data Labs, where he leads development of an enterprise platform for advanced analytics. Before joining Alpine, he founded the data science team at Greenplum. You can follow him on Twitter: @shillion.