Big Data Takes Center Stage at Strata + Hadoop World 2015

Strata + Hadoop World wrapped up on February 20, and this year’s four-day event focused on big data, with keynote speeches from leaders as prominent as U.S. President Barack Obama himself. During the show the President announced the appointment of DJ Patil as the first U.S. Chief Data Scientist, highlighting the government’s commitment to new technologies and further cementing the importance of big data. Perhaps surprisingly, Hadoop wasn’t the star of the show; instead, a newer Apache project, Spark, quickly became the talk of the town.

Spark, which was created at the AMPLab at UC Berkeley, is designed to work with Hadoop’s file system and is primarily an in-memory data-processing framework that’s faster and easier to use than MapReduce. Beyond that core engine, the Spark ecosystem includes related projects such as Tachyon, an in-memory file system, along with components for machine learning, stream processing, NoSQL, interactive SQL and graph processing (GraphX). Spark is being commercialized by a company called Databricks.

Pretty much everywhere, the main attraction was technology companies lining up to support Spark: Databricks (naturally), Intel, Altiscale, MemSQL, Qubole and Zoomdata were among them. In the coming years we may begin to see tension between the Spark and Hadoop platforms, or possibly an evolution in how they relate and work together, but for now we expect most companies will continue to use trusted platforms like Hadoop MapReduce, Hive or Impala for their big data workloads.

Beyond the Spark excitement, several companies introduced new products and ideas, including Pivotal, Cloudera, MapR, Microsoft, HP, Oracle, and more. Here are a few key takeaways from the four-day event:

  • Pivotal announced it is open sourcing its big data technology and essentially building its Hadoop business on top of the Hortonworks platform.
  • Cloudera announced it earned $100 million in 2014.
  • MapR announced cross-data-center replication for its MapR-DB technology, a potentially compelling addition.
  • Microsoft announced Linux support for its HDInsight Hadoop cloud service on Azure, as well as Python and R programming language support for its Azure ML cloud service. The company added the R language platform when it recently acquired Revolution Analytics.
  • Microsoft hopes that Azure ML will compete with IBM’s Watson Analytics deep learning platform by allowing the deployment of neural networks with a few clicks.
  • HP announced Haven Predictive Analytics, which is powered by a distributed version of R developed by HP Labs. It’s nice to see HP finally entering the data science fray.
  • Oracle announced a new analytics tool for Hadoop called Big Data Discovery, which looks like a cross between Platfora and Tableau. It will likely be used primarily by companies that already purchase Hadoop in appliance form from Oracle.
  • furthered its new business intelligence platform with a handful of features designed to make the product easier to use on mobile devices. The company claims more than half of user engagement with the platform is via mobile device.
  • DataRPM claims to be taking aim at Watson with a new platform.

All in all, the conference reinforced that big data is quickly becoming one of the most important forces in the IT field, as well as in business in general. The rapid improvement of analytics software and the increasing development of machine learning products promise to change the face of computing in the coming years, and you can be sure Collabera will be on top of every update as the field continues to mature.