Lighting a Spark Beneath Big Data

Art by Matt Vascellaro.

The so-called Big Data industry is a jumble of open-source software projects with such odd mascots that it resembles a box of animal crackers. There’s Hadoop (an elephant), Pig (see Porky, The) and Hive (an elephant’s head and a hornet’s body). The latest flavor is called Spark, whose logo is (mercifully) just a spark.

Spark has become wildly popular with developers in a short time, gathering more than 400 contributors last year, according to Open Hub, which tracks activity around open-source projects. It promises to process data up to 100 times faster than the standard methods used for bits sitting inside Hadoop, the software suite sold by Cloudera and Hortonworks.

Adatao, a startup with $13 million in funding from Lightspeed Venture Partners and Andreessen Horowitz, is one of the early firms building an app that uses Spark to analyze data stored in Hadoop file systems.

Adatao CEO Christopher Nguyen argues that Hadoop forms a foundational “storage layer” for data that companies could one day unlock using Spark’s computing speed. Mr. Nguyen talked with The Information about Spark’s promise and where the industry is headed. Edited excerpts follow.