2011 was the “coming out” year for data science and big data. As the field matures in 2012, what can we expect over the course of the year?
This year has seen consolidation and engineering around improving the basic storage and data processing engines of NoSQL and Hadoop. That will doubtless continue, as we see the unruly menagerie of the Hadoop universe increasingly packaged into distributions, appliances and on-demand cloud services. Hopefully it won’t be long before that’s dull, yet necessary, infrastructure.
Looking up the stack, there’s already an early cohort of tools directed at programmers and data scientists (Karmasphere, Datameer), as well as Hadoop connectors for established analytical tools such as Tableau and R. But there’s a way to go in making big data more powerful: that is, to decrease the cost of creating experiments.
Here are two ways in which big data can be made more powerful.
Better programming language support. As we consider data, rather than business logic, as the primary entity in a program, we must create or rediscover idiom that lets us focus on the data, rather than abstractions leaking up from the underlying Hadoop machinery. In other words: write shorter programs that make it clear what we’re doing with the data. These abstractions will in turn lend themselves ...