Big data and cloud technology go hand-in-hand. Big data needs clusters of servers for processing, which clouds can readily provide. So goes the marketing message, but what does that look like in reality? Both “cloud” and “big data” have broad definitions, obscured by considerable hype. This article breaks down the landscape as simply as possible, highlighting what’s practical, and what’s to come.
What is often called “cloud” amounts to virtualized servers: computing resource that presents itself as a regular server, rentable per consumption. This is generally called infrastructure as a service (IaaS), and is offered by platforms such as Rackspace Cloud or Amazon EC2. You buy time on these services, and install and configure your own software, such as a Hadoop cluster or NoSQL database. Most of the solutions I described in my Big Data Market Survey can be deployed on IaaS services.
Using IaaS clouds doesn’t mean you must handle all deployment manually: good news for the clusters of machines big data requires. You can use orchestration frameworks, which handle the management of resources, and automated infrastructure tools, which handle server installation and configuration. RightScale offers a commercial multi-cloud management platform that mitigates some of the problems of managing servers in the cloud.