Data Just Right: Introduction to Large-Scale Data & Analytics by Michael Manoochehri

5. Using Hadoop, Hive and Shark to Ask Questions about Large Datasets

The concept of data warehousing has long been the domain of large enterprises. A huge industry has developed to attack the problem of quickly asking questions about data from across an entire organization. Data warehouse practices encompass the design of complicated ETL pipelines, as well as the art of processing data from transactional databases using OLAP cubes and star schemas. This mature field is now being challenged by new approaches to data warehousing that, in some cases, can be more scalable, more performant, and cheaper.

The Hadoop project provides an open-source platform for distributing data processing tasks across clusters of low-cost ...
