Columnar databases

Relational databases traditionally persist each row of data contiguously, meaning that each row will be stored in sequential blocks on disk. This type of database is referred to as row-oriented. For operations involving typical statistical aggregations such as calculating the average of a particular attribute, the effect of row-oriented databases is that every attribute in that row is read during processing, regardless of whether they are relevant to the query or not. In general, row-oriented databases are best suited for transactional workloads, also known as online transaction processing (OLTP), where individual rows are frequently written to and where the emphasis is on processing a large number of relatively simple ...

Get Machine Learning with Apache Spark Quick Start Guide now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.