Greenplum table distribution and partitioning

This section defines table distribution in the Greenplum context and covers related aspects of distribution, such as data skew.

Distribution

Greenplum is a massively parallel processing (MPP) data store; table data is spread across segments according to the distribution strategy defined for each table.

Every table in Greenplum has a data distribution method; the DISTRIBUTED BY clause defines the distribution strategy. The distribution key must be chosen so that it does not introduce data skew, that is, an uneven share of rows on any of the segment hosts.
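To make the idea of skew concrete, here is a minimal Python sketch that simulates how the choice of distribution key can affect row counts per segment. It assumes rows are assigned to a segment by hashing the key value; the segment count, the sample columns (order_ids, country), and the use of CRC32 as the hash are all hypothetical stand-ins chosen for illustration, not Greenplum's actual implementation.

```python
import zlib
from collections import Counter

NUM_SEGMENTS = 4  # hypothetical cluster size, chosen only for illustration


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a key value to a segment index.

    CRC32 is a stand-in hash; it is NOT the function Greenplum uses.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def rows_per_segment(keys, num_segments=NUM_SEGMENTS):
    """Count how many rows would land on each segment for the given keys."""
    counts = Counter(segment_for(k, num_segments) for k in keys)
    return [counts.get(s, 0) for s in range(num_segments)]


def skew_ratio(counts):
    """Largest segment divided by the mean; 1.0 means a perfectly even spread."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean


# Hypothetical table of 10,000 rows with two candidate distribution keys.
order_ids = list(range(10_000))                 # unique per row
country = ["US"] * 9_000 + ["DE"] * 1_000       # only two distinct values

even = rows_per_segment(order_ids)
skewed = rows_per_segment(country)

print("order_id key:", even, "skew ratio", round(skew_ratio(even), 2))
print("country key: ", skewed, "skew ratio", round(skew_ratio(skewed), 2))
```

In this sketch the low-cardinality key can place rows on at most two segments, however many segments exist, while the unique key spreads rows across all of them. That is the kind of imbalance the distribution key should be chosen to avoid.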

There are two methods of distributing table data across segment hosts:

  • Column oriented/Hash distribution: This is a distribution mechanism that considers ...
