Partial denormalization

Our initial approach to home timelines, which used the existing, fully normalized data structure that we've already built, is technically viable but will perform very poorly at scale. If I follow F users and want a page of size P for my home timeline, Cassandra will need to do the following:

  • Query F partitions for P rows each
  • Perform an ordered merge of FxP rows in order to retrieve only the most recent P

The most distressing part of this is the fact that both operations grow in complexity proportionally with the number of people I follow. Let's start by trying to fix this.

The basic goal of the home timeline is to show me the most recent status updates that matter to me. Instead of doing all the work to find out what status ...

Get Learning Apache Cassandra now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.