Distributed joins

With relational databases, we store each data entity in its own table and then join the tables at query time to produce the desired view. If we apply this idea to a database like Cassandra, where data is partitioned across many nodes, we end up with a distributed join.

New Cassandra developers, especially those coming from a relational database background, are particularly prone to following this pattern. In the previous chapter, we noted that denormalization is key to successful data modeling in Cassandra, and our discussion of secondary indices helps explain why.
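To make the difference between a distributed join and a denormalized table concrete, here is a minimal Python sketch. It does not use Cassandra or any driver: `ToyCluster`, `node_for`, and the table names `user_orders`, `orders`, and `orders_by_user` are hypothetical stand-ins. The toy cluster simply counts how many partition reads each approach needs and which (simulated) nodes own those partitions.

```python
import hashlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size for the illustration


def node_for(key: str) -> int:
    """Toy partitioner: hash the partition key to choose an owning node.
    (Illustrative only; this is not Cassandra's actual partitioner.)"""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES


class ToyCluster:
    """Hypothetical in-memory stand-in for a cluster: each table maps a
    partition key to its data, and every read counts as one request to
    the node that owns that key."""

    def __init__(self) -> None:
        self.tables = defaultdict(dict)
        self.reset_stats()

    def reset_stats(self) -> None:
        self.requests = 0
        self.nodes_contacted = set()

    def write(self, table: str, key: str, value) -> None:
        self.tables[table][key] = value

    def read(self, table: str, key: str):
        self.requests += 1
        self.nodes_contacted.add(node_for(key))
        return self.tables[table].get(key)


def orders_via_join(cluster: ToyCluster, user_id: str) -> list:
    """Distributed join: read the user's order IDs, then fetch each order
    from its own partition, which may live on a different node."""
    order_ids = cluster.read("user_orders", user_id) or []
    return [cluster.read("orders", oid) for oid in order_ids]


def orders_denormalized(cluster: ToyCluster, user_id: str) -> list:
    """Denormalized: full order rows were copied into the user's partition
    at write time, so a single partition read answers the query."""
    return cluster.read("orders_by_user", user_id) or []


def load_sample(cluster: ToyCluster) -> None:
    orders = [{"order_id": f"o{i}", "item": f"item-{i}"} for i in range(5)]
    # Normalized layout: one table per entity, linked by order ID.
    cluster.write("user_orders", "alice", [o["order_id"] for o in orders])
    for o in orders:
        cluster.write("orders", o["order_id"], o)
    # Denormalized layout: the same order data written again under the user.
    cluster.write("orders_by_user", "alice", list(orders))


if __name__ == "__main__":
    cluster = ToyCluster()
    load_sample(cluster)

    joined = orders_via_join(cluster, "alice")
    print("join:", cluster.requests, "requests,",
          len(cluster.nodes_contacted), "node(s)")

    cluster.reset_stats()
    denorm = orders_denormalized(cluster, "alice")
    print("denormalized:", cluster.requests, "request,",
          len(cluster.nodes_contacted), "node(s)")

    assert joined == denorm  # same answer, very different cost
```

With five orders, the join issues six reads (one for the list of IDs plus one per order) and may touch several nodes, while the denormalized table returns the same rows with a single read from one node. The price of denormalization is paid at write time: each order is stored twice.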

Note

If you find yourself querying multiple large tables, then joining them in your application based on some shared key, you are performing a distributed join. This should almost always ...
