Abhishek Singh thinks this is interesting:

Once the cleaner thread builds the offset map, it will start reading off the clean segments, starting with the oldest, and check their contents against the offset map. For each message, it checks whether the key of the message exists in the offset map. If the key does not exist in the map, the value of the message we’ve just read is still the latest and we copy over the message to a replacement segment. If the key does exist in the map, we omit the message because there is a message with an identical key but newer value later in the partition. Once we’ve copied over all the messages that still contain the latest value for their key, we swap the replacement segment for the original and move on to the next segment. At the end of ...
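A minimal Python sketch of the pass described in the quote, assuming records are hypothetical in-memory `(offset, key, value)` tuples and segments are plain lists. The helper names `build_offset_map`, `clean_segment`, and `compact` are made up for illustration; this is not Kafka's actual log cleaner or its on-disk format. The offset map records the newest offset seen for each key in the dirty section, and each segment is copied, oldest first, into a replacement that keeps only messages with no newer value for their key.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout for illustration: (offset, key, value).
Record = Tuple[int, str, str]
Segment = List[Record]


def build_offset_map(dirty_segments: List[Segment]) -> Dict[str, int]:
    """Map each key in the dirty section to the offset of its newest message."""
    offset_map: Dict[str, int] = {}
    for segment in dirty_segments:
        for offset, key, _value in segment:
            offset_map[key] = offset  # later offsets overwrite earlier ones
    return offset_map


def clean_segment(segment: Segment, offset_map: Dict[str, int]) -> Segment:
    """Copy into a replacement segment only messages that are still the latest for their key."""
    replacement: Segment = []
    for offset, key, value in segment:
        newest = offset_map.get(key)
        # In a clean segment every offset precedes the dirty section, so
        # "key is in the map" alone means a newer value exists. The offset
        # comparison lets the same check also run over the dirty segments.
        if newest is None or offset >= newest:
            replacement.append((offset, key, value))
    return replacement


def compact(clean: List[Segment], dirty: List[Segment]) -> List[Segment]:
    """Return the replacement segments, oldest first, that would be swapped in."""
    offset_map = build_offset_map(dirty)
    return [clean_segment(segment, offset_map) for segment in clean + dirty]


# Usage: key "k1" is written three times and "k3" twice; only the newest survive.
clean = [[(0, "k1", "a"), (1, "k2", "b")], [(2, "k3", "c")]]
dirty = [[(3, "k1", "d"), (4, "k3", "e"), (5, "k1", "f")]]
result = compact(clean, dirty)
print(result)
```

Running it prints `[[(1, 'k2', 'b')], [], [(4, 'k3', 'e'), (5, 'k1', 'f')]]`: each key is left with one message, the one holding its latest value. The sketch leaves out details a real cleaner has to handle, such as merging the now-small segments and treating messages with null values (tombstones) specially.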

From Kafka: The Definitive Guide

Note

Compaction keeps only one message per key. Which message is kept for that key? Can a key be repeated? And what if all the messages are required? How would compaction help there?