Archiving time-based data

With time-based data, the most recent data is usually the most useful, and the relevance of older data falls off rapidly as time passes. Older indices then sit in the cluster largely unqueried while still consuming memory, disk, and other resources, which is wasteful.

We can think of archiving as a series of levels, as follows:

  1. Keep the hottest indices on the machines that have the best hardware (shard allocation filtering).
  2. Run the optimize API on indices that are no longer being written to.
  3. Close indices that are not required for instant search.
  4. Take a snapshot and archive older indices.
  5. Finally, remove indices that ...
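
As a rough illustration of the levels above, the following Python sketch builds the REST request that each level roughly corresponds to, without contacting a cluster. The index name logs-2015.01.31, the node attribute box_type, the repository name archive_repo, and the helper function names are all hypothetical, and the attribute would have to be configured on the nodes beforehand. The _optimize endpoint belongs to older Elasticsearch releases; later releases renamed it _forcemerge.

```python
import json


def allocate_to(index, attr, value):
    """Level 1: pin an index to nodes carrying a custom attribute
    (shard allocation filtering). The attribute name is an assumption."""
    body = {f"index.routing.allocation.require.{attr}": value}
    return ("PUT", f"/{index}/_settings", json.dumps(body))


def optimize(index, max_segments=1):
    """Level 2: merge segments of an index that no longer receives writes.
    Uses the older _optimize endpoint (later renamed _forcemerge)."""
    return ("POST", f"/{index}/_optimize?max_num_segments={max_segments}", None)


def close(index):
    """Level 3: close an index that is not needed for instant search."""
    return ("POST", f"/{index}/_close", None)


def snapshot(index, repo):
    """Level 4: snapshot one index into an already registered repository.
    The snapshot naming scheme here is hypothetical."""
    body = {"indices": index}
    path = f"/_snapshot/{repo}/snapshot-{index}?wait_for_completion=true"
    return ("PUT", path, json.dumps(body))


def delete(index):
    """Level 5: remove an index; deciding which indices qualify is left
    to the caller."""
    return ("DELETE", f"/{index}", None)


# Print the requests for one hypothetical daily index.
idx = "logs-2015.01.31"
plan = [
    allocate_to(idx, "box_type", "hot"),
    optimize(idx),
    close(idx),
    snapshot(idx, "archive_repo"),
    delete(idx),
]
for method, path, body in plan:
    print(method, path, body or "")
```

Each helper only returns a (method, path, body) tuple, so the requests can be reviewed or logged before being sent with whichever HTTP client is in use. The order in which the levels are applied to a real index, and the age at which each one applies, depends on the cluster and the Elasticsearch version.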
