Distributed Logging and Monitoring

Let’s look at logging and monitoring. If you’ve ever managed a real server (like a web server), you know how vital it is to have a capture of what is going on. There’s a long list of reasons, not least:

To measure the performance of the system over time
To see what kinds of work are done the most, to optimize performance
To track errors and how often they occur
To do postmortems of failures
To provide an audit trail in case of dispute

Let’s scope this in terms of the problems we think we’ll have to solve:

We want to track key events (such as nodes leaving and rejoining the network).
For each event, we want to track a consistent set of data: the date/time, node that observed the event, peer that created the event, type of the event itself, and other event data.
We want to be able to switch logging on and off at any time.
We want to be able to process log data mechanically, since it will be sizable.
We want to be able to monitor a running system; that is, collect logs and analyze them in real time.
We want log traffic to have minimal effect on the network.
We want to be able to collect log data at a single point on the network.
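To make the requirements above a little more concrete, here is a minimal sketch of what a log record and an on/off switch might look like. It is only an illustration under stated assumptions: the names `LogEvent`, `EventLog`, `to_line`, and `record`, the node and peer identifiers, and the choice of one JSON object per line are all hypothetical and not taken from the text. The sketch uses only the Python standard library and does no networking; the `sink` stands in for whatever would eventually carry log lines to the single collection point.

```python
import json
import time
from dataclasses import dataclass, field, asdict


@dataclass
class LogEvent:
    """One log record with the consistent set of fields listed above."""
    timestamp: float   # date/time of the event, in seconds since the epoch
    node: str          # node that observed the event
    peer: str          # peer that created the event
    event: str         # type of the event, e.g. "LEAVE" or "JOIN"
    data: dict = field(default_factory=dict)  # other event data

    def to_line(self) -> str:
        # One JSON object per line keeps the log easy to process mechanically.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_line(cls, line: str) -> "LogEvent":
        # Rebuild a record from a line produced by to_line().
        return cls(**json.loads(line))


class EventLog:
    """Forwards events to a sink; can be switched on and off at any time."""

    def __init__(self, sink, enabled=True):
        self.sink = sink        # hypothetical: any callable taking one string
        self.enabled = enabled

    def record(self, node, peer, event, **data):
        if not self.enabled:
            return None         # logging is off, so the event is dropped
        entry = LogEvent(time.time(), node, peer, event, data)
        self.sink(entry.to_line())
        return entry


# Usage: collect lines in a list that stands in for the collection point.
lines = []
log = EventLog(lines.append)
log.record("node-1", "peer-7", "LEAVE", reason="timeout")
log.enabled = False
log.record("node-1", "peer-7", "JOIN")   # dropped while logging is off
log.enabled = True
log.record("node-1", "peer-7", "JOIN")   # recorded again
```

Keeping the record flat and serializing it as a single line is one possible way to keep each message small and easy to parse, which speaks to the requirements about mechanical processing and minimal network impact. A real design might well choose a more compact encoding.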

As in any design, some of these requirements are hostile to each other. For example, collecting log data in real time means sending it over the network, which will affect network traffic to some extent. However, as in any design, these requirements are also hypothetical until we have running code, so we can’t take them too seriously. ...
