Tracing Activity

To debug the kinds of problems we saw here, we need extensive logging. There’s a lot happening in parallel, but every problem can be traced to a specific exchange between two nodes, consisting of a set of events that happen in strict sequence. We know how to build very sophisticated logging, but as usual it’s wiser to build just what we need and no more. We have to capture:

  • The time and date for each event

  • The node in which the event occurred

  • The peer node, if any

  • What the event was (e.g., which command arrived)

  • Event data, if any
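As a rough illustration, the five fields above could be packed into one line of text per event. This is only a sketch: the function name `trace`, the `key=value` layout, and the `-` placeholder for a missing peer or data are assumptions made for this example, not the format used by the original code.

```python
import sys
import time


def trace(node, event, peer=None, data=None, stream=sys.stdout):
    """Write one trace line holding the five fields and return it.

    The layout (timestamp, then key=value pairs) is a hypothetical
    choice for this sketch.
    """
    now = time.time()
    # Date and time to the second, plus milliseconds for ordering events
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    millis = int((now % 1) * 1000)
    line = "%s.%03d node=%s peer=%s event=%s data=%s" % (
        stamp,
        millis,
        node,
        peer if peer is not None else "-",   # "-" marks "no peer"
        event,
        data if data is not None else "-",   # "-" marks "no data"
    )
    stream.write(line + "\n")
    return line


# Example: node-a records a HELLO command arriving from node-b
trace("node-a", "HELLO", peer="node-b", data="seq=1")
```

Keeping every event on a single line means ordinary text tools can search and sort the output without any special parsing.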

The simplest technique is to print the necessary information to the console, with a timestamp. That’s the approach I used. Then it’s simple to find the nodes affected by a failure, filter the captured output for only the messages that refer to them, and see exactly what happened.
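The filtering step might look something like the sketch below. It assumes, purely for illustration, that each log line carries `node=` and `peer=` fields separated by spaces; the function name `lines_for_nodes` and the sample lines are hypothetical and not taken from the original logs.

```python
def lines_for_nodes(lines, nodes):
    """Yield the log lines whose node= or peer= field names one of `nodes`.

    Assumes a hypothetical "timestamp key=value key=value ..." layout.
    """
    wanted = set(nodes)
    for line in lines:
        fields = {}
        for token in line.split():
            if "=" in token:
                key, value = token.split("=", 1)
                # Keep the first occurrence so text inside the event
                # data cannot override the real node or peer field
                fields.setdefault(key, value)
        if fields.get("node") in wanted or fields.get("peer") in wanted:
            yield line


# Hypothetical sample log, made up for this example
sample = [
    "2024-01-01 12:00:00.000 node=A peer=B event=HELLO data=-",
    "2024-01-01 12:00:00.005 node=C peer=D event=HELLO data=-",
    "2024-01-01 12:00:00.010 node=B peer=A event=WHISPER data=hi",
]

# Show only the exchange that involves node A
for line in lines_for_nodes(sample, ["A"]):
    print(line)
```

Because the events within one exchange happen in strict sequence, the filtered lines read top to bottom as the conversation between the affected nodes.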
