Heartbeating for Paranoid Pirate

For Paranoid Pirate, we chose the second approach. It might not have been the simplest option: if designing this today, I’d probably try a ping-pong approach instead. However, the principles are similar. The heartbeat messages flow asynchronously in both directions, and either peer can decide the other is “dead” and stop talking to it.

In the worker, this is how we handle heartbeats from the queue:

  • We track a liveness value, which is how many heartbeats we can still miss before deciding the queue is dead. It starts at three and we decrement it each time we miss a heartbeat.

  • We wait in the zmq_poll() loop for one second each time, which is our heartbeat interval.

  • If there’s any message from the queue during that time, we reset our liveness to three.

  • If there’s no message during that time, we count down our liveness.

  • If the liveness reaches zero, we consider the queue dead.

  • If the queue is dead, we destroy our socket, create a new one, and reconnect.

  • To avoid opening and closing too many sockets, we wait for a certain interval before reconnecting, and we double the interval each time until it reaches 32 seconds.
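The steps above can be sketched as a small state machine, kept separate from any socket code so that the logic is easy to follow. This is only an illustration, not the Paranoid Pirate worker itself: the names worker_state_t, worker_state_init(), and worker_state_update() are hypothetical, the initial reconnect delay of one second is an assumption (the text only says "a certain interval"), and resetting the delay once the queue talks to us again is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HEARTBEAT_LIVENESS  3       //  Missed heartbeats we tolerate
#define INTERVAL_INIT       1000    //  Initial reconnect delay, msec (assumed)
#define INTERVAL_MAX        32000   //  Reconnect delay stops doubling here, msec

//  Hypothetical container for the worker's view of the queue
typedef struct {
    int liveness;       //  Heartbeats we can still miss
    int interval;       //  Delay before the next reconnect, msec
} worker_state_t;

static void
worker_state_init (worker_state_t *self)
{
    assert (self != NULL);
    self->liveness = HEARTBEAT_LIVENESS;
    self->interval = INTERVAL_INIT;
}

//  Call once per poll cycle (one heartbeat interval). got_message is
//  true if anything at all arrived from the queue during that cycle.
//  Returns 0 while the queue is considered alive. When the queue is
//  considered dead, returns the number of msec to wait before
//  destroying the socket and reconnecting.
static int
worker_state_update (worker_state_t *self, bool got_message)
{
    assert (self != NULL);
    if (got_message) {
        //  Any traffic from the queue proves it is alive
        self->liveness = HEARTBEAT_LIVENESS;
        self->interval = INTERVAL_INIT;     //  Assumed: reset the backoff
        return 0;
    }
    if (--self->liveness > 0)
        return 0;                           //  Missed one, but not dead yet

    //  Liveness reached zero: report the current delay, then double
    //  it for next time unless we already reached the maximum
    int delay = self->interval;
    if (self->interval < INTERVAL_MAX)
        self->interval *= 2;
    self->liveness = HEARTBEAT_LIVENESS;    //  Fresh start for the new socket
    return delay;
}
```

In a real worker, the caller would pass in whether zmq_poll() reported input on the socket, and on a nonzero return value would sleep for that long, close the socket, and open and connect a new one.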

And this is how we handle heartbeats to the queue:

  • We calculate when to send the next heartbeat; this is a single variable because we’re talking to one peer, the queue.

  • In the zmq_poll() loop, whenever we pass this time, we send a heartbeat to the queue.
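The outgoing side needs even less state. The sketch below shows one way to keep that single variable; heartbeat_timer_t and its two functions are hypothetical names, and the current time is passed in as a plain millisecond count so the logic does not depend on any particular clock function. We treat reaching the deadline the same as passing it, which is a choice made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HEARTBEAT_INTERVAL  1000    //  msec, the one-second interval above

//  Hypothetical holder for the one variable we need per peer
typedef struct {
    int64_t heartbeat_at;   //  Time at which the next heartbeat is due, msec
} heartbeat_timer_t;

static void
heartbeat_timer_init (heartbeat_timer_t *self, int64_t now)
{
    assert (self != NULL);
    self->heartbeat_at = now + HEARTBEAT_INTERVAL;
}

//  Call on every pass through the poll loop. Returns true if a
//  heartbeat should be sent now, and schedules the next one.
static bool
heartbeat_timer_due (heartbeat_timer_t *self, int64_t now)
{
    assert (self != NULL);
    if (now >= self->heartbeat_at) {
        self->heartbeat_at = now + HEARTBEAT_INTERVAL;
        return true;
    }
    return false;
}
```

The caller would send the heartbeat message to the queue whenever heartbeat_timer_due() returns true. A worker talking to several peers would need one such deadline per peer, which is why the single variable only works here.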

Here’s the essential heartbeating code for the worker:

#define HEARTBEAT_LIVENESS  3       // 3-5 ...
