One-Way Heartbeats

A second option is to send a heartbeat message from each node to its peers every second or so. When one node hears nothing from another within some timeout (several seconds, typically), it will treat that peer as dead. Sounds good, right? Sadly, no. This works in some cases but has nasty edge cases in others.

For pub-sub, this approach does work, and it’s the only model you can use. SUB sockets cannot talk back to PUB sockets, but PUB sockets can happily send “I’m alive” messages to their subscribers.

As an optimization, you can send heartbeats only when there is no real data to send. Furthermore, you can send heartbeats at progressively longer intervals, if network activity is an issue (e.g., on mobile networks where activity drains the battery). As long as the recipient can detect a failure (a sharp stop in activity), that’s fine.
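As a rough illustration of the "only when idle" optimization, here is a minimal sketch in plain Python. It does not use any ZeroMQ API: `HeartbeatSender`, the `send` callable, the `b"HEARTBEAT"` payload, and the one-second default interval are all hypothetical names and values chosen for the example. The idea is simply that real data resets the idle timer, so a heartbeat goes out only after a quiet period.

```python
import time


class HeartbeatSender:
    """Sends a heartbeat only if nothing else was sent for `interval` seconds."""

    def __init__(self, send, interval=1.0, clock=time.monotonic):
        self._send = send            # hypothetical callable that transmits one message
        self._interval = interval    # assumed idle period before a heartbeat is due
        self._clock = clock          # injectable so the logic can be tested
        self._last_sent = clock()

    def send_data(self, payload):
        """Send real data; this also counts as proof of life."""
        self._send(payload)
        self._last_sent = self._clock()

    def tick(self):
        """Call periodically. Returns True if a heartbeat was sent."""
        if self._clock() - self._last_sent >= self._interval:
            self._send(b"HEARTBEAT")
            self._last_sent = self._clock()
            return True
        return False
```

In a real application you would call `tick()` from your poll loop and pass something like a socket's send method as `send`.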

Here are the typical problems with this design:

  • It can be inaccurate when we send large amounts of data, as heartbeats will be delayed behind that data. If heartbeats are delayed, you can get false timeouts and disconnections due to network congestion. Thus, always treat any incoming data as a heartbeat, whether or not the sender optimizes out heartbeats.

  • While the pub-sub pattern will drop messages for disappeared recipients, PUSH and DEALER sockets will queue them. So if you send heartbeats to a dead peer and it comes back to life, it will get all the heartbeats you sent, which can be thousands. Whoa, whoa!

  • This design ...
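The advice in the first point above, to treat any incoming data as a heartbeat, can be sketched on the receiving side as follows. This is an illustration in plain Python rather than ZeroMQ code: `PeerMonitor`, its method names, and the five-second default timeout are assumptions made for the example.

```python
import time


class PeerMonitor:
    """Tracks when each peer was last heard from, on any kind of message."""

    def __init__(self, timeout=5.0, clock=time.monotonic):
        self._timeout = timeout      # assumed silence period before a peer counts as dead
        self._clock = clock          # injectable so the logic can be tested
        self._last_seen = {}

    def on_message(self, peer, message):
        """Record activity for every message, heartbeat or real data alike."""
        self._last_seen[peer] = self._clock()

    def dead_peers(self):
        """Return the peers that have been silent for longer than the timeout."""
        now = self._clock()
        return sorted(
            peer for peer, seen in self._last_seen.items()
            if now - seen > self._timeout
        )
```

Because `on_message` never inspects the payload, a peer that is busy sending bulk data stays alive even if its heartbeats are delayed or optimized out.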
