Detecting Disappearances
Heartbeating sounds simple, but it's not. UDP packets get dropped when there's a lot of TCP traffic, so if we depend on UDP beacons we'll get false disconnections. TCP traffic can be delayed for 5, 10, even 30 seconds if the network is really busy. So if we kill peers when they go quiet, we'll have false disconnections.
Since UDP beacons aren't reliable, it's tempting to add in TCP beacons. After all, TCP will deliver them reliably. However, there's one little problem. Imagine you have 100 nodes on a network, and each node sends a TCP beacon to every other node once a second. Each beacon is 22 bytes, not counting TCP's framing overhead. That is 100 * 99 * 22 bytes per second, or 217,800 bytes/second just for heartbeating. That's about 1–2% of a typical WiFi network's ideal capacity, which sounds OK. But when a network is stressed, or fighting other networks for airspace, that extra 200K a second will break what's left. UDP broadcasts are at least low cost.
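To make the arithmetic concrete, here is a minimal sketch of the calculation. The function names are hypothetical, and the UDP figure assumes each node sends one broadcast datagram per second that all peers receive; neither figure includes framing overhead.

```python
def tcp_heartbeat_load(nodes: int, beacon_bytes: int, per_second: int = 1) -> int:
    """Payload bytes per second when every node beacons every other node over TCP."""
    return nodes * (nodes - 1) * beacon_bytes * per_second


def udp_broadcast_load(nodes: int, beacon_bytes: int, per_second: int = 1) -> int:
    """Payload bytes per second when every node sends one broadcast beacon (assumption)."""
    return nodes * beacon_bytes * per_second


if __name__ == "__main__":
    print(tcp_heartbeat_load(100, 22))   # 100 * 99 * 22
    print(udp_broadcast_load(100, 22))   # 100 * 22
```

The TCP cost grows with the square of the node count, while the broadcast cost grows linearly, which is why the full mesh of TCP beacons is the part that hurts on a busy network.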
So what we do is switch to TCP heartbeats only when a specific peer hasn't sent us any UDP beacons in a while. And then, we send TCP heartbeats only to that one peer. If the peer continues to be silent, we conclude it's gone away. If the peer comes back, with a different IP address and/or port, we have to disconnect our DEALER socket and reconnect to the new port.
This gives us a set of states for each peer, though at this stage the code doesn't use a formal state machine:
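One way to picture this per-peer logic is the sketch below. It is not the book's code: the `Peer` class, its method names, and the two timeout values are all hypothetical, and the socket work is left to the caller so the logic stays testable on its own.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
EVASIVE_AFTER = 5.0    # seconds of silence before we start TCP heartbeats
EXPIRED_AFTER = 30.0   # seconds of silence before we give the peer up


@dataclass
class Peer:
    endpoint: str      # where our DEALER socket is currently connected
    last_seen: float   # timestamp of the last traffic from this peer

    def on_beacon(self, endpoint: str, now: float) -> bool:
        """Record a UDP beacon. Returns True if the peer's endpoint changed,
        in which case the caller should disconnect the DEALER socket from
        the old endpoint and connect it to the new one."""
        self.last_seen = now
        if endpoint != self.endpoint:
            self.endpoint = endpoint
            return True
        return False

    def on_reply(self, now: float) -> None:
        """Record any TCP traffic from the peer, such as a heartbeat reply."""
        self.last_seen = now

    def action(self, now: float) -> str:
        """Decide what to do for this peer right now."""
        silence = now - self.last_seen
        if silence >= EXPIRED_AFTER:
            return "expire"   # conclude the peer has gone away
        if silence >= EVASIVE_AFTER:
            return "ping"     # send a TCP heartbeat to this one peer only
        return "none"         # UDP beacons are arriving, nothing to do
```

A caller would run `action()` for each known peer on a timer, passing a monotonic clock reading such as `time.monotonic()`, and would only ever send TCP heartbeats to the peers that return `"ping"`.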
Peer visible thanks to ...