Test Results

Yes, we broke the code. Several times, in fact. This was satisfying. I’ll work through the different things we found.

Getting nodes to agree on consistent group status was the most difficult problem. Every node needs to track the group membership of the whole network, as I already explained in the section “Group Messaging.” Group messaging is a publish-subscribe pattern. JOINs and LEAVEs are analogous to subscribe and unsubscribe messages. It’s essential that none of these ever get lost, or we’ll find nodes dropping out of groups at random.

So, each node counts the total number of JOINs and LEAVEs it’s ever done, and broadcasts this status (as a 1-byte rolling counter) in its UDP beacon. Other nodes pick up the status and compare it to their own calculations, and if there’s a difference, the code asserts.
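As a rough illustration of the 1-byte rolling counter, here is a minimal Python sketch. The class name GroupStatus and its methods are hypothetical and are not part of any ZeroMQ library. The only details taken from the text are that the counter counts every JOIN and LEAVE and fits in one byte, so it wraps back to 0 after 255.

```python
class GroupStatus:
    """Hypothetical helper: counts a node's own JOINs and LEAVEs."""

    def __init__(self):
        self.groups = set()   # groups this node currently belongs to
        self.status = 0       # 1-byte rolling counter, always 0..255

    def _bump(self):
        # Wrap at 256 so the value always fits in a single byte
        self.status = (self.status + 1) % 256

    def join(self, group):
        self.groups.add(group)
        self._bump()

    def leave(self, group):
        self.groups.discard(group)
        self._bump()
```

Because the counter wraps, a receiver can only compare values for equality. It cannot tell how many JOINs or LEAVEs it missed, only that its own count and the sender's disagree.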

The first problem was that UDP beacons get delayed randomly, so they’re useless for carrying the status. When a beacon arrives late, the status it carries is out of date and the assertion fires even though nothing is wrong: a false alarm. To fix this we moved the status information into the JOIN and LEAVE commands. We also added it to the HELLO command. The logic then becomes:

  • Get initial status for a peer from its HELLO command.

  • When getting a JOIN or LEAVE from a peer, increment the status counter we hold for that peer.

  • Check that the new status counter matches the value in the JOIN or LEAVE command.

  • If it doesn’t, assert.
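The four steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the actual implementation: the names Peer, PeerStatusError, on_join, and on_leave are hypothetical, the status is assumed to be a single byte that wraps at 256, and a raised exception stands in for the assertion so that the failure case is easy to exercise.

```python
class PeerStatusError(Exception):
    """Raised where the text says the code asserts."""


class Peer:
    """Hypothetical record of what we know about one remote peer."""

    def __init__(self, hello_status):
        # Step 1: initial status comes from the peer's HELLO command
        self.status = hello_status % 256
        self.groups = set()

    def _check(self, claimed_status):
        # Step 2: count this JOIN or LEAVE ourselves (1-byte rollover assumed)
        self.status = (self.status + 1) % 256
        # Steps 3 and 4: compare with the value carried in the command
        if self.status != claimed_status:
            raise PeerStatusError(
                "expected status %d, peer sent %d"
                % (self.status, claimed_status))

    def on_join(self, group, claimed_status):
        self._check(claimed_status)
        self.groups.add(group)

    def on_leave(self, group, claimed_status):
        self._check(claimed_status)
        self.groups.discard(group)
```

For example, a peer whose HELLO carried status 254 should send 255 with its next JOIN and 0 with the one after that. Any other value means a JOIN or LEAVE was lost or miscounted somewhere.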

The next problem was that messages were arriving unexpectedly on new connections. The Harmony pattern connects, ...
