Recovery and Late Joiners

As it stands now, FileMQ has one major remaining problem: it provides no way for clients to recover from failures. The scenario is that a client, connected to a server, starts to receive files and then disconnects for some reason. The network may be too slow, or it may break. The client may be on a laptop that is shut down and later resumed. The WiFi may be disconnected. As we move to a more mobile world (see Chapter 8), this use case becomes more and more frequent. In some ways it’s becoming a dominant use case.

In the classic ØMQ publish-subscribe pattern, there are two strong underlying assumptions, both of which are usually wrong in FileMQ’s real world: first, that data expires very rapidly, so there’s no interest in asking for old data; and second, that networks are stable and rarely break (so it’s better to invest more in improving the infrastructure and less in addressing recovery).

Take any FileMQ use case, and you’ll see that if the client disconnects and reconnects, it should get anything it missed. A further improvement would be to recover from partial failures, as HTTP and FTP do. But one thing at a time.

One answer to recovery is “durable subscriptions.” The first drafts of the FILEMQ protocol aimed to support this, with client identifiers that the server could hold onto and store so that if a client reappeared after a failure, the server would know what files it had not received.
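To make the idea concrete, here is a minimal sketch of the bookkeeping a durable-subscription server might need. It is not taken from the FILEMQ drafts: the class `DurableSubscriptionServer`, its methods, and the file names are all hypothetical, and the sketch assumes that any file published while a client is connected reaches that client.

```python
from collections import defaultdict


class DurableSubscriptionServer:
    """Hypothetical server-side state for durable subscriptions (illustration only)."""

    def __init__(self):
        self.published = []                # every file name published so far, in order
        self.delivered = defaultdict(set)  # client id -> file names already delivered
        self.connected = set()             # client ids currently connected

    def publish(self, filename):
        """Record a new file and mark it delivered to every connected client."""
        self.published.append(filename)
        for client_id in self.connected:
            # Simplifying assumption: delivery to a connected client always succeeds.
            self.delivered[client_id].add(filename)

    def connect(self, client_id):
        """Register a (re)connecting client and return the files it missed, oldest first."""
        seen = self.delivered[client_id]
        missed = [name for name in self.published if name not in seen]
        seen.update(missed)
        self.connected.add(client_id)
        return missed

    def disconnect(self, client_id):
        """Forget the connection but keep the client's delivery record."""
        self.connected.discard(client_id)


if __name__ == "__main__":
    server = DurableSubscriptionServer()
    server.publish("a.txt")
    print(server.connect("laptop"))   # a late joiner is sent everything published so far
    server.disconnect("laptop")
    server.publish("b.txt")
    print(server.connect("laptop"))   # after reconnecting, only the missed file is sent
```

Even in this toy form, the server has to keep a delivery record for every client identifier it has ever seen, and that record must survive the client's disconnection.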

Stateful servers are, however, nasty to make and difficult ...
