Establishing the Details

Several espressos later, we want to jump into writing code, but a little voice tells us to get more details before making a sensational solution to entirely the wrong problem. “What kind of work is the cloud doing?” we ask. The client explains:

  • Workers run on various kinds of hardware, but they are all able to handle any task. There are several hundred workers per cluster, and as many as a dozen clusters in total.

  • Clients create tasks for workers. Each task is an independent unit of work, and all the client wants is to find an available worker and send it the task as soon as possible. There will be a lot of clients and they’ll come and go arbitrarily.

  • The real difficulty is to be able to add and remove clusters at any time. A cluster can leave or join the cloud instantly, bringing all its workers and clients with it.

  • If there are no workers in its own cluster, a client’s tasks will go off to other available workers in the cloud.

  • Clients send out one task at a time, waiting for a reply. If they don’t get an answer within X seconds, they’ll just send out the task again. This isn’t our concern; the client API does it already.

  • Workers process one task at a time; they are very simple beasts. If they crash, they get restarted by whatever script started them.

So we double-check to make sure that we understood this correctly:

  • “There will be some kind of super-duper network interconnect between clusters, right?” we ask. The client says, “Yes, of course, ...

Get ZeroMQ now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.