The distributed TensorFlow setup

In this section, we will explore the mechanisms through which computation in TensorFlow can be distributed. The first step in running distributed TensorFlow is to specify the architecture of the cluster using tf.train.ClusterSpec:

import tensorflow as tf

cluster = tf.train.ClusterSpec({"ps": ["localhost:2222"],\
                                "worker": ["localhost:2223",\
                                           "localhost:2224"]})

Nodes are typically divided into two jobs: parameter servers (ps), which host the model variables, and workers, which perform the heavy computation. In the preceding code, we define one parameter server and two workers, giving the hostname and port of each node.
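
To make this division of labor concrete, here is a small sketch (the variable and operation are illustrative placeholders, not part of the book's running example) that pins a variable to the ps job and the computation to a worker using tf.device:

import tensorflow as tf

# Variables live on the parameter server...
with tf.device("/job:ps/task:0"):
    weights = tf.Variable(tf.zeros([10]), name="weights")

# ...while the heavy computation is pinned to a worker.
with tf.device("/job:worker/task:0"):
    output = tf.reduce_sum(weights * 2.0)

When such a graph is executed through a session connected to the cluster, TensorFlow transparently routes each operation to the device it was pinned to.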

Next, we have to build a tf.train.Server for each of the nodes defined previously, passing it the cluster specification along with the node's job name and task index:

ps = tf.train.Server(cluster, job_name="ps", task_index=0)
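
The workers are created in the same way. As a minimal runnable sketch, all three servers are started in one process below for illustration; in a real deployment, each process would create only its own server:

worker0 = tf.train.Server(cluster, job_name="worker", task_index=0)
worker1 = tf.train.Server(cluster, job_name="worker", task_index=1)

# A parameter server typically just waits to serve variables:
# ps.join()  # blocks forever

# A worker process can open a session against its own server:
with tf.Session(worker0.target) as sess:
    print(sess.run(tf.constant("Hello, distributed TensorFlow!")))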
