Module: PyHeartBeat—Detecting Inactive Computers

Credit: Nicola Larosa

When a number of computers are connected by a TCP/IP network, we are often interested in monitoring their working state. The pair of programs presented in Example 10-1 and Example 10-2 helps you detect when a computer stops working, while having minimal impact on network traffic and requiring very little setup. Note that this approach does not monitor individual services running on a machine, only the TCP/IP stack and the underlying operating system and hardware.

PyHeartBeat is made up of two files: PyHBClient.py sends UDP packets, while PyHBServer.py listens for such packets and detects inactive clients. The client program, running on any number of computers, periodically sends a UDP packet to the server program, which runs on one central computer. In the server program, one thread dynamically builds and updates a dictionary that stores the IP address of each client computer and the timestamp of the last packet received from it. At the same time, the main thread of the server program periodically checks the dictionary, noting whether any of the timestamps is older than a defined timeout.

In this kind of application, there is no need to use reliable TCP connections, since the loss of a packet now and then does not produce false alarms, provided that the server-checking timeout is kept suitably larger than the client-sending period. On the other hand, if we have hundreds of computers to monitor, it is best to keep the bandwidth used and the load on the server to a minimum. We do this by periodically sending a small UDP packet, instead of setting up a relatively expensive TCP connection per client.

The packets are sent from each client every 10 seconds, while the server checks the dictionary every 30 seconds, and its timeout defaults to the same interval. These parameters, along with the server IP address and port used, can be adapted to one’s needs.
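
To see why an occasional lost packet need not raise a false alarm, it may help to work out how long a client can appear silent when some heartbeats in a row go missing. The following is only a rough sketch: the helper names max_silence and tolerated_losses are hypothetical and not part of PyHeartBeat, and the arithmetic ignores network delay and scheduling jitter.

```python
def max_silence(period, lost):
    """Longest gap, in seconds, between two received heartbeats
    when `lost` consecutive packets are dropped."""
    return period * (lost + 1)


def tolerated_losses(period, timeout):
    """Largest number of consecutive lost packets for which the gap
    between received heartbeats stays strictly below the timeout."""
    if period <= 0:
        raise ValueError("period must be positive")
    lost = 0
    # Keep allowing one more lost packet while the resulting gap
    # is still shorter than the timeout.
    while max_silence(period, lost + 1) < timeout:
        lost += 1
    return lost
```

With the defaults given above (a beat every 10 seconds and a 30-second timeout), tolerated_losses(10, 30) gives 1: a single lost packet is absorbed comfortably, while two in a row put the gap right at the timeout. On a lossy network it may therefore be worth raising the timeout somewhat.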

Also note that the debug printouts can be turned off by using the -O option of the Python interpreter, as that option sets the __debug__ variable to 0. However, some would consider this usage overcute and prefer a more straightforward and obvious approach: have the scripts accept either a -q flag (to keep the script quiet, with verbosity as the default) or a -v flag (to make it verbose, with quiet as the default). The getopt standard module makes it easy for a Python script to accept optional flags of this kind.
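
As an illustration of the -q alternative, here is a minimal sketch built on getopt. The function name parse_args and the way its result would be wired into the scripts are assumptions made for this example; the recipe itself relies on __debug__ instead.

```python
import getopt


def parse_args(argv):
    """Split argv (without the program name) into (verbose, positional).

    A -q flag switches the debug printouts off; verbosity is the default.
    getopt.getopt raises getopt.GetoptError for any unrecognized flag.
    """
    opts, positional = getopt.getopt(argv, 'q')
    verbose = True
    for flag, _value in opts:
        if flag == '-q':
            verbose = False
    return verbose, positional
```

A script would typically call parse_args(sys.argv[1:]) once at startup, test the returned verbose value wherever the listings test __debug__, and read the server address, port, or timeout from the remaining positional arguments.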

Example 10-1 shows the PyHBClient.py heartbeat client program, which should be run on every computer on the network, while Example 10-2 shows the heartbeat server program, PyHBServer.py, which should be run on the server computer only.

Example 10-1. PyHeartBeat client

""" PyHeartBeat client: sends a UDP packet to a given server every 10 seconds.

Adjust the constant parameters as needed, or call as:
    PyHBClient.py serverip [udpport]
"""

from socket import socket, AF_INET, SOCK_DGRAM
from time import time, ctime, sleep
import sys

SERVERIP = '127.0.0.1'    # local host, just for testing
HBPORT = 43278            # an arbitrary UDP port
BEATWAIT = 10             # number of seconds between heartbeats

if len(sys.argv)>1:
    SERVERIP=sys.argv[1]
if len(sys.argv)>2:
    HBPORT=int(sys.argv[2])   # sendto and the %d format both need an integer

hbsocket = socket(AF_INET, SOCK_DGRAM)
print "PyHeartBeat client sending to IP %s , port %d"%(SERVERIP, HBPORT)
print "\n*** Press Ctrl-C to terminate ***\n"
while 1:
    hbsocket.sendto('Thump!', (SERVERIP, HBPORT))
    if __debug__:
        print "Time: %s" % ctime(time(  ))
    sleep(BEATWAIT)

Example 10-2. PyHeartBeat server

""" PyHeartBeat server: receives and tracks UDP packets from all clients.

While the BeatRec thread logs each UDP packet in a dictionary, the main
thread periodically scans the dictionary and prints the IP addresses of the
clients that sent at least one packet during the run, but have
not sent any packet for longer than the timeout.

Adjust the constant parameters as needed, or call as:
    PyHBServer.py [timeout [udpport]]
"""

HBPORT = 43278
CHECKWAIT = 30

from socket import socket, gethostbyname, AF_INET, SOCK_DGRAM
from threading import Lock, Thread, Event
from time import time, ctime, sleep
import sys

class BeatDict:
    "Manage heartbeat dictionary"

    def __init__(self):
        self.beatDict = {}
        if __debug__:
            self.beatDict['127.0.0.1'] = time(  )
        self.dictLock = Lock(  )

    def __repr__(self):
        lines = ''    # named to avoid shadowing the built-in list
        self.dictLock.acquire(  )
        for key in self.beatDict.keys(  ):
            lines = "%sIP address: %s - Last time: %s\n" % (
                lines, key, ctime(self.beatDict[key]))
        self.dictLock.release(  )
        return lines

    def update(self, entry):
        "Create or update a dictionary entry"
        self.dictLock.acquire(  )
        self.beatDict[entry] = time(  )
        self.dictLock.release(  )

    def extractSilent(self, howPast):
        "Returns a list of entries older than howPast"
        silent = []
        when = time(  ) - howPast
        self.dictLock.acquire(  )
        for key in self.beatDict.keys(  ):
            if self.beatDict[key] < when:
                silent.append(key)
        self.dictLock.release(  )
        return silent

class BeatRec(Thread):
    "Receive UDP packets, log them in heartbeat dictionary"

    def __init__(self, goOnEvent, updateDictFunc, port):
        Thread.__init__(self)
        self.goOnEvent = goOnEvent
        self.updateDictFunc = updateDictFunc
        self.port = port
        self.recSocket = socket(AF_INET, SOCK_DGRAM)
        self.recSocket.bind(('', port))

    def __repr__(self):
        return "Heartbeat Server on port: %d\n" % self.port

    def run(self):
        while self.goOnEvent.isSet(  ):
            if __debug__:
                print "Waiting to receive..."
            data, addr = self.recSocket.recvfrom(6)
            if __debug__:
                print "Received packet from " + repr(addr)
            self.updateDictFunc(addr[0])

def main(  ):
    "Listen to the heartbeats and detect inactive clients"
    global HBPORT, CHECKWAIT
    if len(sys.argv)>1:
        CHECKWAIT=float(sys.argv[1])   # timeout in seconds, first as in the usage line
    if len(sys.argv)>2:
        HBPORT=int(sys.argv[2])        # bind needs an integer port

    beatRecGoOnEvent = Event(  )
    beatRecGoOnEvent.set(  )
    beatDictObject = BeatDict(  )
    beatRecThread = BeatRec(beatRecGoOnEvent, beatDictObject.update, HBPORT)
    if __debug__:
        print beatRecThread
    beatRecThread.start(  )
    print "PyHeartBeat server listening on port %d" % HBPORT
    print "\n*** Press Ctrl-C to stop ***\n"
    while 1:
        try:
            if __debug__:
                print "Beat Dictionary"
                print repr(beatDictObject)
            silent = beatDictObject.extractSilent(CHECKWAIT)
            if silent:
                print "Silent clients"
                print repr(silent)
            sleep(CHECKWAIT)
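        except KeyboardInterrupt:
            print "Exiting."
            beatRecGoOnEvent.clear(  )
            # recvfrom blocks, so send one local packet to wake the receiver
            # thread and let it notice that the event has been cleared
            socket(AF_INET, SOCK_DGRAM).sendto('', ('127.0.0.1', HBPORT))
            beatRecThread.join(  )
            break    # leave the loop; otherwise the server keeps running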

if __name__ == '__main__':
    main(  )
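
To try the underlying mechanism without a second machine, a single process can send one heartbeat to itself over the loopback interface. This is a minimal sketch rather than part of the recipe: the function name loopback_beat, the use of port 0 (letting the operating system pick a free port), and the two-second timeout are all assumptions made for the example. It uses a bytes literal so that it also runs on current Python interpreters.

```python
from socket import socket, AF_INET, SOCK_DGRAM


def loopback_beat(payload=b'Thump!'):
    """Send one heartbeat to a throwaway local socket.

    Returns the payload received and the sender's IP address, which is
    the value the server listing uses as its dictionary key (addr[0]).
    """
    receiver = socket(AF_INET, SOCK_DGRAM)
    sender = socket(AF_INET, SOCK_DGRAM)
    try:
        receiver.bind(('127.0.0.1', 0))   # port 0: the OS picks a free port
        receiver.settimeout(2.0)          # do not hang if the packet is lost
        sender.sendto(payload, receiver.getsockname())
        data, addr = receiver.recvfrom(len(payload))
    finally:
        sender.close()
        receiver.close()
    return data, addr[0]
```

Calling loopback_beat() should hand back the payload together with '127.0.0.1', the same pair of values that BeatRec.run obtains from recvfrom before it updates the dictionary.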

See Also

Documentation for the standard library modules socket, threading, and time in the Library Reference; Jeff Bauer has a related program using UDP for logging information known as Mr. Creosote (http://starship.python.net/crew/jbauer/creosote/); UDP is described in UNIX Network Programming, Volume 1: Networking APIs - Sockets and XTI, Second Edition, by W. Richard Stevens (Prentice-Hall, 1998); for the truly curious, the UDP protocol is described in the two-page RFC 768 (http://www.ietf.org/rfc/rfc768.txt), which, when compared with current RFCs, shows how much the Internet infrastructure has evolved in 20 years.
