Debugging and monitoring

As discussed in Chapter 7, Testing and Debugging Distributed Applications, logging, monitoring, profiling, and debugging distributed systems is still not an easy task, especially in languages other than C, C++, and Fortran. There is little to add here beyond noting that an important gap in tooling remains to be filled.

Most medium-to-large teams end up developing their own custom solutions based on log aggregators such as Sentry (https://getsentry.com) and monitoring solutions such as Ganglia (http://ganglia.sourceforge.net).
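As a rough illustration of what the starting point of such a custom solution might look like, the following sketch uses only the Python standard library to emit one JSON object per log record, tagged with the host name and process ID so that records from many workers can be told apart once collected centrally. The names JsonFormatter and make_logger are hypothetical, and shipping the resulting lines to an actual aggregator is deliberately left out, since that depends on the tool chosen.

```python
import json
import logging
import os
import socket


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter: render each record as a single JSON line."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "host": socket.gethostname(),  # which machine produced the record
            "pid": os.getpid(),            # which process on that machine
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def make_logger(name, stream=None):
    """Return a logger that writes JSON lines to stream (stderr by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching a second handler on repeat calls
        handler = logging.StreamHandler(stream)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.propagate = False  # keep records out of the root logger's handlers
    return logger


if __name__ == "__main__":
    log = make_logger("worker")
    log.info("task %d finished", 3)
```

Each worker process would call make_logger once at startup. Because every line is self-describing, whatever collects the output can filter or group records by host and process without parsing free-form text.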

What would be nice to have is the equivalent of I/O monitoring tools such as Darshan (http://www.mcs.anl.gov/research/projects/darshan/) and distributed profilers such as Allinea ...
