Summary

The most important thing to keep in mind when working with large datasets in matplotlib is to use data wisely and to take advantage of NumPy and tools such as PyTables. Moving to distributed data brings a much larger infrastructure burden than working with data on a single machine. As datasets approach terabytes and petabytes, most of the work has less to do with plotting and visualization than with deciding what to visualize and how to get there. An increasingly common aspect of big data is real-time analysis, where matplotlib might be used to generate hundreds or thousands of plots of a fairly small set of data points. Not all problems in big data visualization are about ...
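
As one illustration of using data wisely with NumPy, the sketch below reduces a large one-dimensional array before handing it to matplotlib. It is not taken from the book: the helper name `minmax_decimate`, the bin count, the synthetic random-walk data, and the output filename are all assumptions chosen for the example. The idea is to keep only the minimum and maximum of each bin, so that the plotted envelope still shows the extremes of the full series.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt


def minmax_decimate(y, n_bins):
    """Reduce y to 2 * n_bins points, keeping each bin's min and max.

    Hypothetical helper for illustration; assumes len(y) >= n_bins.
    """
    usable = (len(y) // n_bins) * n_bins  # drop any ragged tail
    bins = y[:usable].reshape(n_bins, -1)
    out = np.empty(2 * n_bins, dtype=y.dtype)
    out[0::2] = bins.min(axis=1)  # even slots hold each bin's minimum
    out[1::2] = bins.max(axis=1)  # odd slots hold each bin's maximum
    return out


# Synthetic data standing in for a large dataset (an assumption).
rng = np.random.default_rng(0)
signal = np.cumsum(rng.standard_normal(1_000_000))

# One million samples become two thousand plotted points.
reduced = minmax_decimate(signal, n_bins=1_000)

fig, ax = plt.subplots()
ax.plot(np.linspace(0, len(signal), len(reduced)), reduced, linewidth=0.5)
ax.set_xlabel("sample index (approximate)")
ax.set_ylabel("value")
fig.savefig("decimated.png")
```

Reducing the data with vectorized NumPy operations before plotting keeps the number of drawn points close to what the figure can actually display. Other reductions, such as plain striding or per-bin means, would be reasonable alternatives depending on which features of the data need to survive.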
