There's more...

This is by far the most experimental recipe in this book, but for very demanding datasets, Parquet is probably one of the more efficient formats available, and it has native support in Spark. In the medium to long term, we will probably see further developments in this space, and you should expect other ways of converting data into the Parquet format to appear. If you decide to use Parquet, be sure to check all the different indexing strategies that the format supports.
