8.5. STORING DATA

Databases are used to store collected data for later use, and they come in several varieties. The first type of database is known as the flat file. Flat files are two-dimensional databases, much like an ordinary spreadsheet. Flat file databases are loved for their leanness, because there is very little baggage or overhead to slow them down. A flat file is a simple structure that can be searched very easily, usually in a sequential manner (i.e., from the first row of data onward to the last). However, you can easily imagine that searching for a data point near the bottom row of a very large flat file with millions of rows may take rather a long time. To help with this problem, many quants use indexed flat files, which add an extra step (building and maintaining the index) but which can make searching large files much faster. The index gives the computer a sort of "cheat sheet": a record of where each piece of data sits in the file, so that a lookup can go directly to the right row instead of reading every row before it.
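To make the difference between a sequential search and an indexed search concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the three-column layout (ticker, date, close), the made-up tickers and prices, and the function names. The index here is simply a dictionary from ticker to the byte offset of that ticker's row, which is one of several ways an indexed flat file could be built.

```python
import io

# Hypothetical flat file: a header line, then one row per ticker.
FLAT_FILE = (
    b"ticker,date,close\n"
    b"AAA,2020-01-02,10.0\n"
    b"BBB,2020-01-02,20.5\n"
    b"CCC,2020-01-02,31.25\n"
)
COLUMNS = ("ticker", "date", "close")


def parse_row(line):
    """Turn one raw line of the file into a dict keyed by column name."""
    return dict(zip(COLUMNS, line.decode().rstrip("\n").split(",")))


def sequential_search(data, ticker):
    """Read rows from first to last; return (row, number of rows read)."""
    buf = io.BytesIO(data)
    buf.readline()  # skip the header
    rows_read = 0
    for line in buf:
        rows_read += 1
        row = parse_row(line)
        if row["ticker"] == ticker:
            return row, rows_read
    return None, rows_read


def build_index(data):
    """One pass over the file, recording the byte offset of each row."""
    index = {}
    buf = io.BytesIO(data)
    buf.readline()  # skip the header
    while True:
        offset = buf.tell()
        line = buf.readline()
        if not line:
            break
        index[parse_row(line)["ticker"]] = offset
    return index


def indexed_search(data, index, ticker):
    """Jump straight to the row's offset instead of scanning from the top."""
    offset = index.get(ticker)
    if offset is None:
        return None
    buf = io.BytesIO(data)
    buf.seek(offset)
    return parse_row(buf.readline())


index = build_index(FLAT_FILE)
row_seq, rows_read = sequential_search(FLAT_FILE, "CCC")
row_idx = indexed_search(FLAT_FILE, index, "CCC")
```

In this sketch the sequential search has to read every row above the one it wants, while the indexed search reads exactly one row. The price is the extra pass needed to build the index, which also has to be rebuilt or updated whenever the file changes.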

A second important type of data storage is a relational database. Relational databases allow for more complex relationships among the data. For example, imagine that we want to keep track of stocks not just on their own but also as part of industry groups, as part of sectors, as part of broader indices for the countries of their domicile, and as part of the universe of stocks overall. This is a fairly routine thing to want to do. With flat files, we would have to construct each of these groups as a separate table. This is fine if nothing ...
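As a rough illustration of how a relational database can hold the kind of grouping described above, the following Python sketch uses the standard-library sqlite3 module with an in-memory database. The schema, table names, tickers, and groupings are all hypothetical and chosen only for this example; a real securities database would be organized according to its own needs.

```python
import sqlite3


def build_db():
    """Create a small in-memory database with hypothetical stock groupings."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sectors (
            id   INTEGER PRIMARY KEY,
            name TEXT UNIQUE
        );
        CREATE TABLE industries (
            id        INTEGER PRIMARY KEY,
            name      TEXT UNIQUE,
            sector_id INTEGER REFERENCES sectors(id)
        );
        CREATE TABLE stocks (
            ticker      TEXT PRIMARY KEY,
            country     TEXT,
            industry_id INTEGER REFERENCES industries(id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO sectors VALUES (?, ?)",
        [(1, "Technology"), (2, "Energy")],
    )
    conn.executemany(
        "INSERT INTO industries VALUES (?, ?, ?)",
        [(1, "Software", 1), (2, "Semiconductors", 1), (3, "Oil & Gas", 2)],
    )
    conn.executemany(
        "INSERT INTO stocks VALUES (?, ?, ?)",
        [("AAA", "US", 1), ("BBB", "JP", 2), ("CCC", "US", 3)],
    )
    return conn


def tickers_in_sector(conn, sector):
    """Follow the stock -> industry -> sector links to list a sector's tickers."""
    rows = conn.execute(
        """
        SELECT s.ticker
        FROM stocks AS s
        JOIN industries AS i ON s.industry_id = i.id
        JOIN sectors AS sec ON i.sector_id = sec.id
        WHERE sec.name = ?
        ORDER BY s.ticker
        """,
        (sector,),
    )
    return [r[0] for r in rows]


def tickers_in_country(conn, country):
    """Group the same stocks a different way, by country of domicile."""
    rows = conn.execute(
        "SELECT ticker FROM stocks WHERE country = ? ORDER BY ticker",
        (country,),
    )
    return [r[0] for r in rows]


conn = build_db()
tech = tickers_in_sector(conn, "Technology")
us = tickers_in_country(conn, "US")
```

In this sketch each stock's industry is stored in exactly one place, and the sector and country groupings are derived by queries rather than kept as separate hand-built tables.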

