Beautiful Data by Toby Segaran, Jeff Hammerbacher

Impediments to Connecting Data

Hopefully you're starting to be convinced that there are huge advantages to being able to easily integrate data from many different sources. But there are a few reasons people aren't doing it already.

The Representation Problem

Perhaps the most basic problem with attempting to connect data sets is that most data is stored in very inflexible structures. First of all, a surprising amount of important data in science and business is kept in Excel spreadsheets, which are stored locally on people's computers, inaccessible to others, and not designed for integration anyway.

Even in companies where databases are made accessible, data is classically stored in relational databases, most of which have predefined schemas designed to fit the data that was initially believed to be important. Figure 20-2 shows a simple example of a relational schema for restaurant data. This approach works well for large, predictable data sets, because relational databases perform very well when properly configured, but it presents problems when the application requires new kinds of data, new fields, or new relationships to be added frequently.

Figure 20-2. A relational schema for restaurant data.
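To make the rigidity concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables and columns (restaurant, cuisine, has_patio) are hypothetical and chosen only for illustration; they are not the schema shown in Figure 20-2. The sketch shows that a field nobody planned for cannot be stored until the schema itself is changed.

```python
import sqlite3


def build_restaurant_db():
    """Create an in-memory database with a small, hypothetical restaurant schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cuisine (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE restaurant (
            id         INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            city       TEXT NOT NULL,
            cuisine_id INTEGER REFERENCES cuisine(id)
        );
    """)
    conn.execute("INSERT INTO cuisine (id, name) VALUES (1, 'Thai')")
    conn.execute(
        "INSERT INTO restaurant (name, city, cuisine_id) VALUES (?, ?, ?)",
        ("Example Thai Kitchen", "San Francisco", 1),
    )
    return conn


def column_names(conn, table):
    """Return the column names of a table (index 1 of each PRAGMA row is the name)."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def try_insert_with_new_field(conn):
    """Try to store a field (has_patio) that the original schema did not anticipate."""
    try:
        conn.execute(
            "INSERT INTO restaurant (name, city, cuisine_id, has_patio) "
            "VALUES (?, ?, ?, ?)",
            ("Another Place", "Oakland", 1, 1),
        )
        return True
    except sqlite3.OperationalError:
        # SQLite rejects the insert because the column does not exist yet.
        return False


if __name__ == "__main__":
    conn = build_restaurant_db()
    print(try_insert_with_new_field(conn))   # rejected by the fixed schema
    # The schema must be migrated before the new kind of data can be stored.
    conn.execute("ALTER TABLE restaurant ADD COLUMN has_patio INTEGER")
    print(try_insert_with_new_field(conn))   # accepted after the migration
    print(column_names(conn, "restaurant"))
```

A single ALTER TABLE is cheap in this toy case. The difficulty the text describes arises when such changes are needed frequently, since each one is a schema migration and rows written earlier carry no value for the new field.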

I've seen people solve this problem in a number of ways, but two really stand out, mostly because they're opposite ends of a spectrum. The traditional approach is to continually ...
