Chapter 4. Describing Structured Data

Before continuing on a journey through information management, it is important to have a common understanding of how information and data can be described in a structured way. Even unstructured data, such as documents, contains or relates to some form of structured data, such as the fields in a database.

Apart from truly random data sets, which have some limited value, every data set or document has relationships. For example, these relationships could exist between database fields, through a structure within a document, or as assumed associations between Web pages through keywords.

NETWORKS AND GRAPHS

Graph theory is a very useful mathematical tool that can be applied to gain a much deeper understanding of data. It describes networks of nodes and the connections between them. In mathematics, network theory is formally called graph theory, so for the balance of this chapter, consider the words network and graph to be synonymous.
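To make the idea of a network of nodes concrete, here is a minimal sketch in Python. The node names (customer, order, product) and the helper build_network are hypothetical and chosen only for illustration. It also assumes that connections have no direction, which the text does not specify.

```python
# Hypothetical data items standing in for nodes (illustrative only).
nodes = {"customer", "order", "product"}

# Hypothetical relationships: each pair connects two nodes.
connections = [("customer", "order"), ("order", "product")]


def build_network(nodes, connections):
    """Return a mapping from each node to the set of nodes it connects to."""
    network = {node: set() for node in nodes}
    for a, b in connections:
        if a not in network or b not in network:
            raise ValueError(f"connection ({a}, {b}) uses an unknown node")
        # Assumption: connections are undirected, so record both directions.
        network[a].add(b)
        network[b].add(a)
    return network


network = build_network(nodes, connections)
print(sorted(network["order"]))  # prints ['customer', 'product']
```

Storing each node's neighbors in a set keeps the question "what is this item related to?" a single lookup. A directed relationship could be modeled by recording only one direction.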

Each node in the graph is called a vertex. The connections between vertices are called edges. We aren't talking about any form of physical networking. Rather, we are discussing network theory in the abstract and applying theory from our mathematical colleagues to the newer science of information management, and specifically data modeling. The rules of data modeling are often lost in the detail of the individual business problem, so it is very useful to have some tools to help abstract the problem. Hence, each vertex ...
