Chapter 25. Expanding Your Data Warehouse with Unstructured Data

In This Chapter

  • Recognizing the limits of today's data warehousing

  • Getting data through multimedia

  • Looking at business intelligence and unstructured data

  • Going from unstructured information to structured data

Today's data landscape encompasses a dizzying array of new information channels, new sources of data, and new analysis and reporting imperatives. According to analyst groups, roughly 80 to 85 percent of today's data is unstructured, and new information channels such as the Web, e-mail, voice over IP, instant messaging (IM), text messaging, and podcasts are rapidly creating huge stores of nontraditional data. Sooner or later, your users will ask you to integrate data from any of these sources into your data warehouse.

Traditional Data Warehousing Means Analyzing Traditional Data Types

Unless you've used an extraordinary, state-of-the-art data warehouse, your business intelligence functionality has probably been limited to these types of data:

  • Numbers: Numeric data in the technical form of integers and decimal numbers

  • Text: Character data, typically fixed-length alphanumeric information that seldom exceeds about 255 characters per occurrence, although in very rare cases it might run to 4,000 characters

  • Dates and times: Either actual dates and times or, more likely, ranges of dates (such as a month and year for which product sales are grouped and stored)

That's about it.
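
To make those three data types concrete, here is a minimal sketch of a traditional warehouse table built with Python's standard sqlite3 module. The table name, column names, and sample rows are hypothetical and chosen only for illustration. SQLite also treats declared lengths such as VARCHAR(255) as hints rather than enforced limits, so a production warehouse database would likely behave more strictly.

```python
import sqlite3


def build_sales_fact(conn):
    """Create and load a small table that uses only traditional data types.

    All names and values here are illustrative assumptions, not taken
    from any real warehouse.
    """
    conn.execute(
        """
        CREATE TABLE monthly_product_sales (
            product_code  VARCHAR(255)   NOT NULL,  -- short character data
            sales_month   TEXT           NOT NULL,  -- date range as 'YYYY-MM'
            units_sold    INTEGER        NOT NULL,  -- integer
            revenue       DECIMAL(12, 2) NOT NULL   -- decimal number
        )
        """
    )
    conn.executemany(
        "INSERT INTO monthly_product_sales VALUES (?, ?, ?, ?)",
        [
            ("WIDGET-A", "2008-01", 120, 2399.40),
            ("WIDGET-A", "2008-02", 95, 1899.05),
            ("WIDGET-B", "2008-01", 40, 1200.00),
        ],
    )


def units_by_month(conn):
    """Return total units sold per month, the kind of grouped query BI tools run."""
    return conn.execute(
        "SELECT sales_month, SUM(units_sold) "
        "FROM monthly_product_sales "
        "GROUP BY sales_month ORDER BY sales_month"
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_sales_fact(conn)
print(units_by_month(conn))
```

Every column here is a number, a short string, or a date range, so ordinary grouping and summing covers the analysis. An e-mail body, a voice recording, or a podcast has no natural home in a table shaped like this one.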

To be fair, data warehousing in its original incarnation, ...
