Foreword

Undoubtedly, we've all seen some infographic showcasing how the amount of data in our digitally connected world, if represented as books, would circle the globe a number of times. We've all heard about the mind-boggling amount of data created every minute of every day. Each minute generates ∼456,000 tweets, ∼46,000 Uber trips, ∼4,150,000 YouTube videos watched, and more. How did we get here? It's simple: in 1964, 1 TB of memory would have cost about $3.5 billion. Today? $27…and it fits into your shirt pocket. (Keep in mind all of these numbers were out of date the minute they were stated.)

Now consider the impact of technologies such as blockchain and the Internet of Things (IoT) on the velocity of data that's currently being collected. Data collection rates are set to go into turbo mode (if they aren't already there). If I were to grade the world on data collection, I'd give it an A+. 24/7, the world stores more and more data. Nice job, world!

What about grading the world on how much of that data it understands? C− (at best). Why? Most of the world's data is unstructured—in other words, it doesn't fit nicely into the rows and columns upon which most analytics is performed. Add to this the fact that most of the world's data can't even be “Googled” (which means your company is stuck with it)…and opportunity has been missed.

The growing importance of unstructured data is evidenced by the recognition of Big Data industry leaders and academics that unstructured data has become ...
