
Data Architecture: A Primer for the Data Scientist by Dan Linstedt, W.H. Inmon


1.3

The “Great Divide”

Abstract

Corporate data consists of structured data and unstructured data. Unstructured data, in turn, consists of repetitive and nonrepetitive data. The separation between repetitive data and nonrepetitive data can be called the “great divide.” Repetitive Big Data centers on Hadoop, where most of the activity consists of data management functions for very large amounts of data. Nonrepetitive data is organized around textual disambiguation, which includes such functions as subdocument processing, inline contextualization, taxonomical resolution, acronym resolution, standardization, stop word processing, homographic resolution, proximity resolution, and others.
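The abstract only names the textual disambiguation functions. As a rough illustration of how a few of them (acronym resolution, standardization, and stop word processing) might be chained over a fragment of nonrepetitive text, here is a minimal Python sketch. The function names, the tiny lookup tables, and the order of the steps are all hypothetical, chosen for illustration rather than taken from the authors' method. Real textual disambiguation relies on much larger curated taxonomies and on context-sensitive steps such as homographic and proximity resolution, which this sketch omits.

```python
import re

# Hypothetical reference tables; a real system would load these from
# curated, domain-specific taxonomies rather than hard-coding them.
ACRONYMS = {"ER": "emergency room", "BP": "blood pressure"}
STANDARD_FORMS = {"hi": "high", "hgh": "high"}  # hypothetical shorthand variants
STOP_WORDS = {"a", "an", "the", "of", "and", "in", "to", "was", "with"}


def tokenize(text: str) -> list[str]:
    """Split raw text into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[A-Za-z0-9']+", text)


def resolve_acronyms(tokens: list[str]) -> list[str]:
    """Acronym resolution: expand known (case-sensitive) acronyms into words."""
    out: list[str] = []
    for tok in tokens:
        # Unknown tokens pass through unchanged as a one-element list.
        out.extend(ACRONYMS.get(tok, tok).split())
    return out


def standardize(tokens: list[str]) -> list[str]:
    """Standardization: lowercase tokens and map known variants to one form."""
    return [STANDARD_FORMS.get(t.lower(), t.lower()) for t in tokens]


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Stop word processing: drop words that carry little business meaning."""
    return [t for t in tokens if t not in STOP_WORDS]


def disambiguate(text: str) -> list[str]:
    """Run the hypothetical pipeline; acronyms are resolved before lowercasing."""
    tokens = tokenize(text)
    tokens = resolve_acronyms(tokens)
    tokens = standardize(tokens)
    return remove_stop_words(tokens)


if __name__ == "__main__":
    print(disambiguate("Patient was admitted to the ER with hi BP"))
    # ['patient', 'admitted', 'emergency', 'room', 'high', 'blood', 'pressure']
```

Acronym resolution runs before standardization here because lowercasing first would make uppercase acronyms such as "ER" indistinguishable from ordinary words; other orderings are possible depending on the taxonomy in use.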

Keywords

corporate data
Hadoop
Big Data
textual disambiguation ...
