Typical text found in corporations includes call center conversations, corporate contracts, email, insurance claims, sales pitches, court orders, jokes, tweets, invitations, and so forth. There is no limit on what kind of text, or how much of it, can be stored in a data lake. However, for text to be used analytically, it must be transformed. As long as text is in its original form, only the most superficial analysis can be done against it. For text to be subjected to useful analytical processing, unstructured text must pass through a process known as textual disambiguation.
- Chapter 3 Inside the Data Lake
- from Data Lake Architecture: Designing the Data Lake and Avoiding the Garbage Dump
- Publisher: Technics Publications
- Released: April 2016