Unstructured Data

Video Description

One way to divide the world of data is into structured and unstructured data. Unstructured data is data whose structure a computer cannot readily interpret. Text is a common form of unstructured data and makes up over 80% of corporate data. We cover the process of analyzing text, including the movement from unstructured data through textual disambiguation to analytical databases. We explore the different types of text, including repetitive versus non-repetitive text. Learn how to analyze email and appreciate the implications of unstructured data in the data warehouse. We explain textual Extract, Transform, and Load (ETL), and you will find out how to handle optical character recognition (OCR) and special characters during textual ETL. We also cover formal versus informal organization systems and the advantages of moving textual data into a structured environment.

Unstructured data exists in many forms, including email, medical records such as x-rays, old legacy data, hard copy, video, and movies. Learn the opportunities and challenges of storage and analysis for each of these types. We use Venn diagrams to explain how unstructured data differs from structured data. We cover the rewards of placing text into a database as well as best practices in analytical text processing. We conclude by covering the available utilities for unstructured data.