THE UNSTRUCTURED DATA ANALYTICS INDUSTRY

Consolidation among unstructured data analytics providers is ongoing, as key technologies, along with the even more precious patents and skilled engineers behind them, are acquired by companies with an interest in unstructured data. In the search engine arena, numerous well-known vendors have been assimilated: AltaVista, Autonomy, Endeca, and FAST. Record-linking technology companies (whose technology deduplicates different forms of people and place names, such as IBM and Intl. Business Machine) have also been absorbed by larger vendors. Taxonomy and information retrieval companies (for example, InXight and Teragram) have similarly been snapped up, as have statistical software vendors with text analytics solutions.
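To make the record-linking idea concrete, here is a minimal sketch of how variant forms of a name might be recognized as the same entity. It is an illustration only, not any vendor's method: the abbreviation table, the function names, and the 0.85 similarity threshold are all assumptions chosen for the example, and commercial products rely on far richer dictionaries and statistical models.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real system would use a much larger one.
ABBREVIATIONS = {
    "intl": "international",
    "corp": "corporation",
    "inc": "incorporated",
}

def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

def acronym(name: str) -> str:
    """Initial letters of a multi-word name; empty for single-word names."""
    tokens = normalize(name).split()
    if len(tokens) < 2:
        return ""
    return "".join(t[0] for t in tokens)

def same_entity(a: str, b: str, threshold: float = 0.85) -> bool:
    """Guess whether two name strings refer to the same entity."""
    na, nb = normalize(a), normalize(b)
    if na == nb:
        return True
    # One name may simply be the acronym of the other.
    if na == acronym(b) or nb == acronym(a):
        return True
    # Otherwise fall back to character-level similarity (assumed threshold).
    return SequenceMatcher(None, na, nb).ratio() >= threshold

print(same_entity("IBM", "Intl. Business Machine"))
print(same_entity("IBM", "Attivio"))
```

The acronym check is what links IBM to its spelled-out form; the similarity fallback catches small spelling differences such as a missing plural.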

IBM’s Watson debuted as a fine example of the promise of question-answering technology. Attivio, a premier independent vendor of hybrid structured and unstructured data management, offers a solution that seamlessly integrates traditional relational database technology with the capabilities of a full-text search engine.
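The general idea of combining relational and full-text capabilities can be sketched in a few lines. The example below is a toy under stated assumptions, not a description of how Attivio or any other product works: the `orders` table, the sample rows, and the `hybrid_query` helper are hypothetical, the "search engine" is a plain in-memory inverted index, and SQLite stands in for the relational database.

```python
import re
import sqlite3
from collections import defaultdict

# Hypothetical sample data: structured fields plus a free-text note.
ROWS = [
    (1, "east", 1200.0, "Customer praised fast delivery and support"),
    (2, "west", 300.0, "Complaint about late delivery"),
    (3, "east", 450.0, "Asked about pricing, no complaint"),
]

def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(rows):
    """Map each token to the set of row ids whose note contains it."""
    index = defaultdict(set)
    for row_id, _region, _amount, note in rows:
        for token in tokenize(note):
            index[token].add(row_id)
    return index

def hybrid_query(conn, index, term, region):
    """Return (id, amount) for rows whose note contains `term`
    and whose structured `region` column matches."""
    ids = sorted(index.get(term.lower(), set()))
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    sql = (
        "SELECT id, amount FROM orders "
        f"WHERE region = ? AND id IN ({placeholders}) ORDER BY id"
    )
    return conn.execute(sql, [region, *ids]).fetchall()

# Load the structured side into an in-memory relational table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL, note TEXT)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", ROWS)
index = build_index(ROWS)

print(hybrid_query(conn, index, "delivery", "east"))
```

The text search narrows the candidate rows first, and the SQL filter then applies the structured condition; a production system would push both steps into a single engine rather than stitching them together in application code.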

At the consumer level, a small and talented staff can conduct hybrid structured and unstructured analytics using a blend of open source and commercial software. Open source solutions include R, a statistics package, and Apache Lucene/Solr, a full-text search engine. However, the choice of implementation technology and the degree of sophistication are driven by budget, staffing, and opportunity. ...
