Chapter 7

Newton

Building an Authority-Driven Company Tagging and Resolution System

M. Thomas*; H. Bretz*; T. Vacek*; B. Hachey†; S. Singh*; F. Schilder*

* Thomson Reuters, NYC, NY, USA
† University of Sydney, Sydney, Australia

Abstract

We describe an entity detection and resolution system called Newton that is being used to identify company names in Reuters news articles and ground the mention text to a company authority database. The system is required to be fast and precise on arbitrary web news sources. We introduce an infrastructure for authority-driven lookup-tagging followed by joint mention and disambiguation classification using a support vector machine. Performance on a corpus of 70k automatically annotated documents from the ...
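The abstract names two stages: authority-driven lookup tagging, then a single SVM decision over each (mention, candidate) pair that determines both whether the span is a company mention and which authority record it refers to. The following is a minimal, hypothetical sketch of that shape, not Newton's implementation. It assumes a toy in-memory authority table (`AUTHORITY`, with invented IDs `C001` to `C003`, aliases, and context keywords), greedy longest-match lookup, four invented features, and hand-set weights `W` and `B` that stand in for the decision function of a trained linear SVM.

```python
# Hypothetical sketch only: authority records, features, and weights are invented.
AUTHORITY = {
    "C001": {"names": ["Apple Inc.", "Apple"], "keywords": {"iphone", "mac", "ipad"}},
    "C002": {"names": ["Apple Bank for Savings", "Apple Bank", "Apple"],
             "keywords": {"savings", "deposit", "branch"}},
    "C003": {"names": ["Thomson Reuters", "Thomson Reuters Corp"], "keywords": {"news", "data"}},
}

# Hand-set weights standing in for a trained linear SVM: score = w . x + b
W = [1.0, 1.0, 1.5, 0.5]
B = -1.5


def tokenize(text, lower=False):
    """Whitespace tokenization with surrounding punctuation stripped."""
    toks = [t.strip(".,;:!?\"'()") for t in text.split()]
    toks = [t for t in toks if t]
    return [t.lower() for t in toks] if lower else toks


def build_index(authority):
    """Map each lowercased alias token sequence to the set of company IDs using it."""
    index = {}
    for cid, rec in authority.items():
        for name in rec["names"]:
            index.setdefault(tuple(tokenize(name, lower=True)), set()).add(cid)
    return index


INDEX = build_index(AUTHORITY)


def lookup_tag(tokens, index):
    """Greedy longest-match tagging; returns (start, end, candidate_ids) spans."""
    max_len = max(len(k) for k in index)
    mentions, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in index:
                mentions.append((i, i + n, sorted(index[key])))
                i += n
                break
        else:  # no alias starts at position i
            i += 1
    return mentions


def features(tokens, start, end, cid, n_candidates, window=3):
    """Invented features for one (mention, candidate) pair."""
    surface, rec = tokens[start:end], AUTHORITY[cid]
    capitalized = 1.0 if surface[0][:1].isupper() else 0.0
    primary = tuple(tokenize(rec["names"][0], lower=True))
    exact_primary = 1.0 if tuple(t.lower() for t in surface) == primary else 0.0
    # Local context: up to `window` tokens on each side of the mention.
    context = tokens[max(0, start - window):start] + tokens[end:end + window]
    hits = float(len({t.lower() for t in context} & rec["keywords"]))
    return [capitalized, exact_primary, hits, 1.0 / n_candidates]


def svm_score(x):
    """Linear decision function; a positive score means 'company mention of this ID'."""
    return sum(w * xi for w, xi in zip(W, x)) + B


def resolve(text):
    """Tag, then make one joint mention/disambiguation decision per span."""
    tokens = tokenize(text)
    results = []
    for start, end, cands in lookup_tag(tokens, INDEX):
        scored = [(svm_score(features(tokens, start, end, c, len(cands))), c) for c in cands]
        best_score, best_id = max(scored)
        surface = " ".join(tokens[start:end])
        # If no candidate scores above zero, the span is rejected as a non-company mention.
        results.append((surface, best_id if best_score > 0 else None, best_score))
    return results
```

For example, `resolve("Apple said iPhone sales rose, while apple pie prices fell.")` links the first "Apple" to `C001` (score 1.25) and rejects the lowercase "apple" (score -1.25), because no candidate's context keywords appear near it. Scoring every candidate with one classifier and thresholding the best score lets a single model handle both the "is this a company?" and the "which company?" decisions.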
