6.8. Term Lookup

The Term Lookup transform uses the same algorithms and statistical models as the Term Extraction transform to break an incoming stream into noun or noun-phrase tokens. Instead of discovering new terms, however, it compares those tokens against a stored word list and outputs the matching terms and phrases with simple frequency counts. This suggests a strategy for using the two term-based transforms together. Periodically run the Term Extraction transform to mine the text data and generate lists of candidate phrases with their scores. Store these phrases in a word list, along with any phrases that you think the term extraction process should have identified, and remove the phrases that you don't want identified. Then use the Term Lookup transform to reprocess the text input and generate your final statistics. This way, you generate statistics only on known phrases of importance.
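To make the lookup step concrete, here is a minimal Python sketch of matching text against a stored word list and counting hits. It is an illustration, not the SSIS implementation: it assumes a crude lowercase word tokenizer in place of the transform's noun-phrase tagging, and it resolves overlapping reference terms (such as "power" and "power supply") with a greedy longest match, which is a choice made for this sketch. The function names, sample text, and the model number "XR200" are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: a crude tokenizer (lowercase alphanumeric runs),
    # standing in for the noun-phrase tagging SSIS actually performs.
    return re.findall(r"[a-z0-9]+", text.lower())


def term_lookup(text: str, word_list: list[str]) -> Counter:
    """Count occurrences of reference terms in one text value, preferring the longest match."""
    # Pre-tokenize each reference term so multi-word phrases compare as token tuples.
    terms = {tuple(tokenize(t)): t for t in word_list if tokenize(t)}
    max_len = max((len(k) for k in terms), default=0)
    tokens = tokenize(text)
    counts: Counter = Counter()
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in terms:
                counts[terms[key]] += 1
                i += n  # consume the matched tokens
                break
        else:
            i += 1  # no reference term starts here; move on one token


    return counts


# Hypothetical customer-service note and word list.
calls = "Customer says the XR200 model overheats. Overheating again on XR200; power supply replaced."
word_list = ["XR200", "power supply", "power"]
print(term_lookup(calls, word_list))
```

Here "XR200" is counted twice, and "power" is not counted on its own because the longer phrase "power supply" consumes those tokens. In a data flow you would apply this once per input row.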

To build on the results of the Term Extraction example, first remove the word "model" from the [TermExclusions] table so that future term extractions no longer exclude it. Then review all of the terms stored in the [TermResults] table, sort them, remove the duplicates, and add back any terms that make sense to the subject matter experts who read the text. Because you want to generate statistics about which model numbers are generating customer service calls, but you don't want to restrict your results to only the occurrences of a model number that appear together with the word "model," remove the phrases that combine the word "model" with a model number. ...
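The review step can be sketched in Python as well. This is a hedged illustration that works on in-memory lists rather than the [TermResults] table: it de-duplicates terms case-insensitively, drops phrases that pair the word "model" with a model number, removes unwanted terms, and appends expert-supplied terms. The model-number pattern (letters followed by digits) and all names and sample values are hypothetical assumptions.

```python
import re

# Hypothetical model-number format: letters followed by digits, e.g. "XR200".
MODEL_NUMBER = r"[a-z]+\d+"


def curate_word_list(extracted: list[str], expert_terms: list[str], unwanted: set[str]) -> list[str]:
    """Build a reference word list for Term Lookup from Term Extraction results."""
    # Matches "model XR200" or "XR200 model": the word "model" plus a model number.
    model_phrase = re.compile(rf"^(model\s+{MODEL_NUMBER}|{MODEL_NUMBER}\s+model)$", re.IGNORECASE)
    unwanted_keys = {" ".join(u.lower().split()) for u in unwanted}
    seen: set[str] = set()
    curated: list[str] = []
    for term in extracted + expert_terms:
        # Normalize case and whitespace so duplicates are detected.
        key = " ".join(term.lower().split())
        if key in seen or key in unwanted_keys or model_phrase.match(key):
            continue
        seen.add(key)
        curated.append(term.strip())
    return curated


extracted = ["model XR200", "XR200", "power supply", "Power  Supply", "XR200 model", "customer"]
print(curate_word_list(extracted, ["XR350"], unwanted={"customer"}))
```

The bare model number "XR200" survives, so a later lookup counts it wherever it appears, not only next to the word "model".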
