41

Generating Candidate Keywords

This technique is useful for locating misspelled words in documents, and for creating indexes and keywords to add to the metadata block associated with a document in a media asset manager.

Given an arbitrary block of text in a file, this UNIX script transforms the text into a sorted list of unique words. A stop list is maintained in a separate file and applied to the output list to remove unwanted keys.

If the stop list contains words like “the,” “and,” “of,” etc., then the result is a candidate list of keywords for indexing.
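The pipeline described above might be sketched as follows. This is a minimal illustration, not the book's own script: the function name `candidate_keywords` and the file names in the usage line are hypothetical, and the sketch assumes the stop list holds one lowercase word per line. Because it splits on every non-alphabetic character, a contraction such as "don't" comes out as two separate tokens.

```shell
#!/bin/sh
# Sketch only: candidate_keywords is a hypothetical name, not from the source.
# Usage: candidate_keywords TEXTFILE STOPFILE
# Assumes STOPFILE contains one lowercase word per line.
candidate_keywords() {
    # Turn every run of non-letters into a single newline: one word per line.
    tr -cs '[:alpha:]' '\n' < "$1" |
    # Fold to lowercase so "The" and "the" count as the same word.
    tr '[:upper:]' '[:lower:]' |
    # Drop the empty line left behind when the text starts with a non-letter.
    sed '/^$/d' |
    # Sort and keep one copy of each word.
    sort -u |
    # Remove whole-line, fixed-string matches of any entry in the stop list.
    grep -v -x -F -f "$2"
}
```

With hypothetical file names, `candidate_keywords chapter.txt stopwords.txt > keywords.txt` would write the candidate list to `keywords.txt`. Keeping the stop list in its own file means the same pipeline can serve either purpose simply by swapping the file.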

If the stop list contains all the words from a previously spell-checked document, ...
