2.6. Language Technology: Your Tax Dollars at Work

The U.S. government has been spending your money for years on technologies that do this kind of content extraction and analysis. When the research first started, the language of greatest interest to researchers was Russian. Harvard's Tony Oettinger, who led the research, tells of inputting the English sentence "The spirit is willing but the flesh is weak" into an English-to-Russian translation program. The computer's Russian translation was then fed back into a Russian-to-English translation program. The resulting retranslation was "The vodka is ready but the meat is rotten." Language technology has improved markedly since then, but it still has a long way to go.

The Defense Advanced Research Projects Agency (DARPA) paid for the development of the Internet, which was called the ARPANET in the early days. Many additional millions have gone into programs with names like "Machine Understanding and Classification" and "Text Retrieval and Categorization." A great deal of this research has now found its way into tools and products for mining the deep Web. Advanced proprietary trading firms have added these techniques to the sophisticated quantitative analytics in their technological arsenals.

Figure 2.15. Cumulative Abnormal Return (CAR) Over 2002. Board member concentration deciles and return relative to S&P 1500 index, 2002. I bet even Heidi looks less than perfect if you look close enough. Source: David Leinweber and Jacob Sisk, unpublished ...
