The core of our real-time news analysis engine is a scoring method that assesses the relative volume and significance of news in a specific category. For instance, we wish to identify periods when the volume of news about foreign exchange markets is abnormally high, or when there is a flurry of macroeconomic news announcements.
For a given topic, say foreign exchange news, the scoring procedure has the following parameters:
The keyword list and the last l minutes of news are used to compute a raw score, which is then normalized (calibrated) using statistics about the news over the last L days, as described below.
The score at a given point in time, t, is assigned as follows. Let (w1, …, wk) be the vector of keyword frequencies in the time interval [t – l, t); that is, wi is the number of times word or phrase Wi has appeared in the last l minutes. The raw score at time t is then defined to be:
In this form, the raw score will tend to be high whenever overall news volume is high, so we normalize it using the calibration rolling window: we maintain a record of the scores that have been assigned ...
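As a rough illustration of the keyword-frequency vector (w1, …, wk) described above, the following Python sketch counts keyword occurrences in the half-open window [t – l, t). It covers only the counting step and does not compute the raw score or the calibration. The function name, the (timestamp, text) representation of news items, the case-insensitive whole-word matching rule, and the sample headlines are all assumptions made for illustration, not details of the engine itself.

```python
import re
from datetime import datetime, timedelta


def keyword_frequencies(news_items, keywords, t, window_minutes):
    """Return [w1, ..., wk]: occurrences of each keyword in [t - l, t).

    news_items: iterable of (timestamp, text) pairs (hypothetical format).
    keywords:   list of words or phrases W1..Wk.
    """
    start = t - timedelta(minutes=window_minutes)
    # Half-open interval: items at `start` are included, items at `t` are not.
    texts = [text for ts, text in news_items if start <= ts < t]

    counts = []
    for kw in keywords:
        # Assumed matching rule: case-insensitive, whole word or phrase.
        pattern = re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        counts.append(sum(len(pattern.findall(text)) for text in texts))
    return counts


# Made-up headlines, with l = 30 minutes and t = 12:00.
t = datetime(2024, 1, 1, 12, 0)
news = [
    (datetime(2024, 1, 1, 11, 29), "Dollar index steady"),  # before the window
    (datetime(2024, 1, 1, 11, 40), "Dollar slips as euro rallies; dollar bulls retreat"),
    (datetime(2024, 1, 1, 11, 55), "Foreign exchange desks await inflation data"),
    (datetime(2024, 1, 1, 12, 0), "Euro jumps"),  # exactly at t, so excluded
]
example_counts = keyword_frequencies(news, ["dollar", "euro", "foreign exchange"], t, 30)
print(example_counts)  # [2, 1, 1]
```

Recounting every keyword over the full window at each t is the simplest approach; a production system processing a live feed would more likely update the counts incrementally as items enter and leave the window.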