One thing you may note is that the preprocessing of tweets is very minimal, and some of the rules are odd. For example, all hashtags are treated as one, as are all links and mentions. When this project started, that seemed like a good idea. There is no other justification than that; one always needs a springboard from which to jump off in any project, and a flimsy excuse at that point is as good as any other. Nonetheless, I have tweaked my preprocessing steps. These are the functions that I finally settled on. Do observe the difference between this and the original, listed in previous sections:
var nl = regexp.MustCompile("\n+")
var ht = regexp.MustCompile("&.+?;")

func (p *processor) ...