Having laid out our requirements, we can combine them into a single data structure that contains the things we need. Here's how the processor data structure looks:
type processor struct { tfidf *tfidf.TFIDF corpus *corpus.Corpus locations map[string]int t transform.Transformer locCount int }
For now, ignore the locations field. We shall look into how metadata might be useful in clustering.
To create a new processor, the following function is defined:
func newProcessor() *processor { c, err := corpus.Construct(corpus.WithWords([]string{mention, hashtag, retweet, url})) dieIfErr(err) return &processor{ tfidf: tfidf.New(), corpus: c, locations: make(map[string]int), } }
Here, we see some interesting decisions. The corpus is ...