In this chapter, we discussed how text mining differs from traditional attribute-based learning: written natural language must pass through several pre-processing steps before it can be represented as feature vectors. We then saw how to use Mallet, a Java-based library for NLP, by applying it to two real-life problems. First, we modeled topics in a news corpus with LDA, producing a model that can assign a topic to a new document. Second, we built a Naive Bayes spam-filtering classifier using the bag-of-words (BoW) representation.
This chapter concludes the technical demonstrations of how to apply various libraries to solve machine-learning tasks. As we weren't able to cover more interesting applications and give further ...