Text data, such as tweets, comes with little structure compared to spreadsheets and other typical types of data. One very useful way to impose some structure on text data is to turn it into a document-term matrix. This is a matrix in which each row represents a document and each column represents a term. Each element in the matrix is the number of times a particular term (column) appears in a particular document (row). Put differently, the (i, j)th element counts the number of times term j appears in document i. A document-term matrix has as many rows as there are input documents and as many columns as there are unique terms in the collection of documents, which is often called a **corpus**. Throughout ...
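
As a rough illustration of the definition above, the following Python sketch builds a document-term matrix by hand using only the standard library. The three-document corpus, the function name `build_document_term_matrix`, and the whitespace tokenization are all assumptions made for this example; real text such as tweets would usually need more careful cleaning and tokenization.

```python
from collections import Counter

# Hypothetical three-document corpus, invented for illustration.
documents = [
    "the cat sat on the mat",
    "the dog sat",
    "cat and dog",
]


def build_document_term_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    # Naive tokenization (an assumption): lowercase, then split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    # The vocabulary is the sorted set of unique terms across the corpus.
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One row per document, one column per vocabulary term.
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


vocabulary, dtm = build_document_term_matrix(documents)
print(vocabulary)
for row in dtm:
    print(row)
```

Here the matrix has three rows (one per document) and seven columns (one per unique term), and the entry in the first row under `the` is 2 because that term appears twice in the first document.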
