Glossary

agglomerative

Agglomerative clustering is a type of hierarchical clustering that builds clusters bottom-up: it starts with each instance in its own cluster and iteratively merges the most similar clusters until all instances belong to a single group.
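A minimal sketch of the bottom-up merging loop, assuming 1-D numeric points, absolute difference as the distance, and single linkage (cluster distance = distance between the closest pair of members). Single linkage is one illustrative choice among several linkage criteria, and the function name `agglomerate` is hypothetical.

```python
from itertools import combinations


def agglomerate(points):
    """Single-linkage agglomerative clustering of 1-D points (illustrative sketch).

    Starts with every point in its own cluster and repeatedly merges the two
    closest clusters until only one remains. Returns the merges, in order, as
    (cluster_a, cluster_b, distance) tuples.
    """
    # Each cluster starts as a single instance.
    clusters = [(p,) for p in points]
    merges = []

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(abs(x - y) for x in a for y in b)

    while len(clusters) > 1:
        # Find the most similar (closest) pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        a, b = clusters[i], clusters[j]
        merges.append((a, b, linkage(a, b)))
        # Replace the merged pair with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(a + b)
    return merges


if __name__ == "__main__":
    for a, b, d in agglomerate([1.0, 1.5, 5.0, 5.2, 9.0]):
        print(a, "+", b, "at distance", round(d, 2))
```

The list of merges is exactly the hierarchy a dendrogram would draw; cutting it at a chosen distance yields a flat clustering.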

application programming interface (API)

An application programming interface formally defines how software components communicate. A data API might provide users with a systematic way to read or fetch information from the internet. The Scikit-Learn API exposes generalized access to machine learning algorithms implemented via class inheritance.
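To illustrate how class inheritance can give uniform access to different algorithms, here is a stdlib-only sketch of a fit/transform style interface. The class and function names (`Transformer`, `Lowercaser`, `VocabularyFilter`, `run_pipeline`) are hypothetical; Scikit-Learn's actual base classes (such as `sklearn.base.BaseEstimator` and `TransformerMixin`) differ in detail.

```python
class Transformer:
    """Hypothetical base class defining a shared fit/transform interface."""

    def fit(self, documents):
        # Learn state from the data; the default learns nothing.
        return self

    def transform(self, documents):
        raise NotImplementedError

    def fit_transform(self, documents):
        # Shared behavior that every subclass inherits for free.
        return self.fit(documents).transform(documents)


class Lowercaser(Transformer):
    """Stateless transformer: lowercases each document."""

    def transform(self, documents):
        return [doc.lower() for doc in documents]


class VocabularyFilter(Transformer):
    """Stateful transformer: keeps only tokens seen during fit."""

    def fit(self, documents):
        self.vocab_ = {tok for doc in documents for tok in doc.split()}
        return self

    def transform(self, documents):
        return [" ".join(t for t in doc.split() if t in self.vocab_) for doc in documents]


def run_pipeline(steps, documents):
    # Code written against the base class works with any subclass.
    for step in steps:
        documents = step.fit_transform(documents)
    return documents


if __name__ == "__main__":
    print(run_pipeline([Lowercaser(), VocabularyFilter()], ["The Cat", "a DOG"]))
```

Because `run_pipeline` only relies on the inherited `fit_transform` method, new algorithms can be plugged in without changing the calling code.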

bag-of-words (BOW)/continuous bag-of-words (CBOW)

Bag-of-words is a method of encoding text in which every document in the corpus is transformed into a vector whose length equals the size of the corpus vocabulary. The primary insight of a bag-of-words representation is that meaning and similarity are encoded in vocabulary.
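A minimal sketch of count-based bag-of-words encoding, assuming lowercasing and whitespace tokenization (real pipelines usually tokenize more carefully). The helper names `build_vocabulary` and `vectorize` are hypothetical. Each vector position corresponds to one vocabulary term, so every document vector has the same length.

```python
from collections import Counter


def build_vocabulary(corpus):
    """Map every distinct token in the corpus to a fixed column index."""
    tokens = sorted({tok for doc in corpus for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(doc, vocab):
    """Encode one document as a count vector of length len(vocab)."""
    vec = [0] * len(vocab)
    for tok, count in Counter(doc.lower().split()).items():
        if tok in vocab:  # tokens outside the vocabulary are ignored
            vec[vocab[tok]] = count
    return vec


if __name__ == "__main__":
    corpus = ["The cat sat", "The dog sat on the cat"]
    vocab = build_vocabulary(corpus)
    print(vocab)
    for doc in corpus:
        print(vectorize(doc, vocab))
```

Word order is discarded: only which terms occur, and how often, survives the encoding.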

baleen

Baleen is an open source, automated ingestion service that collects blog content to construct a corpus for natural language processing research.

betweenness centrality

Given a node N in a graph G, the betweenness centrality indicates how much the connectivity of G depends on N, that is, how often N lies on the shortest paths between other nodes. It is computed by considering every pair of other nodes in G and summing, over those pairs, the fraction of their shortest paths that pass through N.
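A brute-force sketch of unnormalized betweenness centrality, assuming an undirected, unweighted graph stored as an adjacency dict. It counts shortest paths with breadth-first search and uses the fact that a shortest s-t path passes through N exactly when d(s, N) + d(N, t) = d(s, t); the number of such paths is then (paths from s to N) × (paths from N to t). The function names are hypothetical, and Brandes' algorithm (used by libraries such as NetworkX's `betweenness_centrality`) is far more efficient on large graphs.

```python
from collections import deque
from itertools import combinations


def bfs_counts(graph, source):
    """Return (distance, number_of_shortest_paths) dicts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:  # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u is a predecessor on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(graph, node):
    """Unnormalized betweenness of `node` in an undirected, unweighted graph."""
    info = {v: bfs_counts(graph, v) for v in graph}
    d_n, sigma_n = info[node]
    total = 0.0
    others = [v for v in graph if v != node]
    for s, t in combinations(others, 2):
        d_s, sigma_s = info[s]
        if t not in d_s or node not in d_s or t not in d_n:
            continue  # s and t are disconnected, or node is unreachable
        # node lies on a shortest s-t path iff the distances add up.
        if d_s[node] + d_n[t] == d_s[t]:
            total += sigma_s[node] * sigma_n[t] / sigma_s[t]
    return total


if __name__ == "__main__":
    square = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
    print(betweenness(square, "b"))
```

In the four-node cycle, the only pair that can route through `b` is (`a`, `c`), and only one of their two shortest paths does, giving a score of 0.5.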

bias

Bias is one of two sources of error in supervised learning problems (the other is variance), computed as the difference between an estimator's expected (average) predicted value and the true value. High bias indicates that the estimator's ...
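Because bias is defined on the estimator's average prediction, it can be approximated by simulation: draw many training samples, predict on each, average the predictions, and subtract the true value. The sketch below assumes Gaussian noise around a known true value and compares the sample mean with a hypothetical `shrunk_mean` estimator that deliberately scales its estimate toward zero; all names and parameters are illustrative.

```python
import random
import statistics


def estimate_bias(estimator, true_value, n_samples=20, n_trials=5000, noise=1.0, seed=0):
    """Monte Carlo estimate of bias = E[prediction] - true value."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_trials):
        # Draw a fresh noisy training sample around the true value.
        sample = [rng.gauss(true_value, noise) for _ in range(n_samples)]
        predictions.append(estimator(sample))
    # Average over many training sets, then compare with the truth.
    return statistics.fmean(predictions) - true_value


def sample_mean(xs):
    return statistics.fmean(xs)


def shrunk_mean(xs, factor=0.5):
    # Pulls every estimate toward zero, which introduces systematic bias.
    return factor * statistics.fmean(xs)


if __name__ == "__main__":
    print("sample mean bias:", round(estimate_bias(sample_mean, 10.0), 3))
    print("shrunk mean bias:", round(estimate_bias(shrunk_mean, 10.0), 3))
```

The sample mean's bias comes out near zero, while the shrunk estimator is systematically off by about half the true value, no matter how many trials are averaged.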

Get Applied Text Analysis with Python now with the O’Reilly learning platform.
