Build the Term Network

The next CDA step requires that you build a network of terms: a graph where nodes represent terms, and (weighted) edges represent their similarities.

You could have included all 12,437 discovered terms in the graph, but some of them are mentioned only once or twice (which is expected, given Zipf’s law[45]). Rather than wondering why the less frequently used words are in fact less frequently used, remove all rows with fewer than ten occurrences, but provide an option of changing the cut-off value MIN_SUPPORT in the future. At this point, you might wish that Python had first-class constants, but nonetheless spell MIN_SUPPORT in all capital letters. DataFrame limited is a truncated version of domain: it has only 319 rows. ...

Get Complex Network Analysis in Python now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.