Index
A note on the digital index
Each link in an index entry is displayed as the title of the section in which that entry appears. Because some sections have multiple index markers, it is not unusual for an entry to have several links to the same section. Clicking any link takes you directly to the place in the text where the marker appears.
Symbols
- $ (MongoDB operator), Querying by Date/Time Range
- $** (MongoDB operator), Searching Emails by Keywords
- 68-95-99.7 rule, Contingency tables and scoring functions
A
- access token (OAuth)
- access token secret (OAuth), Creating a Twitter API Connection, Discussion–Discussion, OAuth 1.0A
- activities (Google+), Exploring the Google+ API, Making Google+ API Requests–Making Google+ API Requests
- agglomeration clustering technique, Hierarchical clustering
- aggregation framework (MongoDB), Writing Advanced Queries–Writing Advanced Queries, Discussion
- analyzing GitHub API
- about, Analyzing GitHub Interest Graphs
- extending interest graphs, Extending the Interest Graph with “Follows” Edges for Users–Computational Considerations
- graph centrality measures, Computing Graph Centrality Measures–Computing Graph Centrality Measures
- nodes as query pivots, Using Nodes as Pivots for More Efficient Queries–Using Nodes as Pivots for More Efficient Queries
- seeding interest graphs, Seeding an Interest Graph–Seeding an Interest Graph
- visualizing interest graphs, Visualizing Interest Graphs–Closing Remarks
- analyzing Google+ data
- bigrams in human language, Analyzing Bigrams in Human Language–Contingency tables and scoring functions
- TF-IDF, A Whiz-Bang Introduction to TF-IDF–TF-IDF
- analyzing LinkedIn data
- clustering data, Crash Course on Clustering Data–Clustering Enhances User Experiences, Clustering Algorithms–Visualizing geographic clusters with Google Earth
- measuring similarity, Crash Course on Clustering Data, Measuring Similarity–Measuring Similarity
- normalizing data, Normalizing Data to Enable Analysis–Measuring Similarity
- analyzing mailboxes
- analyzing Enron corpus, Analyzing the Enron Corpus–Searching Emails by Keywords
- analyzing mail data, Analyzing Your Own Mail Data–Visualizing Patterns in GMail with the “Graph Your Inbox” Chrome Extension
- analyzing sender/recipient patterns, Analyzing Patterns in Sender/Recipient Communications–Analyzing Patterns in Sender/Recipient Communications
- analyzing Social Graph connections
- about, Analyzing Social Graph Connections–Analyzing Social Graph Connections
- analyzing Facebook pages, Analyzing Facebook Pages–Analyzing Coke vs Pepsi Facebook pages
- analyzing likes, Analyzing things your friends “like”–Analyzing things your friends “like”
- analyzing mutual friendships, Analyzing mutual friendships with directed graphs–Closing Remarks
- examining friendships, Examining Friendships–Closing Remarks
- analyzing Twitter platform objects
- about, Analyzing the 140 Characters–Analyzing the 140 Characters
- analyzing favorite tweets, Problem
- extracting tweet entities, Extracting Tweet Entities, Problem, Problem, Problem
- frequency analysis, Analyzing Tweets and Tweet Entities with Frequency Analysis–Analyzing Tweets and Tweet Entities with Frequency Analysis, Visualizing Frequency Data with Histograms–Closing Remarks, Problem
- lexical diversity of tweets, Computing the Lexical Diversity of Tweets–Computing the Lexical Diversity of Tweets, Discussion
- patterns in retweets, Examining Patterns in Retweets–Examining Patterns in Retweets, Problem–Problem
- analyzing web pages
- by scraping, parsing, and crawling, Scraping, Parsing, and Crawling the Web–Breadth-First Search in Web Crawling
- entity-centric, Entity-Centric Analysis: A Paradigm Shift–Gisting Human Language Data
- quality of analytics, Quality of Analytics for Processing Human Language Data–Quality of Analytics for Processing Human Language Data
- semantic understanding of data, Discovering Semantics by Decoding Syntax–Analysis of Luhn’s summarization algorithm
- API key (OAuth), Making LinkedIn API Requests, Making Google+ API Requests
- API requests
- Facebook, Exploring Facebook’s Social Graph API–Understanding the Open Graph Protocol
- GitHub, Exploring GitHub’s API–Making GitHub API Requests
- Google+, Exploring the Google+ API–Making Google+ API Requests
- LinkedIn, Exploring the LinkedIn API–Downloading LinkedIn Connections as a CSV File
- Twitter, Creating a Twitter API Connection–Creating a Twitter API Connection
- approximate matching (see clustering LinkedIn data)
- arbitrary arguments, Searching for Tweets
- *args (Python), Searching for Tweets
- Aristotle, Inferencing About an Open World
- Atom feed, Scraping, Parsing, and Crawling the Web
- authorizing applications
- accessing Gmail, Accessing Your Gmail with OAuth–Accessing Your Gmail with OAuth
- Facebook, Understanding the Social Graph API
- GitHub API, Making GitHub API Requests–Making GitHub API Requests
- Google+ API, Making Google+ API Requests–Making Google+ API Requests
- LinkedIn API, Making LinkedIn API Requests–Making LinkedIn API Requests
- Twitter and, Creating a Twitter API Connection–Creating a Twitter API Connection, Problem–Discussion
- avatars, Making Google+ API Requests
B
- B-trees, Searching Emails by Keywords
- bag of words model, Discovering Semantics by Decoding Syntax
- Bayesian classifier, Recommended Exercises
- BeautifulSoup Python package, Making Google+ API Requests, Scraping, Parsing, and Crawling the Web
- betweenness graph metric, Computing Graph Centrality Measures
- big data
- about, Breadth-First Search in Web Crawling
- big graph databases, Modeling Data with Property Graphs
- map-reduce and, Programmatically Accessing MongoDB with Python
- Big O notation, Crash Course on Clustering Data, Searching Emails by Keywords
- BigramAssociationMeasures Python class, Measuring Similarity
- BigramCollocationFinder function, Contingency tables and scoring functions
- bigrams, Measuring Similarity, Analyzing Bigrams in Human Language–Contingency tables and scoring functions
- Bing geocoding service, Visualizing locations with cartograms
- binomial distribution, Contingency tables and scoring functions
- bipartite analysis, Using Nodes as Pivots for More Efficient Queries
- boilerplate detection, Scraping, Parsing, and Crawling the Web–Scraping, Parsing, and Crawling the Web
- bookmarking projects, Exploring GitHub’s API
- bot policy, Geocoordinates: A Common Thread for Just About Anything
- bounded breadth-first searches, Breadth-First Search in Web Crawling
- breadth-first searches, Breadth-First Search in Web Crawling–Breadth-First Search in Web Crawling
- Brown Corpus, Introducing the Natural Language Toolkit
C
- Cantor, Georg, Exploring Trending Topics
- cartograms, Visualizing locations with cartograms–Measuring Similarity
- central limit theorem, Contingency tables and scoring functions
- centrality measures
- application of, Application of centrality measures–Application of centrality measures
- betweenness, Computing Graph Centrality Measures
- closeness, Computing Graph Centrality Measures
- computing for graphs, Computing Graph Centrality Measures–Computing Graph Centrality Measures
- degree, Computing Graph Centrality Measures
- online resources, Online Resources
- centroid (clusters), k-means clustering
- chi-square test, Contingency tables and scoring functions
- chunking (NLP), Natural Language Processing Illustrated Step-by-Step
- circles (Google+), Exploring the Google+ API
- cleanHTML function, Making Google+ API Requests
- clique detection
- Facebook, Analyzing mutual friendships with directed graphs–Closing Remarks
- NetworkX Python package, Using Nodes as Pivots for More Efficient Queries
- closeness graph metric, Computing Graph Centrality Measures
- cluster Python package, Hierarchical clustering, Visualizing geographic clusters with Google Earth
- clustering LinkedIn data
- about, Crash Course on Clustering Data–Clustering Enhances User Experiences
- clustering algorithms, Clustering Algorithms–Visualizing geographic clusters with Google Earth
- dimensionality reduction and, Crash Course on Clustering Data
- greedy clustering, Greedy clustering–Runtime analysis
- hierarchical clustering, Hierarchical clustering–k-means clustering
- k-means clustering, k-means clustering–k-means clustering
- measuring similarity, Crash Course on Clustering Data, Measuring Similarity–Measuring Similarity
- normalizing data to enable analysis, Normalizing Data to Enable Analysis
- online resources, Online Resources
- recommended exercises, Recommended Exercises
- visualizing with Google Earth, Visualizing geographic clusters with Google Earth–Visualizing geographic clusters with Google Earth
- clustering posts with cosine similarity, Clustering posts with cosine similarity–Clustering posts with cosine similarity
- collections Python module
- collective intelligence, Why Is Twitter All the Rage?
- collocations
- computing, Analyzing Bigrams in Human Language–Analyzing Bigrams in Human Language
- n-gram similarity, Measuring Similarity, Analyzing Bigrams in Human Language
- comments (Google+), Exploring the Google+ API, Making Google+ API Requests
- Common Crawl Corpus, Scraping, Parsing, and Crawling the Web, Microformats: Easy-to-Implement Metadata
- company names (LinkedIn data), Normalizing and counting companies–Normalizing and counting companies
- confidence intervals, Quality of Analytics for Processing Human Language Data
- Connections API (LinkedIn), Making LinkedIn API Requests
- consumer key (OAuth), Creating a Twitter API Connection, Discussion–Discussion, OAuth 1.0A
- consumer secret (OAuth), Creating a Twitter API Connection, Discussion–Discussion, OAuth 1.0A
- content field (Google+), Making Google+ API Requests
- context, human language data and, Reflections on Analyzing Human Language Data
- contingency tables, Analyzing Bigrams in Human Language–Contingency tables and scoring functions
- converting
- mail corpus to Unix mailbox, Converting a Mail Corpus to a Unix Mailbox–Converting a Mail Corpus to a Unix Mailbox
- mailboxes to JSON, Converting Unix Mailboxes to JSON–Converting Unix Mailboxes to JSON
- cosine similarity
- about, Finding Similar Documents–The theory behind vector space models and cosine similarity
- clustering posts with, Clustering posts with cosine similarity–Clustering posts with cosine similarity
- visualizing with matrix diagram, Visualizing document similarity with a matrix diagram
- CouchDB, Programmatically Accessing MongoDB with Python
- Counter class
- Facebook and, Analyzing Coke vs Pepsi Facebook pages, Analyzing things your friends “like”
- GitHub and, Making GitHub API Requests
- LinkedIn and, Measuring Similarity
- Twitter and, Analyzing Tweets and Tweet Entities with Frequency Analysis, Discussion
- CSS query selectors, Retrieving recipe reviews
- CSV file format, Downloading LinkedIn Connections as a CSV File
- csv Python module, Downloading LinkedIn Connections as a CSV File, Discussion
- cursors (Twitter API), Discussion
- CVS version control system, Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More
D
- D3.js toolkit, Visualizing directed graphs of mutual friendships, Visualizing locations with cartograms, Visualizing document similarity with a matrix diagram, Visualizing Interest Graphs
- Data Science Toolkit, Recommended Exercises
- DataSift platform, Discussion
- date/time range, query by, Querying by Date/Time Range–Querying by Date/Time Range
- datetime function, Querying by Date/Time Range
- dateutil Python package, Converting a Mail Corpus to a Unix Mailbox
- DBPedia initiative, Recommended Exercises
- deduplication (see clustering LinkedIn data)
- degree graph metric, Computing Graph Centrality Measures
- degree of nodes in graphs, Modeling Data with Property Graphs
- dendrograms, Hierarchical clustering–k-means clustering
- density of graphs, Modeling Data with Property Graphs
- depth-first searches, Breadth-First Search in Web Crawling
- dereferencing, Normalizing and counting companies
- Dice’s coefficient, Contingency tables and scoring functions
- digraphs (directed graphs), Analyzing mutual friendships with directed graphs–Closing Remarks, Modeling Data with Property Graphs–Modeling Data with Property Graphs
- dimensionality reduction, Crash Course on Clustering Data
- dir Python function, Making GitHub API Requests
- directed graphs (digraphs), Analyzing mutual friendships with directed graphs–Closing Remarks, Modeling Data with Property Graphs–Modeling Data with Property Graphs
- distributed version control systems, Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More
- document summarization, Document Summarization–Analysis of Luhn’s summarization algorithm
- document-oriented databases (see MongoDB)
- dollar sign ($) (MongoDB operator), Querying by Date/Time Range
- Dorling Cartogram, Visualizing locations with cartograms–Measuring Similarity
- double list comprehension, Extracting Tweet Entities
- dynamic programming, Hierarchical clustering
E
- edit distance, Measuring Similarity
- ego (social networks), Understanding the Social Graph API, Analyzing things your friends “like”–Analyzing things your friends “like”, Seeding an Interest Graph
- ego graphs, Understanding the Social Graph API, Seeding an Interest Graph–Seeding an Interest Graph
- email Python package, A Primer on Unix Mailboxes, Converting a Mail Corpus to a Unix Mailbox
- end-of-sentence (EOS) detection, Discovering Semantics by Decoding Syntax, Natural Language Processing Illustrated Step-by-Step, Sentence Detection in Human Language Data–Sentence Detection in Human Language Data
- Enron corpus
- about, Mining Mailboxes: Analyzing Who’s Talking to Whom About What, How Often, and More, Analyzing the Enron Corpus
- advanced queries, Writing Advanced Queries–Writing Advanced Queries
- analyzing sender/recipient patterns, Analyzing Patterns in Sender/Recipient Communications–Analyzing Patterns in Sender/Recipient Communications
- getting Enron data, Getting the Enron Data–Getting the Enron Data
- online resources, Online Resources
- query by date/time range, Querying by Date/Time Range–Querying by Date/Time Range
- entities
- interactions between, Gisting Human Language Data–Gisting Human Language Data
- property graphs representing, Modeling Data with Property Graphs–Modeling Data with Property Graphs
- entities field (tweets), Analyzing the 140 Characters, Solution
- entity extraction, Natural Language Processing Illustrated Step-by-Step, Entity-Centric Analysis: A Paradigm Shift
- entity resolution (entity disambiguation), Analyzing this book’s Facebook page
- entity-centric analysis, Entity-Centric Analysis: A Paradigm Shift–Gisting Human Language Data
- envoy Python package, Importing a JSONified Mail Corpus into MongoDB
- EOS (end-of-sentence) detection, Discovering Semantics by Decoding Syntax, Natural Language Processing Illustrated Step-by-Step, Sentence Detection in Human Language Data–Sentence Detection in Human Language Data
- extracting tweet entities, Extracting Tweet Entities, Problem, Problem, Problem
- extraction (NLP), Natural Language Processing Illustrated Step-by-Step, Entity-Centric Analysis: A Paradigm Shift
F
- F1 score, Quality of Analytics for Processing Human Language Data
- Facebook, Exploring Facebook’s Social Graph API
- (see also Social Graph API)
- about, Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More–Exploring Facebook’s Social Graph API
- analyzing connections, Analyzing Social Graph Connections–Closing Remarks
- interest graphs and, Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More, Seeding an Interest Graph
- online resources, Online Resources
- recommended exercises, Recommended Exercises
- Facebook accounts, Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More, Exploring Facebook’s Social Graph API
- Facebook pages, analyzing, Analyzing Facebook Pages–Analyzing Coke vs Pepsi Facebook pages
- Facebook Platform Policies document, Exploring Facebook’s Social Graph API
- facebook Python package, Understanding the Social Graph API, Analyzing things your friends “like”
- Facebook Query Language (FQL), Exploring Facebook’s Social Graph API, Understanding the Social Graph API
- false negatives, Quality of Analytics for Processing Human Language Data
- false positives, Quality of Analytics for Processing Human Language Data
- favorite_count field (tweets), Analyzing the 140 Characters, Discussion
- feedparser Python package, Scraping, Parsing, and Crawling the Web, Sentence Detection in Human Language Data
- field expansion feature (Social Graph API), Understanding the Social Graph API
- fields
- Facebook Social Graph API, Understanding the Social Graph API
- Google+ API, Making Google+ API Requests
- LinkedIn API, Making LinkedIn API Requests
- MongoDB, Searching Emails by Keywords
- Twitter API, Analyzing the 140 Characters–Analyzing the 140 Characters, Discussion
- find function (Python), Analyzing Bigrams in Human Language, Programmatically Accessing MongoDB with Python, Analyzing Patterns in Sender/Recipient Communications
- Firefox Operator add-on, Geocoordinates: A Common Thread for Just About Anything
- folksonomies, Why Is Twitter All the Rage?
- following model
- GitHub, Extending the Interest Graph with “Follows” Edges for Users–Computational Considerations
- interest graphs and, Seeding an Interest Graph
- Twitter, Mining Twitter: Exploring Trending Topics, Discovering What People Are Talking About, and More, Why Is Twitter All the Rage?, Fundamental Twitter Terminology, Exploring Facebook’s Social Graph API, Problem–Discussion, Problem
- forked projects, Exploring GitHub’s API
- forward chaining, Inferencing About an Open World
- FQL (Facebook Query Language), Exploring Facebook’s Social Graph API, Understanding the Social Graph API
- frequency analysis
- document summarization, Document Summarization–Analysis of Luhn’s summarization algorithm
- Facebook data, Analyzing Facebook Pages–Closing Remarks
- LinkedIn data, Normalizing and counting companies–Normalizing and counting locations
- TF-IDF, A Whiz-Bang Introduction to TF-IDF–TF-IDF
- Twitter data, Analyzing Tweets and Tweet Entities with Frequency Analysis–Analyzing Tweets and Tweet Entities with Frequency Analysis, Visualizing Frequency Data with Histograms–Closing Remarks, Problem
- Zipf’s law, Introducing the Natural Language Toolkit–Introducing the Natural Language Toolkit
- friendship graphs, Problem
- friendship model
- Friendster social network, Recommended Exercises
- functools.partial function, Solution, Discussion
- FuXi reasoning system, Inferencing About an Open World
- fuzzy matching (see clustering LinkedIn data)
G
- geo microformat, Microformats: Easy-to-Implement Metadata, Geocoordinates: A Common Thread for Just About Anything–Geocoordinates: A Common Thread for Just About Anything
- geocoding service (Bing), Visualizing locations with cartograms
- geocoordinates, Microformats: Easy-to-Implement Metadata, Geocoordinates: A Common Thread for Just About Anything–Geocoordinates: A Common Thread for Just About Anything
- GeoJSON, Recommended Exercises
- geopy Python package, Normalizing and counting locations
- Gephi open source project, Visualizing Interest Graphs
- GET search/tweets resource, Searching for Tweets–Searching for Tweets
- GET statuses/retweets resource, Examining Patterns in Retweets
- GET trends/place resource, Exploring Trending Topics
- Git version control system, Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More, Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More
- GitHub
- about, Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More
- following model, Extending the Interest Graph with “Follows” Edges for Users–Computational Considerations
- online resources, Online Resources
- recommended exercises, Recommended Exercises
- social coding, Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More
- GitHub API
- about, Exploring GitHub’s API
- analyzing interest graphs, Analyzing GitHub Interest Graphs–Closing Remarks
- creating connections, Creating a GitHub API Connection–Creating a GitHub API Connection
- making requests, Making GitHub API Requests–Making GitHub API Requests
- modeling data with property graphs, Modeling Data with Property Graphs–Modeling Data with Property Graphs
- online resources, Online Resources
- recommended exercises, Recommended Exercises
- terminology, Exploring GitHub’s API
- gitscm.com, Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More
- Gmail
- GNU Prolog, Open-world versus closed-world assumptions
- Google API Console, Making Google+ API Requests
- Google Earth, Visualizing geographic clusters with Google Earth–Visualizing geographic clusters with Google Earth, Geocoordinates: A Common Thread for Just About Anything
- Google Knowledge Graph, Discovering Semantics by Decoding Syntax
- Google Maps, Visualizing geographic clusters with Google Earth, Geocoordinates: A Common Thread for Just About Anything
- Google Structured Data Testing Tool, Accessing LinkedIn’s 200 Million Online Résumés–From Semantic Markup to Semantic Web: A Brief Interlude
- Google+ accounts, Exploring the Google+ API
- Google+ API
- about, Exploring the Google+ API–Exploring the Google+ API
- making requests, Making Google+ API Requests–Making Google+ API Requests
- online resources, Online Resources
- querying human language data, Querying Human Language Data with TF-IDF–Reflections on Analyzing Human Language Data
- recommended exercises, Recommended Exercises
- terminology, Exploring the Google+ API
- TF-IDF and, A Whiz-Bang Introduction to TF-IDF–TF-IDF
- google-api-python-client package, Making Google+ API Requests
- Graph API (Facebook) (see Social Graph API (Facebook))
- Graph API Explorer app, Exploring Facebook’s Social Graph API, Understanding the Social Graph API–Understanding the Social Graph API
- Graph Search project (Facebook), Understanding the Open Graph Protocol
- Graph Your Inbox Chrome extension, Visualizing Patterns in GMail with the “Graph Your Inbox” Chrome Extension–Visualizing Patterns in GMail with the “Graph Your Inbox” Chrome Extension
- GraphAPI class (facebook Python package)
- get_connections() method, Analyzing Social Graph Connections
- get_object() method, Analyzing Social Graph Connections, Analyzing this book’s Facebook page, Analyzing things your friends “like”
- get_objects() method, Analyzing Social Graph Connections
- request() method, Analyzing Social Graph Connections
- Graphviz, Visualizing Interest Graphs
- greedy clustering, Greedy clustering–Runtime analysis
H
- hangouts (Google+), Exploring the Google+ API
- hashtags (tweets)
- about, Fundamental Twitter Terminology, Searching for Tweets
- extracting, Extracting Tweet Entities
- frequency data in histograms, Visualizing Frequency Data with Histograms–Visualizing Frequency Data with Histograms
- lexical diversity of, Computing the Lexical Diversity of Tweets
- hCalendar microformat, Microformats: Easy-to-Implement Metadata, Accessing LinkedIn’s 200 Million Online Résumés
- hCard microformat, Microformats: Easy-to-Implement Metadata, Accessing LinkedIn’s 200 Million Online Résumés
- help Python function, Creating a Twitter API Connection, Making Google+ API Requests, Introducing the Natural Language Toolkit, Making GitHub API Requests
- hierarchical clustering, Hierarchical clustering–k-means clustering
- HierarchicalClustering Python class, Hierarchical clustering
- histograms
- frequency data for tweets, Visualizing Frequency Data with Histograms–Closing Remarks
- generating with IPython Notebook, Visualizing Frequency Data with Histograms–Closing Remarks
- recommended exercises, Recommended Exercises
- home timeline (tweets), Fundamental Twitter Terminology
- homographs, Discovering Semantics by Decoding Syntax
- homonyms, Discovering Semantics by Decoding Syntax
- Horrocks, Ian, Open-world versus closed-world assumptions
- hRecipe microformat, Microformats: Easy-to-Implement Metadata, Using Recipe Data to Improve Online Matchmaking–Accessing LinkedIn’s 200 Million Online Résumés
- hResume microformat, Microformats: Easy-to-Implement Metadata, Accessing LinkedIn’s 200 Million Online Résumés–From Semantic Markup to Semantic Web: A Brief Interlude
- hReview microformat, Using Recipe Data to Improve Online Matchmaking–Accessing LinkedIn’s 200 Million Online Résumés
- hReview-aggregate microformat, Retrieving recipe reviews–Accessing LinkedIn’s 200 Million Online Résumés
- HTML format, Scraping, Parsing, and Crawling the Web
- HTTP API, Making Google+ API Requests
- HTTP requests
- Facebook Social Graph API, Understanding the Social Graph API
- GitHub API, Creating a GitHub API Connection
- requests Python package, Understanding the Social Graph API
- Twitter, Problem–Problem
- human language data, Quality of Analytics for Processing Human Language Data
- (see also NLP)
- analyzing bigrams, Analyzing Bigrams in Human Language–Contingency tables and scoring functions
- applying TF-IDF to, Applying TF-IDF to Human Language–Applying TF-IDF to Human Language
- chunking, Natural Language Processing Illustrated Step-by-Step
- document summarization, Document Summarization–Analysis of Luhn’s summarization algorithm
- end of sentence detection in, Natural Language Processing Illustrated Step-by-Step, Sentence Detection in Human Language Data
- entity resolution, Analyzing this book’s Facebook page
- extraction, Natural Language Processing Illustrated Step-by-Step, Entity-Centric Analysis: A Paradigm Shift
- Facebook example, Analyzing Coke vs Pepsi Facebook pages
- finding similar documents, Finding Similar Documents–Analyzing Bigrams in Human Language
- measuring quality of analytics for, Quality of Analytics for Processing Human Language Data–Quality of Analytics for Processing Human Language Data
- part of speech assignment, Natural Language Processing Illustrated Step-by-Step, Entity-Centric Analysis: A Paradigm Shift
- querying with TF-IDF, Querying Human Language Data with TF-IDF–Reflections on Analyzing Human Language Data
- reflections on, Reflections on Analyzing Human Language Data
- tokenization, Introducing the Natural Language Toolkit, Natural Language Processing Illustrated Step-by-Step, Sentence Detection in Human Language Data–Sentence Detection in Human Language Data
- hyperedges, Modeling Data with Property Graphs
- hypergraphs, Modeling Data with Property Graphs
I
- I/O bound code, Breadth-First Search in Web Crawling
- ID field (tweets), Analyzing the 140 Characters
- IDF (inverse document frequency), Inverse Document Frequency
- IMAP (Internet Message Access Protocol), Analyzing Your Own Mail Data, Fetching and Parsing Email Messages with IMAP–Fetching and Parsing Email Messages with IMAP
- importing mail corpus into MongoDB, Importing a JSONified Mail Corpus into MongoDB–The MongoDB shell
- In-Reply-To email header, A Primer on Unix Mailboxes
- Indie Web, Microformats: Easy-to-Implement Metadata, Microformats: Easy-to-Implement Metadata
- inference, Inferencing About an Open World–Inferencing About an Open World
- information retrieval theory
- about, A Whiz-Bang Introduction to TF-IDF, Mining Web Pages: Using Natural Language Processing to Understand Human Language, Summarize Blog Posts, and More
- additional resources, A Whiz-Bang Introduction to TF-IDF
- cosine similarity, Finding Similar Documents–Analyzing Bigrams in Human Language
- inverse document frequency, Inverse Document Frequency
- term frequency, Term Frequency–Term Frequency
- TF-IDF example, TF-IDF–TF-IDF
- vector space models and, The theory behind vector space models and cosine similarity–The theory behind vector space models and cosine similarity
- interactions between entities, Gisting Human Language Data–Gisting Human Language Data
- interest graphs
- about, Examining Patterns in Retweets, Overview, Analyzing GitHub Interest Graphs
- adding repositories to, Adding more repositories to the interest graph–Computational Considerations
- centrality measures and, Computing Graph Centrality Measures–Computing Graph Centrality Measures, Application of centrality measures–Application of centrality measures
- extending for GitHub users, Extending the Interest Graph with “Follows” Edges for Users–Computational Considerations
- Facebook and, Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More, Seeding an Interest Graph
- nodes as query pivots, Using Nodes as Pivots for More Efficient Queries–Using Nodes as Pivots for More Efficient Queries
- online resources, Online Resources
- seeding, Seeding an Interest Graph–Seeding an Interest Graph
- Twitter and, Examining Patterns in Retweets, Seeding an Interest Graph
- visualizing, Visualizing Interest Graphs–Closing Remarks
- Internet Message Access Protocol (IMAP), Analyzing Your Own Mail Data, Fetching and Parsing Email Messages with IMAP–Fetching and Parsing Email Messages with IMAP
- Internet usage statistics, Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More
- inverse document frequency (IDF), Inverse Document Frequency
- io Python package, Discussion
J
- Jaccard distance, Measuring Similarity, Greedy clustering, Recommended Exercises
- Jaccard Index, Recommended Exercises, Analyzing Bigrams in Human Language, Contingency tables and scoring functions, Contingency tables and scoring functions
- job titles (LinkedIn data)
- counting, Normalizing and counting job titles–Normalizing and counting job titles
- greedy clustering, Greedy clustering–Runtime analysis
- hierarchical clustering, Hierarchical clustering–k-means clustering
- k-means clustering, k-means clustering–k-means clustering
- JSON
- converting mailboxes to, Converting Unix Mailboxes to JSON–Converting Unix Mailboxes to JSON
- Facebook Social Graph API, Understanding the Social Graph API
- GitHub API, Visualizing Interest Graphs
- Google+ API, Applying TF-IDF to Human Language
- importing mail corpus into MongoDB, Importing a JSONified Mail Corpus into MongoDB–The MongoDB shell
- MongoDB and, Mining Mailboxes: Analyzing Who’s Talking to Whom About What, How Often, and More, Solution
- saving and restoring with text files, Problem–Discussion
- Twitter API, Exploring Trending Topics
- json Python package, Exploring Trending Topics
K
- k-means clustering, k-means clustering–k-means clustering
- Keyhole Markup Language (KML), Visualizing geographic clusters with Google Earth, Geocoordinates: A Common Thread for Just About Anything
- keyword arguments (Python), Searching for Tweets
- keywords, searching email by, Searching Emails by Keywords–Searching Emails by Keywords
- Kiss, Tibor, Sentence Detection in Human Language Data
- KMeansClustering Python class, Visualizing geographic clusters with Google Earth
- KML (Keyhole Markup Language), Visualizing geographic clusters with Google Earth, Geocoordinates: A Common Thread for Just About Anything
- Krackhardt Kite Graph, Computing Graph Centrality Measures–Computing Graph Centrality Measures
- Kruskal’s algorithm, Computational Considerations
- **kwargs (Python), Searching for Tweets
L
- Levenshtein distance, Measuring Similarity
- lexical diversity of tweets, Computing the Lexical Diversity of Tweets–Computing the Lexical Diversity of Tweets, Discussion
- likelihood ratio, Contingency tables and scoring functions
- likes (Facebook), Understanding the Social Graph API, Analyzing things your friends “like”–Analyzing things your friends “like”
- LinkedIn
- about, Mining LinkedIn: Faceting Job Titles, Clustering Colleagues, and More–Overview
- clustering data, Crash Course on Clustering Data–Visualizing geographic clusters with Google Earth
- hResume microformat, Accessing LinkedIn’s 200 Million Online Résumés–From Semantic Markup to Semantic Web: A Brief Interlude
- online resources, Online Resources
- recommended exercises, Recommended Exercises
- LinkedIn API
- about, Exploring the LinkedIn API
- clustering data, Crash Course on Clustering Data–Visualizing geographic clusters with Google Earth
- downloading connections as CSV files, Downloading LinkedIn Connections as a CSV File
- making requests, Making LinkedIn API Requests–Making LinkedIn API Requests
- online resources, Online Resources
- recommended exercises, Recommended Exercises
- LinkedInApplication Python class, Making LinkedIn API Requests–Making LinkedIn API Requests
- list comprehensions, Exploring Trending Topics, Extracting Tweet Entities
- locations (LinkedIn data)
- counting, Normalizing and counting locations–Normalizing and counting locations
- KML and, Visualizing geographic clusters with Google Earth
- visualizing with cartograms, Visualizing locations with cartograms–Measuring Similarity
- visualizing with Google Earth, Visualizing geographic clusters with Google Earth–Visualizing geographic clusters with Google Earth
- Luhn’s algorithm, Document Summarization, Analysis of Luhn’s summarization algorithm–Analysis of Luhn’s summarization algorithm
M
- mail corpus
- analyzing Enron data, Analyzing the Enron Corpus–Searching Emails by Keywords
- converting to mailbox, Converting a Mail Corpus to a Unix Mailbox–Converting a Mail Corpus to a Unix Mailbox
- getting Enron data, Getting the Enron Data–Getting the Enron Data
- importing into MongoDB, Importing a JSONified Mail Corpus into MongoDB–The MongoDB shell
- programmatically accessing MongoDB, Programmatically Accessing MongoDB with Python–Programmatically Accessing MongoDB with Python
- mailbox Python package, A Primer on Unix Mailboxes
- mailboxes
- about, A Primer on Unix Mailboxes–A Primer on Unix Mailboxes
- analyzing Enron corpus, Analyzing the Enron Corpus–Searching Emails by Keywords
- analyzing mail data, Analyzing Your Own Mail Data–Visualizing Patterns in GMail with the “Graph Your Inbox” Chrome Extension
- converting mail corpus to, Converting a Mail Corpus to a Unix Mailbox–Converting a Mail Corpus to a Unix Mailbox
- converting to JSON, Converting Unix Mailboxes to JSON–Converting Unix Mailboxes to JSON
- online resources, Online Resources
- parsing email messages with IMAP, Fetching and Parsing Email Messages with IMAP–Fetching and Parsing Email Messages with IMAP
- processing mail corpus, Obtaining and Processing a Mail Corpus–Programmatically Accessing MongoDB with Python
- recommended exercises, Recommended Exercises
- searching by keywords, Searching Emails by Keywords–Searching Emails by Keywords
- visualizing patterns in Gmail, Visualizing Patterns in GMail with the “Graph Your Inbox” Chrome Extension–Visualizing Patterns in GMail with the “Graph Your Inbox” Chrome Extension
- visualizing time-series trends, Discovering and Visualizing Time-Series Trends–Discovering and Visualizing Time-Series Trends
- Manning, Christopher, Contingency tables and scoring functions
- map function, Programmatically Accessing MongoDB with Python
- map-reduce computing paradigm, Programmatically Accessing MongoDB with Python
- matplotlib Python package, Visualizing Frequency Data with Histograms–Visualizing Frequency Data with Histograms, Analyzing things your friends “like”
- matrix diagrams, Visualizing document similarity with a matrix diagram
- maximal clique, Analyzing mutual friendships with directed graphs
- maximum clique, Analyzing mutual friendships with directed graphs
- mbox (see Unix mailboxes)
- Message-ID email header, A Primer on Unix Mailboxes
- metadata
- email headers, Getting the Enron Data
- Google+, Exploring the Google+ API
- OGP example, Understanding the Open Graph Protocol–Understanding the Open Graph Protocol
- RDFa, Understanding the Open Graph Protocol
- semantic web, Mining the Semantically Marked-Up Web: Extracting Microformats, Inferencing over RDF, and More
- Twitter-related, Fundamental Twitter Terminology
- microdata (HTML), Scraping, Parsing, and Crawling the Web, Microformats: Easy-to-Implement Metadata
- microform.at service, Geocoordinates: A Common Thread for Just About Anything
- microformats
- about, Mining the Semantically Marked-Up Web: Extracting Microformats, Inferencing over RDF, and More–Microformats: Easy-to-Implement Metadata
- geocoordinates, Microformats: Easy-to-Implement Metadata, Geocoordinates: A Common Thread for Just About Anything–Geocoordinates: A Common Thread for Just About Anything
- hResume, Accessing LinkedIn’s 200 Million Online Résumés–From Semantic Markup to Semantic Web: A Brief Interlude
- list of popular, Microformats: Easy-to-Implement Metadata
- online matchmaking, Using Recipe Data to Improve Online Matchmaking–Accessing LinkedIn’s 200 Million Online Résumés
- recommended exercises, Recommended Exercises
- minimum spanning tree, Computational Considerations
- modeling data with property graphs, Modeling Data with Property Graphs–Modeling Data with Property Graphs
- moments (Google+), Exploring the Google+ API
- MongoDB
- $addToSet operator, Writing Advanced Queries, Writing Advanced Queries
- advanced queries, Writing Advanced Queries–Writing Advanced Queries
- analyzing sender/recipient patterns, Analyzing Patterns in Sender/Recipient Communications–Analyzing Patterns in Sender/Recipient Communications
- ensureIndex command, Searching Emails by Keywords
- find Python function, Programmatically Accessing MongoDB with Python, Analyzing Patterns in Sender/Recipient Communications
- $group operator, Writing Advanced Queries, Writing Advanced Queries
- $gt operator, Discovering and Visualizing Time-Series Trends
- importing JSON mailbox data into, Converting Unix Mailboxes to JSON
- importing mail corpus into, Importing a JSONified Mail Corpus into MongoDB–The MongoDB shell
- $in operator, Analyzing Patterns in Sender/Recipient Communications, Writing Advanced Queries
- JSON and, Mining Mailboxes: Analyzing Who’s Talking to Whom About What, How Often, and More, Solution
- $lt operator, Discovering and Visualizing Time-Series Trends
- $match operator, Writing Advanced Queries
- online resources, Online Resources
- programmatically accessing, Programmatically Accessing MongoDB with Python–Programmatically Accessing MongoDB with Python
- querying by date/time range, Querying by Date/Time Range–Querying by Date/Time Range
- recommended exercises, Recommended Exercises
- searching emails by keywords, Searching Emails by Keywords–Searching Emails by Keywords
- $sum function, Discovering and Visualizing Time-Series Trends
- time-series trends, Discovering and Visualizing Time-Series Trends–Discovering and Visualizing Time-Series Trends, Discussion
- $unwind operator, Writing Advanced Queries
- MongoDB shell, The MongoDB shell–The MongoDB shell, Searching Emails by Keywords
- mongoimport MongoDB command, Importing a JSONified Mail Corpus into MongoDB, Importing a JSONified Mail Corpus into MongoDB
- mutualfriends API (Facebook), Analyzing mutual friendships with directed graphs–Closing Remarks
N
- n-gram similarity, Measuring Similarity, Analyzing Bigrams in Human Language
- n-squared problems, Crash Course on Clustering Data
- N3 (Notation3), Inferencing About an Open World
- named entity recognition, Entity-Centric Analysis: A Paradigm Shift
- natural language processing (see NLP)
- Natural Language Toolkit (see NLTK)
- nested list comprehension, Extracting Tweet Entities
- NetworkX Python package
- about, Analyzing mutual friendships with directed graphs–Closing Remarks, Modeling Data with Property Graphs, Modeling Data with Property Graphs
- add_edge method, Modeling Data with Property Graphs, Seeding an Interest Graph
- add_node method, Seeding an Interest Graph
- betweenness_centrality function, Computing Graph Centrality Measures
- clique detection, Using Nodes as Pivots for More Efficient Queries
- closeness_centrality function, Computing Graph Centrality Measures
- degree_centrality function, Computing Graph Centrality Measures
- DiGraph class, Computing Graph Centrality Measures
- find_cliques method, Analyzing mutual friendships with directed graphs
- Graph class, Computing Graph Centrality Measures
- recommended exercises, Recommended Exercises, Recommended Exercises
- NLP (natural language processing), A Whiz-Bang Introduction to TF-IDF
- (see also human language data)
- about, A Whiz-Bang Introduction to TF-IDF, Discovering Semantics by Decoding Syntax
- additional resources, Contingency tables and scoring functions
- document summarization, Document Summarization–Analysis of Luhn’s summarization algorithm
- sentence detection, Sentence Detection in Human Language Data–Sentence Detection in Human Language Data
- step-by-step illustration, Natural Language Processing Illustrated Step-by-Step–Natural Language Processing Illustrated Step-by-Step
- NLTK (Natural Language Toolkit)
- about, Introducing the Natural Language Toolkit–Introducing the Natural Language Toolkit
- additional resources, Mining Google+: Computing Document Similarity, Extracting Collocations, and More, Natural Language Processing Illustrated Step-by-Step
- chunking, Natural Language Processing Illustrated Step-by-Step
- computing bigrams and collocations for sentences, Analyzing Bigrams in Human Language–Analyzing Bigrams in Human Language
- EOS detection, Natural Language Processing Illustrated Step-by-Step
- extraction, Natural Language Processing Illustrated Step-by-Step, Entity-Centric Analysis: A Paradigm Shift
- measuring similarity, Measuring Similarity–Measuring Similarity
- POS tagging, Natural Language Processing Illustrated Step-by-Step, Entity-Centric Analysis: A Paradigm Shift
- stopword lists, Inverse Document Frequency
- tokenization, Introducing the Natural Language Toolkit, Natural Language Processing Illustrated Step-by-Step, Sentence Detection in Human Language Data–Sentence Detection in Human Language Data
- nltk Python package
- batch_ne_chunk function, Natural Language Processing Illustrated Step-by-Step, Entity-Centric Analysis: A Paradigm Shift
- clean_html function, Making Google+ API Requests
- collocations function, Analyzing Bigrams in Human Language
- concordance method, Introducing the Natural Language Toolkit
- cosine_distance function, Clustering posts with cosine similarity
- demo function, Introducing the Natural Language Toolkit
- download function, Measuring Similarity
- edit_distance function, Measuring Similarity
- FreqDist class, Measuring Similarity, Making GitHub API Requests
- jaccard_distance function, Measuring Similarity
- sent_tokenize method, Sentence Detection in Human Language Data, Sentence Detection in Human Language Data
- word_tokenize method, Sentence Detection in Human Language Data, Sentence Detection in Human Language Data
- node IDs (Social Graph API), Understanding the Social Graph API
- Node.js platform, Using Recipe Data to Improve Online Matchmaking
- nodes
- betweenness centrality, Computing Graph Centrality Measures
- closeness centrality, Computing Graph Centrality Measures
- degree centrality, Computing Graph Centrality Measures
- as query pivots, Using Nodes as Pivots for More Efficient Queries–Using Nodes as Pivots for More Efficient Queries
- normal distribution, Contingency tables and scoring functions
- normalizing LinkedIn data
- about, Crash Course on Clustering Data, Normalizing Data to Enable Analysis
- counting companies, Normalizing and counting companies–Normalizing and counting companies
- counting job titles, Normalizing and counting job titles–Normalizing and counting job titles
- counting locations, Normalizing and counting locations–Normalizing and counting locations
- visualizing locations with cartograms, Visualizing locations with cartograms–Measuring Similarity
- Norvig, Peter, Inferencing About an Open World
- NoSQL databases, Modeling Data with Property Graphs
- Notation3 (N3), Inferencing About an Open World
- NP-complete problems, Analyzing mutual friendships with directed graphs–Closing Remarks
- numpy Python package, Document Summarization
O
- OAuth (Open Authorization)
- about, Creating a Twitter API Connection, Overview–OAuth 2.0
- accessing Gmail with, Accessing Your Gmail with OAuth–Accessing Your Gmail with OAuth
- Big O notation, Crash Course on Clustering Data, Searching Emails by Keywords
- Facebook Social Graph API and, Understanding the Social Graph API
- GitHub API and, Creating a GitHub API Connection–Creating a GitHub API Connection
- Google+ API and, Making Google+ API Requests
- LinkedIn API and, Making LinkedIn API Requests–Making LinkedIn API Requests
- runtime complexity, Computational Considerations
- Twitter API and, Fundamental Twitter Terminology, Creating a Twitter API Connection–Creating a Twitter API Connection, Solution–Discussion
- OGP (Open Graph protocol), Understanding the Open Graph Protocol–Understanding the Open Graph Protocol, Microformats: Easy-to-Implement Metadata
- ontologies, Man Cannot Live on Facts Alone
- operator.itemgetter Python function, Analyzing things your friends “like”
- OWL language, Modeling Data with Property Graphs, Inferencing About an Open World
P
- parsing
- part-of-speech (POS) tagging, Natural Language Processing Illustrated Step-by-Step, Entity-Centric Analysis: A Paradigm Shift
- Patel-Schneider, Peter, Open-world versus closed-world assumptions
- patterns
- in retweets, Examining Patterns in Retweets–Examining Patterns in Retweets, Problem–Problem
- in sender/recipient communications, Analyzing Patterns in Sender/Recipient Communications–Analyzing Patterns in Sender/Recipient Communications
- visualizing in Gmail, Visualizing Patterns in GMail with the “Graph Your Inbox” Chrome Extension–Visualizing Patterns in GMail with the “Graph Your Inbox” Chrome Extension
- PaySwarm, Recommended Exercises
- Pearson’s chi-square test, Contingency tables and scoring functions
- Penn Treebank Project, Natural Language Processing Illustrated Step-by-Step, Entity-Centric Analysis: A Paradigm Shift
- people (Google+), Exploring the Google+ API, Making Google+ API Requests–Making Google+ API Requests
- People API (Google+), Making Google+ API Requests
- personal API access token (OAuth), Creating a GitHub API Connection
- pip install command
- beautifulsoup Python package, Making Google+ API Requests
- cluster Python package, Hierarchical clustering
- envoy Python package, Importing a JSONified Mail Corpus into MongoDB
- facebook-sdk Python package, Analyzing Social Graph Connections
- feedparser Python package, Scraping, Parsing, and Crawling the Web
- geopy Python package, Normalizing and counting locations
- google-api-python-client Python package, Making Google+ API Requests
- networkx Python package, Analyzing mutual friendships with directed graphs, Modeling Data with Property Graphs
- nltk Python package, Measuring Similarity
- numpy Python package, Document Summarization
- oauth2 Python package, Accessing Your Gmail with OAuth
- prettytable Python package, Analyzing things your friends “like”, Making LinkedIn API Requests, Discussion
- PyGithub Python package, Creating a GitHub API Connection
- pymongo Python package, Importing a JSONified Mail Corpus into MongoDB, Programmatically Accessing MongoDB with Python
- python-boilerpipe package, Scraping, Parsing, and Crawling the Web
- python-linkedin Python package, Making LinkedIn API Requests
- python_dateutil Python package, Converting a Mail Corpus to a Unix Mailbox
- requests Python package, Understanding the Social Graph API, Creating a GitHub API Connection
- twitter Python package, Creating a Twitter API Connection, Twitter Cookbook
- twitter-text-py Python package, Discussion
- places (Twitter), Fundamental Twitter Terminology, Discussion
- PMI (Pointwise Mutual Information), Contingency tables and scoring functions
- Pointwise Mutual Information (PMI), Contingency tables and scoring functions
- POS (part-of-speech) tagging, Natural Language Processing Illustrated Step-by-Step, Entity-Centric Analysis: A Paradigm Shift
- prettytable Python package, Analyzing things your friends “like”, Making LinkedIn API Requests, Discovering and Visualizing Time-Series Trends, Solution
- privacy controls
- projects (GitHub), Exploring GitHub’s API
- Prolog programming language, Open-world versus closed-world assumptions
- property graphs, modeling data with, Modeling Data with Property Graphs–Modeling Data with Property Graphs
- public firehose (tweets), Fundamental Twitter Terminology
- public streams API, Fundamental Twitter Terminology
- pull requests (Git), Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More
- PunktSentenceTokenizer Python class, Sentence Detection in Human Language Data
- PunktWordTokenizer Python class, Sentence Detection in Human Language Data
- PuTTY (Windows SSH client), The MongoDB shell
- pydoc Python package, Creating a Twitter API Connection, Making Google+ API Requests, Sentence Detection in Human Language Data, Making GitHub API Requests
- PyGithub Python package, Creating a GitHub API Connection, Making GitHub API Requests–Making GitHub API Requests, Adding more repositories to the interest graph
- PyLab, Visualizing Frequency Data with Histograms, Analyzing things your friends “like”
- pymongo Python package, Importing a JSONified Mail Corpus into MongoDB, Programmatically Accessing MongoDB with Python–Programmatically Accessing MongoDB with Python, Searching Emails by Keywords
- python-boilerpipe Python package, Scraping, Parsing, and Crawling the Web
- python-oauth2 Python package, Accessing Your Gmail with OAuth
- PYTHONPATH environment variable, Creating a Twitter API Connection
Q
- quality of analytics for human language data, Quality of Analytics for Processing Human Language Data–Quality of Analytics for Processing Human Language Data
- queries
- advanced, Writing Advanced Queries–Writing Advanced Queries
- by date/time range, Querying by Date/Time Range–Querying by Date/Time Range
- Facebook Social Graph API, Analyzing Social Graph Connections–Analyzing Social Graph Connections
- GitHub API, Making GitHub API Requests
- Google+ API, Making Google+ API Requests–Making Google+ API Requests
- human language data, Querying Human Language Data with TF-IDF–Reflections on Analyzing Human Language Data
- LinkedIn data, Mining LinkedIn: Faceting Job Titles, Clustering Colleagues, and More, Making LinkedIn API Requests
- nodes as pivots for, Using Nodes as Pivots for More Efficient Queries–Using Nodes as Pivots for More Efficient Queries
- TF-IDF support, A Whiz-Bang Introduction to TF-IDF–Clustering posts with cosine similarity
- Twitter API, Exploring Trending Topics–Analyzing the 140 Characters
- quopri Python package, Converting Unix Mailboxes to JSON
- quoting tweets, Examining Patterns in Retweets
R
- rate limits
- Facebook Social Graph API, Analyzing Social Graph Connections
- GitHub API, Creating a GitHub API Connection, Extending the Interest Graph with “Follows” Edges for Users
- LinkedIn API, Making LinkedIn API Requests
- Twitter API, Exploring Trending Topics
- raw frequency, Contingency tables and scoring functions
- RDF (Resource Description Framework), Man Cannot Live on Facts Alone–Inferencing About an Open World
- RDF Schema language, Modeling Data with Property Graphs, Inferencing About an Open World
- RDFa
- about, Microformats: Easy-to-Implement Metadata
- metadata and, Understanding the Open Graph Protocol
- web scraping and, Scraping, Parsing, and Crawling the Web
- re Python package, Visualizing locations with cartograms
- Really Simple Syndication (RSS), Scraping, Parsing, and Crawling the Web
- reduce function, Programmatically Accessing MongoDB with Python
- References email header, A Primer on Unix Mailboxes
- regular expressions, Visualizing locations with cartograms, Discovering Semantics by Decoding Syntax, Converting a Mail Corpus to a Unix Mailbox, Discussion
- RelMeAuth Indie Web initiative, Microformats: Easy-to-Implement Metadata, Recommended Exercises
- repositories, adding to interest graphs, Adding more repositories to the interest graph–Computational Considerations
- requests Python package, Understanding the Social Graph API, Making GitHub API Requests
- Resource Description Framework (RDF), Man Cannot Live on Facts Alone–Inferencing About an Open World
- RESTful API, Creating a Twitter API Connection, Exploring Trending Topics
- retweeted field (tweets), Analyzing the 140 Characters, Discussion
- retweeted_status field (tweets), Analyzing the 140 Characters, Examining Patterns in Retweets
- retweets
- extracting attribution, Problem
- frequency data in histograms for, Visualizing Frequency Data with Histograms
- patterns in, Examining Patterns in Retweets–Examining Patterns in Retweets, Problem–Problem
- retweet_count field (tweets), Analyzing the 140 Characters, Examining Patterns in Retweets, Solution, Discussion
- RFC 822, Fetching and Parsing Email Messages with IMAP
- RFC 2045, Converting Unix Mailboxes to JSON, Online Resources
- RFC 3501, Fetching and Parsing Email Messages with IMAP
- RFC 5849, Overview
- RFC 6749, Overview
- Riak database, Programmatically Accessing MongoDB with Python
- RIAs (rich internet applications), The Semantic Web: An Evolutionary Revolution
- RSS (Really Simple Syndication), Scraping, Parsing, and Crawling the Web
- Russell, Stuart, Inferencing About an Open World
S
- schema.org site, Mining the Semantically Marked-Up Web: Extracting Microformats, Inferencing over RDF, and More, Microformats: Easy-to-Implement Metadata
- Schütze, Hinrich, Contingency tables and scoring functions
- scoring functions, Analyzing Bigrams in Human Language–Contingency tables and scoring functions
- Scrapy Python framework, Recommended Exercises, Scraping, Parsing, and Crawling the Web
- screen names (Twitter)
- extracting from tweets, Extracting Tweet Entities
- frequency data for tweets with histograms, Visualizing Frequency Data with Histograms–Visualizing Frequency Data with Histograms
- lexical diversity of, Computing the Lexical Diversity of Tweets
- Search API, Making LinkedIn API Requests, Solution
- searching
- bounded breadth-first, Breadth-First Search in Web Crawling
- breadth-first, Breadth-First Search in Web Crawling–Breadth-First Search in Web Crawling
- depth-first, Breadth-First Search in Web Crawling
- email by keywords, Searching Emails by Keywords–Searching Emails by Keywords
- Facebook Graph Search project, Understanding the Open Graph Protocol
- Google+ data, Exploring the Google+ API–Making Google+ API Requests
- LinkedIn data, Making LinkedIn API Requests, Clustering Enhances User Experiences
- for tweets, Fundamental Twitter Terminology, Searching for Tweets–Analyzing the 140 Characters, Problem, Problem
- secret key (OAuth), Making LinkedIn API Requests
- seeding interest graphs, Seeding an Interest Graph
- semantic web
- about, Mining the Semantically Marked-Up Web: Extracting Microformats, Inferencing over RDF, and More
- as evolutionary revolution, The Semantic Web: An Evolutionary Revolution–Inferencing About an Open World
- microformats, Mining the Semantically Marked-Up Web: Extracting Microformats, Inferencing over RDF, and More–From Semantic Markup to Semantic Web: A Brief Interlude
- online resources, Online Resources
- recommended exercises, Recommended Exercises
- technologies supporting, Scraping, Parsing, and Crawling the Web, Modeling Data with Property Graphs
- transitioning to, From Semantic Markup to Semantic Web: A Brief Interlude
- semantic web stack, Modeling Data with Property Graphs
- setwise operations
- about, Exploring Trending Topics
- difference, Analyzing Patterns in Sender/Recipient Communications, Solution
- intersection, Analyzing things your friends “like”, Measuring Similarity, Analyzing Patterns in Sender/Recipient Communications, Solution
- union, Analyzing Patterns in Sender/Recipient Communications
- similarity
- cosine, Finding Similar Documents–Analyzing Bigrams in Human Language
- measuring in LinkedIn data, Crash Course on Clustering Data, Measuring Similarity–Measuring Similarity
- slicing technique, Extracting Tweet Entities
- Snowball stemmer, Recommended Exercises
- social coding, Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More
- Social Graph API (Facebook)
- about, Exploring Facebook’s Social Graph API–Understanding the Social Graph API
- analyzing connections, Analyzing Social Graph Connections–Analyzing Social Graph Connections
- analyzing Facebook pages, Analyzing Facebook Pages–Analyzing Coke vs Pepsi Facebook pages
- examining friendships, Examining Friendships–Closing Remarks
- field expansion feature, Understanding the Social Graph API
- online resources, Online Resources
- Open Graph protocol and, Understanding the Open Graph Protocol–Understanding the Open Graph Protocol
- rate limits, Analyzing Social Graph Connections
- recommended exercises, Recommended Exercises
- XFN and, Microformats: Easy-to-Implement Metadata
- social graphs, Seeding an Interest Graph
- social interest graphs (see interest graphs)
- SPARQL language, Modeling Data with Property Graphs
- SSH client, The MongoDB shell
- stargazing (GitHub), Exploring GitHub’s API, Making GitHub API Requests, Seeding an Interest Graph–Seeding an Interest Graph
- statistics, Internet usage, Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More
- stopwords
- Streaming API (Twitter), Problem
- Strunk, Jan, Sentence Detection in Human Language Data
- Student’s t-score, Contingency tables and scoring functions
- subject-verb-object form, Gisting Human Language Data
- Subversion version control system, Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More
- supernodes, Adding more repositories to the interest graph, Discussion
- supervised learning, Scraping, Parsing, and Crawling the Web, Quality of Analytics for Processing Human Language Data
- syllogisms, Inferencing About an Open World
T
- tag clouds, Entity-Centric Analysis: A Paradigm Shift, Recommended Exercises
- taxonomies, Why Is Twitter All the Rage?
- term frequency (TF), Term Frequency–Term Frequency
- Term Frequency–Inverse Document Frequency (see TF-IDF)
- text field (tweets), Analyzing the 140 Characters
- TF (term frequency), Term Frequency–Term Frequency
- TF-IDF (Term Frequency–Inverse Document Frequency)
- about, Overview, A Whiz-Bang Introduction to TF-IDF
- applying to human language, Applying TF-IDF to Human Language–Applying TF-IDF to Human Language
- finding similar documents, Finding Similar Documents–Analyzing Bigrams in Human Language
- inverse document frequency, Inverse Document Frequency
- querying human language data with, Querying Human Language Data with TF-IDF–Reflections on Analyzing Human Language Data
- running on sample data, TF-IDF–TF-IDF
- term frequency, Term Frequency–Term Frequency
- thread pool, Breadth-First Search in Web Crawling, Extending the Interest Graph with “Follows” Edges for Users
- time-series trends, Discovering and Visualizing Time-Series Trends–Discovering and Visualizing Time-Series Trends, Problem
- time.sleep Python function, Solution
- timelines (Twitter), Fundamental Twitter Terminology–Fundamental Twitter Terminology, Solution
- timestamps, A Primer on Unix Mailboxes
- Titan big graph database, Modeling Data with Property Graphs
- tokenization, Introducing the Natural Language Toolkit, Natural Language Processing Illustrated Step-by-Step, Sentence Detection in Human Language Data–Sentence Detection in Human Language Data
- Travelling Salesman problems, Visualizing geographic clusters with Google Earth
- TreebankWordTokenizer Python class, Sentence Detection in Human Language Data
- trends (Twitter), Exploring Trending Topics–Exploring Trending Topics, Problem
- TrigramAssociationMeasures Python class, Measuring Similarity
- trigrams, Measuring Similarity
- true error, Quality of Analytics for Processing Human Language Data
- true negatives, Quality of Analytics for Processing Human Language Data
- true positives, Quality of Analytics for Processing Human Language Data
- Turing Test, Discovering Semantics by Decoding Syntax
- tweet entities
- analyzing, Analyzing the 140 Characters–Analyzing the 140 Characters, Analyzing Tweets and Tweet Entities with Frequency Analysis–Analyzing Tweets and Tweet Entities with Frequency Analysis
- composition of, Fundamental Twitter Terminology
- extracting, Extracting Tweet Entities, Problem, Problem, Problem
- finding most popular, Problem
- searching for, Fundamental Twitter Terminology, Searching for Tweets–Analyzing the 140 Characters
- TweetDeck, Fundamental Twitter Terminology
- tweets
- about, Fundamental Twitter Terminology–Fundamental Twitter Terminology
- analyzing, Analyzing the 140 Characters–Analyzing the 140 Characters, Analyzing Tweets and Tweet Entities with Frequency Analysis–Analyzing Tweets and Tweet Entities with Frequency Analysis, Problem
- composition of, Fundamental Twitter Terminology
- finding most popular, Problem
- harvesting, Problem
- lexical diversity of, Computing the Lexical Diversity of Tweets–Computing the Lexical Diversity of Tweets, Discussion
- quoting, Examining Patterns in Retweets
- retweeting, Examining Patterns in Retweets–Examining Patterns in Retweets, Visualizing Frequency Data with Histograms, Problem–Problem
- searching for, Fundamental Twitter Terminology, Searching for Tweets–Analyzing the 140 Characters, Problem, Problem
- timelines and, Fundamental Twitter Terminology–Fundamental Twitter Terminology, Solution
- Twitter
- about, Why Is Twitter All the Rage?–Why Is Twitter All the Rage?
- fundamental terminology, Fundamental Twitter Terminology–Fundamental Twitter Terminology
- interest graphs and, Examining Patterns in Retweets, Seeding an Interest Graph
- recommended exercises, Recommended Exercises
- Twitter accounts
- creating, Fundamental Twitter Terminology
- governance of, Why Is Twitter All the Rage?
- logging into, Fundamental Twitter Terminology
- recommended exercises, Recommended Exercises, Recommended Exercises
- resolving user profile information, Problem
- Twitter API
- accessing for development purposes, Problem–Discussion
- collecting time-series data, Problem
- convenient function calls, Problem
- creating connections, Creating a Twitter API Connection–Creating a Twitter API Connection
- fundamental terminology, Fundamental Twitter Terminology–Fundamental Twitter Terminology
- making robust requests, Problem–Problem
- online resources, Online Resources, Online Resources
- rate limits, Exploring Trending Topics
- recommended exercises, Recommended Exercises, Recommended Exercises
- sampling public data, Problem
- saving and restoring JSON data with text files, Problem–Discussion
- searching for tweets, Searching for Tweets–Analyzing the 140 Characters, Problem, Problem
- trending topics, Exploring Trending Topics–Exploring Trending Topics, Problem
- Twitter platform objects
- about, Fundamental Twitter Terminology–Fundamental Twitter Terminology
- analyzing tweets, Analyzing the 140 Characters–Closing Remarks
- searching for tweets, Searching for Tweets–Analyzing the 140 Characters, Problem, Problem
- Twitter Python class, Exploring Trending Topics
- twitter Python package, Creating a Twitter API Connection, Twitter Cookbook
- twitter_text Python package, Solution
- Twurl tool (Twitter API), Fundamental Twitter Terminology
U
- UnicodeDecodeError (Python), Converting Unix Mailboxes to JSON, Discussion
- Unix mailboxes
- about, A Primer on Unix Mailboxes–A Primer on Unix Mailboxes
- converting mail corpus to, Converting a Mail Corpus to a Unix Mailbox–Converting a Mail Corpus to a Unix Mailbox
- converting to JSON, Converting Unix Mailboxes to JSON–Converting Unix Mailboxes to JSON
- unsupervised machine learning, Sentence Detection in Human Language Data
- urllib2 Python package, Understanding the Social Graph API
- URLs (tweets), Fundamental Twitter Terminology, Problem–Problem
- User Followers API (GitHub), Extending the Interest Graph with “Follows” Edges for Users
- user mentions (tweets), Fundamental Twitter Terminology
- user secret (OAuth), Making LinkedIn API Requests
- user timeline (Twitter), Fundamental Twitter Terminology, Solution
- user token (OAuth), Making LinkedIn API Requests
V
- vagrant ssh command, The MongoDB shell
- vCard file format, Geocoordinates: A Common Thread for Just About Anything
- vector space models, The theory behind vector space models and cosine similarity–The theory behind vector space models and cosine similarity
- version control systems, Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More
- visualizing
- directed graphs of mutual friendships, Visualizing directed graphs of mutual friendships–Closing Remarks
- document similarity with matrix diagrams, Visualizing document similarity with a matrix diagram
- document summarization, Document Summarization–Document Summarization
- frequency data with histograms, Visualizing Frequency Data with Histograms–Closing Remarks
- interactions between entities, Gisting Human Language Data
- interest graphs, Visualizing Interest Graphs–Closing Remarks
- locations with cartograms, Visualizing locations with cartograms–Measuring Similarity
- locations with Google Earth, Visualizing geographic clusters with Google Earth–Visualizing geographic clusters with Google Earth
- patterns in Gmail, Visualizing Patterns in GMail with the “Graph Your Inbox” Chrome Extension–Visualizing Patterns in GMail with the “Graph Your Inbox” Chrome Extension
- recommended exercises, Recommended Exercises
- time-series trends, Discovering and Visualizing Time-Series Trends–Discovering and Visualizing Time-Series Trends
W
- web crawling
- about, Scraping, Parsing, and Crawling the Web
- breadth-first searches, Breadth-First Search in Web Crawling–Breadth-First Search in Web Crawling
- depth-first searches, Breadth-First Search in Web Crawling
- Web Data Commons, Microformats: Easy-to-Implement Metadata
- web pages
- entity-centric analysis, Entity-Centric Analysis: A Paradigm Shift–Gisting Human Language Data
- mining, Scraping, Parsing, and Crawling the Web–Breadth-First Search in Web Crawling
- online resources, Online Resources
- quality of analytics of, Quality of Analytics for Processing Human Language Data–Quality of Analytics for Processing Human Language Data
- recommended exercises, Recommended Exercises
- semantic understanding of data, Discovering Semantics by Decoding Syntax
- web scraping, Scraping, Parsing, and Crawling the Web–Scraping, Parsing, and Crawling the Web
- well-formed XML, Scraping, Parsing, and Crawling the Web
- Where On Earth (WOE) ID system, Exploring Trending Topics, Discussion
- WhitespaceTokenizer Python class, Sentence Detection in Human Language Data
- WOE (Where On Earth) ID system, Exploring Trending Topics, Discussion
- WolframAlpha, Entity-Centric Analysis: A Paradigm Shift
- WordNet, Recommended Exercises
X
- XFN microformat, Microformats: Easy-to-Implement Metadata
- XHTML format, Scraping, Parsing, and Crawling the Web
- XML format, Scraping, Parsing, and Crawling the Web
- xoauth.py utility, Accessing Your Gmail with OAuth
Y
- Yahoo! GeoPlanet, Exploring Trending Topics