Index

A note on the digital index

A link in an index entry is displayed as the section title in which that entry appears. Because some sections have multiple index markers, it is not unusual for an entry to have several links to the same section. Clicking on any link will take you directly to the place in the text in which the marker appears.

Symbols

$ (MongoDB operator), Querying by Date/Time Range
$** (MongoDB operator), Searching Emails by Keywords
68-95-99.7 rule, Contingency tables and scoring functions

A

access token (OAuth)
about, OAuth 1.0A
Facebook, Understanding the Social Graph API
GitHub, Creating a GitHub API Connection–Creating a GitHub API Connection
Twitter, Creating a Twitter API Connection, Discussion–Discussion
access token secret (OAuth), Creating a Twitter API Connection, Discussion–Discussion, OAuth 1.0A
activities (Google+), Exploring the Google+ API, Making Google+ API Requests–Making Google+ API Requests
agglomeration clustering technique, Hierarchical clustering
aggregation framework (MongoDB), Writing Advanced Queries–Writing Advanced Queries, Discussion
analyzing GitHub API
about, Analyzing GitHub Interest Graphs
extending interest graphs, Extending the Interest Graph with “Follows” Edges for Users–Computational Considerations
graph centrality measures, Computing Graph Centrality Measures–Computing Graph Centrality Measures
nodes as query pivots, Using Nodes as Pivots for More Efficient Queries–Using Nodes as Pivots for More Efficient Queries
seeding interest graphs, Seeding an Interest Graph–Seeding an Interest Graph
visualizing interest graphs, Visualizing Interest Graphs–Closing Remarks
analyzing Google+ data
bigrams in human language, Analyzing Bigrams in Human Language–Contingency tables and scoring functions
TF-IDF, A Whiz-Bang Introduction to TF-IDF–TF-IDF
analyzing LinkedIn data
clustering data, Crash Course on Clustering Data–Clustering Enhances User Experiences, Clustering Algorithms–Visualizing geographic clusters with Google Earth
measuring similarity, Crash Course on Clustering Data, Measuring Similarity–Measuring Similarity
normalizing data, Normalizing Data to Enable Analysis–Measuring Similarity
analyzing mailboxes
analyzing Enron corpus, Analyzing the Enron Corpus–Searching Emails by Keywords
analyzing mail data, Analyzing Your Own Mail Data–Visualizing Patterns in GMail with the “Graph Your Inbox” Chrome Extension
analyzing sender/recipient patterns, Analyzing Patterns in Sender/Recipient Communications–Analyzing Patterns in Sender/Recipient Communications
analyzing Social Graph connections
about, Analyzing Social Graph Connections–Analyzing Social Graph Connections
analyzing Facebook pages, Analyzing Facebook Pages–Analyzing Coke vs Pepsi Facebook pages
analyzing likes, Analyzing things your friends “like”–Analyzing things your friends “like”
analyzing mutual friendships, Analyzing mutual friendships with directed graphs–Closing Remarks
examining friendships, Examining Friendships–Closing Remarks
analyzing Twitter platform objects
about, Analyzing the 140 Characters–Analyzing the 140 Characters
analyzing favorite tweets, Problem
extracting tweet entities, Extracting Tweet Entities, Problem, Problem, Problem
frequency analysis, Analyzing Tweets and Tweet Entities with Frequency Analysis–Analyzing Tweets and Tweet Entities with Frequency Analysis, Visualizing Frequency Data with Histograms–Closing Remarks, Problem
lexical diversity of tweets, Computing the Lexical Diversity of Tweets–Computing the Lexical Diversity of Tweets, Discussion
patterns in retweets, Examining Patterns in Retweets–Examining Patterns in Retweets, Problem–Problem
analyzing web pages
by scraping, parsing, and crawling, Scraping, Parsing, and Crawling the Web–Breadth-First Search in Web Crawling
entity-centric, Entity-Centric Analysis: A Paradigm Shift–Gisting Human Language Data
quality of analytics, Quality of Analytics for Processing Human Language Data–Quality of Analytics for Processing Human Language Data
semantic understanding of data, Discovering Semantics by Decoding Syntax–Analysis of Luhn’s summarization algorithm
API key (OAuth), Making LinkedIn API Requests, Making Google+ API Requests
API requests
Facebook, Exploring Facebook’s Social Graph API–Understanding the Open Graph Protocol
GitHub, Exploring GitHub’s API–Making GitHub API Requests
Google+, Exploring the Google+ API–Making Google+ API Requests
LinkedIn, Exploring the LinkedIn API–Downloading LinkedIn Connections as a CSV File
Twitter, Creating a Twitter API Connection–Creating a Twitter API Connection
approximate matching (see clustering LinkedIn data)
arbitrary arguments, Searching for Tweets
*args (Python), Searching for Tweets
Aristotle, Inferencing About an Open World
Atom feed, Scraping, Parsing, and Crawling the Web
authorizing applications
accessing Gmail, Accessing Your Gmail with OAuth–Accessing Your Gmail with OAuth
Facebook, Understanding the Social Graph API
GitHub API, Making GitHub API Requests–Making GitHub API Requests
Google+ API, Making Google+ API Requests–Making Google+ API Requests
LinkedIn API, Making LinkedIn API Requests–Making LinkedIn API Requests
Twitter and, Creating a Twitter API Connection–Creating a Twitter API Connection, Problem–Discussion
avatars, Making Google+ API Requests

C

Cantor, Georg, Exploring Trending Topics
cartograms, Visualizing locations with cartograms–Measuring Similarity
central limit theorem, Contingency tables and scoring functions
centrality measures
application of, Application of centrality measures–Application of centrality measures
betweenness, Computing Graph Centrality Measures
closeness, Computing Graph Centrality Measures
computing for graphs, Computing Graph Centrality Measures–Computing Graph Centrality Measures
degree, Computing Graph Centrality Measures
online resources, Online Resources
centroid (clusters), k-means clustering
chi-square test, Contingency tables and scoring functions
chunking (NLP), Natural Language Processing Illustrated Step-by-Step
circles (Google+), Exploring the Google+ API
cleanHTML function, Making Google+ API Requests
clique detection
Facebook, Analyzing mutual friendships with directed graphs–Closing Remarks
NetworkX Python package, Using Nodes as Pivots for More Efficient Queries
closeness graph metric, Computing Graph Centrality Measures
cluster Python package, Hierarchical clustering, Visualizing geographic clusters with Google Earth
clustering LinkedIn data
about, Crash Course on Clustering Data–Clustering Enhances User Experiences
clustering algorithms, Clustering Algorithms–Visualizing geographic clusters with Google Earth
dimensionality reduction and, Crash Course on Clustering Data
greedy clustering, Greedy clustering–Runtime analysis
hierarchical clustering, Hierarchical clustering–k-means clustering
k-means clustering, k-means clustering–k-means clustering
measuring similarity, Crash Course on Clustering Data, Measuring Similarity–Measuring Similarity
normalizing data to enable analysis, Normalizing Data to Enable Analysis
online resources, Online Resources
recommended exercises, Recommended Exercises
visualizing with Google Earth, Visualizing geographic clusters with Google Earth–Visualizing geographic clusters with Google Earth
clustering posts with cosine similarity, Clustering posts with cosine similarity–Clustering posts with cosine similarity
collections Python module
about, Analyzing Tweets and Tweet Entities with Frequency Analysis
Counter class, Analyzing Tweets and Tweet Entities with Frequency Analysis, Analyzing Coke vs Pepsi Facebook pages, Analyzing things your friends “like”, Measuring Similarity, Making GitHub API Requests, Discussion
collective intelligence, Why Is Twitter All the Rage?
collocations
computing, Analyzing Bigrams in Human Language–Analyzing Bigrams in Human Language
n-gram similarity, Measuring Similarity, Analyzing Bigrams in Human Language
comments (Google+), Exploring the Google+ API, Making Google+ API Requests
Common Crawl Corpus, Scraping, Parsing, and Crawling the Web, Microformats: Easy-to-Implement Metadata
company names (LinkedIn data), Normalizing and counting companies–Normalizing and counting companies
confidence intervals, Quality of Analytics for Processing Human Language Data
Connections API (LinkedIn), Making LinkedIn API Requests
consumer key (OAuth), Creating a Twitter API Connection, Discussion–Discussion, OAuth 1.0A
consumer secret (OAuth), Creating a Twitter API Connection, Discussion–Discussion, OAuth 1.0A
content field (Google+), Making Google+ API Requests
context, human language data and, Reflections on Analyzing Human Language Data
contingency tables, Analyzing Bigrams in Human Language–Contingency tables and scoring functions
converting
mail corpus to Unix mailbox, Converting a Mail Corpus to a Unix Mailbox–Converting a Mail Corpus to a Unix Mailbox
mailboxes to JSON, Converting Unix Mailboxes to JSON–Converting Unix Mailboxes to JSON
cosine similarity
about, Finding Similar Documents–The theory behind vector space models and cosine similarity
clustering posts with, Clustering posts with cosine similarity–Clustering posts with cosine similarity
visualizing with matrix diagram, Visualizing document similarity with a matrix diagram
CouchDB, Programmatically Accessing MongoDB with Python
Counter class
Facebook and, Analyzing Coke vs Pepsi Facebook pages, Analyzing things your friends “like”
GitHub and, Making GitHub API Requests
LinkedIn and, Measuring Similarity
Twitter and, Analyzing Tweets and Tweet Entities with Frequency Analysis, Discussion
CSS query selectors, Retrieving recipe reviews
CSV file format, Downloading LinkedIn Connections as a CSV File
csv Python module, Downloading LinkedIn Connections as a CSV File, Discussion
cursors (Twitter API), Discussion
CVS version control system, Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More

D

D3.js toolkit, Visualizing directed graphs of mutual friendships, Visualizing locations with cartograms, Visualizing document similarity with a matrix diagram, Visualizing Interest Graphs
Data Science Toolkit, Recommended Exercises
DataSift platform, Discussion
date/time range, query by, Querying by Date/Time Range–Querying by Date/Time Range
datetime function, Querying by Date/Time Range
dateutil Python package, Converting a Mail Corpus to a Unix Mailbox
DBPedia initiative, Recommended Exercises
deduplication (see clustering LinkedIn data)
degree graph metric, Computing Graph Centrality Measures
degree of nodes in graphs, Modeling Data with Property Graphs
dendrograms, Hierarchical clustering–k-means clustering
density of graphs, Modeling Data with Property Graphs
depth-first searches, Breadth-First Search in Web Crawling
dereferencing, Normalizing and counting companies
Dice’s coefficient, Contingency tables and scoring functions
digraphs (directed graphs), Analyzing mutual friendships with directed graphs–Closing Remarks, Modeling Data with Property Graphs–Modeling Data with Property Graphs
dimensionality reduction, Crash Course on Clustering Data
dir Python function, Making GitHub API Requests
directed graphs (digraphs), Analyzing mutual friendships with directed graphs–Closing Remarks, Modeling Data with Property Graphs–Modeling Data with Property Graphs
distributed version control systems, Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More
document summarization, Document Summarization–Analysis of Luhn’s summarization algorithm
document-oriented databases (see MongoDB)
dollar sign ($-MongoDB operator), Querying by Date/Time Range
Dorling Cartogram, Visualizing locations with cartogramsMeasuring Similarity
double list comprehension, Extracting Tweet Entities
dynamic programming, Hierarchical clustering

E

edit distance, Measuring Similarity
ego (social networks), Understanding the Social Graph API, Analyzing things your friends “like”–Analyzing things your friends “like”, Seeding an Interest Graph
ego graphs, Understanding the Social Graph API, Seeding an Interest Graph–Seeding an Interest Graph
email Python package, A Primer on Unix Mailboxes, Converting a Mail Corpus to a Unix Mailbox
end-of-sentence (EOS) detection, Discovering Semantics by Decoding Syntax, Natural Language Processing Illustrated Step-by-Step, Sentence Detection in Human Language Data–Sentence Detection in Human Language Data
Enron corpus
about, Mining Mailboxes: Analyzing Who’s Talking to Whom About What, How Often, and More, Analyzing the Enron Corpus
advanced queries, Writing Advanced Queries–Writing Advanced Queries
analyzing sender/recipient patterns, Analyzing Patterns in Sender/Recipient Communications–Analyzing Patterns in Sender/Recipient Communications
getting Enron data, Getting the Enron Data–Getting the Enron Data
online resources, Online Resources
query by date/time range, Querying by Date/Time Range–Querying by Date/Time Range
entities
interactions between, Gisting Human Language Data–Gisting Human Language Data
property graphs representing, Modeling Data with Property Graphs–Modeling Data with Property Graphs
entities field (tweets), Analyzing the 140 Characters, Solution
entity extraction, Natural Language Processing Illustrated Step-by-Step, Entity-Centric Analysis: A Paradigm Shift
entity resolution (entity disambiguation), Analyzing this book’s Facebook page
entity-centric analysis, Entity-Centric Analysis: A Paradigm Shift–Gisting Human Language Data
envoy Python package, Importing a JSONified Mail Corpus into MongoDB
EOS (end-of-sentence) detection, Discovering Semantics by Decoding Syntax, Natural Language Processing Illustrated Step-by-Step, Sentence Detection in Human Language Data–Sentence Detection in Human Language Data
extracting tweet entities, Extracting Tweet Entities, Problem, Problem, Problem
extraction (NLP), Natural Language Processing Illustrated Step-by-Step, Entity-Centric Analysis: A Paradigm Shift

F

F1 score, Quality of Analytics for Processing Human Language Data
Facebook, Exploring Facebook’s Social Graph API
(see also Social Graph API)
about, Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More–Exploring Facebook’s Social Graph API
analyzing connections, Analyzing Social Graph Connections–Closing Remarks
interest graphs and, Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More, Seeding an Interest Graph
online resources, Online Resources
recommended exercises, Recommended Exercises
Facebook accounts, Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More, Exploring Facebook’s Social Graph API
Facebook pages, analyzing, Analyzing Facebook Pages–Analyzing Coke vs Pepsi Facebook pages
Facebook Platform Policies document, Exploring Facebook’s Social Graph API
facebook Python package, Understanding the Social Graph API, Analyzing things your friends “like”
Facebook Query Language (FQL), Exploring Facebook’s Social Graph API, Understanding the Social Graph API
false negatives, Quality of Analytics for Processing Human Language Data
false positives, Quality of Analytics for Processing Human Language Data
favorite_count field (tweets), Analyzing the 140 Characters, Discussion
feedparser Python package, Scraping, Parsing, and Crawling the Web, Sentence Detection in Human Language Data
field expansion feature (Social Graph API), Understanding the Social Graph API
fields
Facebook Social Graph API, Understanding the Social Graph API
Google+ API, Making Google+ API Requests
LinkedIn API, Making LinkedIn API Requests
MongoDB, Searching Emails by Keywords
Twitter API, Analyzing the 140 Characters–Analyzing the 140 Characters, Discussion
find function (Python), Analyzing Bigrams in Human Language, Programmatically Accessing MongoDB with Python, Analyzing Patterns in Sender/Recipient Communications
Firefox Operator add-on, Geocoordinates: A Common Thread for Just About Anything
folksonomies, Why Is Twitter All the Rage?
following model
GitHub, Extending the Interest Graph with “Follows” Edges for Users–Computational Considerations
interest graphs and, Seeding an Interest Graph
Twitter, Mining Twitter: Exploring Trending Topics, Discovering What People Are Talking About, and More, Why Is Twitter All the Rage?, Fundamental Twitter Terminology, Exploring Facebook’s Social Graph API, Problem–Discussion, Problem
forked projects, Exploring GitHub’s API
forward chaining, Inferencing About an Open World
FQL (Facebook Query Language), Exploring Facebook’s Social Graph API, Understanding the Social Graph API
frequency analysis
document summarization, Document Summarization–Analysis of Luhn’s summarization algorithm
Facebook data, Analyzing Facebook Pages–Closing Remarks
LinkedIn data, Normalizing and counting companies–Normalizing and counting locations
TF-IDF, A Whiz-Bang Introduction to TF-IDF–TF-IDF
Twitter data, Analyzing Tweets and Tweet Entities with Frequency Analysis–Analyzing Tweets and Tweet Entities with Frequency Analysis, Visualizing Frequency Data with Histograms–Closing Remarks, Problem
Zipf’s law, Introducing the Natural Language Toolkit–Introducing the Natural Language Toolkit
friendship graphs, Problem
friendship model
Facebook, Exploring Facebook’s Social Graph API, Understanding the Social Graph API, Examining Friendships–Closing Remarks
Twitter, Why Is Twitter All the Rage?, Problem–Discussion, Problem
Friendster social network, Recommended Exercises
functools.partial function, Solution, Discussion
FuXi reasoning system, Inferencing About an Open World
fuzzy matching (see clustering LinkedIn data)

G

geo microformat, Microformats: Easy-to-Implement Metadata, Geocoordinates: A Common Thread for Just About Anything–Geocoordinates: A Common Thread for Just About Anything
geocoding service (Bing), Visualizing locations with cartograms
geocoordinates, Microformats: Easy-to-Implement Metadata, Geocoordinates: A Common Thread for Just About Anything–Geocoordinates: A Common Thread for Just About Anything
GeoJSON, Recommended Exercises
geopy Python package, Normalizing and counting locations
Gephi open source project, Visualizing Interest Graphs
GET search/tweets resource, Searching for Tweets–Searching for Tweets
GET statuses/retweets resource, Examining Patterns in Retweets
GET trends/place resource, Exploring Trending Topics
Git version control system, Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More, Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More
GitHub
about, Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More
following model, Extending the Interest Graph with “Follows” Edges for Users–Computational Considerations
online resources, Online Resources
recommended exercises, Recommended Exercises
social coding, Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More
GitHub API
about, Exploring GitHub’s API
analyzing interest graphs, Analyzing GitHub Interest Graphs–Closing Remarks
creating connections, Creating a GitHub API Connection–Creating a GitHub API Connection
making requests, Making GitHub API Requests–Making GitHub API Requests
modeling data with property graphs, Modeling Data with Property Graphs–Modeling Data with Property Graphs
online resources, Online Resources
recommended exercises, Recommended Exercises
terminology, Exploring GitHub’s API
git-scm.com, Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More
Gmail
accessing with OAuth, Accessing Your Gmail with OAuth–Accessing Your Gmail with OAuth
visualizing patterns in, Visualizing Patterns in GMail with the “Graph Your Inbox” Chrome Extension–Visualizing Patterns in GMail with the “Graph Your Inbox” Chrome Extension
GNU Prolog, Open-world versus closed-world assumptions
Google API Console, Making Google+ API Requests
Google Earth, Visualizing geographic clusters with Google Earth–Visualizing geographic clusters with Google Earth, Geocoordinates: A Common Thread for Just About Anything
Google Knowledge Graph, Discovering Semantics by Decoding Syntax
Google Maps, Visualizing geographic clusters with Google Earth, Geocoordinates: A Common Thread for Just About Anything
Google Structured Data Testing Tool, Accessing LinkedIn’s 200 Million Online Résumés–From Semantic Markup to Semantic Web: A Brief Interlude
Google+ accounts, Exploring the Google+ API
Google+ API
about, Exploring the Google+ API–Exploring the Google+ API
making requests, Making Google+ API Requests–Making Google+ API Requests
online resources, Online Resources
querying human language data, Querying Human Language Data with TF-IDF–Reflections on Analyzing Human Language Data
recommended exercises, Recommended Exercises
terminology, Exploring the Google+ API
TF-IDF and, A Whiz-Bang Introduction to TF-IDF–TF-IDF
google-api-python-client package, Making Google+ API Requests
Graph API (Facebook) (see Social Graph API (Facebook))
Graph API Explorer app, Exploring Facebook’s Social Graph API, Understanding the Social Graph API–Understanding the Social Graph API
Graph Search project (Facebook), Understanding the Open Graph Protocol
Graph Your Inbox Chrome extension, Visualizing Patterns in GMail with the “Graph Your Inbox” Chrome Extension–Visualizing Patterns in GMail with the “Graph Your Inbox” Chrome Extension
GraphAPI class (facebook Python package)
get_connections() method, Analyzing Social Graph Connections
get_object() method, Analyzing Social Graph Connections, Analyzing this book’s Facebook page, Analyzing things your friends “like”
get_objects() method, Analyzing Social Graph Connections
request() method, Analyzing Social Graph Connections
Graphviz, Visualizing Interest Graphs
greedy clustering, Greedy clustering–Runtime analysis

H

hangouts (Google+), Exploring the Google+ API
hashtags (tweets)
about, Fundamental Twitter Terminology, Searching for Tweets
extracting, Extracting Tweet Entities
frequency data in histograms, Visualizing Frequency Data with Histograms–Visualizing Frequency Data with Histograms
lexical diversity of, Computing the Lexical Diversity of Tweets
hCalendar microformat, Microformats: Easy-to-Implement Metadata, Accessing LinkedIn’s 200 Million Online Résumés
hCard microformat, Microformats: Easy-to-Implement Metadata, Accessing LinkedIn’s 200 Million Online Résumés
help Python function, Creating a Twitter API Connection, Making Google+ API Requests, Introducing the Natural Language Toolkit, Making GitHub API Requests
hierarchical clustering, Hierarchical clustering–k-means clustering
HierarchicalClustering Python class, Hierarchical clustering
histograms
frequency data for tweets, Visualizing Frequency Data with Histograms–Closing Remarks
generating with IPython Notebook, Visualizing Frequency Data with Histograms–Closing Remarks
recommended exercises, Recommended Exercises
home timeline (tweets), Fundamental Twitter Terminology
homographs, Discovering Semantics by Decoding Syntax
homonyms, Discovering Semantics by Decoding Syntax
Horrocks, Ian, Open-world versus closed-world assumptions
hRecipe microformat, Microformats: Easy-to-Implement Metadata, Using Recipe Data to Improve Online Matchmaking–Accessing LinkedIn’s 200 Million Online Résumés
hResume microformat, Microformats: Easy-to-Implement Metadata, Accessing LinkedIn’s 200 Million Online Résumés–From Semantic Markup to Semantic Web: A Brief Interlude
hReview microformat, Using Recipe Data to Improve Online Matchmaking–Accessing LinkedIn’s 200 Million Online Résumés
hReview-aggregate microformat, Retrieving recipe reviews–Accessing LinkedIn’s 200 Million Online Résumés
HTML format, Scraping, Parsing, and Crawling the Web
HTTP API, Making Google+ API Requests
HTTP requests
Facebook Social Graph API, Understanding the Social Graph API
GitHub API, Creating a GitHub API Connection
requests Python package, Understanding the Social Graph API
Twitter, Problem–Problem
human language data, Quality of Analytics for Processing Human Language Data
(see also NLP)
analyzing bigrams, Analyzing Bigrams in Human Language–Contingency tables and scoring functions
applying TF-IDF to, Applying TF-IDF to Human Language–Applying TF-IDF to Human Language
chunking, Natural Language Processing Illustrated Step-by-Step
document summarization, Document Summarization–Analysis of Luhn’s summarization algorithm
end of sentence detection in, Natural Language Processing Illustrated Step-by-Step, Sentence Detection in Human Language Data
entity resolution, Analyzing this book’s Facebook page
extraction, Natural Language Processing Illustrated Step-by-Step, Entity-Centric Analysis: A Paradigm Shift
Facebook example, Analyzing Coke vs Pepsi Facebook pages
finding similar documents, Finding Similar Documents–Analyzing Bigrams in Human Language
measuring quality of analytics for, Quality of Analytics for Processing Human Language Data–Quality of Analytics for Processing Human Language Data
part of speech assignment, Natural Language Processing Illustrated Step-by-Step, Entity-Centric Analysis: A Paradigm Shift
querying with TF-IDF, Querying Human Language Data with TF-IDF–Reflections on Analyzing Human Language Data
reflections on, Reflections on Analyzing Human Language Data
tokenization, Introducing the Natural Language Toolkit, Natural Language Processing Illustrated Step-by-Step, Sentence Detection in Human Language Data–Sentence Detection in Human Language Data
hyperedges, Modeling Data with Property Graphs
hypergraphs, Modeling Data with Property Graphs

I

I/O bound code, Breadth-First Search in Web Crawling
ID field (tweets), Analyzing the 140 Characters
IDF (inverse document frequency), Inverse Document Frequency
IMAP (Internet message access protocol), Analyzing Your Own Mail Data, Fetching and Parsing Email Messages with IMAP–Fetching and Parsing Email Messages with IMAP
importing mail corpus into MongoDB, Importing a JSONified Mail Corpus into MongoDB–The MongoDB shell
In-Reply-To email header, A Primer on Unix Mailboxes
Indie Web, Microformats: Easy-to-Implement Metadata, Microformats: Easy-to-Implement Metadata
inference, Inferencing About an Open World–Inferencing About an Open World
information retrieval theory
about, A Whiz-Bang Introduction to TF-IDF, Mining Web Pages: Using Natural Language Processing to Understand Human Language, Summarize Blog Posts, and More
additional resources, A Whiz-Bang Introduction to TF-IDF
cosine similarity, Finding Similar Documents–Analyzing Bigrams in Human Language
inverse document frequency, Inverse Document Frequency
term frequency, Term Frequency–Term Frequency
TF-IDF example, TF-IDF–TF-IDF
vector space models and, The theory behind vector space models and cosine similarity–The theory behind vector space models and cosine similarity
interactions between entities, Gisting Human Language Data–Gisting Human Language Data
interest graphs
about, Examining Patterns in Retweets, Overview, Analyzing GitHub Interest Graphs
adding repositories to, Adding more repositories to the interest graph–Computational Considerations
centrality measures and, Computing Graph Centrality Measures–Computing Graph Centrality Measures, Application of centrality measures–Application of centrality measures
extending for GitHub users, Extending the Interest Graph with “Follows” Edges for Users–Computational Considerations
Facebook and, Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More, Seeding an Interest Graph
nodes as query pivots, Using Nodes as Pivots for More Efficient Queries–Using Nodes as Pivots for More Efficient Queries
online resources, Online Resources
seeding, Seeding an Interest Graph–Seeding an Interest Graph
Twitter and, Examining Patterns in Retweets, Seeding an Interest Graph
visualizing, Visualizing Interest Graphs–Closing Remarks
Internet message access protocol (IMAP), Analyzing Your Own Mail Data, Fetching and Parsing Email Messages with IMAP–Fetching and Parsing Email Messages with IMAP
Internet usage statistics, Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More
inverse document frequency (IDF), Inverse Document Frequency
io Python package, Discussion

L

Levenshtein distance, Measuring Similarity
lexical diversity of tweets, Computing the Lexical Diversity of Tweets–Computing the Lexical Diversity of Tweets, Discussion
likelihood ratio, Contingency tables and scoring functions
likes (Facebook), Understanding the Social Graph API, Analyzing things your friends “like”–Analyzing things your friends “like”
LinkedIn
about, Mining LinkedIn: Faceting Job Titles, Clustering Colleagues, and More–Overview
clustering data, Crash Course on Clustering Data–Visualizing geographic clusters with Google Earth
hResume microformat, Accessing LinkedIn’s 200 Million Online Résumés–From Semantic Markup to Semantic Web: A Brief Interlude
online resources, Online Resources
recommended exercises, Recommended Exercises
LinkedIn API
about, Exploring the LinkedIn API
clustering data, Crash Course on Clustering Data–Visualizing geographic clusters with Google Earth
downloading connections as CSV files, Downloading LinkedIn Connections as a CSV File
making requests, Making LinkedIn API Requests–Making LinkedIn API Requests
online resources, Online Resources
recommended exercises, Recommended Exercises
LinkedInApplication Python class, Making LinkedIn API Requests–Making LinkedIn API Requests
list comprehensions, Exploring Trending Topics, Extracting Tweet Entities
locations (LinkedIn data)
counting, Normalizing and counting locations–Normalizing and counting locations
KML and, Visualizing geographic clusters with Google Earth
visualizing with cartograms, Visualizing locations with cartograms–Measuring Similarity
visualizing with Google Earth, Visualizing geographic clusters with Google Earth–Visualizing geographic clusters with Google Earth
Luhn’s algorithm, Document Summarization, Analysis of Luhn’s summarization algorithm–Analysis of Luhn’s summarization algorithm

M

mail corpus
analyzing Enron data, Analyzing the Enron Corpus–Searching Emails by Keywords
converting to mailbox, Converting a Mail Corpus to a Unix Mailbox–Converting a Mail Corpus to a Unix Mailbox
getting Enron data, Getting the Enron Data–Getting the Enron Data
importing into MongoDB, Importing a JSONified Mail Corpus into MongoDB–The MongoDB shell
programmatically accessing MongoDB, Programmatically Accessing MongoDB with Python–Programmatically Accessing MongoDB with Python
mailbox Python package, A Primer on Unix Mailboxes
mailboxes
about, A Primer on Unix Mailboxes–A Primer on Unix Mailboxes
analyzing Enron corpus, Analyzing the Enron Corpus–Searching Emails by Keywords
analyzing mail data, Analyzing Your Own Mail Data–Visualizing Patterns in GMail with the “Graph Your Inbox” Chrome Extension
converting mail corpus to, Converting a Mail Corpus to a Unix Mailbox–Converting a Mail Corpus to a Unix Mailbox
converting to JSON, Converting Unix Mailboxes to JSON–Converting Unix Mailboxes to JSON
online resources, Online Resources
parsing email messages with IMAP, Fetching and Parsing Email Messages with IMAP–Fetching and Parsing Email Messages with IMAP
processing mail corpus, Obtaining and Processing a Mail Corpus–Programmatically Accessing MongoDB with Python
recommended exercises, Recommended Exercises
searching by keywords, Searching Emails by Keywords–Searching Emails by Keywords
visualizing patterns in Gmail, Visualizing Patterns in GMail with the “Graph Your Inbox” Chrome Extension–Visualizing Patterns in GMail with the “Graph Your Inbox” Chrome Extension
visualizing time-series trends, Discovering and Visualizing Time-Series Trends–Discovering and Visualizing Time-Series Trends
Manning, Christopher, Contingency tables and scoring functions
map function, Programmatically Accessing MongoDB with Python
map-reduce computing paradigm, Programmatically Accessing MongoDB with Python
matplotlib Python package, Visualizing Frequency Data with Histograms–Visualizing Frequency Data with Histograms, Analyzing things your friends “like”
matrix diagrams, Visualizing document similarity with a matrix diagram
maximal clique, Analyzing mutual friendships with directed graphs
maximum clique, Analyzing mutual friendships with directed graphs
mbox (see Unix mailboxes)
Message-ID email header, A Primer on Unix Mailboxes
metadata
email headers, Getting the Enron Data
Google+, Exploring the Google+ API
OGP example, Understanding the Open Graph Protocol
RDFa, Understanding the Open Graph Protocol
semantic web, Mining the Semantically Marked-Up Web: Extracting Microformats, Inferencing over RDF, and More
Twitter-related, Fundamental Twitter Terminology
microdata (HTML), Scraping, Parsing, and Crawling the Web, Microformats: Easy-to-Implement Metadata
microform.at service, Geocoordinates: A Common Thread for Just About Anything
microformats
about, Mining the Semantically Marked-Up Web: Extracting Microformats, Inferencing over RDF, and More – Microformats: Easy-to-Implement Metadata
geocoordinates, Microformats: Easy-to-Implement Metadata, Geocoordinates: A Common Thread for Just About Anything
hResume, Accessing LinkedIn’s 200 Million Online Résumés – From Semantic Markup to Semantic Web: A Brief Interlude
list of popular, Microformats: Easy-to-Implement Metadata
online matchmaking, Using Recipe Data to Improve Online Matchmaking – Accessing LinkedIn’s 200 Million Online Résumés
recommended exercises, Recommended Exercises
minimum spanning tree, Computational Considerations
modeling data with property graphs, Modeling Data with Property Graphs
moments (Google+), Exploring the Google+ API
MongoDB
$addToSet operator, Writing Advanced Queries, Writing Advanced Queries
advanced queries, Writing Advanced Queries
analyzing sender/recipient patterns, Analyzing Patterns in Sender/Recipient Communications
ensureIndex command, Searching Emails by Keywords
find Python function, Programmatically Accessing MongoDB with Python, Analyzing Patterns in Sender/Recipient Communications
$group operator, Writing Advanced Queries, Writing Advanced Queries
$gt operator, Discovering and Visualizing Time-Series Trends
importing JSON mailbox data into, Converting Unix Mailboxes to JSON
importing mail corpus into, Importing a JSONified Mail Corpus into MongoDB – The MongoDB shell
$in operator, Analyzing Patterns in Sender/Recipient Communications, Writing Advanced Queries
JSON and, Mining Mailboxes: Analyzing Who’s Talking to Whom About What, How Often, and More, Solution
$lt operator, Discovering and Visualizing Time-Series Trends
$match operator, Writing Advanced Queries
online resources, Online Resources
programmatically accessing, Programmatically Accessing MongoDB with Python
querying by date/time range, Querying by Date/Time Range
recommended exercises, Recommended Exercises
searching emails by keywords, Searching Emails by Keywords
$sum function, Discovering and Visualizing Time-Series Trends
time-series trends, Discovering and Visualizing Time-Series Trends, Discussion
$unwind operator, Writing Advanced Queries
MongoDB shell, The MongoDB shell, Searching Emails by Keywords
mongoimport MongoDB command, Importing a JSONified Mail Corpus into MongoDB, Importing a JSONified Mail Corpus into MongoDB
mutualfriends API (Facebook), Analyzing mutual friendships with directed graphs – Closing Remarks

N

n-gram similarity, Measuring Similarity, Analyzing Bigrams in Human Language
n-squared problems, Crash Course on Clustering Data
N3 (Notation3), Inferencing About an Open World
named entity recognition, Entity-Centric Analysis: A Paradigm Shift
natural language processing (see NLP)
Natural Language Toolkit (see NLTK)
nested list comprehension, Extracting Tweet Entities
NetworkX Python package
about, Analyzing mutual friendships with directed graphs – Closing Remarks, Modeling Data with Property Graphs, Modeling Data with Property Graphs
add_edge method, Modeling Data with Property Graphs, Seeding an Interest Graph
add_node method, Seeding an Interest Graph
betweenness_centrality function, Computing Graph Centrality Measures
clique detection, Using Nodes as Pivots for More Efficient Queries
closeness_centrality function, Computing Graph Centrality Measures
degree_centrality function, Computing Graph Centrality Measures
DiGraph class, Computing Graph Centrality Measures
find_cliques method, Analyzing mutual friendships with directed graphs
Graph class, Computing Graph Centrality Measures
recommended exercises, Recommended Exercises, Recommended Exercises
NLP (natural language processing), A Whiz-Bang Introduction to TF-IDF
(see also human language data)
about, A Whiz-Bang Introduction to TF-IDF, Discovering Semantics by Decoding Syntax
additional resources, Contingency tables and scoring functions
document summarization, Document Summarization – Analysis of Luhn’s summarization algorithm
sentence detection, Sentence Detection in Human Language Data
step-by-step illustration, Natural Language Processing Illustrated Step-by-Step
NLTK (Natural Language Toolkit)
about, Introducing the Natural Language Toolkit
additional resources, Mining Google+: Computing Document Similarity, Extracting Collocations, and More, Natural Language Processing Illustrated Step-by-Step
chunking, Natural Language Processing Illustrated Step-by-Step
computing bigrams and collocations for sentences, Analyzing Bigrams in Human Language
EOS detection, Natural Language Processing Illustrated Step-by-Step
extraction, Natural Language Processing Illustrated Step-by-Step, Entity-Centric Analysis: A Paradigm Shift
measuring similarity, Measuring Similarity
POS tagging, Natural Language Processing Illustrated Step-by-Step, Entity-Centric Analysis: A Paradigm Shift
stopword lists, Inverse Document Frequency
tokenization, Introducing the Natural Language Toolkit, Natural Language Processing Illustrated Step-by-Step, Sentence Detection in Human Language Data
nltk Python package
batch_ne_chunk function, Natural Language Processing Illustrated Step-by-Step, Entity-Centric Analysis: A Paradigm Shift
clean_html function, Making Google+ API Requests
collocations function, Analyzing Bigrams in Human Language
concordance method, Introducing the Natural Language Toolkit
cosine_distance function, Clustering posts with cosine similarity
demo function, Introducing the Natural Language Toolkit
download function, Measuring Similarity
edit_distance function, Measuring Similarity
FreqDist class, Measuring Similarity, Making GitHub API Requests
jaccard_distance function, Measuring Similarity
sent_tokenize method, Sentence Detection in Human Language Data, Sentence Detection in Human Language Data
word_tokenize method, Sentence Detection in Human Language Data, Sentence Detection in Human Language Data
node IDs (Social Graph API), Understanding the Social Graph API
Node.js platform, Using Recipe Data to Improve Online Matchmaking
nodes
betweenness centrality, Computing Graph Centrality Measures
closeness centrality, Computing Graph Centrality Measures
degree centrality, Computing Graph Centrality Measures
as query pivots, Using Nodes as Pivots for More Efficient Queries
normal distribution, Contingency tables and scoring functions
normalizing LinkedIn data
about, Crash Course on Clustering Data, Normalizing Data to Enable Analysis
counting companies, Normalizing and counting companies
counting job titles, Normalizing and counting job titles
counting locations, Normalizing and counting locations
visualizing locations with cartograms, Visualizing locations with cartograms – Measuring Similarity
Norvig, Peter, Inferencing About an Open World
NoSQL databases, Modeling Data with Property Graphs
Notation3 (N3), Inferencing About an Open World
NP-complete problems, Analyzing mutual friendships with directed graphs – Closing Remarks
numpy Python package, Document Summarization

P

parsing
email messages with IMAP, Fetching and Parsing Email Messages with IMAP
feeds, Scraping, Parsing, and Crawling the Web, Sentence Detection in Human Language Data
part-of-speech (POS) tagging, Natural Language Processing Illustrated Step-by-Step, Entity-Centric Analysis: A Paradigm Shift
Patel-Schneider, Peter, Open-world versus closed-world assumptions
patterns
in retweets, Examining Patterns in Retweets, Problem
in sender/recipient communications, Analyzing Patterns in Sender/Recipient Communications
visualizing in Gmail, Visualizing Patterns in GMail with the “Graph Your Inbox” Chrome Extension
PaySwarm, Recommended Exercises
Pearson’s chi-square test, Contingency tables and scoring functions
Penn Treebank Project, Natural Language Processing Illustrated Step-by-Step, Entity-Centric Analysis: A Paradigm Shift
people (Google+), Exploring the Google+ API, Making Google+ API RequestsMaking Google+ API Requests
People API (Google+), Making Google+ API Requests
personal API access token (OAuth), Creating a GitHub API Connection
pip install command
beautifulsoup Python package, Making Google+ API Requests
cluster Python package, Hierarchical clustering
envoy Python package, Importing a JSONified Mail Corpus into MongoDB
facebook-sdk Python package, Analyzing Social Graph Connections
feedparser Python package, Scraping, Parsing, and Crawling the Web
geopy Python package, Normalizing and counting locations
google-api-python-client Python package, Making Google+ API Requests
networkx Python package, Analyzing mutual friendships with directed graphs, Modeling Data with Property Graphs
nltk Python package, Measuring Similarity
numpy Python package, Document Summarization
oauth2 Python package, Accessing Your Gmail with OAuth
prettytable Python package, Analyzing things your friends “like”, Making LinkedIn API Requests, Discussion
PyGithub Python package, Creating a GitHub API Connection
pymongo Python package, Importing a JSONified Mail Corpus into MongoDB, Programmatically Accessing MongoDB with Python
python-boilerpipe Python package, Scraping, Parsing, and Crawling the Web
python-linkedin Python package, Making LinkedIn API Requests
python_dateutil Python package, Converting a Mail Corpus to a Unix Mailbox
requests Python package, Understanding the Social Graph API, Creating a GitHub API Connection
twitter Python package, Creating a Twitter API Connection, Twitter Cookbook
twitter-text-py Python package, Discussion
places (Twitter), Fundamental Twitter Terminology, Discussion
PMI (Pointwise Mutual Information), Contingency tables and scoring functions
Pointwise Mutual Information (PMI), Contingency tables and scoring functions
POS (part-of-speech) tagging, Natural Language Processing Illustrated Step-by-Step, Entity-Centric Analysis: A Paradigm Shift
prettytable Python package, Analyzing things your friends “like”, Making LinkedIn API Requests, Discovering and Visualizing Time-Series Trends, Solution
privacy controls
Facebook and, Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More, Exploring Facebook’s Social Graph API, Analyzing things your friends “like”
LinkedIn and, Making LinkedIn API Requests
projects (GitHub), Exploring GitHub’s API
Prolog programming language, Open-world versus closed-world assumptions
property graphs, modeling data with, Modeling Data with Property GraphsModeling Data with Property Graphs
public firehose (tweets), Fundamental Twitter Terminology
public streams API, Fundamental Twitter Terminology
pull requests (Git), Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More
PunktSentenceTokenizer Python class, Sentence Detection in Human Language Data
PunktWordTokenizer Python class, Sentence Detection in Human Language Data
PuTTY (Windows SSH client), The MongoDB shell
pydoc Python package, Creating a Twitter API Connection, Making Google+ API Requests, Sentence Detection in Human Language Data, Making GitHub API Requests
PyGithub Python package, Creating a GitHub API Connection, Making GitHub API Requests, Adding more repositories to the interest graph
PyLab, Visualizing Frequency Data with Histograms, Analyzing things your friends “like”
pymongo Python package, Importing a JSONified Mail Corpus into MongoDB, Programmatically Accessing MongoDB with Python, Searching Emails by Keywords
python-boilerpipe Python package, Scraping, Parsing, and Crawling the Web
python-oauth2 Python package, Accessing Your Gmail with OAuth
PYTHONPATH environment variable, Creating a Twitter API Connection

R

rate limits
Facebook Social Graph API, Analyzing Social Graph Connections
GitHub API, Creating a GitHub API Connection, Extending the Interest Graph with “Follows” Edges for Users
LinkedIn API, Making LinkedIn API Requests
Twitter API, Exploring Trending Topics
raw frequency, Contingency tables and scoring functions
RDF (Resource Description Framework), Man Cannot Live on Facts Alone – Inferencing About an Open World
RDF Schema language, Modeling Data with Property Graphs, Inferencing About an Open World
RDFa
about, Microformats: Easy-to-Implement Metadata
metadata and, Understanding the Open Graph Protocol
web scraping and, Scraping, Parsing, and Crawling the Web
re Python package, Visualizing locations with cartograms
Really Simple Syndication (RSS), Scraping, Parsing, and Crawling the Web
reduce function, Programmatically Accessing MongoDB with Python
References email header, A Primer on Unix Mailboxes
regular expressions, Visualizing locations with cartograms, Discovering Semantics by Decoding Syntax, Converting a Mail Corpus to a Unix Mailbox, Discussion
RelMeAuth Indie Web initiative, Microformats: Easy-to-Implement Metadata, Recommended Exercises
repositories, adding to interest graphs, Adding more repositories to the interest graph – Computational Considerations
requests Python package, Understanding the Social Graph API, Making GitHub API Requests
Resource Description Framework (RDF), Man Cannot Live on Facts Alone – Inferencing About an Open World
RESTful API, Creating a Twitter API Connection, Exploring Trending Topics
retweeted field (tweets), Analyzing the 140 Characters, Discussion
retweeted_status field (tweets), Analyzing the 140 Characters, Examining Patterns in Retweets
retweets
extracting attribution, Problem
frequency data in histograms for, Visualizing Frequency Data with Histograms
patterns in, Examining Patterns in RetweetsExamining Patterns in Retweets, ProblemProblem
retweet_count field (tweets), Analyzing the 140 Characters, Examining Patterns in Retweets, Solution, Discussion
RFC 822, Fetching and Parsing Email Messages with IMAP
RFC 2045, Converting Unix Mailboxes to JSON, Online Resources
RFC 3501, Fetching and Parsing Email Messages with IMAP
RFC 5849, Overview
RFC 6749, Overview
Riak database, Programmatically Accessing MongoDB with Python
RIAs (rich internet applications), The Semantic Web: An Evolutionary Revolution
RSS (Really Simple Syndication), Scraping, Parsing, and Crawling the Web
Russell, Stuart, Inferencing About an Open World

S

schema.org site, Mining the Semantically Marked-Up Web: Extracting Microformats, Inferencing over RDF, and More, Microformats: Easy-to-Implement Metadata
Schütze, Hinrich, Contingency tables and scoring functions
scoring functions, Analyzing Bigrams in Human Language – Contingency tables and scoring functions
Scrapy Python framework, Recommended Exercises, Scraping, Parsing, and Crawling the Web
screen names (Twitter)
extracting from tweets, Extracting Tweet Entities
frequency data for tweets with histograms, Visualizing Frequency Data with Histograms
lexical diversity of, Computing the Lexical Diversity of Tweets
Search API, Making LinkedIn API Requests, Solution
searching
bounded breadth-first, Breadth-First Search in Web Crawling
breadth-first, Breadth-First Search in Web Crawling
depth-first, Breadth-First Search in Web Crawling
email by keywords, Searching Emails by Keywords
Facebook Graph Search project, Understanding the Open Graph Protocol
Google+ data, Exploring the Google+ API – Making Google+ API Requests
LinkedIn data, Making LinkedIn API Requests, Clustering Enhances User Experiences
for tweets, Fundamental Twitter Terminology, Searching for Tweets – Analyzing the 140 Characters, Problem, Problem
secret key (OAuth), Making LinkedIn API Requests
seeding interest graphs, Seeding an Interest Graph
semantic web
about, Mining the Semantically Marked-Up Web: Extracting Microformats, Inferencing over RDF, and More
as evolutionary revolution, The Semantic Web: An Evolutionary Revolution – Inferencing About an Open World
microformats, Mining the Semantically Marked-Up Web: Extracting Microformats, Inferencing over RDF, and More – From Semantic Markup to Semantic Web: A Brief Interlude
online resources, Online Resources
recommended exercises, Recommended Exercises
technologies supporting, Scraping, Parsing, and Crawling the Web, Modeling Data with Property Graphs
transitioning to, From Semantic Markup to Semantic Web: A Brief Interlude
semantic web stack, Modeling Data with Property Graphs
setwise operations
about, Exploring Trending Topics
difference, Analyzing Patterns in Sender/Recipient Communications, Solution
intersection, Analyzing things your friends “like”, Measuring Similarity, Analyzing Patterns in Sender/Recipient Communications, Solution
union, Analyzing Patterns in Sender/Recipient Communications
similarity
cosine, Finding Similar Documents – Analyzing Bigrams in Human Language
measuring in LinkedIn data, Crash Course on Clustering Data, Measuring Similarity
slicing technique, Extracting Tweet Entities
Snowball stemmer, Recommended Exercises
social coding, Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More
Social Graph API (Facebook)
about, Exploring Facebook’s Social Graph API – Understanding the Social Graph API
analyzing connections, Analyzing Social Graph Connections
analyzing Facebook pages, Analyzing Facebook Pages – Analyzing Coke vs Pepsi Facebook pages
examining friendships, Examining Friendships – Closing Remarks
field expansion feature, Understanding the Social Graph API
online resources, Online Resources
Open Graph protocol and, Understanding the Open Graph Protocol
rate limits, Analyzing Social Graph Connections
recommended exercises, Recommended Exercises
XFN and, Microformats: Easy-to-Implement Metadata
social graphs, Seeding an Interest Graph
social interest graphs (see interest graphs)
SPARQL language, Modeling Data with Property Graphs
SSH client, The MongoDB shell
stargazing (GitHub), Exploring GitHub’s API, Making GitHub API Requests, Seeding an Interest Graph
statistics, Internet usage, Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More
stopwords
about, Term Frequency, Introducing the Natural Language Toolkit
lists of, Inverse Document Frequency, Analysis of Luhn’s summarization algorithm
Streaming API (Twitter), Problem
Strunk, Jan, Sentence Detection in Human Language Data
Student’s t-score, Contingency tables and scoring functions
subject-verb-object form, Gisting Human Language Data
Subversion version control system, Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More
supernodes, Adding more repositories to the interest graph, Discussion
supervised learning, Scraping, Parsing, and Crawling the Web, Quality of Analytics for Processing Human Language Data
syllogisms, Inferencing About an Open World

T

tag clouds, Entity-Centric Analysis: A Paradigm Shift, Recommended Exercises
taxonomies, Why Is Twitter All the Rage?
term frequency (TF), Term Frequency
Term Frequency–Inverse Document Frequency (see TF-IDF)
text field (tweets), Analyzing the 140 Characters
TF (term frequency), Term Frequency
TF-IDF (Term Frequency–Inverse Document Frequency)
about, Overview, A Whiz-Bang Introduction to TF-IDF
applying to human language, Applying TF-IDF to Human Language
finding similar documents, Finding Similar Documents – Analyzing Bigrams in Human Language
inverse document frequency, Inverse Document Frequency
querying human language data with, Querying Human Language Data with TF-IDF – Reflections on Analyzing Human Language Data
running on sample data, TF-IDF
term frequency, Term Frequency
thread pool, Breadth-First Search in Web Crawling, Extending the Interest Graph with “Follows” Edges for Users
time-series trends, Discovering and Visualizing Time-Series Trends, Problem
time.sleep Python function, Solution
timelines (Twitter), Fundamental Twitter Terminology, Solution
timestamps, A Primer on Unix Mailboxes
Titan big graph database, Modeling Data with Property Graphs
tokenization, Introducing the Natural Language Toolkit, Natural Language Processing Illustrated Step-by-Step, Sentence Detection in Human Language Data
Travelling Salesman problems, Visualizing geographic clusters with Google Earth
TreebankWordTokenizer Python class, Sentence Detection in Human Language Data
trends (Twitter), Exploring Trending Topics, Problem
TrigramAssociationMeasures Python class, Measuring Similarity
trigrams, Measuring Similarity
true error, Quality of Analytics for Processing Human Language Data
true negatives, Quality of Analytics for Processing Human Language Data
true positives, Quality of Analytics for Processing Human Language Data
Turing Test, Discovering Semantics by Decoding Syntax
tweet entities
analyzing, Analyzing the 140 Characters, Analyzing Tweets and Tweet Entities with Frequency Analysis
composition of, Fundamental Twitter Terminology
extracting, Extracting Tweet Entities, Problem, Problem, Problem
finding most popular, Problem
searching for, Fundamental Twitter Terminology, Searching for Tweets – Analyzing the 140 Characters
TweetDeck, Fundamental Twitter Terminology
tweets
about, Fundamental Twitter Terminology
analyzing, Analyzing the 140 Characters, Analyzing Tweets and Tweet Entities with Frequency Analysis, Problem
composition of, Fundamental Twitter Terminology
finding most popular, Problem
harvesting, Problem
lexical diversity of, Computing the Lexical Diversity of Tweets, Discussion
quoting, Examining Patterns in Retweets
retweeting, Examining Patterns in Retweets, Visualizing Frequency Data with Histograms, Problem
searching for, Fundamental Twitter Terminology, Searching for Tweets – Analyzing the 140 Characters, Problem, Problem
timelines and, Fundamental Twitter Terminology, Solution
Twitter
about, Why Is Twitter All the Rage?
fundamental terminology, Fundamental Twitter Terminology
interest graphs and, Examining Patterns in Retweets, Seeding an Interest Graph
recommended exercises, Recommended Exercises
Twitter accounts
creating, Fundamental Twitter Terminology
governance of, Why Is Twitter All the Rage?
logging into, Fundamental Twitter Terminology
recommended exercises, Recommended Exercises, Recommended Exercises
resolving user profile information, Problem
Twitter API
accessing for development purposes, Problem – Discussion
collecting time-series data, Problem
convenient function calls, Problem
creating connections, Creating a Twitter API Connection
fundamental terminology, Fundamental Twitter Terminology
making robust requests, Problem
online resources, Online Resources, Online Resources
rate limits, Exploring Trending Topics
recommended exercises, Recommended Exercises, Recommended Exercises
sampling public data, Problem
saving and restoring JSON data with text files, Problem – Discussion
searching for tweets, Searching for Tweets – Analyzing the 140 Characters, Problem, Problem
trending topics, Exploring Trending Topics, Problem
Twitter platform objects
about, Fundamental Twitter Terminology
analyzing tweets, Analyzing the 140 Characters – Closing Remarks
searching for tweets, Searching for Tweets – Analyzing the 140 Characters, Problem, Problem
Twitter Python class, Exploring Trending Topics
twitter Python package, Creating a Twitter API Connection, Twitter Cookbook
twitter_text Python package, Solution
Twurl tool (Twitter API), Fundamental Twitter Terminology

Y

Yahoo! GeoPlanet, Exploring Trending Topics

Get Mining the Social Web, 2nd Edition now with the O’Reilly learning platform.