Case study - AlphaGo tweets analytics
Now that we have a good understanding of GraphX, let's apply our newly gained knowledge to analyze a retweet network. As in any big data project, the first task is to define a pipeline: identify the data elements and their source, then work out the transformations, mapping, and processing they need.
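To make the idea of mapping data elements concrete, here is a minimal Python sketch of how tweet records might be turned into the vertices and edges of a retweet network. The field names `user` and `retweeted_user`, the sample records, and the helper `build_retweet_network` are all assumptions made for illustration; they are not the schema or code used in this case study. Numeric IDs are assigned because GraphX identifies vertices by 64-bit integers.

```python
from collections import Counter

# Hypothetical tweet records; the field names are assumptions for illustration,
# not the schema of the data set used in this case study.
SAMPLE_TWEETS = [
    {"user": "alice", "retweeted_user": "bob"},
    {"user": "carol", "retweeted_user": "bob"},
    {"user": "alice", "retweeted_user": "bob"},
    {"user": "dave", "retweeted_user": None},  # an original tweet, not a retweet
]


def build_retweet_network(tweets):
    """Return (vertices, edges) for a retweet network.

    vertices maps each user name to a numeric ID.
    edges maps (source_id, destination_id) to the number of retweets.
    """
    vertices = {}
    edges = Counter()

    def vertex_id(name):
        # Assign IDs in order of first appearance, starting at 1.
        if name not in vertices:
            vertices[name] = len(vertices) + 1
        return vertices[name]

    for tweet in tweets:
        source = vertex_id(tweet["user"])
        target_name = tweet.get("retweeted_user")
        if target_name is None:
            continue  # keep the user as a vertex, but add no edge
        edges[(source, vertex_id(target_name))] += 1

    return vertices, dict(edges)


vertices, edges = build_retweet_network(SAMPLE_TWEETS)
print(vertices)
print(edges)
```

Each edge points from the user who retweeted to the user being retweeted, and the count on the edge can later serve as an edge attribute when the graph is loaded into GraphX.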
Data pipeline
For this case study, I collected Twitter data pertaining to the AlphaGo project.
While the full mechanics of collecting data from Twitter are out of scope, I will briefly outline the main steps:
- Using Python and the tweepy framework, you can download the tweets mentioning the hashtag #alphago. Initially, pull all the tweets that Twitter ...