Mining the Social Web by Matthew A. Russell

Chapter 5. Twitter: The Tweet, the Whole Tweet, and Nothing but the Tweet

Tweet and RT were sitting on a fence. Tweet fell off. Who was left?

In this chapter, we’ll largely use CouchDB’s map/reduce capabilities to exploit the entities in tweets (@mentions, #hashtags, etc.) and try to answer the question, “What’s everyone talking about?” With overall throughput now far exceeding 50 million tweets per day and occasional peak velocities in excess of 3,000 tweets per second, there’s vast potential in mining tweet content, and this is the chapter where we’ll finally dig in. Whereas the previous chapter focused primarily on the social graph linkages that exist among friends and followers, this chapter focuses on learning as much as possible about Twitterers by inspecting the entities that appear in their tweets. You’ll also see ties back to Redis, for accessing the user data you harvested in Chapter 4, and to NetworkX, for graph analytics. So many tweets, so little time to mine them. Let’s get started!
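To make the idea of map/reduce over tweet entities concrete, here is a minimal sketch in plain Python. It does not use CouchDB; it only imitates the shape of a map step (emit an entity with a count of 1) and a reduce step (sum the counts per entity). The regular expression is a simplifying assumption rather than Twitter's own entity parsing, and the function names and sample tweets are hypothetical, made up for illustration.

```python
import re
from collections import defaultdict

# Assumption: a simplified pattern for @mentions and #hashtags.
# The lookbehind skips "@" or "#" that follow a word character,
# so email addresses such as carol@example.com are not counted.
ENTITY_PATTERN = re.compile(r"(?<!\w)([@#]\w+)")


def map_tweet(tweet_text):
    """Map step: emit an (entity, 1) pair for each entity in one tweet."""
    for match in ENTITY_PATTERN.finditer(tweet_text):
        yield match.group(1).lower(), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def count_entities(tweets):
    """Run the map step over every tweet, then reduce the emitted pairs."""
    pairs = (pair for text in tweets for pair in map_tweet(text))
    return reduce_counts(pairs)


# Hypothetical sample data, invented for this sketch.
SAMPLE_TWEETS = [
    "RT @alice: loving #python for data mining",
    "@bob have you tried #Python with #couchdb?",
    "email me at carol@example.com about #couchdb",
]

if __name__ == "__main__":
    counts = count_entities(SAMPLE_TWEETS)
    # Most frequent entities first, ties broken alphabetically.
    for entity, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        print(entity, n)
```

Lowercasing each entity in the map step is a design choice that folds #Python and #python into one key; whether that is appropriate depends on the question being asked of the data.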

Note

It is highly recommended that you read Chapters 3 and 4 before reading this chapter. Much of its discussion builds upon the foundation those chapters established, including Redis and CouchDB, which are again used in this chapter.

Pen : Sword :: Tweet : Machine Gun (?!?)

If the pen is mightier than the sword, what does that say about the tweet? There are a number of interesting incidents in which Twitter has saved lives, one of the best known being James Karl Buck’s “Arrested” tweet ...