Chapter 4

Collecting and Managing Social Media Data

Before you can analyze social media, you must collect and manage enormous amounts of social media data and other types of data. The global use of social media has produced petabytes of data, most of which is not relevant to your purpose and analysis. The most important part of conducting social media analysis is finding and manipulating the relevant data intelligently without becoming overwhelmed. To this end, this chapter explains what constitutes social media and related data; details the process for determining your data needs; and describes how to collect, filter, store, and manage the data. The chapter also discusses the benefits and drawbacks of building your own data management system versus buying a commercially available one. We do not expect you to have the technical acumen to build the data collection apparatus yourself. However, knowing the technical concepts behind data collection will inform your analysis and expectations, and help you select and use the appropriate data collection technologies.

Understanding Social Media Data

Social media data is all the user-generated content and corresponding metadata on social media platforms.
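For readers who find a concrete picture helpful, the split between content and metadata can be sketched as a simple record. This is only an illustration under stated assumptions: the class name, the field names, and the sample values are all hypothetical and do not reflect any platform's actual data format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SocialMediaRecord:
    """Hypothetical record pairing user-generated content with its metadata."""

    # User-generated content: free-form text the user wrote.
    content: str
    # Metadata: structured fields that describe the content.
    platform: str
    author_id: str
    created_at: datetime
    # Any further platform-specific metadata (assumed, optional).
    extra: dict = field(default_factory=dict)


# Made-up sample values, for illustration only.
record = SocialMediaRecord(
    content="Stuck in traffic again!!",
    platform="Twitter",
    author_id="user_123",
    created_at=datetime(2013, 1, 15, 8, 30, tzinfo=timezone.utc),
)

print(record.platform, record.created_at.isoformat())
```

The content field holds whatever the user chose to write, while the metadata fields have fixed names and types, which is what makes them easier to filter and store.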

User-generated content includes the pictures on Facebook, the diaries on Qzone, the videos on YouTube, the tweets on Twitter, and much more. Most of it is unstructured, meaning that it does not follow predefined rules. A 15-year-old ...
