
One week after details of the NSA's PRISM program were made public, revealing the government's direct access to the data of social media giants like Google and Facebook, a parody account on Twitter posted this status update:

[embedded tweet]
This post isn’t about the government accessing our social media data; at least, not in any way that differs from anybody else accessing it. Furthermore, the NSA isn’t a very good yardstick for understanding the computational power and data access available to marketers, potential employers, exes, or stalkers.

The general concerns raised by the PRISM controversy, however, are reasonable, and should in fact be applied much more broadly. Large amounts of our personal information are available to many people, and the ability to computationally infer additional user attributes from that information is widely available.

When it comes to social media, “security” is no longer about passwords and encryption. Users are supplying a lot of data, and that data is going to make it into the hands of people they never intended to see it. It might happen through poor privacy settings, through a social media website selling the data, or through porous APIs that provide access to information, even when a user has cranked up their privacy settings to prevent access.

The true challenge is in communicating the vastness of what can be done with the information users have freely chosen to share online. Someone may have no problem with the world knowing every explicit like, comment, and shared link he has posted on Facebook. On the other hand, he may feel his privacy has been seriously invaded if algorithms leverage that same information to infer that he is a heavy drinker who does not attend church – particularly if that inference is used as part of a decision-making process.

Computationally, this raises two challenges: first, how do we compute the value of a given piece of shared information in terms of the predictive power it provides, and second, how do we communicate that risk to the user?

Inferring User Traits from Social Media

I began my research career by studying ways to infer trust between strangers in social networks. Other work on social media relationships has looked at inferring tie strength. Eventually, I began to investigate how to infer information about individuals, not just their relationships. With students and colleagues, I developed a range of techniques for inferring personality traits and political leanings from Facebook and Twitter accounts.

A recent study published in the Proceedings of the National Academy of Sciences carried this further, showing that a wide range of attributes—from intelligence to drug use, from race to religion, and from gender to whether someone’s parents were still married when she turned 21—could be predicted using only Facebook “Likes.”

These types of inferences, drawn from seemingly useless information (e.g., liking the page for "Curly Fries" on Facebook was one of the strongest indicators of high intelligence), are the real story of big data right now. We have seen the implications of this in other settings when companies have access to a large volume of customers' purchasing data. The difference with social media is that so much information leaks out, whether through users neglecting to change the ridiculously public default Facebook privacy settings, or through apps and social media APIs. It is not just the companies to whom we supply our data that can use it for whatever purpose they see fit. With the right tools, anyone can start collecting data and building these models.
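To make concrete how little machinery such inference requires, here is a minimal sketch of a naive Bayes-style classifier over "Likes." The pages, labels, and training data below are entirely hypothetical, and real studies such as the PNAS one use far larger datasets and more sophisticated models; this only illustrates the basic mechanic of turning Likes into trait predictions.

```python
# Minimal sketch: predicting a binary trait from Facebook "Likes"
# with a naive Bayes-style count model. All page names, labels,
# and training data are hypothetical, for illustration only.
import math
from collections import defaultdict

# Hypothetical training data: (set of liked pages, trait label)
training = [
    ({"Curly Fries", "Science Daily"}, "high"),
    ({"Curly Fries", "Thunderstorms"}, "high"),
    ({"Monster Trucks", "Energy Drinks"}, "low"),
    ({"Monster Trucks", "Thunderstorms"}, "low"),
]

def train(data):
    """Count how often each page is liked within each label group."""
    counts = defaultdict(lambda: defaultdict(int))
    label_totals = defaultdict(int)
    for likes, label in data:
        label_totals[label] += 1
        for page in likes:
            counts[label][page] += 1
    return counts, label_totals

def predict(likes, counts, label_totals):
    """Score each label by the smoothed log-likelihood of the Likes."""
    best_label, best_score = None, float("-inf")
    for label, total in label_totals.items():
        score = 0.0
        for page in likes:
            # Laplace smoothing so unseen pages don't zero out the score
            score += math.log((counts[label][page] + 1) / (total + 2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

counts, totals = train(training)
print(predict({"Curly Fries"}, counts, totals))  # → high
```

The point is not the model itself but how short it is: given scraped or API-collected Likes and a modest amount of labeled data, anyone can build a version of this.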

Communicating Risk

How do we communicate to users both what information of theirs is accessible and how it can be used? I see this as an important research challenge at the intersection of human-computer interaction and cybersecurity. In my lab, we have projects underway to study this, but some interesting methods for communicating this information to users already exist.

My favorite, which I often use in presentations and which is very effective, is Take This Lollipop. It is part experimental art piece, part public service message about oversharing, and you should go try it right now.

I confess that I was shocked when I first watched it, because my privacy settings on Facebook forbade app access to my profile and limited access to my data to a small list of people. I consider myself highly skilled at using social media privacy settings, so when this video showed everything on my profile – including posts by friends, comments, etc. – I was surprised. It shows just how little those privacy settings matter when third parties are accessing your data.

Its creepy setting and threatening message are effective ways to scare people about the volume of information they share, but the task of informing people about exactly what is shared and how it can be used is a much bigger problem. If I lost interest in being a professor (and lost some of my moral hang-ups in the process), I might consider starting a business where I aggregated as much profile information as possible, developed the best models for inferring sensitive personal attributes about people, and sold access to my reports to potential employers, mortgage and rental companies, religious institutions, credit bureaus, and, sure, if they needed it, government agencies around the world.

So how do we show people that certain combinations of data can reveal much more than intended? ZIP code, gender, and birth date are often enough to uniquely identify a person. Certainly, particular combinations of “Likes,” shares, and comments are especially useful in identifying a person’s sexual orientation, political preferences, or income level.  And what about the social network implications? Homophily tells us that people are friends with people like them. Even if a user is careful with her own data sharing, her friends’ sharing may reveal much about her as well.
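The ZIP code, gender, and birth date point can be made concrete in a few lines: treat the triple as a quasi-identifier and measure what fraction of records it pins down uniquely. A minimal sketch, with entirely hypothetical records:

```python
# Minimal sketch: ZIP code, gender, and birth date as a quasi-identifier.
# The records below are hypothetical, for illustration only.
from collections import Counter

records = [
    ("20742", "F", "1980-05-01"),
    ("20742", "F", "1983-11-23"),
    ("20742", "M", "1980-05-01"),
    ("21201", "F", "1980-05-01"),
    ("21201", "F", "1980-05-01"),
]

def unique_fraction(rows):
    """Fraction of records whose (ZIP, gender, birth date) triple
    appears exactly once, i.e. records re-identifiable from those
    three fields alone."""
    freq = Counter(rows)
    unique = sum(1 for row in rows if freq[row] == 1)
    return unique / len(rows)

print(unique_fraction(records))  # 3 of 5 triples are unique → 0.6
```

On a toy dataset the fraction is easy to see by hand; the unsettling result from the de-anonymization literature is that on real population data these three fields alone uniquely identify the large majority of people.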

I can’t end this post with a prescription for solving these problems. Rather, I hope to have laid out an interesting HCI and algorithmic problem that surrounds sharing on social media. We share a lot, and it turns out that a lot more people are reading our Facebook posts than we suspect. They can do a lot more with those posts than most people ever imagined, too. But before we can make informed choices about using systems and about related policy, people need to understand the implications of their actions. That will require smart computing and well-designed interfaces to help users decide what to share and inform them of the risks.

Analyzing the Social Web provides a framework for the analysis of public data currently available and being generated by social networks and social media, like Facebook, Twitter, and Foursquare. Access and analysis of this public data about people and their connections to one another allows for new applications of traditional social network analysis techniques that let us identify things like who are the most important or influential people in a network, how things will spread through the network, and the nature of peoples' relationships. Analyzing the Social Web introduces you to these techniques, shows you their application to many different types of social media, and discusses how social media can be used as a tool for interacting with the online public.
Data Mining in Dynamic Social Networks and Fuzzy Systems brings together research on the latest trends and patterns of data mining tools and techniques in dynamic social networks and fuzzy systems. With these improved modern techniques of data mining, this publication aims to provide insight and support to researchers and professionals concerned with the management of expertise, knowledge, information, and organizational development.
This synthesis lecture provides a survey of work on privacy in online social networks (OSNs), encompassing concerns of users as well as service providers and third parties. Our goal is to approach such concerns from a computer-science perspective, building upon existing work on privacy, security, statistical modeling, and databases to provide an overview of the technical and algorithmic issues related to privacy in OSNs.

About the author

Jen Golbeck studies social networks. She is the author of "Analyzing the Social Web" and works on developing ways to compute with social networks and integrate that information into applications to improve the user experience. She is director of the Human-Computer Interaction Lab at the University of Maryland, College Park and an associate professor in the College of Information Studies.

Tags: facebook, prism, privacy, security, social media, twitter
