Clean Data by Megan Squire

Step seven – separate user mentions, hashtags, and URLs

Another problem with this data is that a lot of interesting information is hidden inside the tweet_text column. For example, consider all the times that a person directs a tweet to the attention of another person by placing the @ symbol before that person's username. This is called a mention on Twitter. It might be interesting to count how many times a particular person is mentioned, or how many times they are mentioned in conjunction with a particular keyword. Another interesting piece of data hidden in some of the tweets is hashtags; for example, the tweet with ID 2165 discusses the concepts of jobs and babysitting using the #jobs and #sittercity hashtags.

This same tweet also ...
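
As a rough illustration of the idea, the following Python sketch pulls mentions, hashtags, and URLs out of a tweet's text with regular expressions. It is not necessarily the procedure the book goes on to use. The function name extract_entities, the three patterns, and the sample tweet are all assumptions made for this example, and the patterns are deliberately simpler than Twitter's actual rules for usernames, hashtags, and links.

```python
import re

# Simplified, assumed patterns (not Twitter's official entity rules).
# A mention is '@' plus word characters, not preceded by a word character
# or another '@', so that e-mail addresses are not counted as mentions.
MENTION_RE = re.compile(r"(?<![\w@])@(\w+)")
# A hashtag is '#' plus word characters, not preceded by a word character.
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")
# A URL is 'http://' or 'https://' followed by non-whitespace characters.
URL_RE = re.compile(r"https?://\S+")


def extract_entities(tweet_text):
    """Return the mentions, hashtags, and URLs found in one tweet's text."""
    urls = URL_RE.findall(tweet_text)
    # Blank out the URLs first so that a '#fragment' or '@' inside a link
    # is not mistaken for a hashtag or a mention.
    without_urls = URL_RE.sub(" ", tweet_text)
    return {
        "mentions": MENTION_RE.findall(without_urls),
        "hashtags": HASHTAG_RE.findall(without_urls),
        "urls": urls,
    }


# Made-up sample text, used only to exercise the function.
sample = "@alice looking for babysitting #jobs via #sittercity http://example.com/a#frag"
print(extract_entities(sample))
```

Each list in the returned dictionary could then be stored in its own table or column, which would make it straightforward to count how often a given user or hashtag appears.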