At this point, we are ready to begin cleaning the JSON file, extracting the details of each tweet that we want to keep in our long-term storage.
Since our motivating question asks only about URLs, we really need to extract just those, along with the tweet IDs. However, for the sake of practice in cleaning, and so that we can compare this exercise to what we did with the sentiment140 data set earlier in Chapter 7, RDBMS Cleaning Techniques, let's design a small set of database tables as follows:
tweettable, which only holds information about the tweets
hashtagtable, which holds information about which tweets referenced which hashtags
URLtable, which holds information about which tweets referenced ...
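To make the three-table design concrete, here is a minimal sketch of how one decoded tweet might be split across the tables. Several things in it are assumptions rather than part of the text: the column names (tweetid, tweettext, hashtag, url) are hypothetical, SQLite stands in for whatever database server is actually used, the sample tweet is made up, and the JSON field names (id, text, entities.hashtags, entities.urls, expanded_url) assume the tweets follow the Twitter REST API v1.1 layout.

```python
import json
import sqlite3

# Hypothetical column layouts: the text names the tables but not their columns.
SCHEMA = """
CREATE TABLE tweettable (
    tweetid   INTEGER PRIMARY KEY,
    tweettext TEXT
);
CREATE TABLE hashtagtable (
    tweetid INTEGER,
    hashtag TEXT
);
CREATE TABLE URLtable (
    tweetid INTEGER,
    url     TEXT
);
"""


def load_tweet(conn, tweet):
    """Split one decoded tweet into rows for the three tables."""
    tweet_id = tweet["id"]
    entities = tweet.get("entities", {})

    # One row per tweet.
    conn.execute(
        "INSERT INTO tweettable (tweetid, tweettext) VALUES (?, ?)",
        (tweet_id, tweet.get("text")),
    )
    # One row per (tweet, hashtag) pair.
    conn.executemany(
        "INSERT INTO hashtagtable (tweetid, hashtag) VALUES (?, ?)",
        [(tweet_id, h["text"]) for h in entities.get("hashtags", [])],
    )
    # One row per (tweet, URL) pair.
    conn.executemany(
        "INSERT INTO URLtable (tweetid, url) VALUES (?, ?)",
        [(tweet_id, u["expanded_url"]) for u in entities.get("urls", [])],
    )


# Made-up sample tweet, assuming the v1.1 JSON layout.
sample = json.loads("""
{
  "id": 1,
  "text": "Cleaning #data with #python http://t.co/x",
  "entities": {
    "hashtags": [{"text": "data"}, {"text": "python"}],
    "urls": [{"expanded_url": "http://example.com/a"}]
  }
}
""")

conn = sqlite3.connect(":memory:")  # in-memory stand-in database
conn.executescript(SCHEMA)
load_tweet(conn, sample)
conn.commit()

url_rows = conn.execute("SELECT tweetid, url FROM URLtable").fetchall()
```

Keeping hashtags and URLs in their own tables, keyed by tweet ID, means a tweet with several hashtags or links produces several rows there while tweettable still holds exactly one row per tweet.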