Chapter 3. The Application of Data: Products and Processes

How the Library of Congress is building the Twitter archive

Checking in on the Library of Congress’ Twitter archive, one year later.

by Audrey Watters

In April 2010, Twitter announced it was donating its entire archive of public tweets to the Library of Congress. Every tweet since Twitter’s inception in 2006 would be preserved. The donation of the archive to the Library of Congress may have been in part a symbolic act, a recognition of the cultural significance of Twitter. Several important historical moments had already been captured on Twitter when the announcement was made last year (for example, the first tweet from space, Barack Obama’s first tweet as President, and news of Michael Jackson’s death), but our awareness of the significance of the communication channel has certainly grown since then.

That’s led to a flood of inquiries to the Library of Congress about how and when researchers will be able to gain access to the Twitter archive. These research requests were perhaps heightened by some of the changes that Twitter has made to its API and firehose access.

But creating a Twitter archive is a major undertaking for the Library of Congress, and the process isn’t as simple as merely cracking open a file for researchers to peruse. I spoke with Martha Anderson, the head of the library’s National Digital Information Infrastructure and Preservation Program (NDIIPP), and Leslie Johnston, the manager of the ...
