Founded in 2002, Last.fm is an Internet radio and music community website that offers many services to its users, such as free music streams and downloads, music and event recommendations, personalized charts, and much more. There are about 25 million people who use Last.fm every month, generating huge amounts of data that need to be processed. One example of this is users transmitting information indicating which songs they are listening to (this is known as “scrobbling”). This data is processed and stored by Last.fm, so the user can access it directly (in the form of charts), and it is also used to make decisions about users’ musical tastes and compatibility, and artist and track similarity.
As Last.fm’s service developed and the number of users grew from thousands to millions, storing, processing and managing all the incoming data became increasingly challenging. Fortunately, Hadoop was quickly becoming stable enough and was enthusiastically adopted as it became clear how many problems it solved. It was first used at Last.fm in early 2006 and was put into production a few months later. There were several reasons for adopting Hadoop at Last.fm:
The distributed filesystem provided redundant backups for the data stored on it (e.g., web logs, user listening data) at no extra cost.
Scalability was simplified through the ability to add cheap, commodity hardware when required.
The cost was ...