Posted on by & filed under Content - Highlights and Reviews, Programming & Development.

A guest post by Sam Millman, an active participant on both the official MongoDB user group at mongodb-user and on Stackoverflow MongoDB.

The aggregation framework came with the release of MongoDB 2.2 (stable). The aggregation framework is designed to be MongoDB’s own replacement to the SQL aggregation framework, and provides many of the same features, with the exclusion of JOINs.

It is recommended that anyone starting out with the MongoDB aggregation framework start by looking at the SQL mapping chart, which will show numerous examples of how queries in SQL are handled in the aggregation framework. Read Chapter 7. Aggregation in MongoDB: The Definitive Guide, 2nd Edition to learn more about the aggregation framework.

The Why of Aggregation?

Aggregation within any database is extremely important. Users of SQL will understand this. Imagine if you have one table of users and one table of clicks. Let’s say you wish to aggregate the clicks per user and groups on certain pages where those clicks occurred.

You could, of course, perform actions on the client side in this scenario, but you would not get the benefit of speed from the database’s own native programming (which might be better than what you have). In addition, you would need to pull all of the records needed for aggregation, even if you didn’t intend to use them all. Pulling out those records can result in a huge amount of bandwidth usage and traffic on the “wire,” preventing other users from filling requests.

So, data aggregation within the database is extremely important, yet it was one of the vital features missing from MongoDB. Now with MongoDB 2.2 and going forward, we can take advantage of the aggregation framework.

Matching

The $match operator is very important for any aggregation query. It is essentially the WHERE in any SQL query and takes the same shape as the find and findOne functions within the normal MongoDB shell. As examples, consider these $match pipelines:

The first $match is used to match only where name is sammaye, while the second $match is used to match where name is either sammaye or john.

Grouping

Grouping is probably the most commonly used base function when aggregating. MongoDB supports quite a few group operators including $max, $min, $avg and $sum. If you are familar with SQL, you will instantly recognize those operators from your own queries.

Let’s take a look at getting the average balance for all customers in a bank:

All groupings are done on the _id field. The value of _id is the field you are looking to group on, in this case $name. This should produce a grouping in this case, since a person may have more than one account with our bank.

Unwinding

Unwinding is another important operator to take into account. As you may well know, MongoDB favors embedding certain data. Unwinding this embedded data allows you to use it dynamically with the other operators.

It will essentially “unwind” the subdocument to create multiples of the parent document for each element in the subdocument. A better explanation would be an example, take this document:

If you were to $unwind addresses, you would be looking at this result:

This then makes it possible to use $limit and $sort type operators to filter a subdocument, and get back only the results you want.

To provide a brief example, take this document:

If for instance you want all addresses for sammaye that are in the UK. You can then perform this aggregation:

The $push operator within the $group will $push each $unwind‘ed subdocument back into the root document.

A good point to note is that each pipeline within an aggregation query is independent of the other and takes the full results of the previous pipeline into itself. To explain better let’s take an example:

This aggregation will do a double group. First it will group each customer by branch gettting the total of that customer’s balance in that branch. It will then sort by the highest balance per customer, per branch, take those results, and then group again by the branch to get the highest customer balance in that branch, along with the customer’s name.

Lets take one last example, this time written in SQL first:

And in the aggregation framework:

Note how the $cmp operator is used to get back a negative number if col1 is less than col2, and $match only those that result in a negative value from the $cmp operator.

This means that you can build up complex aggregation queries with multiple $match, $group, $project and other operators. Have a look around, and see what you can accomplish with the framework.

Conclusion

In this post we learned about MongoDB’s aggregation framework. We learned about grouping and unwinding, and we also looked at some other common place aggregation examples.

Safari Books Online has the content you need

Below are some MongoDB books with all sorts of tips and information.

MongoDB: The Definitive Guide, 2nd Edition shows you the many advantages of using document-oriented databases, and demonstrates how this reliable, high-performance system allows for almost infinite horizontal scalability.
MongoDB in Action is a comprehensive guide to MongoDB for application developers. The book begins by explaining what makes MongoDB unique and describing its ideal use cases. A series of tutorials designed for MongoDB mastery then leads into detailed examples for leveraging MongoDB in e-commerce, social networking, analytics, and other common applications.
MongoDB and Python is a cookbook-style text to help Python programmers work with MongoDB. It is full of useful, practical recipes for solving real-world problems ranging from how to do fast geo queries for location-based apps to efficiently indexing your user documents for social-graph lookups to how best to integrate MongoDB with the Pyramid Web framework.
Learn how to create large MongoDB clusters! Scaling MongoDB shows you how to use MongoDB efficiently for very large databases. It Covers sharding, cluster setup, and administration.

About the author

sammillman Sam Millman has been using MongoDB for almost 4 years now starting with MongoDB when it had just been released. He is an active participant on both the official MongoDB user group at mongodb-user and on Stackoverflow MongoDB. He has a love for all things web based and enjoys actively building web awesomeness in jQuery and PHP with a little Python on the side. You can contact him either at his blog as http://www.sammaye.wordpress.com/ or on Twitter as @sam_millman.

Tags: Aggregation, Aggregation Framework, JOIN, MongoDB, MongoDB 2.2, SQL,

Comments are closed.