
A guest post by John Sullivan, a 15-year Java veteran who has been programming in Scala for 2+ years. He enjoys posting in-depth articles on his popular Scala-oriented blog, and he is currently employed as a Principal Software Engineer at the Broad Institute.

In this post, I will expand upon a couple of Scala subjects that were touched upon in my previous blog post, Scala Collections API: For People and their Pets. First, we’ll look closer at groupBy, and then discuss some performance-enhancing techniques for using Scala collections.

More on groupBy

The groupBy method allows you to group the values in a Seq according to a key computed for each value. You end up with a Map from each key to a Seq of the values that share that key. It’s common to think of groupBy as grouping values according to some sub-attribute of the values. For instance, in our previous example, we grouped by Person.address.zipcode:
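A minimal sketch of that grouping follows. The Person and Address case classes, their fields, and the sample data are assumptions made for illustration, not the definitions from the previous post.

```scala
object GroupByZipcode {
  // Hypothetical stand-ins for the domain classes; the fields are assumptions.
  final case class Address(street: String, city: String, zipcode: String)
  final case class Person(name: String, age: Int, address: Address)

  // Made-up sample data.
  val people: Seq[Person] = Seq(
    Person("Alice", 34, Address("1 Main St", "West Hollywood", "90069")),
    Person("Bob", 52, Address("2 Oak Ave", "West Hollywood", "90069")),
    Person("Carol", 17, Address("3 Elm Rd", "Boston", "02125"))
  )

  // The key function pulls out the zipcode; people who share a zipcode
  // end up in the same Seq, in their original relative order.
  val peopleByZipcode: Map[String, Seq[Person]] =
    people.groupBy(_.address.zipcode)

  def main(args: Array[String]): Unit =
    peopleByZipcode.foreach { case (zipcode, residents) =>
      println(s"$zipcode: ${residents.map(_.name).mkString(", ")}")
    }
}
```

Here Alice and Bob land under the same key because they share a zipcode, while Carol gets a group of her own.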

The function passed to groupBy can be any arbitrary function, and does not have to pull out a sub-attribute of the values. For instance, we could have a function that classifies people by their age bracket:
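One way such a classifier might look is sketched below. The name ageBracket comes from the text, but the Person class, the cut-off ages, and the bracket labels are all assumptions chosen for illustration.

```scala
object AgeBrackets {
  // Hypothetical minimal Person; only the age matters here.
  final case class Person(name: String, age: Int)

  // Classifies a person by age. The cut-offs (18 and 65) and the labels
  // are assumptions, not values from the original post.
  def ageBracket(person: Person): String =
    if (person.age < 18) "minor"
    else if (person.age < 65) "adult"
    else "senior"
}
```

Nothing about this function reads a stored attribute directly as a key; it computes the key, which is all groupBy requires.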

Now, we can use groupBy in conjunction with ageBracket to get statistics for people based upon their age bracket:

This will produce a map that will look something like this:
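As a rough sketch of the idea, the following groups some made-up people by a hypothetical ageBracket function and then derives two simple statistics per bracket, a head count and a mean age. The choice of statistics, the bracket boundaries, and the data are assumptions for illustration.

```scala
object AgeBracketStats {
  // Hypothetical minimal Person and classifier; cut-offs are assumptions.
  final case class Person(name: String, age: Int)

  def ageBracket(person: Person): String =
    if (person.age < 18) "minor"
    else if (person.age < 65) "adult"
    else "senior"

  // Made-up sample data.
  val people: Seq[Person] = Seq(
    Person("Alice", 34), Person("Bob", 52),
    Person("Carol", 17), Person("Dave", 70)
  )

  // Step 1: group the people by their computed bracket.
  val byBracket: Map[String, Seq[Person]] = people.groupBy(ageBracket)

  // Step 2: reduce each group to (head count, mean age).
  val stats: Map[String, (Int, Double)] =
    byBracket.map { case (bracket, members) =>
      bracket -> ((members.size, members.map(_.age).sum.toDouble / members.size))
    }

  // stats holds these entries (key order is not guaranteed):
  //   adult  -> (2, 43.0)
  //   minor  -> (1, 17.0)
  //   senior -> (1, 70.0)
}
```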

Performance Techniques for Multiple Transformations

Careful readers of the previous post may have had some concerns about the performance implications of chaining multiple method calls from the Scala Collections API. For instance, when we gather addresses from West Hollywood:
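The chained version might look something like the sketch below. The Person and Address classes and the sample data are hypothetical; only the filter-then-map shape is the point.

```scala
object ChainedFilterMap {
  // Hypothetical stand-ins for the domain classes.
  final case class Address(street: String, city: String, zipcode: String)
  final case class Person(name: String, address: Address)

  // Made-up sample data.
  val people: Seq[Person] = Seq(
    Person("Alice", Address("1 Main St", "West Hollywood", "90069")),
    Person("Carol", Address("3 Elm Rd", "Boston", "02125")),
    Person("Bob", Address("2 Oak Ave", "West Hollywood", "90069"))
  )

  // filter builds a new Seq[Person] first; map then walks that Seq
  // to build the final Seq[Address].
  val westHollywoodAddresses: Seq[Address] =
    people
      .filter(_.address.city == "West Hollywood")
      .map(_.address)
}
```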

This is going to create an intermediate sequence containing the people from West Hollywood. Wouldn’t it be better to filter the list and map to addresses in a single pass? How would we accomplish this?


There are two approaches I would consider here. The first is to use the collect method from the Seq API. This method is like map, but it takes a partial function instead of a function. A partial function is only defined for a subset of its input values. When the partial function is not defined, the collect method discards the value. So, the following method is equivalent to the above example, but it will perform better because it does not create an intermediate result:
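A sketch of the collect approach, again using hypothetical Person and Address classes and made-up data. The partial function is written as a case clause with a guard, so it is defined only for people whose city matches.

```scala
object CollectAddresses {
  // Hypothetical stand-ins for the domain classes.
  final case class Address(street: String, city: String, zipcode: String)
  final case class Person(name: String, address: Address)

  // Made-up sample data.
  val people: Seq[Person] = Seq(
    Person("Alice", Address("1 Main St", "West Hollywood", "90069")),
    Person("Carol", Address("3 Elm Rd", "Boston", "02125")),
    Person("Bob", Address("2 Oak Ave", "West Hollywood", "90069"))
  )

  // The guard plays the role of filter and the right-hand side plays the
  // role of map. People for whom no case matches are skipped, so no
  // intermediate Seq[Person] is built.
  val westHollywoodAddresses: Seq[Address] =
    people.collect {
      case Person(_, address) if address.city == "West Hollywood" => address
    }
}
```

The result should match what a filter followed by a map would produce on the same data.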

Seq.view and Seq.force

A more general approach to avoid building intermediate results uses the methods view and force. The view method will create a version of the sequence that applies methods like filter and map as lazily as possible. The force method will force these lazy operations to occur, and we can perform the transformations in a single pass. In the following example, we again avoid computing an intermediate result:
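A sketch of the view approach with hypothetical classes and data. One caveat: this sketch ends the chain with toList rather than force, on the understanding that force is deprecated on views as of Scala 2.13. On the Scala versions current when this post was written, force would appear in the same position.

```scala
object ViewAddresses {
  // Hypothetical stand-ins for the domain classes.
  final case class Address(street: String, city: String, zipcode: String)
  final case class Person(name: String, address: Address)

  // Made-up sample data.
  val people: List[Person] = List(
    Person("Alice", Address("1 Main St", "West Hollywood", "90069")),
    Person("Carol", Address("3 Elm Rd", "Boston", "02125")),
    Person("Bob", Address("2 Oak Ave", "West Hollywood", "90069"))
  )

  // view makes filter and map lazy: they only describe the work to do.
  // The final conversion (toList here, force in older code) runs the whole
  // pipeline in one traversal without an intermediate List[Person].
  val westHollywoodAddresses: List[Address] =
    people.view
      .filter(_.address.city == "West Hollywood")
      .map(_.address)
      .toList
}
```

Unlike collect, the same pattern extends to longer chains of transformations, since every step applied to the view stays lazy until the final conversion.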


The Scala Collections API is a goldmine of powerful utilities for manipulating collections. As you gain familiarity with the API, you will continue to learn new ways to manipulate collections. But even a baseline understanding of the Scala collections will give you a very powerful toolset. So, you can leverage Scala collections right away, and continue to hone your skills over time.

About the author

John Sullivan is a professional software engineer and technical blogger. A 15-year Java veteran, he has been happily programming in Scala for 2+ years. He enjoys posting in-depth articles on his popular Scala-oriented blog. His major interests outside of Scala include software engineering best practices, the agile development process, and Domain Driven Design. John has a Master of Science in Computer Science from UMass Boston. He is currently employed as a Principal Software Engineer at the Broad Institute.
