10.21. Extracting Unique Elements from a Sequence

Problem

You have a collection that contains duplicate elements, and you want to remove the duplicates.

Solution

Call the distinct method on the collection:

scala> val x = Vector(1, 1, 2, 3, 3, 4)
x: scala.collection.immutable.Vector[Int] = Vector(1, 1, 2, 3, 3, 4)

scala> val y = x.distinct
y: scala.collection.immutable.Vector[Int] = Vector(1, 2, 3, 4)

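The distinct method returns a new collection with the duplicate values removed, keeping the first occurrence of each element in its original order. It does not modify the collection it is called on, so remember to assign the result to a new variable. This applies to both immutable and mutable collections.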
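As a small illustration of that point for a mutable collection, the following sketch calls distinct on an ArrayBuffer. The names buf and unique are chosen here for the example only:

```scala
import scala.collection.mutable.ArrayBuffer

// A mutable buffer that contains duplicates
val buf = ArrayBuffer(1, 1, 2, 3, 3, 4)

// distinct builds a new ArrayBuffer; buf itself is left untouched
val unique = buf.distinct

println(buf)     // still holds all six elements
println(unique)  // holds 1, 2, 3, 4
```

If the result is not assigned to anything, the deduplicated collection is simply discarded.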

If you happen to need a Set, converting the collection to a Set is another way to remove the duplicate elements:

scala> val s = x.toSet
s: scala.collection.immutable.Set[Int] = Set(1, 2, 3, 4)

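By definition a Set can only contain unique elements, so converting an Array, List, Vector, or other sequence to a Set removes the duplicates. Unlike distinct, however, a Set does not guarantee that the elements keep their original order. In fact, distinct relies on the same idea: the source code for the distinct method in GenSeqLike shows that it uses an instance of mutable.HashSet to keep track of the elements it has already seen.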
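To make the role of the HashSet concrete, here is a minimal sketch of how such a method could be written. It is not the library's source; distinctSketch is a hypothetical helper name, and the use of Seq.newBuilder is an assumption made for this example:

```scala
import scala.collection.mutable

// Hypothetical helper, not the standard library implementation:
// keeps the first occurrence of each element, in order.
def distinctSketch[A](xs: Seq[A]): Seq[A] = {
  val seen = mutable.HashSet.empty[A]   // elements encountered so far
  val out  = Seq.newBuilder[A]          // accumulates the result
  for (x <- xs) {
    // add returns true only if x was not already in the set
    if (seen.add(x)) out += x
  }
  out.result()
}

val deduped = distinctSketch(Vector(1, 1, 2, 3, 3, 4))
println(deduped)  // contains 1, 2, 3, 4 in that order
```

Because membership is decided by the hash set, the elements need consistent equals and hashCode implementations for this approach to work.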

Using distinct with your own classes

To use distinct with your own class, you’ll need to implement the equals and hashCode methods. For example, the following class will work with distinct because it implements those methods:

class Person(firstName: String, lastName: String) {

  override def toString = s"$firstName $lastName"

  def canEqual(a: Any) = a.isInstanceOf[Person]

  override def equals(that: Any): Boolean =
    that match {
      case that: Person ...
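
A complete class of this kind might look like the following sketch. It is an illustration rather than the original listing: declaring the constructor parameters as val (so that equals can read them from another instance) and building hashCode from a tuple are assumptions made here, and the sample names are invented.

```scala
// Illustrative sketch; field visibility and hashCode strategy are assumptions.
class Person(val firstName: String, val lastName: String) {

  override def toString = s"$firstName $lastName"

  def canEqual(a: Any): Boolean = a.isInstanceOf[Person]

  // Two Person instances are equal when both name fields match
  override def equals(that: Any): Boolean = that match {
    case that: Person =>
      that.canEqual(this) &&
        this.firstName == that.firstName &&
        this.lastName == that.lastName
    case _ => false
  }

  // Equal objects must produce equal hash codes
  override def hashCode: Int = (firstName, lastName).hashCode
}

val people = List(
  new Person("Dale", "Cooper"),
  new Person("Dale", "Cooper"),
  new Person("Ed", "Hurley")
)

// The second "Dale Cooper" is dropped because equals and hashCode agree
val uniquePeople = people.distinct
println(uniquePeople.size)
```

Without the equals and hashCode overrides, each new Person would be compared by reference, and distinct would keep all three instances.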
