Posted on by & filed under Content - Highlights and Reviews, Programming & Development.

A guest post by Vlad Patryshev, who was born in Russia, has an education in Math (including now popular category theory), and works at HealthExpense.com. A functional programmer and a Scala fan since 2008, he is now an organizer of Scala BASE and Bay Area Categories and Types meetups.

In this post, I explore the problem of mixing variance in Scala collections and I propose a solution derived from category theory, thus making Scala collections more type-safe.

## The Problem

If you have written code in Java, you know that although you have a `List<String>` list, to which you can’t legally add anything but a String, you are allowed to check `list.contains(123)`, since the method `contains()` takes an Object as a parameter type.

The explanation lies in the collections covariance in Java: having class A, and class B extends A, you get `List<B>` as a subtype of `List<A>`. So, you can pass a `List<B>`, where a `List<A>` is required. This is convenient, and this is logical. Yet, if `List<B>` has a method `contains(B b)`, how can we possibly use it as a `List<A>`, which is supposed to have a method `contains(A a)`? This method, `contains(A x)`, accepts any value of type `A`. Our `List<B>` has a method `contains(B x)` – it only accepts values of type `B`. We cannot magically extend a function’s domain from `B` to `A`.

The problem with `contains()` is that it makes the class contravariant. You see, `contains(A a)` should also be able to take any `B b` as a parameter, as soon as `B` is a subtype of A. From here it follows that any class `Coll<B>` that has this method, should have this method applicable to any supertype of `B`. That’s what contravariance is – the opposite of covariance. And this is why, in Java collections, `contains()` is applicable to any Object.

We know that variance in particular, and generics in general are implemented wrongly in Java. This was the reason for creating Scala, according to Martin Odersky – to have a language with the right variance implementation. Unfortunately, in Scala too we can have `List(“a”, “b”, “c”).contains(42)` compile without error messages.

This is weird. In Scala, you would expect better type safety, but this is easy to explain. While we can declare variance in Scala, and `List` is covariant (defined as trait `List[+A]`), `contains()` requires contravariance (as in classes like `Map[-X,+Y]`, and there is no reasonable way to combine variance in one parameterized type. This is why Scala `Set` has eventually become an invariant: `Set[Dog]` is not a subtype of `Set[Animal].`, which is bad, but as a result, `Set("a", "b", "c").contains(42)` will not compile, and this is good.

In an ideal world, `Set` could be contravariant, which, in the real world, would leave us wondering why `Set[Animal]` is a subtype of `Set[Dog]`?

Having method `contains() take Any` is even more confusing in Scala, since we can pass a closure. The statement `listOfInts.contains(_.isPrime)` looks innocent, but what does it do? It checks if the function `(n:Int)=>n.isPrime()` is a member of the list. It can’t be, if `listOfInts` is declared as `List[Int]`; so we have a cognitive dissonance here. If we were in the typeless world, we could not care, and we would agree with Senior Engineers telling us that we should be more careful. In Scala, we expect the compiler to be careful for us.

Can we fix this? I think we can. But first let’s take a look at it from the point of view of category theory. You can skip this part and go directly to the code solution below if you prefer.

## The Solution, Theory

Say we have a (covariant) functor `F: A → B` and a contravariant functor `G:Aop → B`. We can have a natural transformation from `F` to `G`, defined as a collection `tX:F(X) → G(X)`, such that the following square commutes for each `f`: `X → Y`:

If you aren’t familiar with the idea, you can find the definition on Stackexchange.

So, how does this help us? First, let’s specify the category. Since we are talking about subtyping, in our category types will be objects, and the relationship of being a subtype will be arrows – so we actually have a poset. Now, in this category, we separate the covariant aspects into one, covariant, functor, and contravariant aspects in a contravariant functor, and have a natural transformation from one to another. In our case, we define List, or Tree, or any similar collection kind (or, one could say, algebraic functor), and have a natural transformation from this algebraic functor to the exponent functor, `F(X) → 2X`. This is all that you need to do. See below, where I do this in plain Scala.

## The Solution, Practice

Instead of going ahead with the heavy generic case, we will see how we can solve the problem of `_.contains(Any)`. We have a covariant collection, let’s call it `MyList[+A]`, and let’s define it, as a demonstration, like as follows:

Now we separate the method `contains()`, defining it in a contravariant trait, like this:

To call this method on instances of `MyList[T]`, we need an implicit natural transformation from `MyList[T]` to `Container[T]`, like this:

Now we can write `“Hello”::”World”::EmptyList contains “42”`, but we can’t write `“Hello”::”World”::EmptyList contains 42`.

Here are the tests:

## Conclusion

I hope in time what I’ve covered in this post will be adopted in Scala, although probably not right away, but perhaps some time in the future. At least now we know the solution exists. Since new languages pop up all the time, if you have variance (and implicits), here’s what you can do to deal with them.

See below for some Scala resources from Safari Books Online.